from:"Jan van der Laan"

Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?

2023-10-06 Thread Jan van der Laan

Another thing that I considered, but which doesn't seem to be supported, is 
rotating the symbols. I noticed that rotation does work with text, so you 
could use an arrow symbol and then specify the angle aesthetic. But this 
still relies on text, and unfortunately there are no arrow-like symbols in 
ASCII, except perhaps 'V'.


I can't say how well non-ASCII text is supported across different 
operating systems and locales. 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues 
gives some 'hints'.
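
A minimal sketch of one of those hints, assuming the goal is to keep package source pure ASCII: write the triangle as a `\u` escape rather than as a literal character. The fallback to 'V' and the variable names are only illustrative, and whether the glyph is actually drawn still depends on the graphics device and font.

```r
# Write the symbol as a Unicode escape so the source file itself stays ASCII.
tri_down <- "\u25BC"            # U+25BC, BLACK DOWN-POINTING TRIANGLE

Encoding(tri_down)              # the string is marked as UTF-8
utf8ToInt(tri_down)             # its code point as an integer

# Illustrative fallback: use plain 'V' when the session is not UTF-8.
can_show <- isTRUE(l10n_info()[["UTF-8"]])
label <- if (can_show) tri_down else "V"
```

The resulting `label` can then be used as the `label` in geom_text().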





On 06-10-2023 14:21, Chris Evans via R-help wrote:
Thanks again Jan.  That is lovely and clean and I probably should have 
seen that option.


I had anxieties about the portability of using text.  (The function will 
end up in my https://github.com/cpsyctc/CECPfuns package, so I'd like it 
to be fairly immune to character sets and different platforms in 
different countries.)

I'm morphing this question a lot now but I guess it's still on topic 
really.  I know I need to put in some time to understand the complexities 
of R and platforms (I'm pretty exclusively on Linux, Ubuntu or Debian, 
now so have mostly done the ostrich thing about these issues, though I do 
hit problems exchanging things with my Spanish-speaking colleagues).  
Jan or anyone: any simple reassurance or pointers to resources I should 
best use for homework about these issues?

TIA (again!)

Chris

On 06/10/2023 12:55, Jan van der Laan wrote:

You are right, sorry.

Another possible solution then: use geom_text instead of geom_point 
and use a triangle shape as text:


ggplot(data = tmpTibPoints,
       aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas,
               aes(x = x, y = y, fill = a)) +
  geom_text(data = tmpTibPoints,
            aes(x = x, y = y, label = "▼", color = c),
            size = 6) +
  guides(color = "none")


[much snipped]




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?

2023-10-06 Thread Jan van der Laan


You are right, sorry.

Another possible solution then: use geom_text instead of geom_point and 
use a triangle shape as text:


ggplot(data = tmpTibPoints,
       aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas,
               aes(x = x, y = y, fill = a)) +
  geom_text(data = tmpTibPoints,
            aes(x = x, y = y, label = "▼", color = c),
            size = 6) +
  guides(color = "none")



On 06-10-2023 12:11, Chris Evans via R-help wrote:
Sadly, no.  Still shows the same legend with both sets of fill mappings.  
I have found a workaround, sadly much longer than yours (!), that does get 
me what I want but it is a real bodge.  Still interested to see if there 
is a way to create a downward pointing solid symbol, but here is my bodge 
using new_scale_fill() and new_scale_color() from the ggnewscale package 
(many thanks to Elio Campitelli for that).


library(tidyverse)
library(ggnewscale) # allows me to change the scales used
tibble(x = 2:9, y = 2:9,
       ### I have used A:C to ensure the changes sort in the correct
       ### order to avoid the messes of using shape to scale an ordinal variable
       ### have to say that seems a case where it is perfectly sensible
       ### to map shapes to an ordinal variable, scale_shape_manual() makes
       ### this difficult hence this bodge
       c = c(rep("A", 5), "B", rep("C", 2)),
       change = c(rep("Deteriorated", 5), "No change", rep("Improved", 2))) %>%
  ### this is just keeping the original coding but not used below
  mutate(change = ordered(change,
                          levels = c("Deteriorated", "No change",
                                     "Improved"))) -> tmpTibPoints

### create the area mapping
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> 
tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> 
tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> 
tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> 
tmpTibArea4

bind_rows(tmpTibArea1,
   tmpTibArea2,
   tmpTibArea3,
   tmpTibArea4) -> tmpTibAreas
### now plot
ggplot(data = tmpTibPoints,
    aes(x = x, y = y)) +
   geom_polygon(data = tmpTibAreas,
    aes(x = x, y = y, fill = a),
    alpha = .5) +
   scale_fill_manual(name = "Areas",
     values = c("orange", "purple", "yellow", "brown"),
     labels = letters[1:4]) +
   ### next two lines use ggnewscale functions to reset the scale mappings
   new_scale_fill() +
   new_scale_colour() +
   ### can now use the open triangles and fill aesthetic to map them
   geom_point(data = tmpTibPoints,
  aes(x = x, y = y, shape = c, fill = c, colour = c),
  size = 6) +
   ### use the ordered variable c to get mapping in desired order
   ### which, sadly, isn't the alphabetical order!
   scale_shape_manual(name = "Change",
    values = c("A" = 24,
   "B" = 23,
   "C" = 25),
    labels = c("Deteriorated",
   "No change",
   "Improved")) +
   scale_colour_manual(name = "Change",
    values = c("A" = "red",
   "B" = "grey",
   "C" = "green"),
    labels = c("Deteriorated",
   "No change",
   "Improved")) +
   scale_fill_manual(name = "Change",
    values = c("A" = "red",
   "B" = "grey",
   "C" = "green"),
    labels = c("Deteriorated",
   "No change",
   "Improved"))

That gives the attached plot, which is really what I want.  Long bodge 
though!

On 06/10/2023 11:50, Jan van der Laan wrote:


Does adding

, show.legend = c("color"=TRUE, "fill"=FALSE)

to the geom_point do what you want?

Best,
Jan

On 06-10-2023 11:09, Chris Evans via R-help wrote:

library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> 
tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> 
tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> 
tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> 
tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(

Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?

2023-10-06 Thread Jan van der Laan




Does adding

, show.legend = c("color"=TRUE, "fill"=FALSE)

to the geom_point do what you want?

Best,
Jan

On 06-10-2023 11:09, Chris Evans via R-help wrote:

library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> 
tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> 
tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> 
tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> 
tmpTibArea4

bind_rows(tmpTibArea1,
   tmpTibArea2,
   tmpTibArea3,
   tmpTibArea4) -> tmpTibAreas
ggplot(data = tmpTibPoints,
    aes(x = x, y = y)) +
   geom_polygon(data = tmpTibAreas,
    aes(x = x, y = y, fill = a)) +
   geom_point(data = tmpTibPoints,
  aes(x = x, y = y, fill = c),
  pch = 24,
  size = 6)



Re: [R] overlay shaded area in base r plot

2023-09-19 Thread Jan van der Laan


A shorter/simpler alternative for adding an alpha channel:

adjustcolor("lightblue", alpha = 0.5)


So I would use something like:


# Open new plot; make sure limits are ok; but don't plot
plot(0, 0, xlim=c(1,20),
  ylim = range(c(mean1+sd1, mean2+sd2, mean1-sd1, mean2-sd2)),
  type="n", las=1,
  xlab="Data",
  ylab=expression(bold("Val")),
  cex.axis=1.2,font=2,
  cex.lab=1.2)
polygon(c(1:20,20:1),
  c(mean1[1:20]+sd1[1:20],mean1[20:1]-sd1[20:1]),
  col=adjustcolor("blue", 0.5),
  border = NA)
polygon(c(1:20,20:1),
  c(mean2[1:20]+sd2[1:20],mean2[20:1]-sd2[20:1]),
  col=adjustcolor("yellow", 0.5),
  border = NA)
lines(1:20, mean1,lty=1,lwd=2,col="blue")
lines(1:20, mean2,lty=1,lwd=2,col="yellow")
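
A self-contained version of the same idea, as a sketch only: the original `mean1`, `sd1`, `mean2` and `sd2` were not posted, so the values below are simulated, and the helper `band_xy()` is a hypothetical name introduced here. The plot goes to a null PDF device so nothing is written to disk.

```r
set.seed(42)
# Simulated stand-ins for the poster's data (assumption)
mean1 <- cumsum(rnorm(20)); sd1 <- runif(20, 0.5, 1)
mean2 <- cumsum(rnorm(20)); sd2 <- runif(20, 0.5, 1)

# Outline of a mean +/- sd band: upper edge left to right,
# then lower edge right to left, so the polygon closes properly.
band_xy <- function(m, s) {
  i <- seq_along(m)
  list(x = c(i, rev(i)), y = c(m + s, rev(m - s)))
}

pdf(NULL)  # off-screen device
plot(0, 0, type = "n", xlim = c(1, 20),
     ylim = range(c(mean1 + sd1, mean1 - sd1, mean2 + sd2, mean2 - sd2)),
     xlab = "Data", ylab = "Val", las = 1)
b1 <- band_xy(mean1, sd1)
b2 <- band_xy(mean2, sd2)
polygon(b1$x, b1$y, col = adjustcolor("blue", 0.5), border = NA)
polygon(b2$x, b2$y, col = adjustcolor("yellow", 0.5), border = NA)
lines(1:20, mean1, lwd = 2, col = "blue")
lines(1:20, mean2, lwd = 2, col = "yellow")
invisible(dev.off())
```

Because both bands are half transparent, the region where they overlap stays visible instead of being hidden by whichever polygon is drawn last.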


On 19-09-2023 09:16, Ivan Krylov wrote:

On Tue, 19 Sep 2023 13:21:08 +0900
ani jaya wrote:


polygon(c(1:20,20:1),c(mean1[1:20]+sd1[1:20],mean1[20:1]),col="lightblue")
polygon(c(1:20,20:1),c(mean1[1:20]-sd1[1:20],mean1[20:1]),col="lightblue")
polygon(c(1:20,20:1),c(mean2[1:20]+sd2[1:20],mean2[20:1]),col="lightyellow")
polygon(c(1:20,20:1),c(mean2[1:20]-sd2[1:20],mean2[20:1]),col="lightyellow")


If you want the areas to overlap, try using a transparent colour. For
example, "lightblue" is rgb(t(col2rgb("lightblue")), max = 255) →
"#ADD8E6", so try setting the alpha (opacity) channel to something less
than FF, e.g., "#ADD8E688".

You can also use rgb(t(col2rgb("lightblue")), alpha = 128, max = 255)
to generate hexadecimal colour strings for a given colour name and
opacity value.
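
As a quick check that the two routes agree, here is a small sketch comparing the rgb()/col2rgb() construction with adjustcolor(); the variable names are only illustrative.

```r
# Opaque hex code for the named colour
hex_opaque <- rgb(t(col2rgb("lightblue")), maxColorValue = 255)

# Same colour with an alpha channel of 128 out of 255
hex_rgb <- rgb(t(col2rgb("lightblue")), alpha = 128, maxColorValue = 255)

# adjustcolor() scales the existing alpha by a factor instead
hex_adj <- adjustcolor("lightblue", alpha.f = 0.5)

print(c(hex_opaque, hex_rgb, hex_adj))
```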




Re: [R] Obtaining R-squared from All Possible Combinations of Linear Models Fitted

2023-07-18 Thread Jan van der Laan


The dredge function has an `extra` argument to get other statistics:

optional additional statistics to be included in the result, provided as 
functions, function names or a list of such (preferably named or 
quoted). As with the rank argument, each function must accept as an 
argument a fitted model object and return (a value coercible to) a 
numeric vector. This could be, for instance, additional information 
criteria or goodness-of-fit statistics. The character strings "R^2" and 
"adjR^2" are treated in a special way and add a likelihood-ratio based 
R² and modified-R² to the result, respectively (this is more efficient 
than using r.squaredLR directly).
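
A minimal sketch of how that could look, assuming the MuMIn package is installed (the dredge call is skipped otherwise). Only three of the posted columns are used to keep it short, and `r2_global` is an illustrative name for the ordinary R-squared of the global model.

```r
# Three of the columns from the data posted in the question
final_frame <- data.frame(
  y  = c(41.9, 44.5, 43.9, 30.9, 27.9, 38.9, 30.9,
         28.9, 25.9, 31, 29.5, 35.9, 37.5, 37.9),
  x1 = c(6.6969, 8.7951, 9.0384, 5.9592, 4.5429, 8.3607, 5.898,
         5.6039, 4.9176, 6.2712, 5.0208, 5.8282, 5.9894, 7.5422),
  x4 = c(1.488, 1.82, 1.5, 1.121, 1.175, 1.777, 1.24,
         1.501, 0.998, 0.975, 1.5, 1.225, 1.256, 1.69)
)

old_opts <- options(na.action = "na.fail")  # dredge wants na.fail
globalmodel <- lm(y ~ ., data = final_frame)

# Ordinary R-squared of the global model, for reference
r2_global <- summary(globalmodel)$r.squared

if (requireNamespace("MuMIn", quietly = TRUE)) {
  # "R^2" and "adjR^2" add extra columns to the model selection table
  combinations <- MuMIn::dredge(globalmodel, extra = list("R^2", "adjR^2"))
  print(combinations)
}
options(old_opts)
```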


HTH
Jan



On 17-07-2023 19:24, Paul Bernal wrote:

Dear friends,

I need to automatically fit all possible linear regression models (with all
possible combinations of regressors), and found the MuMIn package, which
has the dredge function.

This is the dataset  I am working with:

dput(final_frame)

structure(list(y = c(41.9, 44.5, 43.9, 30.9, 27.9, 38.9, 30.9,
28.9, 25.9, 31, 29.5, 35.9, 37.5, 37.9), x1 = c(6.6969, 8.7951,
9.0384, 5.9592, 4.5429, 8.3607, 5.898, 5.6039, 4.9176, 6.2712,
5.0208, 5.8282, 5.9894, 7.5422), x4 = c(1.488, 1.82, 1.5, 1.121,
1.175, 1.777, 1.24, 1.501, 0.998, 0.975, 1.5, 1.225, 1.256, 1.69
), x8 = c(22, 50, 23, 32, 40, 48, 51, 32, 42, 30, 62, 32, 40,
22), x2 = c(1.5, 1.5, 1, 1, 1, 1.5, 1, 1, 1, 1, 1, 1, 1, 1.5),
 x7 = c(3, 4, 3, 3, 3, 4, 3, 3, 4, 2, 4, 3, 3, 3)), class =
"data.frame", row.names = c(NA,
-14L))

I started with the all regressor model, which I called globalmodel as
follows:
#Fitting Regression model with all possible combinations of regressors
library(MuMIn)
options(na.action = "na.fail") # change the default "na.omit" to prevent models
globalmodel <- lm(y~., data=final_frame)

Then, the following code provides the different coefficients (for
regressors and the intercept) for each of the possible model combinations:
combinations <- dredge(globalmodel)
print(combinations)
I would like to retrieve the R-squared generated by each combination, but
have not been able to get it thus far.

Any guidance on how to retrieve the R-squared from all linear model
combinations would be greatly appreciated.

Kind regards,
Paul


Re: [R] Plotting directly to memory?

2023-05-28 Thread Jan van der Laan




Perhaps the ragg package? That has an `agg_capture` device "that lets 
you access the device buffer directly from your R session." 
https://github.com/r-lib/ragg
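
A rough sketch of how that might be used, assuming the ragg package is installed (otherwise `img` simply stays NULL). The device size is arbitrary.

```r
img <- NULL
if (requireNamespace("ragg", quietly = TRUE)) {
  # Open the capture device; it returns a function that reads the buffer
  cap <- ragg::agg_capture(width = 480, height = 480, units = "px")
  plot(1:10, 1:10)
  img <- cap()        # matrix of colour strings, one entry per pixel
  invisible(dev.off())
  print(dim(img))
}
```

No temporary file and no `png` package are involved; the matrix can be drawn again with plot(as.raster(img)).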


HTH,
Jan




On 28-05-2023 13:46, Duncan Murdoch wrote:
Is there a way to open a graphics device that plots entirely to an array 
or raster in memory?  I'd prefer it to use base graphics, but grid would 
be fine if it makes a difference.


For an explicit example, I'd like to do the equivalent of this:

   filename <- tempfile(fileext = ".png")
   png(filename)
   plot(1:10, 1:10)
   dev.off()

   library(png)
   img <- readPNG(filename)

   unlink(filename)


which puts the desired plot into the array `img`, but I'd like to do it 
without needing the `png` package or the temporary file.


A possibly slightly simpler request would be to do this only for 
plotting text, i.e. I'd like to rasterize some text into an array.


Duncan Murdoch


Re: [R] nth kludge

2023-03-09 Thread Jan van der Laan


Hi Avi, list,

Below an alternative suggestion:

func <- function(a, b, c) {
  list(a, b, c)
}

1:3 |> list(x = _) |> with(func(1, x, mean(x)))


Not sure if this is more readable than some of the other solutions, e.g. 
your solution, but you could make a variant of `with` that is more 
specific to this use case:


named <- function(expr, ...) {
  eval(substitute(expr), list(...), enclos = parent.frame())
}

then you can do:

1:3 |> named(func(1, x, mean(x)), x= _)

or perhaps you can even simplify further using the same strategy:


dot <- function(.,  expr) {
  eval(substitute(expr), list(. = .), enclos = parent.frame())
}

1:3 |> dot(func(1, ., mean(.)))

This seems simpler than the lambda notation and more general than your 
solution. Not sure if this has any drawbacks.
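
A small runnable check of the `dot` variant, as a sketch: it needs R >= 4.1 for the native pipe, and `offset` is just an illustrative variable to show that names from the calling scope are still visible inside the expression. One thing to keep in mind is that a variable called `.` in the calling scope would be shadowed by the piped value.

```r
func <- function(a, b, c) {
  list(a, b, c)
}

# Evaluate expr with '.' bound to the piped value; everything else is
# looked up in the caller's environment.
dot <- function(., expr) {
  eval(substitute(expr), list(. = .), enclos = parent.frame())
}

offset <- 10
res <- 1:3 |> dot(func(offset, ., mean(.)))
str(res)
```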


HTH,
Jan



On 08-03-2023 21:23, avi.e.gr...@gmail.com wrote:

I see many are not thrilled with the concise but unintuitive way it is
suggested you use with the new R pipe function.

I am wondering if anyone has created one of a family of functions that might
be more intuitive, if less general.

Some existing pipes simply allowed you to specify where in an argument list
to put the results from the earlier pipeline, as in:

. %>% func(first, . , last)

In the above common case, it substituted into the second position.

What would perhaps be a useful variant is a function that does not evaluate
its arguments and expects a first argument passed from the pipe, a second
argument that is a number like 2 or 3, a third argument that is the (name
of) a function, and then the remaining arguments.

The above might look like:

. %>% the_nth(2, func, first , last)

The above asks to take the new implicitly passed first argument, which I will
illustrate with a real argument as it would also work without a pipeline:

the_nth(piped, 2, func, first, last)

So it would make a list out of the remaining arguments that looks like
list(first, last), interpolate piped at position 2 to make list(first,
piped, last), and then use something like do.call():

do.call(func, list(first, piped, last))

I am not sure if this is much more readable, but it seems like a
straightforward function to write, and perhaps a decent version could make
it into the standard library some year that is general and more useful than
the darn anonymous lambda notation.
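
The the_nth() idea can be sketched in a few lines. This is only one possible reading of the description: unlike the proposal, this version evaluates its arguments in the usual way, and `func` is the same toy function used elsewhere in the thread.

```r
func <- function(a, b, c) {
  list(a, b, c)
}

# Insert 'piped' at position n among the remaining arguments and call f.
the_nth <- function(piped, n, f, ...) {
  args <- append(list(...), list(piped), after = n - 1)
  do.call(f, args)
}

# Without a pipeline
res1 <- the_nth(1:3, 2, func, "first", "last")

# With the native pipe (R >= 4.1)
res2 <- 1:3 |> the_nth(2, func, "first", "last")
```

Both calls are equivalent to func("first", 1:3, "last").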
  



Re: [R] foreign package: unable to read S-Plus objects

2023-01-17 Thread Jan van der Laan

You could try to see what stattransfer can make of it. They have a free 
version that imports only part of the data. You could use that to see if 
stattransfer would help and perhaps discover what format it is in.


HTH
Jan


On 16-01-2023 23:22, Joseph Voelkel wrote:

Dear foreign maintainers and others,

I am trying to import a number of S-Plus objects into R. The only way I see how 
to do this is by using the foreign package.

However, when I try to do this I receive an error message. A snippet of code 
and the error message follows:

read.S(file.path(Spath, "nrand"))
Error in read.S(file.path(Spath, "nrand")) : not an S object

I no longer know the version of S-Plus in which these objects were created. I 
do know that I have printed documentation, dated July 2001, from S-Plus 6; and 
that all S-Plus objects were created in the 9/2004 -- 5/2005 range.

I am afraid that I simply have S-Plus objects that are not the S version 3 
files that the foreign package can read, yes? But I am still hoping that it may 
be possible to read these in.

I am not attaching some sample S-Plus objects to this email, because I  believe 
they will be stripped away as binary files. However, a sample of these files 
may be found at

https://drive.google.com/drive/folders/1wFVa972ciP44Ob2YVWfqk8SGIodzAXPv?usp=sharing
  (simdat is the largest file, at 469 KB)

Thank you for any assistance you may provide.

R 4.2.2
Microsoft Windows [Version 10.0.22000.1455]
foreign_0.8-83


Joe Voelkel
Professor Emeritus
RIT


Re: [R] Reading very large text files into R

2022-09-29 Thread Jan van der Laan

You're sure the extra column is indeed an extra column? According to the 
documentation 
(https://artefacts.ceda.ac.uk/badc_datadocs/ukmo-midas/RH_Table.html) 
there should be 15 columns.


Could it, for example, be that one of the columns contains records with 
commas?
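
One way to check this, sketched on a small made-up file since the Met Office data are not freely available: count.fields() reports the number of fields per line, so the lines that deviate can be inspected directly. The expected count of 3 and the file contents are assumptions for the example; for the real data it would be 15.

```r
# Small made-up file: line 3 has an unquoted extra comma,
# line 4 has a comma inside a quoted field.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,extra,6",
             "7,\"8,9\",10"), path)

# Number of fields on each line, respecting double quotes
n_fields <- count.fields(path, sep = ",", quote = "\"")
print(table(n_fields))

# Which lines do not have the expected number of fields?
bad <- which(n_fields != 3)
print(readLines(path)[bad])

# Read everything anyway, padding short lines with NA
dat <- read.table(path, sep = ",", quote = "\"", header = FALSE,
                  fill = TRUE,
                  col.names = paste0("V", seq_len(max(n_fields))))
unlink(path)
```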


Jan



On 29-09-2022 15:54, Nick Wray wrote:

Hello   I may be offending the R purists with this question but it is
linked to R, as will become clear.  I have very large data sets from the UK
Met Office in notepad form.  Unfortunately,  I can’t read them directly
into R because, for some reason, although most lines in the text doc
consist of 15 elements, every so often there is a sixteenth one and R
doesn’t like this and gives me an error message because it has assumed that
every line has 15 elements and doesn’t like finding one with more.  I have
tried playing around with the text document, inserting an extra element
into the top line etc, but to no avail.

Also unfortunately you need access permission from the Met Office to get
the files in question so this link probably won’t work:

https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1

So what I have done is simply to copy and paste the text docs into excel
csv and then read them in, which is time-consuming but works.  However the
later datasets are over the excel limit of 1048576 lines.  I can paste in
the first 1048576 lines but then trying to isolate the remainder of the
text doc to paste it into a second csv doc is proving v difficult – the
only way I have found is to scroll down by hand and that’s taking ages.  I
cannot find another way of editing the notepad text doc to get rid of the
part which I have already copied and pasted.

Can anyone help with a)ideally being able to simply read the text tables
into R  or b)suggest a way of editing out the bits of the text file I have
already pasted in without laborious scrolling?

Thanks Nick Wray


Re: [R] How to represent tree-structured values

2022-05-30 Thread Jan van der Laan

For visualising hierarchical data a treemap can also work well. For 
example, using the treemap package:


n <- 1000

library(data.table)
library(treemap)

dta <- data.table(
  level1 = sample(LETTERS[1:5], n, replace = TRUE),
  level2 = sample(letters[1:5], n, replace = TRUE),
  level3 = sample(1:9, n, replace = TRUE),
  event = sample(0:1, n, replace = TRUE)
  )

tab <- dta[, .(n = .N, rate = sum(event)/.N),
  by = .(level1, level2, level3)]

treemap(tab, index = names(tab)[1:3], vSize = "n", vColor = "rate",
  type = "value", fontsize.labels = 20*c(1, 0.7, 0))


--

Jan




On 30-05-2022 11:40, Jim Lemon wrote:

Hi Richard,
Thinking about this, you might also find intersectDiagram, also in
plotrix, to be useful.

Jim

On Mon, May 30, 2022 at 4:37 PM Jim Lemon  wrote:

Hi Richard,
Some years ago I had a try at illustrating Multiple Causes of Death
(MCoD) data. I settled on what is sometimes called a "sizetree". You
can see some examples in the sizetree function help page in "plotrix".
Unfortunately I can't use the original data as it was confidential.

Jim

On Mon, May 30, 2022 at 2:55 PM Richard O'Keefe  wrote:

There is a kind of data I run into fairly often
which I have never known how to represent in R,
and nothing I've tried really satisfies me.

Consider for example
  ...
  - injuries
...
- injuries to limbs
  ...
  - injuries to extremities
...
- injuries to hands
  - injuries to dominant hand
  - injuries to non-dominant hand
...
  ...
...

This isn't ordinal data, because there is no
"left to right" order on the values.  But there
IS a "part/whole" order, which an analysis should
respect, so it's not pure nominal data either.

As one particular example, if I want to
tabulate data like this, an occurrence of one
value should be counted as an occurrence of
*every* superordinate value.
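
A minimal sketch of that kind of tabulation, under the assumption that each value is stored as a "/"-separated path from the root of the hierarchy. The paths, the separator and the helper name `ancestors()` are all made up for the illustration.

```r
# Each observation is a path from the root to the recorded category
obs <- c("injuries/limbs/extremities/hands/dominant",
         "injuries/limbs/extremities/hands/non-dominant",
         "injuries/limbs/extremities/hands/dominant",
         "injuries/limbs")

# All prefixes of a path, i.e. the value itself plus every superordinate value
ancestors <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "/"),
         character(1))
}

# Count an occurrence of a value as an occurrence of each of its ancestors
tab <- table(unlist(lapply(obs, ancestors)))
counts <- setNames(as.vector(tab), names(tab))
print(counts)
```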

Examples of such data include "why is this patient
being treated", "what drug is this patient being
treated with", "what geographic region is this
school from", "what biological group does this
insect belong to".

So what is the recommended way to represent
and the recommended way to analyse such data in R?


Re: [R] vectorization of loops in R

2021-11-17 Thread Jan van der Laan


Have a look at the base functions tapply and aggregate.

For example see:
- https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays,
- https://online.stat.psu.edu/stat484/lesson/9/9.2,
- or ?tapply and ?aggregate.

Also your current code seems to contain an error: `s = df[df$y == i,]` 
should be `s = df$z[df$y == i]` I think.
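
For the example data in the question, the two functions could be used roughly as follows. This is a sketch with a fixed seed; `m_loop` just repeats the loop logic so that the results can be compared.

```r
set.seed(1)
df <- data.frame(x = rep(1:3, each = 5),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# Mean of z for each value of y, as a named array
m_tapply <- tapply(df$z, df$y, mean)

# The same as a data.frame with columns y and z
m_agg <- aggregate(z ~ y, data = df, FUN = mean)

# The loop from the question, for comparison
m_loop <- vapply(unique(df$y),
                 function(i) mean(df$z[df$y == i]),
                 numeric(1))

print(m_tapply)
```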


HTH,
Jan






On 17-11-2021 14:20, Luigi Marongiu wrote:

Hello,
I have a dataframe with 3 variables. I want to loop through it to get
the mean value of the variable `z`, as follows:
```
df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
y = rep(letters[1:5],3),
z = rnorm(15),
stringsAsFactors = FALSE)
m = vector()
for (i in unique(df$y)) {
s = df[df$y == i,]
m = append(m, mean(s$z))
}
names(m) = unique(df$y)

(m)

a  b  c  d  e
-0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
```
The problem is that I have one million `y` values, so the work takes
almost a day. I understand that vectorization will speed up the
procedure. But how shall I write the procedure in vectorial terms?
Thank you


Re: [R] Is there a hash data structure for R

2021-11-03 Thread Jan van der Laan

On 03-11-2021 00:42, Avi Gross via R-help wrote:

Finally, someone mentioned how creating a data.frame with duplicate names
for columns is not a problem as it can automagically CHANGE them to be
unique. That is a HUGE problem for using that as a dictionary as the new
name will not be known to the system so all kinds of things will fail.

I think you are referring to my remark which was:

> However, the data.frame construction method will detect this and
> generate unique names (which also might not be what you want):

I didn't say this means that duplicate names aren't a problem; I just 
mentioned that the behaviour is different. Personally, I would actually 
prefer the behaviour of list (keep the duplicated name) with a warning.

Most of the responses seem to assume that the OP actually wants a hash 
table. Yes, he did ask for that and for a hash table an environment 
(with some work) would be a good option. But in many cases, where other 
languages would use a hash-table-like object (such as a dict) in R you 
would use other types of objects. Furthermore, for many operations where 
you might use hash tables to implement the operation, R has already 
built in options, for example %in%, match, duplicated. These are also 
vectorised; so two vectors: one with keys and one with values might 
actually be faster than an environment in some use cases.
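
To make the two approaches concrete, here is a small sketch with made-up keys and values: an environment used as a hash table, next to a pair of plain vectors queried with match() and %in%.

```r
# Environment as a hash table
h <- new.env(hash = TRUE, parent = emptyenv())
h[["apple"]] <- 3
h[["pear"]]  <- 5
has_apple <- exists("apple", envir = h, inherits = FALSE)
has_kiwi  <- exists("kiwi",  envir = h, inherits = FALSE)

# Two parallel vectors: keys and values
keys   <- c("apple", "pear", "plum")
values <- c(3, 5, 7)
wanted <- c("plum", "apple", "kiwi")

looked_up <- values[match(wanted, keys)]  # NA where the key is unknown
known     <- wanted %in% keys
```

The vector version looks up all of `wanted` in one call, which is where the vectorised functions tend to pay off.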

Best,
Jan

And there are also packages for many features like sets as well as functions
to manipulate these things.

-Original Message-
From: R-help  On Behalf Of Bill Dunlap
Sent: Tuesday, November 2, 2021 1:26 PM
To: Andrew Simmons 
Cc: R Help 
Subject: Re: [R] Is there a hash data structure for R

Note that an environment carries a hash table with it, while a named list
does not.  I think that looking up an entry in a list causes a hash table to
be created and thrown away.  Here are some timings involving setting and
getting various numbers of entries in environments and lists.  The times are
roughly linear in n for environments and quadratic for lists.

> vapply(1e3 * 2 ^ (0:6), f, L=new.env(parent=emptyenv()), FUN.VALUE=NA_real_)
[1] 0.00 0.00 0.00 0.02 0.03 0.06 0.15
> vapply(1e3 * 2 ^ (0:6), f, L=list(), FUN.VALUE=NA_real_)
[1]  0.01  0.03  0.15  0.53  2.66 13.66 56.05
> f
function(n, L, V = sprintf("V%07d", sample(n, replace=TRUE))) {
    system.time(for(v in V) L[[v]] <- c(L[[v]], v))["elapsed"]
}

Note that environments do not allow an element named "" (the empty string).

Elements named NA_character_ are treated differently in environments and
lists, neither of which is great.  You may want your hash table functions to
deal with oddball names explicitly.

-Bill

On Tue, Nov 2, 2021 at 8:52 AM Andrew Simmons  wrote:

If you're thinking about using environments, I would suggest you
initialize them like

x <- new.env(parent = emptyenv())

Since environments have parent environments, requesting a value from
that environment can actually return the value stored in a parent
environment (this isn't an issue for [[ or $; it is exclusively an
issue with assign, get, and exists). Or, if you've already got your
values stored in a list that you want to turn into an environment:

x <- list2env(listOfValues, parent = emptyenv())
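
A short illustration of the difference, as a sketch; "mean" is used only because it is a name that is certain to exist in a parent environment.

```r
e_default <- new.env()                     # parent is the calling environment
e_empty   <- new.env(parent = emptyenv())  # no parent to fall back on

# exists() inherits by default, so it finds base::mean through the parents
found_default <- exists("mean", envir = e_default)
found_empty   <- exists("mean", envir = e_empty)

# [[ never looks in parent environments
direct <- e_default[["mean"]]
```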

Hope this helps!

On Tue, Nov 2, 2021, 06:49 Yonghua Peng  wrote:

But for data.frame the colnames can be duplicated. Am I right?

Regards.

On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan wrote:

True, but in a lot of cases where a python user might use a dict
an R user will probably use a list; or when we are talking about
arrays of dicts in python, the R solution will probably be a
data.frame (with each dict field in a separate column).

Jan

On 02-11-2021 11:18, Eric Berger wrote:

One choice is
new.env(hash=TRUE)
in the base package

On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng  wrote:

I know this is a newbie question. But how do I implement the hash
structure which is available in other languages (in python it's dict)?

I know there is the list, but list's names can be duplicated here.

> x <- list(x=1:5,y=month.name,x=3:7)
> x
$x
[1] 1 2 3 4 5

$y
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"

$x
[1] 3 4 5 6 7

Thanks a lot.


Re: [R] Is there a hash data structure for R

2021-11-02 Thread Jan van der Laan

Yes. A data.frame is basically a list where all elements are vectors of 
the same length. So this issue also exists in a data.frame. However, the 
data.frame construction method will detect this and generate unique 
names (which also might not be what you want):

> data.frame(a=1:3, a=1:3)
  a a.1
1 1   1
2 2   2
3 3   3

With a little effort you can still create a data.frame with multiple 
columns with the same name. But as Duncan Murdoch mentions, you can 
usually control for that.
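
For example, a sketch of both behaviours using the `check.names` argument of data.frame():

```r
# Default: duplicated names are made unique
d1 <- data.frame(a = 1:3, a = 4:6)
names(d1)

# check.names = FALSE keeps the duplicated name
d2 <- data.frame(a = 1:3, a = 4:6, check.names = FALSE)
names(d2)

# $ and [[ return the first column with that name
first_a <- d2$a

# A simple way to detect the problem (0 means no duplicates)
dup_pos <- anyDuplicated(names(d2))
```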

Best,
Jan

On 02-11-2021 11:32, Yonghua Peng wrote:

But for data.frame the colnames can be duplicated. Am I right?

Regards.

On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan <mailto:rh...@eoos.dds.nl>> wrote:

True, but in a lot of cases where a python user might use a dict an R
user will probably use a list; or when we are talking about arrays of
dicts in python, the R solution will probably be a data.frame (with
each dict field in a separate column).

Jan

On 02-11-2021 11:18, Eric Berger wrote:
 > One choice is
 > new.env(hash=TRUE)
 > in the base package
 >
 >
 >
 > On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng mailto:y...@pobox.com>> wrote:
 >
 >> I know this is a newbie question. But how do I implement the
hash structure
 >> which is available in other languages (in python it's dict)?
 >>
 >> I know there is the list, but list's names can be duplicated here.
 >>
 >>> x <- list(x=1:5,y=month.name <http://month.name>,x=3:7)
 >>
 >>> x
 >>
 >> $x
 >>
 >> [1] 1 2 3 4 5
 >>
 >>
 >> $y
 >>
 >>   [1] "January"   "February"  "March"     "April"     "May" 
  "June"

 >>
 >>   [7] "July"      "August"    "September" "October" 
  "November"  "December"

 >>
 >>
 >> $x
 >>
 >> [1] 3 4 5 6 7
 >>
 >>
 >>
 >> Thanks a lot.
 >>
 >>          [[alternative HTML version deleted]]
 >>
 >> __
 >> R-help@r-project.org <mailto:R-help@r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
 >> https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
 >> PLEASE do read the posting guide
 >> http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
 >> and provide commented, minimal, self-contained, reproducible code.
 >>
 >
 >       [[alternative HTML version deleted]]
 >
 > __
 > R-help@r-project.org <mailto:R-help@r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
 > https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
 > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
 > and provide commented, minimal, self-contained, reproducible code.
 >


Re: [R] Is there a hash data structure for R

2021-11-02 Thread Jan van der Laan




True, but in a lot of cases where a python user might use a dict an R 
user will probably use a list; or when we are talking about arrays of 
dicts in python, the R solution will probably be a data.frame (with each 
dict field in a separate column).


Jan




On 02-11-2021 11:18, Eric Berger wrote:

One choice is
new.env(hash=TRUE)
in the base package
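
A small sketch of how such an environment can be used as a key-value store, roughly like a python dict. The variable names (h, keys, has_z, x_value) are only for illustration; keys have to be character strings.

```r
# A hashed environment used as a key-value store
h <- new.env(hash = TRUE)

assign("x", 1:5, envir = h)      # insert with assign() ...
h[["y"]] <- month.name           # ... or with [[<-
h[["x"]] <- 3:7                  # same key again: the old value is overwritten

keys <- sort(ls(h))              # all keys currently stored
has_z <- exists("z", envir = h, inherits = FALSE)  # membership test
x_value <- get("x", envir = h)   # lookup
```

Unlike a list, an environment cannot hold the same name twice, which is the property asked for in the original question.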



On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng  wrote:


I know this is a newbie question. But how do I implement the hash structure
which is available in other languages (in python it's dict)?

I know there is the list, but list's names can be duplicated here.


x <- list(x=1:5,y=month.name,x=3:7)



x


$x
[1] 1 2 3 4 5

$y
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"

$x
[1] 3 4 5 6 7



Thanks a lot.


Re: [R] Getting different results with set.seed()

2021-08-19 Thread Jan van der Laan





What you could also try is check if the self coded functions use the 
random generator when defining them:


starting_seed <- .Random.seed

Step 1. Self-coded functions (these functions generate random numbers as 
well)


# check if functions have modified the seed:
all.equal(starting_seed, .Random.seed)

Step 2: set.seed (123)
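
The check above can be tried out in a self-contained form. This is only a sketch: f stands in for one of the self-coded functions, and the initial set.seed(42) is there because .Random.seed does not exist in a fresh session until the generator has been used.

```r
set.seed(42)                      # make sure .Random.seed exists
starting_seed <- .Random.seed

# Stand-in for a self-coded function that draws random numbers
f <- function(n) runif(n)

# Merely defining the function should leave the generator state untouched
unchanged_after_define <- identical(starting_seed, .Random.seed)

# Calling it advances the generator state
invisible(f(1))
changed_after_call <- !identical(starting_seed, .Random.seed)

# Setting the seed directly before the calls makes the results repeatable
set.seed(123); a <- f(3)
set.seed(123); b <- f(3)
```

If the state has already changed before set.seed() is reached, that by itself does no harm, as long as set.seed() is called before the first random draw that matters.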



What has also happened to me is that some of the functions I called had 
their own random number generator independent of that of R. For example 
using one in C/C++.


Do your functions do stuff in parallel? For example using the parallel 
or snow package? In that case you also have to set the seed in the 
parallel workers.
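
For the parallel package, a minimal sketch of seeding the workers could look like the following. It assumes a local socket cluster with two workers can be started; the seed value and the toy task are arbitrary.

```r
library(parallel)

cl <- makeCluster(2)                  # two local workers (assumption)

# set.seed() in the main session does not reach the workers;
# clusterSetRNGStream() gives every worker its own reproducible stream
clusterSetRNGStream(cl, iseed = 123)
run1 <- parSapply(cl, 1:4, function(i) runif(1))

clusterSetRNGStream(cl, iseed = 123)  # reset the streams and repeat
run2 <- parSapply(cl, 1:4, function(i) runif(1))

stopCluster(cl)

same_results <- identical(run1, run2)
```

Note that this only gives the same numbers when the number of workers and the way the tasks are split over them stay the same.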


Best,
Jan









On 19-08-2021 11:25, PIKAL Petr wrote:

Hi

Did you try different order?

Step 2: set.seed (123)

Step 1. Self-coded functions (these functions generate random numbers as well)

Step 3: Call those functions.

Step 4: model results.

Cheers
Petr.

And BTW, do not use HTML formatting; it can cause problems on a text-only list.


From: Shah Alam 
Sent: Thursday, August 19, 2021 10:10 AM
To: PIKAL Petr 
Cc: r-help mailing list 
Subject: Re: [R] Getting different results with set.seed()

Dear Petr,

It is more than 2000 lines of code with a lot of functions and data inputs. I
am not sure whether it would be useful to upload it. However, you are
absolutely right. I used

Step 1. Self-coded functions (these functions generate random numbers as well)

Step 2: set.seed (123)

Step 3: Call those functions.

Step 4: model results.

I close the R session and run the code from step 1. I get different results
for the same set of values for parameters.

Best regards,
Shah




On Thu, 19 Aug 2021 at 09:56, PIKAL Petr 
wrote:
Hi

Please provide at least your code preferably with some data to reproduce
this behaviour. I wonder if anybody could help you without such information.

My wild guess is that you used

set.seed(1234)

some code

the code used again

in which case you have to expect different results.

Cheers
Petr


-Original Message-
From: R-help  On Behalf Of Shah Alam
Sent: Thursday, August 19, 2021 9:46 AM
To: r-help mailing list 
Subject: [R] Getting different results with set.seed()

Dear All,

I was using set.seed to reproduce the same results for the discrete event
simulation model. I have 12 unknown parameters for optimization (just a
little background). I got a good fit of parameter combinations. However,
when I use those parameters combinations again in the model. I am getting
different results.

Is there any problem with the set.seed. I assume the set.seed should
produce the same results.

I used set.seed(1234).

Best regards,
Shah


Re: [R] Read fst files

2021-06-09 Thread Jan van der Laan





read_fst is from the package fst. The file format fst uses is a binary 
format designed to be fast to read. It is a column-oriented, compressed 
format. So, to be able to work, fst needs access to the file itself 
and won't accept a file connection the way functions like read.table 
and its variants do.


Also, because it is a binary compressed format using a compression 
method that is fast to read, additionally compressing it to zip seems 
to defeat the purpose of fst.
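

If the file does arrive inside a zip archive, one workaround is to extract it to a temporary directory first and pass the resulting path to read_fst. A sketch, in which the helper name read_fst_from_zip and the file names in the usage comment are hypothetical:

```r
# Hypothetical helper: extract one member of a zip archive to a temporary
# directory and read it with fst::read_fst(), which needs a real file path.
read_fst_from_zip <- function(zip_path, member) {
  if (!requireNamespace("fst", quietly = TRUE)) {
    stop("package 'fst' is required")
  }
  exdir <- tempfile("fst_unzip_")
  dir.create(exdir)
  # remove the extracted copy again once the data has been read
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  extracted <- utils::unzip(zip_path, files = member, exdir = exdir)
  fst::read_fst(extracted[1])
}

# Usage (paths are placeholders):
# myObject <- read_fst_from_zip("path/to/archive.zip", "myFile.fst")
```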


HTH,
Jan


On 09-06-2021 15:28, Duncan Murdoch wrote:

On 09/06/2021 9:12 a.m., Jeff Reichman wrote:

Duncan

Yea that will work. It appears to be related to setting my working 
dir, for what ever reason neither seem to work
(1) knitr::opts_knit$set(root.dir 
="~/My_Reference_Library/Regression") # from R Notebook or
(2) 
setwd("C:/Users/reichmaj/Documents/My_Reference_Library/Regression") # 
from R chunk


So it appears I can either (as you suggested) use two steps or combine 
but I need to enter the full path. Why other file types don't seem to 
need the full path ?


You need to read the documentation for read_fst() to find what it needs. 
  If it doesn't explain this, then you should report the issue to its 
author.




myObject <- 
read_fst(unz("C:/Users/reichmaj/Documents/My_Reference_Library/Regression/Datasest.zip", 
filename = "myFile.fst"))


Thank you. I guess just one of those R things


No, it's a read_fst() thing.

Duncan Murdoch


Re: [R] What is an alternative to expand.grid if create a long vector?

2021-04-20 Thread Jan van der Laan




This is an optimisation problem that you are trying to solve using a 
grid search. There are numerous methods for optimisation; see 
https://cran.r-project.org/web/views/Optimization.html for an overview 
for R. Which method is appropriate really depends on the exact problem.


As Petr said, helping you decide which method to use does not fit on this 
list. Perhaps the overview linked to above (and the terms 'grid search' 
and 'optimization') can help you find an appropriate method.
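

Just to illustrate the difference with a grid search, below is a minimal sketch using optim() from base R. The objective function and its optimum (target) are made up; in the real problem the objective would run the simulation model and return a measure of misfit. Only two of the parameter ranges from the original post are used.

```r
# Toy objective (an assumption for illustration): squared distance
# to a known optimum; a real objective would call the model instead
target <- c(0.39, 0.16)
objective <- function(p) sum((p - target)^2)

# Box constraints taken from two of the ranges in the original post
lower <- c(0.38, 0.12)
upper <- c(0.42, 0.18)
start <- (lower + upper) / 2

fit <- optim(par = start, fn = objective, method = "L-BFGS-B",
             lower = lower, upper = upper)

fit$par      # estimated parameters
fit$counts   # number of function and gradient evaluations
```

The optimiser evaluates the objective only where it needs to, instead of at every point of a 100 x 100 grid. Whether L-BFGS-B is suitable for a stochastic simulation model is a separate question.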


HTH,
Jan


On 20-04-2021 09:02, PIKAL Petr wrote:

Hi



Keep your mails on the list. Actually, you did not say much about your data or
how you want to model them. There are plenty of modelling functions in R,
starting with e.g. lm, but I am not aware of a procedure in which you just
design your explanatory variables to get a plausible model. But I am not an
expert in statistics, and this list is not meant for solving statistical problems.



Cheers

Petr





From: Shah Alam 
Sent: Monday, April 19, 2021 5:20 PM
To: PIKAL Petr 
Subject: Re: [R] What is an alternative to expand.grid if create a long
vector?



Dear Petr,



Thanks for your response. I am designing a model with 10 unknown parameters.
The generated combinations of unknown parameters will be used in the model to
estimate the set of vectors that fits the actual data well. Is there any other
way to do it? I also used the randomLHS function from the lhs package, but it
did not serve the purpose.



Best regards,

Shah Alam





On Mon, 19 Apr 2021 at 16:07, PIKAL Petr wrote:

Hi

Actually expand.grid produces data frame and not vector. And dimension of
the data frame is "big"


dim(A)

[1] 100000000         4

str(A)

'data.frame':   100000000 obs. of  4 variables:
  $ Var1: num  0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 ...
  $ Var2: num  1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 ...
  $ Var3: num  0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 ...
  $ Var4: num  0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 ...
  - attr(*, "out.attrs")=List of 2
   ..$ dim : int [1:4] 100 100 100 100
   ..$ dimnames:List of 4
   .. ..$ Var1: chr [1:100] "Var1=0.001" "Var1=0.002" "Var1=0.003" "Var1=0.004" ...
   .. ..$ Var2: chr [1:100] "Var2=0.000100" "Var2=0.0001090909" "Var2=0.0001181818" "Var2=0.0001272727" ...
   .. ..$ Var3: chr [1:100] "Var3=0.380" "Var3=0.3804040" "Var3=0.3808081" "Var3=0.3812121" ...
   .. ..$ Var4: chr [1:100] "Var4=0.120" "Var4=0.1206061" "Var4=0.1212121" "Var4=0.1218182" ...




in case of 4 sequences 1e8 rows, 4 columns
in case of 10 sequences 1e20 rows and 10 columns
in your last example 1.4e8 rows and 10 columns which probably cross the
memory capacity of your PC.

Maybe you could increase the memory of your PC. If I am correct, you need about
3.2 GB to store the first and 11.2 GB to store the last.

May I ask what you want to do with such a big object?

Cheers
Petr


-Original Message-
From: R-help On Behalf Of Shah Alam
Sent: Monday, April 19, 2021 2:36 PM
To: r-help mailing list
Subject: [R] What is an alternative to expand.grid if create a long vector?


Dear All,

I would like to know whether there is a problem in the *expand.grid* function
or whether it is a limitation of this function.

I am trying to create a combination of elements using the expand.grid function.

A <- expand.grid(
  c(seq(0.001, 0.1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100)))

Four combinations work fine. However, if I increase the combinations up to
ten, the following error appears.

A <- expand.grid(
  c(seq(0.001, 1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100)),
  c(seq(0.01, 0.04, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.001, 0.01, length.out = 100)),
  c(seq(0.01, 0.3, length.out = 100))
)

*Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : invalid 'times' value*

After reducing the length to 10, it produced a different type of error:

A <- expand.grid(
  c(seq(0.001, 0.005, length.out = 10)),
  c(seq(0.0001, 0.0005, length.out = 10)),
  c(seq(0.38, 0.42, length.out = 5)),
  c(seq(0.12, 0.18, length.out = 7)),
  c(seq(0.01, 0.04, length.out = 5)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.001, 0.01, length.out = 10)),
  c(seq(0.1, 0.8, length.out = 8))
)

*Error: cannot allocate vector of size 1.0 Gb*

What is an alternative to expand.grid if create a long vector based on 10
elements?

With kind regards,
Shah Alam


Re: [R] What is an alternative to expand.grid if create a long vector?

2021-04-20 Thread Jan van der Laan

But even if you could have a generator that is super efficient and 
perform a calculation that is super fast, the number of elements is 
ridiculously large.

If we take 1 nanosec per element, the computation would still take:

> (100^10)*1E-9/3600
[1] 27777778

hours, or

> (100^10)*1E-9/3600/24/365
[1] 3170.979

years.

--
Jan

On 20-04-2021 03:46, Avi Gross via R-help wrote:

Just some thoughts I am considering about the issue of how to make giant 
objects in memory without making them giant or all in memory.

As stupid as this sounds, when things get really big, it can mean not only 
processing your data in smaller amounts but using other techniques than asking 
expand.grid to create all possible combinations in advance.

Some languages like python allow generators that yield one item at a time and 
are called until exhausted, which sounds more like your usage. A single 
function remains resident in memory and each time it is called it uses the 
resident values in a calculation and returns the next. That approach may not 
work well with the way expand.grid works.

So a less efficient way would be to write your own deeply nested loop that 
generates one set of ten or so variables each time through the deepest nested 
loop that you can use one at a time. Alternatively, you can use such a loop to 
write a line at a time in something like a .CSV format and later read N lines 
at a time from the file or even have multiple programs work in parallel by 
taking their own allocations after ignoring the lines not meant for them, or 
some other method.
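
As a rough sketch of the "one combination at a time" idea without nested loops: the i-th row of the grid can be computed directly from its index, so the full table never has to exist in memory. The helper name grid_row and the small example grid are made up for illustration; the ordering (first variable varies fastest) is meant to match expand.grid.

```r
# Return the i-th combination (1-based) of a grid defined by a list of
# vectors, with the first vector varying fastest, without building the grid
grid_row <- function(i, levels) {
  idx <- i - 1
  out <- numeric(length(levels))
  for (k in seq_along(levels)) {
    n <- length(levels[[k]])
    out[k] <- levels[[k]][idx %% n + 1]   # position within variable k
    idx <- idx %/% n                      # carry over to the next variable
  }
  out
}

# Small example grid (3 variables, 4 * 3 * 2 = 24 combinations)
levels <- list(seq(0.001, 0.1, length.out = 4),
               seq(0.38, 0.42, length.out = 3),
               c(0.12, 0.18))
n_total <- prod(lengths(levels))

# Loop over the combinations one at a time, e.g. keeping the best one
best_value <- Inf
for (i in seq_len(n_total)) {
  p <- grid_row(i, levels)
  value <- sum(p^2)                       # placeholder for the real model
  if (value < best_value) best_value <- value
}

# For comparison only: the materialised grid
full <- expand.grid(levels)
```

This saves memory, not time: every combination is still visited, and for very large grids the index arithmetic in doubles eventually loses precision.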

Deeply nested loops in R tend to be slow, as I have found out, which is indeed 
why I switched to using pmap() on a data.frame made using expand.grid first. 
But if your needs are exorbitant and you have limited memory, 

Can you squeeze some memory out of your design? Your data seems highly 
repetitive and if you really want to store something like this in a column:
c(seq(0.001, 1, length.out = 100))

The size of that, for comparison, is:

object.size(seq(0.001, 1, length.out = 100))
848 bytes

So it is 8 bytes per number plus some overhead.

Then consider storing something like that another way. First, the c() wrapper 
around the above is redundant, albeit harmless. Why not store this:
1L:100L

object.size(1L:100L)
448 bytes

So, four bytes per number plus some overhead.
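
The comparison can be reproduced along these lines. This is a sketch; the variable names are made up, and the divisor 1000 applies to the first sequence of the four-variable example only.

```r
dbl <- seq(0.001, 0.1, length.out = 100)   # stored as doubles
int <- 1:100                               # stored as integers

size_dbl <- as.numeric(object.size(dbl))   # size in bytes
size_int <- as.numeric(object.size(int))

# The doubles can be recovered from the integers when they are needed
rebuilt <- int / 1000
```

The saving only holds as long as the integers are what is stored; as soon as the whole column is converted back to doubles, the full size is needed again.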

That stores integers between 1 and 100 and in your case that means that later 
you can divide by a thousand or so to get the number you want each time but not 
store a full double-precision number.

And if you use factors, it may take less space. I note some of your other 
values pick different starting and ending points but in all cases you ask for 
100 equally-spaced values to be calculated by seq() which is fine but you could 
simply record a factor with umpteen specific values as either doubles or 
integers and if expand.grid honors that, it would use less space in any final 
output.  My experiments (not shown here) suggest you can easily cut sizes in 
half and perhaps more with judicious usage.

Perhaps finding or writing a more efficient loop in a C or C++ function would 
allow a way to loop through all possibilities more efficiently and provide a 
function for it to call on each iteration. Depending on your need, that can do 
a calculation using local variables and perhaps add a line to an output file, 
or add another set of values to a vector or other data structure that gets 
returned at the end of processing.

One possibility to consider is using an on-line resource, perhaps paying a fee, 
that will run your R program for you in an environment with more allowed 
resources like memory:

  https://rstudio.cloud/

Some of the professional options allow 8 GB of memory and perhaps 4 CPU. You 
can, of course, configure your own machine to have more memory or perhaps 
allocate lots more swap space and allow your process to abuse it.

There are many possible solutions but also consider if the sizes and amounts 
you are working on are realistic. I worked on a project a while ago where I 
generated a huge amount of instances with 500 iterations per instance and was 
asked to bump that up to 10,000 per instance (20 times as much) just to show 
the results were similar and that 500 had been enough. It ran for DAYS and 
luckily the rest of the project went back to more manageable numbers.

So, back to your scenario, I wonder if the regularity of your data would allow 
interesting games to be played. Imagine smaller combinations of say 10 levels 
each and for each row in the resulting data.frame, expand that out again so the 
number 2,3,4 (using just three for illustration) becomes (2:29, 3:39, 4:49) and 
is given to expand.grid to make a smaller local one-use expansion table to use. 
Your original giant problem is converted to making a modest table that for each 
row expands to a second modest table that is used and immediately discarded and 
replaced by a similar

Re: [R] /usr/local/lib/R/site-library is not writable

2021-04-08 Thread Jan van der Laan




I would actually go a step in the other direction: per project 
libraries. For example by adding a .Rprofile file to your project 
directory. This ensures that everybody working on a project uses the 
same version of the packages (even on different machines e.g. on shared 
folders).


This can give issues when a new version of R arrives, but that is 
usually easy to solve. Either hard code the path to the old R-version or 
decide to update all packages in a project to the new R-version (and 
test that everything is still working ok).


We have the most often used packages installed centrally on the 
server/network, so I actually usually end up with a mixture of central, 
personal and project libraries. Theory vs practice.
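

A minimal sketch of what such a project .Rprofile could contain. The directory here is a temporary stand-in so that the lines can be tried out safely; in a real project it would be something like a "library" folder inside the project directory (the name is an assumption).

```r
# Stand-in for the project library directory (in a real .Rprofile this
# could be e.g. file.path(getwd(), "library"))
project_lib <- file.path(tempdir(), "project-library")
dir.create(project_lib, showWarnings = FALSE, recursive = TRUE)

# Put the project library in front of the existing library paths, so that
# install.packages() and library() use it first
old_paths <- .libPaths()
.libPaths(c(project_lib, old_paths))

.libPaths()   # the project library is now the first entry
```

Because the other paths are kept, packages from the central and personal libraries remain available, which gives exactly the mixture described above.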


HTH,
Jan



On 08-04-2021 02:58, Dirk Eddelbuettel wrote:

Hi Gene,

"It's complicated". (Not really, but listen for a sec...)

We need to ship a default policy that makes sense for all / most
situations.  So

- users cannot write into /usr/local/lib/R/site-library -- unless they are
   set up to, by adding them to the 'group' that owns that directory

- root can (but ideally one should not run as root as one generally does not
   know what code you might get slipped in a tar.gz); but root can enable users
- so we recommend letting (some or all) users write there by explicitly
   adding them to an appropriate group.

Personally, I do not think personal libraries are a good idea on shared
machines because you can end up with a different set of package (versions)
than your colleague on the same machine.  And or you running shiny from $HOME
have different packages than shiny running as server. And on and on. Other
people differ, and that is fine. If one wants personal libraries one can.

I must have explained the reasoning and fixes a dozen times each on
r-sig-debian (where you could have asked this too) and StackOverflow. At
least the latter can be searched so look at this set:
https://stackoverflow.com/search?q=user%3Ame+is%3Aanser+%2Fusr%2Flocal%2Flib%2FR%2Fsite-library

Happy to take it offline too, and who knows, we even get to meet for a coffee
one of these days.

Hope this helps, Dirk




Re: [R] Help with connection issue for R (just joined, leading R for our agency)

2020-12-15 Thread Jan van der Laan


Alejandra,

If it was initially working ok, I would first check with the IT 
department whether there has been a change to the configuration of the 
firewall, virus scanners, file system etc., as these can affect the 
performance of R-studio. R-studio uses a client-server setup on your 
machine, so a firewall/malware scanner inspecting all communication 
between R-studio and the R session can have a large effect. If you 
can't find the problem, you are probably better off asking at the 
R-studio fora. A similar question was asked a while back: 
https://community.rstudio.com/t/rstudio-suddenly-slow-processing/4959; 
perhaps some of the solutions proposed there also work for you.


As an alternative to Emacs/R-studio you could also have a look at Visual 
Studio Code. It has an R plugin. If your organisation is Microsoft 
oriented, there is a chance that it is already available. You need 
a relatively recent version though.


HTH,
Jan




On 14-12-2020 12:54, Michael Dewey wrote:
Just to add to Petr's comment there are other basic editors with syntax 
highlighting like Notepad++ which are also OK if you want a fairly 
minimalist approach.


Michael

On 14/12/2020 08:16, PIKAL Petr wrote:

Hallo Alejandra

Although RStudio and ESS can help with some automation (each in its own
way), using R alone is not a big problem, especially if you are not familiar
with Emacs basics and are experiencing RStudio issues. I use R with a simple
external editor. It could be Notepad, but I can recommend TINN-R

https://sourceforge.net/projects/tinn-r/

which has syntax highlighting and works smoothly if the R console is set to
multiple windows. It is also quite easy to manage.

Good luck with R.

Cheers
Petr


-Original Message-
From: R-help On Behalf Of Alejandra Barrio Gorski
Sent: Tuesday, December 8, 2020 7:48 PM
To: R-help@r-project.org
Subject: [R] Help with connection issue for R (just joined, leading R for our agency)

Dear fellow R users,

Greetings, I am new to this list. I joined because I am pioneering the use
of R for the agency I work for. I essentially work alone and would like to
reach out for help on an issue I have been having. Here it is:

    - From one day to the next, my RStudio does not execute commands when I
    press ctrl + enter. Nothing happens, and then after a few minutes out of
    nowhere, it runs everything at once. This makes it very hard to do my
    work.
    - I tried uninstalling and re-installing both R and Rstudio, but the
    error comes up again. I tested commands on my R program alone, and it
    works fine there. It could be the way that Rstudio connects to R.
    - I am on a Windows 10 computer. I work for a government agency so there
    may be a few firewall/virus protection issues.

I would love any pointers.

Thank you,
Alejandra

--

*Alejandra Barrio*
Linkedin  | Website

MPP | M.A., International and Area Studies University of California,

Berkeley



Re: [R] saveRDS() and readRDS() Why?

2018-11-07 Thread Jan van der Laan




Are you sure you didn't do saveRDS("rawData", file = "rawData.rds") 
instead of saveRDS(rawData, file = "rawData.rds") ? This would explain 
the result you have under linux.


In principle saveRDS and readRDS can be used to copy objects between 
R-sessions without losing information.
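

The difference between the two calls can be checked with a small stand-in object. This is a sketch; the data.frame and the temporary file names are made up and are not the scraped data from the original post.

```r
# Stand-in object (the real rawData came from web scraping)
rawData <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

f_quoted   <- tempfile(fileext = ".rds")
f_unquoted <- tempfile(fileext = ".rds")

saveRDS("rawData", file = f_quoted)    # saves the character string "rawData"
saveRDS(rawData,   file = f_unquoted)  # saves the object itself

from_quoted   <- readRDS(f_quoted)     # a length-one character vector
from_unquoted <- readRDS(f_unquoted)   # a copy of the data.frame
```

Reading the first file back gives exactly the kind of result reported on linux: a character vector containing only the name.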


What does readRDS return on windows with the same file?

What type of object is rawData? Do str(rawData). Some objects created by 
packages cannot be serialized, e.g. objects that point to memory 
allocated by a package. The pointer is then serialized not the memory 
pointed to.


Also, if the object is generated by a package, you might need to load 
the package to get the printing etc. of the object right.


HTH,

Jan







On 07-11-18 09:45, Patrick Connolly wrote:

On Wed, 07-Nov-2018 at 08:27AM +, Robert David Burbidge wrote:

|> Hi Patrick,
|>
|> From the help: "save writes a single line header (typically
|> "RDXs\n") before the serialization of a single object".
|>
|> If the file sizes are the same (see Eric's message), then the
|> problem may be due to different line terminators. Try serialize and
|> unserialize for low-level control of saving/reading objects.

I'll have to find out what 'serialize' means.

On Windows, it's a huge table, looks like it's all hexadecimal.

On Linux, it's just the text string 'rawData' -- a lot more than line
terminators.

Have I misunderstood what the idea is?  I thought I'd get an identical
object, irrespective of how different the OS stores and zips it.



|>
|> Rgds,
|>
|> Robert
|>
|>
|> On 07/11/18 08:13, Eric Berger wrote:
|> >What do you see at the OS level?
|> >i.e. on windows
|> >DIR rawData.rds
|> >on linux
|> >ls -l rawData.rds
|> >compare the file sizes on both.
|> >
|> >
|> >On Wed, Nov 7, 2018 at 9:56 AM Patrick Connolly 
|> >wrote:
|> >
|> >> From a Windows R session, I do
|> >>
|> >>>object.size(rawData)
|> >>31736 bytes  # from scraping a non-reproducible web address.
|> >>>saveRDS(rawData, file = "rawData.rds")
|> >>Then copy to a Linux session
|> >>
|> >>>rawData <- readRDS(file = "rawData.rds")
|> >>>rawData
|> >>[1] "rawData"
|> >>>object.size(rawData)
|> >>112 bytes
|> >>>rawData
|> >>[1] "rawData" # only the name and something to make up 112 bytes
|> >>Have I misunderstood the syntax?
|> >>
|> >>It's an old version on Windows.  I haven't used Windows R since then.
|> >>
|> >>major  3
|> >>minor  2.4
|> >>year   2016
|> >>month  03
|> >>day16
|> >>
|> >>
|> >>I've tried R-3.5.0 and R-3.5.1 Linux versions.
|> >>
|> >>In case it's material ...
|> >>
|> >>I couldn't get the scraping to work on either of the R installations
|> >>but Windows users told me it worked for them.  So I thought I'd get
|> >>the R object and use it.  I could understand accessing the web address
|> >>could have different permissions for different OSes, but should that
|> >>affect the R objects?
|> >>
|> >>TIA
|> >>
|> >>--
|> >>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
|> >>___Patrick Connolly
|> >>  {~._.~}   Great minds discuss ideas
|> >>  _( Y )_ Average minds discuss events
|> >>(:_~*~_:)  Small minds discuss people
|> >>  (_)-(_)  . Eleanor Roosevelt
|> >>
|> >>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
|> >>
|> >>__
|> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
|> >>https://stat.ethz.ch/mailman/listinfo/r-help
|> >>PLEASE do read the posting guide
|> >>http://www.R-project.org/posting-guide.html
|> >>and provide commented, minimal, self-contained, reproducible code.
|> >>
|> >  [[alternative HTML version deleted]]
|> >
|> >__
|> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
|> >https://stat.ethz.ch/mailman/listinfo/r-help
|> >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
|> >and provide commented, minimal, self-contained, reproducible code.




Re: [R] Plot a path

2018-11-01 Thread Jan van der Laan

Below a similar example, using sf and leaflet; plotting the trajectory 
on a background map.



library(leaflet)
library(sf)
library(dplyr)

# Generate example data
gen_data <- function(id, n) {
  data.frame(
    id = id,
    date = 1:n,
    lat = runif(n, min = -90, max = 90),    # n points, not a fixed 10
    lon = runif(n, min = -180, max = 180)
  )
}

dta <- lapply(1:2, gen_data, n = 10) %>% bind_rows()

# Transform all records of one object/person to a st_linestring, then
# combine into one sf column
lines <- dta %>%
  arrange(id, date) %>%
  split(.$id) %>%   # split on the id of the sorted data
  lapply(function(d) st_linestring(cbind(d$lon, d$lat))) %>%
  unname() %>%   # Without the unname it doesn't work for some reason
  st_sfc()

# Plot using leaflet
leaflet() %>%
  addTiles() %>%
  addPolylines(data = lines)


HTH - Jan


On 01-11-18 11:27, Rui Barradas wrote:

Hello,

The following uses ggplot2.

First, make up a dataset, since you have not posted one.



lat0 <- 38.736946
lon0 <- -9.142685
n <- 10

set.seed(1)
Date <- seq(Sys.Date() - n + 1, Sys.Date(), by = "days")
Lat <- lat0 + cumsum(c(0, runif(n - 1)))
Lon <- lon0 + cumsum(c(0, runif(n - 1)))
Placename <- rep(c("A", "B"), n/2)

path <- data.frame(Date, Placename, Lat, Lon)
path <- path[order(path$Date), ]


Now, two graphs, one with just one line of all the lon/lat and the other 
with a line for each Placename.


library(ggplot2)

ggplot(path, aes(x = Lon, y = Lat)) +
   geom_point() +
   geom_line()


ggplot(path, aes(x = Lon, y = Lat, colour = Placename)) +
   geom_point(aes(fill = Placename)) +
   geom_line()


Hope this helps,

Rui Barradas

Às 21:27 de 31/10/2018, Ferri Leberl escreveu:


Dear All,
I have a dataframe with four cols: Date, Placename, geogr. latitude, 
geogr. longitude.
How can I plot the path as a line, ordered by the date, with the 
longitude as the x-axis and the latitude as the y-axis?

Thank you in advance!
Yours, Ferri

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




Re: [R] Calculating just a single row of dissimilarity/distance matrix

2018-10-27 Thread Jan van der Laan

Please respond to the list; there are more people answering there.

As explained in the documentation, gower_dist performs a pairwise 
comparison of its two arguments, recycling the shorter one if needed, so 
indeed gower_dist(iris[1:5, ], iris) doesn't do what you want.

Possible solutions are:

tmp <- split(iris[1:5, ], seq_len(5))

sapply(tmp, gower_dist, iris)

and:

library(dplyr)

library(tidyr)

pairs <- expand.grid(x = 1:5, y = 1:nrow(iris))
pairs$dist <- gower_dist(iris[pairs$x, ], iris[pairs$y, ])
pairs %>% spread(y, dist)

Don't know which one is faster. And there are probably various other 
solutions too.
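
To make the second approach concrete without needing the gower package, here is a minimal base R sketch. pair_dist is a made-up stand-in for a vectorised pairwise function such as gower_dist (it is not the Gower distance), and the reshaping uses matrix() instead of tidyr.

```r
# Stand-in for a vectorised pairwise distance such as gower_dist():
# takes two data frames with the same number of rows and returns one
# distance per row pair. (Hypothetical helper, for illustration only.)
pair_dist <- function(a, b) {
  abs(a$Sepal.Length - b$Sepal.Length)
}

# All combinations of the first 5 rows with every row of iris;
# expand.grid() varies x fastest
pairs <- expand.grid(x = 1:5, y = seq_len(nrow(iris)))
pairs$dist <- pair_dist(iris[pairs$x, ], iris[pairs$y, ])

# Reshape to a 5 x 150 matrix: one row per x, one column per y
d <- matrix(pairs$dist, nrow = 5)
dim(d)
```

Because both arguments have the same number of rows, nothing is recycled and every (x, y) pair is evaluated exactly once.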

--
Jan

On 27-10-18 18:04, Aerenbkts bkts wrote:

Dear Jan

Thanks for your help. Actually it works for the first element. But I 
tried to calculate distance values for the first N rows. For example;

gower_dist(iris[1:5,], iris) // gower distance for the first 5 rows. 
but it did not work. Do you have any suggestion about it?

On Fri, 26 Oct 2018 at 21:31, Jan van der Laan <rh...@eoos.dds.nl> wrote:

Using another implementation of the gower distance:

library(gower)

gower_dist(iris[1,], iris)

HTH,

Jan

On 26-10-18 15:07, Aerenbkts bkts wrote:
> I have a data-frame with 30k rows and 10 features. I would like to
> calculate distance matrix like below;
>
> gower_dist <- daisy(data-frame, metric = "gower"),
>
>
> This function returns whole dissimilarity matrix. I want to get just
> the first row.
> (Just distances of the first element in data-frame). How can I
do it?
> Do you have an idea?
>
>
> Regards
>

Re: [R] Calculating just a single row of dissimilarity/distance matrix

2018-10-26 Thread Jan van der Laan




Using another implementation of the gower distance:


library(gower)

gower_dist(iris[1,], iris)


HTH,

Jan



On 26-10-18 15:07, Aerenbkts bkts wrote:

I have a data-frame with 30k rows and 10 features. I would like to
calculate distance matrix like below;

gower_dist <- daisy(data-frame, metric = "gower"),


This function returns whole dissimilarity matrix. I want to get just
the first row.
(Just distances of the first element in data-frame). How can I do it?
Do you have an idea?


Regards


Re: [R] Erase content of dataframe in a single stroke

2018-09-27 Thread Jan van der Laan


Or

testdf <- testdf[FALSE, ]

or

testdf <- testdf[numeric(0), ]

which seems to be slightly faster.
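
A small sketch of what the empty-index approach keeps, using a made-up data frame: the column names and classes survive, and the result is still a data frame.

```r
# Hypothetical data frame with three different column classes
testdf <- data.frame(A = c(1, 2), B = c("x", "y"), C = c(TRUE, FALSE),
                     stringsAsFactors = FALSE)

empty <- testdf[FALSE, ]

nrow(empty)            # number of rows left
sapply(empty, class)   # column classes are unchanged
is.data.frame(empty)   # still a data frame, not a list

# With a single column, row indexing drops to a vector unless
# drop = FALSE is given
one <- data.frame(A = c(1, 2))
one_empty <- one[FALSE, , drop = FALSE]
```

The drop = FALSE form is only needed for one-column data frames, but it is harmless otherwise.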

Best,
Jan


On 27-9-2018 at 10:32, PIKAL Petr wrote:

Hm

I would use


testdf<-data.frame(A=c(1,2),B=c(2,3),C=c(3,4))
str(testdf)

'data.frame':   2 obs. of  3 variables:
  $ A: num  1 2
  $ B: num  2 3
  $ C: num  3 4

testdf<-testdf[-(1:nrow(testdf)),]
str(testdf)

'data.frame':   0 obs. of  3 variables:
  $ A: num
  $ B: num
  $ C: num

Cheers
Petr


-Original Message-
From: R-help  On Behalf Of Jim Lemon
Sent: Thursday, September 27, 2018 10:12 AM
To: Luigi Marongiu ; r-help mailing list 
Subject: Re: [R] Erase content of dataframe in a single stroke

Ah, yes, try 'as.data.frame' on it.

Jim

On Thu, Sep 27, 2018 at 6:00 PM Luigi Marongiu 
wrote:

Thank you Jim,
this requires the definition of an ad hoc function; strange that R
does not have a function for this purpose...
Anyway, it works but it changes the structure of the data. By
redefining the dataframe as I did, I obtain:


df

[1] A B C
<0 rows> (or 0-length row.names)

str(df)

'data.frame': 0 obs. of  3 variables:
  $ A: num
  $ B: num
  $ C: num

When applying your function, I get:


df

$A
NULL

$B
NULL

$C
NULL


str(df)

List of 3
  $ A: NULL
  $ B: NULL
  $ C: NULL

The dataframe has become a list. Would that affect downstream
applications?

Thank you,
Luigi
On Thu, Sep 27, 2018 at 9:45 AM Jim Lemon 

wrote:

Hi Luigi,
Maybe this:

testdf<-data.frame(A=1,B=2,C=3)

testdf

  A B C
1 1 2 3
toNull<-function(x) return(NULL)
testdf<-sapply(testdf,toNull)

Jim
On Thu, Sep 27, 2018 at 5:29 PM Luigi Marongiu

 wrote:

Dear all,
I would like to erase the content of a dataframe -- but not the
dataframe itself -- in a simple and fast way.
At the moment I do that by re-defining the dataframe itself in this way:


df <- data.frame(A = numeric(),

+   B = numeric(),
+   C = character())

# assign
A <- 5
B <- 0.6
C <- 103
# load
R <- cbind(A, B, C)
df <- rbind(df, R)
df

   A   B   C
1 5 0.6 103

# erase
df <- data.frame(A = numeric(),

+  B = numeric(),
+  C = character())

df

[1] A B C
<0 rows> (or 0-length row.names)
Is there a way to erase the content of the dataframe in a simpler
(acting on all the dataframe at once instead of naming each column
individually) and nicer (with a specific erasure command instead
of re-defining the object itself) way?

Thank you.
--
Best regards,
Luigi




--
Best regards,
Luigi


Re: [R] security using R at work

2018-08-09 Thread Jan van der Laan

You can also inadvertently transmit data to the internet using a package
without being obviously 'stupid', e.g. by using a package that uses an
external service for data processing. For example, some javascript
visualisation libs can do that (not sure if those wrapped in R-packages
do), or, for example, a geocoding service.

Not having an (outgoing) internet connection at least helps against
mistakes like this (and probably against many untargeted attacks). If it
is allowed to have the sensitive data on that computer, using R on that
computer is probably not going to make it less safe.

Jan

On 09-08-18 09:19, Rainer M Krug wrote:

I can not agree more, Barry. Very nicely put.

Rainer

On 8 Aug 2018, at 18:10, Barry Rowlingson wrote:

On Wed, Aug 8, 2018 at 4:09 PM, Laurence Clark
wrote:

Hello all,

I want to download R and use it for work purposes. I hope to use it to analyse
very sensitive data from our clients.

My question is:

If I install R on my work network computer, will the data ever leave our
network? I need to know if the data goes anywhere other than our network,
because this could compromise its security.

Is there any chance the data could go to a server owned by 'R' or anything
else that's not immediately obvious, but constitutes the data leaving our
network?

You are talking mostly to statisticians here, and if p>0 then there's
"a chance". I'd say yes, there's a chance, but its pretty small, and
would only occur through stupidity, accident or malice.

In the ordinary course of things your data will be on your hard disk,
or on your corporate network drives, and only exist between your
corporate network server and your PC's memory. R will load the data
into that memory, do stuff with it in that memory, and write results
back to hard disk. Nothing leaves the network this way.

However... R has facilities for talking to the internet. You can save
data to google docs spreadsheets, for example, but you'd have to be
signed in to google, and have to type something like:

writeGoogleDoc(my_data, "secretdata.xls")

that covers "stupid". You should know that google docs are on google's
servers, and google's servers aren't on your network, and your secret
data shouldn't go on google's servers.

Accidents happen. You might be working on non-secret data which you
want to save to google docs, and accidentally save "data1" which is
secret instead of "data2" which is okay to be public. Oops. You sent
it to google. Accidents happen.

"malice" would be if someone had put code into R or an add-on package
that you use that sends your data over the network without you
knowing. For example maybe every time you fit a linear model with:

lm(age~beauty, data=people)

R could be transmitting the data to hackers. But the chance of this is
very small, and I don't think any malicious code has ever been
discovered in R or the 12000 add-on packages downloadable from CRAN.
Doesn't mean it hasn't been discovered yet or won't be in the future.

It used to be said that the only machine safe from hackers was one
unplugged from the network. But now hackers can get to your machine
via malicious USB sticks, keyboard loggers, and various other nasties.
The only machine safe from hackers is one with the power off. But take
the power plug out because a wake-on-lan packet could switch your
machine on remotely

Barry

Thank you

Laurence

--
Laurence Clark
Business Data Analyst
Account Management
Health Management Ltd

Mobile: 07584 556498
Switchboard:0845 504 1000
Email: laurence.cl...@healthmanltd.com
Web:www.healthmanagement.co.uk


Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not

2017-09-14 Thread Jan van der Laan



With lrm.fit you are fitting a completely different model. One of the 
things lrm does, is preparing the input for lrm.fit which in this case 
means that dummy variables are generated for categorical variables such 
as 'KILLIP'.
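
The difference can be sketched in base R. This is only an analogy under stated assumptions: model.matrix() stands in for the design matrix a formula interface builds (lrm's own internals may differ in detail), and the data are made up.

```r
# Hypothetical toy data; KILLIP is a four-level class as in the thread
d <- data.frame(
  AGE    = c(50, 60, 70, 80),
  KILLIP = factor(c("1", "2", "3", "4"))
)

# What a formula interface builds: an intercept, AGE, and three
# indicator columns for KILLIP (the first level is the reference)
X_formula <- model.matrix(~ AGE + KILLIP, data = d)
colnames(X_formula)

# What cbind() on the raw columns passes on: the factor collapses to
# its integer codes, i.e. a single numeric column
X_cbind <- cbind(d$AGE, d$KILLIP)
ncol(X_cbind)
```

This matches the lrm.fit output quoted below, which has a single coefficient for the KILLIP column instead of one per non-reference level.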


The error message means that the model did not converge within the maximum 
number of iterations. One possible solution is to try to increase the 
maximum number of iterations, e.g.:


fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT, data = gusto2, maxit = 100)

HTH,

Jan



On 14-09-17 09:30, Bonnett, Laura wrote:

Dear all,

I am using the publically available GustoW dataset.  The exact version I am 
using is available here: 
https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk

I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT.  I 
have successfully fitted a logistic regression model using the "glm" function 
as shown below.

library(rms)
gusto <- spss.get("GustoW.sav")
fit <- 
glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)

However, my review of the literature and other websites suggest I need to use "lrm" for 
the purposes of producing a nomogram.  When I run the command using "lrm" (see below) I 
get an error message saying:
Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
   Unable to fit model using "lrm.fit"

My code is as follows:
gusto2 <- gusto[,c(1,3,5,8,9,10)]
gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", 
HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
label(gusto2)=lapply(names(var.labels),function(x) 
label(gusto2[,x])=var.labels[x])

ddist = datadist(gusto2)
options(datadist='ddist')

fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)

Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
   Unable to fit model using "lrm.fit"

Online solutions to this problem involve checking whether any variables are 
redundant.  However, the results for my data suggest  that none are.
redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)

Redundancy Analysis

redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)

n: 2188    p: 5    nk: 3

Number of NAs:   0

Transformation of target variables forced to be linear

R-squared cutoff: 0.9   Type: ordinary

R^2 with which each variable can be predicted from all other variables:

   AGE    HYP KILLIP    HRT    ANT
 0.028  0.032  0.053  0.046  0.040

No redundant variables

I've also tried just considering "lrm.fit" and that code seems to run without 
error too:
lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)

Logistic Regression Model

  lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
  gusto2$ANT), y = gusto2$DAY30)

                      Model Likelihood     Discrimination    Rank Discrim.
                         Ratio Test            Indexes          Indexes
 Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
  0           2053    d.f.             5    g        1.642    Dxy     0.691
  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
 max |deriv| 4e-09                          gp       0.079    tau-a   0.080
                                            Brier    0.048

           Coef     S.E.   Wald Z Pr(>|Z|)
 Intercept -13.8515 0.9694 -14.29 <0.0001
 x[1]        0.0989 0.0103   9.58 <0.0001
 x[2]        0.9030 0.1510   5.98 <0.0001
 x[3]        1.3576 0.2570   5.28 <0.0001
 x[4]        0.6884 0.2034   3.38  0.0007
 x[5]        0.6327 0.2003   3.16  0.0016

I was therefore hoping someone would explain why the "lrm" code is producing an error message, 
while "lrm.fit" and "glm" do not.  In particular I would welcome a solution to ensure I 
can produce a nomogram.

Kind regards,
Laura

Dr Laura Bonnett
NIHR Post-Doctoral Fellow

Department of Biostatistics,
Waterhouse Building, Block F,
1-5 Brownlow Street,
University of Liverpool,
Liverpool,
L69 3GL

0151 795 9686
l.j.bonn...@liverpool.ac.uk




Re: [R] Loading large .pxt and .asc datasets causes issues.

2016-02-23 Thread Jan van der Laan

First, the file does contain 302 columns; the variable layout 
(http://www.cdc.gov/brfss/annual_data/2006/varlayout_table_06.htm) 
contains 302 columns. So, reading the SAS file probably works correctly.


Second, the read.asc function you use is for reading geographic raster 
files, not fixed width files.
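
For a small file, base R's read.fwf can also read fixed width data from a start/length layout. The sketch below is only an illustration: the three-field layout and the two records are made up, and it assumes fields do not overlap (the real layout has overlapping fields such as IDATE and IMONTH, which would need to be dropped or handled separately).

```r
# Hypothetical three-field layout; columns 5 to 7 are unused
layout <- data.frame(
  start  = c(1, 3, 8),
  name   = c("STATE", "GEO", "SEX"),
  length = c(2, 2, 1),
  stringsAsFactors = FALSE
)

# Columns skipped before each field (0 when fields are adjacent)
next_free <- c(1, head(layout$start + layout$length, -1))
gap <- layout$start - next_free

# read.fwf() skips a field when its width is negative
widths <- as.vector(rbind(-gap, layout$length))
widths <- widths[widths != 0]

# Two made-up records written to a temporary file
tmp <- tempfile(fileext = ".asc")
writeLines(c("01ABxxx2", "36CDyyy1"), tmp)

dat <- read.fwf(tmp, widths = widths, col.names = layout$name,
                stringsAsFactors = FALSE)
dat
```

read.fwf is convenient but slow on large files, which is one reason to prefer a dedicated package for data of this size.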


Below, I show how you could read the file using the LaF package (sorry 
for the long dump of variable files; copy-pasted them from the page 
linked to above):


columns <- "StartingColumn  VariableName  FieldLength
1               _STATE        2
3               _GEOSTR       2
5               _DENSTR2      1
6               PRECALL       1
7               REPNUM        5
12              REPDEPTH      2
14              FMONTH        2
16              IDATE         8
16              IMONTH        2
18              IDAY          2
20              IYEAR         4
24              INTVID        3
27              DISPCODE      3
30              SEQNO         10
30              _PSU          10
40              NATTMPTS      2
42              NRECSEL       6
48              NRECSTR       9
57              CTELENUM      1
58              CELLFON1      1
59              PVTRESID      1
60              NUMADULT      2
62              NUMMEN        2
64              NUMWOMEN      2
73              GENHLTH       1
74              PHYSHLTH      2
76              MENTHLTH      2
78              POORHLTH      2
80              HLTHPLAN      1
81              PERSDOC2      1
82              MEDCOST       1
83              CHECKUP       1
84              EXERANY2      1
85              DIABETE2      1
86              LASTDEN3      1
87              RMVTETH3      1
88              DENCLEAN      1
89              CVDINFR3      1
90              CVDCRHD3      1
91              CVDSTRK3      1
92              ASTHMA2       1
93              ASTHNOW       1
94              QLACTLM2      1
95              USEEQUIP      1
96              SMOKE100      1
97              SMOKDAY2      1
98              STOPSMK2      1
99              AGE           2
101             HISPANC2      1
102             MRACE         6
108             ORACE2        1
109             MARITAL       1
110             CHILDREN      2
112             EDUCA         1
113             EMPLOY        1
114             INCOME2       2
116             WEIGHT2       4
120             HEIGHT3       4
124             CTYCODE       3
132             NUMHHOL2      1
133             NUMPHON2      1
134             TELSERV2      1
135             SEX           1
136             PREGNANT      1
137             VETERAN       1
138             DRNKANY4      1
139             ALCDAY4       3
142             AVEDRNK2      2
144             DRNK3GE5      2
146             MAXDRNKS      2
148             FLUSHOT3      1
149             FLUSPRY2      1
162             PNEUVAC3      1
163             HEPBVAC       1
164             HEPBRSN       1
165             FALL3MN2      2
167             FALLINJ2      2
169             SEATBELT      1
170             DRINKDRI      2
172             HADMAM        1
173             HOWLONG       1
174             PROFEXAM      1
175             LENGEXAM      1
176             HADPAP2       1
177             LASTPAP2      1
178             HADHYST2      1
179             PSATEST       1
180             PSATIME       1
181             DIGRECEX      1
182             DRETIME       1
183             PROSTATE      1
184             BLDSTOOL      1
185             LSTBLDS2      1
186             HADSIGM3      1
187             LASTSIG2      1
188             HIVTST5       1
189             HIVTSTD2      6
195             WHRTST7       2
197             HIVRDTST      1
198             EMTSUPRT      1
199             LSATISFY      1
200             RCSBIRTH      6
206             RCSGENDR      1
207             RCHISLAT      1
208             RCSRACE       6
214             RCSBRACE      1
215             RCSRELN1      1
216             DRHPCH        1
217             HAVHPCH       1
218             CIFLUSH2      1
219             RCVFVCH2      6
225             RNOFVCH2      2
227             CASTHDX2      1
228             CASTHNO2      1
229             DIABAGE2      2
231             INSULIN       1
232             DIABPILL      1
233             BLDSUGAR      3
236             FEETCHK2      3
239             FEETSORE      1
240             DOCTDIAB      2
242             CHKHEMO3      2
244             FEETCHK       2
246             EYEEXAM       1
247             DIABEYE       1
248             DIABEDU       1
249             VIDFCLT2      1
250             VIREDIF2      1
251             VIPRFVS2      1
252             VINOCRE2      2
254             VIEYEXM2      1
255             VIINSUR2      1
256             VICTRCT2      1
257             VIGLUMA2      1
258             VIMACDG2      1
259             VIATWRK2      1
260             PAINACT2      2
262             QLMENTL2      2
264             QLSTRES2      2
266             QLREST2       2
268             QLHLTH2       2
270             ASTHMAGE      2
272             ASATTACK      1
273             ASERVIST      2
275             ASDRVIST      2
277             ASRCHKUP      2
279             ASACTLIM      3
282             ASYMPTOM      1
283             ASNOSLEP      1
284             ASTHMED2      1
285             ASINHALR      1
286             BRTHCNT3      1
287             TYPCNTR4      2
289             NOBCUSE2      2
291             FPCHLDFT      1
292             FPCHLDHS      1
293             VITAMINS      1
294             MULTIVIT      1
295             FOLICACD      1
296             TAKEVIT       3
299             RECOMMEN      1
300             HOUSESMK      1
301             INDOORS       1
302             SMKPUBLC      1
303             SMKWORK       1
304             IAQHTSRC      1
305             IAQGASAP      1
306             IAQHTDYS      3
309             IAQCODTR      1
310             IAQMOLD       1
311             HEWTRSRC      1
312             HEWTRDRK      1
313             HECHMHOM      3
316             HECHMYRD      3
319             RRCLASS2      1
320             RRCOGNT2      1
321             RRATWORK      1
322             RRHCARE2      1
323             RRPHYSM1      1
324             RREMTSM1      1
325             ADPLEASR      2
327             ADDOWN        2
329             ADSLEEP       2
331             ADENERGY      2
333             ADEAT         2
335             ADFAIL        2
337             ADTHINK       2
339             ADMOVE        2
341             ADANXEV       1
342             ADDEPEV       1
343             SVSAFE        1
344             SVSEXTCH      1
345             SVNOTCH       1
346             SVEHDSE1      1
347             SVHDSX12      1
348             SVEANOS1      1
349             SVNOSX12      1
350             SVRELAT2      2
352             SVGENDER      1
353             IPVSAFE       1
354             IPVTHRAT      1
355             IPVPHYV1      1
356             IPVPHHRT      1
357             IPVUWSEX      1
358             IPVPVL12      1
359             IPVSXINJ      1
360             IPVRELT1      2
362             GPWELPRD      1
363             GPVACPLN      1
364             GP3DYWTR      1
365             GP3DYFOD      1
366             GP3DYPRS      1
367             GPBATRAD      1
368             GPFLSLIT      1
369             GPMNDEVC      1
370             GPNOTEVC      2
372             GPEMRCOM      1
373             GPEMRINF      1
741             QSTVER        1
742             QSTLANG       2
800

Re: [R] Coding systems.

2013-11-26 Thread Jan van der Laan



Could it be that your r-script is saved in a different encoding than  
the one used by R (which will probably be UTF8 since you're working on  
linux)?
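
One way to check this is to look at the bytes behind the strings. The sketch below is only an illustration with a made-up value: it shows that the same text has different bytes in UTF-8 and in Latin-1, which is what makes a comparison fail when a script file and the data are read under different encoding assumptions.

```r
# "Montréal" written with a Unicode escape, so this script itself is
# plain ASCII and the string is stored as UTF-8
x_utf8 <- "Montr\u00e9al"

# The same text converted to Latin-1
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")

Encoding(c(x_utf8, x_latin1))   # declared encodings
charToRaw(x_utf8)               # the accented letter takes two bytes (c3 a9)
charToRaw(x_latin1)             # the accented letter takes one byte (e9)
```

If the script turns out to be saved as Latin-1, passing encoding = "latin1" to source() (or re-saving the file as UTF-8) would be the first thing to try.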


--
Jan



gerald.j...@dgag.ca wrote:


Hello,

I am using R, 2.15.2, on a 64-bit Linux box.  I run R through Emacs' ESS.

R runs in a French (Canadian French) locale, and lately I got surprising
results from functions that make factor variables from character variables.
Many of the variables in the input data.frames are character variables and
contain Latin accents, for example the é in Montréal.  I wasted several days
playing with coding systems and trying to understand why some code, when run
one command at a time from the command line, gives the expected result, while
the same code pasted into a function doesn't.

For example the following code:

==
ttt.rmr <- sima.31122012$rmrnom
ttt.rmr.2 <- ifelse(ttt.rmr %in% c("Edmonton", "Edmundston",
                                   "Charlottetown", "Calgary", "Winnipeg",
                                   "Victoria", "Vancouver", "Toronto",
                                   "St. John's", "Saskatoon", "Regina",
                                   "Québec", "Ottawa - Gatineau (Ontario",
                                   "Ottawa - Gatineau (partie",
                                   "Montréal",
                                   "Halifax", "Fredericton"),
                    "Grandes villes", ifelse(ttt.rmr == "", "Manquant",
                                             "Autres"))
unique(ttt.rmr.2)
ttt.rmr.2 <- factor(ttt.rmr.2, levels = c("Grandes villes", "Autres",
                                          "Manquant"),
                    labels = c("Grandes villes", "Autres", "Manquant"))

==

will have "Montréal" and "Québec" in the "Grandes villes" level of the
factor variable, while running the same code in a function will have them in
"Autres".  The variable rmr.Merged in the data.frame
test2.sima.31122012.DataPrep is the output of the function, which, of course,
does a lot of other stuff.

==
ttt.w <- which(ttt.rmr.2 != test2.sima.31122012.DataPrep$rmr.Merged)
frequence(test2.sima.31122012.DataPrep$rmrnom[ttt.w])
         Frequency  Percent Cum.Freq Cum.Percent
Montréal   1301254 79.57173  1301254    79.57173
Québec      334068 20.42827  1635322   100.0
==

All other city names, which have no accents, were correctly classified; only
Montréal and Québec were not. Together they represent over 1.5M records, not
negligible!!!

Following is my .Renviron file where I set up environment variables for
R.

R_PROFILE_USER=/home/jeg002/MyRwork/StartUp/profile.R
# export R_PROFILE_USER
R_HISTFILE=/home/jeg002/MyRwork/.Rhistory
## Default editor
EDITOR=${EDITOR-${VISUAL-'/usr/local/bin/emacsclient'}}
## Default pager
PAGER=${PAGER-'/usr/local/bin/emacsclient'}

## Setting locale, hoping it will be OK all the time!!!
LANG=fr_CA
LANGUAGE=fr_CA
LC_ADDRESS=fr_CA
LC_COLLATE=fr_CA
LC_TYPE=fr_CA
LC_IDENTIFICATION=fr_CA
LC_MEASUREMENT=fr_CA
LC_MESSAGES=fr_CA
LC_NAME=fr_CA
LC_PAPER=en_US
LC_NUMERIC=en_US
LC_TELEPHONE=fr_CA
LC_MONETARY=fr_CA
LC_TIME=fr_CA
R_PAPERSIZE='letter'
==

and:


Sys.getlocale()

[1]
LC_CTYPE=fr_CA;LC_NUMERIC=C;LC_TIME=fr_CA;LC_COLLATE=fr_CA;LC_MONETARY=fr_CA;LC_MESSAGES=fr_CA;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=fr_CA;LC_IDENTIFICATION=C


Sys.getenv(c("LANGUAGE", "LANG"))

LANGUAGE     LANG
 "fr_CA"  "fr_CA"

I must be missing something!!!  Maybe someone can make sense of this!!!
Thanks
for your support,

Gérald Jean


 Gerald Jean, M. Sc. in statistics
 Senior statistical advisor
 Actuariat corporatif, Modélisation et Recherche
 Assurance de dommages
 Mouvement Desjardins
 Lévis (head office)
 418 835-4900, ext. 7639
 1 877 835-4900, ext. 7639
 Fax: 418 835-6657





Re: [R] Reading in csv data with ff package

2013-11-19 Thread Jan van der Laan


The following seems to work:

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
  next.rows = 1005,sep=",",colClasses = c("integer","factor","logical"))


'character' doesn't work because ff does not support character  
vectors. Character vectors need to be stored as factors. The  
disadvantage of that is that the levels are stored in memory, so if  
the number of levels is very large (e.g. with unique strings) you  
might still run into memory problems.
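
The trade-off can be seen with ordinary base R factors, used here only as an analogy for how ff stores such a column (integer codes plus a levels vector); the data are made up.

```r
# Few distinct values: short levels vector, compact integer codes
few <- factor(sample(LETTERS, 10000, replace = TRUE))
typeof(few)           # the data are stored as integer codes
length(levels(few))   # at most 26 strings kept as levels

# Every value unique: the levels vector is as long as the data
ids <- factor(sprintf("id%05d", 1:10000))
length(levels(ids))
```

So a factor column pays off for repeated values, while a column of unique identifiers gains nothing from it.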


'integer' doesn't work because read.csv.ffdf passes the colClasses on  
to read.table, which then tries to convert your second column to  
integer, which it can't.


Jan



Nick McClure nfmccl...@gmail.com wrote:


I've spent some time trying to wrap my head around reading in large csv
files with the ff-package.  I think I know how to do it, but am bumping
into some problems.  I've tried to recreate the issues as best as I can
with a smaller example and maybe someone can help explain the problems.

The following code just creates a csv file with an integer column,
character column and logical column.
-
library(ff)
#Create data
size = 2000
fake.data =
data.frame(Integer=round(10*runif(size)),Character=sample(LETTERS,size,replace=T),Logical=sample(c(T,F),size,replace=T))

#Write to csv
write.csv(fake.data,"data.csv",row.names=F)
-

Now to read it in as a 'ffdf' class, I can do the following:

-
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",")
-

That works.  But with my current large data set, read.csv.ffdf is debating
with me about the classes it's importing. I was also messing around with
the first.rows/next.rows, but that's a question for another time. So I'll
try to load the data in, specifying the column types (same exact command,
except with specifying colClasses):

-

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got 'J'

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = rep("character",3))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented

data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = rep("raw",3))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a raw', got '8601'

-
I just can't find a combination of classes that will result in this reading
in.  I really don't understand why the classes 'character' won't work for
all of them.  Any thoughts as to why?  I appreciate the help and time.


Re: [R] laf_open_fwf

2013-08-09 Thread Jan van der Laan


Christian,

In my original example I had an n=1E5 argument in readLines:

lines <- readLines(con, n=1E5)

This ensures that in every iteration of the loop at most 100,000 lines are 
read (which should usually fit into memory). Without this argument, readLines 
tries to read in the complete file.
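
A tiny self-contained sketch of the chunked loop, so the effect of n can be seen directly. The file contents, the chunk size of 2 and the target width of 6 are made up for illustration; the real file would use n=1E5 and a width of 238.

```r
# Made-up input with lines of unequal length
infile  <- tempfile()
outfile <- tempfile()
writeLines(c("a", "bbb", "cc", "dddd", "e"), infile)

con <- file(infile, "rt")
out <- file(outfile, "wt")
n_chunks <- 0
while (TRUE) {
  lines <- readLines(con, n = 2)      # at most 2 lines per iteration
  if (length(lines) == 0) break
  n_chunks <- n_chunks + 1
  writeLines(sprintf("%-6s", lines), out)   # pad on the right to width 6
}
close(con)
close(out)

n_chunks                    # number of chunks processed
nchar(readLines(outfile))   # all lines now have the same length
```

Because the connection stays open between calls, each readLines call continues where the previous one stopped, so only one chunk is held in memory at a time.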


Jan


On 08/09/2013 04:43 PM, christian.kame...@astra.admin.ch wrote:

Jan,

Many thanks for your suggestion! The code runs perfectly fine on the test set. 
Applying it to the complete data set, however, results in the following error:


while (TRUE) {

+  lines <- readLines(con, encoding='LATIN1')
+  if (length(lines) == 0) break
+  lines <- sprintf("%-238s", lines)
+  writeLines(lines, out, useBytes=TRUE) }
Error: cannot allocate vector of size 23.2 Mb


Best Regard

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch


-Original Message-
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Friday, 9 August 2013 10:01
To: Kamenik Christian ASTRA
Subject: Re: AW: AW: [R] laf_open_fwf

Christian,

It seems some of the lines in your file have additional characters at the end 
causing the line lengths to vary. The only way I could think of is to first add 
whitespace to the shorter lines to make all line lengths equal:

# Add whitespace to the end of the lines to make all lines the same length
con <- file("testdata.txt", "rt")
out <- file("testdata_2.txt", "wt")
while (TRUE) {
  lines <- readLines(con, n=1E5)
  if (length(lines) == 0) break
  lines <- sprintf("%-238s", lines)
  writeLines(lines, out, useBytes=TRUE)
}
close(con)
close(out)


I am then able to read your test file using LaF:

library(LaF)

column_widths <- c(3, 28, 4, 30, 28, 6, 3, 30, 10, 26, 25, 30, 2, 5, 5)
column_types <- rep("string", length(column_widths))
column_types[c(1, 3, 7)] <- "integer"

laf <- laf_open_fwf("testdata_2.txt", column_types = column_types,
  column_widths = column_widths)


HTH,
Jan







christian.kame...@astra.admin.ch wrote:


Hello Jan

I attached an example. Any help is highly appreciated!

Kind Regard

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and
Communications DETEC Federal Roads Office FEDRO Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch
-Original Message-
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Thursday, 8 August 2013 13:58
To: r-help@r-project.org
Cc: Kamenik Christian ASTRA
Subject: Re: AW: [R] laf_open_fwf


Without example data it is difficult to give suggestions on how you
might read this file.

Are you sure your file is fixed width? Sometimes columns are neatly
aligned using whitespace (tabs/spaces). In that case you could use
read.table with the default settings.

Another possibility might be that the file is encoded in utf8. I
expect that reading it in assuming another encoding (such as latin1)
would lead to varying line sizes. Although I would expect the lengths
to be larger than the sum of your column widths (as one symbol can be
larger than one byte).

Jan



christian.kame...@astra.admin.ch wrote:


Dear Jan

Many thanks for your help. In fact, all lines are shorter than my
column width...

my.column.widths:   238
range(nchar(lines)):235 237

So, it seems I have an inconsistent file structure...
I guess there is no way to handle this in an automated way?

Best Regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and
Communications DETEC Federal Roads Office FEDRO Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch
-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Wednesday, 7 August 2013 20:57
To: r-help@r-project.org
Cc: Kamenik Christian ASTRA
Subject: Re: [R] laf_open_fwf

Dear Christian,

Well... it shouldn't normally do that. The only way I can currently
think of that might cause this problem is that the file has \r\n\r\n,
which would mean that every line is followed by an empty line.

Another cause might be (although I would not really expect the
results you see) that the sum of your column widths is larger than
the actual width of the line.

You can check your line lengths using:

lines <- readLines(my.filename)
nchar(lines)

Each line should have the same length and be equal to (or at least
larger than) sum(my.column.widths)

If this is not the problem

Re: [R] laf_open_fwf

2013-08-08 Thread Jan van der Laan



Without example data it is difficult to give suggestions on how you  
might read this file.


Are you sure your file is fixed width? Sometimes columns are neatly  
aligned using whitespace (tabs/spaces). In that case you could use  
read.table with the default settings.


Another possibility might be that the file is encoded in utf8. I  
expect that reading it in assuming another encoding (such as latin1)  
would lead to varying line sizes. Although I would expect the lengths  
to be larger than the sum of your column widths (as one symbol can be  
larger than one byte).
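
To illustrate the byte-versus-character point, here is a minimal sketch with a made-up string containing one non-ASCII letter. In UTF-8 the letter occupies two bytes, so the byte length exceeds the character length; a fixed-width reader that counts bytes would therefore see a longer line than one that counts characters.

```r
# Hypothetical example value; "\u00fc" is the letter u with diaeresis
x <- enc2utf8("Z\u00fcrich")

nchar(x, type = "chars")  # number of characters
nchar(x, type = "bytes")  # number of bytes in the UTF-8 representation
```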


Jan



christian.kame...@astra.admin.ch wrote:


Dear Jan

Many thanks for your help. In fact, all lines are shorter than my  
column width...


my.column.widths:   238
range(nchar(lines)):235 237

So, it seems I have an inconsistent file structure...
I guess there is no way to handle this in an automated way?

Best Regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and  
Communications DETEC 

Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch
-----Original Message-----
From: Jan van der Laan [mailto:rh...@eoos.dds.nl]
Sent: Wednesday, 7 August 2013 20:57
To: r-help@r-project.org
Cc: Kamenik Christian ASTRA
Subject: Re: [R] laf_open_fwf

Dear Christian,

Well... it shouldn't normally do that. The only way I can currently  
think of that might cause this problem is that the file has  
\r\n\r\n, which would mean that every line is followed by an empty  
line.


Another cause might be (although I would not really expect the  
results you see) that the sum of your column widths is larger than  
the actual width of the line.


You can check your line lengths using:

lines <- readLines(my.filename)
nchar(lines)

Each line should have the same length and be equal to (or at least  
larger than) sum(my.column.widths)


If this is not the problem: would it be possible that you send me a  
small part of your file so that I could try to reproduce the  
problem? Or if you cannot share your data: replace the actual values  
with nonsense values.


Regards,
Jan

PS I read your mail by chance as I am not a regular r-help reader.  
When you have specific LaF problems it is better to also cc me  
directly.


On 08/06/2013 12:35 PM, christian.kame...@astra.admin.ch wrote:

Dear all

I was trying the (fairly new) LaF package, and came across the  
following problem:


I opened a connection to a fixed width ASCII file using
laf_open_fwf(my.filename, my.column_types, my.column_widths,
my.column_names)

When looking at the data, it turned out that \n (newline) and \r  
(carriage return) were considered as characters, thus destroying  
the structure in my data (the second column does not include any  
numbers):



my.data[1565:1575,1:3]


MF_FARZ1  Fahrzeugarttext MF_MARKE
1 \n043 Landwirt. Traktor2140
2 \n043 Landwirt. Traktor6206
3 \n001 Personenwagen2026
4 \n001 Personenwagen2026
5\r\n00 1Personenwagen404
6\r\n02 0Gesellschaftswagen   710
7\r\n00 1Personenwagen505
8\r\n00 1Personenwagen505
9\r\n00 1Personenwagen301
10   \r\n00 1Personenwagen553
11   \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.

Best Regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and
Communications DETEC Federal Roads Office FEDRO Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] How is a file descriptor stored ?

2013-08-07 Thread Jan van der Laan



I don't know how many files you are planning to open, but you might  
also run into the maximum number of connections, namely 125.  
See ?file.
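
One way to keep track of several open files is sketched below: the connections are stored in a named list and closed explicitly when done, so they do not count against the connection limit for longer than needed. The file names and the use of tempdir() are assumptions for illustration; this is not the code from the original question.

```r
# Hypothetical output files in the session's temporary directory
paths <- file.path(tempdir(), paste0("output", 1:3, ".txt"))

# Open one write connection per file and name the list by file name
conns <- lapply(paths, file, open = "w")
names(conns) <- basename(paths)

for (nm in names(conns)) {
  writeLines(paste("written to", nm), conns[[nm]])
}

# Close every connection; showConnections() lists what is still open
invisible(lapply(conns, close))
```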


Jan



mohan.radhakrish...@polarisft.com wrote:


Hi,
I thought that 'R' like java will allow me to store file names
(keys) and file descriptors(values) in a hashmap.


filelist.array <- function(n){
  sink("nmon.log")
  cpufile <- new.env(hash=T, parent=emptyenv())
  for (i in 1:n) {
    key <- paste("output", i, ".txt", sep = "")
    assign(key, file( key, "w" ), cpufile)
  }
  sink()
  return (cpufile)
}

But when I try to test it like this there is an exception

[1] "Exception is  Error in UseMethod(\"close\"): no applicable method for
'close' applied to an object of class \"c('integer', 'numeric')\"\n"

test.simple.filelist.array <- function() {

  execution <- tryCatch({
    sink("nmon.log")
    listoffiles <- filelist.array(3)
    for (v in ls(listoffiles)) {
      print(paste("Map value is [", listoffiles[[v]], "]"))
      fd <- listoffiles[[v]]
      close(fd)
    }
    sink()
  }, error = function(err){
    print(paste("Exception is ",err))
  })
}

I think I am missing some fundamentals.

Thanks,
Mohan








Re: [R] laf_open_fwf

2013-08-07 Thread Jan van der Laan


Dear Christian,

Well... it shouldn't normally do that. The only way I can currently 
think of that might cause this problem is that the file has \r\n\r\n, 
which would mean that every line is followed by an empty line.


Another cause might be (although I would not really expect the results 
you see) that the sum of your column widths is larger than the actual 
width of the line.


You can check your line lengths using:

lines <- readLines(my.filename)
nchar(lines)

Each line should have the same length and be equal to (or at least 
larger than) sum(my.column.widths)
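
A slightly extended version of this check is sketched below on a made-up three-line file: it tabulates the line lengths and reports which lines are shorter than the summed column widths. The file contents and the widths c(4, 3) are assumptions for illustration only.

```r
# Hypothetical fixed-width file; the second line is one character short
path <- tempfile()
writeLines(c("AAA 001", "BBB 02", "CCC 003"), path)
my.column.widths <- c(4, 3)

lines <- readLines(path)
len <- nchar(lines)

table(len)                                  # distribution of line lengths
bad <- which(len < sum(my.column.widths))   # lines that are too short
bad
```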


If this is not the problem: would it be possible that you send me a 
small part of your file so that I could try to reproduce the problem? Or 
if you cannot share your data: replace the actual values with nonsense 
values.


Regards,
Jan

PS I read your mail by chance as I am not a regular r-help reader. When 
you have specific LaF problems it is better to also cc me directly.


On 08/06/2013 12:35 PM, christian.kame...@astra.admin.ch wrote:

Dear all

I was trying the (fairly new) LaF package, and came across the following 
problem:

I opened a connection to a fixed width ASCII file using
laf_open_fwf(my.filename, my.column_types, my.column_widths, my.column_names)

When looking at the data, it turned out that \n (newline) and \r (carriage 
return) were considered as characters, thus destroying the structure in my data 
(the second column does not include any numbers):


my.data[1565:1575,1:3]


MF_FARZ1  Fahrzeugarttext MF_MARKE
1 \n043 Landwirt. Traktor2140
2 \n043 Landwirt. Traktor6206
3 \n001 Personenwagen2026
4 \n001 Personenwagen2026
5\r\n00 1Personenwagen404
6\r\n02 0Gesellschaftswagen   710
7\r\n00 1Personenwagen505
8\r\n00 1Personenwagen505
9\r\n00 1Personenwagen301
10   \r\n00 1Personenwagen553
11   \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.

Best Regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch



Re: [R] read.table.ffdf and fixed width files

2013-08-07 Thread Jan van der Laan



What probably is the problem is that read.table.ffdf uses the nrows 
argument to read the file in chunks. However, read.fwf has no nrows 
argument; it uses an n argument instead.


One (non tested) solution is to write a wrapper around read.fwf and pass 
this wrapper to read.table.ffdf. Something like:


my.read.fwf <- function(file, nrows=-1, ...) {
   read.fwf(file=file, n=nrows, ...)
}

Perhaps you'll also need to wrap some additional arguments.
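
The effect of such a wrapper can be tried out in isolation, without ff. The sketch below defines a wrapper that maps an nrows argument onto the n argument of read.fwf and applies it to a made-up three-line file; whether read.table.ffdf needs further arguments wrapped is not checked here.

```r
# Hypothetical fixed-width data: 2-character code followed by 3 digits
tmp <- tempfile()
writeLines(c("AB123", "CD456", "EF789"), tmp)

# Wrapper translating nrows (as used by chunked readers) to read.fwf's n
my.read.fwf <- function(file, nrows = -1, ...) {
  read.fwf(file = file, n = nrows, ...)
}

# Read only the first two records
first_two <- my.read.fwf(tmp, nrows = 2, widths = c(2, 3))
first_two
```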


read.fwf is terribly slow for large fixed width files. I would advise 
using the LaF package in combination with the laf_to_ffdf function from 
the ffbase package. ... Although judging from your other question you 
already looked at that.


HTH,
Jan



On 08/06/2013 10:47 AM, christian.kame...@astra.admin.ch wrote:

Dear all

I am working on Windows 7 32-bit, and the ff- package is my daily life-saver to 
overcome the inherent memory limitations. Recently, I tried using 
read.table.ffdf to import data from a fixed-width ASCII file (file size: 
1'440'865'015 Bytes) with 6'079'455 lines and 32 variables using the command
read.table.ffdf(file=my.filename, FUN="read.fwf", width=my.format, 
asffdf_args=list(col_args=list(pattern = my.pattern)))

The command generates a temporary file, which has 1'629'328'120 Bytes, plus 32 
ff files following my.pattern. The latter 32 files, however, only take up 
136'000 Bytes. And the resulting R object has a dimension of 1000 x 32. To me, 
it seems that read.table.ffdf aborts the data import after 1000 lines, instead 
of importing the entire file.

I tried running read.table.ffdf with different parameter settings, I was 
browsing the help pages and the mailing lists, but I did not find any hint on 
why read.table.ffdf aborts the data import. (Does it really? - The file size of 
the temporary file suggests that all data were read.)

Any help would be highly appreciated

Best Regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch



Re: [R] How to use character in C code?

2013-05-17 Thread Jan van der Laan



Characters in R are zero terminated (although I couldn't find that in 
the R extensions manual). So, you could use:



void dealWithCharacter(char **chaine, int *size){
  Rprintf("The string is '%s'\n", chaine[0]);
}

Jan



On 05/10/2013 03:51 PM, cgenolin wrote:

Hi the list,
I include some C code in a R function using .C. The argument is a
character.
I found out how to access the characters one by one:
--- 8< --- C 
void dealWithCharacter(char **chaine, int *size){
  int i=0;
  for(i=0;i<*size;i++){
    Rprintf("Character %i is %c\n",i,chaine[0][i]);
  };
}
--- 8< -- R -
ch <- "zerta"
.C("dealWithCharacter",as.character(ch),as.integer(nchar(ch)))
--- 8< --

But is it possible to access the full word at once?

Christophe



Re: [R] path reference problems in R 3.0.0

2013-04-28 Thread Jan van der Laan




Some colleagues ran into similar problems after migrating to windows 7. 
They could no longer install packages in certain network locations 
because the read only bit was set (which could be unset after which 
windows set it again). Perhaps the following helps:


http://itexpertvoice.com/home/fixing-the-windows-7-read-only-folder-blues/

Jan




On 04/28/2013 08:43 AM, Jeff Newmiller wrote:

On Sun, 28 Apr 2013, Melissa Key wrote:



On Apr 28, 2013, at 2:15 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us
wrote:


a) You seem to be under the impression that running as Administrator
fixes problems... in my experience, it simply multiplies them. [1]


On a windows machine, I don't have any impressions either way.  I find
the entire OS to be counter-intuitive and annoying, and I tend to take
the path of least resistance when it comes to playing with it.  the
point I was trying to make was that it isn't an insufficient
privileges problem.  I understand your point (and the point of the
link you sent), and I'll see how easy I find it to change the install
folders to a user folder.  (right now, I do everything except installs
in Rstudio, so its pretty easy to open a instance of Rgui just for the
install.packages


Actually, the Administrator mode does not override all read and write
privilege settings in the filesystem... they are more complicated than
that. Truly, it is heroin you are dealing with.




b) Your personal library is in an unusual place... it is usually an R
directory under your Documents folder...

C:\Users\melissa\R\win-library\3.0

should normally be

C:\Users\melissa\Documents\R\win-library\3.0

on a Win7 system.  This mis-location may be related to your problems.


This sounds like a potential culprit (along with c).  I've been doing
standard R installs (nothing fancy, I promise), so I'm not sure why
this would be the case.  Is there any easy way to correct this?


I don't know.  It is possible that R is struggling with the hash you
have made of file permissions in your mucking with Administrator mode,
and is trying out default paths that used to work. Duncan may have more
constructive input here than I do.

You might want to look at R for Windows FAQ 2.17, or if you have any
environment variables that contain offensive paths.


Thanks for your response!

Melissa






c) Your errors seem to be suggesting that you have a Windows XP-style
personal library path in your install somewhere:

c:/Docume~1/melissa/R/win-library/3.0

That is definitely not kosher on a Win7 system.

[1] http://www.mail-archive.com/r-help@r-project.org/msg193966.html

On Sat, 27 Apr 2013, Melissa Key wrote:


Hi-



I just upgraded R to 3.0.0 from 2.15.1 (which worked fine).  When I started
trying to install updated versions of the libraries, I saw the following
error:




install.packages("lme4")


Installing package into 'c:/Docume~1/melissa/R/win-library/3.0'

(as 'lib' is unspecified)

Warning in install.packages :

path[1]=c:/Docume~1/melissa/R/win-library/3.0: Access is denied

trying URL
'http://cran.case.edu/bin/windows/contrib/3.0/lme4_0.99-2.zip'

Content type 'application/zip' length 1408286 bytes (1.3 Mb)

opened URL

downloaded 1.3 Mb



Error in install.packages :
path[1]=c:\Docume~1\melissa\R\win-library\3.0:
Access is denied



At that point, I noticed that a similar error was occurring when R
loads:

Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
path[1]=c:/Docume~1/melissa/R/win-library/3.0: Access is denied



The relevant directory does exist, although it keeps getting set to
read-only.  I can't imagine that being a big issue if I'm running R as an
administrator though

C:\Users\melissa\R\win-library\3.0



Also, I can successfully install packages into other directories (e.g. when
running as an administrator, this works fine):


install.packages("lme4", lib="C:/Program Files/R/R-3.0.0/library")

trying URL
'http://cran.case.edu/bin/windows/contrib/3.0/lme4_0.99-2.zip'
Content type 'application/zip' length 1408286 bytes (1.3 Mb)
opened URL
downloaded 1.3 Mb

package 'lme4' successfully unpacked and MD5 sums checked

The downloaded binary packages are in

C:\Users\melissa\AppData\Local\Temp\RtmpEXtf89\downloaded_packages



This will allow me to work with most R packages, but not Bioconductor, due
to the path references in the biocLite source file.



I haven't seen any other messages regarding similar issues, so I'm not sure
what is going on.  I've tried reinstalling R (although I didn't try a fresh
download).



Other relevant details:

This is a personal computer running windows 7.





Any thoughts or ideas of how to get this to work?



Thank you!



Melissa Key





Re: [R] Read big data (3G ) methods ?

2013-04-27 Thread Jan van der Laan

I believe it was already mentioned, but I can recommend the LaF package
(not completely impartial being the maintainer of LaF ;-)

However, the speed differences between packages will not be very large.
Eventually all packages will have to read in 6 GB of data and convert
the text data to numeric data. So the tricks are to

1 only read in columns that you need
2 only read in lines that you need
3 and if you need to read the data more than once convert it to some
binary format first (RDS, ff, sqlite, bigmemory, ...). Most packages
have routines to convert CSV files to the binary format.

With all of the above LaF helps. ffbase contains a routine laf_to_ffdf
to convert to ff format.
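
The three tricks can be sketched with base R alone (read.csv and RDS are used here instead of LaF, purely so that the example runs without extra packages). The tiny CSV file, its column names and the choice of three rows are made up for illustration.

```r
# Hypothetical small CSV standing in for a large file
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, x = c(0.5, 1.5, 2.5, 3.5, 4.5),
                     note = letters[1:5]),
          csv, row.names = FALSE)

# 1. read only the needed columns: "NULL" in colClasses skips a column
# 2. read only the needed lines: nrows limits the number of records
d <- read.csv(csv, colClasses = c("integer", "numeric", "NULL"), nrows = 3)

# 3. convert once to a binary format and read that from then on
rds <- tempfile(fileext = ".rds")
saveRDS(d, rds)
d2 <- readRDS(rds)
```

After the first conversion, only the readRDS step has to be repeated in later sessions.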

HTH,

Jan

On 04/27/2013 04:34 AM, Kevin Hao wrote:

Thank you very much.

More and more methods are coming. That sounds great!

Thanks,

kevin

On Fri, Apr 26, 2013 at 7:51 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 13-04-26 3:00 PM, Kevin Hao wrote:

Hi Ye,

Thanks.

That is a good method. have any other methods instead of using database?

If you know the format of the file, you can probably write something in C
(or other language) that is faster than R. Convert your .csv file to a
nice binary format, and R will read it in no time at all.

If writing it in C is hard, then R is probably a better use of your time.
Read the file once, write it out using saveRDS(), and read it in using
readRDS() after that.

In either case, the secret is to do the conversion from ugly character
encoded numbers to beautiful binary numbers just once.

Duncan Murdoch

kevin

On Fri, Apr 26, 2013 at 1:58 PM, Ye Lin ye...@lbl.gov wrote:

Have you thought of building a database and then letting R read it through
that db instead of your desktop?

On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com
wrote:

Hi all scientists,

Recently, I have been dealing with big data (3G txt or csv format) on my
desktop (Windows 7, 64-bit version), but I cannot read them any faster,
though I searched the internet. [I have defined colClasses for read.table
and used the cobycol and limma packages, but it is not so fast.]

Could you share your methods to read big data to R faster?

Though this is an odd question, but we need it really.

Any suggest appreciates.

Thank you very much.

kevin


Re: [R] Running other programs from R

2013-03-17 Thread Jan van der Laan


Have a look at the system command:

?system
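
As a minimal sketch of capturing the output of an external program inside R, the example below uses system2 (a sibling of system) and, so that it runs anywhere R is installed, calls R's own Rscript executable as the external program. For another program the command and arguments would be replaced accordingly; nothing here is specific to Multilog.

```r
# Use the Rscript binary shipped with the running R as a stand-in program
rscript <- file.path(R.home("bin"), "Rscript")

# stdout = TRUE returns the program's output as a character vector
out <- system2(rscript, c("-e", shQuote("cat(1+1)")), stdout = TRUE)
out
```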


HTH,

Jan


On 03/16/2013 10:09 PM, Sedat Sen wrote:

Dear list,

I want to run a statistical program (using its .exe file)  from R  by
writing a script. I know there are some packages that call WinBUGS, Mplus
etc. form R. I just want to call the .exe extension of this program and run
several times writing a code in R. Thus, I want to have the output inside R.

I just don't know where to start. Does anyone have any idea about that? Is
there a universal package to call application files of other stat programs
using their application files.

p.s. The program I am talking about is an IRT program called Multilog.




Re: [R] HOw to achieve big vector times big dataframe in R?

2013-03-14 Thread Jan van der Laan



apply((t(as.matrix(b)) * a), 2, sum)

should do what you want.

For why this works, see  
http://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-recycling-rule and the paragraph before  
it.
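
On a small made-up example the recycling approach can be compared with a plain matrix-vector product, which computes the same weighted row sums. The product form is offered as an equivalent sketch, not as the method given above; the sizes (4 rows, 3 columns) stand in for 908 and 150.

```r
# Hypothetical small stand-ins for the vector and the data frame
a <- c(1, 2, 3)
b <- data.frame(x = 1:4, y = 1:4, z = 1:4)

# Recycling: a is recycled down each column of the transposed matrix
r1 <- apply(t(as.matrix(b)) * a, 2, sum)

# Equivalent matrix-vector product; drop() turns the 4 x 1 result into a vector
r2 <- drop(as.matrix(b) %*% a)
```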


Jan



Tammy Ma metal_lical...@live.com wrote:


HI,

I have the following question:

Vector a with length 150

A B C D.

dataframe b with dim 908X150

1   1   1   1.
2   2   2   2
3   3   3   3
4   4   4   4


final result I want is the vector with length 908:
A*1+B*1+C*1+D*1+.
A*2+B*2+C*2+D*2+.
A*3+B*3+C*3+D*3+.
A*4+B*4+C*4+D*4+.


because of too large dimension, how can I achieve this in R? Thanks.

Kind Regards,
Tammy



Re: [R] How to transpose it in a fast way?

2013-03-08 Thread Jan van der Laan



You could use the fact that scan reads the data rowwise, and the fact  
that arrays are stored columnwise:


# generate a small example dataset
exampl <- array(letters[1:25], dim=c(5,5))
write.table(exampl, file="example.dat", row.names=FALSE, col.names=FALSE,
    sep="\t", quote=FALSE)

# and read...
d <- scan("example.dat", what=character())
d <- array(d, dim=c(5,5))

t(exampl) == d


Although this is probably faster, it doesn't help with the large size.  
You could use the n option of scan to read chunks/blocks and feed  
those to, for example, an ff array (which you ideally have  
preallocated).
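
The chunked variant could look roughly like the sketch below. It assumes the numbers of rows and columns are known in advance, uses an ordinary preallocated matrix as a stand-in for the ff array, and reads from an open connection so that each scan call continues where the previous one stopped. The 5 x 5 example and the chunk size of 2 rows are made up.

```r
# Same small example data, written to a temporary file
exampl <- array(letters[1:25], dim = c(5, 5))
path <- tempfile()
write.table(exampl, file = path, row.names = FALSE, col.names = FALSE,
            sep = "\t", quote = FALSE)

n_row <- 5            # rows in the file (assumed known)
n_col <- 5            # columns in the file (assumed known)
rows_per_chunk <- 2   # illustrative chunk size

# Preallocated result holding the transpose: n_col x n_row
result <- matrix(NA_character_, nrow = n_col, ncol = n_row)

con <- file(path, "r")
done <- 0
while (done < n_row) {
  k <- min(rows_per_chunk, n_row - done)
  # scan reads k rows' worth of items, row by row
  chunk <- scan(con, what = character(), n = k * n_col, quiet = TRUE)
  # each file row becomes one column of the result (column-major filling)
  result[, done + seq_len(k)] <- chunk
  done <- done + k
}
close(con)
```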


HTH,

Jan




peter dalgaard pda...@gmail.com wrote:


On Mar 7, 2013, at 01:18 , Yao He wrote:


Dear all:

I have a big data file of 6 columns and 6 rows like that:

AA AC AA AA ...AT
CC CC CT CT...TC
..
.

I want to transpose it and the output is a new like that
AA CC 
AC CC
AA CT.
AA CT.


AT TC.

The keypoint is  I can't read it into R by read.table() because the
data is too large,so I try that:
c<-file("silygenotype.txt","r")
geno_t<-list()
repeat{
 line<-readLines(c,n=1)
 if (length(line)==0)break  #end of file
 line<-unlist(strsplit(line,"\t"))
geno_t<-cbind(geno_t,line)
}
write.table(geno_t,"xxx.txt")

It works but it is too slow ,how to optimize it???



As others have pointed out, that's a lot of data!

You seem to have the right idea: If you read the columns line by  
line there is nothing to transpose. A couple of points, though:


- The cbind() is a potential performance hit since it copies the  
list every time around. geno_t <- vector("list", 6) and then

geno_t[[i]] <- etc

- You might use scan() instead of readLines, strsplit

- Perhaps consider the data type as you seem to be reading strings  
with 16 possible values (I suspect that R already optimizes string  
storage to make this point moot, though.)


--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] can not read table in dbReadTable

2012-11-02 Thread Jan van der Laan



I suspect it should be

my.data.copy <- dbReadTable(con, "week42")

(with con instead of tbs as first argument)

Jan




Tammy Ma metal_lical...@live.com wrote:


tbs <- dbListTables(con)

tbs

[1] "lowend" "time"   "week30" "week33" "week39" "week42"


my.data.copy <- dbReadTable(tbs, "week42")

Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "dbReadTable", for  
signature "character", "character"


I have created tables in db. but there is this error show up when I  
try to read the table in my db.. whats wrong with it??




Thanks a lot.


Tammy


[R] Start R from bash/bat file and end in interactive mode

2012-11-01 Thread Jan van der Laan



I have an R script (rook.R) that starts a Rook server. To prevent users  
from having to start R and type in source("rook.R"), I want to create  
a bash script and bat file that starts R and sources the script.  
However, to keep the Rook server running, R should not close after  
running the script but stay in interactive mode. This proves more  
difficult than expected.


I tried various combinations of commandline parameters and pipes, but  
none of them seem to work:


$ R -f rook.R --interactive
Runs and quits

$ cat rook.R | R
Fatal error: you must specify '--save', '--no-save' or '--vanilla'

$ cat rook.R | R --no-save
Runs and quits

$ R --no-save < rook.R
Runs and quits

$ R --no-save --interactive < rook.R
Runs and quits

I would have expected the --interactive parameter to do what I want,  
but it doesn't seem to do anything.


What does work is to create a .Rprofile with source("rook.R") in it in  
the directory and then start R (just plain R). However, I don't find  
this a very elegant solution. I could of course create the .Rprofile file in  
the bash script, which is somewhat better, but still not very elegant.  
I end up with the following bash script:


#!/bin/bash
echo "source(\"rook.R\")" > .Rprofile
R

Another, not very elegant, possible solution which I haven't tried is  
to start a while loop at the end of the script with a sleep command in  
it.
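
For completeness, the untried sleep-loop idea could be sketched as follows. The function name keep_alive and its arguments are hypothetical, and whether a Rook server keeps answering requests while R sits in such a loop is not verified here. The demonstration uses a stop condition that triggers after a fraction of a second so that the example terminates; at the end of rook.R the default (never stop) would be used instead.

```r
# Hypothetical helper: block until should_stop() returns TRUE
keep_alive <- function(should_stop = function() FALSE, interval = 1) {
  n <- 0L
  while (!should_stop()) {
    Sys.sleep(interval)   # sleep between checks instead of busy-waiting
    n <- n + 1L
  }
  invisible(n)            # number of sleep cycles performed
}

# Demonstration only: stop after roughly 0.3 seconds
deadline <- Sys.time() + 0.3
n_iter <- keep_alive(function() Sys.time() > deadline, interval = 0.1)
```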


Does there exist a better solution?

Regards,

Jan


Re: [R] Loop over several variables

2012-11-01 Thread Jan van der Laan



Or

ti <- aggregate(dataframename[paste0("y", 1:3)],
by=dataframename["aggregationvar"],
sum,na.rm=TRUE)

which gives you all results in one data.frame.
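
A small runnable illustration of this single-call form, with a made-up data frame (the column names g, y1, y2 and y3 and the values are assumptions):

```r
# Hypothetical data: grouping variable g and three value columns
df <- data.frame(g  = c("a", "a", "b"),
                 y1 = c(1, 2, 3),
                 y2 = c(10, NA, 30),
                 y3 = c(100, 200, 300))

# Select the value columns by constructed name and aggregate them together;
# na.rm = TRUE is passed on to sum()
ti <- aggregate(df[paste0("y", 1:3)], by = df["g"], sum, na.rm = TRUE)
ti
```

The result has one row per group and one column per aggregated variable, so no loop or assign() is needed.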

Jan



MacQueen, Don macque...@llnl.gov wrote:


Many ways. Here is one:

### supposing you have y1, y2, and y3 in your data frame

for (i in 1:3) {

  yi <- paste('y',i,sep='')

  ti <- aggregate(dataframename[[yi]],
   by=data.frame(dataframename$aggregationvar),
   sum,na.rm=TRUE)

  assign( paste('ti',i,sep='') , ti, '.GlobalEnv')
}

Or if you happen to think using assign() is bad form you can store each ti
in a list().

-Don


--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 11/1/12 4:32 AM, bchr bochrist...@web.de wrote:


Hey everybody,

I am looking for a way to loop commands over several variables in a
dataframe. Basically I want to do something like this:

ti1 <- aggregate(dataframename$y1,
  by=data.frame(dataframename$aggregationvar),
  sum,na.rm=TRUE)

This works fine as it is, but I want to do it for several variables, thereby
generating several tix. I tried with a for-loop, but the problem was that I
could neither find a way to combine my index number i (1 ... x) with the y or
ti (as for example in Stata I could do by writing y`i'), nor did it work using
a vector of string variables ("y1", ... "yx") and looping over that (while
using yx also as a name for the target dataframe instead of tix - I wouldn't
mind that).

Preferably I would be looking for a solution that can do without any of the
apply functions (yes, I know they are more R-like, but frankly, I don't get
the logic behind them, so for the time being I would prefer another way).

Thanks very much for your help

Bernhard



--
View this message in context:
http://r.789695.n4.nabble.com/Loop-over-several-variables-tp4648112.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] own function: computing time

2012-10-10 Thread Jan van der Laan



Did not see a simple way to make it faster. However, this is a piece of 
code which can be made to run much faster in C. See below.


I don't know if you are familiar with running c-code from R. If not, the 
official documentation is in the R Extensions manual. However, this is 
not the most easy documentation for a first read. If you want to use the 
c-code and have problems getting it running, let me/us know your 
operating system and I/we will try to walk you through it.


HTH,
Jan


=== c-code ===
void foo(double* m, int* pn, int* r) {
  int n = *pn;
  double* pm1 = m;
  double* pm2 = m + n;
  int* pr = r;
  for (int i = 0; i < n; ++i, ++pm1, ++pm2, ++pr) {
    *pr = 1;
    double* qm1 = m;
    double* qm2 = m + n;
    for (int j = 0; j < n; ++j, ++qm1, ++qm2) {
      if ((*qm1 > *pm1) && (*qm2 > *pm2)) {
        *pr = 0;
        break;
      }
    }
  }
}

=== r-code ===
dyn.load("rtest.so")

foo <- function(m) {
  n <- dim(m)[1]
  .C("foo",
     as.double(m),
     as.integer(n),
     r = logical(n))$r
}

x <- runif(32000)
y <- runif(32000)
xy <- cbind(x,y)

t1 <- system.time({
  outer <- function(z){
    !any(x > z[1] & y > z[2])
  }
  j <- apply(xy,1, outer)
})

t2 <- system.time({
  j2 <- foo(xy)
})

=== results ===
 all(j == j2)
[1] TRUE
 t1
   user  system elapsed
 35.462   0.028  35.549
 t2
   user  system elapsed
  0.008   0.000   0.008
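
As an aside, a pure-R alternative that avoids C altogether might be possible by sorting. The sketch below is a different technique from the code above, not a translation of it, and it assumes the test in outer() is x > z[1] & y > z[2] (a point is kept when no other point is strictly larger in both coordinates). The function name non_dominated is made up for the example.

```r
# Sketch: sort by x (descending) and track the running maximum of y,
# so each point only needs one comparison instead of n.
non_dominated <- function(x, y) {
  ord <- order(x, decreasing = TRUE)
  xs <- x[ord]
  ys <- y[ord]
  cm <- cummax(ys)                        # max y among points sorted so far
  first_in_group <- match(xs, xs)         # start of each block of equal x values
  # largest y among points with strictly larger x (-Inf if there are none)
  prev_max <- c(-Inf, cm)[first_in_group]
  res <- logical(length(x))
  res[ord] <- !(prev_max > ys)            # kept when nothing dominates the point
  res
}

# Compare with the brute-force version on a small sample
set.seed(1)
x <- runif(500)
y <- runif(500)
brute <- apply(cbind(x, y), 1, function(z) !any(x > z[1] & y > z[2]))
all(non_dominated(x, y) == brute)
```

The cost is dominated by the sort, so this should scale roughly as n log n rather than n^2; I have not timed it against the C version.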





On 10/10/2012 12:15 PM, tonja.krue...@web.de wrote:

Hi all,

I wrote a function that actually does what I want it to do, but it tends to be 
very slow for large amount of data. On my computer it takes 5.37 seconds for 
16000 data points and 21.95 seconds for 32000 data points. As my real data 
consists of 1800 data points it would take ages to use the function as it 
is now.
Could someone help me to speed up the calculation?

Thank you, Tonja

system.time({
x <- runif(32000)
y <- runif(32000)

xy <- cbind(x,y)

outer <- function(z){
!any(x > z[1] & y > z[2])}
j <- apply(xy,1, outer)

plot(x,y)
points(x[j],y[j],col="green")

})


Re: [R] ffbase, help with %in%

2012-10-02 Thread Jan van der Laan



It doesn't seem possible to index an ff-vector using a logical  
ff-vector. You can use subset (also in ffbase) or first convert 'a' to  
a normal logical vector:



library(ff)
library(ffbase)

data1 <- as.ffdf(data.frame(a = letters[1:10], b=1:10))
data2 <- as.ffdf(data.frame(a = letters[5:26], b=5:26))

a <- data1[[1]] %in% data2$a

subset(data1, a)
data1[a[], ]


HTH,
Jan




Lucas Chaparro lpchaparro...@gmail.com wrote:


Hello to everyone.
I'm trying to use %in% to match two vectors in ff format.
a <- as.ff(data[,1]) %in% fire$fecha


> a
ff (open) logical length=3653 (3653)

   [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]        [3646]
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE      :  FALSE
[3647] [3648] [3649] [3650] [3651] [3652] [3653]
 FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE


Here you see a part of the data:

data[1:20,]  (just a sample, data has 3653 obs)

fecha juliano altura  UTM.E   UTM.N
1  1990-07-01 182 15 248500 6239500
2  1990-07-02 183 15 248500 6239500
3  1990-07-03 184 15 248500 6239500
4  1990-07-04 185 15 248500 6239500
5  1990-07-05 186 15 248500 6239500
6  1990-07-06 187 15 248500 6239500
7  1990-07-07 188 15 248500 6239500
8  1990-07-08 189 15 248500 6239500
9  1990-07-09 190 15 248500 6239500
10 1990-07-10 191 15 248500 6239500
11 1990-07-11 192 15 248500 6239500
12 1990-07-12 193 15 248500 6239500
13 1990-07-13 194 15 248500 6239500
14 1990-07-14 195 15 248500 6239500
15 1990-07-15 196 15 248500 6239500
16 1990-07-16 197 15 248500 6239500
17 1990-07-17 198 15 248500 6239500
18 1990-07-18 199 15 248500 6239500
19 1990-07-19 200 15 248500 6239500
20 1990-07-20 201 15 248500 6239500


fire$fecha[1:20,]
 [1] 1984-11-08 1984-11-08 1984-11-09 1984-11-09 1984-11-09
 [6] 1984-11-10 1984-11-10 1984-11-11 1984-11-11 1984-11-11
[11] 1984-11-11 1984-11-11 1984-11-11 1984-11-12 1984-11-12
[16] 1984-11-13 1984-11-13 1984-11-13 1984-11-14 1984-11-14


to see if I got any match:


table.ff(a)


FALSE  TRUE
 1687  1966
Warning message:
In if (useNA == "no") c(NA, NaN) :
  the condition has length > 1 and only the first element will be used


In a regular data.frame I use data[a,] to extract the rows where a ==
TRUE, but when I do this in an ffdf I get this error:



data[a,]
Error: vmode(index) == "integer" is not TRUE



I'm just learning how to use the ff package so, obviously I'm  
missing something



If any of you knows how to solve this, please teach me.

Thank you so much.


Lucas.

[[alternative HTML version deleted]]



Re: [R] splitting a vector

2012-08-02 Thread Jan van der Laan


I come up with:

runs <- function(numbers) {
    tmp <- diff(c(0, which(diff(numbers) <= 0), length(numbers)))
    split(numbers, rep(seq_along(tmp), tmp))
}



Can't say it's elegant, but it seems to work


runs(c(1:3, 1:4))

$`1`
[1] 1 2 3

$`2`
[1] 1 2 3 4


runs(c(1,1,1))

$`1`
[1] 1

$`2`
[1] 1

$`3`
[1] 1


runs(c(1:3, 2:3, 3))

$`1`
[1] 1 2 3

$`2`
[1] 2 3

$`3`
[1] 3
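
A slightly shorter variant of the same idea, as a sketch: mark the start of each run and use cumsum() to turn those marks into group numbers. The name runs2 is made up here, and the result is returned without the `1`, `2`, ... names.

```r
runs2 <- function(numbers) {
  # TRUE at the first element and wherever a value is <= its predecessor
  starts <- c(TRUE, diff(numbers) <= 0)
  # cumsum() numbers the runs 1, 2, 3, ...; split() groups by that number
  unname(split(numbers, cumsum(starts)))
}

# The example from the original question: this should give five vectors
runs2(c(1, 1, 2, 3, 4, 5, 1, 2, 3, 2, 3, 4, 5, 6, 4, 5))
```

This assumes a non-empty input vector; an empty vector would need a separate check.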


HTH,

Jan



capy_bara hettl...@few.vu.nl wrote:


Hello,

I have a vector with positive integer numbers, e.g.


numbers <- c(1,2,1,2,3,4,5)


and want to split the vector whenever an element in the vector is smaller or
equal to its predecessor.
Hence I want to obtain two vectors: c(1,2) and c(1,2,3,4,5).
I tried with which(), but it is not so elegant:


numbers[1:(which(numbers<=numbers[1])[2]-1)]
numbers[which(numbers<=numbers[1])[2]:length(numbers)]


Sure I can do it with a for-loop, but that seems a bit tedious for that
small problem.
Does maybe anyone know a simple and elegant solution for this? I'm searching
for a general solution, since
my vector may change and maybe be split into more than two vectors, e.g.
give five vectors for c(1,1,2,3,4,5,1,2,3,2,3,4,5,6,4,5).

Many thanks in advance,

Hannes







--
View this message in context:  
http://r.789695.n4.nabble.com/splitting-a-vector-tp4638675.html

Sent from the R help mailing list archive at Nabble.com.


Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan



Having had a quick look at the source code for read.table.ffdf, I
suspect that using 'NULL' in the colClasses argument is not allowed.
Could you try to see if you can use read.table.ffdf while specifying
the colClasses for all columns (thereby reading in all columns in the
file)? If that works, you can be quite sure that the number of
columns is indeed constant in the file (sometimes a ' or an unquoted , can
mess things up).


Jan




threshold r.kozar...@gmail.com wrote:


Dear R users, I've just started using the ff package.

There is a csv file (~4Gb) with 7 columns and 6e+7 rows. I want to read only
one column from the file, skipping the first 100 rows.
Below I've provided different outcomes, which will clarify my problem.

sessionInfo()

R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
...

attached base packages:
[1] tools stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] ff_2.2-7  bit_1.1-8

##---
## I want to read the second column only:
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')

## The following command works fine:


read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
colClasses=x.class, nrows=1e3)

ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
993  -0.4865
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240

Then, when I extend nrows by 1, I get a warning about the number of columns:


read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
colClasses=x.class, nrows=1001)

ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240
1001 -0.3849
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 1 != length(data) = 7




Then, going much beyond 1000 brings problems:

read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
colClasses=x.class, nrows=1e4)

Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  more columns than column names

The question is: why? The number of columns does not change in the file...

I will appreciate any help.


Best, Robert




--
View this message in context:  
http://r.789695.n4.nabble.com/ff-package-reading-selected-columns-from-csv-tp4637794.html

Sent from the R help mailing list archive at Nabble.com.


Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan

Looking at the source code for read.table.ffdf what seems to happen is
that when reading the first block of data by read.table (standard 1000
lines) the specified colClasses are used. In subsequent calls the
types of the columns of the ffdf object are used as colClasses. In
your case the ffdf object had only one column. This probably causes
the error.

What you could try is to use the packages ffbase and LaF (untested):

library(ffbase)
library(LaF)

x.class <- c('character', 'numeric','character','character',
'character', 'character', 'character')
laf <- laf_open_csv(file=csvfile, header=FALSE,
skip=100, column_types=x.class)
yourdata <- laf_to_ffdf(laf, columns=2)

I specify column type 'character' as a type is needed. However, by
using the columns=2 argument only the second column is read.

It looks like you have a decent amount of memory, so you could also try

yourdata <- laf[,2]

to read the data in as a standard R vector.

HTH,

Jan

threshold r.kozar...@gmail.com wrote:

Dear Jan, thank you for your answer.
I am basically following the code I've been using with read.table, where
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')
has been working fine.

Reading all columns works for me but takes much longer than my time
constraints allow (460 such sets + time for processing). The number of columns
remains 7 over the whole data set.

Best, Robert

--
View this message in context:
http://r.789695.n4.nabble.com/ff-package-reading-selected-columns-from-csv-tp4637794p4637896.html

Sent from the R help mailing list archive at Nabble.com.

Re: [R] ff package: reading selected columns from csv

2012-07-26 Thread Jan van der Laan



You probably have a character (which is converted to factor) or factor  
column with a large number of distinct values. All the levels of a  
factor are stored in memory in ff.


Jan


threshold r.kozar...@gmail.com wrote:


*..plus I get the following message after reading the whole set (all 7
columns):*


read.csv.ffdf(file=csvfile, header=FALSE, skip=100, first.rows=1000,
next.rows=1e7, VERBOSE=TRUE)


read.table.ffdf 1..1000 (1000)  csv-read=0.02sec ffdf-write=0.08sec
read.table.ffdf 1001..10001000 (1000)  csv-read=282.16sec
ffdf-write=65.01sec
read.table.ffdf 10001001..20001000 (1000)  csv-read=240.3sec
ffdf-write=63.84sec
read.table.ffdf 20001001..30001000 (1000)  csv-read=213.78sec
ffdf-write=149.2sec
read.table.ffdf 30001001..40001000 (1000)  csv-read=217.36sec
ffdf-write=379.8sec
read.table.ffdf 40001001..50001000 (1000)  csv-read=541.28sec
Error: cannot allocate vector of size 381.5 Mb
In addition: There were 14 warnings (use warnings() to see them)

warnings()

Warning messages:
1: In match(levels(x), lev) :
  Reached total allocation of 7987Mb: see help(memory.size)
2: In match(levels(x), lev) :
  Reached total allocation of 7987Mb: see help(memory.size)



--
View this message in context:  
http://r.789695.n4.nabble.com/ff-package-reading-selected-columns-from-csv-tp4637794p4637900.html

Sent from the R help mailing list archive at Nabble.com.


Re: [R] complexity of operations in R

2012-07-20 Thread Jan van der Laan



See below for the complete mail to which I reply which was not sent to rhelp.


==

emptyexpandlist2<-list(ne=0,l=array(NA, dim=c(1, 1000L)),len=1000L)

addexpandlist2<-function(x,prev){
  if(prev$len==prev$ne){
    n2<-prev$len*2
    prev <- list(ne=prev$ne, l=array(prev$l, dim=c(1, n2)), len=n2)
  }
  prev$ne<-prev$ne+1
  prev$l[prev$ne]<-x
  return(prev)
}

compressexpandlist2<-function(prev){
  return(prev$l[seq.int(prev$ne)])
}

h3<-function(dotot){
  v<-emptyexpandlist2
  for(i in 1:dotot){
    v<-addexpandlist2(FALSE,v)
  }
  return(compressexpandlist2(v))
}

===


The problem with your addexpandlist2 is that R in principle works with
pass by value (certainly when you modify the objects you pass to your
function, as you do with prev). Therefore, when you pass your list to
addexpandlist2 it makes a copy of the entire list.


You can avoid that by using environments that are passed by reference.  
The code below shows an example of this. If you would like to  
implement something like that I would recommend using reference  
classes (see ?ReferenceClasses) . Personally I don't find the 'messy  
code' that messy. You get used to it.


myvector <- function(N = 1000) {
    data <- vector("list", N)
    n    <- 0

    # The closures below modify data, n and N in the enclosing
    # environment, hence the use of <<- instead of <-.
    append <- function(d) {
        n <<- n + 1
        if (n > N) {
            N <<- 2*N
            length(data) <<- N
        }
        data[[n]] <<- d
    }

    length <- function() {
        return(n)
    }

    get <- function() {
        return(data[seq_len(n)])
    }

    return(list(append=append, length=length, get=get))
}


h4 <- function(dotot){
    v <- myvector()
    for(i in seq_len(dotot)) {
        v$append(FALSE)
    }
    return(v$get())
}



system.time(h3(1E5))

   user  system elapsed
 22.846   0.536  23.407

system.time(h4(1E5))

   user  system elapsed
  0.700   0.000   0.702
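
For completeness, the reference class route mentioned above might look roughly like the sketch below. The class name GrowingList and the method names add and values are made up for this example, and I have not timed it against h4.

```r
library(methods)

# Sketch of a growable list as a reference class: the object is modified
# in place, so appending does not copy the whole list on every call.
GrowingList <- setRefClass("GrowingList",
  fields = list(data = "list", n = "numeric"),
  methods = list(
    initialize = function(capacity = 1000L, ...) {
      data <<- vector("list", capacity)
      n <<- 0
      callSuper(...)
    },
    add = function(d) {
      n <<- n + 1
      if (n > length(data)) {
        # double the capacity when the list is full
        length(data) <<- 2L * max(length(data), 1L)
      }
      data[[n]] <<- d
      invisible(.self)
    },
    values = function() {
      data[seq_len(n)]
    }
  )
)

v <- GrowingList$new(capacity = 2L)
for (i in 1:5) v$add(i * 10)
str(v$values())
```

Note that adding NULL with data[[n]] <<- d would remove an element rather than store it, so this sketch assumes non-NULL items.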


Jan




Johan Henriksson maho...@areta.org wrote:


On Thu, Jul 19, 2012 at 5:02 PM, Jan van der Laan rh...@eoos.dds.nl wrote:


Johan,

Your 'list' and 'array doubling' code can be written much more efficient.

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
  v <- list()
  for (i in seq_len(dotot)) {
    v[[i]] <- FALSE
  }
}



the reason for my highly convoluted code was to simulate a linked list - to
my knowledge a list() in R is not a linked list but a vector. I was
assuming that R would not copy the entire memory into each sublist but
rather keep a pointer. if this had worked, it would also be possible to
create fancier data structures like trees and heaps

http://en.wikipedia.org/wiki/Linked_list
http://en.wikipedia.org/wiki/Heap_%28data_structure%29





In the following line in your array doubling function

  prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))

you first create a new array: the second half of your new array. Then
rbind creates a new array and has to copy the contents of both into this
new array. The following routine is much faster and almost scales linearly
(see below):

h2 <- function(dotot) {
  n <- 1000L
  v <- array(NA, dim=c(1, n))
  for(i in seq_len(dotot)) {
    if (i > n) {
      n <- 2*n
      v <- array(v, dim=c(1, n))
    }
    v[, i] <- TRUE
  }
  return(v[, seq_len(i)])
}




that's blazingly fast! thanks! I've also learned some nice optimization
tricks, like L, and seq.int, and the resizing with array...

given that this works, as you see, it's rather messy to use on a day-to-day
basis. my goal was next to hide to this in a couple of convenient functions
(emptyexpandlist and addexpandlist) so that this can be reused without
cluttering otherwise fine code. but the overhead of the function call, and
the tuple, seems to kill off the major advantage. that said, I present
these convenience functions below:

==

emptyexpandlist2<-list(ne=0,l=array(NA, dim=c(1, 1000L)),len=1000L)

addexpandlist2<-function(x,prev){
  if(prev$len==prev$ne){
    n2<-prev$len*2
    prev <- list(ne=prev$ne, l=array(prev$l, dim=c(1, n2)), len=n2)
  }
  prev$ne<-prev$ne+1
  prev$l[prev$ne]<-x
  return(prev)
}

compressexpandlist2<-function(prev){
  return(prev$l[seq.int(prev$ne)])
}

h3<-function(dotot){
  v<-emptyexpandlist2
  for(i in 1:dotot){
    v<-addexpandlist2(FALSE,v)
  }
  return(compressexpandlist2(v))
}

===

I haven't checked the scaling but take it works as it should. the constant
factor is really bad though:

dotot=5

system.time(f(dotot))

   user  system elapsed
  5.250   0.020   5.279

system.time(h(dotot))

   user  system elapsed
  2.650   0.060   2.713

system.time(h2(dotot))

   user  system elapsed
  0.140   0.000   0.148

system.time(h3(dotot))

   user  system elapsed
  2.480   0.020   2.495

still better than without the optimization though, and pretty much as
readable.
moral of the story: it seems possible to write fast R, but the code won't
look pretty

thanks for the answers!








Storing the data column wise makes it easier to increase the size of the
array.

As a reference for the timing I use the following routine

Re: [R] complexity of operations in R

2012-07-19 Thread Jan van der Laan


Johan,

Your 'list' and 'array doubling' code can be written much more efficiently.

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
  v <- list()
  for (i in seq_len(dotot)) {
    v[[i]] <- FALSE
  }
}


In the following line in your array doubling function

  prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))

you first create a new array: the second half of your new array. Then 
rbind creates a new array and has to copy the contents of both into this 
new array. The following routine is much faster and almost scales 
linearly (see below):


h2 <- function(dotot) {
  n <- 1000L
  v <- array(NA, dim=c(1, n))
  for(i in seq_len(dotot)) {
    if (i > n) {
      n <- 2*n
      v <- array(v, dim=c(1, n))
    }
    v[, i] <- TRUE
  }
  return(v[, seq_len(i)])
}

Storing the data column wise makes it easier to increase the size of the 
array.


As a reference for the timing I use the following routine in which I 
assume I know the size of the end result.


ref <- function(dotot) {
  v <- array(NA, dim=c(1, dotot))
  for(i in seq_len(dotot)) {
    v[, i] <- FALSE
  }
  return(v)
}


Timing the different routines:

dotot <- c(10, 100, 200, 500, 1000, 2000, 5000, 10000,
20000, 50000, 100000)
times <- array(NA, dim=c(length(dotot), 5))

i <- 1
for (n in dotot) {
  cat(n, "\n")

  times[i,1] <- system.time(f(n))[3]
  #times[i,2] <- system.time(g(n))[3]
  times[i,2] <- system.time(g2(n))[3]
  times[i,3] <- system.time(h(n))[3]
  times[i,4] <- system.time(h2(n))[3]
  times[i,5] <- system.time(ref(n))[3]

  i <- i + 1
}


        [,1]   [,2]   [,3]  [,4]  [,5]
 [1,]  0.000  0.000  0.000 0.001 0.000
 [2,]  0.001  0.000  0.001 0.000 0.000
 [3,]  0.001  0.000  0.002 0.000 0.000
 [4,]  0.003  0.002  0.007 0.002 0.001
 [5,]  0.009  0.006  0.013 0.002 0.003
 [6,]  0.031  0.020  0.032 0.006 0.004
 [7,]  0.181  0.099  0.098 0.016 0.010
 [8,]  0.722  0.370  0.272 0.032 0.020
 [9,]  2.897  1.502  0.766 0.066 0.044
[10,] 18.681 11.770  4.465 0.162 0.103
[11,] 77.757 57.960 17.912 0.322 0.215


The speed of the array doubling function is comparable to the function 
where we know the size of the end result and scales almost linearly. 
(The code is a bit messy however)
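
A back-of-the-envelope count may help explain the difference. The sketch below assumes that every time the container grows, all existing elements are copied once (that is what the explicit array() and rbind() calls do; whether R's internal handling of v[[i]] <- x behaves the same is an assumption). The function names are made up for the example.

```r
# Elements copied when the capacity is doubled each time it is exhausted
copies_doubling <- function(n, start = 1000) {
  cap <- start
  total <- 0
  while (cap < n) {
    total <- total + cap   # the old contents are copied into the new array
    cap <- 2 * cap
  }
  total
}

# Elements copied when the container grows by one element per step:
# 1 + 2 + ... + (n - 1)
copies_one_by_one <- function(n) {
  n * (n - 1) / 2
}

copies_doubling(1e5)     # 127000
copies_one_by_one(1e5)   # 4999950000
```

With doubling the total number of copied elements stays below 2n, which is linear; growing one element at a time gives roughly n^2/2, which matches the quadratic timings of f in the table.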


Jan



On 07/17/2012 10:58 PM, Johan Henriksson wrote:

thanks for the link! I should read it through. that said, I didn't find any
good general solution to the problem so here I post some attempts for
general input. maybe someone knows how to speed this up. both my solutions
are theoretically O(n) for creating a list of n elements. The function to
improve is O(n^2) which should suck tremendously - but the slow execution
of R probably blows up the constant factor of the smarter solutions.

Array doubling comes close in speed for large lists but it would be great
if it could be comparable for smaller lists. One hidden cost I see directly
is that allocating a list in R is O(n), not O(1) (or close), since it
always fills it with values. Is there a way around this? I guess by using
C, one could just malloc() and leave the content undefined - but is there
no better way?

thanks,
/Johan



# the function we wish to improve

f<-function(dotot){
   v<-matrix(ncol=1,nrow=0)
   for(i in 1:dotot){
     v<-rbind(v,FALSE)
   }
   return(v)
}

##
# first attempt: linked lists

emptylist <- NA

addtolist <- function(x,prev){
   return(list(x,prev))
}

g<-function(dotot){
   v<-emptylist
   for(i in 1:dotot){
     v<-addtolist(FALSE,v)
   }
   return(v)
}


# second attempt: array doubling

emptyexpandlist<-list(nelem=0,l=matrix(ncol=1,nrow=0))

addexpandlist<-function(x,prev){
   if(nrow(prev$l)==prev$nelem){
     nextsize<-max(nrow(prev$l),1)
     prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))
   }
   prev$nelem<-prev$nelem+1
   prev$l[prev$nelem]<-x
   return(prev)
}

compressexpandlist<-function(prev){
   return(as.vector(prev$l[1:prev$nelem]))
}

h<-function(dotot){
   v<-emptyexpandlist
   for(i in 1:dotot){
     v<-addexpandlist(FALSE,v)
   }
   return(compressexpandlist(v))
}

#

dotot=10
system.time(f(dotot))
#system.time(g(dotot))
system.time(h(dotot))








On Tue, Jul 17, 2012 at 8:42 PM, Patrick Burns pbu...@pburns.seanet.com wrote:


Johan,

If you don't know 'The R Inferno', it might
help a little.  Circle 2 has an example of
how to efficiently (relatively speaking) grow
an object if you don't know the final length.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

If you gave a simple example of how your code
looks now and what you want it to do, then you
might get some ideas of how to improve it.


Pat


On 17/07/2012 12:47, Johan Henriksson wrote:


Hello!
I am optimizing my code in R and for this I need to know a bit more about
the internals. It would help tremendously if someone could link me to a
page with O()-complexities of all the operations.

In this

Re: [R] complexity of operations in R

2012-07-19 Thread Jan van der Laan


On 07/19/2012 05:50 PM, Hadley Wickham wrote:

On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
   v <- list()
   for (i in seq_len(dotot)) {
     v[[i]] <- FALSE
   }
}

Except that you don't need to pre-allocate lists...

I don't think I understand what you mean by that. Could you elaborate?

Jan


Re: [R] complexity of operations in R

2012-07-19 Thread Jan van der Laan


On 07/19/2012 06:11 PM, Bert Gunter wrote:

Hadley et. al:

Indeed. And using a loop is a poor way to do it anyway.

v <- as.list(rep(FALSE,dotot))

is way faster.

-- Bert


I agree that not using a loop is much faster, but I assume that the 
original question is about the situation where the size of the end 
result is not known. The values FALSE in the vector/list are just 
examples of results from a more complex computation. But indeed avoid 
loops where possible (and often this is possible).


Jan


On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham had...@rice.edu wrote:

On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

Johan,

Your 'list' and 'array doubling' code can be written much more efficient.

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
   v <- list()
   for (i in seq_len(dotot)) {
     v[[i]] <- FALSE
   }
}

Except that you don't need to pre-allocate lists...

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [R] complexity of operations in R

2012-07-19 Thread Jan van der Laan



When the length of the end result is not known, doubling the length of 
the list is also much faster than increasing the size of the list with 
single items.


f <- function(n, preallocate) {
    v <- if(preallocate) vector("list",n) else list() ;
    for(i in seq_len(n)) {
       v[[i]] <- i
    }
    v
}

g <- function(n) {
    N <- 1000
    v <- vector("list", N)
    for(i in seq_len(n)) {
       if (i > N) {
          N <- 2 * N
          length(v) <- N
       }
       v[[i]] <- i
    }
    v[1:i]
}


> system.time(f(5E4, TRUE))
   user  system elapsed
  0.968   0.000   0.975
> system.time(f(5E4, FALSE))
   user  system elapsed
 52.611   0.136  54.197
> system.time(g(5E4))
   user  system elapsed
  1.388   0.008   1.424


What causes these differences? I can imagine that the time needed for
memory allocations plays a role: multiple small allocations will be
slower than one large allocation. But that doesn't explain the
quadratic growth in time. I would expect that to be linear. When doing
v[[i]] <- i the list isn't copied, right?


Jan



On 07/19/2012 06:21 PM, William Dunlap wrote:

Preallocation of lists does speed things up.  The following shows
time quadratic in size when there is no preallocation and linear
growth when there is, for size in the c. 10^4 to 10^6 region:

> f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ;
+     for(i in seq_len(n)) v[[i]] <- i ; v }
> identical(f(17,pre=TRUE), f(17,pre=FALSE))
[1] TRUE
> system.time(f(n=1e4, preallocate=FALSE))
   user  system elapsed
  0.324   0.000   0.326
> system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time
   user  system elapsed
  1.316   0.012   1.329
> system.time(f(n=4e4, preallocate=FALSE)) # ditto
   user  system elapsed
  5.720   0.028   5.754

> system.time(f(n=1e4, preallocate=TRUE))
   user  system elapsed
  0.016   0.000   0.017
> system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time
   user  system elapsed
  0.032   0.004   0.036
> system.time(f(n=4e4, preallocate=TRUE)) # ditto
   user  system elapsed
  0.068   0.000   0.069

> system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time
   user  system elapsed
  0.688   0.000   0.688

Above 10^6 there is some superlinearity

> system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time
   user  system elapsed
 11.125   0.052  11.181

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bert Gunter
Sent: Thursday, July 19, 2012 9:11 AM
To: Hadley Wickham
Cc: r-help@r-project.org
Subject: Re: [R] complexity of operations in R

Hadley et. al:

Indeed. And using a loop is a poor way to do it anyway.

v <- as.list(rep(FALSE,dotot))

is way faster.

-- Bert

On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham had...@rice.edu wrote:

On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

Johan,

Your 'list' and 'array doubling' code can be written much more efficient.

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
   v <- list()
   for (i in seq_len(dotot)) {
     v[[i]] <- FALSE
   }
}


Except that you don't need to pre-allocate lists...

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/





--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
biostatistics/pdb-ncb-home.htm


Re: [R] PPM to BMP converter

2012-05-09 Thread Jan van der Laan



I don't know if any R-packages exist that can do this, but you could  
install imagemagick (http://www.imagemagick.org), which provides  
command line tools for image manipulation and conversion, and call  
these from R using system. Something like:


system("convert yourimage.ppm yourimage.bmp")

HTH,

Jan
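
A slightly more defensive sketch of the same idea. The wrapper name ppm_to_bmp and its arguments are hypothetical; it assumes ImageMagick is installed and that its command line tool is reachable as "convert" on the PATH (on Windows a system tool of the same name exists, so the full path to ImageMagick's executable may be needed there).

```r
# Hypothetical wrapper around ImageMagick's command line tool
ppm_to_bmp <- function(infile, outfile, convert = "convert") {
  if (!file.exists(infile)) stop("input file not found: ", infile)
  # Look up the executable; Sys.which returns "" when it is not found
  exe <- unname(Sys.which(convert))
  if (!nzchar(exe)) stop("ImageMagick '", convert, "' not found on the PATH")
  # system2 returns the exit status of the command
  status <- system2(exe, args = c(shQuote(infile), shQuote(outfile)))
  if (status != 0) stop("convert failed with status ", status)
  invisible(outfile)
}

# Usage (assumes yourimage.ppm exists in the working directory):
# ppm_to_bmp("yourimage.ppm", "yourimage.bmp")
```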



ZHANG Yingqi zhangyin...@ivpp.ac.cn schreef:


Dear all,
Several days ago, I posted "How to write a bmp file pixel by
pixel". Instead of bmp, I succeeded in writing a PPM file by using
the pixmap package. Thanks for the hint generously provided by Uwe
Ligges.
Now I have a new question. How to convert a PPM file to BMP
file in R? I know I can do this in Photoshop or with some other
software, but I think if I can do this in R, that would be great!
Would anyone please give me any hints? Just hints, I will dig it
out! Thanks a lot!



Yingqi


Yingqi ZHANG

Beijing P.O. Box 643, China 100044
Institute of Vertebrate Paleontology and Paleoanthropology (IVPP)
Chinese Academy of Sciences
Tel: +86-10-88369378 Fax: +86-10-68337001
Email: arvico...@gmail.com


Re: [R] how to deduplicate records, e.g. using melt() and cast()

2012-05-07 Thread Jan van der Laan


using reshape:

library(reshape)
m <- melt(my.df, id.var = "pathway", na.rm = TRUE)
cast(m, pathway ~ variable, sum, fill = NA)

Jan


On 05/07/2012 12:30 PM, Karl Brand wrote:

Dimitris, Petra,

Thank you! aggregate() is my lesson for today, not melt() | cast()

Really appreciate the super fast help,

Karl

On 07/05/12 12:09, Dimitris Rizopoulos wrote:

you could try aggregate(), e.g.,

my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))


aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)

or

sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(my.df[-1], my.df['pathway'], sum.)


I hope it helps.

Best,
Dimitris


On 5/7/2012 11:50 AM, Karl Brand wrote:

Esteemed UseRs,

This must be embarrassingly trivial to achieve with e.g., melt() and
cast(): deduplicating records (pw.X in example) for a given set of
responses (cond.Y in example).

Hopefully the runnable example shows clearly what i have and what i'm
trying to convert it to. But i'm just not getting it, ?cast that is! So
i'd really appreciate some ones patience to clarify this, using the
reshape package, or any other approach.

With sincere thanks in advance,

Karl


## Runnable example
## The data.frame i have:
library(reshape)
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
my.df
## The data frame i want:
wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
                        cond.one = c(0.5, 0.4, NA),
                        cond.two = c(0.6, 0.9, 0.2),
                        cond.three = c(NA, 0.1, NA))
wanted.df









Re: [R] Can't import this 4GB DATASET

2012-05-05 Thread Jan van der Laan



Perhaps you could contact the persons that supplied/created the file and 
ask them what the format of the file exactly is. That is probably the 
safest thing to do.


If you are sure that the lines containing only whitespace are 
meaningless, then you could alter the previous code to make a copy of 
the file containing only lines with a length equal to 97 characters (you 
can do this by changing the '!=' to '==').


Since all lines are then of equal length, I suspect you have a fixed width 
file. You could open and read this file using the LaF package 
(http://cran.r-project.org/web/packages/LaF/index.html; see the manual 
vignette for more information). In the package ffbase 
(http://cran.r-project.org/web/packages/ffbase/index.html) is a function 
to convert from LaF to ff (laf_to_ffdf). I do not know if packages such 
as RSQLite or bigmemory can import fixed width files.


The warning message indicates that the last line does not end with a new 
line character which could indicate an incomplete file but often doesn't 
mean anything. You could check the last line of the file to be sure.


HTH,

Jan
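
To illustrate what reading a fixed width file looks like, here is a small sketch using read.fwf from base R on a toy file. The column widths and the toy rows are invented for the example (the real layout of the 4GB file is not known from this thread), and read.fwf itself is not suited for files of that size; it only shows the principle that LaF applies to large files.

```r
# Build a small fixed width file: 5 + 1 + 10 + 1 + 20 + 9 = 46 characters per line
rows <- sprintf("%5d %10s %-20s%9.5f",
                c(10001L, 10001L),
                c("01/09/1986", "01/10/1986"),
                c("GREAT FALLS GAS CO", "GREAT FALLS GAS CO"),
                c(-5.75, -5.875))
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# Negative widths skip the single space between columns
dat <- read.fwf(tmp, widths = c(5, -1, 10, -1, 20, 9),
                col.names = c("PERMNO", "DATE", "COMNAM", "PRC"),
                strip.white = TRUE)
print(dat)
```

Because columns are identified by position, the spaces inside the company name no longer cause problems.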



On 05/05/2012 05:21 AM, iliketurtles wrote:

Your code works!

strangelines.txt was created, and it's a text file with just spacebars ...
Seems like a few thousand lines of complete blanks (not 1 non-blank entry).

One thing, when I ran your code there was an error message;


> setwd("C:/Users/admin/Desktop/hons/Thesis")
> con <- file("dataset.txt", "rt")
> out <- file("strangelines.txt", "wt")
> # skip first 5 lines
> lines <- readLines(con, n=5)
> # read the rest in blocks of 100.000 lines
> while (TRUE) {
+ lines <- readLines(con, n=1E5)
+ if (length(lines) == 0) break;
+ strangelines <- lines[nchar(lines) != 97]
+ writeLines(strangelines, con=out)
+ }
Warning message:
In readLines(con, n = 1e+05) : incomplete final line found on 'dataset.txt'




I'm really not sure where to go from here. This has gone way out of my
depth.

-


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: 
http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4610446.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Can't import this 4GB DATASET

2012-05-04 Thread Jan van der Laan



read.table imports the company name GREAT FALLS GAS CO as four  
separate columns. I think that needs to be one column. I can imagine  
that further one in your file you will have another company name that  
does not consist of four words which would cause the error you  
observed. From your output it seems that the columns are separated by  
spaces and not tabs (you would see \t in your output of readLines  
otherwise). As there are also spaces in your company names this makes  
it difficult to read the file in correctly.


Perhaps you have a fixed width file (columns are identified not by  
separator but by position in line) in which case all lines should have  
an equal length. You could check for this using the following code  
(not tested so could contain errors):


con <- file("dataset.txt", "rt")
# skip first 5 lines
lines <- readLines(con, n=5)
# initialize vector of line sizes (we'll have a growing vector which is not
# efficient, but we just want to have a quick scan of the file)
line_sizes <- c()
# read the rest in blocks of 100.000 lines
while (TRUE) {
  lines <- readLines(con, n=1E5)
  if (length(lines) == 0) break;
  line_sizes <- c(line_sizes, nchar(lines))
}
close(con)
# create a table of line sizes to check if they are all equal
table(line_sizes)



HTH,
Jan
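
A variant of the scan above that avoids the growing vector by keeping only a small table of counts per block. This is a sketch; the function name line_length_counts and its arguments are made up for illustration, and the demo file is a toy stand-in for dataset.txt.

```r
line_length_counts <- function(path, skip = 0, block = 1e5) {
  con <- file(path, "rt")
  on.exit(close(con))
  # skip the header lines
  if (skip > 0) invisible(readLines(con, n = skip))
  counts <- integer(0)
  repeat {
    lines <- readLines(con, n = block)
    if (length(lines) == 0) break
    # per block, keep only the number of lines for each line length
    tab <- table(nchar(lines))
    counts <- c(counts, setNames(as.vector(tab), names(tab)))
  }
  if (length(counts) == 0) return(counts)
  # add up the counts per line length over all blocks
  totals <- tapply(counts, as.integer(names(counts)), sum)
  setNames(as.vector(totals), names(totals))
}

# Toy file: 2 header lines, then lines of length 3, 3, 5 and 0
tmp <- tempfile(fileext = ".txt")
writeLines(c("h1", "h2", "abc", "abc", "hello", ""), tmp)
res <- line_length_counts(tmp, skip = 2, block = 2)
print(res)
```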



iliketurtles isaacm...@gmail.com schreef:


Dear Experienced R Practitioners,

I have 4GB .txt data called dataset.txt and have attempted to use *ff,
bigmemory, filehash and sqldf *packages to import it, but have had no
success. The readLines output of this data is:

readLines("dataset.txt", n=20)
 [1]  
 [2] 

 [3]  
 [4]   PERMNO  DATESHRCDCOMNAM
PRC   VOL
 [5] 
 [6]1000101/09/1986 11  GREAT FALLS GAS CO
-5.75000 14160
 [7]1000101/10/1986 11  GREAT FALLS GAS CO
-5.87500 0
 [8]1000101/13/1986 11  GREAT FALLS GAS CO
-5.87500  2805
 [9]1000101/14/1986 11  GREAT FALLS GAS CO
-5.87500  2070
[10]1000101/15/1986 11  GREAT FALLS GAS CO
-6.06250  6000
[11]1000101/16/1986 11  GREAT FALLS GAS CO
-6.25000  1500
[12]1000101/17/1986 11  GREAT FALLS GAS CO
-6.25000  7100
[13]1000101/20/1986 11  GREAT FALLS GAS CO
-6.31250  1700
[14]1000101/21/1986 11  GREAT FALLS GAS CO
-6.18750  4000
[15]1000101/22/1986 11  GREAT FALLS GAS CO
-6.18750  5200
[16]1000101/23/1986 11  GREAT FALLS GAS CO
-6.18750  4100
[17]1000101/24/1986 11  GREAT FALLS GAS CO
-6.18750  1500
[18]1000101/27/1986 11  GREAT FALLS GAS CO
-6.18750  4000
[19]1000101/28/1986 11  GREAT FALLS GAS CO
-6.12500  3500
[20]1000101/29/1986 11  GREAT FALLS GAS CO
-6.06250  4600

This data goes on for a huge number of rows (not sure exactly how many).
Each element in each row is separated by an uneven number of (what seem to
be) spaces (maybe TAB? not sure). Further, there are some rows that are
incomplete, i.e. there are missing elements.

Take the first 29 rows of dataset.txt into a separate data file, let's
call it dataset2.txt.  read.table("dataset2.txt",skip=5) gives the perfect
table that I want to end up with, except I want it with the 4GB data through
bigmemory, ff or filehash.

read.table('dataset2.txt',skip=5)
      V1         V2 V3    V4    V5  V6 V7      V8    V9
1  10001 01/09/1986 11 GREAT FALLS GAS CO -5.7500 14160
2  10001 01/10/1986 11 GREAT FALLS GAS CO -5.8750 0
3  10001 01/13/1986 11 GREAT FALLS GAS CO -5.8750  2805
4  10001 01/14/1986 11 GREAT FALLS GAS CO -5.8750  2070
5  10001 01/15/1986 11 GREAT FALLS GAS CO -6.0625  6000
6  10001 01/16/1986 11 GREAT FALLS GAS CO -6.2500  1500
7  10001 01/17/1986 11 GREAT FALLS GAS CO -6.2500  7100
8  10001 01/20/1986 11 GREAT FALLS GAS CO -6.3125  1700
9  10001 01/21/1986 11 GREAT FALLS GAS CO -6.1875  4000
10 10001 01/22/1986 11 GREAT FALLS GAS CO -6.1875  5200
11 10001 01/23/1986 11 GREAT FALLS GAS CO -6.1875  4100
12 10001 01/24/1986 11 GREAT FALLS GAS CO -6.1875  1500
13 10001 01/27/1986 11 GREAT FALLS GAS CO -6.1875  4000
14 10001 01/28/1986 11 GREAT FALLS GAS CO -6.1250  3500
15 10001 01/29/1986 11 GREAT FALLS GAS CO -6.0625  4600
16 10001 01/30/1986 11 GREAT FALLS GAS CO -6.0625  3830
17 10001 01/31/1986 11 GREAT FALLS GAS CO -6.1250   675
18 10001 02/03/1986 11 GREAT FALLS GAS CO -6.1250  2300
19 10001 02/04/1986 11 GREAT FALLS GAS CO -6.1250  4200
20 10001 02/05/1986 11 GREAT FALLS GAS CO -6.1250  1000
21 10001 02/06/1986 11 GREAT FALLS GAS CO -6.1250  4200
22 10001 02/07/1986 11 GREAT FALLS GAS CO -6.1250  1800
23 10001 02/10/1986 11 GREAT FALLS GAS CO -6.1875   100
24 10001 02/11/1986 11 GREAT FALLS GAS CO -6.3125  1500
25 10001 02/12/1986 11 GREAT FALLS GAS CO -6.2500  2500
26 10001 02/13/1986 11 GREAT FALLS GAS CO -6.2500  1000
27 10001 02/14/1986 11 GREAT FALLS

Re: [R] read.table() vs read.delim() any difference??

2012-05-04 Thread Jan van der Laan



read.delim calls read.table so any differences between the two are  
caused by differences in the default values of some of the parameters.  
Take a look at the help file ?read.table


read.table uses white space as separator; read.delim uses tabs
read.table uses " and ' as quotes; read.delim just "
etc.

Jan
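
The differing defaults can be listed directly from the two functions. A small sketch using formals(); the selection of arguments shown is my own choice, see ?read.table for the full list.

```r
# Arguments whose defaults differ between read.table and read.delim
args <- c("header", "sep", "quote", "fill", "comment.char")

cmp <- data.frame(
  argument   = args,
  read.table = vapply(formals(read.table)[args], deparse, character(1)),
  read.delim = vapply(formals(read.delim)[args], deparse, character(1)),
  row.names  = NULL
)
print(cmp)
```

A ' in the data (for example in a name) is therefore treated as a quote character by read.table but not by read.delim, which is one possible reason for rows going missing.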


Rameswara Sashi Kiran Challa scha...@umail.iu.edu schreef:


Hi,

I have a tab separated file with 206 rows and 30 columns.

I read in the file into R using read.table() function. I checked the dim()
of the data frame created in R, it had only 103 rows (exactly half), 30
columns. Then I tried reading in the file using read.delim() function and
this time the dim() showed to be 206 rows, 30 columns as expected.
Reading the read.table() R-help documentation, I came across count.fields()
function. On using that on the tab separated file, I got to learn that the
header line alone has 30 fields and rest of the rows have 9 fields. I am
now just wondering why read.delim() function was able to read in the file
correctly and read.table() wasn't able to read the file completely ?

Could anyone please throw some light on this?

Thanks for your valuable time,

Regards
Sashi


Re: [R] Equality of multiple vectors

2012-05-04 Thread Jan van der Laan


or

identical(vec1, vec2) && identical(vec2, vec3)

Jan
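
For an arbitrary number of vectors the pairwise comparison can be wrapped in a small helper. A sketch; the name all_identical is made up for illustration.

```r
# TRUE if every argument is identical to the first one
all_identical <- function(...) {
  x <- list(...)
  all(vapply(x[-1], identical, logical(1), x[[1]]))
}

vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(1, 2, 3, 4, 5)
vec3 <- c(1, 2, 3, 4, 4)

print(all_identical(vec1, vec2, vec3))
print(all_identical(vec1, vec2, vec1))
```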



Petr Savicky savi...@cs.cas.cz schreef:


On Fri, May 04, 2012 at 12:53:12AM -0700, aaurouss wrote:

Hello,

I'm writing a piece of code where I need to compare multiple same length
vectors.

I've gone through the basic functions like identical() or all(), but they
only work for comparing 2 vectors. From 3 vectors on, it doesn't work .

Example: Assuming
vec1 <- c(1,2,3,4,5)
vec2 <- c(1,2,3,4,5)
vec3 <- c(1,2,3,4,4)

identical (vec1,vec2,vec3) returns TRUE, since the 2 first vectors are
equal. I need a function that returns FALSE if one of the vectors is
different.


Hi.

Try the following.

  length(unique(list(vec1, vec2, vec3))) == 1

  [1] FALSE

  length(unique(list(vec1, vec2, vec1))) == 1

  [1] TRUE

Hope this helps.

Petr Savicky.


Re: [R] Can't import this 4GB DATASET

2012-05-04 Thread Jan van der Laan



OK, not all, but most lines have the same length. Perhaps you could  
write the lines with a different line size to a separate file to have  
a closer look at those lines. Modifying the previous code (again not  
tested):


con <- file("dataset.txt", "rt")
out <- file("strangelines.txt", "wt")
# skip first 5 lines
lines <- readLines(con, n=5)
# read the rest in blocks of 100.000 lines
while (TRUE) {
  lines <- readLines(con, n=1E5)
  if (length(lines) == 0) break;
  strangelines <- lines[nchar(lines) != 97]
  writeLines(strangelines, con=out)
}
close(con)
close(out)

Jan



Quoting iliketurtles isaacm...@gmail.com:


Jan, thank you.


table(line_sizes)

line_sizes
       0        1       97      256
    1430     2860 46869069     1430

-


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context:   
http://r.789695.n4.nabble.com/Can-t-import-this-4GB-DATASET-tp4607862p4608172.html

Sent from the R help mailing list archive at Nabble.com.


Re: [R] Handling 8GB .txt file in R?

2012-03-25 Thread Jan van der Laan



What you could try to do is skip the first 5 lines. After that the file 
seems to be 'normal'. With read.table.ffdf you could try something like


# open a connection to the file
con <- file('yourfile', 'rt')
# skip first 5 lines
tmp <- readLines(con, n=5)
# read the remainder using read.table.ffdf
ffdf <- read.table.ffdf(file=con)
# close connection
close(con)

HTH

Jan
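
The same pattern can be tried out on a small file with plain read.table, which also continues reading from the current position of an open connection. A sketch with a toy file; the file contents are invented to mimic the header block shown in the quoted readLines output.

```r
# Toy file: 5 header lines (3 empty, column names, 1 empty) followed by data
tmp <- tempfile(fileext = ".txt")
writeLines(c("", "", "", "PERMNO DATE PRC", "",
             "1 06/01/1986 -2.5625",
             "1 07/01/1986 -2.5"), tmp)

con <- file(tmp, "rt")
# skip first 5 lines
invisible(readLines(con, n = 5))
# read.table picks up where readLines stopped
dat <- read.table(con, col.names = c("PERMNO", "DATE", "PRC"))
close(con)
print(dat)
```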

On 03/25/2012 06:20 AM, iliketurtles wrote:

Thanks to all the suggestions. To the first individual that replied, I can't
do any stuff with unix or perl. All I know is R.

@KEN:
I'm using Windows 7, 64 bit.

@Steve:
Here's the readLines output.. As we can see, lines 1-3 are empty and line 5
is empty, and there's also empty elements after line 5!.

  [1]  
   [2] 

   [3]  
   [4]   PERMNO  DATETICKERPERMCO PRC
VOLNUMTRDvwretdewretd
   [5] 
   [6]106/01/19867952  .
. . -0.000138  0.001926
   [7]107/01/1986OMFGA   7952-2.56250
1000 .  0.013809  0.011061
   [8]108/01/1986OMFGA   7952-2.5
12800 . -0.020744 -0.005117
   [9]109/01/1986OMFGA   7952-2.5
1400 . -0.011219 -0.011588
  [10]110/01/1986OMFGA   7952-2.5
8500 .  0.83  0.003651
  [11]113/01/1986OMFGA   7952-2.62500
5450 .  0.002749  0.002433

-


Isaac
Research Assistant
Quantitative Finance Faculty, UTS
--
View this message in context: 
http://r.789695.n4.nabble.com/Handling-8GB-txt-file-in-R-tp4500971p4502706.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Reading big files in chunks-ff package

2012-03-25 Thread Jan van der Laan

Your question is not completely clear. read.csv.ffdf automatically
reads in the data in chunks. You don't have to do anything for that. You
can specify the size of the chunks using the next.rows option.

Jan

On 03/24/2012 09:29 PM, Mav wrote:

Hello!
A question about reading large CSV files

I need to analyse several files with sizes larger than 3 GB. Those files
have more than 10million rows (and up to 25 million) and 9 columns. Since I
don't have a large RAM memory, I think that the ff package can really help
me. I am trying to use read.csv.ffdf but I have some questions:

How can I read the files in several chunks…with an automatic way of
calculating the number of rows to include in each chunk? (my problem is that
the files have different number of rows)

For instance... I have used
read.csv.ffdf(NULL, "file.csv", sep="|", dec=".", header = T, row.names =
NULL, colClasses = c(rep("integer", 3), rep("integer", 10), rep("integer",
6)))
But this way I am reading the whole file... I would prefer to read it
in chunks... but I don't know how to read it in chunks

I have read the ff documentation but I am not good with R!

Thanks in advance!

--
View this message in context:
http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4502070.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] Reading big files in chunks-ff package

2012-03-25 Thread Jan van der Laan

The 'normal' way of doing that with ff is to first convert your csv file
completely to an ffdf object (which stores its data on disk so shouldn't
give any memory problems). You can then use the chunk routine (see ?chunk)
to divide your data in the required chunks.

Untested so may contain errors:

yourffdf <- read.table.ffdf(...)

chnks <- chunk(from=1, to=nrow(yourffdf), by=5E6, method='seq')

for (chnk in chnks) {
  # read data
  data <- yourffdf[chnk, ]
  # do your thing with the data
  # clean up
  rm(data)
  gc()
}

If you want to process your csv file directly in chunks, you could also
have a look at the LaF package, especially the process_blocks routine,
which does exactly that. The manual vignette
(http://cran.r-project.org/web/packages/LaF/vignettes/LaF-manual.pdf)
contains some examples of how to do that.

Jan
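
For readers who want to see the read, analyse, continue cycle without ff or LaF, here is a base R sketch of the same idea on a toy csv file. The function name process_in_chunks is made up, the header is split naively on commas (so it assumes unquoted column names), and this is not how ff or LaF work internally.

```r
# Read a csv file block by block and apply fun to each block
process_in_chunks <- function(path, chunk_rows, fun) {
  con <- file(path, "rt")
  on.exit(close(con))
  # first line holds the column names
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  results <- list()
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    block <- read.csv(text = lines, header = FALSE, col.names = header)
    # only the (small) result of the analysis is kept in memory
    results[[length(results) + 1]] <- fun(block)
  }
  results
}

# Toy file with 10 rows, processed in chunks of 4 rows
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = 10:1), tmp, row.names = FALSE, quote = FALSE)
res <- process_in_chunks(tmp, 4, function(block) sum(block$x))
print(unlist(res))
```

Inside fun one could just as well filter the block and append the result to an output csv, which matches the workflow described in the question.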

Quoting Mav mastorvar...@gmail.com:

Thank you Jan

My problem is the following:
For instance, I have 2 files with different number of rows (15 million and 8
million of rows each).
I would like to read the first one in chunks of 5 million each. However
between the first and second chunk, I would like to analyze those first 5
million of rows, write the analysis in a new csv and then proceed to read
and analyze the second chunk and so on until the third chunk. With the
second file, I would like to do the same...read the first chunk, analyze it
and continue to read the second and analyze it.

Basically my problem is that I manage to read the files... but with so many
rows...I cannot do any analyses (even filtering the rows) because of the RAM
restrictions.

Sorry if is still not clear.

Thank you

--
View this message in context:
http://r.789695.n4.nabble.com/Reading-big-files-in-chunks-ff-package-tp4502070p4503642.html

Sent from the R help mailing list archive at Nabble.com.

Re: [R] check for data in a data.frame and return correspondent number

2012-03-14 Thread Jan van der Laan


Marianna,

You can use merge for that (or match). Using merge:

MyData <- data.frame(
  V1 = c("red-j", "red-j", "red-j", "red-j", "red-j", "red-j"),
  V4 = c(10.5032, 9.3749, 10.2167, 10.8200, 9.2831, 8.2838),
  redNew = c("appearance blood-n", "appearance ground-n",
    "appearance sea-n", "appearance sky-n", "area chicken-n",
    "area color-n")
)

MyVector <- data.frame(
  V1 = c("appearance blood-n", "appearance ground-n",
    "appearance sea-n", "as_adj_as fire-n", "as_adj_as carrot-n",
    "appearance sky-n", "area chicken-n", "area color-n")
)


merge(MyVector, MyData[, c("V4", "redNew")], by.x = "V1", by.y = "redNew",
  all.x = TRUE)



Btw I saw some spaces in some of your strings (I have removed these in
the example above). Be aware that the character string "  appearance
ground-n" is not equal to "appearance ground-n".


HTH
Jan
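
The match alternative mentioned above could look like the following sketch. It keeps the order of the lookup vector and fills in 0 for entries that are not found, as asked in the question; the variable names keys, idx and result are my own.

```r
MyData <- data.frame(
  V4 = c(10.5032, 9.3749, 10.2167, 10.8200, 9.2831, 8.2838),
  redNew = c("appearance blood-n", "appearance ground-n",
    "appearance sea-n", "appearance sky-n", "area chicken-n",
    "area color-n"),
  stringsAsFactors = FALSE
)

keys <- c("appearance blood-n", "appearance ground-n",
  "appearance sea-n", "as_adj_as fire-n", "as_adj_as carrot-n",
  "appearance sky-n", "area chicken-n", "area color-n")

# position of each key in MyData$redNew, NA if not present
idx <- match(keys, MyData$redNew)

result <- data.frame(V1 = keys,
                     V4 = ifelse(is.na(idx), 0, MyData$V4[idx]))
print(result)
```

Unlike merge, which sorts on the key column by default, this keeps the rows in the order of the lookup vector.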





On 03/14/2012 06:49 PM, mari681 wrote:

Dear R-ers,

still the newbie. With a question about coordinates of a vector appearing or
not in a data.frame.
I have a data.frame (MyData) with 3 columns which looks like this:

    V1       V4               redNew
 red-j  10.5032   appearance blood-n
 red-j   9.3749  appearance ground-n
 red-j  10.2167     appearance sea-n
 red-j  10.8200     appearance sky-n
 red-j   9.2831       area chicken-n
 red-j   8.2838         area color-n

and a MyVector  which includes also (but not only) the data in the 3rd
column:

   appearance blood-n
   appearance ground-n
   appearance sea-n
   as_adj_as fire-n
  as_adj_as carrot-n
  appearance sky-n
 area chicken-n
 area color-n

I would like to get a data.frame of 2 columns where in the first column
there is all MyVector, and in the second column  there is either the
correspondent number found in MyData (shown in column 2) or a 0 if the
entrance is not found.

I've tried some options, among which a loop:

out <- for(x in MyVector) if (x %in% MyData) print (MyData[,2])

but obviously doesn't work.
How can I select the correspondent element on column 2 for each x found in
column 3?

Suggestions in general?
Thank you for consideration!!!

Have a nice day,
Marianna


--
View this message in context: 
http://r.789695.n4.nabble.com/check-for-data-in-a-data-frame-and-return-correspondent-number-tp4472634p4472634.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?

2012-03-09 Thread Jan van der Laan



You could also have a look at the LaF package which is written to  
handle large text files:


http://cran.r-project.org/web/packages/LaF/index.html

Under the vignettes you'll find a manual.

Note: LaF does not help you to fit 9GB of data in 4GB of memory, but  
it could help you reading your file block by block and filtering it.


Jan






RHelpPlease rrum...@trghcsolutions.com schreef:


Hi Barry,

You could do a similar thing in R by opening a text connection to
your file and reading one line at a time, writing the modified or
selected lines to a new file.

Great!  I'm aware of this existing, but don't know the commands for R.  I
have a variable [560,1] to use to pare down the incoming large data set (I'm
sure of millions of rows).  With other data sets they've been small enough
where I've been able to use the merge function after data has been read in.
Obviously I'm having trouble reading in this large data set in in the first
place.

Any additional help would be great!


--
View this message in context:  
http://r.789695.n4.nabble.com/Reading-in-9-6GB-DAT-File-OK-with-64-bit-R-tp4457220p4458074.html

Sent from the R help mailing list archive at Nabble.com.


Re: [R] Novice Alert!: odfWeave help!

2012-03-08 Thread Jan van der Laan



Step by step:

1. Create a new document in Open/LibreOffice
2. Copy/paste the following text into the document (as an example)

<<helloworld>>=
cat("Hello, world")
@

3. Save the file (e.g. hello.odt)
4. Start R (if not already running); it shouldn't matter whether it's plain R or RStudio
5. Change the working directory to the folder in which your odt-document resides

setwd("/path/to/your/file")

6. Load odfWeave

library(odfWeave)

7. odfWeave your document. All code-chunks are taken from your
document, executed in R and the output of the R-commands is inserted
into the resulting odt-document.


odfWeave("hello.odt", "hello_out.odt")

You can now open hello_out.odt (or whatever you named it) and see  
the resulting output.



HTH,

Jan







metatarsals sjcast...@gmail.com schreef:


Hello world,
I'm pretty new to computer code: for example, I consider it a small
victory that I (all by myself!) managed to ssh into the server at my
lab from home and copy a file onto my desktop. Be gentle. I have
primarily used R for running some pretty mid-level statistics
(creating distance matrices, manipulating graphs for pretty figures,
etc).

I'm working through Bolker's Ecological Models and Data in R (which is
a great book for ecologists/life sciences types who want to learn how
to just barely get by in R, with no previous knowledge of R code
presupposed). My advisor wants me to explore odfWeave to stream-line
my notes. This is important because I will inevitably be his TA in his
R stat course, and I will need to be proficient with the software. So
far I have been copy-pasting my codes into a word processor (both open
office and word) and inserting my plots after saving them.

I do not understand how to use odfWeave. The way it was explained to
me initially sounded like it was some kind of Open Office add-on I
could install and my chunks of code would be automatically translated.
Six hours of research later, I realize this is not the case, and that
I need outside help. I'm on a Mac OSx 10.7.3 Lion, I normally use
RStudios, but I have R and R64 and I operate at about, oh, let's say
the level of a 2- or 3-year-old does with language and walking.

So, what exactly does odfWeave do? Do I stick my chunks of code (I
know I need to use <<>>= to start and @ to end to bracket off the
sections of code) in the .odf document, then do the file.in/file.out
commands, which then reads the code and pops out a pretty little graph
to my specified parameters? Or do I use the file.in/file.out commands
to paste code I've created in R into an existing .odf doc?

Any baby steps or example code you could give me would warm my little heart.

If the first scenario (write the code into an .odf document, set off
as mentioned above, and then tell R to do stuff to it) is the
scenario, I'd be happy to send an example.

Thanks! I can offer a cute picture of a cat as payment, if desired!


--
View this message in context:  
http://r.789695.n4.nabble.com/Novice-Alert-odfWeave-help-tp4455481p4455481.html

Sent from the R help mailing list archive at Nabble.com.

Re: [R] Week number from a date

2012-02-22 Thread Jan van der Laan



The suggestion below gives you week numbers with week 1 being the week
containing the first Monday of the year and weeks going from Monday to
Sunday. There are other conventions. The ISO convention is that week 1
is the first week containing at least 4 days in the new year (week 1
of 2012 starts on January 2nd 2012; week 1 of 2009 starts on December 29th
2008).


http://www.r-bloggers.com/iso-week/

gives a function for that type of week numbers (not tested by me).

Jan
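
The two conventions can be compared side by side with format(). A sketch, assuming the "%V" conversion (ISO 8601 week, documented in ?strptime for output) is supported on your platform; the example dates are taken from this thread.

```r
dates <- as.Date(c("2008-10-01", "2008-12-01", "2008-12-29", "2012-01-01"))

weeks <- data.frame(
  date     = dates,
  # week of the year with the first Monday starting week 1
  week_W   = format(dates, "%W"),
  # ISO 8601 week number
  iso_week = format(dates, "%V"),
  stringsAsFactors = FALSE
)
print(weeks)
```

The third date, December 29th 2008, is where the two conventions visibly diverge: under the ISO convention it already belongs to week 1 of 2009.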



Patrick Breheny patrick.breh...@uky.edu schreef:

To give a little more detail, you can convert your character strings  
into POSIX objects, then extract from it virtually anything you  
would want using strftime.  In particular, %W is how you get the  
week number:



dateRange <- c("2008-10-01","2008-12-01")
x <- as.POSIXlt(dateRange)
strftime(x,format="%W")

[1] "39" "48"

--Patrick

On 02/22/2012 08:37 AM, Ingmar Visser wrote:

?strptime is a good place to start
hth, Ingmar

On Wed, Feb 22, 2012 at 2:09 PM, arunkumarakpbond...@gmail.com  wrote:


Hi

My data looks like this

startDate= "2008-06-01"

dateRange =c( "2008-10-01","2008-12-01")
Is there any method to find the week number from the startDate range



Re: [R] Problems reading tab-delim files using read.table and read.delim

2012-02-08 Thread Jan van der Laan

I don't know if this completely solves your problem, but here are some
arguments to read.table/read.delim you might try:

row.names=NULL
fill=TRUE
The details section also suggests using the colClasses argument as the
number of columns is determined from the first 5 rows which may not be
correct.

HTH

Jan
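
A sketch of how this could be applied to a file whose header has fewer fields than its data rows. The toy file below is invented (it is not the poster's Sheet1.txt), and the column names V1 to V5 are placeholders. count.fields is used to find the widest row so that col.names can fix the number of columns.

```r
# Toy tab-delimited file: 3 header fields, 5 fields in the data rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("c1\tc2\tc3",
             "123\t\t213\t\tx",
             "234\tasd\t\t\ty"), tmp)

# number of fields on each line
n <- count.fields(tmp, sep = "\t")
print(n)

# read everything as data, with as many columns as the widest row;
# fill = TRUE pads the shorter rows
dat <- read.table(tmp, header = FALSE, sep = "\t", fill = TRUE,
                  col.names = paste0("V", seq_len(max(n))),
                  stringsAsFactors = FALSE)
print(dat)
```

The first row of dat then holds the original header, which can be used to rename the first columns afterwards.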

mails mails00...@gmail.com schreef:

Hello,

I used read.xlsx to read in Excel files but for large files it turned out to
be not very efficient.
For that reason I use a programme which writes each sheet in an Excel file
into tab-delim txt files.
After that I tried using read.table and read.delim to read in those txt
files. Unfortunately, the results
are not as expected. To show you what I mean I created a tiny Excel sheet
with some rows and columns and
read it in using read.xlsx. I also used my script to write that sheet to a
tab-delim txt file and read that one in with
read.table and read.delim. Here is the R output:

(test <- read.table("Sheet1.txt", header=TRUE, sep="\t"))

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 5 elements

(test <- read.delim("Sheet1.txt", header=TRUE, sep="\t"))

c1 c2 c3 X
123 213 NA NA NA
234 asd NA NA NA

(test <- read.xlsx(file.path(data), Sheet1))

c1 c2 c3 NA. NA..1 NA..2
1 123 NA 213NA NA
2 234 asd NA NA

The last output is how I would expect the file to be read in. Columns 4 to
6 do not have any header rows. In R1C4 I added some white spaces, as well as
in R2C5 and R2C6, which are read in correctly by the read.xlsx function.

read.table and read.delim seem not to be able to handle such files. Is there
any workaround for that?

Cheers


Re: [R] Not generating line chart

2012-01-19 Thread Jan van der Laan


Devarayalu,

This is FAQ 7.22:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f

use print(qplot())

Regards,
Jan


Sri krishna Devarayalu Balanagu balanagudevaray...@gvkbio.com wrote:


Hi All,


Can you please help me understand why this code is not generating a line chart?



library(ggplot2)
par(mfrow=c(1,3))

#qplot(TIME1, BASCHGA, data=Orange1, geom= c("point", "line"), colour= ACTTRT)

unique(Orange1$REFID) -> refid
for (i in refid)
{
Orange2 <- Orange1[i == Orange1$REFID, ]
pdf('PGA.pdf')
qplot(TIME1, BASCHGA, data=Orange2, geom= c("line"), colour= ACTTRT)
dev.off()
}
Regards,
Devarayalu


Re: [R] Not generating line chart

2012-01-19 Thread Jan van der Laan


Devarayalu,

Please reply to the list.

And it would have been easier if you had output your data using
dput (in your case dput(Orange1)) so that I and other r-help members
can just copy the data into R. Not everybody has Excel available (I,
for example, haven't). The easier you make it for people to look into
your problem, the higher the probability that you will get a useful
answer. In your case your data is quite small, so using dput is no
problem.


To answer your question: except for the probable error

refid <- unique(Orange2$REFID)

which should probably be

refid <- unique(Orange1$REFID)

and the fact that you overwrite your file in the loop, I have no problem
generating the graphs. On my system the following code runs and
generates two graphs:



library(ggplot2)

Orange1 <- structure(list(REFID = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 9L,
9L, 9L, 9L), ARM = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 2L), SUBARM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), ACTTRT = structure(c(3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L,
1L, 2L, 2L), .Label = c("ABC", "DEF", "LCD", "Vehicle"), class = "factor"),
TIME1 = c(0L, 2L, 6L, 12L, 0L, 2L, 6L, 12L, 0L, 12L, 0L,
12L), ENDPOINT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "PGA", class = "factor"), BASCHGA = c(0L,
-39L, -47L, -31L, 0L, -34L, -25L, -12L, 0L, -30L, 0L, -40L
), STATANAL = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "UNK", class = "factor"), X = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("",
"Dansinger_2010_20687812"), class = "factor")), .Names = c("REFID",
"ARM", "SUBARM", "ACTTRT", "TIME1", "ENDPOINT", "BASCHGA", "STATANAL",
"X"), class = "data.frame", row.names = c(NA, -12L))

refid <- unique(Orange1$REFID)
for (i in refid)
{
  Orange2 <- Orange1[i == Orange1$REFID, ]
  pdf(paste('PGA', i, '.pdf', sep=''))
  print(qplot(TIME1, BASCHGA, data=Orange2, geom= c("line"), colour= ACTTRT))
  dev.off()
}



Regards,
Jan



Sri krishna Devarayalu Balanagu balanagudevaray...@gvkbio.com wrote:


Jan

Thank you, for your valuable reply. But...

Sorry, I am still not getting the chart when using print() with the
following modified code. I am also attaching the raw data file.



par(mfrow=c(1,3))

#qplot(TIME1, BASCHGA, data=Orange1, geom= c("point", "line"), colour= ACTTRT)

unique(Orange1$REFID) -> refid
for (i in refid)
{
Orange2 <- Orange1[i == Orange1$REFID, ]
pdf('PGA.pdf')
print(qplot(TIME1, BASCHGA, data=Orange2, geom= c("line"), colour= ACTTRT))
dev.off()
}
Regards
Devarayalu






Re: [R] Not generating line chart

2012-01-19 Thread Jan van der Laan

As I mentioned in my previous reply: do not email only me
personally, but also include the mailing list. This gives other members
the opportunity to answer your question and lets members who might
have a similar question see the answer.


As for your first question: put the pdf(...) and dev.off() outside of
the loop. I am not a ggplot2 expert, but you could also have a look
at the facets option of qplot.


As for your second question: have a look at
levels(Orange1$ACTTRT)
and
?factor
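
Both hints can be sketched together as follows. This is only an illustration: it uses base graphics instead of qplot so that it runs without extra packages, the data frame dat is a reduced copy of a few rows of Orange1, and the names dat, part, trts and out are introduced here. With ggplot2 the plotting lines inside the loop would be replaced by a print(qplot(...)) call.

```r
# Reduced copy of a few rows of Orange1 (first and last time point only)
dat <- data.frame(
  REFID   = c(7, 7, 7, 7, 9, 9, 9, 9),
  ACTTRT  = factor(c("LCD", "LCD", "Vehicle", "Vehicle",
                     "ABC", "ABC", "DEF", "DEF")),
  TIME1   = c(0, 12, 0, 12, 0, 12, 0, 12),
  BASCHGA = c(0, -31, 0, -12, 0, -30, 0, -40)
)

out <- file.path(tempdir(), "PGA.pdf")
pdf(out)                                  # open the device once, before the loop
for (i in unique(dat$REFID)) {
  # droplevels() keeps only the treatments that occur for this REFID
  part <- droplevels(dat[dat$REFID == i, ])
  trts <- levels(part$ACTTRT)
  plot(part$TIME1, part$BASCHGA, type = "n",
       xlab = "TIME1", ylab = "BASCHGA", main = paste("REFID", i))
  for (k in seq_along(trts)) {
    s <- part[part$ACTTRT == trts[k], ]
    lines(s$TIME1, s$BASCHGA, col = k)
  }
  legend("bottomleft", legend = trts, col = seq_along(trts), lty = 1)
}
dev.off()                                 # close it once, after the loop
```

Each pass through the loop starts a new page in the same file, and each legend lists only the two treatments of that REFID.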

Regards,
Jan


Sri krishna Devarayalu Balanagu balanagudevaray...@gvkbio.com wrote:


Jan,

Thank you very much for the solution given. I still have one
more question.


I want both graphs in a single pdf, and the legend should contain only
the ACTTRT values of the individual REFID (only two lines in the legend).

Can you solve it?

Devarayalu



Re: [R] Generating input population for microsimulation

2011-12-14 Thread Jan van der Laan


Emma,

If, as you say, each unit is the same you can just repeat the units to  
obtain the required number of units. For example,



  unit_size <- 10
  n_units <- 10

  unit_id <- rep(1:n_units, each=unit_size)
  pid <- rep(1:unit_size, n_units)
  senior  <- ifelse(pid <= 2, 1, 0)

  pop <- data.frame(unit_id, pid, senior)


If you want more flexibility in generating the units, I would first  
generate the units (without the persons) and then generate the persons  
for each unit. In the example below I use the plyr package; you could  
probably also use lapply/sapply, or simply a loop over the units.


  library(plyr)

  generate_unit <- function(unit) {
      pid <- 1:unit$size
      senior <- rep(0, unit$size)
      senior[sample(unit$size, 2)] <- 1
      return(data.frame(unit_id=unit$id, pid=pid, senior=senior))
  }

  units <- data.frame(id=1:n_units, size=unit_size)

  ddply(units, .(id), generate_unit)
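
The lapply alternative mentioned above could look roughly like this. It is a sketch in base R only: the column person_id and the fixed seed are additions made here for illustration, and generate_unit is the same function as above, repeated so the block runs on its own.

```r
set.seed(1)                      # only to make the sketch reproducible

n_units   <- 10
unit_size <- 10
units <- data.frame(id = 1:n_units, size = unit_size)

generate_unit <- function(unit) {
  senior <- rep(0, unit$size)
  senior[sample(unit$size, 2)] <- 1      # two randomly chosen seniors
  data.frame(unit_id = unit$id, pid = seq_len(unit$size), senior = senior)
}

# Call generate_unit on each one-row slice of units and stack the results
pop <- do.call(rbind, lapply(seq_len(nrow(units)),
                             function(r) generate_unit(units[r, ])))

# Identifier that is unique over the whole population
pop$person_id <- seq_len(nrow(pop))

head(pop, 10)
```

Passing units[r, ] keeps each unit as a one-row data.frame, which is the input generate_unit expects.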


HTH,

Jan




Emma Thomas thomas...@yahoo.com wrote:


Hi all,

I've been struggling with some code and was wondering if you all could help.

I am trying to generate a theoretical population of P people who are  
housed within X different units. Each unit follows the same  
structure- 10 people per unit, 8 of whom are junior and two of whom  
are senior. I'd like to create a unit ID and a unique identifier for  
each person (person ID, PID) in the population so that I have a  
matrix that looks like:


 unit_id pid senior
  [1,]  1   1  0
  [2,]  1   2  0
  [3,]  1   3  0
  [4,]  1   4  0
  [5,]  1   5  0
  [6,]  1   6  0
  [7,]  1   7  0
  [8,]  1   8  0
  [9,]  1   9  1
  [10,]    1   10   1
...

I came up with the following code, but am having some trouble  
getting it to populate my matrix the way I'd like.


world <- function(units, pop_size, unit_size){
    pid <- rep(0,pop_size) #person ID
    senior <- rep(0,pop_size) #senior in charge
    unit_id <- rep(0,pop_size) #unit ID

    for (i in 1:pop_size){
        for (f in 1:units){
            senior[i] = sample(c(1,1,0,0,0,0,0,0,0,0), 1, replace = FALSE)
            pid[i] = sample(c(1:10), 1, replace = FALSE)
            unit_id[i] <- f
        }
    }
    data <- cbind(unit_id, pid, senior)

    return(data)
}

world(units = 10,pop_size = 100, unit_size = 10) #call the function



The output looks like:
 unit_id pid senior
  [1,]  10   7  0
  [2,]  10   4  0
  [3,]  10  10  0
  [4,]  10   9  1
  [5,]  10  10  0
  [6,]  10   1  1
...

but what I really want is to generate is 10 different units with two  
seniors per unit, and with each person in the population having a  
unique identifier.


I thought a nested for loop was one way to go about creating my data  
set of people and families, but obviously I'm doing something (or  
many things) wrong. Any suggestions on how to fix this? I had been  
focusing on creating a person and assigning them to a unit, but  
perhaps I should create the units and then populate the units with  
people?


Thanks so much in advance.

Emma


Re: [R] Generating input population for microsimulation

2011-12-14 Thread Jan van der Laan


Emma,

That is because generate_unit expects a data.frame with one row and  
columns id and size:


generate_unit(data.frame(id=1, size=10))

Jan




Emma Thomas thomas...@yahoo.com wrote:


Dear Jan,

Thanks for your reply.

The first solution works well for my needs for now, but I have a  
question about the second. If I run your code and then call the  
function:


generate_unit(10)

I get an error that

Error in unit$size : $ operator is invalid for atomic vectors


Did you experience the same thing?

In any case, I will definitely take a look at the plyr package,  
which I'm sure will be useful in the future.


Thanks again!

Emma




Re: [R] R - Linux_SSH

2011-12-14 Thread Jan van der Laan

What I did in the past (not with R scripts) is to start my jobs using
at (start the job at a specified time, e.g. now) or batch (start the
job when the system load drops below a threshold):


echo "R CMD BATCH yourscript.R" | at now

or

echo "R CMD BATCH yourscript.R" | batch

Something like that; you'll have to look at the man pages for at
and/or batch. You probably need something like atd running. I do not
know if current Linux distributions have that running by default.
You'll get an email when the job is finished.


HTH
Jan



R CMD BATCH [options] my_script.R [outfile]


Chris Mcowen chrismco...@gmail.com wrote:


Dear List,



I am unsure if this is the correct list to post to, if it isn't I apologise.



I am using SSH to access a Linux version of R on a remote computer as it
offers more memory and processing power. The model will take 1-2 days to
run, I am accessing R through Putty and when I close the connection and open
R again, I am faced with a new session.



As a Linux newbie, I was wondering if anybody here knew how to keep R
running and interactive and return to it on a later date?



Thanks



Chris



Re: [R] Read TXT file with variable separation

2011-11-29 Thread Jan van der Laan



Raphael,

This looks like fixed-width format, which you can read with read.fwf.

In fixed-width format the columns are not separated by white space (or
other characters), but are identified by their position in the file.
So in your file, for example, the first field looks to be contained in the
first 2 columns of your file (the first 2 characters of every line),
the second field in the next five columns, etc.
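
A small sketch of what such a read.fwf call could look like. The sample lines, the widths c(2, 5, 10) and the column names are invented for illustration; the real widths have to be taken from the layout of the actual files.

```r
# Hypothetical fixed-width file: 2 + 5 + 10 characters per line
path <- tempfile(fileext = ".txt")
writeLines(c("31  104RUA SAO   ",
             "32  205AV BRASIL "), path)

fw <- read.fwf(path,
               widths = c(2, 5, 10),               # field widths in characters
               col.names = c("state", "code", "street"),
               strip.white = TRUE,                 # drop the padding spaces
               stringsAsFactors = FALSE)
print(fw)
```

To import several files with the same layout, the same call can be applied to a vector of file names with lapply.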


Regards,
Jan


Quoting Raphael Saldanha saldanha.plan...@gmail.com:


Hi!

I have to import some TXT files into R, but the separation between the
columns are made with different blank spaces, but each file use the
same separation. Example:

31  104 5 0   11RUA SAO
SEBASTIAO 25



 BAIRRO FILETO
  01
0020033854

The pattern is the same on each file.

There are two sample files attached to this message.

I would like to figure out how to import a single file, and then use
some code to import several files (like this
http://www.ats.ucla.edu/stat/r/code/read_multiple.htm)

When I try read.table, I receive this:

cnefe <- read.table("sample1.txt", header=FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 17 elements


Information about my session:

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Regards,

Raphael Saldanha
saldanha.plan...@gmail.com



Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems

2011-11-17 Thread Jan van der Laan



I assume you use a command window to build your packages. One possible 
solution might be to leave out the path variables set by Rtools from 
your global path and to create a separate shortcut to cmd for building 
r-packages where you set your path as needed by R CMD build/check


Something like

cmd /K PATH c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin


(I haven't tried this so it might need some tinkering to get it to 
actually work)


HTH

Jan



On 17-11-2011 9:54, Rubén Roa wrote:



From: Rubén Roa
Sent: Thursday, 17 November 2011 9:53
To: 'us...@admb-project.org'
Subject: Reporting a conflict between ADMB and Rtools on Windows systems



Hi,



I have to work under Windows, it's a company policy.



I've just found that there is a conflict between tools used to build R packages 
(Rtools) and ADMB due to the need to put Rtools compiler's location in the PATH 
environmental variable to make Rtools work.



On a Windows 7 64bit  with Rtools installed I installed ADMB-IDE latest version 
and although I could translate ADMB code to cpp code I could not build the cpp 
code into an executable via ADMB-IDE's compiler.



On another Windows machine, a Windows Vista 32bits with Rtools installed I also 
installed the latest ADMB-IDE and this time it was not possible to create the 
.obj file on the way to build the executable when building with ADMB-IDE. On 
this machine I also have a previous ADMB version (6.0.1) that I used to run 
from the DOS shell. This ADMB also failed to build the .obj file.



Now, going to PATH, the location info to make Rtools is:

c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin;

If from this list I remove the reference to the compiler

c:\Rtools\MinGW\bin

then ADMB works again.



So beware of this conflict. Suggestion of a solution will be appreciated. 
Meanwhile, I run ADMB code in one computer and build R packages with Rtools in 
another computer.



Best



Ruben

--

Dr. Ruben H. Roa-Ureta

Senior Researcher, AZTI Tecnalia,

Marine Research Division,

Txatxarramendi Ugartea z/g, 48395, Sukarrieta,

Bizkaia, Spain





Re: [R] Reading a specific column of a csv file in a loop

2011-11-15 Thread Jan van der Laan


Yet another solution. This time using the LaF package:

library(LaF)
d <- c(1,4,7,8)
P1 <- laf_open_csv("M1.csv", column_types=rep("double", 10), skip=1)
P2 <- laf_open_csv("M2.csv", column_types=rep("double", 10), skip=1)
for (i in d) {
  M <- data.frame(P1[, i], P2[, i])
}

(The skip=1 is needed as laf_open_csv doesn't read headers)
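
For comparison, the colClasses approach asked about in the original question can be sketched in base R. The helper read_column and the small test file are made up for this illustration, and the file is written without row names to keep the column positions simple.

```r
set.seed(42)
path <- tempfile(fileext = ".csv")
M1 <- matrix(rnorm(50), nrow = 5, ncol = 10,
             dimnames = list(NULL, letters[1:10]))
write.csv(M1, path, row.names = FALSE)

# Read only column i: "NULL" in colClasses tells read.csv to skip a column
read_column <- function(file, i, n_cols) {
  classes <- rep("NULL", n_cols)
  classes[i] <- "numeric"
  read.csv(file, colClasses = classes)[[1]]
}

d <- c(1, 4, 7, 8)
for (i in d) {
  col_i <- read_column(path, i, n_cols = 10)
  print(summary(col_i))
}
```

Note that read.csv still has to scan the whole file on every call, so for very large files this is likely to be slower than indexing an opened LaF object.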

Jan



On 11/08/2011 11:04 AM, Sergio René Araujo Enciso wrote:

Dear all:

I have two large files with 2000 columns. For each file I am
running a loop to extract the ith column of each file and create
a data frame with both ith columns in order to perform further
analysis. I am not extracting all the columns but only certain ones,
which I indicate in a vector called d.

See  an example of my  code below

### generate an example for the CSV files, the original files contain
more than 2000 columns, here for the sake of simplicity they have only
10 columns
M1<-matrix(rnorm(1000), nrow=100, ncol=10,
dimnames=list(seq(1:100),letters[1:10]))
M2<-matrix(rnorm(1000), nrow=100, ncol=10,
dimnames=list(seq(1:100),letters[1:10]))
write.table(M1, file="M1.csv", sep=",")
write.table(M2, file="M2.csv", sep=",")

### the vector containing the i elements to be read
d<-c(1,4,7,8)
P1<-read.table("M1.csv", header=TRUE, sep=",")
P2<-read.table("M2.csv", header=TRUE, sep=",")
for (i in d) {
M<-data.frame(P1[i],P2[i])
rm(list=setdiff(ls(),d))
}

As the files are quite large, I want to include read.table within
the loop so that it only reads the ith column. I know that there is
the option colClasses, for which I have to create a vector with zeros
for all the columns I do not want to load. Nonetheless I have no idea
how to make this vector change in the loop, so that the only element
that is not zero is the ith element following the vector d. Any ideas
how to do this? Or is there any other approach to load only a
specific column?

best regards,

Sergio René


[R] [R-pkgs] LaF 0.3: fast access to large ASCII files

2011-11-14 Thread Jan van der Laan

The LaF package provides methods for fast access to large ASCII files. 
Currently the following file formats are supported:


* comma separated format (csv) and other separated formats and
* fixed width format.

It is assumed that the files are too large to fit into memory, although 
the package can also be used to efficiently access files that do fit 
into memory.


In order to process files that are too large to fit into memory, methods 
are provided to access and process files blockwise. Furthermore, an 
opened file can be indexed as one would a data.frame. In this way 
subsets or specific columns can be read into memory. For example, 
assuming that an object laf has been created using one of the functions 
laf_open_csv or laf_open_fwf, the third column from the file can be read 
into memory using:


 col <- laf[, 3]


The LaF-manual vignette contains a description of all functionality 
provided:


  http://laf-r.googlecode.com/files/LaF-manual_0.3.pdf

The LaF-benchmark vignette compares the performance of LaF to the 
standard R-routines read.table and read.fwf:


  http://laf-r.googlecode.com/files/LaF-benchmark_0.3.pdf


Re: [R] Chi-Square test and survey results

2011-10-12 Thread Jan van der Laan


George,

Perhaps the site of the RISQ project (Representativity indicators for  
Survey Quality) might be of use: http://www.risq-project.eu/ . They  
also provide R-code to calculate their indicators.


HTH,
Jan



Quoting ghe...@mathnmaps.com:


An organization has asked me to comment on the validity of their
recent all-employee survey.  Survey responses, by geographic region, compared
with the total number of employees in each region, were as follows:


ByRegion

          All.Employees Survey.Respondents
Region_1            735                142
Region_2            500                 83
Region_3            897                 78
Region_4            717                133
Region_5            167                 48
Region_6            309                  0
Region_7            806                125
Region_8            627                122
Region_9            858                177
Region_10           851                160
Region_11           336                 52
Region_12          1823                312
Region_13            80                  9
Region_14           774                121
Region_15           561                 24
Region_16           834                134

How well does the survey represent the employee population?
Chi-square test says, not very well:


chisq.test(ByRegion)


Pearson's Chi-squared test

data:  ByRegion
X-squared = 163.6869, df = 15, p-value < 2.2e-16

By striking three under-represented regions (3,6, and 15), we get
a more reasonable, although still not convincing, result:


chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])


Pearson's Chi-squared test

data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
X-squared = 22.5643, df = 12, p-value = 0.03166
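
For anyone who wants to rerun this, the table can be entered as follows. This is a sketch: the counts are copied from the table in this message, and response_rate is a name introduced here as a quick way to see which regions are under-represented.

```r
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

# Share of employees who responded, per region
response_rate <- ByRegion$Survey.Respondents / ByRegion$All.Employees
print(round(response_rate, 2))

# Same call as in the message: a test on the 16 x 2 table of counts
res <- chisq.test(ByRegion)
print(res)
```

The response rates make the three struck regions (3, 6 and 15) easy to spot, since they have the lowest rates.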

This poses several questions:

1)  Looking at a side-by-side barchart (proportion of responses vs.
proportion of employees, per region), the pattern of survey responses
appears, visually, to match fairly well the pattern of employees.  Is
this a case where we trust the numbers and not the picture?

2) Part of the problem, ironically, is that there were too many responses
to the survey.  If we had only one-tenth the responses, but in the same
proportions by region, the chi-square statistic would look much better,
(though with a warning about possible inaccuracy):

data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))

X-squared = 17.5912, df = 15, p-value = 0.2848

Is there a way of reconciling a large response rate with an unrepresentative
response profile?  Or is the bad news that the survey will give very precise
results about a very ill-specified sub-population?

(Of course, I would put it in softer terms, like "you need to assess the degree
of homogeneity across different regions".)

3) Is Chi-squared really the right measure of how representative the
survey is?
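
One way to look at question 3, sketched under the assumption that the respondents are a subset of the employees: treat the employee shares as the expected distribution in a goodness-of-fit test, and add an effect size (Cohen's w) that does not grow with the number of responses. The vector names and the choice of Cohen's w are illustrative and not part of the original analysis.

```r
# Counts copied from the ByRegion table above
employees <- c(735, 500, 897, 717, 167, 309, 806, 627,
               858, 851, 336, 1823, 80, 774, 561, 834)
respondents <- c(142, 83, 78, 133, 48, 0, 125, 122,
                 177, 160, 52, 312, 9, 121, 24, 134)
names(employees) <- names(respondents) <- paste0("Region_", 1:16)

# Goodness-of-fit: do respondents follow the employee distribution?
gof <- chisq.test(respondents, p = employees / sum(employees))

# Response rate per region shows where the mismatch comes from
rate <- respondents / employees

# Cohen's w: the chi-square statistic scaled by the number of respondents
w <- sqrt(unname(gof$statistic) / sum(respondents))

print(round(sort(rate), 3))
print(w)
```

Because w divides the statistic by the number of respondents, scaling all counts to one-tenth (as in question 2) leaves it unchanged, while the p-value moves a lot.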

 

Thanks for any help you can give - hope these questions make sense -

George H.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Applying function to only numeric variable (plyr package?)

2011-10-12 Thread Jan van der Laan



plyr isn't necessary in this case. You can use the following:

cols <- sapply(df, is.numeric)
df[, cols] <- pct(df[, cols])


round (and therefore pct) accepts a data.frame and returns a  
data.frame with the same dimensions. If that hadn't been the case  
colwise might have been of help:


library(plyr)
pct.colwise <- colwise(pct)
df[, cols] <- pct.colwise(df[, cols])
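
For reference, a self-contained base-R sketch of the same idea, using the data from the question quoted below. It uses `lapply` over the numeric columns, which is an alternative to passing the whole data frame to `pct` and not what was posted above.

```r
# Data frame and pct() as given in the question
df <- data.frame(
  c1 = c("A", "B", "C", "C"),
  c2 = factor(c(1, 1, 2, 2), labels = c("Y", "N")),
  x  = c(0.5234, 0.6919, 0.2307, 0.1160),
  y  = c(0.9251, 0.7616, 0.3624, 0.4462)
)
pct <- function(x) round(100 * x, 1)

# Logical index of the numeric columns; the others are left untouched
cols <- sapply(df, is.numeric)
df[cols] <- lapply(df[cols], pct)

print(df)
```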

HTH,

Jan



Quoting michael.laviole...@dhhs.state.nh.us:



My data frame consists of character variables, factors, and proportions,
something like

c1 <- c("A", "B", "C", "C")
c2 <- factor(c(1, 1, 2, 2), labels = c("Y", "N"))
x <- c(0.5234, 0.6919, 0.2307, 0.1160)
y <- c(0.9251, 0.7616, 0.3624, 0.4462)
df <- data.frame(c1, c2, x, y)
pct <- function(x) round(100*x, 1)

I want to apply the pct function to only the numeric variables so that the
proportions are computed to percentages, and retain all the columns:

  c1 c2   x1   x2
1  A  Y 52.3 92.5
2  B  Y 69.2 76.2
3  C  N 23.1 36.2
4  C  N 11.6 44.6

I've been approaching it with the ddply and colwise functions from the plyr
package, but in that case I need each row to be its own group and to
retain all columns. Am I on the right track? If not, what's the best way to
do this?

Thanks in advance,
M. L.





Re: [R] Problem with .C

2011-10-06 Thread Jan van der Laan

An obvious reason might be that your second argument should be a  
pointer to int.


As others have mentioned, you might want to have a look at Rcpp and/or
inline. The documentation is good and I find it much easier to work
with.


For example, your example could be written as:

library(Rcpp)
library(inline)

test <- cxxfunction(signature(x = "numeric"), '
    Rcpp::NumericVector v(x);
    Rcpp::NumericVector result(v.length());
    for (int i = 0; i < v.length(); ++i) {
        result[i] = v[i] + i;
    }
    return(result);
', plugin = "Rcpp")


HTH,

Jan


Quoting Grigory Alexandrovich alexandrov...@mathematik.uni-marburg.de:


Hello,

first thank you for your answers.

I did not read the whole pdf Writing R Extension, but I read this
strongly shortened introduction to this subject:

http://www.math.kit.edu/stoch/~lindner/media/.c.call%20extensions.pdf

I get the same error with this C-function:

void test(double * b, int l)
{
 int i;
 for(i=0; i < l ; i++) b[i] +=i;
}



I call it from R like this:

parameter = c(0,0,1,1,1,0,1.5,0.7,0,1.2,0.3);
.C("test", as.double(parameter), as.integer(11))

The program crashes even in this simple case.
Where can be the error?

Thanks
Grigory Alexandrovich







Answer 1
Without knowing that C code, we cannot know. Have you read Writing   
R Extensions carefully? I.e. take care with memory allocation and   
printing as mentioned in the manual.


Uwe Ligges



Answer 2
This looks like a classic case of not reading the manual, and then   
compounding it by not reading the posting guide. The manual would   
be the Writing R Extensions pdf that comes with R or you can   
google it. The posting guide is referenced at the bottom of this   
and every other posting on this mailing list.
There are nearly an infinite variety of errors that can lead to a   
crash, so it is really unreasonable of you to pose this question   
this way and expect constructive assistance.

---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---
Sent from my phone. Please excuse my brevity.



Answer 3


It's impossible to say, with such minimal information, but a reasonable
guess is that there is a problem with the declaration of x and y in
foo.c.  These would (I think) need to be declared as double *, not double,
when foo is called from .C().

   cheers,

   Rolf Turner



Answer 4


Hi,

As others have said, it's very difficult to help you without an example
+ code to know what you are talking about.

That having been said, it seems as if you are just getting your feet
wet in this R <-> C bridge, and I'd recommend you check out the Rcpp
and inline packages to help make your life a lot easier ...

-steve










On 04.10.2011 14:04, Grigory Alexandrovich wrote:

Hello,

I wrote a function in C, which works fine if called from the
main-function in C.

But as soon as I try to call this function from R like .C('foo',
as.double(x), as.integer(y)), the program crashes.

I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and
loaded it into R with dyn.load().

What can be the cause of such behaviour?
Again, the C function itself works, but not if called from R.

Thanks
Grigory Alexandrovich






Re: [R] Problem with .C

2011-10-06 Thread Jan van der Laan


Quoting Uwe Ligges lig...@statistik.tu-dortmund.de:




I don't agree that it's overkill -- you get to sidestep the whole `R
CMD SHLIB ...` and `dyn.load` dance this way while you experiment with
C(++) code 'live' using the inline package.



You need two additional packages now where you have to rely on the fact
those are available. Moreover, you have to get used to that syntax, and
part of it seems to be C++ now? At least I do not know why the above
should work at all, while I know the simple C function does.


OK, I agree that switching to Rcpp/C++ might be a bit of overkill in
this example, although in a lot of other examples I find the Rcpp syntax
much more readable than the C code when dealing with .Call.

The example could also have been written in C using inline, removing the
need for Rcpp and looking more like the original example:

library(inline)

test <- cfunction(signature(b = "numeric", l = "integer"), '
 for(int i=0; i < *l; i++) b[i] += i;
 ', convention=".C")

I find that the advantage of using inline (especially in case of
simple functions like this) is that
1. I no longer need to compile and load the shared library manually,
which can sometimes be frustrating when Windows locks the dll.
2. Inline performs type checking and casts variables to the right type.
You can now type test(1:10, 10) without needing as.numeric or
as.integer, reducing the amount of R code and the probability of
screwing things up by passing the wrong type.
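
To illustrate point 2: a plain .C call does no conversion by itself, so the R-side storage types have to match the C declaration (double * for doubles, int * for integers). A small sketch of the types involved, in plain R with no compiler needed; the element names of `types` are only labels for this example.

```r
# Storage type of some common R values; .C passes them as they are
types <- c(
  seq     = typeof(1:10),             # integer sequence
  seq_dbl = typeof(as.double(1:10)),  # explicitly converted to double
  literal = typeof(10),               # numeric literals are doubles
  as_int  = typeof(as.integer(10)),   # explicitly converted to integer
  int_lit = typeof(10L)               # integer literal
)
print(types)
```

So with plain .C, `1:10` would have to be wrapped in `as.double()` for a `double *` argument, and `10` in `as.integer()` for an `int *` argument.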



Jan




Uwe



It's really handy.


Just make the original source


void test(double *b, int *l)
{
int i;
for(i=0; i < *l ; i++) b[i] += i;
}


which you would have known after reading the Writing R Extensions manual.


I agree that this step is unavoidable no matter which avenue (Rcpp or
otherwise) one decides to take.

-steve




Re: [R] Data import

2011-09-26 Thread Jan van der Laan

You can with the routines in the memisc library. You can open a file 
using spss.system.file and then import a subset using subset. Look in 
the help pages of spss.system.file for examples.
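
A minimal sketch of that approach, assuming the memisc package is installed. The helper name `read_spss_subset` and the variable names `age` and `gender` are hypothetical placeholders for whatever is in your file.

```r
# Hypothetical helper: import only selected variables from an SPSS .sav file
read_spss_subset <- function(path) {
  # attached so that the subset() method for importer objects is found
  library(memisc)
  importer <- spss.system.file(path)               # describes the file, no data loaded yet
  ds <- subset(importer, select = c(age, gender))  # placeholder variable names
  as.data.frame(ds)
}

# Usage (the file name is a placeholder):
# survey <- read_spss_subset("survey.sav")
```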


HTH

Jan


On 09/25/2011 11:56 PM, sassorauk wrote:

Is it possible to import only certain variables from an SPSS file?

I know that read.spss in the foreign library will bring the data into R but
can I choose to import only chosen variables from the SPSS dataset to R?

Thanks for your help.

R

--
View this message in context: 
http://r.789695.n4.nabble.com/Data-import-tp3842196p3842196.html
Sent from the R help mailing list archive at Nabble.com.




Re: [R] R help on write.csv

2011-09-22 Thread Jan van der Laan

Rowwise is easy. The example code I gave does this: it appends the new 
data /below/ the old. I'll repeat the example below:


con <- file("d:\\test2.csv", "wt")
write.table(data, file=con, sep=";", dec=",", row.names=FALSE,
    col.names=TRUE)
write.table(data, file=con, sep=";", dec=",", row.names=FALSE,
    col.names=FALSE, append=TRUE)

close(con)
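
A runnable variant of the same idea that writes to a temporary file and reads the result back, so the row-wise stacking can be checked. The small data frame and the use of `tempfile()` are assumptions made for the sake of a self-contained example.

```r
data <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5), c = letters[1:3])

path <- tempfile(fileext = ".csv")
con <- file(path, "wt")
# First write includes the header, the second one does not
write.table(data, file = con, sep = ";", dec = ",", row.names = FALSE,
            col.names = TRUE)
write.table(data, file = con, sep = ";", dec = ",", row.names = FALSE,
            col.names = FALSE)
close(con)

# read.csv2 expects ';' as separator and ',' as decimal mark
combined <- read.csv2(path)
print(combined)
```

While the connection stays open, every write continues where the previous one stopped, which is why no `append` argument is needed here.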

Or do you mean columnwise where you append columns? This would be very 
difficult in CSV. If you would like to do this you might have a look at 
the various options for exporting to Excel directly.  See for example 
http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows . I have 
no experience in this.
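
If the file is small enough to read back in, one workaround for adding columns is to read the existing file, bind the new column, and rewrite the whole file. This is only a sketch of that read-modify-write idea, not a true in-place append; the file and column names are made up for the example.

```r
path <- tempfile(fileext = ".csv")

# Existing file with two columns
write.csv2(data.frame(a = 1:3, b = c(1.5, 2.5, 3.5)), path, row.names = FALSE)

# Read it back, add a column, and overwrite the file
old <- read.csv2(path)
new <- cbind(old, c = letters[1:3])
write.csv2(new, path, row.names = FALSE)

result <- read.csv2(path)
print(result)
```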


Regards,
Jan

PS I am sorry for my previous triple post. I had a little fight with my 
webmail client.



On 09/22/2011 06:14 AM, Ashish Kumar wrote:


Is there a way we can append row-wise, so that it all stacks up
horizontally, the way you do it in xlswrite in Matlab, where you can
even specify the cell number from where you want to write?


-Ashish

*From:*R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
*Sent:* Thursday, September 22, 2011 12:03 AM
*To:* Jan van der Laan
*Cc:* r-help@r-project.org; ashish.ku...@esteeadvisors.com
*Subject:* Re: [R] R help on write.csv

Oh darn, I had that line and then when I copied it to gmail I thought 
I'd be all slick and clean up my code: oh well...just not my day/thread...


It's possible to work around the repeated headers business (change to
something like Call$col.names <- !append) but yeah, at this point
I'm thinking it's perhaps better practice to direct the OP to the
various connection methods: sink() is nice, but he'll probably have to
do something to convert his object to a CSV-like string before printing:


apply(OBJ, 1, paste, collapse=",")

Michael Weylandt

On Wed, Sep 21, 2011 at 11:20 AM, Jan van der Laan e...@dds.nl wrote:


Michael,

Your example doesn't seem to work. Append isn't passed on to the
write.table call. You will need to add a


 Call$append <- append

to the function. And even then there will be a problem with the
headers that are repeated when appending.



An easier solution is to use write.table directly (I am using 
Dutch/European csv format):


data <- data.frame(a=1:10, b=1, c=letters[1:10])
write.table(data, file="test.csv", sep=";", dec=",", row.names=FALSE,
    col.names=TRUE)
write.table(data, file="test.csv", sep=";", dec=",", row.names=FALSE,
    col.names=FALSE, append=TRUE)


When first opening a file connection and passing that to write.csv
or write.table, data is also appended. The problem with write.csv is
that writing the column names cannot be suppressed, which will result
in repeated column names:


con <- file("d:\\test2.csv", "wt")
write.csv2(data, file=con, row.names=FALSE)
write.csv2(data, file=con, row.names=FALSE)
close(con)

So one will still have to use write.table to avoid this:

con <- file("d:\\test2.csv", "wt")
write.table(data, file=con, sep=";", dec=",", row.names=FALSE,
    col.names=TRUE)
write.table(data, file=con, sep=";", dec=",", row.names=FALSE,
    col.names=FALSE, append=TRUE)
close(con)

Using a file connection is probably also more efficient when doing a 
large number of appends.


Jan







Quoting R. Michael Weylandt michael.weyla...@gmail.com:


Touche -- perhaps we could make one though?

write.csv.append <- function(..., append = TRUE)
{
   Call <- match.call(expand.dots = TRUE)
   for (argname in c("col.names", "sep", "dec", "qmethod"))
       if (!is.null(Call[[argname]]))
           warning(gettextf("attempt to set '%s' ignored", argname),
                   domain = NA)
   rn <- eval.parent(Call$row.names)
   Call$col.names <- if (is.logical(rn) && !rn)
       TRUE
   else NA
   Call$sep <- ","
   Call$dec <- "."
   Call$qmethod <- "double"
   Call[[1L]] <- as.name("write.table")
   eval.parent(Call)
}
write.csv.append(1:5, "test.csv", append = FALSE)
write.csv.append(1:15, "test.csv")

Output seems a little sloppy, but might work for the OP.

Michael Weylandt

On Wed, Sep 21, 2011 at 9:03 AM, Ivan Calandra
ivan.calan...@uni-hamburg.de wrote:

I don't think there is an append argument to write.csv()
(well, actually
there is one, but set to FALSE).
There is however one to write.table()
Ivan

On 9/21/2011 14:54, R. Michael Weylandt michael.weyla...@gmail.com
wrote:

 The append argument of write.csv()?


Michael

On Sep 21, 2011, at 8:01 AM, Ashish Kumar
ashish.ku...@esteeadvisors.com wrote:

 Hi,




I wanted to write the data created using R  on existing
csv file. However
everytime I use write.csv, it overwrites the values

Re: [R] R help on write.csv

2011-09-21 Thread Jan van der Laan



Michael,

Your example doesn't seem to work. Append isn't passed on to the
write.table call. You will need to add a


 Call$append <- append

to the function. And even then there will be a problem with the
headers that are repeated when appending.
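
A sketch of a small wrapper that avoids both problems by writing the header only when the file does not exist yet. The name `append_csv` is hypothetical (it is not part of base R), and the sketch assumes that every call writes a data frame with the same columns.

```r
# Hypothetical helper: append to a CSV file, writing the header only once
append_csv <- function(x, file) {
  exists_already <- file.exists(file)
  write.table(x, file = file, sep = ",", dec = ".", qmethod = "double",
              row.names = FALSE,
              col.names = !exists_already,  # header only for a new file
              append = exists_already)      # append once the file is there
  invisible(file)
}

path <- tempfile(fileext = ".csv")
append_csv(data.frame(a = 1:5), path)
append_csv(data.frame(a = 6:15), path)

back <- read.csv(path)
print(nrow(back))
```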



An easier solution is to use write.table directly (I am using  
Dutch/European csv format):


data <- data.frame(a=1:10, b=1, c=letters[1:10])
write.table(data, file="test.csv", sep=";", dec=",", row.names=FALSE,
    col.names=TRUE)
write.table(data, file="test.csv", sep=";", dec=",", row.names=FALSE,
    col.names=FALSE, append=TRUE)



When first opening a file connection and passing that to write.csv
or write.table, data is also appended. The problem with write.csv is
that writing the column names cannot be suppressed, which will result
in repeated column names:


con <- file("d:\\test2.csv", "wt")
write.csv2(data, file=con, row.names=FALSE)
write.csv2(data, file=con, row.names=FALSE)
close(con)

So one will still have to use write.table to avoid this:

con <- file("d:\\test2.csv", "wt")
write.table(data, file=con, sep=";", dec=",", row.names=FALSE, col.names=TRUE)
write.table(data, file=con, sep=";", dec=",", row.names=FALSE,
    col.names=FALSE, append=TRUE)

close(con)

Using a file connection is probably also more efficient when doing a  
large number of appends.


Jan






Quoting R. Michael Weylandt michael.weyla...@gmail.com:


Touche -- perhaps we could make one though?

write.csv.append <- function(..., append = TRUE)
{
    Call <- match.call(expand.dots = TRUE)
    for (argname in c("col.names", "sep", "dec", "qmethod"))
        if (!is.null(Call[[argname]]))
            warning(gettextf("attempt to set '%s' ignored", argname),
                    domain = NA)
    rn <- eval.parent(Call$row.names)
    Call$col.names <- if (is.logical(rn) && !rn)
        TRUE
    else NA
    Call$sep <- ","
    Call$dec <- "."
    Call$qmethod <- "double"
    Call[[1L]] <- as.name("write.table")
    eval.parent(Call)
}
write.csv.append(1:5, "test.csv", append = FALSE)
write.csv.append(1:15, "test.csv")

Output seems a little sloppy, but might work for the OP.

Michael Weylandt

On Wed, Sep 21, 2011 at 9:03 AM, Ivan Calandra ivan.calan...@uni-hamburg.de

wrote:



I don't think there is an append argument to write.csv() (well, actually
there is one, but set to FALSE).
There is however one to write.table()
Ivan

Le 9/21/2011 14:54, R. Michael Weylandt michael.weyla...@gmail.com a
écrit :

 The append argument of write.csv()?


Michael

On Sep 21, 2011, at 8:01 AM, Ashish Kumar
ashish.ku...@esteeadvisors.com wrote:

 Hi,




I wanted to write the data created using R  on existing csv file. However
everytime I use write.csv, it overwrites the values already there in the
existing csv file. Any workaround on this.



Thanks for your help



Ashish Kumar



Estee Advisors Pvt. Ltd.

Email: ashish.ku...@esteeadvisors.com

Cell: +91-9654072144

Direct: +91-124-4637-713




   [[alternative HTML version deleted]]






--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Dept. Mammalogy
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php





[[alternative HTML version deleted]]






Re: [R] Where to put tryCatch or similar in a very big for loop

2011-09-16 Thread Jan van der Laan


Laura,

Perhaps the following example helps:

nbstr <- 100
result <- numeric(nbstr)
for (i in seq_len(nbstr)) {
  # set the default value for when the current bootstrap fails
  result[i] <- NA
  try({
    # estimate your cox model here
    if (runif(1) < 0.1) stop("ERROR")
    result[i] <- i
  }, silent=TRUE)
}
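
If you also want to count the failures and keep the error messages, a variant with tryCatch might look like the sketch below. The random `stop()` and `mean(rnorm(10))` are stand-ins for the real Cox model fit, and the names `errors` and `n_failed` are made up for this example.

```r
nbstr <- 20
result <- rep(NA_real_, nbstr)
errors <- character(nbstr)   # stays "" for iterations that succeed

for (i in seq_len(nbstr)) {
  result[i] <- tryCatch({
    # stand-in for fitting the model; fails at random
    if (runif(1) < 0.3) stop("model could not be fitted")
    mean(rnorm(10))
  }, error = function(e) {
    # record the message and return NA so the loop carries on
    errors[i] <<- conditionMessage(e)
    NA_real_
  })
}

n_failed <- sum(is.na(result))
print(n_failed)
print(which(nzchar(errors)))  # iterations that failed
```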

Regards,
Jan




Quoting Bonnett, Laura l.j.bonn...@liverpool.ac.uk:


Hi,

The simulation occasionally generates either a rare event meaning   
that the Cox model is not appropriate or it generates a covariate   
with most responses being the same which means that the Cox model   
cannot be fit.


At bootstrap sample number 10, the variable c11 is considered   
singular by model cox1.


Thanks,
Laura

-Original Message-
From: Ken [mailto:vicvoncas...@gmail.com]
Sent: 15 September 2011 21:43
To: Bonnett, Laura
Cc: Steve Lianoglou; r-help@r-project.org
Subject: Re: [R] Where to put tryCatch or similar in a very big for loop

What type of singularity exactly, if you're working with counts is   
it a special case? If using a Monte Carlo generation scheme, there   
are various workarounds such as while(sum(vec)!=0) {sample} for   
example. More info on the error circumstances would help.


   Good luck!
Ken Hutchison

On Sep 15, 2554 BE, at 11:41 AM, Bonnett, Laura   
l.j.bonn...@liverpool.ac.uk wrote:



Hi Steve,

Thanks for your response.  The slight issue is that I need to use a  
 different starting seed for each simulation.  If I use 'lapply'   
then I end up using the same seed each time.  (By contrast, I need   
to be able to specify which starting seed I am using).




Thanks,
Laura

-Original Message-
From: Steve Lianoglou [mailto:mailinglist.honey...@gmail.com]
Sent: 15 September 2011 16:17
To: Bonnett, Laura
Cc: r-help@r-project.org
Subject: Re: [R] Where to put tryCatch or similar in a very big for loop

Hi Laura,

On Thu, Sep 15, 2011 at 10:53 AM, Bonnett, Laura
l.j.bonn...@liverpool.ac.uk wrote:

Dear all,

I am running a simulation study to test variable imputation   
methods for Cox models using R 2.9.0 and Windows XP.  The code I   
have written (which is rather long) works (if I set nsim = 9) with  
 the following starting values.



bootrs(nsim=9,lendevdat=1500,lenvaldat=855,ac1=-0.19122,bc1=-0.18355,cc1=-0.51982,cc2=-0.49628,eprop1=0.98,eprop2=0.28,lda=0.003)


I need to run the code 1400 times in total (bootstrap resampling);
however, occasionally the random numbers generated lead to a
singularity and hence the code crashes as one of the Cox models
cannot be fitted (the 10th iteration is the first time this
happens).


I've been trawling the internet for ideas and it seems that there   
are several options in the form of try() or tryCatch() or next.
I'm not sure however, how to include them in my code (attached).
Ideally I'd like it to run every simulation from 1 to 1400
and if there is an error at some point get an error message
returned (I need to count how many there are) but move on to the
next number in the loop.


I've tried putting try(,silent=TRUE) around each Cox model
(cph statement) but that hasn't worked and I've also tried putting
try around the whole for loop without any success.


Let's imagine you are using an `lapply` instead of `for`, only because
I guess you want to store the results of `bootrs` somewhere, you can
adapt this to your `for` solution. I typically return NULL when an
error is caught, then filter those out from my results, or whatever
you like:

results <- lapply(1:1400, function(i) {
 tryCatch(bootrs(...whatever...), error=function(e) NULL)
})
went.south <- sapply(results, is.null)

The `went.south` vector will be TRUE where an error occurred in your
bootrs call.
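
A self-contained sketch of this pattern that also gives every run its own seed, so a failed run can be identified and repeated. The function `one_run` is a stand-in for `bootrs`, and the seed values are arbitrary.

```r
# Stand-in for the real simulation: seeds the RNG, then fails at random
one_run <- function(seed) {
  set.seed(seed)
  if (runif(1) < 0.3) stop("singular fit")
  mean(rnorm(10))
}

seeds <- 1001:1020
results <- lapply(seeds, function(s) {
  tryCatch(one_run(s), error = function(e) NULL)
})

went.south <- sapply(results, is.null)
failed_seeds <- seeds[went.south]  # seeds whose run raised an error

print(sum(went.south))
print(failed_seeds)
```

Since the seed is set inside each run, rerunning `one_run()` with one of the failed seeds reproduces exactly the same failure.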

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-16 Thread Jan van der Laan



Page 7 in my version of formatting.odt (to be sure I have the right  
version I downloaded the latest odfWeave from CRAN) discusses  
registering style definitions and Examples of Changing Styles for  
Tables, Paragraphs, Bullets and Pages which has nothing to do with my  
question (as far as I can tell).  Could you perhaps just tell me how I  
should combine the output of multiple odf* calls inside a function?


Thanks again.

Jan


Quoting Max Kuhn mxk...@gmail.com:


formatting.odf, page 7. The results are in formattingOut.odt

On Thu, Sep 15, 2011 at 2:44 PM, Jan van der Laan rh...@eoos.dds.nl wrote:

Max,

Thank you for your answer. I have had another look at the examples (I
already had before mailing the list), but could not find the example you
mention. Could you perhaps tell me which example I should have a look at?

Regards,
Jan



On 09/15/2011 04:47 PM, Max Kuhn wrote:


There are examples in the package directory that explain this.

On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laan rh...@eoos.dds.nl wrote:


What is the correct way to combine multiple calls to odfCat, odfItemize,
odfTable etc. inside a function?

As an example, let's say I have a function that needs to write two
paragraphs of text and a list to the resulting odf-document (the real
function has much more complex logic, but I don't think that's relevant).
My first guess would be:

exampleOutput <- function() {
  odfCat("This is the first paragraph")
  odfCat("This is the second paragraph")
  odfItemize(letters[1:5])
}

However, calling this function in my odf-document only generates the last
list, as only the output of the odfItemize function is returned by
exampleOutput. How do I combine the three results into one to be returned
by exampleOutput?

I tried to wrap the calls to the odf* functions into a print statement:

exampleOutput2 <- function() {
  print(odfCat("This is the first paragraph"))
  print(odfCat("This is the second paragraph"))
  print(odfItemize(letters[1:5]))
}

In another document this seemed to work, but in my current document
strange odf-output is generated.

Regards,

Jan












--

Max




[R] odfWeave: Combining multiple output statements in a function

2011-09-15 Thread Jan van der Laan



What is the correct way to combine multiple calls to odfCat,  
odfItemize, odfTable etc. inside a function?


As an example, let's say I have a function that needs to write two
paragraphs of text and a list to the resulting odf-document (the real
function has much more complex logic, but I don't think that's
relevant). My first guess would be:


exampleOutput <- function() {
   odfCat("This is the first paragraph")
   odfCat("This is the second paragraph")
   odfItemize(letters[1:5])
}

However, calling this function in my odf-document only generates the
last list, as only the output of the odfItemize function is returned by
exampleOutput. How do I combine the three results into one to be
returned by exampleOutput?


I tried to wrap the calls to the odf* functions into a print statement:

exampleOutput2 <- function() {
   print(odfCat("This is the first paragraph"))
   print(odfCat("This is the second paragraph"))
   print(odfItemize(letters[1:5]))
}

In another document this seemed to work, but in my current document  
strange odf-output is generated.
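
The behaviour of the first guess is ordinary R semantics rather than
anything specific to odfWeave: a function returns only the value of its
last expression. The sketch below shows this with a made-up helper,
`make_paragraph`, which is not an odfWeave function. Whether odfWeave
accepts a combined value like the one built in `collected` is not
established in this thread.

```r
# Hypothetical helper that only *returns* its output as a string.
make_paragraph <- function(text) paste0("<p>", text, "</p>")

# Only the value of the last expression is returned; the first two
# results are computed and then dropped.
first_guess <- function() {
  make_paragraph("This is the first paragraph")
  make_paragraph("This is the second paragraph")
  make_paragraph("This is the third paragraph")
}

# Collecting the pieces explicitly keeps all of them.
collected <- function() {
  c(make_paragraph("This is the first paragraph"),
    make_paragraph("This is the second paragraph"),
    make_paragraph("This is the third paragraph"))
}

n_first <- length(first_guess())   # 1
n_all   <- length(collected())     # 3
```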


Regards,

Jan


Re: [R] odfWeave: Combining multiple output statements in a function

2011-09-15 Thread Jan van der Laan


Max,

Thank you for your answer. I have had another look at the examples (I
already had before mailing the list), but could not find the example you
mention. Could you perhaps tell me which example I should have a look at?


Regards,
Jan



On 09/15/2011 04:47 PM, Max Kuhn wrote:

There are examples in the package directory that explain this.

On Thu, Sep 15, 2011 at 8:16 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

What is the correct way to combine multiple calls to odfCat, odfItemize,
odfTable etc. inside a function?

As an example, let's say I have a function that needs to write two paragraphs
of text and a list to the resulting odf-document (the real function has much
more complex logic, but I don't think that's relevant). My first guess would
be:

exampleOutput <- function() {
   odfCat("This is the first paragraph")
   odfCat("This is the second paragraph")
   odfItemize(letters[1:5])
}

However, calling this function in my odf-document only generates the last
list, as only the output of the odfItemize function is returned by
exampleOutput. How do I combine the three results into one to be returned by
exampleOutput?

I tried to wrap the calls to the odf* functions into a print statement:

exampleOutput2 <- function() {
   print(odfCat("This is the first paragraph"))
   print(odfCat("This is the second paragraph"))
   print(odfItemize(letters[1:5]))
}

In another document this seemed to work, but in my current document strange
odf-output is generated.

Regards,

Jan


Re: [R] Ranking submodels by AIC (more general question)

2011-06-23 Thread Jan van der Laan


Alexandra,

Have a look at add1 and drop1.
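
As a rough sketch of how these can give a ranking (using the built-in
mtcars data and an arbitrary set of terms chosen only for illustration):
both functions return a table with an AIC column that can be sorted. Note
that they only compare models that differ from the current one by a single
term, not all possible submodels.

```r
# Full model; the choice of terms is only for illustration.
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# drop1() refits the model with each term removed in turn and reports the
# AIC of each result, plus a "<none>" row for the unchanged model.
dropped <- drop1(full)
dropped_ranked <- dropped[order(dropped$AIC), ]
print(dropped_ranked)

# add1() works the other way: start small and try adding each term in scope.
null_model <- lm(mpg ~ 1, data = mtcars)
added <- add1(null_model, scope = ~ wt + hp + qsec)
added_ranked <- added[order(added$AIC), ]
print(added_ranked)
```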

Regards,
Jan


On 06/23/2011 07:32 PM, Alexandra Thorn wrote:

Here's a more general question following up on the specific question I
asked earlier:

Can anybody recommend an R command other than mle.aic() (from the wle
package) that will give back a ranked list of submodels?  It seems like
a pretty basic piece of functionality, but the closest I've been able to
find is stepAIC(), which as far as I can tell only gives back the best
submodel, not a ranking of all submodels.

Thanks in advance,
Alexandra


Re: [R] Documenting variables, dataframes and files?

2011-06-22 Thread Jan van der Laan


The memisc package also offers functionality for documenting data.
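
The memisc interface itself is not shown here. As a rough base-R sketch of
the same idea of keeping notes next to the data, comment() and attr() can
attach free text to a data frame or to a single column. The data and the
attribute name "description" below are made up for illustration, and such
attributes on a column are generally lost when the column is subsetted.

```r
# Made-up example data.
survey <- data.frame(age = c(34, 51, 27),
                     region = c("north", "south", "north"))

# comment() stores a note on the object; it is not shown when printing.
comment(survey) <- "Made-up survey extract, one row per respondent"
comment(survey$age) <- "Age in years at interview"

# Any other attribute name can be used in the same way.
attr(survey$region, "description") <- "Region of residence (made-up codes)"

# Reading the notes back.
frame_note  <- comment(survey)
age_note    <- comment(survey$age)
region_note <- attr(survey$region, "description")
```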

Jan

On 06/22/2011 04:57 PM, Robert Lundqvist wrote:

Every now and then I realize that my attempts to document what all my
dataframes consist of are insufficient. So far, I have been writing notes
in an external file. Are there any better ways to do this within R? One
possibility could be to set up the data as packages, but I would like to
have a solution on a lower level, closer to the data. I can't find any
pointers in the standard manuals. Suggestions are most welcome.

Robert

**
Robert Lundqvist
Norrbotten regional council
Sweden


[[alternative HTML version deleted]]


Re: [R] Unexpected behaviour as.data.frame

2011-05-16 Thread Jan van der Laan


Santosh, Ivan,

This is also what I was looking for. Thanks. Looking at the source of
dataFrame.default it seems that it uses the same approach as I did:
first create a list, then a data.frame from that list. I think I'll stick
with the code I already had as I don't want another dependency (multiple
actually for R.utils). But thanks again for pointing it out.
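
A minimal sketch of that list-then-data.frame approach in base R is given
below. The function name `make_frame` and the column names are made up for
illustration; this is not the R.utils implementation.

```r
# Build one zero-filled vector per requested type, then convert the list.
make_frame <- function(types, n) {
  cols <- lapply(types, function(type) vector(mode = type, length = n))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

d <- make_frame(c(a = "integer", b = "character", c = "double"), 10)
str(d)
```

Naming the elements of `types` gives the columns readable names instead of
names deparsed from their contents.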


Jan

On 05/16/2011 10:42 AM, Santosh Srinivas wrote:

Hi Ivan, Take a look at dataFrame in R.utils ... is that what you want?

from the help file:

Examples

   df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
   df[,1] <- sample(1:nrow(df))
   df[,2] <- rnorm(nrow(df))
   print(df)

Thanks,
Santosh

On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
ivan.calan...@uni-hamburg.de  wrote:

I feel like I'm always asking this type of question, but is it possible to
add a base function that allows creating an empty data.frame, as matrix()
does?

What I mean would be something like:
create.data.frame(number_of_columns, mode_of_columns).
I think it would make things easier than creating one or several matrices
and then combining them.

Is it possible; does it make sense?

Ivan

Le 5/15/2011 22:17, Bert Gunter a écrit :

Inline below.

On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan rh...@eoos.dds.nl
wrote:

Thanks. I also noticed it myself minutes after sending my message to the
list. My 'please ignore my question it was just a stupid typo' message was
sent with the wrong account and is now awaiting moderation.

However, my other question still stands: what is the
preferred/fastest/simplest way to create a data.frame with given column
types and dimensions?

I do not know, but  why is simply

data.frame(numeric(10), character(10), integer(10),
stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500, numeric (= double) and
100 character columns to add, you might do something like:


z <- matrix(numeric(5000), nr=10)
u <- matrix(character(1000), nr=10)
frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns

While this might save some typing, it may not be much more efficient
than typing it all out -- maybe just some parsing time is saved. You
can experiment and see.

However, since a data.frame **is** a list with added attributes and a
great deal of the work of the constructor is in constructing and
checking these attributes (e.g. row and column names), I see nothing
terribly inefficient with what you did. It's just a bit obscure.  But
maybe someone with greater expertise will set us both straight.

Cheers,
Bert



Regards,
Jan


On 05/15/2011 04:43 PM, Bert Gunter wrote:

In your post, you're missing the final s on the stringsAsFactors
argument in the d1 assignment. When I typed it correctly, it works as
expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan rh...@eoos.dds.nl
wrote:

I use the following code to create two data.frames d1 and d2 from a
list:
types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same; however, in d1 the second
column is a factor while in d2 it is a character (which I would expect):


str(d1)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: Factor w/ 1 level : 1 1 1 1 1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: chr  ...
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As a different but related question: I use the commands above to create
an 'empty' data.frame with specified column types and dimensions. I need
this data.frame to pass on to my C++ routines. Is there a simpler or more
elegant way of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base





--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere

Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan

Forget I asked. There was a typo in my example (stringsAsFactor
instead of stringsAsFactors) which explained the difference. My
apologies.


My second question however still stands: How does one create a
data.frame with given column types and given dimensions? Thanks.


Regards,
Jan


Quoting Jan van der Laan rh...@eoos.dds.nl:


I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same; however, in d1 the second
column is a factor while in d2 it is a character (which I would expect):


str(d1)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level : 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: chr  ...
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As a different but related question: I use the commands above to create
an 'empty' data.frame with specified column types and dimensions. I
need this data.frame to pass on to my C++ routines. Is there a simpler or
more elegant way of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base



[R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan


I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)

l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same; however, in d1 the second
column is a factor while in d2 it is a character (which I would expect):



str(d1)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level : 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: chr  ...
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As a different but related question: I use the commands above to create
an 'empty' data.frame with specified column types and dimensions. I
need this data.frame to pass on to my C++ routines. Is there a simpler or
more elegant way of creating this data.frame?


Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan

Thanks. I also noticed it myself minutes after sending my message to the
list. My 'please ignore my question it was just a stupid typo' message
was sent with the wrong account and is now awaiting moderation.


However, my other question still stands: what is the
preferred/fastest/simplest way to create a data.frame with given column
types and dimensions?


Regards,
Jan


On 05/15/2011 04:43 PM, Bert Gunter wrote:

In your post, you're missing the final s on the stringsAsFactors
argument in the d1 assignment. When I typed it correctly, it works as
expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

I use the following code to create two data.frames d1 and d2 from a list:
types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same; however, in d1 the second column is
a factor while in d2 it is a character (which I would expect):


str(d1)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: Factor w/ 1 level : 1 1 1 1 1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: chr  ...
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As a different but related question: I use the commands above to create an
'empty' data.frame with specified column types and dimensions. I need this
data.frame to pass on to my C++ routines. Is there a simpler or more elegant
way of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


Re: [R] blank space escape sequence in R?

2011-04-25 Thread Jan van der Laan


There exists a non-breaking space:

http://en.wikipedia.org/wiki/Non-breaking_space

Perhaps you could use this. In R on Linux under gnome-terminal I can 
enter it with CTRL+SHIFT+U00A0. This seems to work: it prints as a 
space, but is not equal to ' '. I don't know if there are any 
difficulties using, for example, utf8 encoding in source files (which 
you'll probably need).
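
To avoid typing the character directly, it can also be written with a
Unicode escape. The sketch below uses a simple gsub() as a stand-in for the
line-breaking function (which is not shown in this thread), so the
replacement rule is an assumption made only to show that the non-breaking
space is left alone.

```r
# U+00A0 is the non-breaking space; it is a different character from " ".
nbsp <- "\u00A0"
same_as_blank <- nbsp == " "   # FALSE

x <- paste0("some text", nbsp, "and some more text")

# Stand-in for the line-breaking function: break at every ordinary blank.
broken <- gsub(" ", "\n", x, fixed = TRUE)
cat(broken, "\n")
```

The pieces "text" and "and" stay on one line because the character between
them is not an ordinary blank.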


Jan



On 04/25/2011 03:28 PM, Duncan Murdoch wrote:

On 25/04/2011 9:13 AM, Mark Heckmann wrote:
I use a function that inserts line breaks ("\n" as escape sequence)
according to some criterion when there are blanks in the string.

e.g. "some text \nand some more text".

What I want now is another form of a blank, so my function will not
insert a "\n" at that point.

e.g. "some text\spaceand some more text"

Here "\space" stands for some escape sequence for a blank, which is
what I am looking for.
So what I need is something that will appear as a blank when printed
but not in the string itself.


I don't think R has anything like that built in.   You'll need to 
attach a class to your vector of strings, and write a print method for 
it that does the substitution before printing.
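
A minimal sketch of that idea, assuming a made-up class name "spaced" and
the literal marker "\space" from the question (written as "\\space" in R
source):

```r
# Attach the (made-up) class to a character vector.
as_spaced <- function(x) structure(x, class = "spaced")

# Print method: replace the marker by a blank only for display.
print.spaced <- function(x, ...) {
  shown <- gsub("\\space", " ", unclass(x), fixed = TRUE)
  print(shown, ...)
  invisible(x)
}

s <- as_spaced("some text\\spaceand some more text")
print(s)

# The stored string still contains the marker, not a blank.
stored <- unclass(s)
```

Any function that looks for blanks in `stored` will skip the marked
position, while printing shows an ordinary blank there.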


Duncan Murdoch


TIA

Am 25.04.2011 um 15:05 schrieb Duncan Murdoch:

  On 25/04/2011 9:01 AM, Mark Heckmann wrote:
  Is there a blank space escape sequence in R, i.e. something like 
\sp etc. to produce a blank space?


  You need to give some context.  A blank in a character vector will 
be printed as a blank, so you are probably talking about something 
else, but what?


  Duncan Murdoch

–––
Mark Heckmann
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com







