Re: [R] function input as variable name (deparse/quote/paste) ??
On Sat, Mar 10, 2012 at 04:01:21PM -0800, casperyc wrote:
> Sorry if I wasn't stating what I really wanted or it was a bit confusing.
>
> Basically, there are MANY datasets to run using the same function. I have written a function to analyze them; it returns a LIST of useful output in the variable 'res' (to the workspace).
>
> I also created another script, run.r, such as
>
>     myname(dat1)
>     myname(dat2)
>     myname(dat3)
>     myname(dat4)
>     myname(dat5)
>
> For now, each time the output in the main workspace, 'res' (the list), is overwritten. I want it to have a different suffix for each data set to differentiate them, so I can have a look later after the batch is run.

I see no advantage in having that information in variable names. Just

- add the name of the data set to the information that is included in the returned list.
- run your function with sapply(), and the value returned by sapply() will be a list of lists.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
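The two suggestions above could look roughly like the sketch below. The function analyse() and the toy data sets are made up for illustration; they are not the poster's actual code. Note that simplify = FALSE is used so that sapply() is guaranteed to return a list rather than trying to build a matrix.

```r
# Hypothetical analysis function: the returned list carries the data set name.
analyse <- function(dat, name, x = 5, y = 6) {
  list(name = name, res = x + y - dat)
}

# Toy stand-ins for dat1 ... dat3 (assumed data, for illustration only).
datasets <- list(dat1 = 1:3, dat2 = c(10, 20), dat3 = 0)

# One result per data set; the outer list is named after the data sets.
results <- sapply(names(datasets),
                  function(nm) analyse(datasets[[nm]], nm),
                  simplify = FALSE)

str(results$dat2)
```

After the batch has run, results$dat1, results$dat2 and so on can be inspected one by one, and nothing in the workspace gets overwritten.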
Re: [R] hierarchical clustering of large dataset
On Fri, Mar 09, 2012 at 08:26:01PM -0500, Massimo Di Stefano wrote:
> my target is to have 'groups of species' based on the similarity of their environmental parameters, and build a dendrogram like [2]
>
> [2] http://massimo-timecapsule.whoi.edu//data/img/manova_clust_matlab.png
>
> On Mar 9, 2012, at 7:18 PM, Peter Langfelder wrote:
>> Well, you didn't say that column e was a label that you wanted to keep separate. Any other labels in the data? You may not want to use labels in the distance calculation.

If you want to use the results of the cluster analysis as evidence of similarities and differences between species, you _must_ not include numeric variables representing labels in the matrix. Including them would mean imposing the expected result onto the data.

First do the cluster analysis, then test the distribution of species in clusters.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] Issues in installing rgl in Mac OS 10.6.8
On Fri, Mar 09, 2012 at 04:52:31PM -0800, A Ezhil wrote:
> Dear All,
>
> I am trying to install rgl on my Mac notebook from the source file. I tried using:
>
>     /usr/bin/R64 CMD INSTALL rgl_0.92.798.tar.gz
>
> and get the following error message:
>
>     checking for X... no
>     configure: error: X11 not found but required, configure aborted.
>     ERROR: configuration failed for package ‘rgl’
>     * removing ‘/Library/Frameworks/R.framework/Versions/2.14/Resources/library/rgl’
>     * restoring previous ‘/Library/Frameworks/R.framework/Versions/2.14/Resources/library/rgl’
>
> I do see a directory X11 installed under /usr and Sys.getenv("PATH") inside R gives me:
>
>     [1] "/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin"
>
> Could you please help me to install the rgl package?

Not really, but I can offer a hint: I think your system has the _runtime_ libraries for X11 (in /usr/X11), but you need the _development_ libraries to compile rgl. I have no knowledge of Mac OS, but on my system, Debian GNU/Linux, the libraries needed to build rgl from source are:

    libgl1-mesa-dev
    libglu1-mesa-dev
    mesa-common-dev

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] function input as variable name (deparse/quote/paste) ??
On Sat, Mar 10, 2012 at 01:29:16PM -0800, casperyc wrote:
> Hi all
>
> Say I have a function:
>
>     myname=function(dat,x=5,y=6){
>         res<-x+y-dat
>     }
>
> for various input such as
>
>     myname(dat1)
>     myname(dat2)
>     myname(dat3)
>     myname(dat4)
>     myname(dat5)
>
> how should I modify the 'res' line to have new informative variable names correspondingly, such as
>
>     dat1.res
>     dat2.res
>     dat3.res
>     dat4.res
>     dat5.res
>
> stored in the workspace.

Why not keep the information about the input values in a list, or vector? What is gained by storing that info in the variable _name_?

Your function could return a list with both the result and the input value.

While you did say that this was part of something complex, I suspect your post might be a case of "Being overly specific and not stating your real goal".

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
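A minimal sketch of a function that returns both the result and a record of its input, as suggested above. It reuses the poster's myname() signature; capturing the argument name with deparse(substitute()) is one possible way to do it (an assumption here, prompted by the subject line), not necessarily what was intended.

```r
# Return the result together with the name of the object that was passed in.
myname <- function(dat, x = 5, y = 6) {
  list(input = deparse(substitute(dat)),  # name of the argument as typed by the caller
       res   = x + y - dat)
}

dat1 <- c(1, 2, 3)   # toy data, for illustration only
out <- myname(dat1)
out$input
out$res
```

The caller decides where to store the returned list, so nothing has to be written into the workspace from inside the function.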
[R] How do I force confint() for glm() to be quiet?
I need confint() for glm() to suppress the messages

    Waiting for profiling to be done...

because they mess up the caching mechanism of pgfSweave (see https://github.com/cameronbracken/pgfSweave/issues/40).

I have read the help page of confint(), but I do not know how to get the help page for the glm() version, if any such help page exists.

Is there a general way of turning off output from functions in R that would help here?

Below is an example of an intended usage scenario:

    x <- 1
    set.seed(42)
    a <- rnorm(x)
    b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
    c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
    my.fit <- glm(c ~ b + a, family = binomial)
    my.results <- confint(my.fit)
Re: [R] How do I force confint() for glm() to be quiet?
On 2012-03-09 15:30, David Winsemius wrote:
> On Mar 9, 2012, at 6:14 AM, Hans Ekbrand wrote:
>> I need confint() for glm() to suppress the messages
>
> I'm wondering if suppressMessages would be helpful? Which in turn suggests that you do not know how to use ??, so first you should get in the habit of doing a helpSearch before posting.
>
>     ??"suppress messages"

OK, noted.

>> Waiting for profiling to be done... because they mess up the caching mechanism of pgfSweave (see https://github.com/cameronbracken/pgfSweave/issues/40). I have read the help page of confint(), but I do not know how to get the help page for the glm() version, if any such help page exists.
>
> When I type ?confint.glm at my console I get this help page:

Ah, I tried ?confint.lm without success and didn't go further.

> If suppressMessages is not effective then look at: ?sink

OK, but since suppressMessages works, I'll stick to that.

> G. A _minimal_ example would have had fewer iterations,

Sorry.

> but this does seem to be effective:
>
>     suppressMessages(my.results <- confint(my.fit))

Thanks!
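For reference, a small self-contained sketch of the suppressMessages() approach that worked in this thread. The data are simulated and deliberately smaller than in the original post; the variable names and sizes are assumptions made for illustration.

```r
set.seed(42)
n <- 200                                   # assumed size, kept small on purpose
a <- rnorm(n)
b <- factor(LETTERS[sample(1:3, n, replace = TRUE)])
y <- rbinom(n, 1, 0.5)                     # simulated binary outcome

my.fit <- glm(y ~ b + a, family = binomial)

# The profiling notice is emitted as a message, so suppressMessages() hides it
# while the confidence intervals are still returned.
my.results <- suppressMessages(confint(my.fit))
my.results
```

Warnings and errors are not affected by suppressMessages(), so genuine problems with the fit would still be visible.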
Re: [R] speed up merge
On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
> Hello,
>
> I have a nasty loop that I have to do 11877 times.

Are you completely sure about that? I often find myself avoiding loops-by-row by constructing a vector of the rows that fulfil a condition, and then creating new vectors out of that vector.

If you elaborate on the problem, perhaps we could find a way to avoid the loops altogether?

Mostly as a note to self, I wrote http://code.cjb.net/vectors-instead-of-loop.html; it might be understood by others too, but I'm not sure.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
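To make the idea concrete, here is a toy sketch (not taken from the linked page, and with made-up data) that contrasts a row-by-row loop with a vector of the rows that fulfil a condition.

```r
# Toy data, for illustration only.
df <- data.frame(id = 1:10, value = c(5, -2, 7, 0, -1, 3, 8, -4, 6, 2))

# Loop-by-row version.
flag.loop <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  flag.loop[i] <- if (df$value[i] > 0) "positive" else "non-positive"
}

# Vectorised version: first find which rows fulfil the condition,
# then build the new vector from that index vector.
hits <- which(df$value > 0)
flag.vec <- rep("non-positive", nrow(df))
flag.vec[hits] <- "positive"

identical(flag.loop, flag.vec)
```

Both versions give the same result; the second one does the comparison once for the whole column instead of once per row.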
Re: [R] data analysis
On Mon, Feb 27, 2012 at 11:04:13PM -0800, nontokozo mhlanga wrote:
> Please assist me with all the tests, including risk factor analysis, that I can use to analyse the enclosed database, established from a questionnaire survey, to test for the prevalence of tuberculosis in humans.

That's quite a general request. I think you should try to formulate a specific question. Have you read the posting guide? http://www.R-project.org/posting-guide.html

Also, I don't think the list accepts attached files.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] count how many row i have in a txt file in a directory
On Sun, Feb 26, 2012 at 03:03:58PM +0100, gianni lavaredo wrote:
> Dear Researchers,
>
> I have a large TXT (X,Y,MyValue) file in a directory and I wish to import the txt row by row in a loop, to save only the data that are inside a buffer (using inside.owin of spatstat) and delete the rest. The first step, before creating a row-by-row loop, is to know how many rows there are in the txt file without loading it in R, to avoid memory problems. Does anyone know a specific function for this?

If the rows are so many that even only three variables per row will cause memory problems, then looping over the file row by row will take a very long time.

I would - instead of looping row by row - split the text file into chunks small enough for a chunk to be read into R, and operated on within R, without memory problems.

I create a test file of 10.000.000 rows

    my.words <- replicate(10000, paste(LETTERS[sample.int(26, 10)], sep = "", collapse = ""))
    my.df <- data.frame(x=rnorm(10000000), y=rnorm(10000000), my.val=rep(my.words, 1000))
    write.csv(my.df, file = "testmem.csv")

Split the file into smaller chunks, say 1.000.000 rows. I use the split command in GNU coreutils:

    $ split -l 1000000 testmem.csv

Loop through the chunks.

    for(file.name in c("xaa", "xab", ...)){
        chunk <- read.csv(file = file.name)
        [ match and add all the interesting rows to an object ]
    }

Here's an example that for each chunk prints its third row.

    for(file.name in c("xaa", "xab")){
        chunk <- read.csv(file = file.name)
        print(chunk[3,])
    }

With a chunk of 1.000.000 rows, R needed about 250 MB RAM to process this loop.
Re: [R] count how many row i have in a txt file in a directory
On Sun, Feb 26, 2012 at 05:06:42PM +0100, gianni lavaredo wrote:
> thanks Hans. It's true that your idea improves the speed of the analysis compared with a row-by-row loop. Sorry if I ask these questions to better understand and improve the performance of my code:
>
> 1) split command in GNU coreutils,
>
>     $ split -l 1000000 testmem.csv
>
> I have never used this command. Is it possible to code it in R or is it an external command?

External. split is - as I wrote - part of GNU coreutils.

> Do you have some links where I can study this command? Thanks

http://www.gnu.org/software/coreutils/

> 2) is it possible to work with a txt file?

"txt file" is not a well-defined concept; such a file could very well be a csv file, see http://en.wikipedia.org/wiki/Comma-separated_values and ?read.csv
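For readers without GNU split, the same chunk-wise idea can probably also be done from within R by reading from an open connection a block of lines at a time. The sketch below is an alternative to the approach described in the thread, not what was actually used there; the demo file, the chunk size and the filter on x are placeholders (the real filter would be the inside.owin() test).

```r
# Small demo file so that the sketch is self-contained (made-up data).
demo.file <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:25, y = (1:25) / 10, my.val = rep(letters[1:5], 5)),
          file = demo.file, row.names = FALSE)

chunk.size <- 10                      # rows per chunk; far larger in real use
con <- file(demo.file, open = "rt")
header <- readLines(con, n = 1)       # keep the header line for every chunk
kept <- list()
repeat {
  lines <- readLines(con, n = chunk.size)
  if (length(lines) == 0) break       # end of file reached
  chunk <- read.csv(text = c(header, lines))
  # Placeholder filter: keep the interesting rows of this chunk only.
  kept[[length(kept) + 1]] <- chunk[chunk$x %% 2 == 0, ]
}
close(con)

result <- do.call(rbind, kept)
nrow(result)
```

Only one chunk plus the rows kept so far are in memory at any time, and every chunk is parsed with the same header.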
Re: [R] count how many row i have in a txt file in a directory
On Sun, Feb 26, 2012 at 09:39:46AM -0800, Rui Barradas wrote:
> Hello,
>
>> The first step, before creating a row-by-row loop, is to know how many rows there are in the txt file without loading it in R, to avoid memory problems. Does anyone know a specific function for this?
>
> I don't believe there's a specific function.

As stated, OP does not need to know the number of lines in the file to solve the problem. However, if you want to know that, I'd suggest the command wc rather than writing a function in R to accomplish this. wc is also part of GNU coreutils.

    $ wc -l foo.csv
    1138200 foo.csv

> If you want to know how many rows there are in a txt file, try this function.
>
>     numTextFileLines <- function(filename, header=FALSE, sep=",", nrows=5000){
>         tc <- file(filename, open="rt")
>         on.exit(close(tc))
>         if(header){
>             # cnames: column names (not used)
>             cnames <- read.table(file=tc, sep=sep, nrows=1, stringsAsFactors=FALSE)
>             # cnames <- as.character(cnames)
>         }
>         n <- 0
>         while(TRUE){
>             x <- tryCatch(read.table(file=tc, sep=sep, nrows=nrows), error=function(e) e)
>             if (any(grepl("no lines available", unclass(x)))) break
>             if(nrow(x) < nrows){
>                 n <- n + nrow(x)
>                 break
>             }
>             n <- n + nrows
>         }
>         n
>     }

But hey, programming R is fun, so why not?

--
Hans Ekbrand
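If only the number of lines is wanted and wc is not available, a simpler variant than parsing every block with read.table() might be to count raw lines with readLines(). This is a sketch; count.lines() and its block size are made-up names and values, not something from the thread.

```r
# Count the lines of a text file without holding the whole file in memory.
count.lines <- function(filename, block = 10000L) {
  con <- file(filename, open = "rt")
  on.exit(close(con))
  n <- 0
  repeat {
    got <- length(readLines(con, n = block))  # read at most 'block' lines
    if (got == 0) break                       # nothing left to read
    n <- n + got
  }
  n
}

# Demo on a temporary file (made-up content).
tmp <- tempfile()
writeLines(as.character(1:12345), tmp)
count.lines(tmp)
```

Like wc -l, this counts lines rather than data rows, so a header line, if present, is included in the count.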
[R] which is the fastest way to make data.frame out of a three-dimensional array?
    foo <- rnorm(30*34*12)
    dim(foo) <- c(30, 34, 12)

I want to make a data.frame out of this three-dimensional array. Each dimension will be a variable (column) in the data.frame.

I know how this can be done in a very slow way using for loops, like this:

    x <- rep(seq(from = 1, to = 30), 34)
    y <- as.vector(sapply(1:34, function(x) {rep(x, 30)}))
    month <- as.vector(sapply(1:12, function(x) {rep(x, 30*34)}))
    my.df <- data.frame(month, x=rep(x, 12), y=rep(y, 12), temp=rep(NA, 30*34*12))
    my.counter <- 1
    for(month in 1:12){
        for(i in 1:34){
            for(j in 1:30){
                my.df$temp[my.counter] <- foo[j,i,month]
                my.counter <- my.counter + 1
            }
        }
    }
    str(my.df)
    'data.frame': 12240 obs. of 4 variables:
     $ month: int 1 1 1 1 1 1 1 1 1 1 ...
     $ x    : int 1 2 3 4 5 6 7 8 9 10 ...
     $ y    : int 1 1 1 1 1 1 1 1 1 1 ...
     $ temp : num 0.673 -1.178 0.54 0.285 -1.153 ...

(In the real-world problem I had, the data were monthly measurements of temperature, and x, y were coordinates.)

Does anyone care to share a faster and less ugly solution?

TIA

--
Hans Ekbrand
Re: [R] which is the fastest way to make data.frame out of a three-dimensional array?
First, thank you both Bert and Petr for your excellent answers.

Bert's solution seems somewhat faster, and Petr's is - in my opinion at least - slightly more elegant.

    > foo <- rnorm(36 * 150 * 170)
    > dim(foo) <- c(36, 150, 170)
    > n <- dim(foo)
    > system.time(my.df <- data.frame(dat = as.vector(foo),
    +     dim1 = rep(seq_len(n[1]), n[2]*n[3]),
    +     dim2 = rep(rep(seq_len(n[2]), e=n[1]), n[3]),
    +     dim3 = rep(seq_len(n[3]), e = n[1]*n[2])))
       user  system elapsed
      0.932   0.156   1.090
    > system.time(my.df <- cbind(temp=c(foo), expand.grid(dim1=1:n[1], dim2=1:n[2], dim3=1:n[3])))
       user  system elapsed
      0.980   0.252   1.244

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
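Both variants rely on R storing arrays with the first index varying fastest. A quick way to convince oneself that the index columns line up with the values is to check a small array, as in this sketch (the tiny array is made up for the check).

```r
# A small array whose values are easy to trace.
foo <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))
n <- dim(foo)

# expand.grid() varies its first factor fastest, just like c(foo) does.
my.df <- cbind(temp = c(foo),
               expand.grid(dim1 = seq_len(n[1]),
                           dim2 = seq_len(n[2]),
                           dim3 = seq_len(n[3])))

# Look each row up again in the array via its three indices.
idx <- as.matrix(my.df[, c("dim1", "dim2", "dim3")])
all(my.df$temp == foo[idx])
```

Indexing an array with a three-column matrix picks one element per row, so the comparison covers every cell.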
Re: [R] Behaviour of 'source' with URLs and proxy
On Wed, Oct 05, 2011 at 12:44:12PM +0200, Renaud Gaujoux wrote:
> Is source supposed to work through a proxy?

This worked for me:

    > Sys.setenv(http_proxy="http://192.168.0.252:8118")
    > source("http://pc5.socio.gu.se:84/enkel-kurva.r", echo = T)

    > my.vectory = c(1,30,2,3,3,4)

    > my.vectorx = c(1,2,3,4,5,6)

    > plot(y = my.vectory, x = my.vectorx, type = "l")
Re: [R] Synchronizing R libraries on N machines?
On Thu, Aug 25, 2011 at 08:25:02AM -0500, Giovanni Petris wrote:
> Hello!
>
> I am using R on two different machines (under Ubuntu and OS X, but this is probably irrelevant) and I would like to keep the two installations 'synchronized', in particular in terms of installed packages. For example, if I install package xxx on my Linux machine, I would like to find it installed also on my Mac, and vice versa.
>
> I imagine this to be a fairly common problem, so I would like to ask if anybody has suggestions to share about it. Is there a way to make the synchronization automatic? Painless?

I have a number of machines in a home LAN that share /usr/local, where I keep all R packages except the few that are automatically installed by the OS package-management system (by installing the meta package r-recommended). I have the following snippet in my .Rprofile

    lib.loc = "/usr/local/lib/R/site-library/"

so whenever a package is installed, all machines have access to it.

This will of course not work if the machines are running different OSes, so that is not irrelevant.
[R] lavaan: how to analyse residuals of a latent variable
Hi r-help,

I use lavaan::sem() for structural equation modelling with latent variables. Below is a reproducible example (the code requires a working installation of lavaan) where the latent variable criminality is in focus. Besides criminality in general, I am specifically interested in one of the manifest variables that make up the latent variable criminality, namely fire.setting.

My question is: how can I analyse the part of the variation in fire.setting that is not included in the latent variable criminality? Ideally I would want a new variable that captures just this. Then I could model regressions with this variable as the dependent variable.

As far as I understand the output of summary() - of which I have reproduced a few lines - about half (0.499) of the variation in fire.setting is included in the latent variable criminality.

                       Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
    Latent variables:
      criminality =~
        fire.setting      0.324    0.007   48.223    0.000    0.189    0.499

I would like to analyse the other half of fire.setting, so to speak.

    my.model <- '
    ## Measurement model (definitions of the latent variables)
    priviledged.parents =~ nr.parents.employed + parental.housing
    school.adaption =~ enjoying.school + good.teachers + good.grades.important
    school.grades =~ grade.language + grade.english + grade.craft + grade.math +
        grade.chemistry + grade.arts + grade.sports
    criminality =~ vandalism + illegal.grafitti + shop.lifting + theft.from.automat +
        theft.from.school + theft.of.bicycle + theft.of.moped + theft.of.car +
        theft.from.car + theft.pick.pocket + burglary + buying.stolen.goods +
        selling.stolen.goods + wearing.knife + robbery + fire.setting +
        abuse.unknown.persons + abuse.family.members + used.knife + drugs.cannabis +
        drugs.other + drugs.thinner + drugs.steroids + selling.drugs.cannabis +
        selling.drugs.other

    ## Regressions
    priviledged.parents ~ parental.migration + parental.class
    school.adaption ~ parental.migration + parental.class + sex.girl + priviledged.parents
    school.grades ~ parental.migration + parental.class + sex.girl + priviledged.parents
    criminality ~ parental.migration + parental.class + sex.girl + priviledged.parents +
        school.adaption + school.grades
    '

    library(lavaan)
    con <- url("http://code.cjb.net/temp/lavaan.temp.RData")
    print(load(con))
    close(con)
    my.fit <- sem(my.model, data = my.crim.set)
    summary(my.fit, fit.measures = T, standardized = T)

--
Hans Ekbrand
Department of Sociology
University of Gothenburg
Sweden
[R] importing spss-files [ was: Re: need your consult]
On Tue, Aug 09, 2011 at 02:28:22AM -0700, Mehrshad Koleini wrote:
> Dear Sir/Madam
>
> Hi. I am a general paediatrician, and I have read *some* chapters of the following books(1-3). I think SPSS lacks some features that may be important in data analysis (for example: interval of correlation coefficient in bivariate normal distribution, PRESS, and MSPR in cross-validation). I am thinking about changing from SPSS to R:
>
> 1. SPSS is very expensive for me to update.
> 2. My colleagues use SPSS, but I think data can be exchanged between SPSS and R, is this true?

Yes, but the data must be converted, which is not an entirely seamless process; there might be quirks to be handled manually.

To import data from an SPSS file to R, read http://cran.r-project.org/doc/manuals/R-data.html and http://cran.r-project.org/web/packages/foreign/foreign.pdf

Basically, it can be as simple as

    library(foreign)
    foo <- read.spss(file = "data_set.sav")

Now your data is in the object foo, which can be inspected with the function str():

    str(foo)

--
Hans Ekbrand
Re: [R] lavaan: how to analyse residuals of a latent variable
On Tue, Aug 09, 2011 at 01:49:13PM +0200, yrosseel wrote:
>> My question is: how can I analyse the part of the variation in fire.setting that is not included in the latent variable criminality? Ideally I would want a new variable that captures just this. Then I could model regressions with this variable as the dependent variable.
>
> You can add a regression line to your model syntax with 'fire.setting' as the dependent variable:
>
>     fire.setting ~ x1 + x2 + x3
>
> where x1-x3 are additional predictors that might influence the variable 'fire.setting'.

Can I include criminality among those and thereby get the common part of criminality and fire.setting out of the way?

I tried adding the following regression formula:

    fire.setting ~ parental.migration + parental.class + sex.girl + priviledged.parents +
        school.adaption + school.grades + criminality

but I got:

    Error in solve.default(E) :
      Lapack routine dgesv: system is exactly singular
    [lavaan message:] could not compute standard errors!
    You can still request a summary of the fit to inspect
    the current estimates of the parameters.

However, the fit object has regression estimates in which criminality seems to have about the size I would have expected, given the covariation of fire.setting and criminality.

                       Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
      fire.setting ~
        parental.migr     0.001                               0.001    0.003
        parental.clas    -0.000                              -0.000   -0.000
        sex.girl         -0.015                              -0.015   -0.019
        priviledged.p     0.066                               0.015    0.039
        school.adapti     0.004                               0.002    0.005
        school.grades    -0.012                              -0.010   -0.026
        criminality       0.327                               0.191    0.505

Are the other estimates reasonable estimates of the part of the variation in fire.setting that does not covary with criminality?

--
Hans Ekbrand
[R] [Solved] Re: lavaan: how to analyse residuals of a latent variable
On Tue, Aug 09, 2011 at 03:30:17PM +0200, yrosseel wrote:
>> Can I include criminality among those and thereby get the common part of criminality and fire.setting out of the way?
>
> No. You already regress fire.setting on criminality since it is an indicator in the measurement model of criminality. In other words, the 'criminality' part is already regressed out.

So, I get just what I want by simply regressing on fire.setting; that is awesome!

Maybe this kind of usage of lavaan is not very common, but in order to help others in my situation, is this documented somewhere? My understanding of latent variable analysis is indeed limited, but I did not understand that lavaan worked like this when I read the documentation.

Kind regards,
Hans Ekbrand
[R] confint.multinom() slow?
Dear R-helpers,

I'm doing a bivariate analysis with two factors, both with relatively many levels:

1. clustering, a factor with 35 levels
2. country, a factor with 24 levels

n = 12,855

    my.fit <- multinom(clustering ~ country, maxit=300)

converges after 280 iterations. I would like to get CIs for the odds ratios, and have tried confint()

    my.cis <- confint(my.fit)

I started confint() a few hours ago, but now I'm getting suspicious, since it hasn't terminated yet. Perhaps I just lack reasonable patience, but is such a long computation time for confint() to be expected here?

Hans Ekbrand
[R] Cluster analysis, factor variables, large data set
Dear R helpers,

I have a large data set with 36 variables and about 50.000 cases. The variables represent labour market status during 36 months; there are 8 different variable values (e.g. Full-time Employment, Student, ...).

Only cases with at least one change in labour market status are included in the data set.

To analyse subsets of the data, I have used daisy in the cluster package to create a distance matrix and then used pam (or pamk in the fpc package) to get a k-medoids cluster solution. Now I want to analyse the whole set.

clara is said to cope with large data sets, but the first step in the cluster analysis, the creation of the distance matrix, must be done by another function since clara only works with numeric data.

Is there an alternative to the daisy - clara route that does not require as much RAM? What functions would you recommend for a cluster analysis of this kind of data on a large data set?

regards,
Hans Ekbrand
Re: [R] Cluster analysis, factor variables, large data set
On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
> Dear Hans,
>
> clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy), it will work with the raw data matrix using Euclidean distances implicitly. I can't tell you whether Euclidean distances are appropriate in this situation (this depends on the interpretation and variables and particularly on how they are scaled), but they may be fine at least after some transformation and standardisation of your variables.

The variables are unordered factors, stored as integers 1:9, where

1 means Full-time employment
2 means Part-time employment
3 means Student
4 means Full-time self-employee
...

Do Euclidean distances make sense on unordered factors coded as integers?
Re: [R] Cluster analysis, factor variables, large data set
On Thu, Mar 31, 2011 at 08:48:02PM +0200, Hans Ekbrand wrote:
> On Thu, Mar 31, 2011 at 07:06:31PM +0100, Christian Hennig wrote:
>> Dear Hans,
>>
>> clara doesn't require a distance matrix as input (and therefore doesn't require you to run daisy), it will work with the raw data matrix using Euclidean distances implicitly. I can't tell you whether Euclidean distances are appropriate in this situation (this depends on the interpretation and variables and particularly on how they are scaled), but they may be fine at least after some transformation and standardisation of your variables.
>
> The variables are unordered factors, stored as integers 1:9, where
>
> 1 means Full-time employment
> 2 means Part-time employment
> 3 means Student
> 4 means Full-time self-employee
> ...
>
> Do Euclidean distances make sense on unordered factors coded as integers?

To be clear, here is an extract

    > my.df.full[900:910, 16:19]
        PL210F.first.year PL210G.first.year PL210H.first.year PL210I.first.year
    900                 2                 2                 1                 2
    901                 1                 1                 1                 1
    902                 1                 1                 1                 1
    903                 2                 2                 2                 2
    904                 1                 1                 1                 1
    905                 2                 2                 2                 2
    906                 7                 8                 2                 7
    907                 5                 5                 5                 5
    908                 1                 1                 1                 1
    909                 1                 1                 1                 1
    910                 1                 1                 1                 1
    > class(my.df.full[,16])
    [1] "integer"
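The worry behind the question can be illustrated with a toy sketch: when the integer codes are treated as numbers, the distance between two statuses depends on how the labels happen to be numbered. The three "persons" below are made up, and the simple mismatch proportion is only one possible dissimilarity for nominal data, used here as an assumption for comparison.

```r
# Three hypothetical persons, four months each. The codes are labels, not quantities.
status <- rbind(p1 = c(1, 1, 1, 1),
                p2 = c(2, 2, 2, 2),
                p3 = c(9, 9, 9, 9))

# Euclidean distance on the raw codes: p3 looks eight times further from p1 than p2 does.
euclid <- as.matrix(dist(status))

# Proportion of months in which two persons have different statuses.
mismatch <- function(i, j) mean(status[i, ] != status[j, ])
simple <- outer(seq_len(nrow(status)), seq_len(nrow(status)), Vectorize(mismatch))
dimnames(simple) <- dimnames(euclid)

euclid
simple
```

With the mismatch measure all three persons are equally far apart, which matches the fact that they never share a status; the Euclidean result is an artefact of the numbering.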
Re: [R] how to make list() return a list of *named* elements
On Thu, Sep 30, 2010 at 09:10:16AM -0400, Gabor Grothendieck wrote:
> A data frame is a list in which every component (i.e. every column) must have the same length (i.e. the same number of rows). data.frame() does preserve names:
>
>     > data.frame(b, my.c)
>          b my.c
>     1 22.4    8
>     2 12.2    9
>     3 10.9   15
>     4  8.5    1
>     5  9.2   14

Thanks for your suggestion. However, the reason I used list() was that the different vectors to return usually have different lengths. Admittedly, I should have used another example that made this explicit.

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
GnuPG key: 1024D/7050614E
Fingerprint: 1408 C8D5 1E7D 4C9C C27E 014F 7C2C 872A 7050 614E
Learn about secure email at http://www.gnupg.org
Re: [R] how to make list() return a list of *named* elements
On Thu, Sep 30, 2010 at 09:34:26AM -0300, Henrique Dallazuanna wrote:
> You should try:
>
>     eapply(.GlobalEnv, I)[c('b', 'my.c')]

Great!

    b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
    my.c <- sample.int(round(2*mean(b)), 4)
    my.return <- function (vector.of.variable.names) {
        eapply(.GlobalEnv, I)[vector.of.variable.names]
    }
    str(my.return(c("b","my.c")))
    List of 2
     $ b   :Class 'AsIs' num [1:5] 22.4 12.2 10.9 8.5 9.2
     $ my.c:Class 'AsIs' int [1:4] 18 22 12 3

much nicer than list(b=b, my.c=my.c), especially in real cases with longer variable names and a lot of variables to return.

Thanks Henrique!

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Q. What is that strange attachment in this mail?
A. My digital signature, see www.gnupg.org for info on how you could use it to ensure that this mail is from me and has not been altered on the way to you.
Re: [R] how to make list() return a list of *named* elements
On Mon, Oct 04, 2010 at 07:45:23PM +0800, Berwin A Turlach wrote:
> G'day Hans,
>
> On Mon, 4 Oct 2010 11:28:15 +0200 Hans Ekbrand h...@sociologi.cjb.net wrote:
>> On Thu, Sep 30, 2010 at 09:34:26AM -0300, Henrique Dallazuanna wrote:
>>> You should try:
>>>
>>>     eapply(.GlobalEnv, I)[c('b', 'my.c')]
>>
>> Great!
>>
>>     b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
>>     my.c <- sample.int(round(2*mean(b)), 4)
>>     my.return <- function (vector.of.variable.names) {
>>         eapply(.GlobalEnv, I)[vector.of.variable.names]
>>     }
>
> Well, if you are willing to create a vector with the variable names, then simpler solutions should be possible, i.e. solutions that only operate on the objects of interest and not on all objects in the global environment (which could be a lot depending on your style).

Actually, what made me want this list-like function was coding the return() of the interesting results from a calculation function to what I imagine is the global environment (I have only a vague concept of that though). So, in the global environment there are very few objects, while there are more objects in the function where this list-like function will be used.

Your solution does look way cleaner than falling back on hidden stuff such as .GlobalEnv, so I will definitely use it. In addition, the returned list is not of a strange class as in Henrique's example.

Thanks,

--
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
GPG Fingerprint: 1408 C8D5 1E7D 4C9C C27E 014F 7C2C 872A 7050 614E
Re: [R] how to make list() return a list of *named* elements
On Mon, Oct 04, 2010 at 07:51:10PM +0800, Berwin A Turlach wrote:
> R> my.return <- function (vector.of.variable.names) {
> +    sapply(vector.of.variable.names, function(x) get(x))
> + }

Even better :-)

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] how to make list() return a list of *named* elements
On Mon, Oct 04, 2010 at 10:07:06AM -0400, Gabor Grothendieck wrote:
> Some small tweaks. If you use simplify=FALSE then it will guarantee that a list is returned:
>
> sapply(my.names, get, simplify = FALSE)
>
> for example, compare the outputs of:
>
> sapply(c("letters", "LETTERS"), get)
> sapply(c("letters", "LETTERS"), get, simplify = FALSE)

Thanks Gabor,

But get() fails to find my objects, though get() successfully finds letters and LETTERS (but they are part of the global environment, I assume).

  a <- c("not", "this")
  b <- c(0,0)
  my.test.function <- function() {
    a <- 1:3
    b <- c("x", "y")
    sapply(c("a", "b"), get, simplify = FALSE)
  }
  my.test.function()
  $a
  [1] "not"  "this"

  $b
  [1] 0 0

  rm(a,b)
  my.test.function <- function() {
    a <- 1:3
    b <- c("x", "y")
    sapply(c("a", "b"), get, simplify = FALSE)
  }
  my.test.function()
  Error in FUN(c("a", "b")[[1L]], ...) : object 'a' not found

If get() is what should be used, then how do you get it to find objects in the environment of the function?

I would prefer to write a special my.return() and have get() work there; perhaps some deep R magic is needed. Something like this is what I aim for:

  my.return <- function(my.names) {
    sapply(my.names, get, simplify = FALSE)
  }

  my.function <- function() {
    ... summarize data ...
    my.return(summary.one, summary.two ...)
  }

Kind regards,
Hans
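The lookup problem described above can be sketched as follows. This is only one possible approach, assuming it is acceptable to pass the variable names as strings: mget() looks the names up in an explicitly given environment, and parent.frame() as a default argument refers to the function that called my.return(). The env argument and the toy summaries are illustrative assumptions, not something from the thread.

```r
# A minimal sketch: look the names up in the caller's frame rather than
# in whatever frame get() happens to be called from inside sapply().
my.return <- function(my.names, env = parent.frame()) {
  # env is an assumed extra argument; by default it is the calling function
  mget(my.names, envir = env)
}

my.function <- function() {
  summary.one <- 1:3              # toy stand-ins for real summaries
  summary.two <- c("x", "y")
  my.return(c("summary.one", "summary.two"))
}

res <- my.function()
str(res)
```

mget() always returns a named list, so no simplify = FALSE is needed, and nothing in the global environment is touched.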
Re: [R] how to make list() return a list of *named* elements
On Mon, Oct 04, 2010 at 04:10:46PM +0200, Christophe Pallier wrote:
> See llist from Hmisc package:
>
> library(Hmisc)
> a=rnorm(10)
> b=rnorm(5)
> llist(a,b)

Ah, that seems like what I want!

My old, ugly and redundant code looked like this:

  list(utdrag=utdrag,
       n.fires.at.sites.with.one.or.more.fires=n.fires.at.sites.with.one.or.more.fires,
       more.than.one.fire.in.these.clusters=more.than.one.fire.in.these.clusters,
       n.fires.in.clusters.with.more.than.one.fire=n.fires.in.clusters.with.more.than.one.fire,
       n.fires.in.top.ten.clusters=n.fires.in.top.ten.clusters,
       m=m,
       översta.fem.procenten=översta.fem.procenten,
       these.cluster.have.alot.of.members=these.cluster.have.alot.of.members,
       n.fires.in.hotspots=n.fires.in.hotspots,
       prop.concentrated.fires=prop.concentrated.fires,
       d=d,
       real.vector=real.vector,
       the.real=the.real,
       censored.real=censored.real,
       half.of.effect.at=half.of.effect.at)

The new, nice-looking code looks like this:

  llist(utdrag, n.fires.at.sites.with.one.or.more.fires,
        more.than.one.fire.in.these.clusters,
        n.fires.in.clusters.with.more.than.one.fire,
        n.fires.in.top.ten.clusters, m, översta.fem.procenten,
        these.cluster.have.alot.of.members, n.fires.in.hotspots,
        prop.concentrated.fires, d, real.vector, the.real,
        censored.real, half.of.effect.at)

Thank you all r-helpers!
[R] how to make list() return a list of *named* elements
If I combine elements into a list

  b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
  my.c <- sample.int(round(2*mean(b)), 5)
  my.list <- list(b, my.c)

the names of the elements seem to get lost in the process:

  str(my.list)
  List of 2
   $ : num [1:5] 22.4 12.2 10.9 8.5 9.2
   $ : int [1:5] 11 8 6 9 20

If I explicitly name the elements at list-creation, I get what I want:

  my.list <- list(b=b, my.c=my.c)
  str(my.list)
  List of 2
   $ b   : num [1:5] 22.4 12.2 10.9 8.5 9.2
   $ my.c: int [1:5] 11 8 6 9 20

Now, is there a way to get list() (or some other function) to automatically name the elements? I often use list() in return(), and I am getting tired of having to repeat myself.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
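For readers who want to see how automatic naming can work in base R, here is a hedged sketch. The function name named.list is hypothetical (it is not a base R or Hmisc function); it captures the unevaluated arguments with substitute() and uses their deparsed text as names. It assumes the arguments are plain variable names.

```r
# Hypothetical helper: build a list whose element names are taken from
# the expressions used in the call, e.g. named.list(b, my.c).
named.list <- function(...) {
  values <- list(...)
  exprs <- as.list(substitute(list(...)))[-1L]   # drop the leading `list`
  names(values) <- vapply(exprs, function(e) deparse(e)[1], character(1))
  values
}

b <- c(22.4, 12.2, 10.9, 8.5, 9.2)
my.c <- c(11L, 8L, 6L, 9L, 20L)
my.list <- named.list(b, my.c)
str(my.list)
```

For long or complex expressions the deparsed name gets unwieldy, so this is mainly useful for the return(list(...)) case described above.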
Re: [R] getting random integers
On Thu, Apr 29, 2010 at 12:43:42PM -0400, Sarah Goslee wrote:
> You can always take a look. If you use a much bigger sample size it will be obvious:
>
> hist(round(runif(100, min = 1, max = 10)))

Thanks for this advice; apparently 1 and 10 did not have the same chance of being selected.

> I'd use instead:
>
> hist(sample(1:10, 100, replace=TRUE))

sample() is what I want, thank you.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
[R] getting random integers
I want 100 integers. Each integer, x, can be in the range 1 <= x <= 10. Does the following code give 1 and 10 the same chance of being selected as 2:9?

  round(runif(100, min = 1, max = 10))

-- 
Hans Ekbrand
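A quick simulation makes the difference visible. With round(runif(n, 1, 10)), the value 1 only collects draws from [1, 1.5) and 10 only from [9.5, 10], so each endpoint should come up about half as often as an interior value, while sample() treats all ten values alike. The sample size and seed below are arbitrary choices for the sketch.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n <- 100000      # large n so the pattern is clear

rounded <- table(factor(round(runif(n, min = 1, max = 10)), levels = 1:10))
sampled <- table(factor(sample(1:10, n, replace = TRUE), levels = 1:10))

# Relative frequencies: endpoints near 1/18 for the rounding approach,
# every value near 1/10 for sample()
print(round(rounded / n, 3))
print(round(sampled / n, 3))
```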
[R] how to word-wrap text in labels in plots?
  c <- structure(c(2L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 3L, 2L, 5L),
                 .Label = c("foo", "bar",
                            "a really really long variable label mostly here to show the need of word-wrapping text in labels",
                            "a not so important value", "baz"),
                 class = "factor")
  plot(c)

Is there a way to get the long variable labels to automatically wrap so that all labels can be shown?

Alternatively, is there a way to get the labels truncated, possibly with ".." appended?

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
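One way to approach both variants of the question with base R is sketched below. The helper names wrap.labels and shorten.labels are hypothetical, and the widths are arbitrary; strwrap() does the wrapping and substr() the truncation.

```r
# Hypothetical helpers, not part of base R.
wrap.labels <- function(x, width = 20) {
  # break each label into lines shorter than `width`, joined by newlines
  vapply(x, function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1), USE.NAMES = FALSE)
}

shorten.labels <- function(x, max.chars = 15) {
  # keep the first max.chars characters and append ".." if anything was cut
  ifelse(nchar(x) > max.chars, paste0(substr(x, 1, max.chars), ".."), x)
}

f <- factor(c("foo", "a really really long variable label", "baz"))
wrapped <- wrap.labels(levels(f))
short <- shorten.labels(levels(f))
cat(wrapped, sep = "\n---\n")
print(short)
```

The wrapped labels could then be passed to a plotting call, for example barplot(table(f), names.arg = wrap.labels(levels(f))), possibly with larger bottom margins set through par(mar = ...).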
Re: [R] how to word-wrap text in labels in plots?
Thanks to Jim and Eik! I really appreciate your help, and I think I can use your suggestions and perhaps write a wrapper for plot that integrates them.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] deleting rows provisionally
On Fri, Apr 24, 2009 at 04:50:48AM -0700, onyourmark wrote:
> Hi. Thanks very much for the reply and the good suggestion. It works well. But I don't get why the for loop is not deleting anything or making any assignments? Or I should say, doesn't answer3[-i,] delete entries from answer3 when the if condition is true?

Your for loop was:

  for(i in 1:1537){if(answer2[i,1]==answer2[i,2]){answer3[-i,]}}

No, answer3[-i,] does not remove row i from answer3; it returns an anonymous temporary object which is identical to answer3 without row i. Since that object is not saved, it is discarded when the loop enters the next iteration.

To actually *modify* answer3 you can use:

  answer3 <- answer3[-i,]

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
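Note that reassigning inside the loop shifts the row positions after every deletion, so later values of i no longer point at the intended rows. A loop-free sketch, using small made-up stand-ins for answer2 and answer3 (the real data are not shown in the thread), selects all rows to keep in one step:

```r
# Toy stand-ins for the poster's objects; the column names are assumptions.
answer2 <- data.frame(first = c(1, 2, 3, 4), second = c(1, 5, 3, 7))
answer3 <- data.frame(id = 1:4, value = c("a", "b", "c", "d"))

# TRUE for rows where the two columns of answer2 differ
keep <- answer2[, 1] != answer2[, 2]

# Assign the subset back; without the assignment nothing changes
answer3 <- answer3[keep, , drop = FALSE]
print(answer3)
```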
Re: [R] stata == R - error messages
On Fri, Apr 24, 2009 at 01:50:04PM +0200, Rob Bakker wrote:
> Dear Peter,
> Also thank you for your quick reply. I did the following with no positive result:
>
> library(foreign)
> read.dta(choose.file(C:\Rklein))

a) quote the filename
b) include the suffix
c) use forward slashes (or doubled backslashes) in the path, since a single backslash starts an escape sequence in an R string

  rklein <- read.dta("C:/Rklein.dta")

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] Student
On Wed, Apr 08, 2009 at 10:02:10AM +0200, alberto cassese wrote:
> Hi, I have a problem. In the functions below (test and test2) I want the function test not to print the variable data, but I want the function test2 to use the variable test$data.
>
> This is the creation of the variable data:
>
> matrice=c(1:10)
> matrice=matrix(matrice,nrow=5,ncol=2)
>
> This is the function test:
>
> test=function(data){
> + return(list(x=5,data=data))
> + }
>
> This is the function test2:
>
> test2=function(list){
> + bodri=list$data
> + bodri[1,2]=bodri[2,2]+1
> + return(bodri)
> + }
>
> Below are the results:
>
> uno=test(matrice)
> due=test2(uno)
> uno
> $x
> [1] 5
>
> $data
>      [,1] [,2]
> [1,]    1    6
> [2,]    2    7
> [3,]    3    8
> [4,]    4    9
> [5,]    5   10
>
> due
>      [,1] [,2]
> [1,]    1    8
> [2,]    2    7
> [3,]    3    8
> [4,]    4    9
> [5,]    5   10
>
> What I want is:
>
> uno=test(matrice)
> due=test2(uno)
> uno
> $x
> [1] 5

x is a variable, 5 is variable data and you don't want variable data printed?

> due
>      [,1] [,2]
> [1,]    1    8
> [2,]    2    7
> [3,]    3    8
> [4,]    4    9
> [5,]    5   10

Use uno[1], either directly or by creating a third variable from uno[1]:

  one.and.a.half <- uno[1]
  one.and.a.half
  $x
  [1] 5

Or, if you *really* want that printed output from test(matrice), create a class for your list object, and add a special print method that will only print the first item of the list.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
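The print-method idea can be sketched with an S3 class. The class name "quietlist" is made up for this illustration; the rest follows the functions quoted above. The print method shows only the first element, while the data element stays in the object for test2() to use.

```r
# "quietlist" is a hypothetical class name chosen for this sketch.
test <- function(data) {
  structure(list(x = 5, data = data), class = "quietlist")
}

# S3 print method: show only the first element, keep everything stored
print.quietlist <- function(x, ...) {
  print(unclass(x)[1])
  invisible(x)
}

test2 <- function(list) {
  bodri <- list$data
  bodri[1, 2] <- bodri[2, 2] + 1
  bodri
}

matrice <- matrix(1:10, nrow = 5, ncol = 2)
uno <- test(matrice)
due <- test2(uno)
uno    # autoprints via print.quietlist, so only $x is shown
due
```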
[R] read.spss, locale and encodings
I must be missing something obvious here:

According to the help page for read.spss, the reencode option is only active when R is run under a UTF-8 locale.

read.spss can only import the SPSS file when run under an iso88591(5) locale; under a UTF-8 locale I get:

  Error in read.spss("wo.sav") : error reading system-file header
  In addition: Warning message:
  In read.spss("wo.sav") :
    wo.sav: position 143: Variable name begins with invalid character

This is under Debian GNU/Linux, the stable release. foreign is version 8.27.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 03:03:06PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
> > I must be missing something obvious here:
> > According to the help page for read.spss, the reencode option is only active when R is run under a UTF-8 locale.
>
> Not in my version:
>
> reencode: logical: should character strings be re-encoded to the current locale. The default, 'NA', means to do so in a UTF-8 locale, only. Alternatively character, specifying an encoding to assume.

OK, thanks for that correction, but the problem isn't solved, since read.spss fails, see below. When read.spss succeeds, the option is not useful, since then the current locale is iso88591(5).

> So, does it help with reencode="Latin1"? Presumably this comes from assuming UTF-8 when it isn't.

  Sys.getlocale()
  [1] "LC_CTYPE=sv_SE.UTF-8;LC_NUMERIC=C;LC_TIME=sv_SE.UTF-8;LC_COLLATE=sv_SE.UTF-8;LC_MONETARY=sv_SE.UTF-8;LC_MESSAGES=sv_SE.utf8;LC_PAPER=sv_SE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=sv_SE.utf8;LC_IDENTIFICATION=C"

  test <- read.spss("wo.sav", to.data.frame=TRUE, reencode="Latin1")
  Error in read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
    error reading system-file header
  In addition: Warning message:
  In read.spss("wo.sav", to.data.frame = TRUE, reencode = "Latin1") :
    wo.sav: position 143: Variable name begins with invalid character

Using another version of the dataset, where I have successfully encoded the names to UTF-8, here is the problematic variable name:

  names(Workorientation.2005.Swe)[143]
  [1] "KÖN1"

> 8.34 is used in the current prerelease. AFAIR, some issues with encodings were fixed recently.

Is someone running foreign 8.34 willing to test my SPSS file?

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 04:17:51PM +0200, Peter Dalgaard wrote:
> Hans Ekbrand wrote:
> > Someone running foreign 8.34 that is willing to test my SPSS-file?
>
> Someone with an SPSS file problem willing to help test the prereleases? :-)

http://sociologi.cjb.net/temp/test.sav

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] read.spss, locale and encodings
On Wed, Apr 08, 2009 at 07:12:23PM +0200, Peter Dalgaard wrote:
> Apparently, you can work around it like this
>
> lc <- Sys.setlocale("LC_CTYPE")
> Sys.setlocale("LC_CTYPE", "da_DK")
> x <- read.spss("~/Desktop/downloads/test.sav", reencode = "latin1")
> Sys.setlocale("LC_CTYPE", lc)
>
> -- which doesn't strike me as particularly logical, but whatever works

THANKS a lot Peter! This works perfectly! I had been struggling with this problem way too long...

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
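The switch-read-restore pattern in that workaround can be wrapped so that the original locale is restored even if the read fails. This is only a sketch: the function name in.ctype.locale is hypothetical, and it records the old setting with Sys.getlocale() rather than with the Sys.setlocale() call quoted above. Whether a given locale such as "da_DK" exists depends on the system.

```r
# Hypothetical wrapper: evaluate `expr` under a temporary LC_CTYPE locale.
in.ctype.locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)  # always restore
  Sys.setlocale("LC_CTYPE", locale)
  force(expr)   # expr is evaluated lazily, i.e. only after the switch
}

# Demonstration with the always-available "C" locale
before <- Sys.getlocale("LC_CTYPE")
inside <- in.ctype.locale("C", Sys.getlocale("LC_CTYPE"))
after <- Sys.getlocale("LC_CTYPE")
```

With the foreign package loaded, the call from the thread would then look like in.ctype.locale("da_DK", read.spss("test.sav", reencode = "latin1")).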
[R] Converting a whole dataframe (including attributes) from latin1 to UTF-8
Hi list!

Short version: How do I convert a whole data.frame from latin1 encoding to utf8?

I get SPSS files with latin1 encoding. My OS is GNU/Linux and the locale sv_SE.utf8, and I normally interface R with Emacs/ESS.

I have used the following hack to convert a data.frame in latin1 to utf8:

  Sys.setlocale(category = "LC_ALL", locale = "sv_SE.iso88591")
  foo <- read.spss("foo.sav", to.data.frame=TRUE)
  write.table(foo, "foo.data")

  $ recode lat1..utf8 foo.data

  Sys.setlocale(category = "LC_ALL", locale = "sv_SE.utf8")
  foo <- read.table("foo.data")

I have now found two problems with this approach:

a) variable.labels is dropped
b) the order of unordered factors is changed

I had just worked out a hack for a) when I realised b). b) is a problem when the factors really are ordered, but not recognized as such by read.spss (and/or not defined as such in SPSS, but since SPSS respects the numeric values of the factors anyway, users don't need to).

Rather than hack around b) too, I wonder if anyone on the list knows how to convert a whole data.frame from latin1 encoding to utf8?

TIA

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
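One possible in-memory approach, sketched under the assumption that the strings really are latin1 and that the labels live in a "variable.labels" attribute (as read.spss stores them): convert factor levels, character columns, column names and the label attribute with iconv(), leaving the factor codes, and hence the level order, untouched. The function name to.utf8 is hypothetical.

```r
# Hypothetical helper: re-encode the text parts of a data frame in place.
to.utf8 <- function(df, from = "latin1") {
  conv <- function(x) iconv(x, from = from, to = "UTF-8")
  labels <- attr(df, "variable.labels")
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      levels(df[[j]]) <- conv(levels(df[[j]]))   # order of levels is kept
    } else if (is.character(df[[j]])) {
      df[[j]] <- conv(df[[j]])
    }
  }
  names(df) <- conv(names(df))
  if (!is.null(labels)) {
    if (!is.null(names(labels))) names(labels) <- conv(names(labels))
    labels[] <- conv(labels)
    attr(df, "variable.labels") <- labels
  }
  df
}

# Small made-up example with latin1-marked strings
lev <- c("b\xe4", "a\xf6")
Encoding(lev) <- "latin1"
d <- data.frame(v = factor(lev[c(1, 2, 1)], levels = lev))
attr(d, "variable.labels") <- c(v = lev[1])
out <- to.utf8(d)
Encoding(levels(out$v))
```

Because nothing is written to disk and read back, the two problems listed above (lost labels, reordered levels) should not arise, though this has only been tried here on a toy example.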
Re: [R] PCA and categorical data
On Fri, Mar 06, 2009 at 09:46:17AM -, Ted Harding wrote:
> On 06-Mar-09 09:25:26, Prof Brian Ripley wrote:
> You might want to look into correspondence analysis, which has several variants of PCA designed for categorical data. In particular, have a look at the results of
>
> RSiteSearch("correspondence")

I can recommend the packages ca and FactoMineR:

http://cran.r-project.org/web/packages/ca/index.html
http://cran.r-project.org/web/packages/FactoMineR/index.html
http://www.jstatsoft.org/v20/i03
http://www.jstatsoft.org/v25/i01

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
[R] frequency table for multiple variables
Hi r-help!

Consider the following data-frame:

    var1 var2 var3
  1    3    1    4
  2    2    2    3
  3    2    2    3
  4    4    4   NA
  5    4    3    5
  6    2    2    3
  7    3    4    3

How can I get R to convert this into the following?

  Value 1 2 3 4 5
  var1  0 3 2 2 0
  var2  1 3 1 2 0
  var3  0 0 4 1 1

TIA,

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
Re: [R] frequency table for multiple variables
On Tue, Feb 17, 2009 at 10:00:40AM -0600, Marc Schwartz wrote:
> on 02/17/2009 09:06 AM Hans Ekbrand wrote:
> > Hi r-help!
> >
> > Consider the following data-frame:
> >
> >   var1 var2 var3
> > 1    3    1    4
> > 2    2    2    3
> > 3    2    2    3
> > 4    4    4   NA
> > 5    4    3    5
> > 6    2    2    3
> > 7    3    4    3
> >
> > How can I get R to convert this into the following?
> >
> > Value 1 2 3 4 5
> > var1  0 3 2 2 0
> > var2  1 3 1 2 0
> > var3  0 0 4 1 1
>
> t(sapply(DF, function(x) table(factor(x, levels = 1:5))))
>      1 2 3 4 5
> var1 0 3 2 2 0
> var2 1 3 1 2 0
> var3 0 0 4 1 1
>
> The key is to turn each column into a factor with explicitly defined common levels for tabulation. This enables the table result to have a consistent format across each column, allowing for a matrix to be created, rather than a list.

Thanks a lot, Marc. Neat and efficient, just what I wanted.

BTW, before I saw that you actually included code, I tried on my own, and wrote this:

  my.count <- function(data.frame, levels) {
    result.df <- data.frame(matrix(nrow=length(data.frame),ncol=levels))
    for (i in 1:length(data.frame)) {
      result.df[i,] <- table(factor(data.frame[[i]], levels = c(1:levels)))
    }
    result.df
  }

which produces the same result. I take this to be an instructive example of unnecessary use of for-loops in R.

-- 
Hans Ekbrand (http://sociologi.cjb.net) h...@sociologi.cjb.net
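For anyone who wants to try the one-liner, here is a self-contained version using the seven rows from the question; DF is simply the name used in the quoted answer.

```r
# The example data from the question
DF <- data.frame(var1 = c(3, 2, 2, 4, 4, 2, 3),
                 var2 = c(1, 2, 2, 4, 3, 2, 4),
                 var3 = c(4, 3, 3, NA, 5, 3, 3))

# Tabulate every column against the same set of levels, so each table
# has five cells and sapply() can bind them into a matrix
counts <- t(sapply(DF, function(x) table(factor(x, levels = 1:5))))
print(counts)
```

The NA in var3 is dropped by table() by default, which is why that row sums to 6 rather than 7.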
Re: [R] Pros and Cons of R
On Thu, May 22, 2008 at 02:07:01PM -0400, R P Herrold wrote:
> On Thu, 22 May 2008, Monica Pisica wrote:
> [...]
> > When a new R version is in place you cannot up-grade your old R version, you have to do a new installation and re-load all the packages you used to have and delete / un-install the old version
>
> ummm -- this is of course a function of the package manager and operating system being used, and not of R intrinsically; under an RPM package manager, this issue is not present

Nor is it present under .deb-based OSes such as Ubuntu and Debian.

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
Re: [R] howto import .xls and .ods
On Fri, May 02, 2008 at 07:35:37AM +0100, Prof Brian Ripley wrote:
> There is a *manual* on R Data Import/Export, not just an FAQ. This is the first request I have seen for .ods (whatever that is --

The most well-known application that uses this file format is the Calc (Spreadsheet) part of the Open Office Suite.

Pasted from http://en.wikipedia.org/wiki/OpenDocument

  OpenDocument Spreadsheet
  File extension:       .ods
  Internet media type:  application/vnd.oasis.opendocument.spreadsheet
  Developed by:         Sun Microsystems, OASIS
  Type of format:       Spreadsheet
  Extended from:        XML

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
Re: [R] Get inside a function the name of a variable called as argument?
On Tue, Apr 29, 2008 at 03:53:02PM +0200, Julien Roux wrote:
> Hi list,
> I created a function to plot my data:
>
> plot_function(vector)
>
> I want to write the name of the argument vector in the legend/title of the plot. For example if I call:
>
> plot_function(my_vector)
>
> I want "my_vector" to be written in the legend or title and so retrieve the name of this object as a string. Is it possible to achieve this?

While it might be possible, I think it would be better to use an extra argument for this:

  plot_function(my_vector, title = "my_title")

Functions should be general, and relying on the name of the variable makes your function less general. What if, in the future, you want to use plot_function with an anonymous vector created dynamically, e.g. by combining two other vectors?

  plot_function(c(foo, bar))

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
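Both wishes can be combined, as the following sketch suggests: an explicit title argument whose default is derived from the expression passed as the first argument, via deparse(substitute()). The body of plot_function is invented for illustration; only the function name comes from the thread.

```r
# Sketch: explicit title argument, defaulting to the text of the argument
plot_function <- function(vector, title = deparse(substitute(vector))) {
  force(title)                 # fix the title before anything else happens
  plot(vector, main = title)
  invisible(title)             # returned so the chosen title can be inspected
}

# Usage (each call opens a plot):
#   my_vector <- c(3, 1, 4, 1, 5)
#   plot_function(my_vector)                    # title "my_vector"
#   plot_function(my_vector, title = "Scores")  # explicit title
#   plot_function(c(1, 2, 3))                   # title "c(1, 2, 3)"
```

The last call shows the limitation raised in the reply above: for a dynamically built vector the default title is just the deparsed expression, so an explicit title is usually the better choice there.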
Re: [R] problems with rgl in Ubuntu 'gutsy'
On Mon, Mar 17, 2008 at 09:11:05PM -, Foadi, J (James) wrote:
> > On Mon, Mar 17, 2008 at 05:43:58PM -, Foadi, J (James) wrote:
> > [...]
> > | all rgl windows device do not show the outer frame, so I cannot move or resize the window.
> > | In addition to that, the image is like frozen, it can't be either rotated or shifted at all.
> > |
> > | In short, rgl seems not to behave according to expectations. No error messages appear at any stage.
> > |
> > | Has anyone a clue on what's going wrong here?
> >
> > Did you try the pre-built version, ie 'sudo apt-get install r-cran-rgl' ?
> >
> > Does X do gl-rendering OK with other applications? (what does
> > $ glxgears -info
>
> I've tried what you suggest. This is the output:
>
> [EMAIL PROTECTED]:~/workR$ glxgears -info
> GL_RENDERER = Mesa DRI Intel(R) 945GM 20061017 x86/MMX/SSE2
> GL_VERSION = 1.3 Mesa 7.0.1
> [...]
> 3976 frames in 5.0 seconds = 795.106 FPS
> 4017 frames in 5.0 seconds = 803.364 FPS
> 3988 frames in 5.0 seconds = 797.593 FPS
>
> And a window with moving gears appears, and it has a frame. But it doesn't click and drag easily.

On my system that window can be moved and even resized without problems (keeping a high rendering rate).

Sorry, but my test didn't give any conclusive new facts about the cause of the problem.

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
Re: [R] grouped colSums without for loops?
On Tue, Mar 18, 2008 at 05:57:59AM -0500, jim holtman wrote:
> Is this what you want?
>
> lapply(split(d, d$foo), function(x) colSums(x[,-1]))

Yes! Thank you! The *apply functions seem very powerful; thanks again for giving me a hint on how to use them.

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
Re: [R] grouped colSums without for loops?
On Tue, Mar 18, 2008 at 12:05:23PM +0100, Albert Greinoecker wrote:
> try:
> aggregate(d[,2:3], by=list(d$foo), FUN=sum)

Great! Now I can get a data.frame as well as a list, thanks!

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
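The two suggestions in this thread can be compared side by side on a small example. The data frame d below is made up (the original data were not posted); it only assumes a grouping column foo followed by two numeric columns.

```r
# Made-up data in the assumed layout: grouping column first, numbers after
d <- data.frame(foo = c("a", "b", "a", "b"),
                x = 1:4,
                y = c(10, 20, 30, 40))

# Grouped column sums as a list, one numeric vector per group
by.list <- lapply(split(d, d$foo), function(g) colSums(g[, -1]))

# The same sums as a data frame, one row per group
by.frame <- aggregate(d[, 2:3], by = list(foo = d$foo), FUN = sum)

print(by.list)
print(by.frame)
```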
Re: [R] hclust graphics - plotting many points
On Mon, Mar 10, 2008 at 10:19:01AM -, michael watson (IAH-C) wrote:
> I'd recommend outputting either as pdf or as a windows metafile
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Karin Lagesen
> Sent: 10 March 2008 09:54
> To: r-help@r-project.org
> Subject: [R] hclust graphics - plotting many points
>
> Hello.
>
> I have a distance matrix with lots of distances that I use hclust to organise. I then plot the results using the plot method of hclust. However, the plot itself takes around 20 mins to make due to there being ~700 things in the matrix that I have distances for. I thus would like to dump this to some graphics format which will let me examine this further.
>
> I tried dumping it to postscript:
>
> postscript("myfile.ps", height = 50, pointsize=5)
> plot(my_hc_object)
> dev.off()
>
> What happens is that since most of the items in the matrix have a distance of zero to something, everything just becomes a black smear on the bottom where I cannot distinguish anything from anything else. I thus tried increasing the height and/or width and also downscaling the pointsize. None of these improved anything much.
>
> So, now I am wondering if any of you have any tips for how I can get something like I get in the x11() window which I can also store and potentially show other people.

Don't you have the problem of too small distances in the X11() window?

I've had similar problems with a graph in graphviz, where I found it easier to get what I wanted using a png driver instead of the postscript driver. PNG doesn't scale well, but it might be worth a try.

  png(file="myfile.png", width=3000, height=2250)
  plot(my_hc_object)
  dev.off()

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
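As a rough illustration of the "large canvas" idea, the sketch below clusters about 700 random points and writes the dendrogram to a very wide PDF with small labels. It uses pdf() rather than png() because the PDF device is available in every R build and the result can be zoomed; the random data, page size and cex value are all arbitrary assumptions.

```r
set.seed(42)                                   # arbitrary seed
m <- matrix(rnorm(700 * 4), ncol = 4)          # stand-in for the real data
hc <- hclust(dist(m))

# A wide page (in inches) and small label text leave room for ~700 leaves
out <- file.path(tempdir(), "dendrogram.pdf")
pdf(out, width = 60, height = 10)
plot(hc, cex = 0.3, xlab = "", sub = "")
dev.off()

out   # path of the file that was written
```

If many items sit at distance zero from each other, no device will separate them, so it may help to merge such duplicates first or to cut the tree with cutree() and plot the groups separately.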
Re: [R] Plot using colors
On Mon, Mar 03, 2008 at 02:03:07AM -0800, mysimbaa wrote:
> Dear R users,
> I have a problem when I try to plot my data with different colors.
>
> plot(tvar, var, xlab="zeit [s]", ylab="Variation [%]", col = ifelse(var = varstability, 'green','red'))
>
> This works well! But when I add type="l" to my plot, it colors the whole plot green!!!

Please include this too.

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
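The behaviour described is expected: with type = "l" the whole line is one object and only the first colour is used. A sketch of one workaround is to draw the line piece by piece with segments(), colouring each piece by a logical condition. The function name colored.line is hypothetical, and since the comparison in the quoted call is garbled, the condition is passed in as a ready-made logical vector (for instance var <= varstability, which is an assumption).

```r
# Hypothetical helper: a line plot whose segments are coloured by `ok`.
# Each segment takes the colour belonging to its end point.
colored.line <- function(x, y, ok, col.ok = "green", col.bad = "red", ...) {
  n <- length(x)
  cols <- ifelse(ok[-1], col.ok, col.bad)
  plot(x, y, type = "n", ...)                    # empty frame with axes
  segments(x[-n], y[-n], x[-1], y[-1], col = cols)
  invisible(cols)
}

# Usage (opens a plot), assuming the condition is var <= varstability:
#   colored.line(tvar, var, var <= varstability,
#                xlab = "zeit [s]", ylab = "Variation [%]")
```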
Re: [R] simple usage of for
On Tue, Feb 19, 2008 at 04:52:19PM +0200, K. Elo wrote:
> Hi,
>
> Hans Ekbrand wrote (19.2.2008):
> > I tried the following small code snippet which I copied from the Introduction to R:
> >
> > for (i in 2:length(meriter)) { table(meriter[[1]], meriter[[i]]) }
>
> Try:
>
> for (i in 2:length(meriter)) { print(table(meriter[[1]], meriter[[i]])) }

It works, thanks!

-- 
Hans Ekbrand
Re: [R] simple usage of for
On Tue, Feb 19, 2008 at 10:04:13AM -0500, Duncan Murdoch wrote:
> On 2/19/2008 9:24 AM, Hans Ekbrand wrote:
> [...]
> > I tried the following small code snippet which I copied from the Introduction to R:
> >
> > for (i in 2:length(meriter)) { table(meriter[[1]], meriter[[i]]) }
>
> Where did you find that? I don't see anything like it. (If there is something like that, it should be fixed.) If you are referring to this snippet:
>
> for (i in 1:length(yc)){
>   plot(xc [[i ]],yc [[i ]]);
>   abline(lsfit(xc [[i ]],yc [[i ]]))
> }

Yes, I copied the for construct from that snippet.

> then it has the important difference that plot() and abline() both have side effects (they do plotting), whereas table() doesn't.

I see. Thanks for the explanation.

-- 
Hans Ekbrand
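To make the point concrete: values are printed automatically only at top level, so inside a for loop a table has to be printed explicitly or stored. A hedged alternative is to collect the tables in a list with lapply() and print them afterwards. The small meriter data frame below is invented for the sketch; only its name comes from the thread.

```r
# Invented stand-in for the poster's data frame
meriter <- data.frame(dept = c("A", "B", "A", "B"),
                      q1 = c("ja", "nej", "nej", "ja"),
                      q2 = c("nej", "nej", "ja", "ja"))

# One cross-tabulation of the first column against each other column
tabs <- lapply(meriter[-1], function(v) table(meriter[[1]], v))

# Explicit printing is still required inside the loop
for (nm in names(tabs)) {
  cat("\n", nm, "\n", sep = "")
  print(tabs[[nm]])
}
```

Keeping the tables in a named list also makes them available for later use, which a loop that only prints does not.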
[R] simple usage of for
Hi list

I have a data frame I would like to loop over. To begin with, I would like crosstabulations using the first variable in the data frame, which is called meriter.

  table(meriter[[1]], meriter[[3]])

                                                           ja nej
  Annan                                                 0   2   1
  Avdelningen för teknik- och vetenskapsstudier         0   5   1
  CEFOS                                                 0   6   3
  Förvaltningshögskolan                                 0  13   6
  Institutionen för globala studier                     0  20  12
  Institutionen för journalistik och masskommunikation  0   5  17
  Institutionen för socialt arbete                      1  19  35
  Psykologiska institutionen                            0  24  21
  Sociologiska institutionen                            0  16  12
  Statsvetenskapliga institutionen                      0  19  12

I tried the following small code snippet which I copied from the Introduction to R:

  for (i in 2:length(meriter)) { table(meriter[[1]], meriter[[i]]) }

And there is no output at all, just a new prompt.

I added a print statement just to check the loop construct, and it seems to work.

  for (i in 2:length(meriter)) { print(i); table(meriter[[1]], meriter[[i]]) }
  [1] 2
  [1] 3
  [1] 4

But I get no tables :-(

What am I doing wrong?

-- 
Hans Ekbrand (http://sociologi.cjb.net)
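Editorial note: besides printing inside the loop, one alternative (not proposed in this thread) is to collect the crosstabulations in a named list with lapply() and inspect them afterwards. The data frame below is a made-up stand-in for meriter.

```r
# Hypothetical stand-in for the poster's data frame 'meriter'.
meriter <- data.frame(
  dept  = c("A", "A", "B", "B", "C"),
  phd   = c("ja", "nej", "nej", "ja", "nej"),
  grant = c("nej", "nej", "ja", "ja", "nej")
)

# Cross-tabulate the first column against every other column.
# lapply() over a data frame keeps the column names as list names.
tabs <- lapply(meriter[-1], function(column) table(meriter[[1]], column))

names(tabs)   # "phd" "grant"
tabs$phd      # printed automatically, since this is a top-level expression
```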
[R] Solution: ploting a comparison of two scores, including the labels in the plot
Thanks to Greg Snow and John Kane I now have a working function that does what I wanted, that is, it compares two scores in a plot. Here is the function:

## compare.ratings: plots two lists corresponding to two different
## ratings. For each element, a line connects the position of that
## element in the two lists.

compare.ratings <- function(data.frame=df, vector1="rating1", vector2="rating2", vector3="labels") {
  treshold <- 0.1
  ## Sort by the second rating and spread the values apart.
  data.frame <- data.frame[sort.list(data.frame[[vector2]]),]
  for(i in 2:length(data.frame[,vector2])) {
    data.frame[i,vector2] <- data.frame[i,vector2] + (treshold * (i-1))
  }
  ## Do the same for the first rating.
  data.frame <- data.frame[sort.list(data.frame[[vector1]]),]
  for(i in 1:length(data.frame[,vector1])) {
    data.frame[i,vector1] <- data.frame[i,vector1] + (treshold * (i-1))
  }
  ## Interleave the two ratings with NA so that each pair gets its own line.
  tmp <- c(rbind( data.frame[[vector1]], data.frame[[vector2]], NA ))
  tmp2 <- rep( c(1,2,NA), nrow(data.frame) )
  plot(tmp2, tmp, type='b', xlim=c(0,3), xlab='', ylab='', lwd=0.5)
  text(0.9, data.frame[[vector1]], data.frame[[vector3]], adj=1, cex=0.75)
  text(2.1, data.frame[[vector2]], data.frame[[vector3]], adj=0, cex=0.75)
}

-- 
Hans Ekbrand (http://sociologi.cjb.net)
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
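Editorial note: the two for loops in the function add a growing offset to the sorted ratings, presumably to keep neighbouring labels from overlapping. A minimal vectorized sketch of that single step is shown below; spread_sorted is a hypothetical helper name, not part of the posted function.

```r
# Hypothetical helper: sort x and push the i-th smallest value up by
# threshold * (i - 1), mirroring the loops in compare.ratings.
spread_sorted <- function(x, threshold = 0.1) {
  x <- sort(x)
  x + threshold * (seq_along(x) - 1)
}

# Three nearly tied ratings end up at least 0.1 apart.
spread_sorted(c(8.15, 8.14, 8.13))
```

Note that this changes the plotted values themselves, so after spreading, the vertical axis no longer shows the raw ratings.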
Re: [R] ploting a comparison of two scores, including the labels in the plot
On Mon, Nov 05, 2007 at 10:51:08AM -0700, Greg Snow wrote:
> Does the following do what you want (or at least start you in the
> correct direction)?
>
>   mydata <- data.frame(
>     job=c("Ambassadör","Läkare","Domare","Professor","Advokat","Pilot",
>           "Verkställande direktör","Forskare","Civilingenjör","Statsråd"),
>     SAMHM=c(8.32, 8.15, 8.14, 8.13, 7.95, 7.81, 7.78, 7.60, 7.47, 7.41),
>     INDM=c(7.2771, 8.1029, 7.5965, 7.5618, 7.1876, 7.4380, 6.8361,
>            7.6630, 6.8802, 6.3916))
>
>   tmp <- c(rbind( mydata$SAMHM, mydata$INDM, NA ))
>   tmp2 <- rep( c(1,2,NA), nrow(mydata) )
>
>   plot(tmp2, tmp, type='b', xlim=c(0,3), xlab='', ylab='rating')
>   text(0.9, mydata$SAMHM, mydata$job, adj=1, cex=0.75)
>   text(2.1, mydata$INDM, mydata$job, adj=0, cex=0.75)

Yes, definitely!

Thanks Greg, now I'll just increase the smallest differences to a minimum so the labels become readable.

-- 
Hans Ekbrand (http://sociologi.cjb.net)
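Editorial note: the key step in the code above is the c(rbind(..., NA)) idiom, which interleaves the two ratings and puts an NA after each pair so that the line drawn by plot() is broken between occupations. A small sketch with three made-up values per side (the names left and right are illustrative only):

```r
left  <- c(8.32, 8.15, 8.14)
right <- c(7.28, 8.10, 7.60)

# rbind() stacks the vectors as rows (the single NA is recycled into a
# full row); c() then reads the matrix column by column, giving
# left[1], right[1], NA, left[2], right[2], NA, ...
y <- c(rbind(left, right, NA))
x <- rep(c(1, 2, NA), length(left))

# The NA entries break the line, so each pair is drawn as its own segment.
plot(x, y, type = "b", xlim = c(0, 3), xlab = "", ylab = "rating")
```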
[R] ploting a comparison of two scores, including the labels in the plot
Hello r-help!

I have data with two kinds of ratings on the status of 100 occupations. The first kind of rating is on the perceived objective status that these occupations have in society at large, and the second kind of rating is on the status that the respondents think these occupations *should* have. The ratings were originally integer values in the range 1-9, but in the current data I use their mean values.

Here is a printout for the first 10 occupations (the occupation names are in Swedish):

  data.frame(myobj[1:10, c("YRKE", "SAMHM", "INDM")], row.names = "YRKE")

                            SAMHM   INDM
  Ambassadör                 8.32 7.2771
  Läkare (doctor)            8.15 8.1029
  Domare (judge)             8.14 7.5965
  Professor                  8.13 7.5618
  Advokat (lawyer)           7.95 7.1876
  Pilot                      7.81 7.4380
  Verkställande direktör     7.78 6.8361
  Forskare (scientist)       7.60 7.6630
  Civilingenjör (engineer)   7.47 6.8802
  Statsråd (minister)        7.41 6.3916

I would like to make a plot with two lists. The first list should list the occupations ordered by SAMHM (as in the printout above) together with the values of SAMHM. The line spacing in this list should be increased by the difference in SAMHM between the occupations (i.e. between Ambassadör and Läkare (eng. doctor) there should be a larger line spacing than between Läkare and Domare (eng. judge)).

The second list should be like the first, but based on INDM instead of SAMHM.

These two lists should ideally be plotted side by side, with lines connecting each occupation.
Here is an ascii-art illustration of what I intend (excluding the connecting lines, which are hard to draw with ascii :-)

--
  Ambassadör
  Läkare (doctor)
  Domare (judge)
  Professor                 Läkare
  Advokat (lawyer)
  Pilot
  Verkställande direktör
  Forskare (scientist)      Forskare
                            Domare
  Civilingenjör (engineer)  Professor
  Statsråd (minister)       Pilot
                            Ambassadör
                            Advokat
                            Civilingenjör
                            Verkställande direktör
                            Statsråd
--

If printing strings (labels) with different line spacing turns out to be problematic, another solution would be to print a list of the occupations ordered by SAMHM, points of SAMHM values (with Y=SAMHM), points of INDM (with Y=INDM) and a list of occupations ordered by INDM, with a line for each occupation connecting the labels with the points and the two points that represent the occupation.

Since there are a lot of functions for plotting and I am new to R, I would like advice on which packages/functions should be used to get what I want (if what I want is possible to achieve with R; if it is not, then please let me know). Sample code is, of course, also very much appreciated.

kind regards,

-- 
Hans Ekbrand (http://sociologi.cjb.net)
Re: [R] ploting a comparison of two scores, including the labels in the plot
On Thu, Nov 01, 2007 at 02:52:08PM -0400, John Kane wrote:
> I gave it a try with conventional plot and it does not look easy to
> get a good result.

Thanks a lot John Kane! While I see what you mean, I think your solution does a good job and provides a basis for me to work on.

If someone would recommend another plotting function (or package) to try, I would still be interested.

>   x <- "YRKE SAMHM INDM
>   Ambassadör 8.32 7.2771
>   Läkare 8.15 8.1029
>   Domare 8.14 7.5965
>   Professor 8.13 7.5618
>   Advokat 7.95 7.1876
>   Pilot 7.81 7.4380
>   Verkställande.direktör 7.78 6.8361
>   Forskare 7.60 7.6630
>   Civilingenjör 7.47 6.8802
>   Statsråd 7.41 6.3916"
>
>   status <- read.table(textConnection(x), header=TRUE)
>
>   xx1 <- c(rep(1,10))
>   xx2 <- c(rep(2,10))
>
>   plot(xx1, status[,2], xaxt='s', yaxt='s', xlim=c(.5,2.5),
>        ylim=c(min(status[,3]),max(status[,2])), type='p',
>        xlab="", ylab="")
>   points(xx2, status[,3])
>   segments(xx1, status[,2], xx2, status[,3])
>   text(xx1-.1, status[,2], labels=status[,1], cex=.6)
>   text(xx2+.1, status[,3], labels=status[,1], cex=.6)

-- 
Hans Ekbrand (http://sociologi.cjb.net)
Q. What is that strange attachment in this mail?
A. My digital signature, see www.gnupg.org for info on how you could
   use it to ensure that this mail is from me and has not been
   altered on the way to you.