Re: [R] png-generation from script (non-X11 env) now broken?
Hi,

You can try using the cairo backend instead of the default png device. It doesn't require X. Try putting this line in an Rprofile file:

options(bitmapType="cairo")

Massimo.

On Feb 25, 2014, at 1:36 PM, David Wolfskill r...@catwhisker.org wrote:

> Since ... hmmm.. May 2012, I've had a cron-initiated script running monthly to extract some data and generate some numeric tables and plots (in PNG format).
>
> In its normal mode, the script automagically sends its results to a small handful of managers on the first of each month. Out of self-defense, I set up a preview run several days before the real one; the preview doesn't go quite as high in the management chain.
>
> This morning, I found a message in my mailbox from such a preview run:
>
> | Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width, :
> |   unable to start device PNG
> | Calls: gen_build_histogram -> png
> | In addition: Warning message:
> | In png(filename = fn, height = height, width = width, units = "in", :
> |   unable to open connection to X11 display ''
> | Execution halted
>
> [This was running R-3.0.2_1 in FreeBSD/amd64 9.2-STABLE r262323M/262357:902506, built Sun Feb 23 05:14:37 PST 2014.]
>
> I admit that I haven't been following changes in R very closely -- the script had been running reliably for over a year, and it's not one of the fires I've been fighting recently.
>
> But the implication -- that png() was suddenly[1] changed so that it requires the ability to connect to an X11 display -- seemed peculiar enough that I would have thought a quick search would lead me to some discussion of a change of this magnitude, rationale, and what folks might do to mitigate the effects.
>
> Based on one message I found (from 02 Mar 2008!), I did take a look at the help(png) output, and saw the reference there to bitmap(). I tried ... hacking ... my functions to use bitmap(fn, ...) instead of png(filename = fn, ...), but that looks as if it's leading me astray:
>
> | Error in strwidth(legend, units = "user", cex = cex, font = text.font) :
> |   family 'mono' not included in postscript() device
> | Calls: gen_build_histogram -> legend -> strwidth
> | Execution halted
>
> So... what's someone who wants to use R to generate PNG-format files from an environment that is bereft of X11 to do?
>
> [1] Suddenly, because it *had* been working up to 01 Feb 2014, at least.
>
> [I'll be happy to summarize responses that aren't sent to the list.]
>
> Peace,
> david
> --
> David H. Wolfskill r...@catwhisker.org
> Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.
>
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
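The cairo suggestion at the top of this message could be sketched roughly as follows. This is only a minimal illustration, assuming an R build compiled with cairo support; the output path and the toy histogram data are made up for the example and are not from the original script.

```r
# Fail early with a clear message if this R build has no cairo support.
stopifnot(capabilities("cairo"))

# Session-wide default, equivalent to the Rprofile line suggested above.
options(bitmapType = "cairo")

out <- file.path(tempdir(), "histogram.png")  # hypothetical output path

# type = "cairo" selects the backend for this one call, whatever options() says.
png(filename = out, width = 6, height = 4, units = "in", res = 100,
    type = "cairo")
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5), main = "Example", xlab = "value")
invisible(dev.off())
```

Passing type = "cairo" explicitly may be preferable in a cron job, since it does not depend on which profile files the non-interactive session happens to read.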
Re: [R] help understanding hierarchical clustering
Hi David, thank you so much for helping me!

On 01 May 2013, at 10:16, David Carlson dcarl...@tamu.edu wrote:

> You need to clarify what you are trying to achieve and fix some errors in your code. First, thanks for giving us reproducible data.

I tried to fix the errors and uploaded a new link to data and code [1]. Thanks for your advice!

I'll try to describe the dataset: the csv stores information recorded by an underwater towed camera [imagename, temp, sal, depth_m] plus 3 fields added later by an image analyst [idcode, count, subs], so each ROW in the data is composed of:

- idcode (unique identifier for the species)
- count (how many individuals of species 'J' are found in image 'X')
- temp (temperature)
- sal (salinity)
- depth_m (depth in meters)
- subs (substrate complexity, an integer describing the seafloor texture [hard - soft bottom])

The csv looks like:

idcode  count  temp   sal     depth_m  subs
16001   136    4.308  32.828  63.46    47
..
10010   1      4.342  32.865  83.58    35

> Once you have read the file, you seem to be attempting to remove cases with missing values, but you check for missing values of count twice and you never check depth. The whole line can be replaced with
>
> dd <- na.omit(mat)

My mistake, sorry about that. Fixed in the code.

> Now you have data with complete cases. In your next step you create a distance matrix that includes idcode as a variable! Although it is numeric, it is really a categorical variable. That suggests you need to read up on R and cluster analysis. It is very likely that you want to exclude this variable from the distance matrix and possibly the count variable as well.

Big mistake here: idcode is my categorical value, the one I'm trying to group into classes. Fixed in the code. I am now running the code both including count [dd1] and without count [dd2]. The count should express the density of each species under the associated environmental parameters (I thought it was important; isn't it?).

> What does one row of data represent? You have 8036 complete cases representing data on 100 species. There are great differences in the number of rows for each species (idcode) ranging from 1 to 1066.

- Trying to clean up the dataset: should I remove the records for the idcodes that are not well represented (idcodes with a low number of records), so as to have a subset of representative species?
- idcodelist = [id_1, , id_N] with count(id_i) = X

Note: in the data each record refers to a single species identified in an image. This means that there are multiple records for the same image (one record for each species identified in that image).

In the database I have a unique [imagename] and position [lon lat] for each image. Should I include this information in my csv, so that it looks like:

idcode  count  temp   sal     depth_m  subs  lon  lat  imagename
16001   136    4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
18005   15     4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
10010   5      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N
13010   1      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N

and then group my data by [imagename], adding a field for each representative species in which to store its count? The example above would then look like:

count_id_1  count_id_2  count_id_5  count_id_9  idcode_N-1  idcode_N  temp   sal     depth_m  subs  lon  lat  imagename
136         0           15          0           0           0         4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
0           5           0           0           1           0         4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N

where:

count_id_1   is the count for the species with idcode 16001 in the image Xi
count_id_5   // 16005 //
count_id_2   // 10010 //
count_id_N-1 // 13010 //

Thank you for any further advice,
Massimo.

[1] http://nbviewer.ipython.org/5497996

> -
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of epi
> Sent: Tuesday, April 30, 2013 8:06 PM
> To: r-help@r-project.org
> Subject
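The regrouping by image proposed in this message could be sketched in base R roughly as below. This is an illustration only: the four records are adapted from the example rows above, the short image names are made up, and lon/lat are left out because the message gives only placeholders for them.

```r
# Made-up records in the long layout: one row per species per image.
long <- data.frame(
  imagename = c("img_1", "img_1", "img_N", "img_N"),  # hypothetical names
  idcode    = c(16001, 18005, 10010, 13010),
  count     = c(136, 15, 5, 1),
  temp      = c(4.308, 4.308, 4.342, 4.342),
  sal       = c(32.828, 32.828, 31.925, 31.925),
  depth_m   = c(63.46, 63.46, 82.18, 82.18),
  subs      = c(47, 47, 35, 35)
)

# Image-by-species table of counts; species absent from an image get 0.
counts <- as.data.frame.matrix(xtabs(count ~ imagename + idcode, data = long))
names(counts) <- paste0("count_id_", names(counts))
counts$imagename <- rownames(counts)

# One row of environmental parameters per image.
env <- unique(long[, c("imagename", "temp", "sal", "depth_m", "subs")])

# Wide layout: one row per image, one count column per species.
wide <- merge(env, counts, by = "imagename")
print(wide)
```

Here the count columns are named after the idcode itself (count_id_16001 and so on) rather than a running index, which avoids having to keep a separate lookup table.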
Re: [R] help understanding hierarchical clustering
Hi David, thank you so much for helping me!

On 01 May 2013, at 10:16, David Carlson dcarl...@tamu.edu wrote:

> You need to clarify what you are trying to achieve and fix some errors in your code. First, thanks for giving us reproducible data.

I tried to fix the errors, thanks for your advice!

> Once you have read the file, you seem to be attempting to remove cases with missing values, but you check for missing values of count twice and you never check depth. The whole line can be replaced with
>
> dd <- na.omit(mat)
>
> Now you have data with complete cases. In your next step you create a distance matrix that includes idcode as a variable! Although it is numeric, it is really a categorical variable. That suggests you need to read up on R and cluster analysis. It is very likely that you want to exclude this variable from the distance matrix and possibly the count variable as well.

I excluded idcode and count from the distance matrix.

> What does one row of data represent? You have 8036 complete cases representing data on 100 species. There are great differences in the number of rows for each species (idcode) ranging from 1 to 1066.

I'm trying to clean up the data: I removed all the records where the species idcode is found fewer than 100 times.

I uploaded a new link to the new data and code [1].

Is this correct? Can I go further and try to understand which species are assigned to each branch of the dendrogram at a specified cut level?

Thanks all for any further help!
Massimo.

[1] http://nbviewer.ipython.org/5499800

> -
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of epi
> Sent: Tuesday, April 30, 2013 8:06 PM
> To: r-help@r-project.org
> Subject: [R] help understanding hierarchical clustering
>
> Hi All,
>
> I have a problem understanding how to work with R to generate a hierarchical clustering. My data are in a csv and look like:
>
> idcode,count,temp,sal,depth_m,subs
> 16001,136,4.308,32.828,63.46,47
> 16001,109,4.31,32.829,63.09,49
> 16001,107,4.302,32.822,62.54,47
> 16001,87,4.318,32.834,62.54,48
> 16002,82,4.312,32.832,63.28,49
> 16002,77,4.325,32.828,65.65,46
> 16002,77,4.302,32.821,62.36,47
> 16002,71,4.299,32.832,65.84,37
> 16002,70,4.302,32.821,62.54,49
>
> where idcode is a species identification number and the other fields are environmental parameters.
>
> library(vegan)
> mat <- read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv", header=T)
> dd <- mat[!is.na(mat$idcode) & !is.na(mat$temp) & !is.na(mat$sal) & !is.na(mat$count) & !is.na(mat$count) & !is.na(mat$subs),]
> distmat <- vegdist(dd)
> clusa <- hclust(distmat, "average")
> print(clusa)
>
> Call:
> hclust(d = distmat, method = "average")
>
> Cluster method   : average
> Distance         : bray
> Number of objects: 8036
>
> print(dend1 <- as.dendrogram(clusa))
> 'dendrogram' with 2 branches and 8036 members total, at height 0.3194225
>
> dend2 <- cut(dend1, h=0.07)
>
> A complete run with plots is available here: http://nbviewer.ipython.org/5492912
>
> I'm trying to group together the species (idcodes) that share similar environmental parameters. Looking at the plots, I should be able to retrieve the list of idcodes for each branch at cut level X. In the example, X = 0.07:
>
> branch1 : [idcodeA, .. .. , idcodeJ]
> .. ..
> branch6 : [idcodeB, .. .. , idcodeK]
>
> Many thanks for your precious help!!!
> Massimo.
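The two clean-up steps described in this reply (dropping species with fewer than 100 records, and leaving idcode and count out of the distance matrix) might look roughly like the sketch below. The data frame is randomly generated stand-in data, not the real csv, and base R's dist() on scaled variables is swapped in for vegan's vegdist() purely to keep the sketch free of package dependencies.

```r
set.seed(1)  # made-up data standing in for the real csv
n <- 275
dd <- data.frame(
  idcode  = rep(c(16001, 16002, 10010), times = c(150, 120, 5)),
  count   = rpois(n, 20),
  temp    = rnorm(n, 4.3, 0.01),
  sal     = rnorm(n, 32.8, 0.01),
  depth_m = rnorm(n, 63, 1),
  subs    = sample(35:50, n, replace = TRUE)
)

# Keep only the species that occur at least 100 times.
n_per_id <- table(dd$idcode)
keep_ids <- names(n_per_id)[n_per_id >= 100]
dd_sub   <- dd[as.character(dd$idcode) %in% keep_ids, ]

# Distance matrix on the environmental variables only; idcode and count
# are left out. Scaling stops depth_m and subs from dominating.
env     <- scale(dd_sub[, c("temp", "sal", "depth_m", "subs")])
distmat <- dist(env)
clusa   <- hclust(distmat, method = "average")
print(clusa)
```

Whether to scale the variables, and which dissimilarity to use, are modelling choices that the thread does not settle; the sketch only shows where those choices enter.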
[R] help understanding hierarchical clustering
Hi All,

I have a problem understanding how to work with R to generate a hierarchical clustering. My data are in a csv and look like:

idcode,count,temp,sal,depth_m,subs
16001,136,4.308,32.828,63.46,47
16001,109,4.31,32.829,63.09,49
16001,107,4.302,32.822,62.54,47
16001,87,4.318,32.834,62.54,48
16002,82,4.312,32.832,63.28,49
16002,77,4.325,32.828,65.65,46
16002,77,4.302,32.821,62.36,47
16002,71,4.299,32.832,65.84,37
16002,70,4.302,32.821,62.54,49

where idcode is a species identification number and the other fields are environmental parameters.

library(vegan)
mat <- read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv", header=T)
dd <- mat[!is.na(mat$idcode) & !is.na(mat$temp) & !is.na(mat$sal) & !is.na(mat$count) & !is.na(mat$count) & !is.na(mat$subs),]
distmat <- vegdist(dd)
clusa <- hclust(distmat, "average")
print(clusa)

Call:
hclust(d = distmat, method = "average")

Cluster method   : average
Distance         : bray
Number of objects: 8036

print(dend1 <- as.dendrogram(clusa))
'dendrogram' with 2 branches and 8036 members total, at height 0.3194225

dend2 <- cut(dend1, h=0.07)

A complete run with plots is available here: http://nbviewer.ipython.org/5492912

I'm trying to group together the species (idcodes) that share similar environmental parameters. Looking at the plots, I should be able to retrieve the list of idcodes for each branch at cut level X. In the example, X = 0.07:

branch1 : [idcodeA, .. .. , idcodeJ]
.. ..
branch6 : [idcodeB, .. .. , idcodeK]

Many thanks for your precious help!!!
Massimo.
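One way to get the list of idcodes per branch at a given cut level is cutree() from base R's stats package, sketched below. The data are made up (two clearly separated environmental regimes), and the cut height of 50 is specific to this toy distance scale; it is not the 0.07 from the post, which refers to Bray-Curtis heights.

```r
set.seed(42)  # made-up data: two well separated environmental regimes
dd <- data.frame(
  idcode  = rep(c(16001, 16002, 10010), each = 10),
  temp    = c(rnorm(20, 4.3, 0.1), rnorm(10, 8.0, 0.1)),
  depth_m = c(rnorm(20, 63, 1),    rnorm(10, 200, 1))
)

# Cluster on the environmental variables only.
clusa <- hclust(dist(dd[, c("temp", "depth_m")]), method = "average")

# Cluster membership of every row when the tree is cut at height X.
X      <- 50  # assumed cut level for this toy data
groups <- cutree(clusa, h = X)

# Unique idcodes found in each branch.
species_by_branch <- lapply(split(dd$idcode, groups), unique)
print(species_by_branch)

# Cross-tabulation: how many rows of each species fall in each branch.
print(table(branch = groups, idcode = dd$idcode))
```

The cross-tabulation is often more informative than the plain lists, because with one row per observation a species will usually be spread over several branches.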
[R] install error - Netcdf library (linux)
Hi All,

I'm on 64-bit Debian Linux and I'm trying to install the netCDF interface. I tried both ncdf and ncdf4, but when building I get the error below (I have netCDF installed on my machine and the build is able to find it; no missing .h):

epy@epinux:~$ sudo R CMD INSTALL --configure-args="-with-netcdf_incdir=/usr/include -with-netcdf_libdir=/usr/lib" ncdf4_1.8.tar.gz
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘ncdf4’ ...
checking for nc-config... yes
Using nc-config: nc-config
Output of nc-config --all:
This netCDF 4.2.1.1 has been built with the following features:
  --cc          -> gcc
  --cflags      -> -I/usr/local/include -I/usr/local/include
  --libs        -> -L/usr/local/lib -lnetcdf
  --has-c++     -> no
  --cxx         ->
  --has-c++4    -> no
  --cxx4        ->
  --fc          ->
  --fflags      ->
  --flibs       ->
  --has-f90     -> no
  --has-dap     -> yes
  --has-nc2     -> yes
  --has-nc4     -> yes
  --has-hdf5    -> yes
  --has-hdf4    -> no
  --has-pnetcdf -> no
  --has-szlib   ->
  --prefix      -> /usr/local
  --includedir  -> /usr/local/include
  --version     -> netCDF 4.2.1.1
---
netcdf.m4: about to set rpath, here is source string: -L/usr/local/lib -lnetcdf
netcdf.m4: final rpath: -Wl,-rpath,/usr/local/lib
Netcdf library version: netCDF 4.2.1.1
Netcdf library has version 4 interface present: yes
Netcdf library was compiled with C compiler: gcc
configure: creating ./config.status
config.status: creating R/load.R
config.status: creating src/Makevars
** Results of ncdf4 package configure ***
netCDF v4 CPP flags = -I/usr/local/include -I/usr/local/include
netCDF v4 LD flags = -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lnetcdf
netCDF v4 runtime path = -Wl,-rpath,/usr/local/lib
**
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/local/include -I/usr/local/include -fpic -O2 -pipe -g -c ncdf.c -o ncdf.o
ncdf.c: In function ‘R_nc4_nctype_to_Rtypecode’:
ncdf.c:40:18: error: ‘NC_INT’ undeclared (first use in this function)
ncdf.c:40:18: note: each undeclared identifier is reported only once for each function it appears in
ncdf.c:49:18: error: ‘NC_UBYTE’ undeclared (first use in this function)
ncdf.c:51:18: error: ‘NC_USHORT’ undeclared (first use in this function)
ncdf.c:53:18: error: ‘NC_UINT’ undeclared (first use in this function)
ncdf.c:55:18: error: ‘NC_INT64’ undeclared (first use in this function)
ncdf.c:57:18: error: ‘NC_UINT64’ undeclared (first use in this function)
ncdf.c: In function ‘R_nc4_varsize’:
ncdf.c:69:28: error: ‘NC_MAX_DIMS’ undeclared (first use in this function)
ncdf.c:75:2: warning: implicit declaration of function ‘nc_inq_varndims’ [-Wimplicit-function-declaration]
ncdf.c:78:4: warning: implicit declaration of function ‘nc_strerror’ [-Wimplicit-function-declaration]
ncdf.c:84:2: warning: implicit declaration of function ‘nc_inq_vardimid’ [-Wimplicit-function-declaration]
ncdf.c:94:3: warning: implicit declaration of function ‘nc_inq_dimlen’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varunlim’:
ncdf.c:112:2: warning: implicit declaration of function ‘nc_inq_unlimdim’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_var’:
ncdf.c:152:2: warning: implicit declaration of function ‘nc_inq_var’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_vartype’:
ncdf.c:168:2: warning: implicit declaration of function ‘nc_inq_vartype’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varname’:
ncdf.c:181:2: warning: implicit declaration of function ‘nc_inq_varname’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_double’:
ncdf.c:214:2: warning: implicit declaration of function ‘nc_get_vara_double’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_int’:
ncdf.c:257:2: warning: implicit declaration of function ‘nc_get_vara_int’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_text’:
ncdf.c:313:2: warning: implicit declaration of function ‘nc_get_vara_text’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dimid’:
ncdf.c:345:2: warning: implicit declaration of function ‘nc_inq_dimid’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varid’:
ncdf.c:355:2: warning: implicit declaration of function ‘nc_inq_varid’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dimids’:
ncdf.c:377:9: warning: implicit declaration of function ‘nc_inq_dimids’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dim’:
ncdf.c:387:12: error: ‘NC_MAX_NAME’ undeclared (first use in this function)
ncdf.c:391:2: warning: implicit declaration of function ‘nc_inq_dim’ [-Wimplicit-function-declaration]
ncdf.c:408:2: warning: implicit declaration of function ‘nc_inq_unlimdims’ [-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq’:
ncdf.c:451:2: warning: implicit declaration
[R] deduplication
Colleagues,

I am trying to de-duplicate a large (long) database (approx. 1 million records) of diagnostic tests. Individuals in the database can have up to 25 observations, but most will have only one. IDs for de-duplication (names, sex, lab number...) are patchy. In a first step, I am using Andreas Borg's excellent record linkage package (), which leaves me with a list of 'pairs' looking very much like this:

id1 <- c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2 <- c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)
id <- data.frame(cbind(id1,id2))

where a pair means that the two records belong to the same individual (e.g., record 4 and record 8; 17 and 18...).

My problem now is to get a list with all records that belong to the same person (in the example, observations 1, 3, 4, 5, 6, 7, 8, 12, 17 and 18 are all from the same person). The difficulty is finding the indirect links: 1 and 8 are linked only through the pairs (1, 4) and (4, 8), and 1 and 17 only through 18. I can do it in my head, but I am missing the code that would work its way through too many records.

Any clever ideas? (using R 2.10.1 on Windows XP)

Thanks,
Christian

--
View this message in context: http://r.789695.n4.nabble.com/deduplication-tp2241637p2241637.html
Sent from the R help mailing list archive at Nabble.com.
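The grouping asked for here amounts to finding the connected components of the graph whose edges are the pairs. A minimal base-R sketch follows; the helper name link_groups is made up for illustration, and the approach (repeatedly giving each record the smallest label among its linked partners until nothing changes) is just one simple option, not something proposed in the thread.

```r
id1 <- c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2 <- c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)

# Hypothetical helper: returns a list with one vector of record ids per person.
link_groups <- function(a, b) {
  ids   <- sort(unique(c(a, b)))
  ia    <- match(a, ids)          # position of each pair member in ids
  ib    <- match(b, ids)
  label <- seq_along(ids)         # start with every record in its own group
  repeat {
    # Smaller of the two current labels for every pair.
    low <- pmin(label[ia], label[ib])
    # For each record, the smallest label over all pairs it takes part in.
    best <- tapply(c(low, low), c(ia, ib), min)
    idx  <- as.integer(names(best))
    new  <- label
    new[idx] <- pmin(new[idx], as.vector(best))
    if (identical(new, label)) break   # no label changed: converged
    label <- new
  }
  split(ids, label)
}

groups <- link_groups(id1, id2)
print(groups)
```

Each pass is vectorised over all pairs, so the cost is roughly the number of pairs times the number of passes, and the number of passes is bounded by the longest chain of links within any one person's records.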