Re: [R] png-generation from script (non-X11 env) now broken?

2014-02-25 Thread epi
Hi,

You can try using cairo for png() instead of X11.
It doesn’t require X.
Try putting this line in your .Rprofile file:


options(bitmapType = "cairo")
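
If editing a profile file is not convenient for a cron job, the same choice can be made inside the script itself. This is only a minimal sketch, assuming an R build with cairo support; the output file name and the plotted values are made up for illustration.

```r
# Hypothetical output file; a real script would use its own path.
out_file <- file.path(tempdir(), "example_plot.png")

# capabilities("cairo") reports whether this R build can use cairo at all.
have_cairo <- isTRUE(unname(capabilities("cairo")))

if (have_cairo) {
  # type = "cairo" selects the cairo backend for this one device,
  # regardless of the bitmapType option.
  png(filename = out_file, width = 6, height = 4, units = "in",
      res = 100, type = "cairo")
  hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5), main = "Example", xlab = "value")
  dev.off()
} else {
  message("This R build has no cairo support, so type = \"cairo\" is not available.")
}
```

If `capabilities("cairo")` is FALSE, R itself would have to be rebuilt or reinstalled with cairo before either approach can work.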


Massimo.


On Feb 25, 2014, at 1:36 PM, David Wolfskill r...@catwhisker.org wrote:

 Since ... hmmm.. May 2012, I've had a cron-initiated script running
 monthly to extract some data and generate some numeric tables and
 plots (in PNG format).
 
 In its normal mode, the script automagically sends its results
 to a small handful of managers on the first of each month.  Out of
 self-defense, I set up a preview run several days before the real
 one; the preview doesn't go quite as high in the management chain.
 
 This morning, I found a message in my mailbox from such a preview run:
 
 | Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width,  :
 |   unable to start device PNG
 | Calls: gen_build_histogram -> png
 | In addition: Warning message:
 | In png(filename = fn, height = height, width = width, units = "in",  :
 |   unable to open connection to X11 display ''
 | Execution halted
 
 [This was running R-3.0.2_1 in FreeBSD/amd64 9.2-STABLE 
 r262323M/262357:902506, built Sun Feb 23 05:14:37 PST 2014.]
 
 I admit that I haven't been following changes in R very closely --
 the script had been running reliably for over a year, and it's not
 one of the fires I've been fighting recently.
 
 But the implication -- that png() was suddenly[1] changed so that it
 requires the ability to connect to an X11 display -- seemed peculiar
 enough that I would have thought a quick search would lead me to some
 discussion of a change of this magnitude, rationale, and what folks
 might do to mitigate the effects.
 
 Based on one message I found (from 02 Mar 2008!), I did take a look at
 help png output, and saw the reference there to bitmap().  I tried ...
 hacking ... my functions to use bitmap(fn, ...) instead of
 png(filename = fn, ...), but that looks as if it's leading me astray:
 
 | Error in strwidth(legend, units = "user", cex = cex, font = text.font) : 
 |   family 'mono' not included in postscript() device
 | Calls: gen_build_histogram -> legend -> strwidth
 | Execution halted
 
 
 So... what's someone who wants to use R to generate PNG-format files
 from an environment that is bereft of X11 to do?
 
 
 1: Suddenly because it *had* been working up to 01 Feb 2014, at least.
 
 [I'll be happy to summarize responses that aren't sent to the list.]
 
 Peace,
 david
 -- 
 David H. Wolfskill r...@catwhisker.org
 Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.
 
 See http://www.catwhisker.org/~david/publickey.gpg for my public key.
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] help understanding hierarchical clustering

2013-05-01 Thread epi
Hi David,

Thank you so much for helping me!

On 01 May 2013, at 10:16, David Carlson dcarl...@tamu.edu wrote:

 You need to clarify what you are trying to achieve and fix some errors in
 your code. First, thanks for giving us reproducible data. 
 

I tried to fix the errors and uploaded a new link to the data and code [1].
Thanks for your advice!

I'll try to describe the dataset:

The CSV stores information recorded by an underwater towed camera

[imagename, temp, sal, depth_m] 

plus 3 fields added later by an image analyst

[idcode, count, subs]

so each ROW in the data is composed of

- idcode (unique identifier for a species)
- count  (how many individuals of species 'J' are found in image 'X')
- temp (temperature)
- sal (salinity)
- depth_m (depth in meters)
- subs (substrate complexity, an integer describing the seafloor texture
[hard - soft bottom])

The csv looks like :


  idcode  count  temp   sal     depth_m  subs
  16001   136    4.308  32.828  63.46    47
..
  10010   1      4.342  32.865  83.58    35


 Once you have read the file, you seem to be attempting to remove cases with
 missing values, but you check for missing values of count twice and you
 never check depth. The whole line can be replaced with
 
 dd <- na.omit(mat)

My mistake, sorry about that.
Fixed in the code.
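
To illustrate why the one-line replacement is safer than listing columns by hand, here is a small sketch. The three rows use values from the sample data, but the two NA entries are invented for the example.

```r
# Toy data: the NA values are made up for illustration only.
mat <- data.frame(
  idcode  = c(16001, 16001, 16002),
  count   = c(136, NA, 82),
  temp    = c(4.308, 4.31, 4.312),
  sal     = c(32.828, 32.829, 32.832),
  depth_m = c(63.46, 63.09, NA),
  subs    = c(47, 49, 49)
)

# na.omit() drops every row that has an NA in ANY column,
# so a column such as depth_m cannot be forgotten.
dd <- na.omit(mat)

# Equivalent selection written with complete.cases().
same <- mat[complete.cases(mat), ]

print(dd)
```

Only the first row survives here, because the second row is missing count and the third is missing depth_m.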

 
 Now you have data with complete cases. In your next step you create a
 distance matrix that includes idcode as a variable! Although it is
 numeric, it is really a categorical variable. That suggests you need to read
 up on R and cluster analysis. It is very likely that you want to exclude
 this variable from the distance matrix and possibly the count variable as
 well. 

… big mistake here: idcode is my categorical variable,
the one I'm trying to group into classes.

Fixed in the code; I am now running the code both including the count [ dd1 ]
and without the count [ dd2 ].

The count should express the density of each species under the associated
environmental parameters (I thought it was important, isn't it?)


 
 What does one row of data represent? You have 8036 complete cases
 representing data on 100 species. There are great differences in the number
 of rows for each species (idcode) ranging from 1 to 1066. 

- trying to clean up the dataset:
  should I remove the records for the idcodes that are not well represented
(idcodes with a low number of records),
  so as to have a subset of representative species?

- idcodelist = [id_1, … , id_N]  
  with count(id_i) = X

Note:
in the data each record refers to a single species identified in an image;
this means that there are multiple records for the same image (one record for
each species identified in a single image).

In the database I have a unique [imagename] and position [lon lat] for each
image; should I include this information in my CSV?

so that it looks like :


  idcode  count  temp   sal     depth_m  subs  lon  lat  imagename
  16001   136    4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
  18005   15     4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
  10010   5      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N
  13010   1      4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N


and group my data by [imagename], adding a field for each representative species
in which to store its count?

The example should then look like:

  count_id_1  count_id_2  …  count_id_5  …  count_id_9  …  idcode_N-1  idcode_N  temp   sal     depth_m  subs  lon  lat  imagename
  136         0              15             0              0           0         4.308  32.828  63.46    47    x1   y1   image_year_day_h_m_ms_1
..
  0           5              0              0              1           0         4.342  31.925  82.18    35    xN   yN   image_year_day_h_m_ms_N

where :

count_id_1    is the count for the species with idcode 16001 in the image Xi
count_id_5    //  16005  //
count_id_2    //  10010  //
count_id_N-1  //  13010  //
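
The regrouping described above (one row per image, one count column per species) can be sketched in base R with xtabs(). This is only an illustration under assumptions: the image names are shortened placeholders, lon/lat are left out because the message gives no real values, and the count_id_<idcode> column naming is a choice made for this example.

```r
# Long format: one row per (image, species). Image names are placeholders.
long <- data.frame(
  idcode    = c(16001, 18005, 10010, 13010),
  count     = c(136, 15, 5, 1),
  temp      = c(4.308, 4.308, 4.342, 4.342),
  sal       = c(32.828, 32.828, 31.925, 31.925),
  depth_m   = c(63.46, 63.46, 82.18, 82.18),
  subs      = c(47, 47, 35, 35),
  imagename = c("image_1", "image_1", "image_N", "image_N"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are images, columns are idcodes, cells are summed
# counts. Species not seen in an image get 0.
counts <- xtabs(count ~ imagename + idcode, data = long)
wide_counts <- as.data.frame.matrix(counts)
names(wide_counts) <- paste0("count_id_", names(wide_counts))
wide_counts$imagename <- rownames(wide_counts)

# Environmental parameters are constant within an image: keep one row each.
env <- unique(long[, c("imagename", "temp", "sal", "depth_m", "subs")])

# Wide format: one row per image.
wide <- merge(wide_counts, env, by = "imagename")
print(wide)
```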


thank you for any further advice,

Massimo.

[1] http://nbviewer.ipython.org/5497996


 
 -
 David L Carlson
 Associate Professor of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of epi
 Sent: Tuesday, April 30, 2013 8:06 PM
 To: r-help@r-project.org
 Subject

Re: [R] help understanding hierarchical clustering

2013-05-01 Thread epi
Hi David,

Thank you so much for helping me!


On 01 May 2013, at 10:16, David Carlson dcarl...@tamu.edu wrote:

 You need to clarify what you are trying to achieve and fix some errors in
 your code. First, thanks for giving us reproducible data. 
 

I tried to fix the errors, thanks for your advice!


 Once you have read the file, you seem to be attempting to remove cases with
 missing values, but you check for missing values of count twice and you
 never check depth. The whole line can be replaced with
 
 dd <- na.omit(mat)
 
 Now you have data with complete cases. In your next step you create a
 distance matrix that includes idcode as a variable! Although it is
 numeric, it is really a categorical variable. That suggests you need to read
 up on R and cluster analysis. It is very likely that you want to exclude
 this variable from the distance matrix and possibly the count variable as
 well. 


I excluded idcode and count from the distance matrix.
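
A minimal sketch of building the distance matrix from the environmental columns only, using the nine sample rows from the original post. Note the swap: this uses base R dist() on standardised columns (Euclidean distance) rather than vegdist() from vegan, so that the example runs without extra packages; that choice is an assumption of this sketch, not what the linked notebook does.

```r
# The nine sample rows from the original post.
dd <- data.frame(
  idcode  = c(16001, 16001, 16001, 16001, 16002, 16002, 16002, 16002, 16002),
  count   = c(136, 109, 107, 87, 82, 77, 77, 71, 70),
  temp    = c(4.308, 4.31, 4.302, 4.318, 4.312, 4.325, 4.302, 4.299, 4.302),
  sal     = c(32.828, 32.829, 32.822, 32.834, 32.832, 32.828, 32.821, 32.832, 32.821),
  depth_m = c(63.46, 63.09, 62.54, 62.54, 63.28, 65.65, 62.36, 65.84, 62.54),
  subs    = c(47, 49, 47, 48, 49, 46, 47, 37, 49)
)

# Keep only the environmental variables; idcode is a label and count is
# left out, as described in the message.
env <- dd[, c("temp", "sal", "depth_m", "subs")]

# Standardise so that depth (tens of metres) does not dominate temperature
# (differences of hundredths of a degree), then cluster.
distmat <- dist(scale(env))
clusa <- hclust(distmat, method = "average")
print(clusa)
```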

 
 What does one row of data represent? You have 8036 complete cases
 representing data on 100 species. There are great differences in the number
 of rows for each species (idcode) ranging from 1 to 1066. 


I'm trying to clean up the data: I removed all the records where the species
idcode is found fewer than 100 times.

I uploaded a new link to the new data and code [1].


Is this correct?
Can I go further and try to understand which species are assigned to each
branch of the dendrogram at a specified cut level?
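
One possible way to list the species per branch is cutree(), which returns a branch number for every record. The sketch below reuses the nine sample rows and, as an assumption made to keep it dependency-free, clusters with base R dist() on scaled columns instead of vegdist(). It cuts into k = 2 branches; cutree(clusa, h = ...) cuts at a given height instead.

```r
# The nine sample rows from the original post (count omitted).
dd <- data.frame(
  idcode  = c(16001, 16001, 16001, 16001, 16002, 16002, 16002, 16002, 16002),
  temp    = c(4.308, 4.31, 4.302, 4.318, 4.312, 4.325, 4.302, 4.299, 4.302),
  sal     = c(32.828, 32.829, 32.822, 32.834, 32.832, 32.828, 32.821, 32.832, 32.821),
  depth_m = c(63.46, 63.09, 62.54, 62.54, 63.28, 65.65, 62.36, 65.84, 62.54),
  subs    = c(47, 49, 47, 48, 49, 46, 47, 37, 49)
)

env <- scale(dd[, c("temp", "sal", "depth_m", "subs")])
clusa <- hclust(dist(env), method = "average")

# One branch number per record, in the same order as the rows of dd.
groups <- cutree(clusa, k = 2)

# The distinct idcodes found in each branch.
species_by_branch <- lapply(split(dd$idcode, groups), unique)
print(species_by_branch)

# How many records of each idcode land in each branch.
print(table(idcode = dd$idcode, branch = groups))
```

Because the clustered units are records rather than species, the same idcode can appear in more than one branch; the table makes that visible.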

thanks All for any further help!

Massimo.


[1] http://nbviewer.ipython.org/5499800

 
 -
 David L Carlson
 Associate Professor of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of epi
 Sent: Tuesday, April 30, 2013 8:06 PM
 To: r-help@r-project.org
 Subject: [R] help understanding hierarchical clustering
 
 Hi All,
 
 I'm having trouble understanding how to use R to generate a hierarchical
 clustering. My data are in a CSV and look like:
 
 idcode,count,temp,sal,depth_m,subs
 16001,136,4.308,32.828,63.46,47
 16001,109,4.31,32.829,63.09,49
 16001,107,4.302,32.822,62.54,47
 16001,87,4.318,32.834,62.54,48
 16002,82,4.312,32.832,63.28,49
 16002,77,4.325,32.828,65.65,46
 16002,77,4.302,32.821,62.36,47
 16002,71,4.299,32.832,65.84,37
 16002,70,4.302,32.821,62.54,49
 
 where idcode is a species identification number and the other fields are
 environmental parameters.
 
 library(vegan)
 mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv",
 header=T)
 dd <- mat[!is.na(mat$idcode) &
  !is.na(mat$temp) &
  !is.na(mat$sal) &
  !is.na(mat$count) &
  !is.na(mat$count) &
  !is.na(mat$subs),]
 distmat<-vegdist(dd)
 clusa<-hclust(distmat,"average")
 print(clusa)
   Call:
   hclust(d = distmat, method = "average")
   
   Cluster method   : average 
   Distance : bray 
   Number of objects: 8036
 print(dend1 <- as.dendrogram(clusa))
   'dendrogram' with 2 branches and 8036 members total, at height
 0.3194225
 dend2 <- cut(dend1, h=0.07)
 
 
 a complete run with plots is available here :  
 
 http://nbviewer.ipython.org/5492912
 
 I'm trying to group together the species (idcodes) that share
 similar environmental parameters.
 
 For example (looking at the plots), I should be able to retrieve the list of
 idcodes for each branch at cut level X.
 
 in the example :  
 
 
 X = 0.07 
 
 branch1 : [idcodeA, .. .. ,idcodeJ]
 ..
 ..
 branch6 : [idcodeB, .. .. , idcodeK]
 
 
 
 Many thanks for your precious help!!!
 
 Massimo.
 
 
 


[R] help understanding hierarchical clustering

2013-04-30 Thread epi
Hi All,

I'm having trouble understanding how to use R to generate a hierarchical
clustering.
My data are in a CSV and look like:

idcode,count,temp,sal,depth_m,subs
16001,136,4.308,32.828,63.46,47
16001,109,4.31,32.829,63.09,49
16001,107,4.302,32.822,62.54,47
16001,87,4.318,32.834,62.54,48
16002,82,4.312,32.832,63.28,49
16002,77,4.325,32.828,65.65,46
16002,77,4.302,32.821,62.36,47
16002,71,4.299,32.832,65.84,37
16002,70,4.302,32.821,62.54,49

where idcode is a species identification number
and the other fields are environmental parameters.

library(vegan)
mat<-read.csv("http://epi.whoi.edu/ipython/results/mdistefano/pg_site1.csv",header=T)
dd <- mat[!is.na(mat$idcode) &
  !is.na(mat$temp) &
  !is.na(mat$sal) &
  !is.na(mat$count) &
  !is.na(mat$count) &
  !is.na(mat$subs),]
distmat<-vegdist(dd)
clusa<-hclust(distmat,"average")
print(clusa)
Call:
hclust(d = distmat, method = "average")

Cluster method   : average 
Distance : bray 
Number of objects: 8036 
print(dend1 <- as.dendrogram(clusa))
'dendrogram' with 2 branches and 8036 members total, at height 
0.3194225 
dend2 <- cut(dend1, h=0.07)


a complete run with plots is available here :  

http://nbviewer.ipython.org/5492912

I'm trying to group together the species (idcodes) that share
similar environmental parameters.

For example (looking at the plots), I should be able to retrieve the list of
idcodes for each branch at cut level X.

in the example :  


X = 0.07 

branch1 : [idcodeA, .. .. ,idcodeJ]
..
..
branch6 : [idcodeB, .. .. , idcodeK]



Many thanks for your precious help!!!

Massimo.





[R] install error - Netcdf library (linux)

2013-03-07 Thread epi
Hi All,

I'm on 64-bit Debian Linux and
I'm trying to install the netCDF interface; I tried both ncdf and ncdf4,

but when trying to build I received the error below

(I have netCDF installed on my machine and the build is able to find it .. no
missing .h):

epy@epinux:~$ sudo R CMD INSTALL 
--configure-args="-with-netcdf_incdir=/usr/include 
-with-netcdf_libdir=/usr/lib" ncdf4_1.8.tar.gz 
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘ncdf4’ ...
checking for nc-config... yes
Using nc-config: nc-config
Output of nc-config --all:

This netCDF 4.2.1.1 has been built with the following features: 

  --cc        -> gcc
  --cflags    ->  -I/usr/local/include -I/usr/local/include
  --libs      -> -L/usr/local/lib -lnetcdf

  --has-c++   -> no
  --cxx       -> 
  --has-c++4  -> no
  --cxx4      -> 

  --fc        -> 
  --fflags    -> 
  --flibs     -> 
  --has-f90   -> no

  --has-dap   -> yes
  --has-nc2   -> yes
  --has-nc4   -> yes
  --has-hdf5  -> yes
  --has-hdf4  -> no
  --has-pnetcdf-> no
  --has-szlib -> 

  --prefix    -> /usr/local
  --includedir-> /usr/local/include
  --version   -> netCDF 4.2.1.1

---
netcdf.m4: about to set rpath, here is source string: -L/usr/local/lib 
-lnetcdf
netcdf.m4: final rpath:   -Wl,-rpath,/usr/local/lib
Netcdf library version: netCDF 4.2.1.1
Netcdf library has version 4 interface present: yes
Netcdf library was compiled with C compiler: gcc
configure: creating ./config.status
config.status: creating R/load.R
config.status: creating src/Makevars

**  Results of ncdf4 package configure ***

netCDF v4 CPP flags = -I/usr/local/include -I/usr/local/include
netCDF v4 LD flags  =   -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lnetcdf
netCDF v4 runtime path  =   -Wl,-rpath,/usr/local/lib

**

** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/local/include 
-I/usr/local/include -fpic  -O2 -pipe -g  -c ncdf.c -o ncdf.o
ncdf.c: In function ‘R_nc4_nctype_to_Rtypecode’:
ncdf.c:40:18: error: ‘NC_INT’ undeclared (first use in this function)
ncdf.c:40:18: note: each undeclared identifier is reported only once for each 
function it appears in
ncdf.c:49:18: error: ‘NC_UBYTE’ undeclared (first use in this function)
ncdf.c:51:18: error: ‘NC_USHORT’ undeclared (first use in this function)
ncdf.c:53:18: error: ‘NC_UINT’ undeclared (first use in this function)
ncdf.c:55:18: error: ‘NC_INT64’ undeclared (first use in this function)
ncdf.c:57:18: error: ‘NC_UINT64’ undeclared (first use in this function)
ncdf.c: In function ‘R_nc4_varsize’:
ncdf.c:69:28: error: ‘NC_MAX_DIMS’ undeclared (first use in this function)
ncdf.c:75:2: warning: implicit declaration of function ‘nc_inq_varndims’ 
[-Wimplicit-function-declaration]
ncdf.c:78:4: warning: implicit declaration of function ‘nc_strerror’ 
[-Wimplicit-function-declaration]
ncdf.c:84:2: warning: implicit declaration of function ‘nc_inq_vardimid’ 
[-Wimplicit-function-declaration]
ncdf.c:94:3: warning: implicit declaration of function ‘nc_inq_dimlen’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varunlim’:
ncdf.c:112:2: warning: implicit declaration of function ‘nc_inq_unlimdim’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_var’:
ncdf.c:152:2: warning: implicit declaration of function ‘nc_inq_var’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_vartype’:
ncdf.c:168:2: warning: implicit declaration of function ‘nc_inq_vartype’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varname’:
ncdf.c:181:2: warning: implicit declaration of function ‘nc_inq_varname’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_double’:
ncdf.c:214:2: warning: implicit declaration of function ‘nc_get_vara_double’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_int’:
ncdf.c:257:2: warning: implicit declaration of function ‘nc_get_vara_int’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_get_vara_text’:
ncdf.c:313:2: warning: implicit declaration of function ‘nc_get_vara_text’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dimid’:
ncdf.c:345:2: warning: implicit declaration of function ‘nc_inq_dimid’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_varid’:
ncdf.c:355:2: warning: implicit declaration of function ‘nc_inq_varid’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dimids’:
ncdf.c:377:9: warning: implicit declaration of function ‘nc_inq_dimids’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq_dim’:
ncdf.c:387:12: error: ‘NC_MAX_NAME’ undeclared (first use in this function)
ncdf.c:391:2: warning: implicit declaration of function ‘nc_inq_dim’ 
[-Wimplicit-function-declaration]
ncdf.c:408:2: warning: implicit declaration of function ‘nc_inq_unlimdims’ 
[-Wimplicit-function-declaration]
ncdf.c: In function ‘R_nc4_inq’:
ncdf.c:451:2: warning: implicit declaration 
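
The "undeclared" errors for NC_INT, NC_MAX_DIMS and NC_MAX_NAME suggest that the netcdf.h reached by the compiler does not define them, or that no usable netcdf.h sits in the -I directory. The log shows the compile step using -I/usr/local/include, while the command line pointed at /usr/include. One way to check both locations is sketched below in R; header_defines() is a hypothetical helper written for this example, and a mismatched header is only one possible cause.

```r
# Hypothetical helper: TRUE if <incdir>/netcdf.h exists and mentions `symbol`.
header_defines <- function(incdir, symbol = "NC_INT", header = "netcdf.h") {
  path <- file.path(incdir, header)
  if (!file.exists(path)) {
    return(FALSE)
  }
  any(grepl(symbol, readLines(path, warn = FALSE), fixed = TRUE))
}

# The two include directories that appear in the message above.
for (d in c("/usr/include", "/usr/local/include")) {
  cat(d, ": netcdf.h defines NC_INT? ", header_defines(d), "\n", sep = "")
}
```

If the directory reported by nc-config comes back FALSE, the netCDF headers there are missing or incomplete, and the nc-config found first on the PATH may belong to a different installation than the intended one.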

[R] deduplication

2010-06-03 Thread Epi-schnier

Colleagues, 

I am trying to de-duplicate a large (long) database (approx 1mil records) of
diagnostic tests. Individuals in the database can have up-to 25
observations, but most will have only one. IDs for de-duplication (names,
sex, lab number...) are patchy. In a first step, I am using Andreas Borg's
excellent record linkage package (), that leaves me with a list of 'pairs'
looking very much like this:
id1<-c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2<-c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)
id<-data.frame(cbind(id1,id2))
where a pair means that the records belong to the same individual (e.g.,
record 4 and record 8; 17 and 18...). My problem now is to get a list with
all records that belong to the same person (in the example, observations
1,3,4,5,6,7,8,12, 17 and 18 are all from the same person). The problem is to
find the link between 1 and 8 (only through 1 and 4 and 4 and 8) and the
link between 1 and 17 (through 18). I can do it in my head, but I am missing
the code that would work its way through too many records.  
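
The grouping asked for here amounts to finding the connected components of a graph whose edges are the pairs. A minimal base R sketch follows, using the id1/id2 vectors above; the variable names group and persons are made up for this example. It repeatedly gives both records of each pair the smaller of their two labels until nothing changes. It is meant to show the idea, not to be fast on a million records.

```r
id1 <- c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2 <- c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)

# Every record starts in its own group, labelled by its record number.
records <- sort(unique(c(id1, id2)))
group <- setNames(records, records)

# Sweep over the pairs, pulling both members down to the smaller label,
# and repeat until a full sweep changes nothing.
repeat {
  changed <- FALSE
  for (i in seq_along(id1)) {
    a <- as.character(id1[i])
    b <- as.character(id2[i])
    m <- min(group[[a]], group[[b]])
    if (group[[a]] != m || group[[b]] != m) {
      group[[a]] <- m
      group[[b]] <- m
      changed <- TRUE
    }
  }
  if (!changed) break
}

# One list element per person, named by the smallest record number.
persons <- split(records, group)
print(persons)
# Two groups result: records 1,3,4,5,6,7,8,12,17,18 and records 9,10,15,16.
```

For a database of this size a dedicated graph routine is likely the better tool; as an untested suggestion, the igraph package can build a graph from the pair table and return its components.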

Any clever ideas?
(using R 2.10.1 on Windows XP)

Thanks, 

Christian

 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/deduplication-tp2241637p2241637.html
Sent from the R help mailing list archive at Nabble.com.
