date:20120327

Thank you so much, Jessica,

The specifics of my case are that I have a very detailed variable 'Interests' 
which may have several thousand possible values. Usually each customer has 
3-10 different interests. For example:
customer_id|...|interests
1001   |...| cycling, swimming, cooking
1002   |...| cooking, singing, dancing

The total number of possible distinct values is several thousand. I'm curious how 
to use these interests in an SVM (represent them as a vector of real numbers with 
several thousand elements?).

If you have any ideas please let me know.


Thank you,
-Alex


From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 11:18
To: Alekseiy Beloshitskiy
Subject: Re: [R] normalization of multi-value string variable

Well, not sure what you mean by scaling and normalizing strings, but if you 
want to represent the interests as numbers, you can do something like this:

n <- seq(1, length(unique(my_strings)))[factor(my_strings)]


On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:

Hi All,

I need to normalize/scale a string variable which represents the interests of 
customers (e.g., 'cycling, rollerblading, swimming', etc.).

Does anybody know how to do this? I then want to use it along with other numeric 
variables for SVM classification.

I would appreciate any advice.

-Alex

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





[R] SVM. How to use categorical attributes?

Hi All,

Here is the case. I want to build a classification model (SVM). Some of the 
variables for this model are categorical attributes which represent words 
(usually 3-10 words, forming a Google search query). For example:
search_id | query_words                | .. | result
----------+----------------------------+----+-------
1         | how,to,grow,tree           | .. | 4
2         | smartfone,htc,buy,price    | .. | 7
3         | buy,house,realty,london    | .. | 6
4         | where,to,go,weekend,cinema | .. | 4
...
As you can see, the words in a query are unordered and may occur in different 
queries. The total number of unique words across all queries is several thousand.
The question is how to represent this variable (query_words) for use in an SVM.

Thank you for any advice!

Alex


Re: [R] Completely Off Topic:Link to IOM report on use of -omics tests in clinical trials

2012-03-27 Thread Mike Marchywka






Thanks, I had totally missed this controversy, but from a quick read of the summary 
the impact on open source analysis was unclear. Can you explain the punchline? I 
think many users of R have concluded that the biggest problem in most analyses 
is first getting the data and then verifying any results you derive, both issues 
that sound related to your post.
(The jumble below is illustrative of what Hotmail has been doing with plain 
text; getting plain data without all the formatting junk is a recurring problem, 
LOL.)






#62; Date#58; Mon, 26 Mar 2012 22#58;38#58;56 #43;0100#13;#10;#62; 
From#58; iaingallagher#64;btopenworld.com#13;#10;#62; To#58; 
gunter.berton#64;gene.com#59; r-help#64;r-project.org#13;#10;#62; 
Subject#58; Re#58; #91;R#93; Completely Off Topic#58;Link to IOM report on 
use of #34;-omics#34; tests in clinical trials#13;#10;#62;#13;#10;#62; 
I followed this case while it was 
ongoing.#13;#10;#62;#13;#10;#62;#13;#10;#62; It was a very interesting 
example of basic mistakes but also #40;for me#41; of journal 
politicking.#13;#10;#62;#13;#10;#62;#13;#10;#62; Keith Baggerly and 
Kevin Coombes wrote a great paper - #34;DERIVING CHEMOSENSITIVITY FROM CELL 
LINES#58; FORENSIC BIOINFORMATICS AND REPRODUCIBLE RESEARCH IN HIGH-THROUGHPUT 
BIOLOGY#34; in The Annals of Applied Statistics #40;2009, Vol. 3, No. 4, 
1309#8211;1334#41; which explains some of the background and investigative 
work they had to do to bring those mistakes to light.!
 #13;#10;#62;#13;#10;#62;#13;#10;#62; 
Best#13;#10;#62;#13;#10;#62; 
iain#13;#10;#62;#13;#10;#62;#13;#10;#62;#13;#10;#62; - Original 
Message -#13;#10;#62; From#58; Bert Gunter 
#60;gunter.berton#64;gene.com#62;#13;#10;#62; To#58; 
r-help#64;r-project.org#13;#10;#62; Cc#58;#13;#10;#62; Sent#58; 
Monday, 26 March 2012, 19#58;12#13;#10;#62; Subject#58; #91;R#93; 
Completely Off Topic#58;Link to IOM report on use of #34;-omics#34; tests in 
clinical trials#13;#10;#62;#13;#10;#62; Warning#58; This has little 
directly to do with R, although R and related#13;#10;#62; tools #40;e.g. 
sweave and other reproducible research tools#41; have a#13;#10;#62; natural 
role to play.#13;#10;#62;#13;#10;#62; The IOM 
report#58;#13;#10;#62;#13;#10;#62; 
http#58;//www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx#13;#10;#62;#13;#10;#62;
 that arose out of the Duke Univ. genomics testing scandal ha!
 s been#13;#10;#62; released. My thanks to Keith Baggerly for forwar
ding this. I believe#13;#10;#62; that many R users in the medical research 
community will find this#13;#10;#62; interesting, and I hope I do not 
venture too far out of line by#13;#10;#62; passing on the link to readers of 
this list. It #42;#42;will#42;#42; have an#13;#10;#62; important impact 
on so-called Personalized Health Care #40;which I guess#13;#10;#62; affects 
all of us#41;, and open source analytical #40;statistical#41;#13;#10;#62; 
methodology is a central issue.#13;#10;#62;#13;#10;#62; For those 
interested, try the summary first.#13;#10;#62;#13;#10;#62; Best to 
all,#13;#10;#62; Bert#13;#10;#62;#13;#10;#62;#13;#10;#62; 
--#13;#10;#62;#13;#10;#62; Bert Gunter#13;#10;#62; Genentech 
Nonclinical Biostatistics#13;#10;#62;#13;#10;#62; Internal Contact 
Info#58;#13;#10;#62; Phone#58; 467-7374#13;#10;#62; 
Website#58;#13;#10;#62; 
http#58;//pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pd!
 b-biostatistics/pdb-ncb-home.htm#13;#10;#62;#13;#10;#62; 
__#13;#10;#62; 
R-help#64;r-project.org mailing list#13;#10;#62; 
https#58;//stat.ethz.ch/mailman/listinfo/r-help#13;#10;#62; PLEASE do read 
the posting guide 
http#58;//www.R-project.org/posting-guide.html#13;#10;#62; and provide 
commented, minimal, self-contained, reproducible 
code.#13;#10;#62;#13;#10;#62;#13;#10;#62; 
__#13;#10;#62; 
R-help#64;r-project.org mailing list#13;#10;#62; 
https#58;//stat.ethz.ch/mailman/listinfo/r-help#13;#10;#62; PLEASE do read 
the posting guide 
http#58;//www.R-project.org/posting-guide.html#13;#10;#62; and provide 
commented, minimal, self-contained, reproducible code.#13;#10;


Re: [R] How to enable Arial font for postcript/pdf figure on Windows?

2012-03-27 Thread antagomir

Hi Agnes and Camille (and help-list),

In Ubuntu 11.10 I needed superuser permissions to copy the *.afm files
into /usr/lib/R/library/grDevices/afm/ and gzip them there manually to get
Arial embedding to work in R for postscript.

I.e., after following the instructions by Agnes and Camille, I did
  sudo cp arial*.afm /usr/lib/R/library/grDevices/afm/
  sudo gzip /usr/lib/R/library/grDevices/afm/arial*.afm

Then the postscript toy example in this thread worked.

Leo

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-enable-Arial-font-for-postcript-pdf-figure-on-Windows-tp3017809p4508266.html
Sent from the R help mailing list archive at Nabble.com.


[R] Data indexing issue...

Dear R-help,

My dataset (a data frame, called 'Calender' here) includes 365
rows representing the 365 days of a year. One column ('Season') contains
factor data representing seasons, e.g. spring, summer, autumn and winter.
Another column (called 'Day') contains data representing whether the day is
a working day (I use 'Wd' for short here) or a weekend (I use 'Wkend' for
short here).


I want to separate the indices of the working days and weekends for each
season. I have used the R command which() before with one criterion, for
example, if I use...


WdIndex <- which(Calender$Day=='Wd')

that gives the set of indices of the working days in the year.

I wonder whether in R I could use a combination of something such as 'AND', 'OR'
(e.g. as in MySQL) to set multiple criteria when selecting data. So for
example...

WinterWdIndex <- which(Calender$Day=='Wd' AND Calender$Season==Winter)


I know the above syntax is wrong. I checked '?which', which did not give
me an answer, and also tried '?AND', but it seems it doesn't exist at all...


Many thanks!
HJ


Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values

2012-03-27 Thread Juliette Fabre

Hello Tal, 

I have the same problem with the ' added to all my cells when exported into
Excel.

I can drop them manually, but only one by one (Find & Replace does not
work)... So in the end the exported Excel file cannot actually be used by
scientists to draw graphs or whatever! 

Did you find a solution to this problem ?

Thanks, 

Juliette


--
View this message in context: 
http://r.789695.n4.nabble.com/Exporting-a-data-frame-to-excel-using-sqlSave-adds-a-character-to-values-tp1016523p4508239.html
Sent from the R help mailing list archive at Nabble.com.


[R] detecting time out on download.file command

2012-03-27 Thread Hugh Shanahan

Hi,
   I'm working with a legacy R script which makes use of the
download.file command. We're having a problem that occasionally we get a
time out from a particular FTP site but the function that does this
doesn't pass that information back to the main function that calls it.
I'm aware that it is possible to set a timeout using the options command
but I don't know how to check if a timeout has been executed. If I put
the command into a try block could I get the information there ?

All the best,
Hugh

-- 

Hugh Shanahan                        Department of Computer Science
Lecturer in Bioinformatics           Room 246 McCrea Building
E-mail : hugh.shana...@rhul.ac.uk    Royal Holloway,
Web : http://www.shanahanlab.org     University of London
Tel : +44 (0)1784 443433             Egham, Surrey TW20 0EX
Fax : +44 (0)1784 439786             England, U.K.

PGP Key  http://www.cs.rhul.ac.uk/~hugh/PGP/public_key.asc
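
The try-block idea in the question above could look roughly like the sketch below. It is only an illustration under assumptions: safe_download() and its timeout_sec argument are hypothetical names, not part of the legacy script, and the wording of the condition message for a timeout depends on the download method, so nothing here assumes a particular text. The point is that the wrapper returns a status plus the message, so the calling function can see that something went wrong.

```r
# Hypothetical wrapper: returns list(ok = TRUE/FALSE, message = <condition text or NA>)
safe_download <- function(url, destfile, timeout_sec = 60) {
  old <- options(timeout = timeout_sec)   # timeout used by download.file()
  on.exit(options(old), add = TRUE)       # restore the previous setting
  tryCatch({
    status <- download.file(url, destfile, quiet = TRUE)
    # download.file() returns 0 on success and non-zero on failure
    list(ok = identical(as.integer(status), 0L), message = NA_character_)
  },
  warning = function(w) list(ok = FALSE, message = conditionMessage(w)),
  error   = function(e) list(ok = FALSE, message = conditionMessage(e)))
}

# A URL that cannot be fetched, used here only to exercise the failure path
res <- safe_download("file:///no/such/file.txt", tempfile())
if (!res$ok) message("download failed: ", res$message)
```

The caller can then inspect res$message to decide whether the failure looks like a timeout or something else.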


[R] Discretization Package MDLP

2012-03-27 Thread Khaled_taalab

Dear All,

I have a dataset of eight variables with 156 records which I wish to
discretize using the MDLP algorithm. My issue is that I want to dictate the
number of bins the algorithm splits the data into (around 5), rather than
just allowing the algorithm to dictate this using the mdlp(data) command. 

Any help would be greatly appreciated. 

Kind Regards,

Khaled Taalab

--
View this message in context: 
http://r.789695.n4.nabble.com/Discretization-Package-MDLP-tp4508501p4508501.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Data indexing issue...

2012-03-27 Thread Ivan Calandra


Hi HJ,

Take a look at ?"&"; this is probably what you're looking for.

What you could also do is:
Calender[Calender$Day=='Wd' & Calender$Season=='Winter', ]  # notice the last comma


This will subset directly without using which(); it might be helpful to you.

HTH,
Ivan

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
ivan.calan...@u-bourgogne.fr
http://biogeosciences.u-bourgogne.fr/calandra
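
As a small worked illustration of the reply above: the six-row data frame below is a made-up stand-in for the real 365-row 'Calender', and the index variable names are only for this sketch. R's element-wise AND is `&` and element-wise OR is `|`.

```r
# Toy stand-in for the 'Calender' data frame described in the thread
Calender <- data.frame(
  Day    = c("Wd", "Wkend", "Wd", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Spring", "Winter", "Summer", "Autumn"),
  stringsAsFactors = TRUE
)

# Indices of rows meeting both conditions (AND)
WinterWdIndex <- which(Calender$Day == "Wd" & Calender$Season == "Winter")

# Indices of rows meeting either condition (OR)
WinterOrWkendIndex <- which(Calender$Season == "Winter" | Calender$Day == "Wkend")

# Logical subsetting of the rows themselves, without which()
WinterWd <- Calender[Calender$Day == "Wd" & Calender$Season == "Winter", ]
```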



Re: [R] Data indexing issue...

2012-03-27 Thread jim holtman

Why not use 'split' and get all the groups at once:

result <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
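
A short sketch of the split() suggestion, applied to row indices rather than to the rows themselves since the original question asked for indices. The toy data frame and the name idx_by_group are assumptions made for this example only.

```r
# Toy stand-in for the 'Calender' data frame described in the thread
Calender <- data.frame(
  Day    = c("Wd", "Wkend", "Wd", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Spring", "Winter", "Summer", "Autumn")
)

# Split the row numbers by every Day/Season combination that actually occurs;
# drop = TRUE discards combinations with no rows
idx_by_group <- split(seq_len(nrow(Calender)),
                      list(Calender$Day, Calender$Season),
                      drop = TRUE)

# Group names are built as "<Day>.<Season>"
idx_by_group[["Wd.Winter"]]
```

This gives all the index sets in one list, so no separate which() call per season is needed.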


Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values

2012-03-27 Thread jim holtman

I don't see any problem here; there is no data and no indication as to
the actual problem you are having that calls for a Find & Replace. I
export to Excel all the time and don't have any problems. So provide
some data and an indication of the problem.


--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

[R] Standard error terms from gfcure

2012-03-27 Thread Bonnett, Laura

Dear R-help,

I am using R 2.14.1 on Windows 7 with the 'gfcure' package (cure rate model).
I have included the treatment variable in the cure part of the model as shown 
below:


ref_treat <- gfcure(Surv(rem.Remtime,rem.Rcens)~1,~1+strata(drpa)+factor(treat(delcure)),data=delcure,dist="loglogistic")

From that I can obtain the coefficients, standard errors etc as per 
alternative models (with covariates only fitted to the survival part of the 
model say).

 summary(ref_treat)

However, only one standard error is output:

Log-logistic mixture model

The maximum loglikelihood is -927.0449

Terms in the accelerated failure time model:
            Coefficients  Std.err   z-score  p-value
Log(scale)     -0.894528   0.0236  -37.8324    0.000
(Intercept)     6.929351   0.0151  460.4157    0.000

Terms in the logistic model:
                        Coefficients  Std.err  z-score    p-value
(Intercept)                 2.542726
strata(drpa)drpa=2         18.76
factor(treat(delcure))2     0.184192
factor(treat(delcure))3     0.472809
factor(treat(delcure))4     0.255565 953.6876   0.0003  0.9997862
factor(treat(delcure))5     0.401713
Warning message:
In sqrt(diag(solve(object$infomat))) : NaNs produced


Can anyone explain why this is the case?

Very many thanks,
Laura


Re: [R] read.csv and field containing single quotes

2012-03-27 Thread Benilton Carvalho

Thanks Henrique...

giving it a try now, but it'll take a good while, given the file size.

Cheers,
b

On 27 March 2012 02:35, Henrique Dallazuanna www...@gmail.com wrote:

 Benilton,

 Try this:

 read.table(textConnection(gsub('","', "','", gsub('^\"|\"$', "'",
 readLines('../teste.csv')))), sep = ',', quote = "'", header = TRUE)

 On Mon, Mar 26, 2012 at 8:09 PM, Benilton Carvalho
 beniltoncarva...@gmail.com wrote:
  I need to read in csv files, created by 3rd party, with fields
  containing single quotes (as shown below).
 
  header1,header2,header3,header4
  field1r1,field2r1,field3r1,field4r1
  field1r2,field2r2,field3r2PartA), field3r2PartB Very
 Long,field4r2
  field1r3,field2r3,field3r3,field4r3
 
 
  read.csv(filename, quote="\"'", header=TRUE) won't read the file
  represented above, unless the 3rd line has Very  (double quotes)
  instead of Very (single quotes)... and this is documented (scan() man
  page).
 
  Assuming that the creation of such csv files is something I'm not in a
  position to interfere with, are there (preferably, all in R)
  suggestions on how to handle such task?
 
  For the moment, I'm using my poor man's solution (below), but any
  tricks that would simplify this task would be great.
 
  Thank you very much,
 
  benilton
 
 
  parser <- function(fname, header=TRUE, stringsAsFactors=FALSE){
      txt <- readLines(fname)
      txt <- gsub("^\"|\"$", "", txt)
      txt <- strsplit(txt, "\",\"")
      txt <- do.call(rbind, lapply(txt, function(x) gsub(\, \\, x)))
      if (header){
          nms <- txt[1,]
          txt <- txt[-1,]
      }
      txt <- as.data.frame(txt, stringsAsFactors=stringsAsFactors)
      if (header) names(txt) <- nms
      txt
  }
 



 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O



Re: [R] read.csv and field containing single quotes

2012-03-27 Thread Rainer M Krug

On 27/03/12 01:09, Benilton Carvalho wrote:
 I need to read in csv files, created by 3rd party, with fields
 containing single quotes (as shown below).
 
 header1,header2,header3,header4
 field1r1,field2r1,field3r1,field4r1
 field1r2,field2r2,field3r2PartA), field3r2PartB Very Long,field4r2
 field1r3,field2r3,field3r3,field4r3

You could try under your OS, to

1) replace , with ', (assuming that the csv does not contain any'
2) read into R with sep=\'

If the file is huge, some in OS solution would be the best.

Cheers,

Rainer


 
 


-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, 
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug


Re: [R] normalization of multi-value string variable

Right,
I was also thinking about it, but since I have a few thousand unique words I 
am not quite sure how it will work.

I just posted my question with more detailed description here:
http://stats.stackexchange.com/questions/25355/multi-value-categorical-attributes-how-r

Really interesting case :)

Thank you,
-Alex

From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 15:24
To: Alekseiy Beloshitskiy
Cc: r-help@r-project.org
Subject: Re: [R] normalization of multi-value string variable

Hm.. so what you need is either

- one new feature for each activity that has a binary value
e.g.:
cust_id , cycling, swimming, cooking
1001 , 1  , 0, 1

- one new feature that has a value corresponding to a certain combination of 
activities
so if you had just the three activities you would have 2^3 possible values
I'm not sure how useful that would be though for the classification.

(Would need to think about how to compute this, i'm new to R as well. Would 
probably just iterate over the data)

If you make one feature per activity, and you end up having too many to 
properly compute the svm, you might try to reduce it by other methods, PCA 
comes to mind for example, though i never used that on binary data before.



[R] RSqlite UPDATE command problem

2012-03-27 Thread Thomas Adams

All:

I am using RSqlite and want to be able to update individual values in a
record, such as with this simple example:

library(RSQLite)
drv <- dbDriver("SQLite")
con <- dbConnect(drv, "test.db")
my.data <- data.frame(countries=c("US","UK","Canada","Australia","NewZealand"), vals=c(52,36,74,10,98))
dbWriteTable(con, "testtable", my.data)
q <- dbReadTable(con, "testtable")
q

   countries vals
1         US   52
2         UK   36
3     Canada   74
4  Australia   10
5 NewZealand   98

So, say, I want to change the value for NewZealand from '98' to '21'.

I've tried something like this:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbBeginTransaction(con)
dbGetPreparedQuery(con, sql)   # <== I get an error here
dbCommit(con)

using a different example for an INSERT command using a data frame 'data',
this construct is accepted:

dbGetPreparedQuery(con,sql,bind.data=data)

What do I need to do differently to use the UPDATE command?

Regards,
Tom


-- 

Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL:  thomas.ad...@noaa.gov
VOICE:  937-383-0528
FAX:    937-383-0033
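
For the UPDATE question above, one possible approach is sketched below. It is an assumption-laden sketch rather than a fix for the exact setup in the post: it uses dbExecute() from the current DBI interface, which may be newer than the RSQLite version in use when this was written, and an in-memory database instead of test.db so that it leaves no file behind.

```r
library(DBI)

# In-memory database so the sketch is self-contained
con <- dbConnect(RSQLite::SQLite(), ":memory:")

my.data <- data.frame(
  countries = c("US", "UK", "Canada", "Australia", "NewZealand"),
  vals      = c(52, 36, 74, 10, 98),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "testtable", my.data)

# dbExecute() runs a statement that returns no rows and reports
# how many rows were affected; the ? placeholders are bound from params
n_changed <- dbExecute(
  con,
  "UPDATE testtable SET vals = ? WHERE countries = ?",
  params = list(21, "NewZealand")
)

q <- dbReadTable(con, "testtable")
dbDisconnect(con)
```

After this, q should show 21 for NewZealand, and n_changed holds the number of rows the statement touched.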


Re: [R] normalization of multi-value string variable

2012-03-27 Thread Jessica Streicher

Hm.. so what you need is either

- one new feature for each activity that has a binary value
e.g.:
cust_id , cycling, swimming, cooking
1001 , 1  , 0, 1

- one new feature that has a value corresponding to a certain combination of 
activities
so if you had just the three activities you would have 2^3 possible values
I'm not sure how useful that would be though for the classification.

(Would need to think about how to compute this, i'm new to R as well. Would 
probably just iterate over the data)

If you make one feature per activity, and you end up having too many to 
properly compute the svm, you might try to reduce it by other methods, PCA 
comes to mind for example, though i never used that on binary data before.
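
The first option above (one binary feature per activity) could be computed along these lines. This is a minimal sketch on the two example customers from the thread; the names customers, interest_list, vocab and X are made up for illustration. With several thousand distinct interests the matrix would be mostly zeros, so a sparse representation (for instance from the Matrix package) would probably be preferable to the dense matrix built here.

```r
customers <- data.frame(
  customer_id = c(1001, 1002),
  interests   = c("cycling, swimming, cooking", "cooking, singing, dancing"),
  stringsAsFactors = FALSE
)

# Split each string on commas and trim the surrounding whitespace
interest_list <- lapply(strsplit(customers$interests, ","), trimws)

# All distinct interests, in a fixed (alphabetical) order
vocab <- sort(unique(unlist(interest_list)))

# One row per customer, one 0/1 column per distinct interest
X <- t(vapply(interest_list,
              function(v) as.integer(vocab %in% v),
              integer(length(vocab))))
dimnames(X) <- list(as.character(customers$customer_id), vocab)

X
```

The resulting matrix can be column-bound to the other numeric variables before fitting the SVM.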



[R] Supperscript, subscript and double lines in the main/sub title and using greek letters

Dear R-help,

I will try to express myself as best I can here. If you also use LaTeX (or
another language with a similar editing method) to write math reports,
you'll see what I'm talking about. My sincere apologies if my question is
not clear enough; I am also unable to provide code here because I don't
know which functions I should use...

When editing the title of R plots, e.g. with 'plot', or 'xyplot' in
'lattice', what method do you use to write Greek letters, superscripts
and subscripts, e.g. to write mathematical expressions like these in
LaTeX:

\sigma^2
\tau^{2s}
\mu_i
\pi_{2s}

I would also like to learn how to break the main title or subtitle over
two lines when the text is too long for a single line. Is there some R
code/syntax that lets me do what LaTeX does, for example using '//' or
'\\' to separate the two parts of the text?

I heard about using something like

plot(x,y, main=expression())

but in neither '?plot' nor '?expression' could I find comprehensive
information about what I need...

Many thanks!
HJ


Re: [R] Data indexing issue...

Hi Jim!

Thank you so much for the very helpful hints!!
I am learning 'split' now and it seems very useful..

HJ

On Tue, Mar 27, 2012 at 12:58 PM, jim holtman jholt...@gmail.com wrote:

 Why not use 'split' and get all the groups at once:

 result <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)
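
 A small sketch of the split() idea, applied to row indices rather than to
 the rows themselves. The six-row 'Calender' data frame below is a made-up
 stand-in for the 365-row one described in the original question:

```r
# Hypothetical stand-in for the poster's 'Calender' data frame
Calender <- data.frame(
  Day    = c("Wd", "Wd", "Wkend", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Winter", "Summer", "Summer", "Summer"),
  stringsAsFactors = TRUE
)

# Row indices grouped by every Day/Season combination that occurs;
# drop = TRUE discards combinations with no rows
idx <- split(seq_len(nrow(Calender)),
             list(Calender$Day, Calender$Season),
             drop = TRUE)

names(idx)          # group names have the form "Day.Season"
idx[["Wd.Winter"]]  # indices of winter working days
```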

 On Tue, Mar 27, 2012 at 7:43 AM, Ivan Calandra
 ivan.calan...@u-bourgogne.fr wrote:
  Hi HJ,
 
  Take a look at ?"&"; this is probably what you're looking for.
 
  What you could also do is:
  Calender[Calender$Day == 'Wd' & Calender$Season == 'Winter', ]  # notice the last comma
 
  This will subset directly without using which(); it might be helpful to
  you.
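
  For illustration, here is how the elementwise operators & (AND) and
  | (OR) combine conditions, both inside which() and in direct
  subsetting. The small 'Calender' data frame is invented for this sketch:

```r
# Hypothetical stand-in for the poster's 'Calender' data frame
Calender <- data.frame(
  Day    = c("Wd", "Wd", "Wkend", "Wd", "Wkend", "Wd"),
  Season = c("Winter", "Winter", "Winter", "Summer", "Summer", "Summer"),
  stringsAsFactors = FALSE
)

# & is the vectorised AND: TRUE only where both conditions hold
WinterWdIndex <- which(Calender$Day == "Wd" & Calender$Season == "Winter")

# | is the vectorised OR
WinterOrWkend <- which(Calender$Season == "Winter" | Calender$Day == "Wkend")

# The same logical vector can subset the rows directly
WinterWd <- Calender[Calender$Day == "Wd" & Calender$Season == "Winter", ]
```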
 
  HTH,
  Ivan
 
  --
  Ivan CALANDRA
  Université de Bourgogne
  UMR CNRS/uB 6282 Biogéosciences
  6 Boulevard Gabriel
  21000 Dijon, FRANCE
  +33(0)3.80.39.63.06
  ivan.calan...@u-bourgogne.fr
  http://biogeosciences.u-bourgogne.fr/calandra
 
 
  Le 27/03/12 12:32, HJ YAN a écrit :
 
  Dear R-help,
 
  My dataset (a data frame, called 'Calender' here) has 365
  rows representing the 365 days of a year. One column ('Season') contains
  factor data representing seasons, e.g. spring, summer, autumn and winter.
  Another column (called 'Day') indicates whether the day is
  a working day ('Wd' for short) or a weekend ('Wkend' for short).
 
  I want to separate the indices of the working days and weekends for each
  season. I have used the R command 'which' before for one criterion, for
  example, if I use...
 
  WdIndex <- which(Calender$Day=='Wd')
 
  that gives the set of indices of working days in the year.
 
  I wonder whether in R I could use a combination of something such as 'AND',
  'OR' (e.g. as in MySQL) to set multiple criteria when selecting data. So for
  example...
 
  WinterWdIndex <- which(Calender$Day=='Wd' AND Calender$Season=='Winter')
 
  I know the above syntax is wrong. I checked '?which', which did not give
  me an answer, and also tried '?AND', but it seems it doesn't exist at all...
 
  Many thanks!
  HJ
 



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.


Re: [R] row, col function but for a list (probably very easy question, cannot seem to find it though)

2012-03-27 Thread MBoersma

Thanks guys for all the replies.

It is an urban myth that using 'apply' functions will deliver better   
performance than 'for' loops. It may even worsen performance or create   
obstacles when it is improperly used with dataframes. Most of the   
benefits come from improving readability and maintainability.

This is what I had to learn the hard way: apply functions made it go
slower :) I do understand them much better now, also in the light of some of
these ways of using them.

In the end my program became much faster by making the data frames matrices,
and even more by finally seeing the light (courtesy of a colleague for
getting me to think in the right direction) and making much more of it into
a matrix operation. I'm very happy with the results :).
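
To illustrate the point about turning row-wise work into a matrix
operation, here is a small sketch comparing a for loop, apply() and the
vectorised rowSums() on a made-up matrix. It only checks that the three
agree; timings will depend on the data and are not claimed here:

```r
set.seed(1)
m <- matrix(runif(1e4), nrow = 100)  # hypothetical 100 x 100 matrix

# Explicit loop with a pre-allocated result
loop_result <- numeric(nrow(m))
for (i in seq_len(nrow(m))) loop_result[i] <- sum(m[i, ])

# apply() over rows: essentially a loop in disguise
apply_result <- apply(m, 1, sum)

# Whole-matrix operation
vec_result <- rowSums(m)

all.equal(loop_result, apply_result)
all.equal(apply_result, vec_result)
```

system.time() around each variant is a simple way to compare them on real
data.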

So consider me helped!

Regards,
Mark

--
View this message in context: 
http://r.789695.n4.nabble.com/row-col-function-but-for-a-list-probably-very-easy-question-cannot-seem-to-find-it-though-tp4504216p4508816.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] RSqlite UPDATE command problem

2012-03-27 Thread Benilton Carvalho

You probably want:

sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
dbGetQuery(con, sql)

instead...
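
For readers on more recent versions of DBI and RSQLite, a sketch of the
same update using dbExecute(), which is intended for statements that
return no rows. This assumes a reasonably current DBI/RSQLite
installation; the in-memory database and the name 'n_changed' are
choices made for this example, not part of the original post:

```r
library(DBI)

# In-memory database so the sketch leaves no file behind
# (an assumption; the original post used "test.db")
con <- dbConnect(RSQLite::SQLite(), ":memory:")

my.data <- data.frame(
  countries = c("US", "UK", "Canada", "Australia", "NewZealand"),
  vals      = c(52, 36, 74, 10, 98),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "testtable", my.data)

# dbExecute() runs the statement and returns the number of rows affected;
# the ? placeholders are filled from 'params' in order
n_changed <- dbExecute(
  con,
  "UPDATE testtable SET vals = ? WHERE countries = ?",
  params = list(21, "NewZealand")
)

updated <- dbReadTable(con, "testtable")
dbDisconnect(con)
```

Binding the values through 'params' avoids pasting them into the SQL
string by hand.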

b

On 27 March 2012 14:18, Thomas Adams thomas.ad...@noaa.gov wrote:

 All:

 I am using RSqlite and want to be able to update individual values in a
 record, such as with this simple example:

 library(RSQLite)
 drv <- dbDriver("SQLite")
 con <- dbConnect(drv, "test.db")

 my.data <- data.frame(countries=c("US","UK","Canada","Australia","NewZealand"),vals=c(52,36,74,10,98))
 dbWriteTable(con, "testtable", my.data)
 q <- dbReadTable(con, "testtable")
 q

    countries vals
 1         US   52
 2         UK   36
 3     Canada   74
 4  Australia   10
 5 NewZealand   98

 So, say, I want to change the value for NewZealand from '98' to '21'.

 I've tried something like this:

 sql <- "UPDATE testtable SET vals=21 WHERE countries='NewZealand'"
 dbBeginTransaction(con)
 dbGetPreparedQuery(con,sql)   <== I get an error here
 dbCommit(con)

 Using a different example, an INSERT command with a data frame 'data',
 this construct is accepted:

 dbGetPreparedQuery(con,sql,bind.data=data)

 What do I need to do differently to use the UPDATE command?

 Regards,
 Tom


 --

 Thomas E Adams
 National Weather Service
 Ohio River Forecast Center
 1901 South State Route 134
 Wilmington, OH 45177

 EMAIL:  thomas.ad...@noaa.gov
 VOICE:  937-383-0528
 FAX:937-383-0033


Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters

2012-03-27 Thread Gerrit Eichner


Hi, HJ,

see

?plotmath
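
A short sketch of what ?plotmath describes, using the expressions from the
question. The data, the title wording and the temporary PDF file are
assumptions made so the example runs without an open graphics window:

```r
# Write to a temporary PDF so the sketch runs without a display
out <- tempfile(fileext = ".pdf")
pdf(out)

x <- 1:10
y <- x^2

# Greek letters by name, ^ for superscripts, [] for subscripts;
# inside an expression, * juxtaposes two terms
plot(x, y,
     main = expression(paste("Variance ", sigma^2, " and ", tau^{2*s})),
     sub  = expression(mu[i] ~ "and" ~ pi[2*s]))

# In an ordinary character string, "\n" breaks the title over two lines
plot(x, y, main = "First line of a long title\nsecond line of the title")

# For math over two lines, atop() is one option
plot(x, y,
     main = expression(atop("First line with " * sigma^2,
                            "second line with " * mu[i])))

dev.off()
```

demo(plotmath) shows the full set of supported constructs.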

 Hth  --  Gerrit

-
Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109http://www.uni-giessen.de/cms/eichner
-


Re: [R] RSqlite UPDATE command problem

2012-03-27 Thread Thomas Adams

Benilton,

Thank you, you are quite right!!

Regards,
Tom


Re: [R] copy the columns based on the code

2012-03-27 Thread Igor Sosa Mayor

:)

yes! I agree!

On Mon, Mar 26, 2012 at 10:51:17AM -0700, Bert Gunter wrote:
 Fortunes candidate?!
 -- Bert
 
 On Mon, Mar 26, 2012 at 10:24 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
  The OP wrote
  The problem is that it gives the result that I want.
 
 Sarah's reply:  That's a new sort of problem.
 
 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
 

-- 
:: Igor Sosa Mayor :: joseleopoldo1...@gmail.com ::
:: GnuPG: 0x1C1E2890   :: http://www.gnupg.org/  ::
:: jabberid: rogorido  ::::



Re: [R] help in replacing for llop

2012-03-27 Thread R. Michael Weylandt

No idea what a mean median histogram is but you may wish to check
out ?tapply or library(plyr), both of which are designed for this
split-apply-combine paradigm.
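
A sketch of the tapply() route using the records from the original post
(quoted below), with aggregate() and split() + lapply() shown as base-R
alternatives. The object names are invented for this example:

```r
# The twelve records from the original question
dat <- data.frame(
  X1    = c(34, 9, 49, 60, 80, 60, 59, 88, 71, 65, 59, 60),
  X2    = c(72, 63, 31, 34, 73, 20, 87, 20, 66, 56, 16, 100),
  State = c(rep("state1", 5), rep("state2", 5), "state1", "state2"),
  stringsAsFactors = FALSE
)

# One summary value per state, no explicit loop
mean_x1   <- tapply(dat$X1, dat$State, mean)
median_x1 <- tapply(dat$X1, dat$State, median)

# Several columns at once
means_by_state <- aggregate(cbind(X1, X2) ~ State, data = dat, FUN = mean)

# One histogram object per state (plot = FALSE only computes the bins)
hist_x1 <- lapply(split(dat$X1, dat$State), hist, plot = FALSE)
```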

Michael

On Tue, Mar 27, 2012 at 12:51 AM, arunkumar akpbond...@gmail.com wrote:
 Hi

 I have records like this:

 X1      X2      State
 34      72      state1
 9       63      state1
 49      31      state1
 60      34      state1
 80      73      state1
 60      20      state2
 59      87      state2
 88      20      state2
 71      66      state2
 65      56      state2
 59      16      state1
 60      100     state2


 I want to get summary values such as the mean, the median and a histogram
 for X1 and X2 by state. I'm using a FOR loop for this. Is there any way to
 remove the for loop and use apply or some alternative?


 -
 Thanks in Advance
        Arun
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/help-in-replacing-for-llop-tp4507939p4507939.html
 Sent from the R help mailing list archive at Nabble.com.


[R] I can't open a .nc file with the cdfcont function of the clim.pact package

2012-03-27 Thread anne-laure

Hello,

I am new at using R.

I would like to use the following functions of the clim.pact package:
ncdfcont and retrieve.nc

I have installed the package clim.pact in Rstudio.
I have downloaded the ncdf pack from unicar (including ncdump and ncgen).
The ncdf file I'm working on is called essai2.nc

Here is what I get when I type the command ncinfo <- cdfcont("essai2.nc"):

ncinfo cdfcont.txt' renvoie un statut 1
2: In min(nchar(str)) : aucun argument trouvé pour min ; Inf est renvoyé

I'm sorry it's in French!
My attempt at a translation:
Error in 1:nc : argument of length 0
Warning messages:
1: running command 'C:...' had status 1
2: In min(nchar(str)) : no non-missing arguments to min; returning Inf

Could someone please help me with this?

PS: I can open the document with the function open.ncdf of the ncdf package.

Regards

--
View this message in context: 
http://r.789695.n4.nabble.com/I-can-t-open-a-nc-file-with-the-cdfcont-function-of-the-clim-pact-package-tp4508950p4508950.html
Sent from the R help mailing list archive at Nabble.com.


[R] two lmer questions - formula with related variables and output interpretation

2012-03-27 Thread Dragonwalker

Hello,
I have been attempting to set up a lme and have looked at numerous posts
including 'R's lmer cheat-sheet' as well as reading a number of papers and
other resources including R help, but I am still a little confused on how to
write my model (I thought I had it).

I have asked a number of questions on different forums; most of which have
been resolved.

My main concern right now is whether my model is correct. I studied broods
of precocial chicks and watched each chick every other day for five minutes
if possible. As chicks on the same day are completely non-independent the
mean was found for each brood for each day. Variables that were recorded
were the behaviours during that time and the habitats used.

There were seven broods. Three at one site and four at the other site. Only
one site had a brood that consistently used mudflats rather than oceanfront
habitats. As none of the data within a brood is truly independent, along
with the very small number of broods, it became impossible to use
conventional statistics to test the hypotheses and so it was suggested that
mixed-effects models would be the best option as it would not only allow for
all data to be used with a random effect of Brood ID to negate the
pseudo-replication but also let me look at partial use of mudflats in one of
the other broods that only used it periodically.

So, for this part of the analysis I would like to see which factors affect
the amount of time feeding. I set up a global model with ten fixed variables
plus (1|Brood). Site, tide.h.l, tide.inc.out, MF.vs.OF, Human Disturbance
Rate (HDr), Human Disturbance proportion of time(HDp), non-Human Disturbance
(two variables as for Human Disturbance) and Age and mean.foraging.rate. As
so:

gm1 <- lmer(Feeding~Site+tide.level+MF.vs.OF+HDr+HDp+NHDr+NHDp+Age+mean.for.rate+(1|Brood),
data=AllBrood, REML=TRUE)

I wished to put all the factors together to explore which ones really did
influence the time spent feeding, so I used the 'dredge' command to run all
possible combinations and then averaged the models within an AICc delta of
2. I was expecting that the proportion of time being disturbed (HDp and
NHDp) would be the most relevant, as by default the more time spent in other
behaviours, the less time there is for feeding. However, MF.vs.OF had a
larger effect than HDp and NHDp. This may be because MF observations did not
experience HDp at all, which may inflate the effect of this habitat.
Surprisingly, non-human disturbance rates rather than time had a greater
effect (but these are quite even among habitats).

The results of the model.avg are as follows:

               Estimate Std. Error z value Pr(>|z|)
(Intercept)    102.7190     5.5300  18.575  < 2e-16 ***
HDr             -1.5495     0.3451   4.490 7.11e-06 ***
MF.vs.OF2       -7.6780     3.7507   2.047  0.04065 *
NHDp            -0.5145     0.2909   1.769  0.07695 .
NHDr            -1.4164     0.4663   3.037  0.00239 **
Site2            6.1477     2.7400   2.244  0.02485 *
tide.h.l2       -7.2546     2.6914   2.695  0.00703 **
tide.inc.out2   -5.8486     2.6187   2.233  0.02553 *
HDp             -0.3773     0.2732   1.381  0.16731
mean.for.rate   -0.3966     0.3220   1.232  0.21807
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Full model-averaged coefficients (with shrinkage):

(Intercept)    102.718962
HDr             -1.549499
MF.vs.OF2       -5.734171
NHDp            -0.239550
NHDr            -1.416373
Site2            5.336532
tide.h.l2       -7.254627
tide.inc.out2   -5.848553
HDp             -0.044795
mean.for.rate   -0.081734

Relative variable importance:

(Intercept)     1.00
Age             0.00
HDp             0.12
HDr             1.00
mean.for.rate   0.21
MF.vs.OF        0.75
NHDp            0.47
NHDr            1.00
Site            0.87
tide.h.l        1.00
tide.inc.out    1.00

I was wondering whether there would be a better way to formulate the model
to allow for this effect. Or could I keep it as is and infer that the
habitat effect may be partly driven by the amount of disturbance within
these habitats but, since its effect is larger, that other factors are also
at play? That would lead me on to the next model, which will use only
observations that do not include disturbance, allowing me to tease out the
natural factors affecting feeding behaviour. I was going to run this second
model with site still as a fixed effect and then run it with (1|Site) to
remove the site effect (if one is found).

I would prefer to keep it simple as I really want to use an lme, but I don't
have the understanding for more complex interactions.

I have also asked a question, as yet unanswered, on Stats Stack
Exchange about the output of model.avg, as follows:

I have seen the Estimates described as the effect of the variable and this
is discussed in results sections as an important value to report (in regards
to the size of them and their direction (+ve/-ve). (the paper I

[R] Zero inflated GAMM

2012-03-27 Thread Bert Harris

Hi all,

I am planning to get Zuur et al.'s new book when it comes out, but until
then I was wondering if anyone could suggest examples of zero inflated or
hurdle GAMMs. I have count data with many zeros, non-linear relationships,
and site as a random effect.

Thank you!
Bert Harris, University of Adelaide


Re: [R] Memory Utilization on R

2012-03-27 Thread Kurinji Pandiyan

Thank you for the modified script! I have now tried on different datasets
and it works very well and is dramatically faster than my original script!

I really appreciate the help.
Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 Taking a look at your script: there are some potential optimizations
 you can do:

 # Fine
 poi <- as.character(top.GSM396290) #5000 characters
 x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables

 # Pre-allocate the space
 x <- vector("list", 485577) # x <- list()

 # Do the 'a' stuff once outside the loop so you aren't doing it 485577 times
 a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")

 # Let's use an apply statement instead of a for loop
 # vapply is the fastest since we prespecify the return type.
 x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]

 I think this will do what you wanted (and hopefully much faster)

 Note that you could probably tune this further but I think this
 strikes a good balance between clarity and performance (for now)
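
 A self-contained toy version of the approach above, for anyone who wants
 to try it without the original data. The gene names, the four-row
 'x.data' and the names 'genes', 'keep' and 'hits' are invented stand-ins:

```r
# Hypothetical stand-ins for 'poi' and 'x.data' from the thread
poi <- c("TP53", "BRCA1")
x.data <- data.frame(
  ID = c("cg01", "cg02", "cg03", "cg04"),
  UCSC_REFGENE_NAME = c("TP53;TP53", "EGFR", "MYC;BRCA1", ""),
  stringsAsFactors = FALSE
)

# Split every gene string once, outside any loop
genes <- strsplit(x.data$UCSC_REFGENE_NAME, ";", fixed = TRUE)

# One TRUE/FALSE per row: does the row mention any gene of interest?
keep <- vapply(genes, function(g) any(poi %in% g), logical(1))

# A single logical subset replaces the row-by-row rbind
hits <- x.data[keep, ]
```

 Subsetting once with a logical vector avoids growing a list row by row and
 then calling rbind on hundreds of thousands of one-row data frames.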

 Hope this helps,

 Michael

 On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
 kurinji.pandi...@gmail.com wrote:
 
  Thank you for the input.
 
  As it turns out, my script is using a lot more memory than
  I claimed: it was initially using 3 GB but has gone up to 20.24 active,
  with 29.63 assigned to the R session.
 
  The script has run overnight and I don't think it is active anymore,
  since I keep getting the error message that I am out of startup disk space
  for application memory.
 
  I am attaching screenshots of my RAM usage distribution (given that there
  is no fluctuation in the usage by the R session, I believe it is no longer
  running) and of my available HD.
 
  Here is my script -
 
  poi <- as.character(top.GSM396290) #5000 characters
  x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
  head(x.data)
 
  x <- list()
 
  for(i in 1:485577){
    a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
    a <- unlist(strsplit(a, ";"))
    if(any(poi %in% a) == TRUE) {x[[i]] <- x.data[i,]}
  }
 
  # this step completed in a few hours
 
  x <- do.call(rbind, x) # this step has been running overnight and is still stuck
 
  Thanks, I really appreciate the help.
  Kurinji
 
  On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
  michael.weyla...@gmail.com wrote:
 
  Well... what makes you think you are hitting memory constraints then?
  If you have significantly less than 3GB of data, it shouldn't surprise
  you if R never needs more than 3GB of memory.
 
  You could just be running your scripts inefficiently...it's an extreme
  example, but all the memory and gigaflopping in the world can't speed
  this up (by much):
 
  for(i in seq_len(1e6)) Sys.sleep(10)
 
  Perhaps you should look into profiling tools or parallel
  computation...if you can post a representative example of your
  scripts, we might be able to give performance pointers.
 
  Michael
 
  On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
  kurinji.pandi...@gmail.com wrote:
   Yes, I am.
  
   Thank you,
   Kurinji
  
   On Mar 22, 2012, at 10:27 PM, R. Michael Weylandt
   michael.weyla...@gmail.com wrote:
  
   Use 64bit R?
  
   Michael
  
   On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
   kurinji.pandi...@gmail.com wrote:
   Hello,
  
   I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad core processor and 2TB
   storage. Despite it having so much memory, I am not able to get R to
   utilize much more than 3 GB. Some of my scripts take hours to run, but I
   would think they would be much faster if more memory were utilized. How
   do I optimize the memory usage of R on my Mac Pro?
  
   Thank you!
   Kurinji
  

Re: [R] Using MuMIn - error message

2012-03-27 Thread Dragonwalker

Hello Mike,

I don't think I did, but I fixed the issue by loading each package before
use. The second issue was solved by removing a variable that was used to
create two other categorical variables. I think it must have been
recognising this.

Thanks for the help.

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-MuMIn-error-message-tp4500236p4508901.html
Sent from the R help mailing list archive at Nabble.com.


[R] matrix(unlist(strsplit())) 'missing value' issue

2012-03-27 Thread MaartenJacobs

I'm still an R noob; I have just had a couple of lectures about it in our
research master's programme.

I have to write some code for a Deal or No Deal experiment.
Someone wrote a website to gather the data and write it to .xlsx files.
There are separate files for separate participants, so first I have to import
the separate data files. I do that like this:
# Merge the xlsx files into one dataframe
alldata <- rbind(read.xlsx('experimentdata.xlsx',1),
                 read.xlsx('experimentdata_1.xlsx',1),
                 read.xlsx('experimentdata_2.xlsx',1)
                 #etc..#read.xlsx('filepath',1)
                 )

The website is poorly written and some of the variables are not convenient.
I have the variables 'bankoffer.1', 'bankoffer.3', 'bankoffer.5' etc.
These variables look like the following:
 alldata$bankoffer.1
[1] 246000:accepted    267000:notaccepted 20:notaccepted
Levels: 246000:accepted 267000:notaccepted 20:notaccepted

 alldata$bankoffer.3
[1] 999                429000:notaccepted 48000:notaccepted
Levels: 999 429000:notaccepted 48000:notaccepted
The problem is that the values in the cells are awkward: they consist of, for
example, '246000:accepted'. I would like to decompose that so that 246000 is
in one variable and accepted is in another.

No problem, just do this:
 as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.1), ":")),
 ncol = 2, byrow = TRUE))
      V1          V2
1 246000    accepted
2 267000 notaccepted
3     20 notaccepted

However, when there are missing values, as in bankoffer.3, there is a
problem:

 as.data.frame(matrix(unlist(strsplit(as.character(alldata$bankoffer.3), ":")),
 ncol = 2, byrow = TRUE))
           V1     V2
1         999 429000
2 notaccepted  48000
3 notaccepted    999
Warning message:
In matrix(unlist(strsplit(as.character(alldata$bankoffer.3), ":")),  :
  data length [5] is not a sub-multiple or multiple of the number of rows [3]

R does not encounter a ':' in the 999 and therefore places the 429000 in
the second column; it should, however, be in the first one, like this:
      V1          V2
1    999         999
2 429000 notaccepted
3  48000 notaccepted

How can I tell R to place 999 in both columns when it encounters a
999? Any other solution to my problem is also welcome. For example, I
thought about making R append ':999' whenever it encounters 999 as a
sort of workaround for the problem, but I have no idea how to do that.
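
One possible way to get the layout described above, as a sketch: split
each value, repeat a lone piece so that every element has exactly two
parts, then bind the pieces row by row. The vector 'bankoffer' copies the
values shown for bankoffer.3; the column names 'offer' and 'decision' are
invented:

```r
# Values as shown for alldata$bankoffer.3 in the post
bankoffer <- c("999", "429000:notaccepted", "48000:notaccepted")

parts <- strsplit(bankoffer, ":", fixed = TRUE)

# A value without ':' yields a single piece; repeat it to fill both columns
parts <- lapply(parts, function(p) if (length(p) == 1) rep(p, 2) else p)

# Every element now has length 2, so rbind gives a clean two-column matrix
result <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(result) <- c("offer", "decision")

result
```

On the real data the input would be as.character(alldata$bankoffer.3), and
the 999 entries could afterwards be recoded to NA if they mark missing
values.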

I hope I made it a little clear what the problem is and what I eventually
want. If not please ask.

Greetings Maarten

--
View this message in context: 
http://r.789695.n4.nabble.com/matrix-unlist-strsplit-missing-value-issue-tp4509065p4509065.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Exporting a data.frame to excel using sqlSave - adds a character ' to values

2012-03-27 Thread Juliette Fabre

Hello, 

I encountered a situation similar to the one described by Tal above:

I use the RODBC library to export multiple data frames into different sheets
of an Excel file.
My data frames contain Character, Date and Numeric columns.

library(RODBC)
channel <- odbcConnectExcel(xls.file = myXlsFile, readOnly = FALSE)
sqlSave(channel, data, tablename = "Table1", rownames = F, colnames = T)
odbcClose(channel)

When exported into Excel, all of my cells start with the ' character
(which is different from Tal's situation, where only non-numeric cells
started with the ' character).
I need the columns that contain numeric data or dates to be imported in
the appropriate format so that they can be manipulated (graphics etc).

I found a macro that formats all the sheets in the appropriate way, but I
would like to find out why even my numeric data (type Numeric in R) are
imported as text.

Regards, 

Juliette




--
View this message in context: 
http://r.789695.n4.nabble.com/Exporting-a-data-frame-to-excel-using-sqlSave-adds-a-character-to-values-tp1016523p4509108.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] row, col function but for a list (probably very easy question, cannot seem to find it though)



On Mar 27, 2012, at 3:37 AM, peter dalgaard wrote:



On Mar 26, 2012, at 17:33 , David Winsemius wrote:


The usual approach to that problem is to use sapply:

x <- list()
x <- sapply(1:10, function(z) x[[z]] <- 1:z )


Yikes!

If that works, it is only by coincidence (The pre-assignment to  
x only serves the purpose of allowing the [[<- assignment inside the  
anonymous function, but the assignment is to a local copy which is  
deleted on exit, and the return value is the rhs of the assignment.)


Well, maybe not by pure coincidence. There are really two rhs's and it  
was because of the outer assignment of the values to 'x' that it  
worked as intended. My error is in propagating the notion that  
assignments to named objects inside the function will survive outside  
the function.


 x <- list(); y <- list()
 y <- sapply(1:10, function(z) x[[z]] <- 1:z )
 x
list()




Please:

x <- lapply(1:10, function(z) 1:z)

or even

x <- lapply(1:10, seq_len)


Yes, I see the error of my ways. I wonder how many times I have been  
in this state of sin in the past?
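
A small sketch of the point made above, shortened to three elements for
readability: the assignment inside the anonymous function changes only a
local copy of x, while the value returned to sapply() is the right-hand
side of that assignment.

```r
x <- list()

# The [[<- assignment modifies a local copy of x; the function's value is 1:z
y <- sapply(1:3, function(z) x[[z]] <- 1:z)

# The global x is untouched
length(x)

# The recommended form builds the same list without any side effects
x2 <- lapply(1:3, seq_len)
identical(y, x2)
```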




--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



David Winsemius, MD
West Hartford, CT


[R] R extract parts

2012-03-27 Thread MSousa


Good Afternoon, 

I believe R has a more effective solution to my problem than using a loop.
I have the following data set and need to extract some sections.


user  pos  communications  source  v_destine
7       1             109      22         22
7       2             100      22         22
7       3             214      22         22
7       4             322      22         22
7       5           69920      22        161
7       6              68     161         97
7       7             196      97         97
7       8             427      97         22
7       9             460      22         22
7      10             307      22         22
7      11            9582      22         22
7      12           55428      22         22
7      13            9192      22         22
7      14              19      22         22

My idea is that whenever a communications value greater than 1000 occurs,
I can extract some data.
In the example data set, the value is over 1000 at positions 11, 12 and 13.
My idea is to get results like this:
user, sector, source, destine, count, average
7     1       22      22       4      186.25  # (109+100+214+322)/4
7     2       161     97       1      68
7     2       97      97       1      196
7     2       97      22       1      427
7     2       22      22       2      383
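
One possible loop-free sketch, under assumptions that are mine and not
stated in the post: every run of rows with communications > 1000 closes a
sector, those rows themselves are dropped, and rows are then grouped by
user, sector, source and v_destine. The data frame is typed in by hand
from the table above, and position 5 (69920) is treated as a break as
well.

```r
d <- data.frame(
  user = 7, pos = 1:14,
  communications = c(109, 100, 214, 322, 69920, 68, 196, 427, 460, 307,
                     9582, 55428, 9192, 19),
  source    = c(22, 22, 22, 22, 22, 161, 97, 97, 22, 22, 22, 22, 22, 22),
  v_destine = c(22, 22, 22, 22, 161, 97, 97, 22, 22, 22, 22, 22, 22, 22)
)

# TRUE for rows above the threshold
big <- d$communications > 1000

# TRUE only on the first row of each run of large values
starts <- big & !c(FALSE, head(big, -1))

# The sector number goes up by one each time such a run begins
d$sector <- cumsum(starts) + 1

# Keep the ordinary rows and summarise them per group
small <- d[!big, ]
key <- small[c("user", "sector", "source", "v_destine")]
cnt <- aggregate(list(count = small$communications), key, length)
avg <- aggregate(list(average = small$communications), key, mean)
res <- merge(cnt, avg)
res <- res[order(res$sector), ]
res
```

Note that this groups identical source/destine pairs within a sector even
when they are not adjacent, which may or may not be what is wanted.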



--
View this message in context: 
http://r.789695.n4.nabble.com/R-extract-parts-tp4509042p4509042.html
Sent from the R help mailing list archive at Nabble.com.


[R] Constructing Distance matrix for hclust

2012-03-27 Thread Vinod Hegde

Hi,

I have similarity value between string pairs in a mysql database.
I need to construct the distance matrix which hclust can take and cluster
the strings. Most of the examples I came across show how to construct the
distance matrix using dist function.

How can I construct the distance matrix from the data in the MySQL database?

Thanks a lot for any help.
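
A minimal sketch of one way to do this, assuming the pairs have already
been fetched from MySQL into a data frame with one row per string pair.
The column names s1, s2 and sim are hypothetical, and the conversion
distance = 1 - similarity assumes similarities between 0 and 1.

```r
# Hypothetical query result: one row per pair of strings
pairs <- data.frame(s1  = c("a", "a", "b"),
                    s2  = c("b", "c", "c"),
                    sim = c(0.9, 0.2, 0.3),
                    stringsAsFactors = FALSE)

# Square matrix with one row and one column per distinct string
labs <- sort(unique(c(pairs$s1, pairs$s2)))
m <- matrix(0, length(labs), length(labs), dimnames = list(labs, labs))

# Fill both triangles so that the matrix is symmetric
idx <- cbind(match(pairs$s1, labs), match(pairs$s2, labs))
m[idx] <- 1 - pairs$sim
m[idx[, 2:1, drop = FALSE]] <- 1 - pairs$sim

# as.dist() turns the matrix into the object that hclust() expects
d <- as.dist(m)
hc <- hclust(d)
groups <- cutree(hc, k = 2)
```

Pairs missing from the table would be left at distance 0 here, so they
need an explicit value before clustering.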


[R] lasso constraint

2012-03-27 Thread yx78

In the package lasso2, there is a Prostate Data. To find coefficients in the
prostate cancer example we could impose L1 constraint on the parameters. 

code is:
data(Prostate)
 p.mean <- apply(Prostate, 5, mean)
 pros <- sweep(Prostate, 5, p.mean, "-")
 p.std <- apply(pros, 5, var)
 pros <- sweep(pros, 5, sqrt(p.std), "/")
 pros[, "lpsa"] <- Prostate[, "lpsa"]
l1ce(lpsa ~  . , pros, bound = 0.44)

I can't figure out where 0.44 comes from. The paper says it was obtained
from generalized cross-validation and that it is the optimal choice.

paper name: Regression Shrinkage and Selection via the Lasso 

author: Robert Tibshirani 



--
View this message in context: 
http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters



On Mar 27, 2012, at 9:39 AM, Gerrit Eichner wrote:


Hi, HJ,

see

?plotmath

Hth  --  Gerrit

-
Dr. Gerrit Eichner   Mathematical Institute, Room 212

On Tue, 27 Mar 2012, HJ YAN wrote:


Dear R-help,

I am trying to express myself as best as I can here. If you also use LaTeX
to edit math reports, or other languages with a similar editing method,
you'll see what I'm talking about. My sincere apologies if my question is
not clear enough; I am also not able to provide my code here because I
don't know which one I can use...

When editing the title in R plots, such as using 'plot', or 'xyplot' in
'lattice', what method do you use to write Greek letters and make use of
superscripts and subscripts, e.g. to write mathematical expressions like
these in LaTeX:

\sigma^2
\tau^{2s}
\mu_i
\pi_{2s}

I would also like to learn how to put two lines in the main title or
subtitle if the text is too long for a single line. Is there some R
code/syntax that allows something like in LaTeX, for example using '//'
or '\\' to separate the two parts of the text that I want on two lines?

I heard about using something like

plot(x,y, main=expression())

but from neither '?plot' nor '?expression' could I find comprehensive
information about what I need...


The plotmath environment (not the correct term) will not accept the
usual EOL \n marker for new lines. You can cobble together a
substitute (at least for the two line problem) using the plotmath
`atop` function.


plot(1, 1, main = expression(atop("laaahhh"~tau, "bllleeehhh"~epsilon)))


Notice the need for a plotmath connector such as ~ or * between
the text and the unquoted greeks.
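
For the four LaTeX snippets in the question, a sketch of the plotmath
equivalents as I understand them from ?plotmath. The object names and the
title text are made up for illustration.

```r
# plotmath counterparts of the LaTeX expressions in the question
ex_sigma <- expression(sigma^2)     # \sigma^2
ex_tau   <- expression(tau^{2*s})   # \tau^{2s}; * juxtaposes 2 and s
ex_mu    <- expression(mu[i])       # \mu_i
ex_pi    <- expression(pi[2*s])     # \pi_{2s}

# Two-line main title built with atop(); quoted text joined to symbols by ~
two_line <- expression(atop("Variance" ~ sigma^2, "Mean" ~ mu[i]))

plot(1, 1, main = two_line, sub = ex_pi, xlab = ex_tau, ylab = ex_sigma)
```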


--

David Winsemius, MD
West Hartford, CT


Re: [R] How to test for the difference of means in population, please help

2012-03-27 Thread Greg Snow

You should use mixed effects modeling to analyze data of this sort.
This is not a topic that has generally been covered by introductory
classes, so you should consult with a professional statistician on
your problem, or educate yourself well beyond the novice level (this
takes more than just reading 1 book, a few classes would be good to
get to this level, or intense study of several books).

Since everything is balanced nicely, you could average over the 4
repeats and use a 2 sample t test (assuming the assumptions hold, your
sample data would be fine) comparing the 2 sets of 400 means.  This
will test for a general difference in the overall means, but ignores
other information and hypotheses that may be important (which is why
the mixed effects model approach is much preferred).
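
A sketch of the simpler averaging approach described above, using
simulated data of the same shape as in the question. The names cond_c and
cond_b and the seed are assumptions for illustration; this is not the
mixed effects analysis.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Two conditions, each with 400 points measured in 4 repeats
cond_c <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)
cond_b <- matrix(sample(1:20, 1600, replace = TRUE), 400, 4)

# Average over the 4 repeats: one mean per point, 400 per condition
c_means <- rowMeans(cond_c)
b_means <- rowMeans(cond_b)

# Two-sample t test on the two sets of 400 means (Welch version by default)
tt <- t.test(c_means, b_means)
tt$p.value
```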

On Tue, Mar 27, 2012 at 1:13 AM, ali_protocol
mohammadianalimohammad...@gmail.com wrote:
 Dear all,

 Novice in statistics.

 I have 2 experimental conditions. Each condition has ~400 points as its
 response. Each condition is done in 4 repereats (so I have 2 x 400 x 4
 points).

 I want to compare the means of two conditions and test whether they are same
 or not. Which test should I use?

 #populations
 c = matrix (sample (1:20,1600, replace= TRUE), 400 ,4)
 b = matrix (sample (1:20,1600, replace= TRUE), 400 ,4)

 #means of repeats
 c.mean= apply (c,2, mean)
 b.mean= apply (b,2,mean)

 #mean of experiment
 c.mean.all= mean (c)
 b.mean.all= mean (b)

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-test-for-the-difference-of-means-in-population-please-help-tp4508089p4508089.html
 Sent from the R help mailing list archive at Nabble.com.




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters

2012-03-27 Thread mlell08

The title() function also has a 'line' parameter with which you can specify
the margin line on which the text is displayed.
The number of margin lines around the figure region of the plot can be
set before plotting with par(mar=c(bottom,left,top,right)), measured in
text lines. Margin lines are also used by par(mgp=...) and mtext().
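
A short sketch of that idea; the margin sizes, line positions and title
text are arbitrary choices for illustration.

```r
# Enlarge the top margin to 6 text lines; par() returns the old settings
op <- par(mar = c(5, 4, 6, 2) + 0.1)

plot(1:10, main = "")

# Two separate title() calls, placed on different margin lines
title(main = "First line of the title", line = 3)
title(main = expression("Second line with" ~ sigma^2), line = 1.5)

# Restore the previous margins
par(op)
```

Since each title() call is independent, one line can be plain text and
the other a plotmath expression.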

Regards!


Re: [R] Memory Utilization on R

Guys, let me add my two cents to your interesting discussion.

I have a ~10 GB txt file with training data for my model. It has about 150
million rows for 12 variables.
When I load it into memory (running just this one line!):

train <- read.table(file = "/training.txt")

loading takes ~28 GB of RAM (and about 2 hours to finish), and once the
data are loaded, rsession takes ~14 GB.
I can't even imagine how much it will take when I run svm training on this
data set. Is there any optimization to decrease the time required for loading
data into memory?
I use an x64 box with 32 GB of RAM.

Thank you,
-Alex


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of 
Kurinji Pandiyan [kurinji.pandi...@gmail.com]
Sent: 27 March 2012 18:14
To: R. Michael Weylandt
Cc: r-help@r-project.org
Subject: Re: [R] Memory Utilization on R

Thank you for the modified script! I have now tried on different datasets
and it works very well and is dramatically faster than my original script!

I really appreciate the help.
Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 Taking a look at your script: there are some potential optimizations
 you can do:

 # Fine
 poi <- as.character(top.GSM396290) # 5000 characters
 x.data <- h1[, c(1, 7:9)] # 485577 obs of 4 variables

 # Pre-allocate the space
 x <- vector("list", 485577) # x <- list()

 # Do the "a" stuff once outside the loop so you aren't doing it 485577
 # times
 a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")

 # Let's use an apply statement instead of a for loop
 # vapply is the fastest since we prespecify the return type.
 x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]

 I think this will do what you wanted (and hopefully much faster)

 Note that you could probably tune this further but I think this
 strikes a good balance between clarity and performance (for now)

 Hope this helps,

 Michael

 On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
 kurinji.pandi...@gmail.com wrote:
 
  Thank you for the input.

  As it were, I realized that my script is utilizing a lot more memory than
  I claimed - it was initially using 3 GB but has gone up to 20.24 active
  but 29.63 assigned to the R session.

  The script has run overnight and now I don't think it is active anymore
  since I keep getting the error message that I am out of startup disk
  space for application memory.

  I am attaching screen shots of my RAM usage distribution (given that
  there is no fluctuation in the usage by the R session I believe it is
  not running anymore) and of my available HD.

  Here is my script -

  poi <- as.character(top.GSM396290) # 5000 characters
  x.data <- h1[, c(1, 7:9)] # 485577 obs of 4 variables
  head(x.data)

  x <- list()

  for (i in 1:485577) {
    a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
    a <- unlist(strsplit(a, ";"))
    if (any(poi %in% a) == TRUE) { x[[i]] <- x.data[i, ] }
  }

  # this step completed in a few hours

  x <- do.call(rbind, x) # this step has been running overnight and is
                         # still stuck
 
  Thanks, I really appreciate the help.
  Kurinji
 
  On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
  michael.weyla...@gmail.com wrote:
 
  Well... what makes you think you are hitting memory constraints then?
  If you have significantly less than 3GB of data, it shouldn't surprise
  you if R never needs more than 3GB of memory.
 
  You could just be running your scripts inefficiently...it's an extreme
  example, but all the memory and gigaflopping in the world can't speed
  this up (by much):
 
  for(i in seq_len(1e6)) Sys.sleep(10)
 
  Perhaps you should look into profiling tools or parallel
  computation...if you can post a representative example of your
  scripts, we might be able to give performance pointers.
 
  Michael
 
  On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
  kurinji.pandi...@gmail.com wrote:
   Yes, I am.
  
   Thank you,
   Kurinji
  
   On Mar 22, 2012, at 10:27 PM, R. Michael Weylandt
   michael.weyla...@gmail.com wrote:
  
   Use 64bit R?
  
   Michael
  
   On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
   kurinji.pandi...@gmail.com wrote:
   Hello,

   I have a 32 GB RAM Mac Pro with a 2*2.4 GHz quad core processor and
   2TB storage. Despite this having so much memory, I am not able to get
   R to utilize much more than 3 GBs. Some of my scripts take hours to
   run but I would think they would be much faster if more memory is
   utilized. How do I optimize the memory usage on R by my Mac Pro?
  
   Thank you!
   Kurinji
  

Re: [R] two lmer questions - formula with related variables and output interpretation

2012-03-27 Thread Dragonwalker

I realised that I removed the link to the question but forgot to remove the
text regarding it. Sorry. I am not sure if I am supposed to link to other
forums, but I can add the links as needed (as the format is clearer).

I actually have one more question, though, regarding which data to use.
If it is better to just report the estimates and CIs, should I then use
those with shrinkage instead? If so, does anyone know how I can get the CIs
for these rather than just the regular CIs? I apologise if I am asking too
many questions within one post.

Rachel

--
View this message in context: 
http://r.789695.n4.nabble.com/two-lmer-questions-formula-with-related-variables-and-output-interpretation-tp4508876p4509334.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Memory Utilization on R

2012-03-27 Thread R. Michael Weylandt

Note that you can actually drop the line defining the big list x. I
thought it would be needed, but it turns out to be unnecessary after
cleaning up the second half: cutting off that allocation might save
you even more time.
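
A toy illustration of the streamlined version without the big list x. The
small data frame and the poi vector below are made up; only the column
name UCSC_REFGENE_NAME comes from the thread.

```r
# Made-up stand-ins for the real objects in the thread
x.data <- data.frame(id = 1:4,
                     UCSC_REFGENE_NAME = c("A;B", "C", "", "D;A"),
                     stringsAsFactors = FALSE)
poi <- c("A", "Z")

# Split every gene-name string once, outside any loop
a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")

# One logical per row: does the row mention any gene of interest?
keep <- vapply(a, function(genes) any(poi %in% genes), logical(1))

# Subset once instead of rbind-ing hundreds of thousands of single rows
hits <- x.data[keep, ]
hits
```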

Best,
Michael

On Tue, Mar 27, 2012 at 11:14 AM, Kurinji Pandiyan
kurinji.pandi...@gmail.com wrote:
 Thank you for the modified script! I have now tried on different datasets
 and it works very well and is dramatically faster than my original script!

 I really appreciate the help.
 Kurinji


Re: [R] copy the columns based on the code

2012-03-27 Thread MSousa



Hello,

this code works perfectly:
   temp <- merge(travel, city, by.x = "Source", by.y = "cod")
   result <- merge(temp, city, by.x = "Destine", by.y = "cod")

The problem was in the construction of the data frame: there was an extra
parenthesis in city <- rbind(city, data.frame(city = "Lisbon", cod = 3))),

I tried to delete the post, but I could not.
As I have little experience in R, I still make some mistakes.
I use read.table to load the data frame; the form used in the post was
just the quickest way I found to describe the problem.
The forum has been a great help to me.

Thanks







--
View this message in context: 
http://r.789695.n4.nabble.com/copy-the-columns-based-on-the-code-tp4505253p4509340.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] copy the columns based on the code

2012-03-27 Thread jim holtman

yet another way:

 city <- data.frame(city = "Barcelona", cod = 1)
 city <- rbind(city, data.frame(city = "Madrid", cod = 2))
 city <- rbind(city, data.frame(city = "Lisbon", cod = 3))
 city <- rbind(city, data.frame(city = "Milan", cod = 4))
 city <- rbind(city, data.frame(city = "London", cod = 5))

 travel <- data.frame(pos = 1, Source = 1, Destine = 2)
 travel <- rbind(travel, data.frame(pos = 1, Source = 1, Destine = 3))
 travel <- rbind(travel, data.frame(pos = 2, Source = 3, Destine = 4))
 travel <- rbind(travel, data.frame(pos = 3, Source = 2, Destine = 4))
 travel <- rbind(travel, data.frame(pos = 4, Source = 1, Destine = 3))

 travel$city <- city$city[match(travel$Source, city$cod)]
 travel$city_destine <- city$city[match(travel$Destine, city$cod)]

 travel
  pos Source Destine      city city_destine
1   1      1       2 Barcelona       Madrid
2   1      1       3 Barcelona       Lisbon
3   2      3       4    Lisbon        Milan
4   3      2       4    Madrid        Milan
5   4      1       3 Barcelona       Lisbon






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Plot of function seems to cut off near edge of domain

2012-03-27 Thread Chad Mills

Ah, thanks.  I am new to R and was unaware of the from/to parameters for
the plot function.  I thought xlim and ylim served that purpose.  Thanks
again!

-Chad

On Tue, Mar 27, 2012 at 3:31 AM, Matthieu Dubois matth...@gmail.com wrote:

 Dear Chad,

 your problem is linked to (1) the function returning NaNs for x values
 greater than 50, and (2) the fact that the function is evaluated at a
 predefined number of points.

 Calling plot for a function object is basically a wrapper for curve(). Your
 function g() is evaluated on the whole xlim domain, which will return NaN
 values for x > 50 (try g(60)). In addition, curve() splits the x interval
 (here from 0 to 60) into a predefined number of points (n=101 is the
 default, see help(curve)) at which the function is evaluated. In your code,
 the function is evaluated at values x <- seq(0, 60, length=101), and the
 g(x) that are not NaN are plotted. The largest x value (from the sequence)
 that doesn't return a NaN is max(x[!is.nan(g(x))]), which is 49.8.

 One way to solve it is to explicitly specify the domain used to evaluate
 the function, by using the from and to arguments that are passed to curve():

 #Figure 2, with xlim beyond the radius of the circle
 plot(g, axes=FALSE, from=0, to=50, xlim=c(0, 60), ylim=c(0,60))
 axis(1,pos=0)
 axis(2,pos=0)
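
 A small check of the numbers above. The original g() is not shown in this
 message, so the quarter circle of radius 50 below is an assumption chosen
 to match the description (NaN for x > 50).

```r
# Hypothetical stand-in for the poster's function
g <- function(x) sqrt(50^2 - x^2)

# The 101 points at which curve() would evaluate g over xlim = c(0, 60)
x <- seq(0, 60, length.out = 101)
y <- suppressWarnings(g(x))   # sqrt() of a negative number gives NaN

# Largest grid point that still yields a number
last_ok <- max(x[!is.nan(y)])
last_ok

# Evaluating only on [0, 50] lets the curve reach the axis
plot(g, from = 0, to = 50, xlim = c(0, 60), ylim = c(0, 60), axes = FALSE)
axis(1, pos = 0)
axis(2, pos = 0)
```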

 HTH

 Matthieu

 Matthieu Dubois
 Post-doctoral researcher
 Psychology Department
 Université Libre de Bruxelles


Re: [R] Memory Utilization on R

2012-03-27 Thread R. Michael Weylandt

It's really not suggested etiquette to thread-jack, but generally, the
more you can tell read.table (particularly through the colClasses, nrows,
as.is, and stringsAsFactors arguments), the faster it will be able to
read things, because it can skip various otherwise necessary checks.
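
A minimal sketch of what that can look like, using a small temporary file
in place of the real training data. The column names and types are
invented, and comment.char = "" is an extra suggestion taken from
?read.table rather than from the message above.

```r
# Write a tiny stand-in file; the real file would be the 10 GB training set
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5,
                       score = c(0.1, 0.2, 0.3, 0.4, 0.5),
                       label = letters[1:5]),
            tmp, row.names = FALSE, quote = FALSE)

# Declare the column types and row count up front so nothing has to be guessed
train <- read.table(tmp, header = TRUE,
                    colClasses = c("integer", "numeric", "character"),
                    nrows = 5,
                    comment.char = "",
                    stringsAsFactors = FALSE)
unlink(tmp)

str(train)
```

For nrows, a mild overestimate of the true row count is also acceptable
according to the help page.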

Michael

On Tue, Mar 27, 2012 at 12:07 PM, Alekseiy Beloshitskiy
abeloshits...@velti.com wrote:
 Guys, let me add my two cents to your interesting discussion.

 I have a ~10 GB txt file with training data for my model. It has about 150
 million rows for 12 variables.
 When I load it into memory (running just this one line!):

 train <- read.table(file = "/training.txt")

 loading takes ~28 GB of RAM (and about 2 hours to finish), and once the
 data are loaded, rsession takes ~14 GB.
 I can't even imagine how much it will take when I run svm training on this
 data set. Is there any optimization to decrease the time required for loading
 data into memory?
 I use an x64 box with 32 GB of RAM.

 Thank you,
 -Alex

 

[R] What error distribution should I use?

2012-03-27 Thread Lívia Dorneles Audino

I'm trying to fit a GLMM to identify the relationship between insect
species richness and fragment size, isolation and time (different years).
I have already tried to analyse it using a Poisson error distribution, but
I always get the following warning:
*glm.fit: fitted probabilities numerically 0 or 1 occurred *

This is probably happening because my dataset has a lot of zeros. So, what
error distribution should I use?

-- 
*Lívia *


[R] ignore error getting next result

2012-03-27 Thread C Lin


Dear All,
 
How do I ignore an error and still get the result of the next iteration?
I am running wilcox.test in a loop; when the test fails, I would like to 
continue with the next iteration and get its p-value.
I tried tryCatch and try, but I cannot retrieve the p-value when the test 
does not fail.
 
sample code:

test2=list(numeric(0),c(10,20));
test1=list(c(1),c(1,2,3,4));
for (i in 1:2){
 wtest=wilcox.test(test1[[i]],test2[[i]])
}
 
i=1 will fail; I want to ignore this and get the p-value for i=2.
 
Thanks,
Lin   

Re: [R] lasso constraint

2012-03-27 Thread Weidong Gu

Hi,

your code has errors: apply function only has 1 or 2 as margin.

bound is used as the tuning parameter for the sum of absolute
coefficients. lasso runs on a grid of the tuning parameter for
varying strength of shrinkage, so each tuning value may yield
different sets of coefficients and values. cross validation is used to
estimate the value of the tuning parameter which gives the smallest
errors (mse or deviance) on testing data.

Weidong Gu



On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote:
 In the package lasso2, there is a Prostate Data. To find coefficients in the
 prostate cancer example we could impose L1 constraint on the parameters.

 code is:
 data(Prostate)
  p.mean <- apply(Prostate, 5, mean)
  pros <- sweep(Prostate, 5, p.mean, "-")
  p.std <- apply(pros, 5, var)
  pros <- sweep(pros, 5, sqrt(p.std), "/")
  pros[, "lpsa"] <- Prostate[, "lpsa"]
 l1ce(lpsa ~  . , pros, bound = 0.44)

 I can't figure out where 0.44 comes from. The paper says it was obtained
 by generalized cross-validation and that it is the optimal choice.

 paper name: Regression Shrinkage and Selection via the Lasso

 author: Robert Tibshirani



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html
 Sent from the R help mailing list archive at Nabble.com.


[R] read.octave fails with data from Octave 3.2.X

2012-03-27 Thread Helios de Rosario

Hi,

I'm afraid that the function read.octave from package foreign has
some problems with the ASCII data format exported by new versions of
Octave (later than 3.2.X). It fails even for a simple case as:

[Octave code:]
octave:1> x=1;
octave:2> save -ascii testdata.mat x

[Now in R:]
> octavedata <- read.octave('testdata.mat')
Lost warning messages
In read_octave_unknown(con, type) : cannot handle unknown type ''

In this simple case I guess that the problem is that new versions of
Octave append two blank lines after each variable, and this confuses the
current implementation of read.octave().

The problem is worse if the saved variables include other types such as
structs or strings. The new syntax of the MAT files is not recognized
by read.octave().

Of course, it's always difficult to keep this kind of function working
when the external program changes its specification for saving
variables, but it would be nice if the maintainers of foreign could at
least solve the issue of blank lines. That way, it would still be
possible to import simple data types such as scalars and matrices.

Otherwise, I suppose that a workaround is saving the data in binary
(matlab) format, then load it with Octave 3.2.X, and save it in text
format from that version.

> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] foreign_0.8-49




-- 
Helios de Rosario Martínez
 
 Researcher


INSTITUTO DE BIOMECÁNICA DE VALENCIA
Universidad Politécnica de Valencia • Edificio 9C
Camino de Vera s/n • 46022 VALENCIA (ESPAÑA)
Tel. +34 96 387 91 60 • Fax +34 96 387 91 69
www.ibv.org



Re: [R] lasso constraint

2012-03-27 Thread Bert Gunter

Inline:

On Tue, Mar 27, 2012 at 10:00 AM, Weidong Gu anopheles...@gmail.com wrote:
 Hi,

 your code has errors: apply function only has 1 or 2 as margin.

FALSE.  Please re-read the Help files. It works as expected with
arbitrary higher dim arrays.
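
To illustrate the point about MARGIN, here is a small sketch with made-up data (arr and m are hypothetical and are not the Prostate data). apply() accepts any margin that exists in the object, so 3 is valid for a 3-d array; a matrix or data frame has only two dimensions, so there the usable margins are 1 (rows) and 2 (columns).

```r
# A 2 x 3 x 4 array: MARGIN = 3 applies the function to each 2 x 3 slice
arr <- array(1:24, dim = c(2, 3, 4))
slice_means <- apply(arr, 3, mean)
print(slice_means)

# Column-wise centring and scaling of a matrix uses MARGIN = 2
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
centred <- sweep(m, 2, apply(m, 2, mean), "-")
scaled  <- sweep(centred, 2, apply(centred, 2, sd), "/")
print(scaled)
```

The second half does by hand what scale(m) does in one call, which may be a simpler way to standardise predictors before fitting.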

-- Bert



 bound is used as the tuning parameter for the sum of absolute
 coefficients. lasso runs on a grid of the tuning parameter for
 varying strength of shrinkage, so each tuning value may yield
 different sets of coefficients and values. cross validation is used to
 estimate the value of the tuning parameter which gives the smallest
 errors (mse or deviance) on testing data.

 Weidong Gu



 On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote:
 In the package lasso2, there is a Prostate Data. To find coefficients in the
 prostate cancer example we could impose L1 constraint on the parameters.

 code is:
 data(Prostate)
  p.mean <- apply(Prostate, 5, mean)
  pros <- sweep(Prostate, 5, p.mean, "-")
  p.std <- apply(pros, 5, var)
  pros <- sweep(pros, 5, sqrt(p.std), "/")
  pros[, "lpsa"] <- Prostate[, "lpsa"]
 l1ce(lpsa ~  . , pros, bound = 0.44)

 I can't figure out where 0.44 comes from. The paper says it was obtained
 by generalized cross-validation and that it is the optimal choice.

 paper name: Regression Shrinkage and Selection via the Lasso

 author: Robert Tibshirani



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/lasso-constraint-tp4508998p4508998.html
 Sent from the R help mailing list archive at Nabble.com.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters

Sorry, my last message was sent before it was complete.
Please see below.

On Tue, Mar 27, 2012 at 5:36 PM, HJ YAN yhj...@googlemail.com wrote:

 Thank you very much Gerrit, for the nice hints!

 I've just done some more googling and research on this and am trying to
 answer it myself...

 Below is the code that works for double lines (adopted from Gerrit's
 hints) and some of the formats (e.g. 1 and 3, but not 2 and 4) listed below:

 (1) \sigma^2
 (2) \tau^{2s}
 (3) \mu_i
 (4) \pi_{2s}

 plot(1:3, ylab = expression("Superscript in greek letters (" * mu^2 ~ "m)")
    , xlab = expression("Subscript in greek letters" ~ mu[2] * "" ~ pi)
    , main = expression(atop("Happy Easter", "to all R-Helpers")))


 For using greek letters, I am still a bit confused about when a * is
 needed though... e.g. it seems a * is needed in front of a greek letter
 when it follows a quoted string inside 'expression("...")'. And a * seems
 not to be required when the greek letter stands outside the double quotes,
 e.g. when applying just 'expression(...)'.  Again, a * is needed when
 making a subscript as shown above...

It seems ~ is reserved for making spaces before/between greek letters.
What if we need a literal ~ in the title, as ~ is standard notation in
statistics for expressing 'is from' when writing down a distribution, e.g.
'X~N(0,1)'?
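
For what it is worth, here is a minimal sketch of how the four LaTeX forms from the original question, plus a two-line title containing a "distributed as" tilde, might be written with plotmath. The label names (lab_sigma and so on) and the title text are made up for this example. In plotmath, * juxtaposes two pieces without a space, ~ juxtaposes them with a space, and %~% draws a literal tilde meaning "is distributed as" (see ?plotmath).

```r
lab_sigma <- expression(sigma^2)       # \sigma^2
lab_tau   <- expression(tau^{2 * s})   # \tau^{2s}: braces group the exponent
lab_mu    <- expression(mu[i])         # \mu_i
lab_pi    <- expression(pi[2 * s])     # \pi_{2s}

# atop() stacks two pieces; %~% renders as a tilde between X and N(0, 1)
main_two_lines <- expression(atop("First line of the title", X %~% N(0, 1)))

plot(1:3, xlab = lab_mu, ylab = lab_sigma, main = main_two_lines)
mtext(lab_tau, side = 3, line = 0.2, adj = 0)   # top left
mtext(lab_pi,  side = 3, line = 0.2, adj = 1)   # top right
```

If the title is plain text without any math, a string with an embedded "\n" (for example main = "first line\nsecond line") also produces two lines.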

HJ

















 On Tue, Mar 27, 2012 at 2:39 PM, Gerrit Eichner 
 gerrit.eich...@math.uni-giessen.de wrote:

 Hi, HJ,

 see

 ?plotmath

  Hth  --  Gerrit

 ---------------------------------------------------------------------
 Dr. Gerrit Eichner                   Mathematical Institute, Room 212
 gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
 Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
 Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
 ---------------------------------------------------------------------



 On Tue, 27 Mar 2012, HJ YAN wrote:

  Dear R-help,

 I am trying to express myself as best as I can here. If you also use LaTeX
 to edit math reports, or other languages with a similar editing method,
 you'll see what I'm talking about. My sincere apologies if my question is
 not clear enough to some extent; I'm also not able to provide my code
 here because I don't know which one I can use...

 When editing the title in R plots, such as with 'plot', or 'xyplot' in
 'lattice', what method do you use to write greek letters and make use of
 superscripts and subscripts, e.g. to write mathematical expressions like
 these in LaTeX:

 \sigma^2
 \tau^{2s}
 \mu_i
 \pi_{2s}

 Also I would like to learn how to make two lines in the main title or
 subtitle if the text is too long to fit on a single line, e.g. is there
 some R code/syntax that lets me do something like in LaTeX, for example
 using '//' or '\\' to separate the two parts of the text I want to put on
 two lines?

 I heard about using something like

 plot(x,y, main=expression())

 but from neither '?plot' nor '?expression' could I find comprehensive
 information about what I need...

 Many thanks!
 HJ


[R] Rgdal package - get information

2012-03-27 Thread julio cesar oliveira


 Hi,

 I used
 GDALinfo("MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif")
 and got the results:

 rows        10
 columns     11
 bands       1
 origin.x    150701.4
 origin.y    7744897
 res.x       250
 res.y       250
 ysign       -1
 oblique.x   0
 oblique.y   0
 driver      GTiff
 projection  +proj=utm +zone=23 +south +datum=WGS84 +units=m +no_defs
 file        /MOD13Q1.A2001049.h13v11.005.2007002215512.250m_16_days_EVI.tif
 apparent band summary:
    *GDType*   Bmin  Bmax Bmean Bsd hasNoDataValue NoDataValue
 1   *Int16* -32768 32767     0   0          FALSE           0
 Metadata:
 AREA_OR_POINT=Point
 TIFFTAG_SOFTWARE=MODIS Reprojection Tool  v4.1 March 2009



 *How to read the information GDType?*


Thanks,

julio


[R] installing R 2.14.2

2012-03-27 Thread Heba S


Hello, I am trying to install a newer version of R (R 2.14.2) from this 
link: http://cran.r-project.org/bin/macosx/
However, I am getting an error that it cannot be installed on my computer. My 
Mac is running version 10.6.8. Can you please advise me what the problem is? 
I need the newer version to install the ggm package.
Thanks,
Heba
  

Re: [R] ignore error getting next result



On Mar 27, 2012, at 12:56 PM, C Lin wrote:



Dear All,

How do I ignore an error and still getting result of next iteration.
I am trying to do wilcox.test on a loop, when the test fail, I would  
like to continue doing the next iteration and getting the p-value.
I tried to do tryCatch or try but I cannot retrieve the p-value if  
the test is not fail.


sample code:

test2=list(numeric(0),c(10,20));
test1=list(c(1),c(1,2,3,4));
for (i in 1:2){
wtest=wilcox.test(test1[[i]],test2[[i]])
}

i=1 will fail, I want to ignore this and get the pvalue for i=2.


Please read the FAQ entry. And you would be advised to read through the  
rest of the FAQ as well.


http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f




Thanks,
Lin 


David Winsemius, MD
West Hartford, CT


[R] Help on predict.lm

2012-03-27 Thread Nederjaard

Hello, 

I'm new here, but will try to be as specific and complete as possible. I'm
trying to use "lm" to first estimate parameter values from a set of
calibration measurements, and then later to use those estimates to calculate
another set of values with "predict.lm".

First I have a calibration dataset of absorbance values measured from
standard solutions with known concentration of Bromide:

> stds
      abs conc
1 -0.0021    0
2  0.1003  200
3  0.2395  500
4  0.3293  800

On this small calibration series, I perform a linear regression to find the
parameter estimates of the relationship between absorbance (abs) and
concentration (conc):

> linear1 <- lm(abs~conc, data=stds)
> summary(linear1)

Call:
lm(formula = abs ~ conc, data = stds)

Residuals:
        1         2         3         4 
-0.012600  0.006467  0.020667 -0.014533 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 1.050e-02  1.629e-02   0.645  0.58527   
conc        4.167e-04  3.378e-05  12.333  0.00651 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.02048 on 2 degrees of freedom
Multiple R-squared: 0.987,  Adjusted R-squared: 0.9805 
F-statistic: 152.1 on 1 and 2 DF,  p-value: 0.00651 





Now I come with another dataset, which contains measured absorbance values
of Bromide in solution:

> brom
    hours     abs
1    -1.0  0.0633
2     1.0  0.2686
3     5.0  0.2446
4    18.0  0.2274
5    29.0  0.2091
6    42.0  0.1961
7    53.0  0.1310
8    76.0  0.1504
9    91.0  0.1317
10   95.5  0.1169
11  101.0  0.0977
12  115.0  0.1023
13  123.5  0.0879
14  138.5  0.0724
15  147.5  0.0564
16  163.0  0.0495
17  171.0  0.0325
18  189.0  0.0182
19  211.0  0.0047
20  212.5      NA
21  815.5 -0.2112
22  816.5 -0.1896
23  817.5 -0.0783
24  818.5  0.2963
25  819.5  0.1448
26  839.5  0.0936
27  864.0  0.0560
28  888.0  0.0310
29  960.5  0.0056
30 1009.0 -0.0163

The values in column brom$abs, measured on 30 subsequent points in time, need
to be calculated to Bromide concentrations, using the previously established
relationship "linear1".
At first, I thought it could be done by:

> predict.lm(linear1, brom$abs)
Error in eval(predvars, data, env) : 
  numeric 'envir' arg not of length one

But R gives the above error message. Then, after some searching around on
different fora and R communities (including this one), I learned that the
"newdata" in "predict.lm" actually needs to be coerced into a separate
data frame. Thus:

> mabs <- data.frame(Abs = brom$abs)
> predict.lm(linear1, mabs)
Error in eval(expr, envir, enclos) : object 'conc' not found

Again, R gives an error... probably because I made a mistake, but I truly fail
to see where. I hope somebody can explain to me clearly what I'm doing wrong
and what I should do instead.
Any help is greatly appreciated, thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Help-on-predict-lm-tp4509586p4509586.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] lasso constraint

2012-03-27 Thread Steve Lianoglou

Hi,

On Tue, Mar 27, 2012 at 10:35 AM, yx78 yangx...@gmail.com wrote:
 In the package lasso2, there is a Prostate Data. To find coefficients in the
 prostate cancer example we could impose L1 constraint on the parameters.

 code is:
 data(Prostate)
  p.mean <- apply(Prostate, 5, mean)
  pros <- sweep(Prostate, 5, p.mean, "-")
  p.std <- apply(pros, 5, var)
  pros <- sweep(pros, 5, sqrt(p.std), "/")
  pros[, "lpsa"] <- Prostate[, "lpsa"]
 l1ce(lpsa ~  . , pros, bound = 0.44)

 I can't figure out where 0.44 comes from. The paper says it was obtained
 by generalized cross-validation and that it is the optimal choice.

Yes, this is exactly how the optimal value for bound would be found.

Using the lasso2 package, you'll likely have to do a grid search over
possible values for `bound` in a cross validation setting and you pick
the one that fits the model best on the held out data over all your CV
folds.

If I were you, I'd use the glmnet package since it can calculate the
entire regularization path w/o having to do a grid search over the
bound (or lambda), making cross validation easier.

If you're confused about how you might use cross validation to find
the optimal value of the parameter(s) of the model you are building,
then it's time to pull yourself away from the keyboaRd and start doing
some reading, or (as Bert will likely tell you) consult your local
statistician.

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


[R] readHTLMTable help

2012-03-27 Thread Lucas

Hello to everyone.
I'm using this function to download some information from a website.
This is the URL:
http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007FechaIni=01-1-1980
If you go to that website you'll find a table with meteorological
information. One column is called Intesidad Máxima Diaria, and that is
the one I need.
I've been trying to extract that column, but I'm unable to do it.
First I tried simply to download the complete table and then apply some kind
of filter to extract the column but, for some reason, when I call the
function
a <- readHTMLTable(url), the table is downloaded in an unfriendly format and I
cannot tell the columns apart.

If anyone could help me I'd appreciate it.
Thank you.

Lucas.


[R] Convert day of year back into a date format.

2012-03-27 Thread Sam Albers

Hello,

I am having trouble figuring out how to convert a Day of Year integer
back into a Date format. For example I have the following:

date <- 
c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
'2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
'2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')

## this is then converted into a number corresponding to the day of
## the year like so:

dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1

## Now my question is how do I get back to a date format (obviously
## omitting the year).
## The end result is that I'd like to be able to have axis labels as
## something like Month-Day or just Month
## instead of just integers, which aren't always intuitive for people,
## but I can't seem to figure out how to tell R
## to recognize an integer as a date.

Any suggestions?

Many thanks in advance!

Sam


Re: [R] ignore error getting next result

2012-03-27 Thread C Lin

As a matter of fact, I did read the FAQ. However, in the FAQ coef() is used to 
return the coefficients of lm() if it succeeded. 
I cannot find similar function for pvalue.

 CC: r-help@r-project.org
 From: dwinsem...@comcast.net
 To: bac...@hotmail.com
 Subject: Re: [R] ignore error getting next result
 Date: Tue, 27 Mar 2012 13:40:39 -0400

 On Mar 27, 2012, at 12:56 PM, C Lin wrote:

  Dear All,

  How do I ignore an error and still getting result of next iteration.
  I am trying to do wilcox.test on a loop, when the test fail, I would 
  like to continue doing the next iteration and getting the p-value.
  I tried to do tryCatch or try but I cannot retrieve the p-value if 
  the test is not fail.

  sample code:

  test2=list(numeric(0),c(10,20));
  test1=list(c(1),c(1,2,3,4));
  for (i in 1:2){
  wtest=wilcox.test(test1[[i]],test2[[i]])
  }

  i=1 will fail, I want to ignore this and get the pvalue for i=2.

 Please read the FAQ entry And you would be advise to read through the 
 rest of the FAQ as well.

 http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f

  Thanks,
  Lin 

 David Winsemius, MD
 West Hartford, CT


Re: [R] Help on predict.lm

2012-03-27 Thread Berend Hasselman


On 27-03-2012, at 19:24, Nederjaard wrote:

 Hello, 
 
 I'm new here, but will try to be as specific and complete as possible. I'm
 trying to use “lm“ to first estimate parameter values from a set of
 calibration measurements, and then later to use those estimates to calculate
 another set of values with “predict.lm”.
 
 First I have a calibration dataset of absorbance values measured from
 standard solutions with known concentration of Bromide:
 
 > stds
       abs conc
 1 -0.0021    0
 2  0.1003  200
 3  0.2395  500
 4  0.3293  800
 
 On this small calibration series, I perform a linear regression to find the
 parameter estimates of the relationship between absorbance (abs) and
 concentration (conc):
 
 > linear1 <- lm(abs~conc, data=stds)
 > summary(linear1)
 
 Call:
 lm(formula = abs ~ conc, data = stds)
 
 Residuals:
         1         2         3         4 
 -0.012600  0.006467  0.020667 -0.014533 
 
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
 (Intercept) 1.050e-02  1.629e-02   0.645  0.58527   
 conc        4.167e-04  3.378e-05  12.333  0.00651 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 
 Residual standard error: 0.02048 on 2 degrees of freedom
 Multiple R-squared: 0.987,  Adjusted R-squared: 0.9805 
 F-statistic: 152.1 on 1 and 2 DF,  p-value: 0.00651 
 
 
 
 
 
 Now I come with another dataset, which contains measured absorbance values
 of Bromide in solution:
 
 > brom
     hours     abs
 1    -1.0  0.0633
 2     1.0  0.2686
 3     5.0  0.2446
 4    18.0  0.2274
 5    29.0  0.2091
 6    42.0  0.1961
 7    53.0  0.1310
 8    76.0  0.1504
 9    91.0  0.1317
 10   95.5  0.1169
 11  101.0  0.0977
 12  115.0  0.1023
 13  123.5  0.0879
 14  138.5  0.0724
 15  147.5  0.0564
 16  163.0  0.0495
 17  171.0  0.0325
 18  189.0  0.0182
 19  211.0  0.0047
 20  212.5      NA
 21  815.5 -0.2112
 22  816.5 -0.1896
 23  817.5 -0.0783
 24  818.5  0.2963
 25  819.5  0.1448
 26  839.5  0.0936
 27  864.0  0.0560
 28  888.0  0.0310
 29  960.5  0.0056
 30 1009.0 -0.0163
 
 The values in column brom$abs, measured on 30 subsequent points in time need
 to be calculated to Bromide concentrations, using the previously established
 relationship “linear1”.  
 At first, I thought it could be done by:
 
 predict.lm(linear1, brom$abs)
 Error in eval(predvars, data, env) : 
  numeric 'envir' arg not of length one
 
 But, R gives the above error message. Then, after some searching around on
 different fora and R-communities (including this one), I learned that the
 “newdata” in “predict.lm” actually needs to be coerced into a separate
 dataframe. Thus:
 
 > mabs <- data.frame(Abs = brom$abs)
 > predict.lm(linear1, mabs)
 Error in eval(expr, envir, enclos) : object 'conc' not found
 

There is no column with name conc in your dataframe mabs.

You regressed abs on conc. For prediction you need data for conc and not abs.
So provide data for conc. Or change the regression around: lm(conc ~ abs, 
data=stds) if that makes any sense.

What you did with mabs wouldn't have worked anyway because Abs is not the same 
as abs.
And it wasn't necessary.
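
A minimal sketch of the two options mentioned above, using the calibration data from the original post and a handful of the measured absorbances. The names abs_new, conc_inverse, linear2 and conc_pred are made up for this example. The first option keeps the regression of abs on conc and inverts the fitted line by hand; the second turns the regression around so that predict() can be used directly. The two give similar but not identical numbers, and which one is appropriate is a statistical question this sketch does not settle.

```r
stds <- data.frame(abs  = c(-0.0021, 0.1003, 0.2395, 0.3293),
                   conc = c(0, 200, 500, 800))
abs_new <- c(0.0633, 0.2686, NA, -0.0163)   # a few values from brom$abs

# Option 1: keep abs ~ conc and solve abs = a + b * conc for conc
linear1 <- lm(abs ~ conc, data = stds)
b <- coef(linear1)
conc_inverse <- (abs_new - b[["(Intercept)"]]) / b[["conc"]]

# Option 2: regress conc on abs; newdata must have a column named 'abs'
linear2 <- lm(conc ~ abs, data = stds)
conc_pred <- predict(linear2, newdata = data.frame(abs = abs_new))

print(cbind(abs = abs_new, inverse = conc_inverse, reversed = conc_pred))
```

Note that the column name in newdata has to match the predictor name in the formula exactly (abs, not Abs), and that missing absorbances simply come back as NA.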

Berend


 Again, R gives an error...probably because I made an error, but I truly fail
 to see where. I hope somebody can explain to me clearly what I'm doing wrong
 and what I should do to instead.
 Any help is greatly appreciated, thanks !
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Help-on-predict-lm-tp4509586p4509586.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] ignore error getting next result


On Mar 27, 2012, at 2:18 PM, C Lin wrote:

 As a matter of fact, I did read the FAQ. However, in the FAQ coef()  
 is used to return the coefficients of lm() if it succeeded.
 I cannot find similar function for pvalue.

So your question has nothing to do with the subject line? If you are  
trying to get information about the object returned by the wilcox.test  
function,  then you should be looking at the help page in the Value  
section for that function.
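
As a small illustration of what the Value section describes, the sketch below inspects the object returned by wilcox.test for the second pair of vectors from the sample code. The variable name wt is made up for this example. The result is a list of class "htest", and the p-value is stored in the component named p.value.

```r
wt <- wilcox.test(c(1, 2, 3, 4), c(10, 20))

class(wt)      # "htest"
names(wt)      # component names, including "statistic" and "p.value"
wt$p.value     # the p-value as a plain number
```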

-- 
David.

  CC: r-help@r-project.org
  From: dwinsem...@comcast.net
  To: bac...@hotmail.com
  Subject: Re: [R] ignore error getting next result
  Date: Tue, 27 Mar 2012 13:40:39 -0400
 
 
  On Mar 27, 2012, at 12:56 PM, C Lin wrote:
 
  
   Dear All,
  
   How do I ignore an error and still getting result of next  
 iteration.
   I am trying to do wilcox.test on a loop, when the test fail, I  
 would
   like to continue doing the next iteration and getting the p-value.
   I tried to do tryCatch or try but I cannot retrieve the p-value if
   the test is not fail.
  
   sample code:
  
   test2=list(numeric(0),c(10,20));
   test1=list(c(1),c(1,2,3,4));
   for (i in 1:2){
   wtest=wilcox.test(test1[[i]],test2[[i]])
   }
  
   i=1 will fail, I want to ignore this and get the pvalue for i=2.
 
  Please read the FAQ entry And you would be advise to read through  
 the
  rest of the FAQ as well.
 
  http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f
 
 
  
   Thanks,
   Lin
 
  David Winsemius, MD
  West Hartford, CT
 

David Winsemius, MD
West Hartford, CT



Re: [R] Convert day of year back into a date format.

2012-03-27 Thread Justin Haynes

There may very well be a better solution, but this works.

format(strptime(dayofyear, format="%j"), format="%m-%d")
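
One caveat, as far as I understand ?strptime: when only "%j" is given, the missing year is taken to be the current one, so for data from a leap year such as 2008 the labels after 28 February can be off by a day when the code is run in a non-leap year. A sketch of an alternative that pins the year explicitly is below; the four example dates and the names back and labels are made up for illustration.

```r
date <- c("2008-01-01", "2008-01-23", "2008-02-29", "2008-12-31")
dayofyear <- strptime(date, format = "%Y-%m-%d")$yday + 1

# Day 1 is the origin itself, so subtract 1 before adding to 1 January
back   <- as.Date(dayofyear - 1, origin = "2008-01-01")
labels <- format(back, "%m-%d")

print(data.frame(dayofyear, labels))
```

For axis labels, something like axis(1, at = dayofyear, labels = labels) could then be used; format(back, "%b") would give month names instead, in the language of the current locale.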

On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello,

 I am having trouble figuring out how to convert a Day of Year integer
 back into a Date format. For example I have the following:

 date <-
 c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05','2008-01-06','2008-01-07',
 '2008-01-08','2008-01-09','2008-01-10','2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
 '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20','2008-01-21','2008-01-22','2008-01-23')

 ## this is then converted into a number corresponding to the day of
 ## the year like so:

 dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1

 ## Now my question is how do I get back to a date format (obviously
 ## omitting the year).
 ## The end result is that I'd like to be able to have axis labels as
 ## something like Month-Day or just Month
 ## instead of just integers, which aren't always intuitive for people,
 ## but I can't seem to figure out how to tell R
 ## to recognize an integer as a date.

 Any suggestions?

 Many thanks in advance!

 Sam




Re: [R] installing R 2.14.2

2012-03-27 Thread Steve Lianoglou

Hi,

On Tue, Mar 27, 2012 at 1:03 PM, Heba S abehsun...@hotmail.com wrote:

 Hello, I am trying to install a newer version of R (R 2.14.2) from this
 link: http://cran.r-project.org/bin/macosx/
 However, I am getting an error saying that it cannot be installed on my
 computer. My Mac is version 10.6.8. Can you please advise me what the
 problem is? I need the newer version to install the ggm package.

If you want any meaningful help, you'll have to provide the exact
error that you're getting, so please reproduce the error message
(verbatim) in your follow-up email.

Also let us know when during the installation process the error occurs.

Thanks,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] ignore error getting next result

2012-03-27 Thread C Lin


I'm sorry. I do appreciate that you are trying to help. However, what I am
trying to do is not exactly the same as in the FAQ.
 
If I do the following:
 
test2=list(numeric(0),c(10,20));
test1=list(c(1),c(1,2,3,4));
for (i in 1:2){
 tryCatch(wilcox.test(test1[[i]],test2[[i]]),error = function(e) NULL);
}

I cannot get the p-value of the test for i=2.

any other input? anyone?
 
Thanks,
Lin 




CC: r-help@r-project.org
From: dwinsem...@comcast.net
To: bac...@hotmail.com
Subject: Re: [R] ignore error getting next result
Date: Tue, 27 Mar 2012 14:26:40 -0400




On Mar 27, 2012, at 2:18 PM, C Lin wrote:


As a matter of fact, I did read the FAQ. However, in the FAQ, coef() is used to
return the coefficients of lm() if it succeeded.
I cannot find a similar function for the p-value.


So your question has nothing to do with the subject line? If you are trying to 
get information about the object returned by the wilcox.test function,  then 
you should be looking at the help page in the Value section for that function.
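
As a minimal sketch of that advice applied to the sample code in this thread: the Value section of ?wilcox.test lists a p.value component, so the result of tryCatch() can be kept and the p-value read from it when the test did not fail. The names `pvals` and `res` are illustrative, and storing NA for a failed iteration is an assumption about what is wanted.

```r
test1 <- list(c(1), c(1, 2, 3, 4))
test2 <- list(numeric(0), c(10, 20))

# One slot per iteration; NA marks an iteration where the test failed
pvals <- rep(NA_real_, length(test1))

for (i in seq_along(test1)) {
  # Keep the return value: the htest object on success, NULL on error
  res <- tryCatch(wilcox.test(test1[[i]], test2[[i]]),
                  error = function(e) NULL)
  if (!is.null(res)) {
    pvals[i] <- res$p.value
  }
}

print(pvals)
```

The loop in the original post discarded the value returned by tryCatch(), which is why nothing was left to inspect after i=2.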


-- 
David.






 CC: r-help@r-project.org
 From: dwinsem...@comcast.net
 To: bac...@hotmail.com
 Subject: Re: [R] ignore error getting next result
 Date: Tue, 27 Mar 2012 13:40:39 -0400
 
 
 On Mar 27, 2012, at 12:56 PM, C Lin wrote:
 
 
  Dear All,
 
  How do I ignore an error and still get the result of the next iteration?
  I am trying to run wilcox.test in a loop. When the test fails, I would
  like to continue with the next iteration and get its p-value.
  I tried tryCatch and try, but I cannot retrieve the p-value when
  the test does not fail.
 
  sample code:
 
  test2=list(numeric(0),c(10,20));
  test1=list(c(1),c(1,2,3,4));
  for (i in 1:2){
  wtest=wilcox.test(test1[[i]],test2[[i]])
  }
 
  i=1 will fail, I want to ignore this and get the pvalue for i=2.
 
 Please read the FAQ entry below. You would also be well advised to read
 through the rest of the FAQ.
 
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-capture-or-ignore-errors-in-a-long-simulation_003f
 
 
 
  Thanks,
  Lin 
 
 David Winsemius, MD
 West Hartford, CT
 





David Winsemius, MD
West Hartford, CT
  

Re: [R] ignore error getting next result