Re: [R] pheatmap: incomplete figure

2017-09-18 Thread David Winsemius

> On Sep 18, 2017, at 9:26 AM, Fix Ace via R-help  wrote:
> 
> Dear R Community,
> I tried to generate a heatmap for a matrix of 1500 columns by 106 rows using
> the following R script:
>> pheatmap(tf.vs.DE.1.removeAllZeroCol, fontsize=3,border_color=NA)
> and got the graph (attached as Fig 1).
> 
> Since the column labels appear very crowded, I tried to increase the
> cellwidth to stretch the graph horizontally. The idea was to show the graph
> section by section, but with clear, readable column labels (no overlapping
> labels).
> So I typed:
>> pheatmap(tf.vs.DE.1.removeAllZeroCol, 
>> fontsize=3,cellwidth=3,cellheight=3,border_color=NA)
> However, this time I got only the middle part of the original heatmap
> (attached as Fig 2).
> I wonder if there is a way to output the whole graph after such a horizontal
> stretch. If not, how do I get the left end of the graph?
> 

Why not define a graphics device that is wider?
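
A minimal sketch of that suggestion, assuming the stretched heatmap is simply wider than the on-screen device. pheatmap sizes cells in points (1/72 inch), so the width needed can be estimated before opening a device. The helper device_width_in() and its 4-inch allowance for row labels, dendrogram and legend are illustrative assumptions, and image() only stands in for the real pheatmap() call so that the sketch runs with base R alone:

```r
# Hypothetical helper: device width in inches for n_cols cells of
# cellwidth_pt points each, plus an assumed allowance for labels and legend.
device_width_in <- function(n_cols, cellwidth_pt, extra_in = 4) {
  n_cols * cellwidth_pt / 72 + extra_in
}

w <- device_width_in(1500, 3)   # 1500 columns at cellwidth = 3

# Open a device that is wide enough, draw, then close it.
out <- tempfile(fileext = ".pdf")
pdf(out, width = w, height = 8)
m <- matrix(rnorm(106 * 1500), nrow = 106)   # stand-in for the real matrix
image(t(m), axes = FALSE)                    # stand-in for the pheatmap() call
dev.off()

# With pheatmap itself the same idea would read (not run here):
# pheatmap(tf.vs.DE.1.removeAllZeroCol, fontsize = 3, cellwidth = 3,
#          cellheight = 3, border_color = NA,
#          filename = "heatmap.pdf", width = w, height = 8)
```

The resulting PDF can then be viewed section by section by zooming or scrolling in a PDF viewer.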

-- 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[ESS] Aquamacs cannot find file at startup

2017-09-18 Thread Christian

Hi,
Starting Aquamacs 3.3 on MacbookPro MacOS 10.12.6, I get an error

Warning (initialization): An error occurred while loading 
‘/Users/hoffmannc/.emacs’:


File error: Cannot open load file, No such file or directory, 
~/elisp/vendor/ess/lisp/ess-site.el

---

My /Users/hoffmannc/.emacs looks like this
_
;; ESS: Emacs Speaks Statistics
(load "~/elisp/vendor/ess/lisp/ess-site.el")

;; Use shift-enter to split window & launch R (if not running), execute
;; highlighted region (if R running & area highlighted), or execute
;; current line (and move to next line, skipping comments). Nice.
;; See http://www.emacswiki.org/emacs/EmacsSpeaksStatistics,
;; FelipeCsaszar. Adapted to split vertically instead of
;; horizontally.

(setq ess-ask-for-ess-directory nil)
(setq ess-local-process-name "R")
(setq ansi-color-for-comint-mode 'filter)
(setq comint-prompt-read-only t)
(setq comint-scroll-to-bottom-on-input t)
(setq comint-scroll-to-bottom-on-output t)
(setq comint-move-point-for-output t)

(defun my-ess-start-R ()
  (interactive)
  (if (not (member "*R*" (mapcar (function buffer-name) (buffer-list))))
      (progn
        (delete-other-windows)
        (setq w1 (selected-window))
        (setq w1name (buffer-name))
        (setq w2 (split-window w1 nil t))
        (R)
        (set-window-buffer w2 "*R*")
        (set-window-buffer w1 w1name))))

(defun my-ess-eval ()
  (interactive)
  (my-ess-start-R)
  (if (and transient-mark-mode mark-active)
      (call-interactively 'ess-eval-region)
    (call-interactively 'ess-eval-line-and-step)))

(add-hook 'ess-mode-hook
          '(lambda ()
             (local-set-key [(shift return)] 'my-ess-eval)))

(add-hook 'inferior-ess-mode-hook
          '(lambda ()
             (local-set-key [C-up] 'comint-previous-matching-input-from-input)
             (local-set-key [C-down] 'comint-next-input)))

(require 'ess-site)
_
..and the last line seems to cause the error, as can be seen by C-x C-e.
How can I correct this error?

Thanks for any hints.
Christian
--
Christian Hoffmann
Rigiblickstrasse 15b
CH-8915 Hausen am Albis
Switzerland
Telefon +41-(0)44-7640853

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help

Re: [R] Help/information required

2017-09-18 Thread David Winsemius

> On Sep 17, 2017, at 9:24 PM, Ajay Arvind Rao  
> wrote:
> 
> Hi,
> 
> We are using open source license of R to analyze data at our organization. 
> The system configuration are as follows:
> 
> *System configuration:
> 
> o   Operating System - Windows 7 Enterprise SP1, 64 bit (Desktop)
> 
> o   RAM - 8 GB
> 
> o   Processor - i5-6500 @ 3.2 Ghz
> 
> *R Version:
> 
> o   R Studio 1.0.136
> 
> o   R 3.4.0
> 
> While trying to merge two datasets we received the following resource error 
> message on running the code
> Code: merg_data <- 
> merge(x=Data_1Junto30Jun,y=flight_code,by.x="EB_FLNO1",by.y="EB_FLNO1",all.x 
> = TRUE)
> Error Message: Error: cannot allocate vector of size 5.8 Gb
> 
> Later we tried running the code differently but error still remained
> Code: merg_data <- sqldf("Select * from Data_1Junto30Jun as a inner join 
> flight_code as b on a.EB_FLNO1=b.EB_FLNO1")
> Error Message: Error: cannot allocate vector of size 200.0 Mb
> 
> We have upgraded the RAM to 8 GB couple of months back. Can you let us know 
> options to resolve the above issue without having to increase the RAM? The 
> size of the datasets are as follows:
> 
> *Data_1Junto30Jun (513476 obs of 32 variables). Data size - 172033368 
> bytes / 172 MB
> 
> *flight_code (478105 obs of 2 variables). Data size - 3836304 bytes / 
> 4 MB
> 
> 
> Help with determining system requirement:
> Is there a way to determine minimum system requirement (hardware and software)

There are some packages for working with data "out of memory". See bigmemory 
and other "big*" packages. See also the data.table package which has many 
satisfied users.  There are also several packages for handling data through 
database connections. That would be probably the preferred method for your use 
case.

R objects are often copied when they are modified or combined, and this means 
that you need at a minimum twice as much free memory as the object occupies 
(and in _contiguous_ chunks). You will often be breaking up the memory with 
other code and other out-of-R processes. Windows was in the past notorious for 
having poor memory management. I don't know if Windows 7 continued that 
tradition or whether later versions might be useful to avoid the problem.

A dataframe will consume about 8 bytes per row for each numeric column. 
Character strings are stored once in a shared cache and factors are stored as 
integer codes, so the memory consumed will depend on the degree of duplication 
of entries. Duplication also affects merge operations. A merge returns every 
pairing of rows that share a key value (a Cartesian product within each key), 
so if you merge two dataframes with lots of duplicated keys you will often get 
a message such as: "Error: cannot allocate vector of size 5.8 Gb"
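
To make the Cartesian effect concrete, here is a small base-R sketch that predicts how many rows a left merge will return before running it. The toy tables and the helper predict_merge_rows() are made up for illustration; only the key name EB_FLNO1 comes from the original post.

```r
# Toy stand-ins for the poster's tables (hypothetical data).
flights <- data.frame(EB_FLNO1 = c("A1", "A1", "B2", "C3"),
                      pax = c(10, 20, 30, 40))
codes <- data.frame(EB_FLNO1 = c("A1", "A1", "A1", "B2"),
                    code = c("x", "y", "z", "w"))

# Hypothetical helper: number of rows merge(x, y, all.x = TRUE) will return,
# computed from the key columns alone.
predict_merge_rows <- function(x_keys, y_keys) {
  y_counts <- table(y_keys)
  # How many rows of y each row of x will pair with (NA if the key is absent).
  per_row <- as.vector(y_counts)[match(as.character(x_keys), names(y_counts))]
  per_row[is.na(per_row)] <- 1L   # all.x = TRUE keeps unmatched rows once
  sum(per_row)
}

expected <- predict_merge_rows(flights$EB_FLNO1, codes$EB_FLNO1)
merged <- merge(flights, codes, by = "EB_FLNO1", all.x = TRUE)

# Removing duplicated keys from the lookup table keeps one row per flight.
codes_unique <- codes[!duplicated(codes$EB_FLNO1), ]
merged_unique <- merge(flights, codes_unique, by = "EB_FLNO1", all.x = TRUE)
```

If the predicted count is far larger than nrow() of the left table, the lookup table has duplicated keys and is worth de-duplicating (or the join key is incomplete) before merging the full data.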

The second error you cite suggests that much of your 8 GB of memory has been 
fragmented.

Most of this information should be available via searching in Rhelp or RSeek.
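
As a rough sizing aid for the points above, the following sketch uses only base R. The helper estimate_bytes() is hypothetical and assumes every column is numeric at 8 bytes per value, so tables with character or factor columns will differ:

```r
# Size of one numeric column with as many rows as Data_1Junto30Jun:
# 8 bytes per value plus a small header.
n_rows <- 513476
one_col <- object.size(numeric(n_rows))
format(one_col, units = "Mb")

# Hypothetical helper: bytes needed for a table of numeric columns only.
estimate_bytes <- function(n_rows, n_numeric_cols) {
  n_rows * n_numeric_cols * 8
}

est <- estimate_bytes(n_rows, 32)
est / 1024^2   # in megabytes; allow at least twice this for working copies
```

Running object.size() on the real data frames, and on a merge of a small sample, gives a more realistic figure than any rule of thumb.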


> depending on size of the data, the way the data is loaded into R (directly 
> from server or in a flat file) and the type of analysis to be run?

No difference for the source of data but cannot comment on the type of analysis 
because that part of the question is too vague. (Aside from mentioning the 
issue of Cartesian multiplication of merge results which often trips up new 
users of database technology.)

> We have not been able to get any specific information related to this and are 
> estimating the requirements through a trial and error method. Any information 
> on this front will be helpful.

This suggests an impoverished ability for searching:

https://stackoverflow.com/search?q=%5Br%5D+memory+limitations

https://stackoverflow.com/search?q=%5Br%5D+memory+limitations+windows

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+memory+limitations+windows

-- 
David.


David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law



Re: [R] help matching rows of a data frame

2017-09-18 Thread Bert Gunter
Yes. My understanding is that you want the identifier to have the same
number of rows as the data frame. A slight variant of David's solution
would then be:

do.call(paste0,x)
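
One caveat worth a quick sketch: without a separator, paste0() can map different rows to the same string. The data frame below is made up to show this, and the choice of "\r" as separator is an assumption that the character never occurs in the data:

```r
# paste0() without a separator can merge distinct rows into one key.
x <- data.frame(a = c("a1", "a", "b"), b = c("2", "12", "3"),
                stringsAsFactors = FALSE)
collided <- do.call(paste0, x)      # rows 1 and 2 both give "a12"

# A separator that cannot occur in the data keeps the keys distinct.
key <- do.call(paste, c(x, sep = "\r"))

# Integer group labels, as match(x, unique(x)) gives for a vector.
group <- match(key, unique(key))
group
```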


-- Bert



On Mon, Sep 18, 2017 at 8:29 AM, David Winsemius 
wrote:

>
> > On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D. <
> thern...@mayo.edu> wrote:
> >
> > This question likely has a 1 line answer, I'm just not seeing it.  (2,
> 3, or 10 lines is fine too.)
> >
> > For a vector I can do group  <- match(x, unique(x)) to get a vector that
> labels each element of x.
> > What is an equivalent if x is a data frame?
> >
>
> In the past I've used apply with paste to generate "group" identifiers:
>
>
> x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
>
> apply(x, 1, paste, collapse=".")
> [1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"
>
>
> > The result does not have to be fast: the data set will have < 100
> elements.  Since this is inside the survival package, and that package is
> on  the 'recommended' list, I can't depend on any package outside the
> recommended list.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>


Re: [R] help matching rows of a data frame

2017-09-18 Thread David Winsemius

> On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D.  
> wrote:
> 
> This question likely has a 1 line answer, I'm just not seeing it.  (2, 3, or 
> 10 lines is fine too.)
> 
> For a vector I can do group  <- match(x, unique(x)) to get a vector that 
> labels each element of x.
> What is an equivalent if x is a data frame?
> 

In the past I've used apply with paste to generate "group" identifiers:


x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))

apply(x, 1, paste, collapse=".")
[1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"


> The result does not have to be fast: the data set will have < 100 elements.  
> Since this is inside the survival package, and that package is on  the 
> 'recommended' list, I can't depend on any package outside the recommended 
> list.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law



Re: [R] help matching rows of a data frame

2017-09-18 Thread William Dunlap via R-help
You could use merge() with an ID column pasted onto the table of names, as
in

> tbl <- data.frame(FirstName=c("Abe","Abe","Bob","Chuck","Chuck"),
Surname=c("Xavier","Yates","Yates","Yates","Zapf"), Id=paste0("P",101:105))
> tbl
  FirstName Surname   Id
1       Abe  Xavier P101
2       Abe   Yates P102
3       Bob   Yates P103
4     Chuck   Yates P104
5     Chuck    Zapf P105
> merge(data.frame(FirstName=c("Abe","Chuck","Dave"),
Surname=rep("Yates",3)), tbl, all.x=TRUE)
  FirstName Surname   Id
1       Abe   Yates P102
2     Chuck   Yates P104
3      Dave   Yates <NA>


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Sep 18, 2017 at 5:13 AM, Therneau, Terry M., Ph.D. <
thern...@mayo.edu> wrote:

> This question likely has a 1 line answer, I'm just not seeing it.  (2, 3,
> or 10 lines is fine too.)
>
> For a vector I can do group  <- match(x, unique(x)) to get a vector that
> labels each element of x.
> What is an equivalent if x is a data frame?
>
> The result does not have to be fast: the data set will have < 100
> elements.  Since this is inside the survival package, and that package is
> on  the 'recommended' list, I can't depend on any package outside the
> recommended list.
>
> Terry T.
>


Re: [R] Data arrangement for PLSDA using the ropls package

2017-09-18 Thread Bert Gunter
If this is a bioconductor package, why do you not post on the bioconductor
list?

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Sep 18, 2017 at 1:51 AM, 
wrote:

> Hello,
> I would like to do a partial least squares discriminant analysis (PLSDA) in
> R using the package "ropls",
> which is available in R via the command:
>
> source("https://bioconductor.org/biocLite.R")
>
> I am trying to do a PLSDA to illustrate the impact of two genders (AP, C) on 5
> compounds measured in persons (samples). When I try
> to do a PLSDA I get the warning message:
>
> "Single component model: only 'overview' and 'permutation' (in case of
> single response (O)PLS(-DA)) plots available"
>
>
>
> I assume it has something to do with the way I arrange my data into R. I
> tried to do it in a similar way as it has been done in the example of the
> package using the sacurine data set
> (bioconductor.org/packages/release/bioc/vignettes/ropls/inst/doc/ropls-vignette.pdf)
>
>
>
> Can somebody tell me how to arrange my data correctly in
> order to perform a PLSDA using the "ropls" package?
>
>
>
> Thank you very much,
>
> Mike
>
>
>
> Please find my code and an example data set below:
>
> CODE:
>
> #Input data and convert to data frame and define "Sample" as row names
> dta<-read.csv("Demo.csv",sep=";",header=T)
> rownames(dta)<-dta$Sample
> dta
>
> #Remove non-numeric "Sample" and "Gender" columns and convert to matrix
> dta.exp<-dta[,c(-1,-7)]
> matrix<-as.matrix(dta.exp)
> str(matrix)
> matrix
>
> #create vector with "gender" as y-component
> dta.treatments<-dta[,7]
> dta.treatments
>
> dta.factor<-as.factor(dta.treatments)
>
> dta.plsda <- opls(matrix, dta.factor)
>
> DATA:
>
> > dput(dta)
>
> structure(list(Sample = structure(c(1L, 12L, 23L, 34L, 36L, 37L,
>
> 38L, 39L, 40L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L,
>
> 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 25L, 26L, 27L,
>
> 28L, 29L, 30L, 31L, 32L, 33L, 35L), .Label = c("sa1", "sa10",
>
> "sa11", "sa12", "sa13", "sa14", "sa15", "sa16", "sa17", "sa18",
>
> "sa19", "sa2", "sa20", "sa21", "sa22", "sa23", "sa24", "sa25",
>
> "sa26", "sa27", "sa28", "sa29", "sa3", "sa30", "sa31", "sa32",
>
> "sa33", "sa34", "sa35", "sa36", "sa37", "sa38", "sa39", "sa4",
>
> "sa40", "sa5", "sa6", "sa7", "sa8", "sa9"), class = "factor"),
>
> Comp1 = c(1.7686, 0.6873, 1.2322, 1.4874, 1.8986, 1.3484,
>
> 1.0959, 0.583, 1.039, 1.6133, 0.9595, 1.6377, 1.4538, 0.8737,
>
> 1.3363, 1.7881, 2.3604, 1.1239, 2.1281, 2.037, 0.5314, 0.7147,
>
> 0.5917, 0.6671, 0.6645, 0.9865, 1.019, 0.9664, 0.6966, 0.679,
>
> 0.7976, 0.8503, 1.2566, 0.5881, 0.8838, 0.6657, 0.7399, 0.5778,
>
> 0.7121, 1.1909), Comp2 = c(0.0284, 0.9064, 0, 0.7053, 0.7695,
>
> 0.337, 1.0418, 0.8346, 0.3884, 1.9946, 1.3296, 0.119, 0.0106,
>
> 0.7872, 1.0174, 0.0704, 0.0854, 0.4259, 0.0395, 0.0549, 2.4471,
>
> 1.8418, 2.9805, 1.1181, 0.5403, 2.7181, 1.4835, 0.875, 2.2205,
>
> 2.4106, 1.1967, 0.303, 0.1129, 2.5432, 2.328, 0.9839, 2.3583,
>
> 1.9589, 1.9918, 1.2232), Comp3 = c(2.9976, 1.6201, 0.7497,
>
> 1.371, 2.7035, 0.4533, 0.9927, 1.0973, 1.6702, 1.3696, 0.3392,
>
> 1.1489, 2.1086, 1.1586, 1.3645, 1.6008, 2.9567, 1.5721, 2.9633,
>
> 2.4623, 0.1103, 0.3137, 0.313, 0.2969, 0.5148, 0.7419, 0.5641,
>
> 0.7871, 0.7362, 0.8754, 0.4883, 0.8504, 1.4582, 0.1934, 0.764,
>
> 0.7515, 0.7143, 0.2139, 0.5743, 1.7305), Comp4 = c(0, 0,
>
> 0.603, 0, 1.6524, 0, 0, 0, 0, 1.1056, 0, 0, 0, 0, 0, 0, 5.7848,
>
> 0, 0, 0, 0, 0, 0, 0, 0, 0.7895, 3.4641, 0, 0, 1.7446, 0,
>
> 0, 1.5165, 0, 5.9645, 4.1878, 0.7313, 5.7994, 3.0168, 0),
>
> Comp5 = c(18.6058, 5.6489, 12.0842, 4.2708, 3.8489, 10.2139,
>
> 6.1149, 11.3373, 8.9013, 5.8342, 18.532, 17.9267, 8.7386,
>
> 6.9455, 7.3044, 19.0811, 10.8809, 10.7149, 4.7057, 0, 10.3088,
>
> 5.1514, 19.1218, 21.1768, 8.3797, 2.7146, 8.7405, 14.4817,
>
> 8.6571, 17.4254, 17.5725, 5.1233, 13.7539, 6.7396, 2.1342,
>
> 14.4216, 9.2952, 19.9525, 2.2317, 16.501), Gender = structure(c(1L,
>
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>
> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AP", "C"
>
> ), class = "factor")), .Names = c("Sample", "Comp1", "Comp2",
>
> "Comp3", "Comp4", "Comp5", "Gender"), class = "data.frame", row.names =
> c("sa1",
>
> "sa2", "sa3", "sa4", "sa5", "sa6", "sa7", "sa8", "sa9", "sa10",
>
> "sa11", "sa12", "sa13", "sa14", "sa15", "sa16", "sa17", "sa18",
>
> "sa19", "sa20", "sa21", "sa22", "sa23", "sa24", "sa25", "sa26",
>
> "sa27", "sa28", "sa29", "sa30", "sa31", "sa32", "sa33", "sa34",
>
> "sa35", "sa36", "sa37", "sa38", "sa39", "sa40"))
>
>
>
>
> 

Re: [R] help matching rows of a data frame

2017-09-18 Thread K. Elo
Hi!
2017-09-18 07:13 -0500, Therneau, Terry M., Ph.D. wrote:
> This question likely has a 1 line answer, I'm just not seeing
> it.  (2, 3, or 10 lines is 
> fine too.)
> 
> For a vector I can do group  <- match(x, unique(x)) to get a vector
> that labels each 
> element of x.

Actually, you get a vector of indices matching 'unique(x)', not a
labelled vector.

> x<-c("A","B","C","A","C","D")
> group<-match(x, unique(x))
> group
[1] 1 2 3 1 3 4

> What is an equivalent if x is a data frame?

So you will generate an index where duplicated rows have the row index
of the first occurrence, right? This could work:

> x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
> group<-rownames(x)
> for (i in 1:(nrow(x)-1)) { 
     for (j in (i+1):nrow(x)) { 
        if (sum(as.numeric(x[i,]==x[j,]))==ncol(x)) { 
           group[j]<-group[i] }
     }
   }
>  group
[1] "1" "2" "3" "3" "5" "1"
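
For comparison, a vectorised sketch that should give the same first-occurrence indices without the double loop. It assumes that pasting the columns with "\r" as separator yields a unique string per distinct row, which holds as long as "\r" does not occur in the data:

```r
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# One string per row; identical rows give identical strings.
key <- do.call(paste, c(x, sep = "\r"))

# match(key, key) returns, for each row, the index of its first occurrence.
first_row <- match(key, key)
first_row

# Same labels as the loop above, as character row names.
group <- as.character(first_row)
```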

HTH,
Kimmo


Re: [R] help matching rows of a data frame

2017-09-18 Thread Jeff Newmiller
"Label" is not a clear term for data frames,  but most data frames have 
rownames. If dta is a data frame, not a tibble, 

rownames( dta )[ !duplicated( dta ) ]

Or could use row indexes directly

which( !duplicated( dta ) )
-- 
Sent from my phone. Please excuse my brevity.

On September 18, 2017 6:54:29 AM PDT, Eric Berger  wrote:
>Hi Terry,
>I take your question to mean how to label distinct rows of a data
>frame. If
>that is not your question please clarify.
>I found the row.match() function in the package prodlim that can be
>used to
>solve this.
>However since your request requires no additional dependencies I
>borrowed
>the relevant code from the row.match function.
>Here is some obfuscated code to provide your answer in one line, per
>your
>request. (Less obfuscated code is just below that.)
>
>Assuming your data frame is called 'df':
>
>df[,ncol(df)+1] <- match( do.call("paste", c(df[, , drop = FALSE], sep
>=
>"\\r")), do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r"))
>)
>
>The last column of df now contains the 'label', i.e. the index within
>unique(df) of the distinct row that is the same as the given row.
>
>Somewhat less obfuscated
>
>getLabels <- function(df) {
>match( do.call("paste", c(df[, , drop = FALSE],
>sep = "\\r")),
> do.call("paste", c(unique(df)[, , drop
>= FALSE], sep = "\\r")) )
> }
>
>myDataFrame$label <- getLabels(myDataFrame)
>
>
>HTH,
>
>Eric
>
>
>On Mon, Sep 18, 2017 at 3:13 PM, Therneau, Terry M., Ph.D. <
>thern...@mayo.edu> wrote:
>
>> This question likely has a 1 line answer, I'm just not seeing it. 
>(2, 3,
>> or 10 lines is fine too.)
>>
>> For a vector I can do group  <- match(x, unique(x)) to get a vector
>that
>> labels each element of x.
>> What is an equivalent if x is a data frame?
>>
>> The result does not have to be fast: the data set will have < 100
>> elements.  Since this is inside the survival package, and that
>package is
>> on  the 'recommended' list, I can't depend on any package outside the
>> recommended list.
>>
>> Terry T.
>>


Re: [R] Convert data into zoo object using Performance analytics package

2017-09-18 Thread Gabor Grothendieck
Depending on how you created df maybe your code has the column names
wrong.  In any case these 4 alternatives all work.  Start a fresh R
session and then copy and paste this into it.

library(zoo)
u  <- "https://faculty.washington.edu/ezivot/econ424/sbuxPrices.csv"
fmt <- "%m/%d/%Y"

# 1
sbux1.z <- read.csv.zoo(u, FUN = as.yearmon, format = fmt)

# 2
df <- read.csv(u)
sbux2.z <- read.zoo(df, FUN = as.yearmon, format = fmt)

# 3
df <- read.csv(u)
names(head(df))
## [1] "Date"  "Adj.Close"
sbux3.z <- zoo(df$Adj.Close, as.yearmon(df$Date, fmt))

# 4
df <- read.csv(u)
sbux4.z <- zoo(df[[2]], as.yearmon(df[[1]], fmt))
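
The column-name point can be checked with base R alone. The sketch below uses a made-up two-row CSV (the values are not from the real file) and assumes the header contains a space, which read.csv() converts to a dot; a misspelled column name then returns NULL, which is one plausible route to the numeric(0) result:

```r
csv <- "Date,Adj Close\n3/31/1993,1.13\n4/1/1993,1.15\n"   # made-up values
df <- read.csv(text = csv)

names(df)                    # "Adj Close" has become "Adj.Close"

# A wrong column name silently gives NULL, hence an empty series later on.
wrong <- df$AdjClose
is.null(wrong)
length(as.numeric(wrong))

# Check the names before building the series, and parse the dates explicitly.
stopifnot(all(c("Date", "Adj.Close") %in% names(df)))
dates <- as.Date(df$Date, format = "%m/%d/%Y")
```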

On Mon, Sep 18, 2017 at 7:36 AM, Upananda Pani  wrote:
> Dear All,
>
> While i am trying convert data frame object to zoo object I am
> getting numeric(0) error in performance analytics package.
>
> The source code i am using from this website to learn r in finance:
> https://faculty.washington.edu/ezivot/econ424/returnCalculations.r
>
> # create zoo objects from data.frame objects
> dates.sbux = as.yearmon(sbux.df$Date, format="%m/%d/%Y")
> dates.msft = as.yearmon(msft.df$Date, format="%m/%d/%Y")
> sbux.z = zoo(x=sbux.df$Adj.Close, order.by=dates.sbux)
> msft.z = zoo(x=msft.df$Adj.Close, order.by=dates.msft)
> class(sbux.z)
> head(sbux.z)
>> head(sbux.z)
> Data:
> numeric(0)
>
> I will be grateful if anybody would like to guide me where i am making the
> mistake.
>
> With best regards,
> Upananda Pani
>
>
> --
>
>
> You may delay, but time will not.
>
>
> Research Scholar
> alternative mail id: up...@iitkgp.ac.in
> Department of HSS, IIT KGP
> KGP
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] Q2/R2 ratio in PLSDA

2017-09-18 Thread michael.eisenring
Hello,

I would like to perform a partial least squares discriminant analysis (PLSDA) in 
R.

To do this I use the package mixOmics.

I could perform the PLSDA in R. However, I would also like to perform a 
leave-one-out cross validation in order to assess the performance of my model. 
My supervisor told me that I should focus on the R2/Q2 ratios.

However, when I read the instructions for running the "perf" function 
(mixomics.org/wp-content/uploads/2014/08/Running_perf_function4.pd) I found no 
test showing the R2/Q2 ratios for a PLSDA.

Following the instructions I ended up with an estimate of 3 different error 
rates (max.dist / centroids.dist / mahalanobis.dist) (page 9 of the PDF I 
mentioned above).

1. Are these 3 error rates different variations of R2/Q2 ratios?
2. Is there a rule telling me what values my error rates should have in order to 
have a good model performance?
3. Is there a way to calculate R2/Q2 ratios for PLSDA using the mixOmics package?

Thank you


Below I provide a simplified example data set and my code:

DATA:
> dput(dta)
structure(list(Treatment = structure(c(2L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("C", "CAT"), class = "factor"),
comp1 = c(0, 0.5677, 0.4486, 0.1772, 0.2145, 0.0302, 0.216,
0.0938, 0.1143, 0.6414, 0.2461, 0.0498, 0.144, 0.0953, 0.3208,
0.296, 0.418, 0.2247, 0.1921, 0.3792, 0.1394, 0.3069, 0.1211,
0.0355, 0.8968, 0.1981, 0.1187, 0.418, 0.4313, 0.0835), comp2 = c(1.8378,
2.3565, 4.6184, 2.3739, 1.3595, 1.9645, 1.2066, 0.9758, 2.259,
1.9429, 1.9797, 2.3005, 2.2246, 1.5881, 1.3051, 1.5218, 1.8931,
1.4476, 1.2672, 1.5634, 1.9313, 1.2859, 3.9039, 2.8956, 3.7026,
2.1356, 1.4473, 1.8477, 2.1495, 1.2323), comp3 = c(5.6652,
4.3214, 1.8763, 1.7093, 3.6592, 1.6457, 3.4825, 2.7332, 5.1582,
2.7374, 5.0283, 4.7604, 2.0357, 4.0205, 3.5946, 4.1626, 2.3342,
3.5049, 3.1272, 3.328, 3.5106, 3.7209, 1.8475, 5.4776, 2.4554,
5.1995, 3.9241, 4.5022, 4.1593, 4.3931), comp4 = c(3.7994,
4.2763, 3.7141, 1.166, 1.8907, 4.6145, 1.8988, 1.459, 3.2,
3.4403, 3.8283, 2.8549, 4.7747, 2.1849, 1.1687, 2.5519, 4.021,
1.2343, 1.4335, 1.8305, 4.5704, 0.2238, 3.6566, 4.0569, 2.1626,
3.2887, 1.4183, 2.1783, 2.6233, 3.2128), comp5 = c(1.0424,
2.2589, 0, 1.2217, 0, 0, 0, 0, 0, 0, 1.6675, 1.7548, 0, 1.0983,
1.2258, 1.314, 2.9437, 0, 0.9749, 0.8959, 0, 0.9189, 1.5026,
0, 1.0831, 2.2251, 0.8419, 1.1912, 2.2912, 0), comp6 = c(4.0781,
7.2073, 6.0885, 4.9657, 4.0133, 7.6783, 4.2064, 1.6421, 6.6831,
6.8437, 6.5152, 1.4712, 7.048, 4.9872, 4.4658, 1.3119, 10.2047,
4.7551, 3.7564, 4.829, 8.5836, 3.508, 6.0251, 5.1122, 2.2058,
6.8343, 3.9664, 2.005, 6.6678, 2.8081), comp7 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.9795, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Treatment", "comp1",
"comp2", "comp3", "comp4", "comp5", "comp6", "comp7"), class = "data.frame", 
row.names = c(NA,
-30L))


CODE
library(mixOmics)#plsda
library(MetabolAnalyze)#scaling
#read in data & convert to matrix
dta<-read.csv("test.csv",sep=";",header=T)
head(dta)

#Remove "Treatment" column, scale, and create matrix
dta.red<-dta[,-1]
dta.scale<-scaling(dta.red,type="pareto")
matrix<-as.matrix(dta.scale)

#create vector with "Treatment"
dta.treatments<-dta[,1]
dta.factor<-as.factor(dta.treatments)
dta.factor


#PLSDA

#Performance/Loo cross validation
res.plsda2 = plsda(dta.scale, dta.factor, ncomp = 5)
tune.plsda2 = perf(res.plsda2, dist = "all", validation = "loo", progressBar = 
FALSE)
tune.plsda2$error.rate

dta.plsda2<-plsda(dta.scale, dta.factor,scale=F,mode="classic")
dta.plsda2






plotIndiv(dta.plsda2, ind.names = dta.factor, ellipse = TRUE, legend =TRUE)
plotArrow(dta.plsda2, ind.names = dta.factor, legend =TRUE)
plotVar(dta.plsda2, cex = 2)



plot(dta.plsda,typeVc = "x-score",parAsColFcVn = dta.factor,parEllipsesL = TRUE)


Eisenring Michael, Dr.

Federal Department of Economic Affairs, Education and Research
EAER
Agroecology and Environment
Biosafety

Reckenholzstrasse 191, CH-8046 Zürich
Tel. +41 58 468 7181
Fax +41 58 468 7201
michael.eisenr...@agroscope.admin.ch
www.agroscope.ch



Re: [R] help matching rows of a data frame

2017-09-18 Thread Eric Berger
Hi Terry,
I take your question to mean how to label distinct rows of a data frame. If
that is not your question please clarify.
I found the row.match() function in the package prodlim that can be used to
solve this.
However since your request requires no additional dependencies I borrowed
the relevant code from the row.match function.
Here is some obfuscated code to provide your answer in one line, per your
request. (Less obfuscated code is just below that.)

Assuming your data frame is called 'df':

df[,ncol(df)+1] <- match( do.call("paste", c(df[, , drop = FALSE], sep =
"\\r")), do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")) )

The last column of df now contains the 'label', i.e. the index within
unique(df) of the distinct row that is the same as the given row.

Somewhat less obfuscated

getLabels <- function(df) {
  match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
        do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
}

myDataFrame$label <- getLabels(myDataFrame)


HTH,

Eric


On Mon, Sep 18, 2017 at 3:13 PM, Therneau, Terry M., Ph.D. <
thern...@mayo.edu> wrote:

> This question likely has a 1 line answer, I'm just not seeing it.  (2, 3,
> or 10 lines is fine too.)
>
> For a vector I can do group  <- match(x, unique(x)) to get a vector that
> labels each element of x.
> What is an equivalent if x is a data frame?
>
> The result does not have to be fast: the data set will have < 100
> elements.  Since this is inside the survival package, and that package is
> on  the 'recommended' list, I can't depend on any package outside the
> recommended list.
>
> Terry T.
>


Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not

2017-09-18 Thread Bonnett, Laura
Many thanks for the assistance.  I am using a small sample of GUSTO-1 as a 
teaching demonstration.  The Gusto-1 dataset in various smaller subsets is 
available from this website: 
http://clinicalpredictionmodels.org/doku.php?id=rcode_and_data:start  which is 
associated with the Clinical Prediction Models book by Steyerberg.

Many thanks again for your assistance.

Kind regards,
Laura

From: harre...@gmail.com [mailto:harre...@gmail.com] On Behalf Of Frank Harrell
Sent: 14 September 2017 17:22
To: David Winsemius 
Cc: Bonnett, Laura ; r-help@r-project.org
Subject: Re: [R] Help understanding why glm and lrm.fit runs with my data, but 
lrm does not

Fixed 'maxiter' in the help file.  Thanks.

Please give the original source of that dataset.

That dataset is a tiny sample of GUSTO-I and not large enough to fit this model 
very reliably.

A nomogram using the full dataset (not publicly available to my knowledge) is 
already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf

Use lrm, not lrm.fit for this.  Adding maxit=20 will probably make it work on 
the small dataset but still not clear on why you are using this dataset.
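
As a rough sketch of that suggestion, the call might look as follows. The data here are simulated (the object sim, the sample size and the coefficients are made up; this is not the GUSTO-I sample). The only point is that an extra argument such as maxit given to lrm is handed on to lrm.fit. The lrm call is run only if the rms package is installed.

```r
set.seed(1)
n <- 500
# Simulated stand-in data; not the GUSTO-I sample
sim <- data.frame(
  AGE    = round(rnorm(n, mean = 61, sd = 12)),
  HYP    = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  KILLIP = factor(sample(1:4, n, replace = TRUE,
                         prob = c(0.80, 0.15, 0.03, 0.02)), levels = 1:4),
  HRT    = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  ANT    = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
# Made-up true model, used only to generate a binary outcome
lp <- -6 + 0.07 * sim$AGE + 0.5 * (sim$HYP == "Yes") +
  0.6 * (as.integer(sim$KILLIP) - 1)
sim$DAY30 <- rbinom(n, size = 1, prob = plogis(lp))

if (requireNamespace("rms", quietly = TRUE)) {
  # maxit is passed through lrm's ... to lrm.fit
  fit1 <- rms::lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT,
                   data = sim, maxit = 20)
  print(fit1)
}
```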

Frank



Frank E Harrell Jr

Professor

School of Medicine


Department of Biostatistics

Vanderbilt University


On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius wrote:

> On Sep 14, 2017, at 12:30 AM, Bonnett, Laura wrote:
>
> Dear all,
>
> I am using the publicly available GustoW dataset.  The exact version I am 
> using is available here: 
> https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk
>
> I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT 
> and ANT.  I have successfully fitted a logistic regression model using the 
> "glm" function as shown below.
>
> library(rms)
> gusto <- spss.get("GustoW.sav")
> fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT, family=binomial(link="logit"),
>            data=gusto, x=TRUE, y=TRUE)
>
> However, my review of the literature and other websites suggests I need to use 
> "lrm" for the purposes of producing a nomogram.  When I run the command using 
> "lrm" (see below) I get an error message saying:
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> My code is as follows:
> gusto2 <- gusto[,c(1,3,5,8,9,10)]
> gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class",
>              HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
> label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])
>
> ddist = datadist(gusto2)
> options(datadist='ddist')
>
> fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
>  Unable to fit model using "lrm.fit"
>
> Online solutions to this problem involve checking whether any variables are 
> redundant.  However, the results for my data suggest  that none are.
> redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Redundancy Analysis
>
> redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
>
> n: 2188    p: 5    nk: 3
>
> Number of NAs:   0
>
> Transformation of target variables forced to be linear
>
> R-squared cutoff: 0.9   Type: ordinary
>
> R^2 with which each variable can be predicted from all other variables:
>
>    AGE    HYP KILLIP    HRT    ANT
>  0.028  0.032  0.053  0.046  0.040
>
> No redundant variables
>
> I've also tried just considering "lrm.fit" and that code seems to run without 
> error too:
> lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)
>
> Logistic Regression Model
>
> lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
> gusto2$ANT), y = gusto2$DAY30)
>
>                       Model Likelihood     Discrimination    Rank Discrim.
>                          Ratio Test           Indexes           Indexes
> Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
>  0           2053    d.f.             5    g        1.642    Dxy     0.691
>  1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
> max |deriv| 4e-09                          gp       0.079    tau-a   0.080
>                                            Brier    0.048
>
>           Coef     S.E.   Wald Z Pr(>|Z|)
> Intercept -13.8515 

[R] help matching rows of a data frame

2017-09-18 Thread Therneau, Terry M., Ph.D.
This question likely has a 1 line answer, I'm just not seeing it.  (2, 3, or 10 lines is 
fine too.)


For a vector I can do group  <- match(x, unique(x)) to get a vector that labels each 
element of x.

What is an equivalent if x is a data frame?

The result does not have to be fast: the data set will have < 100 elements.  Since this is 
inside the survival package, and that package is on  the 'recommended' list, I can't 
depend on any package outside the recommended list.


Terry T.



[R] Convert data into zoo object using Performance analytics package

2017-09-18 Thread Upananda Pani
Dear All,

While trying to convert a data frame object to a zoo object, I am
getting a numeric(0) result when using the PerformanceAnalytics package.

I am using source code from this website to learn R in finance:
https://faculty.washington.edu/ezivot/econ424/returnCalculations.r

# create zoo objects from data.frame objects
dates.sbux = as.yearmon(sbux.df$Date, format="%m/%d/%Y")
dates.msft = as.yearmon(msft.df$Date, format="%m/%d/%Y")
sbux.z = zoo(x=sbux.df$Adj.Close, order.by=dates.sbux)
msft.z = zoo(x=msft.df$Adj.Close, order.by=dates.msft)
class(sbux.z)
head(sbux.z)
> head(sbux.z)
Data:
numeric(0)

I would be grateful if anybody could point out where I am making the
mistake.
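
One possible cause, offered only as a guess since the data file is not shown: if the data frame has no column named exactly Adj.Close, then sbux.df$Adj.Close is NULL and zoo() is given no data. The sketch below uses a made-up three-row data frame (the column name Adj_Close and the prices are hypothetical) to show how to check for this. The zoo calls run only if the zoo package is installed.

```r
# Hypothetical stand-in for sbux.df; the real file is not shown in the post
sbux.df <- data.frame(Date = c("3/1/1993", "4/1/1993", "5/1/1993"),
                      Adj_Close = c(1.13, 1.15, 1.43),
                      stringsAsFactors = FALSE)

names(sbux.df)               # check the actual column names first
is.null(sbux.df$Adj.Close)   # TRUE here: there is no column of that name

if (requireNamespace("zoo", quietly = TRUE)) {
  dates.sbux <- zoo::as.yearmon(sbux.df$Date, format = "%m/%d/%Y")
  # x is NULL in this call, so the resulting object carries no data
  empty.z <- zoo::zoo(x = sbux.df$Adj.Close, order.by = dates.sbux)
  # Using the column name that actually exists supplies the data
  good.z <- zoo::zoo(x = sbux.df$Adj_Close, order.by = dates.sbux)
  print(empty.z)
  print(good.z)
}
```

If names(sbux.df) does list Adj.Close, this guess does not apply, and str(sbux.df) would be the next thing to inspect.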

With best regards,
Upananda Pani


-- 


You may delay, but time will not.


Research Scholar
alternative mail id: up...@iitkgp.ac.in
Department of HSS, IIT KGP
KGP



[R] Data arrangement for PLSDA using the ropls package

2017-09-18 Thread michael.eisenring
Hello,
I would like to do a partial least squares discriminant analysis (PLSDA) in R 
using the package "ropls", which is available via the R command:

source("https://bioconductor.org/biocLite.R")

I am trying to use a PLSDA to illustrate the impact of two genders (AP, C) on 5 
compounds measured in persons (samples).  When I run the PLSDA I get the 
warning message:

"Single component model: only 'overview' and 'permutation' (in case of single 
response (O)PLS(-DA)) plots available"



I assume it has something to do with the way I arrange my data in R. I tried 
to do it in a similar way as in the example of the package, which uses the 
sacurine data set 
(bioconductor.org/packages/release/bioc/vignettes/ropls/inst/doc/ropls-vignette.pdf).

Can somebody tell me how to arrange my data correctly in order to 
perform a PLSDA using the "ropls" package?



Thank you very much,

Mike



Please find my code and an example data set below:



CODE:

# Read the data into a data frame and use "Sample" as row names
dta<-read.csv("Demo.csv",sep=";",header=T)
rownames(dta)<-dta$Sample
dta

# Remove the non-numeric "Sample" and "Gender" columns and convert to matrix
dta.exp<-dta[,c(-1,-7)]
matrix<-as.matrix(dta.exp)
str(matrix)
matrix

# Create vector with "Gender" as y-component
dta.treatments<-dta[,7]
dta.treatments

dta.factor<-as.factor(dta.treatments)

dta.plsda <- opls(matrix, dta.factor)
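
A hedged sketch of the same arrangement on simulated data (40 made-up samples and 5 compounds; none of these numbers come from Demo.csv). The warning quoted above is what opls reports when it keeps only one predictive component. Assuming the predI argument of opls, which sets the number of predictive components, requesting two should make the score plot available. The opls call runs only if ropls is installed.

```r
set.seed(42)
# Simulated stand-in for dta: 20 "AP" and 20 "C" samples, 5 compounds
gender <- factor(rep(c("AP", "C"), each = 20))
x <- matrix(rexp(40 * 5), nrow = 40, ncol = 5,
            dimnames = list(paste0("sa", 1:40), paste0("Comp", 1:5)))
# Shift two compounds in one group so the groups differ (made-up effect)
x[gender == "C", "Comp2"] <- x[gender == "C", "Comp2"] + 1.5
x[gender == "AP", "Comp3"] <- x[gender == "AP", "Comp3"] + 1

# Samples in rows, numeric variables in columns, one factor value per sample
stopifnot(is.numeric(x), nrow(x) == length(gender))

if (requireNamespace("ropls", quietly = TRUE)) {
  # predI = 2 asks for two predictive components instead of the automatic choice
  sim.plsda <- ropls::opls(x, gender, predI = 2)
}
```

Forcing a second component only changes which plots can be drawn; whether that component is meaningful still has to be judged from the model diagnostics.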




DATA:

> dput(dta)

structure(list(Sample = structure(c(1L, 12L, 23L, 34L, 36L, 37L,

38L, 39L, 40L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L,

14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 25L, 26L, 27L,

28L, 29L, 30L, 31L, 32L, 33L, 35L), .Label = c("sa1", "sa10",

"sa11", "sa12", "sa13", "sa14", "sa15", "sa16", "sa17", "sa18",

"sa19", "sa2", "sa20", "sa21", "sa22", "sa23", "sa24", "sa25",

"sa26", "sa27", "sa28", "sa29", "sa3", "sa30", "sa31", "sa32",

"sa33", "sa34", "sa35", "sa36", "sa37", "sa38", "sa39", "sa4",

"sa40", "sa5", "sa6", "sa7", "sa8", "sa9"), class = "factor"),

Comp1 = c(1.7686, 0.6873, 1.2322, 1.4874, 1.8986, 1.3484,

1.0959, 0.583, 1.039, 1.6133, 0.9595, 1.6377, 1.4538, 0.8737,

1.3363, 1.7881, 2.3604, 1.1239, 2.1281, 2.037, 0.5314, 0.7147,

0.5917, 0.6671, 0.6645, 0.9865, 1.019, 0.9664, 0.6966, 0.679,

0.7976, 0.8503, 1.2566, 0.5881, 0.8838, 0.6657, 0.7399, 0.5778,

0.7121, 1.1909), Comp2 = c(0.0284, 0.9064, 0, 0.7053, 0.7695,

0.337, 1.0418, 0.8346, 0.3884, 1.9946, 1.3296, 0.119, 0.0106,

0.7872, 1.0174, 0.0704, 0.0854, 0.4259, 0.0395, 0.0549, 2.4471,

1.8418, 2.9805, 1.1181, 0.5403, 2.7181, 1.4835, 0.875, 2.2205,

2.4106, 1.1967, 0.303, 0.1129, 2.5432, 2.328, 0.9839, 2.3583,

1.9589, 1.9918, 1.2232), Comp3 = c(2.9976, 1.6201, 0.7497,

1.371, 2.7035, 0.4533, 0.9927, 1.0973, 1.6702, 1.3696, 0.3392,

1.1489, 2.1086, 1.1586, 1.3645, 1.6008, 2.9567, 1.5721, 2.9633,

2.4623, 0.1103, 0.3137, 0.313, 0.2969, 0.5148, 0.7419, 0.5641,

0.7871, 0.7362, 0.8754, 0.4883, 0.8504, 1.4582, 0.1934, 0.764,

0.7515, 0.7143, 0.2139, 0.5743, 1.7305), Comp4 = c(0, 0,

0.603, 0, 1.6524, 0, 0, 0, 0, 1.1056, 0, 0, 0, 0, 0, 0, 5.7848,

0, 0, 0, 0, 0, 0, 0, 0, 0.7895, 3.4641, 0, 0, 1.7446, 0,

0, 1.5165, 0, 5.9645, 4.1878, 0.7313, 5.7994, 3.0168, 0),

Comp5 = c(18.6058, 5.6489, 12.0842, 4.2708, 3.8489, 10.2139,

6.1149, 11.3373, 8.9013, 5.8342, 18.532, 17.9267, 8.7386,

6.9455, 7.3044, 19.0811, 10.8809, 10.7149, 4.7057, 0, 10.3088,

5.1514, 19.1218, 21.1768, 8.3797, 2.7146, 8.7405, 14.4817,

8.6571, 17.4254, 17.5725, 5.1233, 13.7539, 6.7396, 2.1342,

14.4216, 9.2952, 19.9525, 2.2317, 16.501), Gender = structure(c(1L,

1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AP", "C"

), class = "factor")), .Names = c("Sample", "Comp1", "Comp2",

"Comp3", "Comp4", "Comp5", "Gender"), class = "data.frame", row.names = c("sa1",

"sa2", "sa3", "sa4", "sa5", "sa6", "sa7", "sa8", "sa9", "sa10",

"sa11", "sa12", "sa13", "sa14", "sa15", "sa16", "sa17", "sa18",

"sa19", "sa20", "sa21", "sa22", "sa23", "sa24", "sa25", "sa26",

"sa27", "sa28", "sa29", "sa30", "sa31", "sa32", "sa33", "sa34",

"sa35", "sa36", "sa37", "sa38", "sa39", "sa40"))




Eisenring Michael, Dr.

Federal Department of Economic Affairs, Education and Research
EAER
Agroecology and Environment
Biosafety

Reckenholzstrasse 191, CH-8046 Zürich
Tel. +41 58 468 7181
Fax +41 58 468 7201
michael.eisenr...@agroscope.admin.ch
www.agroscope.ch


