Re: [R] Accessing terminal datasets in Ctree()
On Mon, 2 May 2016, Preetam Pal wrote:

> Great, thank you so much Achim. But one issue, in case I do not know how
> many terminal nodes would be there, what do I do? Note that I do not need
> the datasets corresponding to the intermediate nodes only need the
> terminal datasets.

With predict(ct, type = "node") you can set up a new variable, e.g.,

    iris$node <- factor(predict(ct, type = "node"))

and then use this to obtain the subset corresponding to each of the
terminal nodes.
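As a minimal sketch of the suggestion above (not part of the original reply), the node factor can be passed to split() so that one data frame per terminal node is obtained without knowing the number of terminal nodes in advance. The object names node_id, terminal_data and terminal_ids are illustrative only, and the example reuses the iris tree from earlier in the thread:

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# Terminal node ID for every observation in the learning sample
node_id <- predict(ct, type = "node")

# One data frame per terminal node; the number of nodes is not hard-coded
terminal_data <- split(iris, node_id)

# Terminal node IDs taken from the tree itself, for cross-checking
terminal_ids <- nodeids(ct, terminal = TRUE)

names(terminal_data)         # node IDs as character strings
sapply(terminal_data, nrow)  # number of observations in each terminal node
```

Each element of terminal_data can then be handled like any other data frame, e.g. terminal_data[[1]].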
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing terminal datasets in Ctree()
Again, really appreciate your help on this. Thanks, Achim.

-Preetam
Re: [R] Accessing terminal datasets in Ctree()
Great, thank you so much, Achim. But one issue: in case I do not know how many terminal nodes there will be, what do I do? Note that I do not need the datasets corresponding to the intermediate nodes; I only need the terminal datasets.

Regards,
Preetam
Re: [R] Accessing terminal datasets in Ctree()
On Mon, 2 May 2016, Preetam Pal wrote:

> Hi guys,
>
> If I am applying ctree() on a data (specifying some control parameters
> like maxdepth), is there a way I can programmatically access the (smaller)
> datasets corresponding to the terminal nodes in the tree? Say, if there
> are 7 terminal nodes, I need those 7 datasets (of course, I can look at
> the respective node-splitting attributes and write out a filtering
> function - but clearly too much to ask for if I have a large number of
> terminal nodes). Intention is to perform regression on each of these
> terminal datasets.

If you use the "partykit" implementation you can do:

    library("partykit")
    ct <- ctree(Species ~ ., data = iris)
    data_party(ct, id = 6)

to obtain the data associated with node 6 for example. You can also use
ct[6] to obtain the subtree and ct[6]$data for its associated data.

For setting up a factor with the terminal node IDs, you can also use
predict(ct, type = "node") and then use that in lm() etc.

Finally, note that there is also lmtree() and glmtree() for trees with
(generalized) linear models in their nodes.

> Regards,
> Preetam
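The following is a rough sketch of how the pieces mentioned above might be combined for the stated goal of running a regression in each terminal node. It is an illustration under assumptions, not code from the thread: the response Sepal.Length, the regressor Petal.Length, the setting maxdepth = 2 and the object names are all chosen only for the example.

```r
library("partykit")

# Regression tree; maxdepth = 2 is an arbitrary illustrative setting
ct <- ctree(Sepal.Length ~ ., data = iris,
            control = ctree_control(maxdepth = 2))

# Split the learning sample by fitted terminal node ID
terminal_data <- split(iris, predict(ct, type = "node"))

# Fit the same (illustrative) linear model within each terminal node
fits <- lapply(terminal_data,
               function(d) lm(Sepal.Length ~ Petal.Length, data = d))

# One row of coefficients per terminal node
t(sapply(fits, coef))

# Alternative: lmtree() partitions the data and fits the linear model
# in one step; variables after "|" are the partitioning variables
lmt <- lmtree(Sepal.Length ~ Petal.Length | Species + Sepal.Width + Petal.Width,
              data = iris)
coef(lmt)
```

The two approaches are not equivalent: ctree() followed by lm() partitions on the response alone and fits the models afterwards, whereas lmtree() chooses splits based on the fit of the linear model itself, so the resulting partitions will generally differ.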