Hi DB Tsai,

Thank you again for your insightful comments!

1) I agree that the sorting method you suggested is a very efficient way to handle unordered categorical variables in binary classification and regression. I propose we add a Spark ML Transformer to do the sorting and encoding, bringing the benefit to many tree-based methods. How about I open a JIRA for this?
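To illustrate the idea (just a sketch in plain Scala; the object name CategorySorting is made up and this is not the proposed Transformer API), we can rank the categories of a feature by their mean label and use the rank as an ordinal code:

// Sketch: encode an unordered categorical feature as ordered by
// sorting its categories on the mean of the label. For binary
// classification or regression this recovers the optimal split
// with a single ordered scan (Hastie et al., section 9.2.4).
object CategorySorting {
  // data: (category, label) pairs for one categorical feature
  def encode(data: Seq[(String, Double)]): Map[String, Int] = {
    val meanLabel = data
      .groupBy { case (cat, _) => cat }
      .map { case (cat, rows) => (cat, rows.map(_._2).sum / rows.size) }
      .toSeq
    // The rank of each category under this ordering is its ordinal code.
    meanLabel.sortBy { case (_, mean) => mean }
      .map { case (cat, _) => cat }
      .zipWithIndex
      .toMap
  }
}

With this encoding a tree learner only needs to scan q - 1 ordered split points instead of enumerating 2^(q-1) - 1 subsets.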
2) For L2/L1 regularization vs. learning rate (I use this name instead of "shrinkage" to avoid confusion), I have the following observations. Suppose G and H are the sums (over the data assigned to a leaf node) of the 1st and 2nd derivatives of the loss evaluated at f_m, respectively. Then for this leaf node:

* With a learning rate eta: f_{m+1} = f_m - (G/H) * eta
* With an L2 regularization coefficient lambda: f_{m+1} = f_m - G/(H + lambda)

If H > 0 (convex loss), both approaches lead to "shrinkage":

* For the learning rate approach, the percentage of shrinkage is uniform across leaf nodes.
* For L2 regularization, the percentage of shrinkage adapts to the number of instances assigned to the leaf node: more instances => larger G and H => less shrinkage. This behavior is intuitive to me: if the value estimated from a node is based on a large amount of data, the value should be reliable and less shrinkage is needed.
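To make the comparison concrete, here is a minimal sketch of the two updates (plain Scala; the function names are made up for illustration and are not the SparkXGBoost internals):

// Sketch of the two leaf-value updates discussed above.
// g, h: sums of the 1st and 2nd derivatives of the loss over the
// instances in the leaf, evaluated at the current prediction fm.
def learningRateUpdate(fm: Double, g: Double, h: Double, eta: Double): Double =
  fm - (g / h) * eta // every leaf is shrunk by the same factor eta

def l2RegularizedUpdate(fm: Double, g: Double, h: Double, lambda: Double): Double =
  fm - g / (h + lambda) // Newton step scaled by h / (h + lambda)

Note that the L2 update is the unregularized Newton step f_m - G/H scaled by H/(H + lambda), and this factor approaches 1 as the leaf accumulates more data; that is exactly the adaptive shrinkage described above.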
I suppose we could have something similar for L1. I am not aware of theoretical results on which method is better; it likely depends on the data at hand.

Implementing the learning rate is on my radar for version 0.2. I should be able to add it in a week or so. I will send you a note once it is done.

Thanks,

Meihua

On Tue, Oct 27, 2015 at 1:02 AM, DB Tsai <dbt...@dbtsai.com> wrote:
> Hi Meihua,
>
> For categorical features, the ordinal issue can be solved by trying
> all 2^(q-1) - 1 possible partitions of the q values into two groups.
> However, it's computationally expensive. In Hastie's book, section
> 9.2.4, the trees can be trained by sorting the residuals and learning
> as if the values were ordered. It can be proven that this gives the
> optimal solution. I have a proof that this works for learning
> regression trees through variance reduction.
>
> I'm also interested in understanding how the L1 and L2 regularization
> within the boosting works (and whether it helps with overfitting more
> than shrinkage does).
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu <rotationsymmetr...@gmail.com> wrote:
>> Hi DB Tsai,
>>
>> Thank you very much for your interest and comments.
>>
>> 1) Feature sub-sampling is per-node, like random forest.
>>
>> 2) The current code heavily exploits the tree structure to speed up
>> the learning (such as processing multiple learning nodes in one pass
>> of the training data). So a generic GBM is likely to be a different
>> codebase. Do you have any nice references on efficient GBMs? I am
>> more than happy to look into them.
>>
>> 3) The algorithm accepts training data as a DataFrame with the
>> featureCol indexed by VectorIndexer. You can specify which variables
>> are categorical in the VectorIndexer. Please note that currently all
>> categorical variables are treated as ordered. If you want some
>> categorical variables treated as unordered, you can pass the data
>> through OneHotEncoder before the VectorIndexer. I do have a plan to
>> handle unordered categorical variables using the approach in RF in
>> Spark ML (please see the roadmap in the README.md).
>>
>> Thanks,
>>
>> Meihua
>>
>>
>>
>> On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>> Interesting. For feature sub-sampling, is it per-node or per-tree?
>>> Do you think you can implement a generic GBM and have it merged as
>>> part of the Spark codebase?
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0xAF08DF8D
>>>
>>>
>>> On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu
>>> <rotationsymmetr...@gmail.com> wrote:
>>>> Hi Spark User/Dev,
>>>>
>>>> Inspired by the success of XGBoost, I have created a Spark package
>>>> for gradient boosting trees with a 2nd-order approximation of
>>>> arbitrary user-defined loss functions.
>>>>
>>>> https://github.com/rotationsymmetry/SparkXGBoost
>>>>
>>>> Currently linear (normal) regression, binary classification, and
>>>> Poisson regression are supported. You can extend it with other loss
>>>> functions as well.
>>>>
>>>> L1, L2, bagging, and feature sub-sampling are also employed to
>>>> avoid overfitting.
>>>>
>>>> Thank you for testing. I am looking forward to your comments and
>>>> suggestions. Bugs or improvements can be reported through GitHub.
>>>>
>>>> Many thanks!
>>>>
>>>> Meihua