[GitHub] spark pull request #19433: [SPARK-3162] [MLlib] Add local tree training for ...

smurching Tue, 24 Oct 2017 17:47:52 -0700

Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19433#discussion_r146731101
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/TrainingInfo.scala ---
    @@ -0,0 +1,144 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.tree.impl
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import org.apache.spark.ml.tree.{LearningNode, Split}
    +import org.apache.spark.util.collection.BitSet
    +
    +/**
    + * Maintains intermediate state of data (columns) and tree during local tree training.
    + * Primary local tree training data structure; contains all information required to describe
    + * the state of the algorithm at any point during learning.
    + *
    + * Nodes are indexed left-to-right along the periphery of the tree, with 0-based indices.
    + * The "periphery" is the set of leaf nodes (active and inactive).
    + *
    + * @param columns  Array of columns.
    + *                 Each column is sorted first by nodes (left-to-right along the tree periphery);
    + *                 all columns share this first level of sorting.
    + *                 Within each node's group, each column is sorted based on feature value;
    + *                 this second level of sorting differs across columns.
    + * @param instanceWeights Array of weights for each training example
    + * @param nodeOffsets  Offsets into the columns indicating the first level of sorting (by node).
    + *                     The rows corresponding to the node activeNodes(i) are in the range
    + *                     [nodeOffsets(i)(0), nodeOffsets(i)(1)) .
    + * @param activeNodes  Nodes which are active (still being split).
    + *                     Inactive nodes are known to be leaves in the final tree.
    + */
    +private[impl] case class TrainingInfo(
    +    columns: Array[FeatureVector],
    +    instanceWeights: Array[Double],
    --- End diff --
    
    Good call, I'll move `instanceWeights` outside `TrainingInfo`
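
To make the proposed change concrete, here is a minimal sketch of how the per-node row ranges described in the doc comment could be combined with weights that live outside the training state. `SimpleColumn`, `SimpleTrainingInfo`, and `nodeWeight` are hypothetical stand-ins invented for illustration, not the classes or methods from the PR; representing `nodeOffsets` as `(start, end)` tuples and storing original row ids in `indices` are also assumptions.

```scala
// Hypothetical, simplified stand-ins for illustration; not the classes from the PR.
// `values` holds feature values grouped by node, then sorted within each node's group.
// `indices(row)` is assumed to hold the original row id of the value at position `row`.
final case class SimpleColumn(values: Array[Double], indices: Array[Int])

final case class SimpleTrainingInfo(
    columns: Array[SimpleColumn],
    nodeOffsets: Array[(Int, Int)]) {

  // Sum of instance weights for the rows of the active node at position `nodeIdx`.
  // The weights are passed in rather than stored in the training state.
  def nodeWeight(nodeIdx: Int, instanceWeights: Array[Double]): Double = {
    // Rows for this node occupy the half-open range [start, end).
    val (start, end) = nodeOffsets(nodeIdx)
    // All columns share the node-level grouping, so any column works here.
    val col = columns.head
    (start until end).map(row => instanceWeights(col.indices(row))).sum
  }
}

object TrainingInfoSketch {
  def main(args: Array[String]): Unit = {
    // Four rows, two active nodes: rows [0, 2) belong to node 0, rows [2, 4) to node 1.
    val col = SimpleColumn(
      values = Array(0.1, 0.7, 0.2, 0.9),
      indices = Array(3, 0, 1, 2))
    val info = SimpleTrainingInfo(Array(col), Array((0, 2), (2, 4)))
    val weights = Array(1.0, 2.0, 3.0, 5.0) // indexed by original row id

    println(info.nodeWeight(0, weights)) // weights(3) + weights(0)
    println(info.nodeWeight(1, weights)) // weights(1) + weights(2)
  }
}
```

Passing the weights as an argument keeps the training state limited to what changes as nodes are split, which appears to be the motivation for moving `instanceWeights` out.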


