[ https://issues.apache.org/jira/browse/SPARK-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5809.
------------------------------
    Resolution: Not a Problem

So I think this is basically NotAProblem in the sense that it's ToBeExpected with 1M features?

> OutOfMemoryError in logDebug in RandomForest.scala
> --------------------------------------------------
>
>                 Key: SPARK-5809
>                 URL: https://issues.apache.org/jira/browse/SPARK-5809
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Devesh Parekh
>            Assignee: Joseph K. Bradley
>            Priority: Minor
>              Labels: easyfix
>
> When training a GBM on sparse vectors produced by HashingTF, I get the
> following OutOfMemoryError, where RandomForest is building a debug string to
> log.
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3326)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>         at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
>         at java.lang.StringBuilder.append(StringBuilder.java:136)
>         at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197)
>         at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:320)
>         at scala.collection.AbstractTraversable.addString(Traversable.scala:105)
>         at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:286)
>         at scala.collection.AbstractTraversable.mkString(Traversable.scala:105)
>         at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:288)
>         at scala.collection.AbstractTraversable.mkString(Traversable.scala:105)
>         at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152)
>         at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152)
>         at org.apache.spark.Logging$class.logDebug(Logging.scala:63)
>         at org.apache.spark.mllib.tree.RandomForest.logDebug(RandomForest.scala:67)
>         at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:150)
>         at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:64)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63)
>         at org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96)
>
> A workaround until this is fixed is to modify log4j.properties in the conf
> directory to filter out debug logs in RandomForest. For example:
> log4j.logger.org.apache.spark.mllib.tree.RandomForest=WARN

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
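[Editor's note: the workaround quoted above sets a single logger's level in Spark's conf/log4j.properties. A minimal sketch of what that file might contain is below; the root-logger line reflects the default Spark 1.x template and is an assumption, while the final line is the one the reporter suggests adding.]

```properties
# Sketch of conf/log4j.properties (assumed default Spark 1.x template)
# Root logger: INFO to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Workaround from this issue: suppress the huge debug string that
# RandomForest builds for logDebug, by raising just this logger to WARN
log4j.logger.org.apache.spark.mllib.tree.RandomForest=WARN
```

Raising only the RandomForest logger keeps DEBUG/INFO output available elsewhere while preventing the mkString call's output from being logged.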