[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769738#comment-16769738 ] t oo commented on SPARK-23427: -- gentle ping > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517326#comment-16517326 ] Dean Wampler commented on SPARK-23427: -- Hi, Kazuaki. Any update on this issue? Any pointers on what you discovered? Thanks. > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381815#comment-16381815 ] Kazuaki Ishizaki commented on SPARK-23427: -- I am working for this (now trying to collect heap profiling), but I am working for other stuffs, too. We cannot promise to prepare a solution, but do the best. I do not have any workaround regarding this. > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381251#comment-16381251 ] Pratik Dhumal commented on SPARK-23427: --- Hello, For the purpose of development plan, # Can we expect solution for this in near future? # Do you have any suggestion/patch or workaround to deal with issue? I appreciate your help in this regard, thanks for your time. > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372000#comment-16372000 ] Pratik Dhumal commented on SPARK-23427: --- Sorry for very late reply. I am facing that issue when Autobroadcast value is not -1. Somehow, I couldn't reproduce same for Autobroadcast = -1, one thing I have noticed is, for me it goes through more than double the iteration when autobroadcast is -1. But, At certain point it does not iterate, and get stuck (with no errors and info message as *ContextCleaner: Cleaned accumulator* ) Also, This is the *stack trace* I'm getting. {code:java} // code placeholder Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) at java.lang.StringBuilder.append(StringBuilder.java:136) at java.lang.StringBuilder.append(StringBuilder.java:131) at scala.StringContext.standardInterpolator(StringContext.scala:125) at scala.StringContext.s(StringContext.scala:95) at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:220) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2546) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2192) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2199) at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2227) at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2226) at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2559) at org.apache.spark.sql.Dataset.count(Dataset.scala:2226). {code} Hope this helps. Thank you. > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368823#comment-16368823 ] Kazuaki Ishizaki commented on SPARK-23427: -- I got the OOM with the same stack trace for both configurations when I ran this program using 256gb heap. > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23427) spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver
[ https://issues.apache.org/jira/browse/SPARK-23427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368728#comment-16368728 ] Kazuaki Ishizaki commented on SPARK-23427: -- Thank you. I ran this program several times with 64GB heap size. I saw the following OOM in both cases `-1` or default (`10*1024`*1024`). I am running the program with other heap sizes. Is this OOM what you are seeing? If not, I would appreciate if you could upload stack trace when OOM occurred. {code:java} [info] org.apache.spark.sql.MyTest *** ABORTED *** (2 hours, 14 minutes, 36 seconds) [info] java.lang.OutOfMemoryError: [info] at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161) [info] at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155) [info] at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125) [info] at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) [info] at java.lang.StringBuilder.append(StringBuilder.java:136) [info] at java.lang.StringBuilder.append(StringBuilder.java:131) [info] at scala.StringContext.standardInterpolator(StringContext.scala:125) [info] at scala.StringContext.s(StringContext.scala:95) [info] at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:199) [info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) [info] at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252) [info] at org.apache.spark.sql.Dataset.(Dataset.scala:190) [info] at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75) [info] at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3295) [info] at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3033) [info] at org.apache.spark.sql.MyTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(MyTest.scala:87) [info] at org.apache.spark.sql.catalyst.plans.PlanTestBase$class.withSQLConf(PlanTest.scala:176) [info] at org.apache.spark.sql.MyTest.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(MyTest.scala:27) [info] at org.apache.spark.sql.test.SQLTestUtilsBase$class.withSQLConf(SQLTestUtils.scala:167) [info] at org.apache.spark.sql.MyTest.withSQLConf(MyTest.scala:27) [info] at org.apache.spark.sql.MyTest$$anonfun$1.apply$mcV$sp(MyTest.scala:65) [info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65) [info] at org.apache.spark.sql.MyTest$$anonfun$1.apply(MyTest.scala:65) ... {code:java} > spark.sql.autoBroadcastJoinThreshold causing OOM exception in the driver > - > > Key: SPARK-23427 > URL: https://issues.apache.org/jira/browse/SPARK-23427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: SPARK 2.0 version >Reporter: Dhiraj >Priority: Critical > > We are facing issue around value of spark.sql.autoBroadcastJoinThreshold. > With spark.sql.autoBroadcastJoinThreshold -1 ( disable) we seeing driver > memory used flat. > With any other values 10MB, 5MB, 2 MB, 1MB, 10K, 1K we see driver memory used > goes up with rate depending upon the size of the autoBroadcastThreshold and > getting OOM exception. The problem is memory used by autoBroadcast is not > being free up in the driver. > Application imports oracle tables as master dataframes which are persisted. > Each job applies filter to these tables and then registered them as > tempViewTable . Then sql query are using to process data further. At the end > all the intermediate dataFrame are unpersisted. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org