[jira] [Commented] (SPARK-22152) Add Dataset flatten function
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184296#comment-16184296 ]

Drew Robb commented on SPARK-22152:
------------------------------------

There is also a ticket for adding it to RDD: https://issues.apache.org/jira/browse/SPARK-18855

> Add Dataset flatten function
> ----------------------------
>
>                 Key: SPARK-22152
>                 URL: https://issues.apache.org/jira/browse/SPARK-22152
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Drew Robb
>            Priority: Minor
>
> Currently you can use an identity flatMap to flatten a Dataset, for example
> to get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly
> would allow for a more similar API to Scala collections.
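As a point of reference, here is a minimal user-space sketch of how the identity-flatMap workaround described above can be packaged as a flatten-style extension method today. {{FlattenSyntax}} and {{flattenInts}} are illustrative names and are not part of Spark or of this proposal; the sketch is written for the concrete {{Dataset[Seq[Int]]}} case to keep the encoders simple.

{code:java}
import org.apache.spark.sql.{Dataset, Encoders}

// Illustrative user-space sketch, not the proposed Spark API itself.
object FlattenSyntax {
  implicit class SeqIntDatasetOps(ds: Dataset[Seq[Int]]) {
    // The identity flatMap mentioned in the ticket: Dataset[Seq[Int]] => Dataset[Int].
    def flattenInts: Dataset[Int] = ds.flatMap(x => x)(Encoders.scalaInt)
  }
}
{code}

With {{import spark.implicits._}} and {{import FlattenSyntax._}} in scope, {{Seq(Seq(1, 2, 3)).toDS().flattenInts.show()}} produces the flattened Dataset[Int]; a built-in {{flatten}} would make this kind of extension unnecessary and match the Scala collections API.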
[jira] [Commented] (SPARK-22152) Add Dataset flatten function
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183585#comment-16183585 ]

Drew Robb commented on SPARK-22152:
------------------------------------

I personally use `Option` very frequently in datasets, and it is also idiomatic to use Option over null in Scala if possible. Another use case would be for `Dataset[Seq[T]] => Dataset[T]`:

{code:java}
scala> Seq(Seq(1,2,3)).toDS.flatMap{x => x}.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
{code}

> Add Dataset flatten function
> ----------------------------
>
>                 Key: SPARK-22152
>                 URL: https://issues.apache.org/jira/browse/SPARK-22152
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Drew Robb
>            Priority: Minor
>
> Currently you can use an identity flatMap to flatten a Dataset, for example
> to get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly
> would allow for a more similar API to Scala collections.
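For comparison, plain Scala collections already provide this operation directly, which is the API parity the ticket asks for (standard library behavior only, no Spark involved):

{code:java}
// Standard Scala collections: flatten removes one level of nesting.
Seq(Option(1), None, Option(3)).flatten  // List(1, 3)
Seq(Seq(1, 2), Seq(3)).flatten           // List(1, 2, 3)
{code}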
[jira] [Created] (SPARK-22152) Add Dataset flatten function
Drew Robb created SPARK-22152:
---------------------------------

             Summary: Add Dataset flatten function
                 Key: SPARK-22152
                 URL: https://issues.apache.org/jira/browse/SPARK-22152
             Project: Spark
          Issue Type: Wish
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Drew Robb
            Priority: Minor


Currently you can use an identity flatMap to flatten a Dataset, for example to get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly would allow for a more similar API to Scala collections.
[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type
[ https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181716#comment-16181716 ]

Drew Robb commented on SPARK-8288:
-----------------------------------

I do not yet have a fully working fix. I think that the best approach might instead be to change things on the Scrooge end.

> ScalaReflection should also try apply methods defined in companion objects
> when inferring schema from a Product type
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-8288
>                 URL: https://issues.apache.org/jira/browse/SPARK-8288
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL
> issue).
> My comment on that issue is quoted below:
> {quote}
> ... The reason for this exception is that the Scala code Scrooge generates
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a
> constructor whose arguments can be transformed into fields of the DataFrame
> schema. The exception was thrown because Spark can't find such a constructor
> on trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects,
> so that trait types generated by Scrooge can also be used for Spark SQL
> schema inference.
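To make the mismatch concrete, here is a hedged, compilable sketch of the two shapes; the names and bodies below are illustrative and are not actual Scrooge output:

{code:java}
// Illustrative sketch only -- not actual Scrooge-generated code.
// The generated type is a trait whose instances come from a companion apply,
// so there is no constructor for ScalaReflection to inspect.
trait Junk extends Product2[Long, String] with Serializable {
  def junkID: Long
  def junkString: String
  def _1: Long = junkID
  def _2: String = junkString
  def canEqual(other: Any): Boolean = other.isInstanceOf[Junk]
}

object Junk {
  // The companion apply that SPARK-8288 proposes schema inference should fall back to.
  def apply(id: Long, str: String): Junk = new Junk {
    val junkID: Long = id
    val junkString: String = str
  }
}

// The shape Spark handles today: constructor arguments map directly to columns.
case class JunkRecord(junkID: Long, junkString: String)
{code}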
[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162519#comment-16162519 ]

Drew Robb commented on SPARK-21133:
------------------------------------

My mistake, you are absolutely correct. I had a locally cached RC build of 2.2.0.

> HighlyCompressedMapStatus#writeExternal throws NPE
> --------------------------------------------------
>
>                 Key: SPARK-21133
>                 URL: https://issues.apache.org/jira/browse/SPARK-21133
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Blocker
>             Fix For: 2.2.0
>
> To reproduce, {{set spark.sql.shuffle.partitions>2000}} with a shuffle, for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e "
> set spark.sql.shuffle.partitions=2001;
> drop table if exists spark_hcms_npe;
> create table spark_hcms_npe as select id, count(*) from big_table group by id;
> "
> {code}
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
>         at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
>         at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
>         at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
>         at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
>         at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
>         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
>         at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
>         at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
>         at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
>         at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
>         at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
>         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
>         at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
>         at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
>         at org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
>         at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> {noformat}
[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162469#comment-16162469 ]

Drew Robb commented on SPARK-21133:
------------------------------------

Thanks for the fix on this, but I don't think the Fix Version of 2.2.0 is correct, as the issue is still reproducible in 2.2.0.

> HighlyCompressedMapStatus#writeExternal throws NPE
> --------------------------------------------------
>
>                 Key: SPARK-21133
>                 URL: https://issues.apache.org/jira/browse/SPARK-21133
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Blocker
>             Fix For: 2.2.0
>
> To reproduce, {{set spark.sql.shuffle.partitions>2000}} with a shuffle, for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e "
> set spark.sql.shuffle.partitions=2001;
> drop table if exists spark_hcms_npe;
> create table spark_hcms_npe as select id, count(*) from big_table group by id;
> "
> {code}
[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type
[ https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115036#comment-16115036 ]

Drew Robb commented on SPARK-8288:
-----------------------------------

An additional fix beyond my PR would be needed to handle reading this data as a Dataset. The codegen constructor call here does not work, since Scrooge classes do not have a constructor: https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L328

I experimented with changing this line to

{code}
s"$className$$.MODULE$$.apply($argString)"
{code}

This appeared to work, but some tests failed.

> ScalaReflection should also try apply methods defined in companion objects
> when inferring schema from a Product type
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-8288
>                 URL: https://issues.apache.org/jira/browse/SPARK-8288
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL
> issue).
> My comment on that issue is quoted below:
> {quote}
> ... The reason for this exception is that the Scala code Scrooge generates
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a
> constructor whose arguments can be transformed into fields of the DataFrame
> schema. The exception was thrown because Spark can't find such a constructor
> on trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects,
> so that trait types generated by Scrooge can also be used for Spark SQL
> schema inference.
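To make concrete what "reading this data as a Dataset" exercises, a hedged sketch follows; the Parquet path is illustrative and {{Junk}} stands for a Scrooge-generated trait (no constructor, companion apply) like the one sketched earlier. Inferring a schema is enough for writing a DataFrame, but decoding rows back into typed objects runs generated code that has to construct them:

{code:java}
import org.apache.spark.sql.{Dataset, SparkSession}

// Illustrative only: assumes a Scrooge-style trait `Junk` and that the path exists.
def readJunk(spark: SparkSession): Dataset[Junk] = {
  import spark.implicits._
  // The generated deserializer must build Junk instances. Today codegen emits
  // roughly `new Junk(...)`, which cannot work for a trait; the experiment above
  // emits `Junk$.MODULE$.apply(...)` -- the Java-level spelling of the companion
  // object's apply -- instead.
  spark.read.parquet("/data/junk").as[Junk]
}
{code}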
[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type
[ https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107798#comment-16107798 ]

Drew Robb commented on SPARK-8288:
-----------------------------------

I opened a PR for this issue; I'm not sure why the bot didn't pick it up: https://github.com/apache/spark/pull/18766

> ScalaReflection should also try apply methods defined in companion objects
> when inferring schema from a Product type
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-8288
>                 URL: https://issues.apache.org/jira/browse/SPARK-8288
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL
> issue).
> My comment on that issue is quoted below:
> {quote}
> ... The reason for this exception is that the Scala code Scrooge generates
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a
> constructor whose arguments can be transformed into fields of the DataFrame
> schema. The exception was thrown because Spark can't find such a constructor
> on trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects,
> so that trait types generated by Scrooge can also be used for Spark SQL
> schema inference.
[jira] [Commented] (SPARK-12664) Expose raw prediction scores in MultilayerPerceptronClassificationModel
[ https://issues.apache.org/jira/browse/SPARK-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928928#comment-15928928 ]

Drew Robb commented on SPARK-12664:
------------------------------------

This feature is also very important to me. I'm considering working on it myself and will post here if I begin that.

> Expose raw prediction scores in MultilayerPerceptronClassificationModel
> ------------------------------------------------------------------------
>
>                 Key: SPARK-12664
>                 URL: https://issues.apache.org/jira/browse/SPARK-12664
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Robert Dodier
>            Assignee: Yanbo Liang
>
> In org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel,
> there isn't any way to get raw prediction scores; only an integer output
> (from 0 to #classes - 1) is available via the `predict` method.
> `mlpModel.predict` is called within the class to get the raw score, but
> `mlpModel` is private, so that isn't available to outside callers.
> The raw score is useful when the user wants to interpret the classifier
> output as a probability.
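A short, hedged sketch of the limitation being described; the column names, layer sizes, and input DataFrames are illustrative assumptions:

{code:java}
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.sql.DataFrame

// Illustrative sketch: 4 input features, one hidden layer of 5, 3 classes.
def showOnlyClassIndex(training: DataFrame, test: DataFrame): Unit = {
  val mlp = new MultilayerPerceptronClassifier()
    .setLayers(Array(4, 5, 3))
    .setLabelCol("label")
    .setFeaturesCol("features")

  val model = mlp.fit(training)

  // transform() appends only a "prediction" column holding the winning class
  // index; there is no rawPrediction/probability column to interpret as a
  // score, which is what this ticket asks to expose.
  model.transform(test).select("prediction").show()
}
{code}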
[jira] [Issue Comment Deleted] (SPARK-16599) java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
[ https://issues.apache.org/jira/browse/SPARK-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Robb updated SPARK-16599:
-------------------------------
    Comment: was deleted

(was: I encountered an identical exception when using a singleton SparkSession. I was able to resolve the issue by ensuring that every object using the singleton SparkSession did an `import spark.implicits._`, even if that particular import was not necessary for compilation.)

> java.util.NoSuchElementException: None.get at at
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16599
>                 URL: https://issues.apache.org/jira/browse/SPARK-16599
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: CentOS 6.7, Spark 2.0
>            Reporter: binde
>
> Running a Spark job with Spark 2.0 fails with this error message:
> Job aborted due to stage failure: Task 0 in stage 821.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 821.0 (TID 1480, e103):
> java.util.NoSuchElementException: None.get
>         at scala.None$.get(Option.scala:347)
>         at scala.None$.get(Option.scala:345)
>         at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
>         at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (SPARK-16599) java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
[ https://issues.apache.org/jira/browse/SPARK-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832569#comment-15832569 ]

Drew Robb commented on SPARK-16599:
------------------------------------

I encountered an identical exception when using a singleton SparkSession. I was able to resolve the issue by ensuring that every object using the singleton SparkSession did an `import spark.implicits._`, even if that particular import was not necessary for compilation.

> java.util.NoSuchElementException: None.get at at
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16599
>                 URL: https://issues.apache.org/jira/browse/SPARK-16599
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: CentOS 6.7, Spark 2.0
>            Reporter: binde
>
> Running a Spark job with Spark 2.0 fails with this error message:
> Job aborted due to stage failure: Task 0 in stage 821.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 821.0 (TID 1480, e103):
> java.util.NoSuchElementException: None.get
>         at scala.None$.get(Option.scala:347)
>         at scala.None$.get(Option.scala:345)
>         at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
>         at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
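A hedged sketch of the workaround described in the comment; the object and method names are illustrative:

{code:java}
import org.apache.spark.sql.SparkSession

// Illustrative sketch of a singleton SparkSession shared across objects.
object Sessions {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("shared-session")
    .getOrCreate()
}

object SomeJobStep {
  import Sessions.spark
  import spark.implicits._ // kept even when not strictly needed for compilation

  def run(): Long = spark.range(10).count()
}
{code}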
[jira] [Updated] (SPARK-17986) SQLTransformer leaks temporary tables
[ https://issues.apache.org/jira/browse/SPARK-17986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Drew Robb updated SPARK-17986:
-------------------------------
    Description: 
The SQLTransformer creates a temporary table when called, and does not delete this temporary table. When using a SQLTransformer in a long running Spark Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling `dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of `transform`: https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.

  was:
The SQLTransformer creates a temporary table when called, and does not delete this temporary table. When using a SQLTransformer in a long running Spark Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling `dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of `transform`: https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.

I would be happy to attempt this fix myself if someone could validate this issue.

> SQLTransformer leaks temporary tables
> -------------------------------------
>
>                 Key: SPARK-17986
>                 URL: https://issues.apache.org/jira/browse/SPARK-17986
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.1
>            Reporter: Drew Robb
>            Priority: Minor
>
> The SQLTransformer creates a temporary table when called, and does not delete
> this temporary table. When using a SQLTransformer in a long running Spark
> Streaming task, these temporary tables accumulate.
> I believe that the fix would be as simple as calling
> `dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of
> `transform`:
> https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.
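A hedged sketch of the suggested fix; this is not SQLTransformer's actual source, the method name and table-name scheme are illustrative, and {{__THIS__}} is the transformer's statement placeholder:

{code:java}
import org.apache.spark.sql.{DataFrame, Dataset}

// Sketch of a transform() that cleans up after itself: register a temp view,
// run the user's statement against it, then drop the view so long-running
// streaming jobs do not accumulate temporary tables.
def transformSketch(dataset: Dataset[_], statement: String): DataFrame = {
  val tableName = s"sqltrans_${java.util.UUID.randomUUID().toString.replace("-", "")}"
  dataset.createOrReplaceTempView(tableName)
  val result = dataset.sparkSession.sql(statement.replace("__THIS__", tableName))
  // The suggested addition: drop the temporary view once the result is resolved.
  dataset.sparkSession.catalog.dropTempView(tableName)
  result
}
{code}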
[jira] [Created] (SPARK-17986) SQLTransformer leaks temporary tables
Drew Robb created SPARK-17986:
---------------------------------

             Summary: SQLTransformer leaks temporary tables
                 Key: SPARK-17986
                 URL: https://issues.apache.org/jira/browse/SPARK-17986
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.0.1
            Reporter: Drew Robb
            Priority: Minor


The SQLTransformer creates a temporary table when called, and does not delete this temporary table. When using a SQLTransformer in a long running Spark Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling `dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of `transform`: https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.

I would be happy to attempt this fix myself if someone could validate this issue.
[jira] [Created] (SPARK-17928) No driver.memoryOverhead setting for mesos cluster mode
Drew Robb created SPARK-17928:
---------------------------------

             Summary: No driver.memoryOverhead setting for mesos cluster mode
                 Key: SPARK-17928
                 URL: https://issues.apache.org/jira/browse/SPARK-17928
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 2.0.1
            Reporter: Drew Robb


Mesos cluster mode does not have a configuration setting for the driver's memory overhead. This makes scheduling long-running drivers on Mesos using the dispatcher very unreliable. There is an equivalent setting for YARN: spark.yarn.driver.memoryOverhead.
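For comparison, a hedged sketch of the asymmetry being reported; the values are illustrative, and the comments reflect the state described in this ticket rather than current Spark behavior:

{code:java}
import org.apache.spark.SparkConf

// Illustrative values only.
val driverSizing = new SparkConf()
  .set("spark.driver.memory", "8g")
  // YARN cluster mode can pad the driver container beyond spark.driver.memory:
  .set("spark.yarn.driver.memoryOverhead", "1024")
// As reported here, Mesos cluster mode (driver launched by the dispatcher) has no
// corresponding overhead setting, so the dispatcher sizes the driver task from
// spark.driver.memory alone.
{code}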