[jira] [Commented] (SPARK-22152) Add Dataset flatten function

2017-09-28 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184296#comment-16184296
 ] 

Drew Robb commented on SPARK-22152:
---

There is also a ticket for adding it to RDD: 
https://issues.apache.org/jira/browse/SPARK-18855

> Add Dataset flatten function
> 
>
> Key: SPARK-22152
> URL: https://issues.apache.org/jira/browse/SPARK-22152
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Drew Robb
>Priority: Minor
>
> Currently you can use an identity flatMap to flatten a Dataset, for example
> to get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly
> would allow for an API more similar to Scala collections.






[jira] [Commented] (SPARK-22152) Add Dataset flatten function

2017-09-27 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183585#comment-16183585
 ] 

Drew Robb commented on SPARK-22152:
---

I personally use `Option` very frequently in Datasets, and it is also idiomatic
in Scala to prefer Option over null where possible. Another use case would be
`Dataset[Seq[T]] => Dataset[T]`:


{code:java}
scala> Seq(Seq(1,2,3)).toDS.flatMap{x => x}.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
{code}
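
For the {{Dataset[Option[T]]}} case mentioned in the description, the same identity-style flatMap works today. A minimal spark-shell sketch (assuming the stock encoders from {{spark.implicits._}} cover a top-level {{Option[Int]}}, and using {{_.toSeq}} so no implicit Option-to-Iterable conversion is needed):

{code:java}
// Hedged sketch: flatten a Dataset[Option[Int]] into a Dataset[Int] via flatMap;
// this is the operation a built-in flatten would wrap.
import spark.implicits._

val maybeInts = Seq(Option(1), None, Option(3)).toDS()  // Dataset[Option[Int]]
val ints = maybeInts.flatMap(_.toSeq)                   // Dataset[Int] holding 1 and 3
ints.show()
{code}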



> Add Dataset flatten function
> 
>
> Key: SPARK-22152
> URL: https://issues.apache.org/jira/browse/SPARK-22152
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Drew Robb
>Priority: Minor
>
> Currently you can use an identity flatMap to flatten a Dataset, for example
> to get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly
> would allow for an API more similar to Scala collections.






[jira] [Created] (SPARK-22152) Add Dataset flatten function

2017-09-27 Thread Drew Robb (JIRA)
Drew Robb created SPARK-22152:
-

 Summary: Add Dataset flatten function
 Key: SPARK-22152
 URL: https://issues.apache.org/jira/browse/SPARK-22152
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Drew Robb
Priority: Minor


Currently you can use an identity flatMap to flatten a Dataset, for example to
get from a Dataset[Option[T]] to a Dataset[T], but adding flatten directly
would allow for an API more similar to Scala collections.
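
As a rough, user-side illustration only (not the proposed Spark API; the names below are made up), the requested method could be approximated with an enrichment built on the existing flatMap:

{code}
// Hypothetical sketch of a flatten enrichment for Datasets of Seq and Option.
import org.apache.spark.sql.{Dataset, Encoder}

object FlattenSyntax {
  implicit class SeqDatasetOps[T](val ds: Dataset[Seq[T]]) extends AnyVal {
    def flatten(implicit enc: Encoder[T]): Dataset[T] = ds.flatMap(xs => xs)
  }

  implicit class OptionDatasetOps[T](val ds: Dataset[Option[T]]) extends AnyVal {
    def flatten(implicit enc: Encoder[T]): Dataset[T] = ds.flatMap(opt => opt.toSeq)
  }
}
{code}

With {{import FlattenSyntax._}} in scope, {{ds.flatten}} reads like the Scala-collections method this ticket asks for while still compiling down to a flatMap.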






[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type

2017-09-26 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181716#comment-16181716
 ] 

Drew Robb commented on SPARK-8288:
--

I do not yet have a fully working fix. I think the best approach might instead
be to change things on the Scrooge end.

> ScalaReflection should also try apply methods defined in companion objects 
> when inferring schema from a Product type
> 
>
> Key: SPARK-8288
> URL: https://issues.apache.org/jira/browse/SPARK-8288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL 
> issue).
> My comment on that issue quoted below:
> {quote}
> ...  The reason of this exception is that, the Scala code Scrooge generates 
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a 
> constructor whose arguments can be transformed into fields of the DataFrame 
> schema.  The exception was thrown because Spark can't find such a constructor 
> from trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects, 
> so that trait types generated by Scrooge can also be used for Spark SQL 
> schema inference.
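
For illustration only, a hypothetical sketch (not Scrooge's actual generated code, which also mixes in ThriftStruct) of the trait-plus-companion shape that the proposed fallback would let {{ScalaReflection}} inspect; the companion's apply parameters play the role of the missing constructor arguments:

{code}
// Hypothetical Scrooge-like shape: the type is a trait with no usable constructor,
// and instances are built through the companion object's apply method.
trait Junk extends scala.Product2[Long, String] with java.io.Serializable {
  def junkID: Long
  def junkString: String
  def _1: Long = junkID
  def _2: String = junkString
  def canEqual(that: Any): Boolean = that.isInstanceOf[Junk]
}

object Junk {
  // The method ScalaReflection would fall back to when no suitable constructor
  // exists on the type itself; its parameter names map to schema fields.
  def apply(junkID: Long, junkString: String): Junk = {
    val (id, str) = (junkID, junkString)
    new Junk { def junkID: Long = id; def junkString: String = str }
  }
}
{code}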






[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-09-11 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162519#comment-16162519
 ] 

Drew Robb commented on SPARK-21133:
---

My mistake, you are absolutely correct. I had a locally cached RC build of
2.2.0.

> HighlyCompressedMapStatus#writeExternal throws NPE
> --
>
> Key: SPARK-21133
> URL: https://issues.apache.org/jira/browse/SPARK-21133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Blocker
> Fix For: 2.2.0
>
>
> To reproduce, set {{spark.sql.shuffle.partitions}} greater than 2000 with a
> shuffle; for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7   -e "
>   set spark.sql.shuffle.partitions=2001;
>   drop table if exists spark_hcms_npe;
>   create table spark_hcms_npe as select id, count(*) from big_table group by 
> id;
> "
> {code}
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadP

[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-09-11 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162469#comment-16162469
 ] 

Drew Robb commented on SPARK-21133:
---

Thanks for the fix on this, but I don't think the fix version of 2.2.0 is
correct, as the issue is still reproducible in 2.2.0.

> HighlyCompressedMapStatus#writeExternal throws NPE
> --
>
> Key: SPARK-21133
> URL: https://issues.apache.org/jira/browse/SPARK-21133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Blocker
> Fix For: 2.2.0
>
>
> To reproduce, set {{spark.sql.shuffle.partitions}} greater than 2000 with a
> shuffle; for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7   -e "
>   set spark.sql.shuffle.partitions=2001;
>   drop table if exists spark_hcms_npe;
>   create table spark_hcms_npe as select id, count(*) from big_table group by 
> id;
> "
> {code}
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.Thre

[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type

2017-08-04 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115036#comment-16115036
 ] 

Drew Robb commented on SPARK-8288:
--

An additional fix beyond my PR would be needed to handle reading this data as a
Dataset. The codegen constructor call here does not work, since Scrooge classes
do not have a constructor:
https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L328

I experimented with changing this line to

{code}
s"$className$$.MODULE$$.apply($argString)"
{code}

This appeared to work, but some tests failed.
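
For context, a hedged aside on why that string shape reaches the companion: a Scala object compiles to a JVM class named with a trailing {{$}} that exposes a static singleton field {{MODULE$}}, so generated Java source can call its {{apply}} directly. A tiny sketch with hypothetical names:

{code}
// From generated Java source, this companion's apply is reachable as
//   Junk$.MODULE$.apply(junkID, junkString);
// which matches the $className$$.MODULE$$.apply($argString) string above.
object Junk {
  def apply(junkID: Long, junkString: String): (Long, String) = (junkID, junkString)
}
{code}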

> ScalaReflection should also try apply methods defined in companion objects 
> when inferring schema from a Product type
> 
>
> Key: SPARK-8288
> URL: https://issues.apache.org/jira/browse/SPARK-8288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL 
> issue).
> My comment on that issue quoted below:
> {quote}
> ...  The reason of this exception is that, the Scala code Scrooge generates 
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a 
> constructor whose arguments can be transformed into fields of the DataFrame 
> schema.  The exception was thrown because Spark can't find such a constructor 
> from trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects, 
> so that trait types generated by Scrooge can also be used for Spark SQL 
> schema inference.






[jira] [Commented] (SPARK-8288) ScalaReflection should also try apply methods defined in companion objects when inferring schema from a Product type

2017-07-31 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107798#comment-16107798
 ] 

Drew Robb commented on SPARK-8288:
--

I opened a PR for this issue; I'm not sure why the bot didn't pick it up:
https://github.com/apache/spark/pull/18766

> ScalaReflection should also try apply methods defined in companion objects 
> when inferring schema from a Product type
> 
>
> Key: SPARK-8288
> URL: https://issues.apache.org/jira/browse/SPARK-8288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Cheng Lian
>
> This ticket is derived from PARQUET-293 (which actually describes a Spark SQL 
> issue).
> My comment on that issue quoted below:
> {quote}
> ...  The reason of this exception is that, the Scala code Scrooge generates 
> is actually a trait extending {{Product}}:
> {code}
> trait Junk
>   extends ThriftStruct
>   with scala.Product2[Long, String]
>   with java.io.Serializable
> {code}
> while Spark expects a case class, something like:
> {code}
> case class Junk(junkID: Long, junkString: String)
> {code}
> The key difference here is that the latter case class version has a 
> constructor whose arguments can be transformed into fields of the DataFrame 
> schema.  The exception was thrown because Spark can't find such a constructor 
> from trait {{Junk}}.
> {quote}
> We can make {{ScalaReflection}} try {{apply}} methods in companion objects, 
> so that trait types generated by Scrooge can also be used for Spark SQL 
> schema inference.






[jira] [Commented] (SPARK-12664) Expose raw prediction scores in MultilayerPerceptronClassificationModel

2017-03-16 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928928#comment-15928928
 ] 

Drew Robb commented on SPARK-12664:
---

This feature is also very important to me. I'm considering working on it myself 
and will post here if I begin that.

> Expose raw prediction scores in MultilayerPerceptronClassificationModel
> ---
>
> Key: SPARK-12664
> URL: https://issues.apache.org/jira/browse/SPARK-12664
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Robert Dodier
>Assignee: Yanbo Liang
>
> In 
> org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel, 
> there isn't any way to get raw prediction scores; only an integer output 
> (from 0 to #classes - 1) is available via the `predict` method. 
> `mlpModel.predict` is called within the class to get the raw score, but 
> `mlpModel` is private so that isn't available to outside callers.
> The raw score is useful when the user wants to interpret the classifier 
> output as a probability. 






[jira] [Issue Comment Deleted] (SPARK-16599) java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)

2017-01-20 Thread Drew Robb (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Drew Robb updated SPARK-16599:
--
Comment: was deleted

(was: I encountered an identical exception when using a singleton SparkSession.
For me, I was able to resolve the issue by ensuring all objects that used the
singleton SparkSession did an `import spark.implicits._`, even if that
particular import was not necessary for compilation.)

> java.util.NoSuchElementException: None.get  at at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
> --
>
> Key: SPARK-16599
> URL: https://issues.apache.org/jira/browse/SPARK-16599
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: centos 6.7   spark 2.0
>Reporter: binde
>
> Running a Spark job with Spark 2.0 gives this error message:
> Job aborted due to stage failure: Task 0 in stage 821.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 821.0 (TID 1480, e103): 
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
>   at 
> org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)






[jira] [Commented] (SPARK-16599) java.util.NoSuchElementException: None.get at at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)

2017-01-20 Thread Drew Robb (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832569#comment-15832569
 ] 

Drew Robb commented on SPARK-16599:
---

I encountered an identical exception when using a singleton SparkSession. For
me, I was able to resolve the issue by ensuring that every object using the
singleton SparkSession did an `import spark.implicits._`, even if that
particular import was not necessary for compilation.
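
A minimal sketch of that workaround (illustrative names; assumes a shared, lazily initialized SparkSession):

{code}
// Hedged sketch: every object that touches the shared session also imports that
// session's implicits, even when compilation would succeed without the import.
import org.apache.spark.sql.SparkSession

object Spark {
  lazy val session: SparkSession = SparkSession.builder()
    .appName("singleton-session-example")
    .getOrCreate()
}

object SomeJob {
  def run(): Unit = {
    val spark = Spark.session
    import spark.implicits._  // the import the comment above recommends adding
    spark.createDataset(Seq(1, 2, 3)).show()
  }
}
{code}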

> java.util.NoSuchElementException: None.get  at at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
> --
>
> Key: SPARK-16599
> URL: https://issues.apache.org/jira/browse/SPARK-16599
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: centos 6.7   spark 2.0
>Reporter: binde
>
> Running a Spark job with Spark 2.0 gives this error message:
> Job aborted due to stage failure: Task 0 in stage 821.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 821.0 (TID 1480, e103): 
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
>   at 
> org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (SPARK-17986) SQLTransformer leaks temporary tables

2016-10-17 Thread Drew Robb (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Drew Robb updated SPARK-17986:
--
Description: 
The SQLTransformer creates a temporary table when called, and does not delete
this temporary table. When using a SQLTransformer in a long-running Spark
Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling
`dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of
`transform`:
https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.

  was:
The SQLTransformer creates a temporary table when called, and does not delete 
this temporary table. When using a SQLTransformer in a long running Spark 
Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling  
`dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of 
`transform`:
https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.
 I would be happy to attempt this fix myself if someone could validate this 
issue.
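
A hedged sketch of the change the description proposes (not the actual Spark patch; {{__THIS__}} is the transformer's statement placeholder, and the view name here is made up):

{code}
// Sketch: register the temp view, run the statement, then drop the view before
// returning. spark.sql analyzes eagerly, so the result no longer needs the view.
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object SQLTransformerSketch {
  def transformWithCleanup(statement: String)(dataset: Dataset[_]): DataFrame = {
    val spark: SparkSession = dataset.sparkSession
    val tableName = "sql_transformer_" + java.util.UUID.randomUUID().toString.replace("-", "")
    dataset.createOrReplaceTempView(tableName)
    val result = spark.sql(statement.replace("__THIS__", tableName))
    spark.catalog.dropTempView(tableName)  // the call this report suggests adding
    result
  }
}
{code}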


> SQLTransformer leaks temporary tables
> -
>
> Key: SPARK-17986
> URL: https://issues.apache.org/jira/browse/SPARK-17986
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.0.1
>Reporter: Drew Robb
>Priority: Minor
>
> The SQLTransformer creates a temporary table when called, and does not delete
> this temporary table. When using a SQLTransformer in a long-running Spark
> Streaming task, these temporary tables accumulate.
> I believe that the fix would be as simple as calling
> `dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of
> `transform`:
> https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.
>  






[jira] [Created] (SPARK-17986) SQLTransformer leaks temporary tables

2016-10-17 Thread Drew Robb (JIRA)
Drew Robb created SPARK-17986:
-

 Summary: SQLTransformer leaks temporary tables
 Key: SPARK-17986
 URL: https://issues.apache.org/jira/browse/SPARK-17986
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 2.0.1
Reporter: Drew Robb
Priority: Minor


The SQLTransformer creates a temporary table when called, and does not delete
this temporary table. When using a SQLTransformer in a long-running Spark
Streaming task, these temporary tables accumulate.

I believe that the fix would be as simple as calling
`dataset.sparkSession.catalog.dropTempView(tableName)` in the last part of
`transform`:
https://github.com/apache/spark/blob/v2.0.1/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala#L65.
I would be happy to attempt this fix myself if someone could validate this
issue.






[jira] [Created] (SPARK-17928) No driver.memoryOverhead setting for mesos cluster mode

2016-10-13 Thread Drew Robb (JIRA)
Drew Robb created SPARK-17928:
-

 Summary: No driver.memoryOverhead setting for mesos cluster mode
 Key: SPARK-17928
 URL: https://issues.apache.org/jira/browse/SPARK-17928
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 2.0.1
Reporter: Drew Robb


Mesos cluster mode does not have a configuration setting for the driver's
memory overhead. This makes scheduling long-running drivers on Mesos through
the dispatcher very unreliable. There is an equivalent setting for YARN:
spark.yarn.driver.memoryOverhead.


