[jira] [Updated] (SPARK-14670) Allow updating SQLMetrics on driver

2016-06-08 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-14670:
---
Assignee: Wenchen Fan  (was: Andrew Or)

> Allow updating SQLMetrics on driver
> ---
>
> Key: SPARK-14670
> URL: https://issues.apache.org/jira/browse/SPARK-14670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> On the SparkUI right now we have this SQLTab that displays accumulator values 
> per operator. However, it only displays metrics updated on the executors, not 
> on the driver. It is useful to also include driver metrics, e.g. broadcast 
> time.
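For illustration only (this is not the SQLMetrics/SQLTab internals, and the names below are made up): a driver-side metric such as broadcast time is essentially a value accumulated on the driver, which the SQL tab would additionally need to pick up and display.

{code}
// Sketch: a driver-side "broadcast time" measurement, expressed with the public
// accumulator API rather than the internal SQLMetrics classes.
val broadcastTimeMs = spark.sparkContext.longAccumulator("broadcast time (ms)")
val start = System.nanoTime()
// ... build and broadcast the relation on the driver ...
broadcastTimeMs.add((System.nanoTime() - start) / 1000000L)
{code}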






[jira] [Resolved] (SPARK-14670) Allow updating SQLMetrics on driver

2016-06-08 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-14670.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13189
[https://github.com/apache/spark/pull/13189]

> Allow updating SQLMetrics on driver
> ---
>
> Key: SPARK-14670
> URL: https://issues.apache.org/jira/browse/SPARK-14670
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> On the SparkUI right now we have this SQLTab that displays accumulator values 
> per operator. However, it only displays metrics updated on the executors, not 
> on the driver. It is useful to also include driver metrics, e.g. broadcast 
> time.






[jira] [Updated] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15832:

Priority: Major  (was: Minor)

> Embedded IN/EXISTS predicate subquery throws TreeNodeException
> --
>
> Key: SPARK-15832
> URL: https://issues.apache.org/jira/browse/SPARK-15832
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Ioana Delaney
>
> Queries with embedded existential sub-query predicates throw an exception when 
> building the physical plan.
> Example failing query:
> {code}
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
> scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 
> 2 else 3 end) IN (select c2 from t1)").show()
> Binding attribute, tree: c2#239
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: c2#239
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>   ...
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
> {code}






[jira] [Commented] (SPARK-15837) PySpark ML Word2Vec should support maxSentenceLength

2016-06-08 Thread Weichen Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321899#comment-15321899
 ] 

Weichen Xu commented on SPARK-15837:


I'll work on it and create a PR soon!

> PySpark ML Word2Vec should support maxSentenceLength
> 
>
> Key: SPARK-15837
> URL: https://issues.apache.org/jira/browse/SPARK-15837
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> SPARK-15793 adds maxSentenceLength for ML Word2Vec in Scala; we should also 
> add it to the Python API.






[jira] [Created] (SPARK-15837) PySpark ML Word2Vec should support maxSentenceLength

2016-06-08 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-15837:
---

 Summary: PySpark ML Word2Vec should support maxSentenceLength
 Key: SPARK-15837
 URL: https://issues.apache.org/jira/browse/SPARK-15837
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor


SPARK-15793 adds maxSentenceLength for ML Word2Vec in Scala; we should also add 
it to the Python API.
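For reference, a minimal sketch of the Scala-side parameter from SPARK-15793 that the Python wrapper would mirror (the setter name and column names below are assumed, following the usual ML param conventions):

{code}
import org.apache.spark.ml.feature.Word2Vec

// The Scala API already exposes the parameter; pyspark.ml.feature.Word2Vec should too.
val w2v = new Word2Vec()
  .setInputCol("text")            // illustrative column names
  .setOutputCol("vectors")
  .setMaxSentenceLength(1000)     // parameter added by SPARK-15793
{code}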






[jira] [Created] (SPARK-15836) Spark 2.0/master maven snapshots are broken

2016-06-08 Thread Yin Huai (JIRA)
Yin Huai created SPARK-15836:


 Summary: Spark 2.0/master maven snapshots are broken
 Key: SPARK-15836
 URL: https://issues.apache.org/jira/browse/SPARK-15836
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Yin Huai


See 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.0-maven-snapshots/
 and 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/






[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321730#comment-15321730
 ] 

Reynold Xin commented on SPARK-15086:
-

This is really tricky. On one hand it'd be great not to have to break the API; 
on the other, it would be really confusing if the two returned different values 
... One possibility is to rename everything to longAccumulatorV2, 
doubleAccumulatorV2. Thoughts?


> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.






[jira] [Commented] (SPARK-15835) The read path of json doesn't support write path when schema contains Options

2016-06-08 Thread Burak Yavuz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321729#comment-15321729
 ] 

Burak Yavuz commented on SPARK-15835:
-

cc [~cloud_fan]

> The read path of json doesn't support write path when schema contains Options
> -
>
> Key: SPARK-15835
> URL: https://issues.apache.org/jira/browse/SPARK-15835
> Project: Spark
>  Issue Type: Bug
>Reporter: Burak Yavuz
>
> My schema contains optional fields. When these fields are written as JSON 
> (and all of the records are None), the fields are omitted during the write. 
> When reading, these fields can't be found, which throws an exception.
> Either the fields should be included as `null` during writes, or the Dataset 
> should not require the field to exist in the DataFrame if the field is an 
> Option (which may be the better solution).
> {code}
> case class Bug(field1: String, field2: Option[String])
> Seq(Bug("abc", None)).toDS.write.json("/tmp/sqlBug")
> spark.read.json("/tmp/sqlBug").as[Bug]
> {code}
> stack trace:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve '`field2`' given input 
> columns: [field1]
> at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
> {code}






[jira] [Created] (SPARK-15835) The read path of json doesn't support write path when schema contains Options

2016-06-08 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-15835:
---

 Summary: The read path of json doesn't support write path when 
schema contains Options
 Key: SPARK-15835
 URL: https://issues.apache.org/jira/browse/SPARK-15835
 Project: Spark
  Issue Type: Bug
Reporter: Burak Yavuz



My schema contains optional fields. When these fields are written as JSON (and 
all of the records are None), the fields are omitted during the write. When 
reading, these fields can't be found, which throws an exception.
Either the fields should be included as `null` during writes, or the Dataset 
should not require the field to exist in the DataFrame if the field is an 
Option (which may be the better solution).

{code}
case class Bug(field1: String, field2: Option[String])
Seq(Bug("abc", None)).toDS.write.json("/tmp/sqlBug")
spark.read.json("/tmp/sqlBug").as[Bug]
{code}

stack trace:
{code}
org.apache.spark.sql.AnalysisException: cannot resolve '`field2`' given input 
columns: [field1]
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
{code}
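One possible workaround, sketched here only for context (it is not the fix this issue asks for): supply the expected schema explicitly on read, so the missing optional column resolves to null/None instead of failing analysis.

{code}
// Assumes a spark-shell session with spark.implicits._ in scope, as in the repro above.
import org.apache.spark.sql.Encoders

case class Bug(field1: String, field2: Option[String])

val bugSchema = Encoders.product[Bug].schema          // includes the nullable field2 column
spark.read.schema(bugSchema).json("/tmp/sqlBug").as[Bug].show()
{code}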






[jira] [Assigned] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15369:


Assignee: (was: Apache Spark)

> Investigate selectively using Jython for parts of PySpark
> -
>
> Key: SPARK-15369
> URL: https://issues.apache.org/jira/browse/SPARK-15369
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: holdenk
>Priority: Minor
>
> Transferring data from the JVM to the Python executor can be a substantial 
> bottleneck. While Jython is not suitable for all UDFs or map functions, it 
> may be suitable for some simple ones. We should investigate the option of 
> using Jython to accelerate these small functions.






[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321684#comment-15321684
 ] 

Apache Spark commented on SPARK-15369:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/13571

> Investigate selectively using Jython for parts of PySpark
> -
>
> Key: SPARK-15369
> URL: https://issues.apache.org/jira/browse/SPARK-15369
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: holdenk
>Priority: Minor
>
> Transferring data from the JVM to the Python executor can be a substantial 
> bottleneck. While Jython is not suitable for all UDFs or map functions, it 
> may be suitable for some simple ones. We should investigate the option of 
> using Jython to accelerate these small functions.






[jira] [Assigned] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15369:


Assignee: Apache Spark

> Investigate selectively using Jython for parts of PySpark
> -
>
> Key: SPARK-15369
> URL: https://issues.apache.org/jira/browse/SPARK-15369
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Minor
>
> Transferring data from the JVM to the Python executor can be a substantial 
> bottleneck. While Jython is not suitable for all UDFs or map functions, it 
> may be suitable for some simple ones. We should investigate the option of 
> using Jython to accelerate these small functions.






[jira] [Updated] (SPARK-11415) Catalyst DateType Shifts Input Data by Local Timezone

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-11415:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15834

> Catalyst DateType Shifts Input Data by Local Timezone
> -
>
> Key: SPARK-11415
> URL: https://issues.apache.org/jira/browse/SPARK-11415
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
>Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't 
> get a consistent result for java.sql.Date. I investigated and noticed the 
> following code is used to create Catalyst.DateTypes
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
>  /**
>* Returns the number of days since epoch from java.sql.Date.
>*/
>   def fromJavaDate(date: Date): SQLDate = {
> millisToDays(date.getTime)
>   }
> {code}
> But millisToDays does not abide by this contract, shifting the underlying 
> timestamp to the local timezone before calculating the days from epoch. This 
> causes the invocation to move the actual date around.
> {code}
>   // we should use the exact day as Int, for example, (year, month, day) -> 
> day
>   def millisToDays(millisUtc: Long): SQLDate = {
> // SPARK-6785: use Math.floor so negative number of days (dates before 
> 1970)
> // will correctly work as input for function toJavaDate(Int)
> val millisLocal = millisUtc + 
> threadLocalLocalTimeZone.get().getOffset(millisUtc)
> Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
>   }
> {code}
> The inverse function also incorrectly shifts the timezone
> {code}
>   // reverse of millisToDays
>   def daysToMillis(days: SQLDate): Long = {
> val millisUtc = days.toLong * MILLIS_PER_DAY
> millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one errors and could cause significant shifts in the 
> data if the underlying data is worked on in time zones other than UTC.
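A small, self-contained illustration of the shift described above, mirroring the quoted millisToDays logic for a JVM time zone west of UTC (America/Los_Angeles is an assumed example, not taken from the report):

{code}
import java.util.TimeZone

val MILLIS_PER_DAY = 86400000L
val millisUtc = 0L                                    // 1970-01-01T00:00:00Z, i.e. day 0
val offset = TimeZone.getTimeZone("America/Los_Angeles").getOffset(millisUtc)
val millisLocal = millisUtc + offset                  // shifted by -8h
val days = math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
println(days)                                         // -1: the date moves back to 1969-12-31
{code}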






[jira] [Updated] (SPARK-14057) sql time stamps do not respect time zones

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14057:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15834

> sql time stamps do not respect time zones
> -
>
> Key: SPARK-14057
> URL: https://issues.apache.org/jira/browse/SPARK-14057
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Andrew Davidson
>Priority: Minor
>
> We have timestamp data. The timestamp data is UTC; however, when we load the 
> data into Spark data frames, the system assumes the timestamps are in the 
> local time zone. This causes problems for our data scientists. Often they 
> pull data from our data center onto their local Macs. The data centers run 
> UTC, while their computers are typically in PST or EST.
> It is possible to hack around this problem.
> This causes a lot of errors in their analysis.
> A complete description of this issue can be found in the following mail 
> message:
> https://www.mail-archive.com/user@spark.apache.org/msg48121.html
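One common hack for this, shown only as a sketch (it has exactly the drawbacks described above and is not a fix): pin the driver and executor JVMs to UTC so parsed timestamps are not reinterpreted in the analyst's local zone.

{code}
import java.util.TimeZone

// In the driver process:
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
// Or for both driver and executors via spark-submit:
//   --conf spark.driver.extraJavaOptions=-Duser.timezone=UTC
//   --conf spark.executor.extraJavaOptions=-Duser.timezone=UTC
{code}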






[jira] [Created] (SPARK-15834) Time zone / locale sensitivity umbrella

2016-06-08 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-15834:
--

 Summary: Time zone / locale sensitivity umbrella
 Key: SPARK-15834
 URL: https://issues.apache.org/jira/browse/SPARK-15834
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Reporter: Josh Rosen


This is an umbrella ticket for tracking time zone and locale sensitivity bugs 
in Spark SQL.






[jira] [Updated] (SPARK-15613) Incorrect days to millis conversion

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-15613:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15834

> Incorrect days to millis conversion 
> 
>
> Key: SPARK-15613
> URL: https://issues.apache.org/jira/browse/SPARK-15613
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: java version "1.8.0_91"
>Reporter: Dmitry Bushev
>
> There is an issue with {{DateTimeUtils.daysToMillis}} implementation. It  
> affects {{DateTimeUtils.toJavaDate}} and ultimately CatalystTypeConverter, 
> i.e the conversion of date stored as {{Int}} days from epoch in InternalRow 
> to {{java.sql.Date}} of Row returned to user.
>  
> The issue can be reproduced with this test (all the following tests are in my 
> default timezone Europe/Moscow):
> {code}
> scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) 
> yield days
> res23: scala.collection.immutable.IndexedSeq[Int] = Vector(4108, 4473, 4838, 
> 5204, 5568, 5932, 6296, 6660, 7024, 7388, 8053, 8487, 8851, 9215, 9586, 9950, 
> 10314, 10678, 11042, 11406, 11777, 12141, 12505, 12869, 13233, 13597, 13968, 
> 14332, 14696, 15060)
> {code}
> For example, for {{4108}} day of epoch, the correct date should be 
> {{1981-04-01}}
> {code}
> scala> DateTimeUtils.toJavaDate(4107)
> res25: java.sql.Date = 1981-03-31
> scala> DateTimeUtils.toJavaDate(4108)
> res26: java.sql.Date = 1981-03-31
> scala> DateTimeUtils.toJavaDate(4109)
> res27: java.sql.Date = 1981-04-02
> {code}
> There was a previous, unsuccessful attempt to work around the problem in 
> SPARK-11415. It seems the issue involves flaws in the Java date implementation, 
> and I don't see how it can be fixed without third-party libraries.
> I was not able to identify the library of choice for Spark. The following 
> implementation uses [JSR-310|http://www.threeten.org/]
> {code}
> def millisToDays(millisUtc: Long): SQLDate = {
>   val instant = Instant.ofEpochMilli(millisUtc)
>   val zonedDateTime = instant.atZone(ZoneId.systemDefault)
>   zonedDateTime.toLocalDate.toEpochDay.toInt
> }
> def daysToMillis(days: SQLDate): Long = {
>   val localDate = LocalDate.ofEpochDay(days)
>   val zonedDateTime = localDate.atStartOfDay(ZoneId.systemDefault)
>   zonedDateTime.toInstant.toEpochMilli
> }
> {code}
> that produces correct results:
> {code}
> scala> for (days <- 0 to 2 if millisToDays(daysToMillis(days)) != days) 
> yield days
> res37: scala.collection.immutable.IndexedSeq[Int] = Vector()
> scala> new java.sql.Date(daysToMillis(4108))
> res36: java.sql.Date = 1981-04-01
> {code}






[jira] [Updated] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-13268:
---
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15834

> SQL Timestamp stored as GMT but toString returns GMT-08:00
> --
>
> Key: SPARK-13268
> URL: https://issues.apache.org/jira/browse/SPARK-13268
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Ilya Ganelin
>
> There is an issue with how timestamps are displayed/converted to Strings in 
> Spark SQL. The documentation states that the timestamp should be created in 
> the GMT time zone; however, if we do so, we see that the output actually 
> contains a -8 hour offset:
> {code}
> new 
> Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli)
> res144: java.sql.Timestamp = 2014-12-31 16:00:00.0
> new 
> Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli)
> res145: java.sql.Timestamp = 2015-01-01 00:00:00.0
> {code}
> This result is confusing, unintuitive, and introduces issues when converting 
> from DataFrames containing timestamps to RDDs which are then saved as text. 
> This has the effect of essentially shifting all dates in a dataset by 1 day. 
> The suggested fix for this is to update the timestamp toString representation 
> to either a) Include timezone or b) Correctly display in GMT.
> This change may well introduce substantial and insidious bugs so I'm not sure 
> how best to resolve this.
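For comparison, a small sketch of what option (b) amounts to, rendering the same instant explicitly in GMT with standard JDK classes (the formatter pattern is illustrative):

{code}
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.TimeZone

val ts = new Timestamp(1420070400000L)                 // 2015-01-01T00:00:00Z
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S")
fmt.setTimeZone(TimeZone.getTimeZone("GMT"))
println(fmt.format(ts))                                // 2015-01-01 00:00:00.0, regardless of JVM zone
{code}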






[jira] [Closed] (SPARK-14923) Support "Extended" in "Describe" table DDL

2016-06-08 Thread Bo Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Meng closed SPARK-14923.
---
Resolution: Fixed

> Support "Extended" in "Describe" table DDL
> --
>
> Key: SPARK-14923
> URL: https://issues.apache.org/jira/browse/SPARK-14923
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Bo Meng
>
> Currently, the {{Extended}} keyword in {{Describe [Extended] }} DDL 
> is simply ignored. This JIRA is to bring it back with behavior similar to 
> what Hive does.
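For reference, the statement in question looks like the following sketch ({{my_table}} is an illustrative name):

{code}
// EXTENDED is currently accepted but ignored; with this change it would add
// Hive-style extended table information to the output.
sql("DESCRIBE EXTENDED my_table").show(truncate = false)
{code}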






[jira] [Resolved] (SPARK-15735) Allow specifying min time to run in microbenchmarks

2016-06-08 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-15735.
---
  Resolution: Resolved
Assignee: Eric Liang
Target Version/s: 2.0.0

> Allow specifying min time to run in microbenchmarks
> ---
>
> Key: SPARK-15735
> URL: https://issues.apache.org/jira/browse/SPARK-15735
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Eric Liang
>Assignee: Eric Liang
>
> This is helpful e.g. in SPARK-15724 so microbenchmarks don't need to 
> hard-code the number of iterations to run to get a meaningful result, which 
> is brittle as performance changes.






[jira] [Created] (SPARK-15833) REST API for streaming

2016-06-08 Thread Lin Chan (JIRA)
Lin Chan created SPARK-15833:


 Summary: REST API for streaming
 Key: SPARK-15833
 URL: https://issues.apache.org/jira/browse/SPARK-15833
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Reporter: Lin Chan


According to this documentation page: 
http://spark.apache.org/docs/latest/monitoring.html, there are already REST 
APIs for monitoring jobs/stages/executors, etc., but not for streaming. 
Can we please also provide equivalent REST API support for streaming?






[jira] [Commented] (SPARK-8426) Add blacklist mechanism for YARN container allocation

2016-06-08 Thread Imran Rashid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321637#comment-15321637
 ] 

Imran Rashid commented on SPARK-8426:
-

[~kayousterhout] should work now, sorry about that.

> Add blacklist mechanism for YARN container allocation
> -
>
> Key: SPARK-8426
> URL: https://issues.apache.org/jira/browse/SPARK-8426
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Reporter: Saisai Shao
>Priority: Minor
> Attachments: DesignDocforBlacklistMechanism.pdf
>
>







[jira] [Commented] (SPARK-8426) Add blacklist mechanism for YARN container allocation

2016-06-08 Thread Kay Ousterhout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321626#comment-15321626
 ] 

Kay Ousterhout commented on SPARK-8426:
---

Imran, can you enable commenting on the design doc?

> Add blacklist mechanism for YARN container allocation
> -
>
> Key: SPARK-8426
> URL: https://issues.apache.org/jira/browse/SPARK-8426
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, YARN
>Reporter: Saisai Shao
>Priority: Minor
> Attachments: DesignDocforBlacklistMechanism.pdf
>
>







[jira] [Updated] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Ioana Delaney (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ioana Delaney updated SPARK-15832:
--
Component/s: SQL

> Embedded IN/EXISTS predicate subquery throws TreeNodeException
> --
>
> Key: SPARK-15832
> URL: https://issues.apache.org/jira/browse/SPARK-15832
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Ioana Delaney
>Priority: Minor
>
> Queries with embedded existential sub-query predicates throw an exception when 
> building the physical plan.
> Example failing query:
> {code}
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
> scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 
> 2 else 3 end) IN (select c2 from t1)").show()
> Binding attribute, tree: c2#239
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: c2#239
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>   ...
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
> {code}






[jira] [Assigned] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15832:


Assignee: Apache Spark

> Embedded IN/EXISTS predicate subquery throws TreeNodeException
> --
>
> Key: SPARK-15832
> URL: https://issues.apache.org/jira/browse/SPARK-15832
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ioana Delaney
>Assignee: Apache Spark
>Priority: Minor
>
> Queries with embedded existential sub-query predicates throw an exception when 
> building the physical plan.
> Example failing query:
> {code}
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
> scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 
> 2 else 3 end) IN (select c2 from t1)").show()
> Binding attribute, tree: c2#239
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: c2#239
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>   ...
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
> {code}






[jira] [Commented] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321617#comment-15321617
 ] 

Apache Spark commented on SPARK-15832:
--

User 'ioana-delaney' has created a pull request for this issue:
https://github.com/apache/spark/pull/13570

> Embedded IN/EXISTS predicate subquery throws TreeNodeException
> --
>
> Key: SPARK-15832
> URL: https://issues.apache.org/jira/browse/SPARK-15832
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ioana Delaney
>Priority: Minor
>
> Queries with embedded existential sub-query predicates throw an exception when 
> building the physical plan.
> Example failing query:
> {code}
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
> scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 
> 2 else 3 end) IN (select c2 from t1)").show()
> Binding attribute, tree: c2#239
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: c2#239
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>   ...
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
> {code}






[jira] [Assigned] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15832:


Assignee: (was: Apache Spark)

> Embedded IN/EXISTS predicate subquery throws TreeNodeException
> --
>
> Key: SPARK-15832
> URL: https://issues.apache.org/jira/browse/SPARK-15832
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ioana Delaney
>Priority: Minor
>
> Queries with embedded existential sub-query predicates throw an exception when 
> building the physical plan.
> Example failing query:
> {code}
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
> scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
> scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 
> 2 else 3 end) IN (select c2 from t1)").show()
> Binding attribute, tree: c2#239
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: c2#239
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>   ...
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
>   at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
> {code}






[jira] [Created] (SPARK-15832) Embedded IN/EXISTS predicate subquery throws TreeNodeException

2016-06-08 Thread Ioana Delaney (JIRA)
Ioana Delaney created SPARK-15832:
-

 Summary: Embedded IN/EXISTS predicate subquery throws 
TreeNodeException
 Key: SPARK-15832
 URL: https://issues.apache.org/jira/browse/SPARK-15832
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Ioana Delaney
Priority: Minor


Queries with embedded existential sub-query predicates throw an exception when 
building the physical plan.

Example failing query:

{code}
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t1")
scala> Seq((1, 1), (2, 2)).toDF("c1", "c2").createOrReplaceTempView("t2")
scala> sql("select c1 from t1 where (case when c2 in (select c2 from t2) then 2 
else 3 end) IN (select c2 from t1)").show()

Binding attribute, tree: c2#239
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: c2#239
  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
  at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)

  ...
  at 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
  at 
org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
  at 
org.apache.spark.sql.execution.joins.HashJoin$$anonfun$4.apply(HashJoin.scala:66)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at 
org.apache.spark.sql.execution.joins.HashJoin$class.org$apache$spark$sql$execution$joins$HashJoin$$x$8(HashJoin.scala:66)
  at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8$lzycompute(BroadcastHashJoinExec.scala:38)
  at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.org$apache$spark$sql$execution$joins$HashJoin$$x$8(BroadcastHashJoinExec.scala:38)
  at 
org.apache.spark.sql.execution.joins.HashJoin$class.buildKeys(HashJoin.scala:63)
  at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys$lzycompute(BroadcastHashJoinExec.scala:38)
  at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.buildKeys(BroadcastHashJoinExec.scala:38)
  at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.requiredChildDistribution(BroadcastHashJoinExec.scala:52)
{code}






[jira] [Assigned] (SPARK-15791) NPE in ScalarSubquery

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15791:


Assignee: Eric Liang  (was: Apache Spark)

> NPE in ScalarSubquery
> -
>
> Key: SPARK-15791
> URL: https://issues.apache.org/jira/browse/SPARK-15791
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Eric Liang
>
> {code}
> Job aborted due to stage failure: Task 0 in stage 146.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 146.0 (TID 48828, 10.0.206.208): 
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>   at 
> org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>   at 
> org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:291)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:85)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:84)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (SPARK-15791) NPE in ScalarSubquery

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321574#comment-15321574
 ] 

Apache Spark commented on SPARK-15791:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/13569

> NPE in ScalarSubquery
> -
>
> Key: SPARK-15791
> URL: https://issues.apache.org/jira/browse/SPARK-15791
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Eric Liang
>
> {code}
> Job aborted due to stage failure: Task 0 in stage 146.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 146.0 (TID 48828, 10.0.206.208): 
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>   at 
> org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>   at 
> org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:291)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:85)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:84)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Assigned] (SPARK-15791) NPE in ScalarSubquery

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15791:


Assignee: Apache Spark  (was: Eric Liang)

> NPE in ScalarSubquery
> -
>
> Key: SPARK-15791
> URL: https://issues.apache.org/jira/browse/SPARK-15791
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> {code}
> Job aborted due to stage failure: Task 0 in stage 146.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 146.0 (TID 48828, 10.0.206.208): 
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>   at 
> org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>   at 
> org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:291)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:85)
>   at 
> org.apache.spark.sql.execution.DeserializeToObjectExec$$anonfun$2.apply(objects.scala:84)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (SPARK-11695) Set s3a credentials by default similarly to s3 and s3n

2016-06-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321539#comment-15321539
 ] 

Steve Loughran commented on SPARK-11695:


There are some interesting ramifications to this code: it means that if the env 
vars are set, they overwrite any value in core-default.xml. It's also going 
to slightly complicate the workings of HADOOP-12807; now that the AWS env vars 
are being picked up, there's a whole set of config options which ought to be 
handled together. The session token is the big one. If that var is set, then 
fixing up the fs.s3a things will stop operations working.

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#cli-environment




> Set s3a credentials by default similarly to s3 and s3n
> --
>
> Key: SPARK-11695
> URL: https://issues.apache.org/jira/browse/SPARK-11695
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Chris Bannister
>Assignee: Chris Bannister
>Priority: Trivial
> Fix For: 1.6.0
>
>
> When creating a new Hadoop configuration, Spark sets s3 and s3n credentials if 
> the environment variables are set; it should also do so for s3a.
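A sketch of the kind of change described, assuming the standard fs.s3a property names and the usual AWS environment variables (it simply mirrors how the s3/s3n credentials are propagated):

{code}
val hadoopConf = sc.hadoopConfiguration
// Only set the s3a keys when both environment variables are present.
for (key <- sys.env.get("AWS_ACCESS_KEY_ID");
     secret <- sys.env.get("AWS_SECRET_ACCESS_KEY")) {
  hadoopConf.set("fs.s3a.access.key", key)
  hadoopConf.set("fs.s3a.secret.key", secret)
}
{code}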






[jira] [Updated] (SPARK-15743) Prevent saving with all-column partitioning

2016-06-08 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-15743:
-
Labels: releasenotes  (was: )

> Prevent saving with all-column partitioning
> ---
>
> Key: SPARK-15743
> URL: https://issues.apache.org/jira/browse/SPARK-15743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Dongjoon Hyun
>  Labels: releasenotes
>
> When saving datasets on storage, `partitionBy` provides an easy way to 
> construct the directory structure. However, if a user chooses all columns as 
> partition columns, exceptions occur.
> - ORC: `AnalysisException` on **future read** due to schema inference failure.
> - Parquet: `InvalidSchemaException` on **write execution** due to a Parquet 
> limitation.
> The followings are the examples.
> **ORC with all column partitioning**
> {code}
> scala> 
> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")
>   
>   
> scala> spark.read.format("orc").load("/tmp/data").collect()
> org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at 
> /tmp/data. It must be specified manually;
> {code}
> **Parquet with all-column partitioning**
> {code}
> scala> 
> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
> [Stage 0:>  (0 + 8) / 
> 8]16/06/02 16:51:17 ERROR Utils: Aborting task
> org.apache.parquet.schema.InvalidSchemaException: A group type can not be 
> empty. Parquet does not support empty group without leaves. Empty group: 
> spark_schema
> ... (lots of error messages)
> {code}
> Although some formats like JSON support all-column partitioning without any 
> problem, it does not seem like a good idea to create lots of empty directories. 
> This issue prevents that by consistently raising `AnalysisException` before 
> saving. 
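
A minimal sketch of the guard this issue proposes, assuming the check runs before 
the write is planned; in Spark itself it would raise `AnalysisException` from the 
datasource write path, and the plain require() below only stands in for that:

{code}
// Hedged sketch: refuse to save when every output column is a partition column.
def checkPartitionColumns(allColumns: Seq[String], partitionColumns: Seq[String]): Unit = {
  val dataColumns = allColumns.filterNot(partitionColumns.toSet)
  require(dataColumns.nonEmpty,
    s"Cannot use all columns for partition columns: ${partitionColumns.mkString(", ")}")
}

// Example: spark.range(10) has only the column "id", so partitioning by "id"
// leaves no data columns and should be rejected before writing.
checkPartitionColumns(Seq("id"), Seq("id"))   // throws IllegalArgumentException
{code}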



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15830) Spark application should get hive tokens only when it is required

2016-06-08 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated SPARK-15830:
---
Summary: Spark application should get hive tokens only when it is required  
(was: Spark application should get hive tokens only when needed)

> Spark application should get hive tokens only when it is required
> -
>
> Key: SPARK-15830
> URL: https://issues.apache.org/jira/browse/SPARK-15830
> Project: Spark
>  Issue Type: Improvement
>Reporter: Yesha Vora
>
> Currently, all Spark applications try to get Hive tokens (even if the 
> application does not use them) when Hive is installed on the cluster.
> Due to this, a Spark application that does not require Hive fails when the 
> Hive service (metastore) is down for some reason.
> Thus, Spark should only try to get Hive tokens when they are required; it 
> should not fetch Hive tokens the application does not need.
> Example: the SparkPi application does not perform any Hive-related actions, 
> but it still fails if the Hive metastore service is down.
> {code}
> 16/06/08 01:18:42 INFO YarnSparkHadoopUtil: getting token for namenode: 
> hdfs://xxx:8020/user/xx/.sparkStaging/application_1465347287950_0001
> 16/06/08 01:18:42 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 7 for 
> xx on xx.xx.xx.xxx:8020
> 16/06/08 01:18:43 INFO metastore: Trying to connect to metastore with URI 
> thrift://xx.xx.xx.xxx:9090
> 16/06/08 01:18:43 WARN metastore: Failed to connect to the MetaStore Server...
> 16/06/08 01:18:43 INFO metastore: Waiting 5 seconds before next connection 
> attempt.
> 16/06/08 01:18:48 INFO metastore: Trying to connect to metastore with URI 
> thrift://xx.xx.xx.xxx:9090
> 16/06/08 01:18:48 WARN metastore: Failed to connect to the MetaStore Server...
> 16/06/08 01:18:48 INFO metastore: Waiting 5 seconds before next connection 
> attempt.
> 16/06/08 01:18:53 INFO metastore: Trying to connect to metastore with URI 
> thrift://xx.xx.xx.xxx:9090
> 16/06/08 01:18:53 WARN metastore: Failed to connect to the MetaStore Server...
> 16/06/08 01:18:53 INFO metastore: Waiting 5 seconds before next connection 
> attempt.
> 16/06/08 01:18:59 WARN Hive: Failed to access metastore. This class should 
> not accessed in runtime.
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
> at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
> at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15831) Kryo 2.21 TreeMap serialization bug causes random job failures with RDDs of HBase puts

2016-06-08 Thread JIRA
Charles Gariépy-Ikeson created SPARK-15831:
--

 Summary: Kryo 2.21 TreeMap serialization bug causes random job 
failures with RDDs of HBase puts
 Key: SPARK-15831
 URL: https://issues.apache.org/jira/browse/SPARK-15831
 Project: Spark
  Issue Type: Bug
Reporter: Charles Gariépy-Ikeson


This was found on Spark 1.5, but it seems that all Spark 1.x releases bring in 
the problematic dependency in question.

Kryo 2.21 has a bug when serializing TreeMap that causes intermittent failures 
in Spark. This problem can be seen especially when sinking data to HBase 
using an RDD of HBase Puts (which internally contain a TreeMap).

Kryo fixed the issue in 2.21.1. The current workaround involves setting 
"spark.kryo.referenceTracking" to false (sketched below).

For reference see:
Kryo commit: 
https://github.com/EsotericSoftware/kryo/commit/00ffc7ed443e022a8438d1e4c4f5b86fe4f9912b
TreeMap Kryo Issue: https://github.com/EsotericSoftware/kryo/issues/112
HBase Put Kryo Issue: https://github.com/EsotericSoftware/kryo/issues/428
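
A minimal sketch of the workaround mentioned above, assuming the setting is applied 
when the SparkConf is built (any other place that sets Spark configuration works the 
same way):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: disable Kryo reference tracking to sidestep the TreeMap bug
// in Kryo 2.21 when writing RDDs of HBase Puts.
val conf = new SparkConf()
  .setAppName("hbase-put-sink")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.referenceTracking", "false")   // the workaround from this issue
val sc = new SparkContext(conf)
{code}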



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15830) Spark application should get hive tokens only when needed

2016-06-08 Thread Yesha Vora (JIRA)
Yesha Vora created SPARK-15830:
--

 Summary: Spark application should get hive tokens only when needed
 Key: SPARK-15830
 URL: https://issues.apache.org/jira/browse/SPARK-15830
 Project: Spark
  Issue Type: Improvement
Reporter: Yesha Vora


Currently, all Spark applications try to get Hive tokens (even if the 
application does not use them) when Hive is installed on the cluster.

Due to this, a Spark application that does not require Hive fails when the 
Hive service (metastore) is down for some reason.

Thus, Spark should only try to get Hive tokens when they are required; it 
should not fetch Hive tokens the application does not need. A sketch of such a 
guard follows the log excerpt below.

Example: the SparkPi application does not perform any Hive-related actions, 
but it still fails if the Hive metastore service is down.
{code}
16/06/08 01:18:42 INFO YarnSparkHadoopUtil: getting token for namenode: 
hdfs://xxx:8020/user/xx/.sparkStaging/application_1465347287950_0001
16/06/08 01:18:42 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 7 for xx 
on xx.xx.xx.xxx:8020
16/06/08 01:18:43 INFO metastore: Trying to connect to metastore with URI 
thrift://xx.xx.xx.xxx:9090
16/06/08 01:18:43 WARN metastore: Failed to connect to the MetaStore Server...
16/06/08 01:18:43 INFO metastore: Waiting 5 seconds before next connection 
attempt.
16/06/08 01:18:48 INFO metastore: Trying to connect to metastore with URI 
thrift://xx.xx.xx.xxx:9090
16/06/08 01:18:48 WARN metastore: Failed to connect to the MetaStore Server...
16/06/08 01:18:48 INFO metastore: Waiting 5 seconds before next connection 
attempt.
16/06/08 01:18:53 INFO metastore: Trying to connect to metastore with URI 
thrift://xx.xx.xx.xxx:9090
16/06/08 01:18:53 WARN metastore: Failed to connect to the MetaStore Server...
16/06/08 01:18:53 INFO metastore: Waiting 5 seconds before next connection 
attempt.
16/06/08 01:18:59 WARN Hive: Failed to access metastore. This class should not 
accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498){code}
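
A minimal sketch of the guard this issue asks for; the config key and helper name 
below are assumptions for illustration, not Spark's actual API:

{code}
import org.apache.spark.SparkConf

// Hedged sketch: only fetch Hive delegation tokens when the application needs them.
// `obtainHiveToken` is a placeholder for whatever talks to the metastore.
def obtainTokensIfNeeded(sparkConf: SparkConf): Unit = {
  // Assumed flag name; the real decision could also come from inspecting the job.
  val hiveTokenRequired = sparkConf.getBoolean("spark.yarn.security.tokens.hive.enabled", false)
  if (hiveTokenRequired) {
    obtainHiveToken()   // the metastore is contacted only in this branch
  }
  // Applications like SparkPi never enter the branch above, so a metastore
  // outage cannot fail them at submission time.
}

def obtainHiveToken(): Unit = { /* placeholder */ }
{code}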



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15829) spark master webpage links to application UI broke when running in cluster mode

2016-06-08 Thread Andrew Davidson (JIRA)
Andrew Davidson created SPARK-15829:
---

 Summary: spark master webpage links to application UI broke when 
running in cluster mode
 Key: SPARK-15829
 URL: https://issues.apache.org/jira/browse/SPARK-15829
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.6.1
 Environment: AWS ec2 cluster
Reporter: Andrew Davidson
Priority: Critical


Hi,
I created a cluster using spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2.

I use the standalone cluster manager and have a streaming app running in 
cluster mode. I noticed that the master web page's links to the application UI 
page are incorrect.

It does not look like JIRA will let me upload images, so I'll describe the 
web pages and the bug.

My master is running on
http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:8080/

It has a section marked "Applications". If I click on one of the running 
application IDs I am taken to a page showing "Executor Summary". This page has 
a link to the 'application detail UI'; the URL is 
http://ec2-54-215-230-73.us-west-1.compute.amazonaws.com:4041/

Notice it thinks the application UI is running on the cluster master.

It is actually running on the same machine as the driver, on port 4041. I was 
able to reverse engineer the URL by noticing that the private IP address is 
part of the worker ID, for example worker-20160322041632-172.31.23.201-34909.

Next I went to the AWS EC2 console to find the public DNS name for this 
machine:
http://ec2-54-193-104-169.us-west-1.compute.amazonaws.com:4041/streaming/

Kind regards

Andy




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15828) YARN is not aware of Spark's External Shuffle Service

2016-06-08 Thread Miles Crawford (JIRA)
Miles Crawford created SPARK-15828:
--

 Summary: YARN is not aware of Spark's External Shuffle Service
 Key: SPARK-15828
 URL: https://issues.apache.org/jira/browse/SPARK-15828
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1
 Environment: EMR
Reporter: Miles Crawford


When using Spark with dynamic allocation, it is common for all containers on a
particular YARN node to be released.  This is generally okay because of the
external shuffle service.
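
For context, a minimal sketch of the configuration this scenario assumes (dynamic 
allocation with the external shuffle service enabled); the values are illustrative:

{code}
import org.apache.spark.SparkConf

// Hedged sketch: executors come and go via dynamic allocation, while shuffle
// files are served by the node-local external shuffle service (default port 7337).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "0")
{code}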

The problem arises when YARN is attempting to downsize the cluster - once all
containers on the node are gone, YARN will decommission the node, regardless of
whether the external shuffle service is still required!

Once the node is shut down, jobs begin failing with messages such as:
```
2016-06-07 18:56:40,016 ERROR o.a.s.n.shuffle.RetryingBlockFetcher: Exception 
while beginning fetch of 13 outstanding blocks
java.io.IOException: Failed to connect to 
ip-10-12-32-67.us-west-2.compute.internal/10.12.32.67:7337
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
 
~[d58092b50d2880a1c259cb51c6ed83955f97e34a4b75cedaa8ab00f89a09df50-spark-network-common_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
 
~[d58092b50d2880a1c259cb51c6ed83955f97e34a4b75cedaa8ab00f89a09df50-spark-network-common_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105)
 
~[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:152)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:316)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:263)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.(ShuffleBlockFetcherIterator.scala:112)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:43)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 

[jira] [Updated] (SPARK-15828) YARN is not aware of Spark's External Shuffle Service

2016-06-08 Thread Miles Crawford (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miles Crawford updated SPARK-15828:
---
Description: 
When using Spark with dynamic allocation, it is common for all containers on a
particular YARN node to be released.  This is generally okay because of the
external shuffle service.

The problem arises when YARN is attempting to downsize the cluster - once all
containers on the node are gone, YARN will decommission the node, regardless of
whether the external shuffle service is still required!

Once the node is shut down, jobs begin failing with messages such as:
{code}
2016-06-07 18:56:40,016 ERROR o.a.s.n.shuffle.RetryingBlockFetcher: Exception 
while beginning fetch of 13 outstanding blocks
java.io.IOException: Failed to connect to 
ip-10-12-32-67.us-west-2.compute.internal/10.12.32.67:7337
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
 
~[d58092b50d2880a1c259cb51c6ed83955f97e34a4b75cedaa8ab00f89a09df50-spark-network-common_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
 
~[d58092b50d2880a1c259cb51c6ed83955f97e34a4b75cedaa8ab00f89a09df50-spark-network-common_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105)
 
~[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:114)
 
[2d5c6a1b64d0070faea2e852616885c0110121f4f5c3206cbde88946abce11c3-spark-network-shuffle_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:152)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:316)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:263)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.(ShuffleBlockFetcherIterator.scala:112)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:43)
 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 
[d56f3336b4a0fcc71fe8beb90052dbafd0e88a749bdb4bbb15d37894cf443364-spark-core_2.11-1.6.1.jar:1.6.1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 

[jira] [Commented] (SPARK-12712) test-dependencies.sh script fails when run against empty .m2 cache

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321415#comment-15321415
 ] 

Apache Spark commented on SPARK-12712:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/13568

> test-dependencies.sh script fails when run against empty .m2 cache
> --
>
> Key: SPARK-12712
> URL: https://issues.apache.org/jira/browse/SPARK-12712
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Reporter: Stavros Kontopoulos
>Assignee: Josh Rosen
>
> The test-dependencies.sh script fails.
> This relates to https://github.com/apache/spark/pull/10461
> See the failure here:
> https://ci.typesafe.com/job/ghprb-spark-multi-conf/label=Spark-Ora-JDK7-PV,scala_version=2.10/84/console
> My PR does not change dependencies; shouldn't the PR manifest be generated 
> with the full dependency list? It seems empty. Should I use --replace-manifest?
> Reproducing it locally on that Jenkins instance I get this: 
> Spark's published dependencies DO NOT MATCH the manifest file 
> (dev/spark-deps).
> To update the manifest file, run './dev/test-dependencies.sh 
> --replace-manifest'.
> diff --git a/dev/deps/spark-deps-hadoop-2.6 
> b/dev/pr-deps/spark-deps-hadoop-2.6
> index e703c7a..3aa2c38 100644
> --- a/dev/deps/spark-deps-hadoop-2.6
> +++ b/dev/pr-deps/spark-deps-hadoop-2.6
> @@ -1,190 +1,2 @@
> -JavaEWAH-0.3.2.jar
> -RoaringBitmap-0.5.11.jar
> -ST4-4.0.4.jar
> -activation-1.1.1.jar
> -akka-actor_2.10-2.3.11.jar
> -akka-remote_2.10-2.3.11.jar
> -akka-slf4j_2.10-2.3.11.jar
> -antlr-runtime-3.5.2.jar
> -aopalliance-1.0.jar
> -apache-log4j-extras-1.2.17.jar
> -apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api-1.0.0-M20.jar
> -api-util-1.0.0-M20.jar
> -arpack_combined_all-0.1.jar
> -asm-3.1.jar
> -asm-commons-3.1.jar
> -asm-tree-3.1.jar
> -avro-1.7.7.jar
> -avro-ipc-1.7.7-tests.jar
> -avro-ipc-1.7.7.jar
> -avro-mapred-1.7.7-hadoop2.jar
> -base64-2.3.8.jar
> -bcprov-jdk15on-1.51.jar
> -bonecp-0.8.0.RELEASE.jar
> -breeze-macros_2.10-0.11.2.jar
> -breeze_2.10-0.11.2.jar
> -calcite-avatica-1.2.0-incubating.jar
> -calcite-core-1.2.0-incubating.jar
> -calcite-linq4j-1.2.0-incubating.jar
> -chill-java-0.5.0.jar
> -chill_2.10-0.5.0.jar
> -commons-beanutils-1.7.0.jar
> -commons-beanutils-core-1.8.0.jar
> -commons-cli-1.2.jar
> -commons-codec-1.10.jar
> -commons-collections-3.2.2.jar
> -commons-compiler-2.7.6.jar
> -commons-compress-1.4.1.jar
> -commons-configuration-1.6.jar
> -commons-dbcp-1.4.jar
> -commons-digester-1.8.jar
> -commons-httpclient-3.1.jar
> -commons-io-2.4.jar
> -commons-lang-2.6.jar
> -commons-lang3-3.3.2.jar
> -commons-logging-1.1.3.jar
> -commons-math3-3.4.1.jar
> -commons-net-2.2.jar
> -commons-pool-1.5.4.jar
> -compress-lzf-1.0.3.jar
> -config-1.2.1.jar
> -core-1.1.2.jar
> -curator-client-2.6.0.jar
> -curator-framework-2.6.0.jar
> -curator-recipes-2.6.0.jar
> -datanucleus-api-jdo-3.2.6.jar
> -datanucleus-core-3.2.10.jar
> -datanucleus-rdbms-3.2.9.jar
> -derby-10.10.1.1.jar
> -eigenbase-properties-1.1.5.jar
> -geronimo-annotation_1.0_spec-1.1.1.jar
> -geronimo-jaspic_1.0_spec-1.0.jar
> -geronimo-jta_1.1_spec-1.1.1.jar
> -groovy-all-2.1.6.jar
> -gson-2.2.4.jar
> -guice-3.0.jar
> -guice-servlet-3.0.jar
> -hadoop-annotations-2.6.0.jar
> -hadoop-auth-2.6.0.jar
> -hadoop-client-2.6.0.jar
> -hadoop-common-2.6.0.jar
> -hadoop-hdfs-2.6.0.jar
> -hadoop-mapreduce-client-app-2.6.0.jar
> -hadoop-mapreduce-client-common-2.6.0.jar
> -hadoop-mapreduce-client-core-2.6.0.jar
> -hadoop-mapreduce-client-jobclient-2.6.0.jar
> -hadoop-mapreduce-client-shuffle-2.6.0.jar
> -hadoop-yarn-api-2.6.0.jar
> -hadoop-yarn-client-2.6.0.jar
> -hadoop-yarn-common-2.6.0.jar
> -hadoop-yarn-server-common-2.6.0.jar
> -hadoop-yarn-server-web-proxy-2.6.0.jar
> -htrace-core-3.0.4.jar
> -httpclient-4.3.2.jar
> -httpcore-4.3.2.jar
> -ivy-2.4.0.jar
> -jackson-annotations-2.4.4.jar
> -jackson-core-2.4.4.jar
> -jackson-core-asl-1.9.13.jar
> -jackson-databind-2.4.4.jar
> -jackson-jaxrs-1.9.13.jar
> -jackson-mapper-asl-1.9.13.jar
> -jackson-module-scala_2.10-2.4.4.jar
> -jackson-xc-1.9.13.jar
> -janino-2.7.8.jar
> -jansi-1.4.jar
> -java-xmlbuilder-1.0.jar
> -javax.inject-1.jar
> -javax.servlet-3.0.0.v201112011016.jar
> -javolution-5.5.1.jar
> -jaxb-api-2.2.2.jar
> -jaxb-impl-2.2.3-1.jar
> -jcl-over-slf4j-1.7.10.jar
> -jdo-api-3.0.1.jar
> -jersey-client-1.9.jar
> -jersey-core-1.9.jar
> -jersey-guice-1.9.jar
> -jersey-json-1.9.jar
> -jersey-server-1.9.jar
> -jets3t-0.9.3.jar
> -jettison-1.1.jar
> -jetty-6.1.26.jar
> -jetty-all-7.6.0.v20120127.jar
> -jetty-util-6.1.26.jar
> -jline-2.10.5.jar
> -jline-2.12.jar
> -joda-time-2.9.jar
> -jodd-core-3.5.2.jar
> -jpam-1.1.jar
> -json-20090211.jar
> -json4s-ast_2.10-3.2.10.jar
> 

[jira] [Updated] (SPARK-15807) Support varargs for dropDuplicates in Dataset/DataFrame

2016-06-08 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-15807:
--
Description: 
This issue adds `varargs`-types `dropDuplicates` functions in 
`Dataset/DataFrame`. Currently, `dropDuplicates` supports only `Seq` or `Array`.

{code}
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]

scala> ds.dropDuplicates("_1", "_2")
:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: 
Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 

  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
   ds.dropDuplicates("_1", "_2")
  ^
{code}

  was:
This issue adds `varargs`-types `distinct/dropDuplicates` functions in 
`Dataset/DataFrame`. Currently, `distinct` does not get arguments, and 
`dropDuplicates` supports only `Seq` or `Array`.

{code}
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: 
int]

scala> ds.dropDuplicates("_1", "_2")
:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: 
Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 

  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
   ds.dropDuplicates("_1", "_2")
  ^

scala> ds.distinct("_1", "_2")
:26: error: too many arguments for method distinct: 
()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
   ds.distinct("_1", "_2")
{code}

Summary: Support varargs for dropDuplicates in Dataset/DataFrame  (was: 
Support varargs for distinct/dropDuplicates in Dataset/DataFrame)

> Support varargs for dropDuplicates in Dataset/DataFrame
> ---
>
> Key: SPARK-15807
> URL: https://issues.apache.org/jira/browse/SPARK-15807
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Dongjoon Hyun
>
> This issue adds `varargs`-types `dropDuplicates` functions in 
> `Dataset/DataFrame`. Currently, `dropDuplicates` supports only `Seq` or 
> `Array`.
> {code}
> scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
> ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
> scala> ds.dropDuplicates(Seq("_1", "_2"))
> res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, 
> _2: int]
> scala> ds.dropDuplicates("_1", "_2")
> :26: error: overloaded method value dropDuplicates with alternatives:
>   (colNames: 
> Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
>   (colNames: 
> Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] 
>   ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
>  cannot be applied to (String, String)
>ds.dropDuplicates("_1", "_2")
>   ^
> {code}
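
A minimal sketch of the kind of overload this issue proposes; the real change would 
live on `Dataset`, and the standalone helper below only approximates it from user code:

{code}
import org.apache.spark.sql.{Dataset, Row}

// Hedged sketch: a varargs entry point that forwards to the existing Seq-based
// dropDuplicates. `dropDuplicatesVarargs` is an illustrative name.
def dropDuplicatesVarargs(ds: Dataset[Row], col1: String, cols: String*): Dataset[Row] =
  ds.dropDuplicates(col1 +: cols)

// usage: dropDuplicatesVarargs(ds, "_1", "_2")
{code}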



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12712) test-dependencies.sh script fails when run against empty .m2 cache

2016-06-08 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321402#comment-15321402
 ] 

Josh Rosen commented on SPARK-12712:


Minimal local reproduction:

{code}
rm -rf ~/.m2/repository/org/apache/commons/
./dev/test-dependencies.sh
{code}

> test-dependencies.sh script fails when run against empty .m2 cache
> --
>
> Key: SPARK-12712
> URL: https://issues.apache.org/jira/browse/SPARK-12712
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Reporter: Stavros Kontopoulos
>Assignee: Josh Rosen
>
> The test-dependencies.sh script fails.
> This relates to https://github.com/apache/spark/pull/10461
> See the failure here:
> https://ci.typesafe.com/job/ghprb-spark-multi-conf/label=Spark-Ora-JDK7-PV,scala_version=2.10/84/console
> My PR does not change dependencies; shouldn't the PR manifest be generated 
> with the full dependency list? It seems empty. Should I use --replace-manifest?
> Reproducing it locally on that Jenkins instance I get this: 
> Spark's published dependencies DO NOT MATCH the manifest file 
> (dev/spark-deps).
> To update the manifest file, run './dev/test-dependencies.sh 
> --replace-manifest'.
> diff --git a/dev/deps/spark-deps-hadoop-2.6 
> b/dev/pr-deps/spark-deps-hadoop-2.6
> index e703c7a..3aa2c38 100644
> --- a/dev/deps/spark-deps-hadoop-2.6
> +++ b/dev/pr-deps/spark-deps-hadoop-2.6
> @@ -1,190 +1,2 @@
> -JavaEWAH-0.3.2.jar
> -RoaringBitmap-0.5.11.jar
> -ST4-4.0.4.jar
> -activation-1.1.1.jar
> -akka-actor_2.10-2.3.11.jar
> -akka-remote_2.10-2.3.11.jar
> -akka-slf4j_2.10-2.3.11.jar
> -antlr-runtime-3.5.2.jar
> -aopalliance-1.0.jar
> -apache-log4j-extras-1.2.17.jar
> -apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api-1.0.0-M20.jar
> -api-util-1.0.0-M20.jar
> -arpack_combined_all-0.1.jar
> -asm-3.1.jar
> -asm-commons-3.1.jar
> -asm-tree-3.1.jar
> -avro-1.7.7.jar
> -avro-ipc-1.7.7-tests.jar
> -avro-ipc-1.7.7.jar
> -avro-mapred-1.7.7-hadoop2.jar
> -base64-2.3.8.jar
> -bcprov-jdk15on-1.51.jar
> -bonecp-0.8.0.RELEASE.jar
> -breeze-macros_2.10-0.11.2.jar
> -breeze_2.10-0.11.2.jar
> -calcite-avatica-1.2.0-incubating.jar
> -calcite-core-1.2.0-incubating.jar
> -calcite-linq4j-1.2.0-incubating.jar
> -chill-java-0.5.0.jar
> -chill_2.10-0.5.0.jar
> -commons-beanutils-1.7.0.jar
> -commons-beanutils-core-1.8.0.jar
> -commons-cli-1.2.jar
> -commons-codec-1.10.jar
> -commons-collections-3.2.2.jar
> -commons-compiler-2.7.6.jar
> -commons-compress-1.4.1.jar
> -commons-configuration-1.6.jar
> -commons-dbcp-1.4.jar
> -commons-digester-1.8.jar
> -commons-httpclient-3.1.jar
> -commons-io-2.4.jar
> -commons-lang-2.6.jar
> -commons-lang3-3.3.2.jar
> -commons-logging-1.1.3.jar
> -commons-math3-3.4.1.jar
> -commons-net-2.2.jar
> -commons-pool-1.5.4.jar
> -compress-lzf-1.0.3.jar
> -config-1.2.1.jar
> -core-1.1.2.jar
> -curator-client-2.6.0.jar
> -curator-framework-2.6.0.jar
> -curator-recipes-2.6.0.jar
> -datanucleus-api-jdo-3.2.6.jar
> -datanucleus-core-3.2.10.jar
> -datanucleus-rdbms-3.2.9.jar
> -derby-10.10.1.1.jar
> -eigenbase-properties-1.1.5.jar
> -geronimo-annotation_1.0_spec-1.1.1.jar
> -geronimo-jaspic_1.0_spec-1.0.jar
> -geronimo-jta_1.1_spec-1.1.1.jar
> -groovy-all-2.1.6.jar
> -gson-2.2.4.jar
> -guice-3.0.jar
> -guice-servlet-3.0.jar
> -hadoop-annotations-2.6.0.jar
> -hadoop-auth-2.6.0.jar
> -hadoop-client-2.6.0.jar
> -hadoop-common-2.6.0.jar
> -hadoop-hdfs-2.6.0.jar
> -hadoop-mapreduce-client-app-2.6.0.jar
> -hadoop-mapreduce-client-common-2.6.0.jar
> -hadoop-mapreduce-client-core-2.6.0.jar
> -hadoop-mapreduce-client-jobclient-2.6.0.jar
> -hadoop-mapreduce-client-shuffle-2.6.0.jar
> -hadoop-yarn-api-2.6.0.jar
> -hadoop-yarn-client-2.6.0.jar
> -hadoop-yarn-common-2.6.0.jar
> -hadoop-yarn-server-common-2.6.0.jar
> -hadoop-yarn-server-web-proxy-2.6.0.jar
> -htrace-core-3.0.4.jar
> -httpclient-4.3.2.jar
> -httpcore-4.3.2.jar
> -ivy-2.4.0.jar
> -jackson-annotations-2.4.4.jar
> -jackson-core-2.4.4.jar
> -jackson-core-asl-1.9.13.jar
> -jackson-databind-2.4.4.jar
> -jackson-jaxrs-1.9.13.jar
> -jackson-mapper-asl-1.9.13.jar
> -jackson-module-scala_2.10-2.4.4.jar
> -jackson-xc-1.9.13.jar
> -janino-2.7.8.jar
> -jansi-1.4.jar
> -java-xmlbuilder-1.0.jar
> -javax.inject-1.jar
> -javax.servlet-3.0.0.v201112011016.jar
> -javolution-5.5.1.jar
> -jaxb-api-2.2.2.jar
> -jaxb-impl-2.2.3-1.jar
> -jcl-over-slf4j-1.7.10.jar
> -jdo-api-3.0.1.jar
> -jersey-client-1.9.jar
> -jersey-core-1.9.jar
> -jersey-guice-1.9.jar
> -jersey-json-1.9.jar
> -jersey-server-1.9.jar
> -jets3t-0.9.3.jar
> -jettison-1.1.jar
> -jetty-6.1.26.jar
> -jetty-all-7.6.0.v20120127.jar
> -jetty-util-6.1.26.jar
> -jline-2.10.5.jar
> -jline-2.12.jar
> -joda-time-2.9.jar
> -jodd-core-3.5.2.jar
> -jpam-1.1.jar
> -json-20090211.jar
> -json4s-ast_2.10-3.2.10.jar
> 

[jira] [Updated] (SPARK-12712) test-dependencies.sh script fails when run against empty .m2 cache

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12712:
---
Summary: test-dependencies.sh script fails when run against empty .m2 cache 
 (was: test-dependencies.sh fails with difference in manifests)

> test-dependencies.sh script fails when run against empty .m2 cache
> --
>
> Key: SPARK-12712
> URL: https://issues.apache.org/jira/browse/SPARK-12712
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Reporter: Stavros Kontopoulos
>Assignee: Josh Rosen
>
> The test-dependencies.sh script fails.
> This relates to https://github.com/apache/spark/pull/10461
> See the failure here:
> https://ci.typesafe.com/job/ghprb-spark-multi-conf/label=Spark-Ora-JDK7-PV,scala_version=2.10/84/console
> My PR does not change dependencies; shouldn't the PR manifest be generated 
> with the full dependency list? It seems empty. Should I use --replace-manifest?
> Reproducing it locally on that Jenkins instance I get this: 
> Spark's published dependencies DO NOT MATCH the manifest file 
> (dev/spark-deps).
> To update the manifest file, run './dev/test-dependencies.sh 
> --replace-manifest'.
> diff --git a/dev/deps/spark-deps-hadoop-2.6 
> b/dev/pr-deps/spark-deps-hadoop-2.6
> index e703c7a..3aa2c38 100644
> --- a/dev/deps/spark-deps-hadoop-2.6
> +++ b/dev/pr-deps/spark-deps-hadoop-2.6
> @@ -1,190 +1,2 @@
> -JavaEWAH-0.3.2.jar
> -RoaringBitmap-0.5.11.jar
> -ST4-4.0.4.jar
> -activation-1.1.1.jar
> -akka-actor_2.10-2.3.11.jar
> -akka-remote_2.10-2.3.11.jar
> -akka-slf4j_2.10-2.3.11.jar
> -antlr-runtime-3.5.2.jar
> -aopalliance-1.0.jar
> -apache-log4j-extras-1.2.17.jar
> -apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api-1.0.0-M20.jar
> -api-util-1.0.0-M20.jar
> -arpack_combined_all-0.1.jar
> -asm-3.1.jar
> -asm-commons-3.1.jar
> -asm-tree-3.1.jar
> -avro-1.7.7.jar
> -avro-ipc-1.7.7-tests.jar
> -avro-ipc-1.7.7.jar
> -avro-mapred-1.7.7-hadoop2.jar
> -base64-2.3.8.jar
> -bcprov-jdk15on-1.51.jar
> -bonecp-0.8.0.RELEASE.jar
> -breeze-macros_2.10-0.11.2.jar
> -breeze_2.10-0.11.2.jar
> -calcite-avatica-1.2.0-incubating.jar
> -calcite-core-1.2.0-incubating.jar
> -calcite-linq4j-1.2.0-incubating.jar
> -chill-java-0.5.0.jar
> -chill_2.10-0.5.0.jar
> -commons-beanutils-1.7.0.jar
> -commons-beanutils-core-1.8.0.jar
> -commons-cli-1.2.jar
> -commons-codec-1.10.jar
> -commons-collections-3.2.2.jar
> -commons-compiler-2.7.6.jar
> -commons-compress-1.4.1.jar
> -commons-configuration-1.6.jar
> -commons-dbcp-1.4.jar
> -commons-digester-1.8.jar
> -commons-httpclient-3.1.jar
> -commons-io-2.4.jar
> -commons-lang-2.6.jar
> -commons-lang3-3.3.2.jar
> -commons-logging-1.1.3.jar
> -commons-math3-3.4.1.jar
> -commons-net-2.2.jar
> -commons-pool-1.5.4.jar
> -compress-lzf-1.0.3.jar
> -config-1.2.1.jar
> -core-1.1.2.jar
> -curator-client-2.6.0.jar
> -curator-framework-2.6.0.jar
> -curator-recipes-2.6.0.jar
> -datanucleus-api-jdo-3.2.6.jar
> -datanucleus-core-3.2.10.jar
> -datanucleus-rdbms-3.2.9.jar
> -derby-10.10.1.1.jar
> -eigenbase-properties-1.1.5.jar
> -geronimo-annotation_1.0_spec-1.1.1.jar
> -geronimo-jaspic_1.0_spec-1.0.jar
> -geronimo-jta_1.1_spec-1.1.1.jar
> -groovy-all-2.1.6.jar
> -gson-2.2.4.jar
> -guice-3.0.jar
> -guice-servlet-3.0.jar
> -hadoop-annotations-2.6.0.jar
> -hadoop-auth-2.6.0.jar
> -hadoop-client-2.6.0.jar
> -hadoop-common-2.6.0.jar
> -hadoop-hdfs-2.6.0.jar
> -hadoop-mapreduce-client-app-2.6.0.jar
> -hadoop-mapreduce-client-common-2.6.0.jar
> -hadoop-mapreduce-client-core-2.6.0.jar
> -hadoop-mapreduce-client-jobclient-2.6.0.jar
> -hadoop-mapreduce-client-shuffle-2.6.0.jar
> -hadoop-yarn-api-2.6.0.jar
> -hadoop-yarn-client-2.6.0.jar
> -hadoop-yarn-common-2.6.0.jar
> -hadoop-yarn-server-common-2.6.0.jar
> -hadoop-yarn-server-web-proxy-2.6.0.jar
> -htrace-core-3.0.4.jar
> -httpclient-4.3.2.jar
> -httpcore-4.3.2.jar
> -ivy-2.4.0.jar
> -jackson-annotations-2.4.4.jar
> -jackson-core-2.4.4.jar
> -jackson-core-asl-1.9.13.jar
> -jackson-databind-2.4.4.jar
> -jackson-jaxrs-1.9.13.jar
> -jackson-mapper-asl-1.9.13.jar
> -jackson-module-scala_2.10-2.4.4.jar
> -jackson-xc-1.9.13.jar
> -janino-2.7.8.jar
> -jansi-1.4.jar
> -java-xmlbuilder-1.0.jar
> -javax.inject-1.jar
> -javax.servlet-3.0.0.v201112011016.jar
> -javolution-5.5.1.jar
> -jaxb-api-2.2.2.jar
> -jaxb-impl-2.2.3-1.jar
> -jcl-over-slf4j-1.7.10.jar
> -jdo-api-3.0.1.jar
> -jersey-client-1.9.jar
> -jersey-core-1.9.jar
> -jersey-guice-1.9.jar
> -jersey-json-1.9.jar
> -jersey-server-1.9.jar
> -jets3t-0.9.3.jar
> -jettison-1.1.jar
> -jetty-6.1.26.jar
> -jetty-all-7.6.0.v20120127.jar
> -jetty-util-6.1.26.jar
> -jline-2.10.5.jar
> -jline-2.12.jar
> -joda-time-2.9.jar
> -jodd-core-3.5.2.jar
> -jpam-1.1.jar
> -json-20090211.jar
> -json4s-ast_2.10-3.2.10.jar
> 

[jira] [Updated] (SPARK-12712) test-dependencies.sh script fails when run against empty .m2 cache

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-12712:
---
Target Version/s: 1.6.2, 2.0.0
 Component/s: Project Infra

> test-dependencies.sh script fails when run against empty .m2 cache
> --
>
> Key: SPARK-12712
> URL: https://issues.apache.org/jira/browse/SPARK-12712
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Reporter: Stavros Kontopoulos
>Assignee: Josh Rosen
>
> The test-dependencies.sh script fails.
> This relates to https://github.com/apache/spark/pull/10461
> See the failure here:
> https://ci.typesafe.com/job/ghprb-spark-multi-conf/label=Spark-Ora-JDK7-PV,scala_version=2.10/84/console
> My PR does not change dependencies; shouldn't the PR manifest be generated 
> with the full dependency list? It seems empty. Should I use --replace-manifest?
> Reproducing it locally on that Jenkins instance I get this: 
> Spark's published dependencies DO NOT MATCH the manifest file 
> (dev/spark-deps).
> To update the manifest file, run './dev/test-dependencies.sh 
> --replace-manifest'.
> diff --git a/dev/deps/spark-deps-hadoop-2.6 
> b/dev/pr-deps/spark-deps-hadoop-2.6
> index e703c7a..3aa2c38 100644
> --- a/dev/deps/spark-deps-hadoop-2.6
> +++ b/dev/pr-deps/spark-deps-hadoop-2.6
> @@ -1,190 +1,2 @@
> -JavaEWAH-0.3.2.jar
> -RoaringBitmap-0.5.11.jar
> -ST4-4.0.4.jar
> -activation-1.1.1.jar
> -akka-actor_2.10-2.3.11.jar
> -akka-remote_2.10-2.3.11.jar
> -akka-slf4j_2.10-2.3.11.jar
> -antlr-runtime-3.5.2.jar
> -aopalliance-1.0.jar
> -apache-log4j-extras-1.2.17.jar
> -apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api-1.0.0-M20.jar
> -api-util-1.0.0-M20.jar
> -arpack_combined_all-0.1.jar
> -asm-3.1.jar
> -asm-commons-3.1.jar
> -asm-tree-3.1.jar
> -avro-1.7.7.jar
> -avro-ipc-1.7.7-tests.jar
> -avro-ipc-1.7.7.jar
> -avro-mapred-1.7.7-hadoop2.jar
> -base64-2.3.8.jar
> -bcprov-jdk15on-1.51.jar
> -bonecp-0.8.0.RELEASE.jar
> -breeze-macros_2.10-0.11.2.jar
> -breeze_2.10-0.11.2.jar
> -calcite-avatica-1.2.0-incubating.jar
> -calcite-core-1.2.0-incubating.jar
> -calcite-linq4j-1.2.0-incubating.jar
> -chill-java-0.5.0.jar
> -chill_2.10-0.5.0.jar
> -commons-beanutils-1.7.0.jar
> -commons-beanutils-core-1.8.0.jar
> -commons-cli-1.2.jar
> -commons-codec-1.10.jar
> -commons-collections-3.2.2.jar
> -commons-compiler-2.7.6.jar
> -commons-compress-1.4.1.jar
> -commons-configuration-1.6.jar
> -commons-dbcp-1.4.jar
> -commons-digester-1.8.jar
> -commons-httpclient-3.1.jar
> -commons-io-2.4.jar
> -commons-lang-2.6.jar
> -commons-lang3-3.3.2.jar
> -commons-logging-1.1.3.jar
> -commons-math3-3.4.1.jar
> -commons-net-2.2.jar
> -commons-pool-1.5.4.jar
> -compress-lzf-1.0.3.jar
> -config-1.2.1.jar
> -core-1.1.2.jar
> -curator-client-2.6.0.jar
> -curator-framework-2.6.0.jar
> -curator-recipes-2.6.0.jar
> -datanucleus-api-jdo-3.2.6.jar
> -datanucleus-core-3.2.10.jar
> -datanucleus-rdbms-3.2.9.jar
> -derby-10.10.1.1.jar
> -eigenbase-properties-1.1.5.jar
> -geronimo-annotation_1.0_spec-1.1.1.jar
> -geronimo-jaspic_1.0_spec-1.0.jar
> -geronimo-jta_1.1_spec-1.1.1.jar
> -groovy-all-2.1.6.jar
> -gson-2.2.4.jar
> -guice-3.0.jar
> -guice-servlet-3.0.jar
> -hadoop-annotations-2.6.0.jar
> -hadoop-auth-2.6.0.jar
> -hadoop-client-2.6.0.jar
> -hadoop-common-2.6.0.jar
> -hadoop-hdfs-2.6.0.jar
> -hadoop-mapreduce-client-app-2.6.0.jar
> -hadoop-mapreduce-client-common-2.6.0.jar
> -hadoop-mapreduce-client-core-2.6.0.jar
> -hadoop-mapreduce-client-jobclient-2.6.0.jar
> -hadoop-mapreduce-client-shuffle-2.6.0.jar
> -hadoop-yarn-api-2.6.0.jar
> -hadoop-yarn-client-2.6.0.jar
> -hadoop-yarn-common-2.6.0.jar
> -hadoop-yarn-server-common-2.6.0.jar
> -hadoop-yarn-server-web-proxy-2.6.0.jar
> -htrace-core-3.0.4.jar
> -httpclient-4.3.2.jar
> -httpcore-4.3.2.jar
> -ivy-2.4.0.jar
> -jackson-annotations-2.4.4.jar
> -jackson-core-2.4.4.jar
> -jackson-core-asl-1.9.13.jar
> -jackson-databind-2.4.4.jar
> -jackson-jaxrs-1.9.13.jar
> -jackson-mapper-asl-1.9.13.jar
> -jackson-module-scala_2.10-2.4.4.jar
> -jackson-xc-1.9.13.jar
> -janino-2.7.8.jar
> -jansi-1.4.jar
> -java-xmlbuilder-1.0.jar
> -javax.inject-1.jar
> -javax.servlet-3.0.0.v201112011016.jar
> -javolution-5.5.1.jar
> -jaxb-api-2.2.2.jar
> -jaxb-impl-2.2.3-1.jar
> -jcl-over-slf4j-1.7.10.jar
> -jdo-api-3.0.1.jar
> -jersey-client-1.9.jar
> -jersey-core-1.9.jar
> -jersey-guice-1.9.jar
> -jersey-json-1.9.jar
> -jersey-server-1.9.jar
> -jets3t-0.9.3.jar
> -jettison-1.1.jar
> -jetty-6.1.26.jar
> -jetty-all-7.6.0.v20120127.jar
> -jetty-util-6.1.26.jar
> -jline-2.10.5.jar
> -jline-2.12.jar
> -joda-time-2.9.jar
> -jodd-core-3.5.2.jar
> -jpam-1.1.jar
> -json-20090211.jar
> -json4s-ast_2.10-3.2.10.jar
> -json4s-core_2.10-3.2.10.jar
> -json4s-jackson_2.10-3.2.10.jar
> -jsr305-1.3.9.jar
> 

[jira] [Reopened] (SPARK-12712) test-dependencies.sh fails with difference in manifests

2016-06-08 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reopened SPARK-12712:

  Assignee: Josh Rosen

I've managed to reproduce this problem in one of my own CI environments. This 
problem is triggered when Spark's test-dependencies script runs with an 
initially-empty .m2 cache: extra log output from downloading dependencies 
breaks the regex used in this script.

I have a fix for this and will open a PR shortly.

> test-dependencies.sh fails with difference in manifests
> ---
>
> Key: SPARK-12712
> URL: https://issues.apache.org/jira/browse/SPARK-12712
> Project: Spark
>  Issue Type: Bug
>Reporter: Stavros Kontopoulos
>Assignee: Josh Rosen
>
> The test-dependencies.sh script fails.
> This relates to https://github.com/apache/spark/pull/10461
> See the failure here:
> https://ci.typesafe.com/job/ghprb-spark-multi-conf/label=Spark-Ora-JDK7-PV,scala_version=2.10/84/console
> My PR does not change dependencies; shouldn't the PR manifest be generated 
> with the full dependency list? It seems empty. Should I use --replace-manifest?
> Reproducing it locally on that Jenkins instance I get this: 
> Spark's published dependencies DO NOT MATCH the manifest file 
> (dev/spark-deps).
> To update the manifest file, run './dev/test-dependencies.sh 
> --replace-manifest'.
> diff --git a/dev/deps/spark-deps-hadoop-2.6 
> b/dev/pr-deps/spark-deps-hadoop-2.6
> index e703c7a..3aa2c38 100644
> --- a/dev/deps/spark-deps-hadoop-2.6
> +++ b/dev/pr-deps/spark-deps-hadoop-2.6
> @@ -1,190 +1,2 @@
> -JavaEWAH-0.3.2.jar
> -RoaringBitmap-0.5.11.jar
> -ST4-4.0.4.jar
> -activation-1.1.1.jar
> -akka-actor_2.10-2.3.11.jar
> -akka-remote_2.10-2.3.11.jar
> -akka-slf4j_2.10-2.3.11.jar
> -antlr-runtime-3.5.2.jar
> -aopalliance-1.0.jar
> -apache-log4j-extras-1.2.17.jar
> -apacheds-i18n-2.0.0-M15.jar
> -apacheds-kerberos-codec-2.0.0-M15.jar
> -api-asn1-api-1.0.0-M20.jar
> -api-util-1.0.0-M20.jar
> -arpack_combined_all-0.1.jar
> -asm-3.1.jar
> -asm-commons-3.1.jar
> -asm-tree-3.1.jar
> -avro-1.7.7.jar
> -avro-ipc-1.7.7-tests.jar
> -avro-ipc-1.7.7.jar
> -avro-mapred-1.7.7-hadoop2.jar
> -base64-2.3.8.jar
> -bcprov-jdk15on-1.51.jar
> -bonecp-0.8.0.RELEASE.jar
> -breeze-macros_2.10-0.11.2.jar
> -breeze_2.10-0.11.2.jar
> -calcite-avatica-1.2.0-incubating.jar
> -calcite-core-1.2.0-incubating.jar
> -calcite-linq4j-1.2.0-incubating.jar
> -chill-java-0.5.0.jar
> -chill_2.10-0.5.0.jar
> -commons-beanutils-1.7.0.jar
> -commons-beanutils-core-1.8.0.jar
> -commons-cli-1.2.jar
> -commons-codec-1.10.jar
> -commons-collections-3.2.2.jar
> -commons-compiler-2.7.6.jar
> -commons-compress-1.4.1.jar
> -commons-configuration-1.6.jar
> -commons-dbcp-1.4.jar
> -commons-digester-1.8.jar
> -commons-httpclient-3.1.jar
> -commons-io-2.4.jar
> -commons-lang-2.6.jar
> -commons-lang3-3.3.2.jar
> -commons-logging-1.1.3.jar
> -commons-math3-3.4.1.jar
> -commons-net-2.2.jar
> -commons-pool-1.5.4.jar
> -compress-lzf-1.0.3.jar
> -config-1.2.1.jar
> -core-1.1.2.jar
> -curator-client-2.6.0.jar
> -curator-framework-2.6.0.jar
> -curator-recipes-2.6.0.jar
> -datanucleus-api-jdo-3.2.6.jar
> -datanucleus-core-3.2.10.jar
> -datanucleus-rdbms-3.2.9.jar
> -derby-10.10.1.1.jar
> -eigenbase-properties-1.1.5.jar
> -geronimo-annotation_1.0_spec-1.1.1.jar
> -geronimo-jaspic_1.0_spec-1.0.jar
> -geronimo-jta_1.1_spec-1.1.1.jar
> -groovy-all-2.1.6.jar
> -gson-2.2.4.jar
> -guice-3.0.jar
> -guice-servlet-3.0.jar
> -hadoop-annotations-2.6.0.jar
> -hadoop-auth-2.6.0.jar
> -hadoop-client-2.6.0.jar
> -hadoop-common-2.6.0.jar
> -hadoop-hdfs-2.6.0.jar
> -hadoop-mapreduce-client-app-2.6.0.jar
> -hadoop-mapreduce-client-common-2.6.0.jar
> -hadoop-mapreduce-client-core-2.6.0.jar
> -hadoop-mapreduce-client-jobclient-2.6.0.jar
> -hadoop-mapreduce-client-shuffle-2.6.0.jar
> -hadoop-yarn-api-2.6.0.jar
> -hadoop-yarn-client-2.6.0.jar
> -hadoop-yarn-common-2.6.0.jar
> -hadoop-yarn-server-common-2.6.0.jar
> -hadoop-yarn-server-web-proxy-2.6.0.jar
> -htrace-core-3.0.4.jar
> -httpclient-4.3.2.jar
> -httpcore-4.3.2.jar
> -ivy-2.4.0.jar
> -jackson-annotations-2.4.4.jar
> -jackson-core-2.4.4.jar
> -jackson-core-asl-1.9.13.jar
> -jackson-databind-2.4.4.jar
> -jackson-jaxrs-1.9.13.jar
> -jackson-mapper-asl-1.9.13.jar
> -jackson-module-scala_2.10-2.4.4.jar
> -jackson-xc-1.9.13.jar
> -janino-2.7.8.jar
> -jansi-1.4.jar
> -java-xmlbuilder-1.0.jar
> -javax.inject-1.jar
> -javax.servlet-3.0.0.v201112011016.jar
> -javolution-5.5.1.jar
> -jaxb-api-2.2.2.jar
> -jaxb-impl-2.2.3-1.jar
> -jcl-over-slf4j-1.7.10.jar
> -jdo-api-3.0.1.jar
> -jersey-client-1.9.jar
> -jersey-core-1.9.jar
> -jersey-guice-1.9.jar
> -jersey-json-1.9.jar
> -jersey-server-1.9.jar
> -jets3t-0.9.3.jar
> -jettison-1.1.jar
> -jetty-6.1.26.jar
> -jetty-all-7.6.0.v20120127.jar
> -jetty-util-6.1.26.jar
> -jline-2.10.5.jar
> 

[jira] [Commented] (SPARK-11765) Avoid assign UI port between browser unsafe ports (or just 4045: lockd)

2016-06-08 Thread Willy Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321311#comment-15321311
 ] 

Willy Lee commented on SPARK-11765:
---

So users should be picking their own ports hoping they aren't already in use? 
It seems easier to just start from 4140 and increment. The next port blocked by 
webkit/chrome/etc isn't until 6000.

> Avoid assign UI port between browser unsafe ports (or just 4045: lockd)
> ---
>
> Key: SPARK-11765
> URL: https://issues.apache.org/jira/browse/SPARK-11765
> Project: Spark
>  Issue Type: Improvement
>Reporter: Jungtaek Lim
>Priority: Minor
>
> The Spark UI port starts at 4040, and the UI port is incremented by 1 for 
> every conflict.
> In our use case, we have several drivers running at the same time, which 
> causes the UI port to be assigned to 4045, which is treated as an unsafe 
> port by Chrome and Mozilla.
> http://src.chromium.org/viewvc/chrome/trunk/src/net/base/net_util.cc?view=markup
> http://www-archive.mozilla.org/projects/netlib/PortBanning.html#portlist
> We would like to avoid assigning the UI to these ports, or at least avoid 
> assigning the UI port to 4045, which is too close to the default port.
> If we decide to accept this idea, I'm happy to work on it.
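
A minimal sketch of the proposed behaviour, assuming a small set of browser-blocked 
ports to skip while resolving conflicts (the port list is illustrative, not exhaustive):

{code}
// Hedged sketch: skip browser-unsafe ports (e.g. 4045/lockd, 6000/X11, 6665-6669/IRC)
// when incrementing away from a UI port conflict.
val browserUnsafePorts = Set(4045, 6000, 6665, 6666, 6667, 6668, 6669)

def nextCandidatePort(port: Int): Int = {
  var next = port + 1
  while (browserUnsafePorts.contains(next)) next += 1
  next
}

// Starting from the default 4040: the conflict after 4044 resolves to 4046, skipping 4045.
assert(nextCandidatePort(4044) == 4046)
{code}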



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15827) Publish Spark's forked sbt-pom-reader to Maven Central

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321248#comment-15321248
 ] 

Apache Spark commented on SPARK-15827:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/13564

> Publish Spark's forked sbt-pom-reader to Maven Central
> --
>
> Key: SPARK-15827
> URL: https://issues.apache.org/jira/browse/SPARK-15827
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Spark's SBT build currently uses a fork of the sbt-pom-reader plugin but 
> depends on that fork via an SBT subproject which is cloned from 
> https://github.com/scrapcodes/sbt-pom-reader/tree/ignore_artifact_id. This 
> unnecessarily slows down the initial build on fresh machines and is also 
> risky because it could break the build if that GitHub repository ever 
> changes or is deleted.
> In order to address these issues, I propose to publish a pre-built binary of 
> our forked sbt-pom-reader plugin to Maven Central under the org.spark-project 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15827) Publish Spark's forked sbt-pom-reader to Maven Central

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15827:


Assignee: Apache Spark  (was: Josh Rosen)

> Publish Spark's forked sbt-pom-reader to Maven Central
> --
>
> Key: SPARK-15827
> URL: https://issues.apache.org/jira/browse/SPARK-15827
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> Spark's SBT build currently uses a fork of the sbt-pom-reader plugin but 
> depends on that fork via an SBT subproject which is cloned from 
> https://github.com/scrapcodes/sbt-pom-reader/tree/ignore_artifact_id. This 
> unnecessarily slows down the initial build on fresh machines and is also 
> risky because it could break the build if that GitHub repository ever 
> changes or is deleted.
> In order to address these issues, I propose to publish a pre-built binary of 
> our forked sbt-pom-reader plugin to Maven Central under the org.spark-project 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15678) Not use cache on appends and overwrites

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321249#comment-15321249
 ] 

Apache Spark commented on SPARK-15678:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/13566

> Not use cache on appends and overwrites
> ---
>
> Key: SPARK-15678
> URL: https://issues.apache.org/jira/browse/SPARK-15678
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sameer Agarwal
>
> SparkSQL currently doesn't drop caches if the underlying data is overwritten.
> {code}
> val dir = "/tmp/test"
> sqlContext.range(1000).write.mode("overwrite").parquet(dir)
> val df = sqlContext.read.parquet(dir).cache()
> df.count() // outputs 1000
> sqlContext.range(10).write.mode("overwrite").parquet(dir)
> sqlContext.read.parquet(dir).count() // outputs 1000 instead of 10 < We 
> are still using the cached dataset
> {code}
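
Until the bug is fixed, a minimal sketch of a user-side workaround is to drop the 
stale cache entry before re-reading; whether this is sufficient may depend on the 
Spark version, and it is not the fix this issue tracks:

{code}
// Hedged sketch: explicitly drop the stale cached plan before re-reading.
val dir = "/tmp/test"
sqlContext.range(1000).write.mode("overwrite").parquet(dir)
val df = sqlContext.read.parquet(dir).cache()
df.count()                             // 1000

sqlContext.range(10).write.mode("overwrite").parquet(dir)
df.unpersist()                         // or sqlContext.clearCache()
sqlContext.read.parquet(dir).count()   // now reflects the overwrite: 10
{code}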



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15827) Publish Spark's forked sbt-pom-reader to Maven Central

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15827:


Assignee: Josh Rosen  (was: Apache Spark)

> Publish Spark's forked sbt-pom-reader to Maven Central
> --
>
> Key: SPARK-15827
> URL: https://issues.apache.org/jira/browse/SPARK-15827
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Spark's SBT build currently uses a fork of the sbt-pom-reader plugin but 
> depends on that fork via an SBT subproject which is cloned from 
> https://github.com/scrapcodes/sbt-pom-reader/tree/ignore_artifact_id. This 
> unnecessarily slows down the initial build on fresh machines and is also 
> risky because it could break the build if that GitHub repository ever 
> changes or is deleted.
> In order to address these issues, I propose to publish a pre-built binary of 
> our forked sbt-pom-reader plugin to Maven Central under the org.spark-project 
> namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15783) Fix more flakiness: o.a.s.scheduler.BlacklistIntegrationSuite

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321247#comment-15321247
 ] 

Apache Spark commented on SPARK-15783:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/13565

> Fix more flakiness: o.a.s.scheduler.BlacklistIntegrationSuite
> -
>
> Key: SPARK-15783
> URL: https://issues.apache.org/jira/browse/SPARK-15783
> Project: Spark
>  Issue Type: Test
>Reporter: Imran Rashid
>Priority: Minor
>
> Looks like SPARK-15714 didn't address all the sources of flakiness.  First 
> turning the test off to stop breaking builds, then will try to fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14485) Task finished cause fetch failure when its executor has already been removed by driver

2016-06-08 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321238#comment-15321238
 ] 

Marcelo Vanzin commented on SPARK-14485:


Just to distill my comment a little bit more, without writing extra code there 
will be recomputation either way: the old code would cause the downstream task 
to fail, while the new code causes the original task to be recomputed. I prefer 
the new one because it avoids noise in the logs, but either way works.

> Task finished cause fetch failure when its executor has already been removed 
> by driver 
> ---
>
> Key: SPARK-14485
> URL: https://issues.apache.org/jira/browse/SPARK-14485
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.5.2
>Reporter: iward
>Assignee: iward
> Fix For: 2.0.0
>
>
> Currently, when an executor is removed by the driver due to a heartbeat 
> timeout, the driver re-queues the tasks from that executor and sends a kill 
> command to the cluster to kill the executor.
> However, a running task on that executor may finish and return its result to 
> the driver before the executor is killed by that command. In that case, the 
> driver accepts the task-finished event and ignores the speculative and 
> re-queued copies of the task. But since the executor has already been removed 
> by the driver, the result of the finished task cannot be stored, because its 
> *BlockManagerId* has also been removed from *BlockManagerMaster*. The result 
> data of the stage is therefore incomplete, which later causes a fetch failure.
> For example, the following is the task log:
> {noformat}
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN HeartbeatReceiver: Removing 
> executor 322 with no recent heartbeats: 256015 ms exceeds timeout 25 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 ERROR YarnScheduler: Lost executor 
> 322 on BJHC-HERA-16168.hadoop.jd.local: Executor heartbeat timed out after 
> 256015 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO TaskSetManager: Re-queueing 
> tasks for 322 from TaskSet 107.0
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN TaskSetManager: Lost task 
> 229.0 in stage 107.0 (TID 10384, BJHC-HERA-16168.hadoop.jd.local): 
> ExecutorLostFailure (executor 322 lost)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO DAGScheduler: Executor lost: 
> 322 (epoch 11)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMasterEndpoint: 
> Trying to remove executor 322 from BlockManagerMaster.
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMaster: Removed 
> 322 successfully in removeExecutor
> {noformat}
> {noformat}
> 2015-12-31 04:38:52 INFO 15/12/31 04:38:52 INFO TaskSetManager: Finished task 
> 229.0 in stage 107.0 (TID 10384) in 272315 ms on 
> BJHC-HERA-16168.hadoop.jd.local (579/700)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Ignoring 
> task-finished event for 229.1 in stage 107.0 because task 229 has already 
> completed successfully
> {noformat}
> {noformat}
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO DAGScheduler: Submitting 3 
> missing tasks from ShuffleMapStage 107 (MapPartitionsRDD[263] at 
> mapPartitions at Exchange.scala:137)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO YarnScheduler: Adding task 
> set 107.1 with 3 tasks
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 0.0 in stage 107.1 (TID 10863, BJHC-HERA-18043.hadoop.jd.local, 
> PROCESS_LOCAL, 3745 bytes)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 1.0 in stage 107.1 (TID 10864, BJHC-HERA-9291.hadoop.jd.local, PROCESS_LOCAL, 
> 3745 bytes)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 2.0 in stage 107.1 (TID 10865, BJHC-HERA-16047.hadoop.jd.local, 
> PROCESS_LOCAL, 3745 bytes)
> {noformat}
> The driver notices that the stage's result is incomplete and submits the 
> missing tasks, but by then the next stage has already started, because the 
> previous stage was considered finished once all of its tasks completed, even 
> though its result data is incomplete.
> {noformat}
> 2015-12-31 04:40:13 INFO 15/12/31 04:40:13 WARN TaskSetManager: Lost task 
> 39.0 in stage 109.0 (TID 10905, BJHC-HERA-9357.hadoop.jd.local): 
> FetchFailed(null, shuffleId=11, mapId=-1, reduceId=39, message=
> 2015-12-31 04:40:13 INFO 
> org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output 
> location for shuffle 11
> 2015-12-31 04:40:13 INFO at 
> org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:385)
> 2015-12-31 04:40:13 INFO at 
> 

[jira] [Created] (SPARK-15827) Publish Spark's forked sbt-pom-reader to Maven Central

2016-06-08 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-15827:
--

 Summary: Publish Spark's forked sbt-pom-reader to Maven Central
 Key: SPARK-15827
 URL: https://issues.apache.org/jira/browse/SPARK-15827
 Project: Spark
  Issue Type: Improvement
  Components: Build, Project Infra
Reporter: Josh Rosen
Assignee: Josh Rosen


Spark's SBT build currently uses a fork of the sbt-pom-reader plugin but 
depends on that fork via a SBT subproject which is cloned from 
https://github.com/scrapcodes/sbt-pom-reader/tree/ignore_artifact_id. This 
unnecessarily slows down the initial build on fresh machines and is also risky: 
the build could break if that GitHub repository is ever changed or deleted.

In order to address these issues, I propose to publish a pre-built binary of 
our forked sbt-pom-reader plugin to Maven Central under the org.spark-project 
namespace.
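
For illustration only, consuming a published plugin from Maven Central would 
then be a one-line sbt declaration. The org.spark-project group id comes from 
the proposal above; the artifact name, version, and the use of addSbtPlugin 
here are assumptions, not the final wiring:

{code}
// project/plugins.sbt -- hypothetical coordinates for the published fork
addSbtPlugin("org.spark-project" % "sbt-pom-reader" % "1.0.0-spark")
{code}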



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14485) Task finished cause fetch failure when its executor has already been removed by driver

2016-06-08 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321202#comment-15321202
 ] 

Marcelo Vanzin commented on SPARK-14485:


I commented on the PR, but will mostly repeat it here.

I think your point (a) is valid. It should be a rare case, though, so I don't 
feel strongly one way or another about it.

The change helps (b) because it can avoid unnecessary log output. It's a minor 
issue, I grant you that.

Similarly, the change helps (c) by avoiding log noise in certain situations. 
The race you mention does exist, but users get really antsy when they see 
exceptions in logs, so if we can help avoid those it's always good.

So to me it boils down to case (a): we can revert the change and live with the 
extra noise in the logs, we can add code to handle that case and still keep the 
logs clean, or we can live with the added inefficiency, which, in my view, 
should be hit pretty rarely.


> Task finished cause fetch failure when its executor has already been removed 
> by driver 
> ---
>
> Key: SPARK-14485
> URL: https://issues.apache.org/jira/browse/SPARK-14485
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.5.2
>Reporter: iward
>Assignee: iward
> Fix For: 2.0.0
>
>
> Currently, when an executor is removed by the driver due to a heartbeat 
> timeout, the driver re-queues the tasks from that executor and sends a kill 
> command to the cluster to kill the executor.
> However, a running task on that executor may finish and return its result to 
> the driver before the executor is killed by that command. In that case, the 
> driver accepts the task-finished event and ignores the speculative and 
> re-queued copies of the task. But since the executor has already been removed 
> by the driver, the result of the finished task cannot be stored, because its 
> *BlockManagerId* has also been removed from *BlockManagerMaster*. The result 
> data of the stage is therefore incomplete, which later causes a fetch failure.
> For example, the following is the task log:
> {noformat}
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN HeartbeatReceiver: Removing 
> executor 322 with no recent heartbeats: 256015 ms exceeds timeout 25 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 ERROR YarnScheduler: Lost executor 
> 322 on BJHC-HERA-16168.hadoop.jd.local: Executor heartbeat timed out after 
> 256015 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO TaskSetManager: Re-queueing 
> tasks for 322 from TaskSet 107.0
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN TaskSetManager: Lost task 
> 229.0 in stage 107.0 (TID 10384, BJHC-HERA-16168.hadoop.jd.local): 
> ExecutorLostFailure (executor 322 lost)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO DAGScheduler: Executor lost: 
> 322 (epoch 11)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMasterEndpoint: 
> Trying to remove executor 322 from BlockManagerMaster.
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMaster: Removed 
> 322 successfully in removeExecutor
> {noformat}
> {noformat}
> 2015-12-31 04:38:52 INFO 15/12/31 04:38:52 INFO TaskSetManager: Finished task 
> 229.0 in stage 107.0 (TID 10384) in 272315 ms on 
> BJHC-HERA-16168.hadoop.jd.local (579/700)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Ignoring 
> task-finished event for 229.1 in stage 107.0 because task 229 has already 
> completed successfully
> {noformat}
> {noformat}
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO DAGScheduler: Submitting 3 
> missing tasks from ShuffleMapStage 107 (MapPartitionsRDD[263] at 
> mapPartitions at Exchange.scala:137)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO YarnScheduler: Adding task 
> set 107.1 with 3 tasks
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 0.0 in stage 107.1 (TID 10863, BJHC-HERA-18043.hadoop.jd.local, 
> PROCESS_LOCAL, 3745 bytes)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 1.0 in stage 107.1 (TID 10864, BJHC-HERA-9291.hadoop.jd.local, PROCESS_LOCAL, 
> 3745 bytes)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 INFO TaskSetManager: Starting task 
> 2.0 in stage 107.1 (TID 10865, BJHC-HERA-16047.hadoop.jd.local, 
> PROCESS_LOCAL, 3745 bytes)
> {noformat}
> The driver notices that the stage's result is incomplete and submits the 
> missing tasks, but by then the next stage has already started, because the 
> previous stage was considered finished once all of its tasks completed, even 
> though its result data is incomplete.
> {noformat}
> 2015-12-31 04:40:13 INFO 15/12/31 04:40:13 WARN TaskSetManager: Lost task 
> 39.0 in stage 109.0 (TID 10905, BJHC-HERA-9357.hadoop.jd.local): 
> FetchFailed(null, shuffleId=11, mapId=-1, reduceId=39, 

[jira] [Commented] (SPARK-15826) PipedRDD to strictly use UTF-8 and not rely on default encoding

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321203#comment-15321203
 ] 

Apache Spark commented on SPARK-15826:
--

User 'tejasapatil' has created a pull request for this issue:
https://github.com/apache/spark/pull/13563

> PipedRDD to strictly use UTF-8 and not rely on default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. Making it use UTF-8 encoding just like 
> `ScriptTransformation` does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
>   at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
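
A minimal sketch of the general fix, not the actual patch in the linked pull 
request: decode the child process output with an explicit UTF-8 charset instead 
of the JVM default (the `echo` command stands in for the piped program):

{code}
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets

val proc = new ProcessBuilder("echo", "héllo").start()
// read the process stdout as UTF-8 regardless of the platform default encoding
val reader = new BufferedReader(
  new InputStreamReader(proc.getInputStream, StandardCharsets.UTF_8))
val firstLine = reader.readLine()
{code}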



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15826) PipedRDD to strictly use UTF-8 and not rely on default encoding

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15826:


Assignee: (was: Apache Spark)

> PipedRDD to strictly use UTF-8 and not rely on default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. Making it use UTF-8 encoding just like 
> `ScriptTransformation` does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
>   at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15826) PipedRDD to strictly use UTF-8 and not rely on default encoding

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15826:


Assignee: Apache Spark

> PipedRDD to strictly use UTF-8 and not rely on default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Assignee: Apache Spark
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. Making it use UTF-8 encoding just like 
> `ScriptTransformation` does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
>   at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15826) PipedRDD to strictly use UTF-8 and not rely on default encoding

2016-06-08 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated SPARK-15826:

Description: 
Encountered an issue wherein the code works in some cluster but fails on 
another one for the same input. After debugging realised that PipedRDD is 
picking default char encoding from the JVM which may be different across 
different platforms. Making it use UTF-8 encoding just like 
`ScriptTransformation` does.

Stack trace:
{noformat}
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at 
scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
at 
org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
at 
org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

  was:
Encountered an issue wherein the code works in some cluster but fails on 
another one for the same input. After debugging realised that PipedRDD is 
picking default char encoding from the JVM which may be different across 
different platforms. 

Making it use UTF-8 encoding just like `ScriptTransformation` does


> PipedRDD to strictly use UTF-8 and not rely on default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. Making it use UTF-8 encoding just like 
> `ScriptTransformation` does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>   at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
>   at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Updated] (SPARK-15826) PipedRDD to strictly use UTF-8 and not rely on default encoding

2016-06-08 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated SPARK-15826:

Summary: PipedRDD to strictly use UTF-8 and not rely on default encoding  
(was: PipedRDD relies on JVM default encoding)

> PipedRDD to strictly use UTF-8 and not rely on default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. 
> Making it use UTF-8 encoding just like `ScriptTransformation` does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15826) PipedRDD relies on JVM default encoding

2016-06-08 Thread Tejas Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tejas Patil updated SPARK-15826:

Description: 
Encountered an issue wherein the code works in some cluster but fails on 
another one for the same input. After debugging realised that PipedRDD is 
picking default char encoding from the JVM which may be different across 
different platforms. 

Making it use UTF-8 encoding just like `ScriptTransformation` does

> PipedRDD relies on JVM default encoding
> ---
>
> Key: SPARK-15826
> URL: https://issues.apache.org/jira/browse/SPARK-15826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Trivial
>
> Encountered an issue wherein the code works in some cluster but fails on 
> another one for the same input. After debugging realised that PipedRDD is 
> picking default char encoding from the JVM which may be different across 
> different platforms. 
> Making it use UTF-8 encoding just like `ScriptTransformation` does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15826) PipedRDD relies on JVM default encoding

2016-06-08 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-15826:
---

 Summary: PipedRDD relies on JVM default encoding
 Key: SPARK-15826
 URL: https://issues.apache.org/jira/browse/SPARK-15826
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Tejas Patil
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14485) Task finished cause fetch failure when its executor has already been removed by driver

2016-06-08 Thread Kay Ousterhout (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321177#comment-15321177
 ] 

Kay Ousterhout commented on SPARK-14485:


I commented on the pull request, but want to continue the discussion here for 
archiving purposes.

My understanding is that this pull request fixes the following sequence of 
events:
(1) A task completes on an executor
(2) The executor fails
(3) The scheduler is notified about the task completing.
(4) A future stage that depends on the task runs, and fails, because the 
executor where the data was stored has failed.

With the proposed pull request, in step (3), the scheduler ignores the update, 
because it came from a failed executor.

I don't think we should do this for a few reasons:

(a) If the task didn't have a result stored on the executor (e.g., it computed 
some result on the RDD that it sent directly back to the master, like counting 
the elements in the RDD), it doesn't need to be failed, and can complete 
successfully.  With this change, we'd unnecessarily re-run the task.
(b) If the task did have an IndirectTaskResult (where it was too big to be sent 
directly to the master), the TaskResultGetter will fail to get the task result, 
and the task will be marked as failed.  This already worked correctly with the 
old code (AFAIK).
(c) This change is attempting to fix a third case, where the task had shuffle 
data that's now inaccessible because the machine had died.  I don't think it 
makes sense to fix this, because you can imagine a slight change in timing that 
causes the order of (2) and (3) above to be swapped.  In this case, even with 
the proposed code change, we're still stuck with the fetch failure and 
re-running the map stage.  Furthermore, it's possible (and likely!) that there 
were other map tasks that ran on the failed executor, and those tasks won't be 
failed and re-run with this change, so the reduce stage will still fail.  In 
general, the reason we have the fetch failure mechanism is because it can 
happen that shuffle data gets lost, and rather than detecting every kind of 
map-side failure, it's simpler to fail on the reduce side and then re-run the 
necessary tasks in the map stage.

Given all of the above, I'd advocate for reverting this change and marking the 
JIRA as won't fix.  [~vanzin] [~iward] let me know what your thoughts are. 

> Task finished cause fetch failure when its executor has already been removed 
> by driver 
> ---
>
> Key: SPARK-14485
> URL: https://issues.apache.org/jira/browse/SPARK-14485
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.5.2
>Reporter: iward
>Assignee: iward
> Fix For: 2.0.0
>
>
> Currently, when an executor is removed by the driver due to a heartbeat 
> timeout, the driver re-queues the tasks from that executor and sends a kill 
> command to the cluster to kill the executor.
> However, a running task on that executor may finish and return its result to 
> the driver before the executor is killed by that command. In that case, the 
> driver accepts the task-finished event and ignores the speculative and 
> re-queued copies of the task. But since the executor has already been removed 
> by the driver, the result of the finished task cannot be stored, because its 
> *BlockManagerId* has also been removed from *BlockManagerMaster*. The result 
> data of the stage is therefore incomplete, which later causes a fetch failure.
> For example, the following is the task log:
> {noformat}
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN HeartbeatReceiver: Removing 
> executor 322 with no recent heartbeats: 256015 ms exceeds timeout 25 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 ERROR YarnScheduler: Lost executor 
> 322 on BJHC-HERA-16168.hadoop.jd.local: Executor heartbeat timed out after 
> 256015 ms
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO TaskSetManager: Re-queueing 
> tasks for 322 from TaskSet 107.0
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 WARN TaskSetManager: Lost task 
> 229.0 in stage 107.0 (TID 10384, BJHC-HERA-16168.hadoop.jd.local): 
> ExecutorLostFailure (executor 322 lost)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO DAGScheduler: Executor lost: 
> 322 (epoch 11)
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMasterEndpoint: 
> Trying to remove executor 322 from BlockManagerMaster.
> 2015-12-31 04:38:50 INFO 15/12/31 04:38:50 INFO BlockManagerMaster: Removed 
> 322 successfully in removeExecutor
> {noformat}
> {noformat}
> 2015-12-31 04:38:52 INFO 15/12/31 04:38:52 INFO TaskSetManager: Finished task 
> 229.0 in stage 107.0 (TID 10384) in 272315 ms on 
> BJHC-HERA-16168.hadoop.jd.local (579/700)
> 2015-12-31 04:40:12 INFO 15/12/31 04:40:12 

[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2016-06-08 Thread Sandeep (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321116#comment-15321116
 ] 

Sandeep commented on SPARK-2984:


Can this bug be reopened, please? I am seeing the issue with Spark 1.6.1 as 
well, on AWS. 
Caused by: java.io.FileNotFoundException: File 
s3n://xxx/_temporary/0/task_201606080516_0004_m_79 does not exist.
at 
org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:506)
  at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
  at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
  at 
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
  at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
  at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
  ... 42 more
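
A commonly suggested mitigation while this remains open, following the 
speculation theory in the quoted description below, is to turn speculative 
execution off for jobs that write to object stores; a minimal sketch (the 
spark.speculation key is a standard setting, the rest of the configuration is 
illustrative):

{code}
import org.apache.spark.SparkConf

// avoid duplicate task attempts racing on the shared _temporary directory
val conf = new SparkConf()
  .setAppName("s3-writer")           // illustrative app name
  .set("spark.speculation", "false")
{code}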


> FileNotFoundException on _temporary directory
> -
>
> Key: SPARK-2984
> URL: https://issues.apache.org/jira/browse/SPARK-2984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.3.0
>
>
> We've seen several stacktraces and threads on the user mailing list where 
> people are having issues with a {{FileNotFoundException}} stemming from an 
> HDFS path containing {{_temporary}}.
> I ([~aash]) think this may be related to {{spark.speculation}}.  I think the 
> error condition might manifest in this circumstance:
> 1) task T starts on a executor E1
> 2) it takes a long time, so task T' is started on another executor E2
> 3) T finishes in E1 so moves its data from {{_temporary}} to the final 
> destination and deletes the {{_temporary}} directory during cleanup
> 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but 
> those files no longer exist!  exception
> Some samples:
> {noformat}
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 
> 140774430 ms.0
> java.io.FileNotFoundException: File 
> hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
> at 
> org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
> at 
> org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
> at 
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> -- Chen Song at 
> 

[jira] [Created] (SPARK-15825) sort-merge-join gives invalid results when joining on a tupled key

2016-06-08 Thread Andres Perez (JIRA)
Andres Perez created SPARK-15825:


 Summary: sort-merge-join gives invalid results when joining on a 
tupled key
 Key: SPARK-15825
 URL: https://issues.apache.org/jira/browse/SPARK-15825
 Project: Spark
  Issue Type: Bug
  Components: SQL
 Environment: spark 2.0.0-SNAPSHOT
Reporter: Andres Perez


{noformat}
  import org.apache.spark.sql.functions
  val left = List("0", "1", "2").toDS()
.map{ k => ((k, 0), "l") }

  val right = List("0", "1", "2").toDS()
.map{ k => ((k, 0), "r") }

  val result = left.toDF("k", "v").as[((String, Int), String)].alias("left")
.joinWith(right.toDF("k", "v").as[((String, Int), String)].alias("right"), 
functions.col("left.k") === functions.col("right.k"), "inner")
.as[(((String, Int), String), ((String, Int), String))]
{noformat}

When broadcast joins are enabled, we get the expected output:

{noformat}
(((0,0),l),((0,0),r))
(((1,0),l),((1,0),r))
(((2,0),l),((2,0),r))
{noformat}

However, when broadcast joins are disabled (i.e. setting 
spark.sql.autoBroadcastJoinThreshold to -1), the result is incorrect:

{noformat}
(((2,0),l),((2,-1),))
(((0,0),l),((0,-313907893),))
(((1,0),l),((null,-313907893),))
{noformat}
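
For completeness, a sketch of how the broadcast-join path is toggled when 
reproducing this, assuming the Spark 2.0 session API and the snippet above 
otherwise unchanged:

{noformat}
// force the sort-merge-join path before running the join above
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

// restore broadcast joins (the default threshold is 10 MB)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)
{noformat}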



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[jira] [Updated] (SPARK-15046) When running hive-thriftserver with yarn on a secure cluster the workers fail with java.lang.NumberFormatException

2016-06-08 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-15046:
---
Target Version/s: 2.0.0
Priority: Blocker  (was: Major)
 Component/s: YARN

Marking as blocker since this is a regression.

> When running hive-thriftserver with yarn on a secure cluster the workers fail 
> with java.lang.NumberFormatException
> --
>
> Key: SPARK-15046
> URL: https://issues.apache.org/jira/browse/SPARK-15046
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Trystan Leftwich
>Priority: Blocker
>
> When running hive-thriftserver with yarn on a secure cluster 
> (spark.yarn.principal and spark.yarn.keytab are set) the workers fail with 
> the following error.
> {code}
> 16/04/30 22:40:50 ERROR yarn.ApplicationMaster: Uncaught exception: 
> java.lang.NumberFormatException: For input string: "86400079ms"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Long.parseLong(Long.java:441)
>   at java.lang.Long.parseLong(Long.java:483)
>   at 
> scala.collection.immutable.StringLike$class.toLong(StringLike.scala:276)
>   at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
>   at 
> org.apache.spark.SparkConf$$anonfun$getLong$2.apply(SparkConf.scala:380)
>   at 
> org.apache.spark.SparkConf$$anonfun$getLong$2.apply(SparkConf.scala:380)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.SparkConf.getLong(SparkConf.scala:380)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.getTimeFromNowToRenewal(SparkHadoopUtil.scala:289)
>   at 
> org.apache.spark.deploy.yarn.AMDelegationTokenRenewer.org$apache$spark$deploy$yarn$AMDelegationTokenRenewer$$scheduleRenewal$1(AMDelegationTokenRenewer.scala:89)
>   at 
> org.apache.spark.deploy.yarn.AMDelegationTokenRenewer.scheduleLoginFromKeytab(AMDelegationTokenRenewer.scala:121)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$3.apply(ApplicationMaster.scala:243)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$3.apply(ApplicationMaster.scala:243)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:243)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:723)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:721)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:748)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> {code}
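
The failure in the stack trace above comes from SparkConf.getLong being handed 
a duration string that carries a time suffix. A minimal sketch of the 
distinction between getLong and getTimeAsMs (the configuration key below is 
purely illustrative):

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf().set("example.renewal.interval", "86400079ms")

// conf.getLong("example.renewal.interval", 0L) // would throw NumberFormatException
conf.getTimeAsMs("example.renewal.interval")    // parses the "ms" suffix: 86400079
{code}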



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+

2016-06-08 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320914#comment-15320914
 ] 

Charles Allen commented on SPARK-11183:
---

Eventually it could be worth adopting something like 
https://github.com/mesosphere/mesos-rxjava to plug into the mesos cluster

> enable support for mesos 0.24+
> --
>
> Key: SPARK-11183
> URL: https://issues.apache.org/jira/browse/SPARK-11183
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Ioannis Polyzos
>
> In Mesos 0.24, the Mesos leader info in ZK has changed to JSON; this results in 
> Spark failing to run on 0.24+.
> References : 
>   https://issues.apache.org/jira/browse/MESOS-2340 
>   
> http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E
>   https://github.com/mesos/elasticsearch/issues/338
>   https://github.com/spark-jobserver/spark-jobserver/issues/267



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15812) Allow sorting on aggregated streaming dataframe when the output mode is Complete

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15812:


Assignee: Tathagata Das  (was: Apache Spark)

> Allow sorting on aggregated streaming dataframe when the output mode is 
> Complete
> 
>
> Key: SPARK-15812
> URL: https://issues.apache.org/jira/browse/SPARK-15812
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> When the output mode is complete, then the output of a streaming aggregation 
> essentially will contain the complete aggregates every time. So this is not 
> different from a batch dataset within an incremental execution. Other 
> non-streaming operations should be supported on this dataset. In this JIRA, 
> we are just adding support for sorting, as it is a common useful 
> functionality. Support for other operations will come later.
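
An illustrative sketch of the query shape this is meant to allow: a 
complete-mode streaming aggregation whose output is then sorted (the socket 
source, console sink, and column names here are made up for the example):

{code}
import spark.implicits._   // assumes an existing SparkSession named `spark`

val lines = spark.readStream
  .format("socket").option("host", "localhost").option("port", "9999")
  .load()

// aggregate, then sort -- valid only because complete mode re-emits all aggregates
val counts = lines.groupBy("value").count().orderBy($"count".desc)

val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
{code}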



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15812) Allow sorting on aggregated streaming dataframe when the output mode is Complete

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320908#comment-15320908
 ] 

Apache Spark commented on SPARK-15812:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/13549

> Allow sorting on aggregated streaming dataframe when the output mode is 
> Complete
> 
>
> Key: SPARK-15812
> URL: https://issues.apache.org/jira/browse/SPARK-15812
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> When the output mode is complete, then the output of a streaming aggregation 
> essentially will contain the complete aggregates every time. So this is not 
> different from a batch dataset within an incremental execution. Other 
> non-streaming operations should be supported on this dataset. In this JIRA, 
> we are just adding support for sorting, as it is a common useful 
> functionality. Support for other operations will come later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15812) Allow sorting on aggregated streaming dataframe when the output mode is Complete

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15812:


Assignee: Apache Spark  (was: Tathagata Das)

> Allow sorting on aggregated streaming dataframe when the output mode is 
> Complete
> 
>
> Key: SPARK-15812
> URL: https://issues.apache.org/jira/browse/SPARK-15812
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> When the output mode is complete, then the output of a streaming aggregation 
> essentially will contain the complete aggregates every time. So this is not 
> different from a batch dataset within an incremental execution. Other 
> non-streaming operations should be supported on this dataset. In this JIRA, 
> we are just adding support for sorting, as it is a common useful 
> functionality. Support for other operations will come later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15784) Add Power Iteration Clustering to spark.ml

2016-06-08 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320879#comment-15320879
 ] 

Miao Wang commented on SPARK-15784:
---

I can work on this. Thanks!

> Add Power Iteration Clustering to spark.ml
> --
>
> Key: SPARK-15784
> URL: https://issues.apache.org/jira/browse/SPARK-15784
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Xinh Huynh
>
> Adding this algorithm is required as part of SPARK-4591: Algorithm/model 
> parity for spark.ml. The review JIRA for clustering is SPARK-14380.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+

2016-06-08 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320840#comment-15320840
 ] 

Charles Allen commented on SPARK-11183:
---

Being able to enable the fetch cache 
http://mesos.apache.org/documentation/latest/fetcher/ would be nice also

> enable support for mesos 0.24+
> --
>
> Key: SPARK-11183
> URL: https://issues.apache.org/jira/browse/SPARK-11183
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Ioannis Polyzos
>
> In Mesos 0.24, the Mesos leader info in ZK has changed to JSON; this results in 
> Spark failing to run on 0.24+.
> References : 
>   https://issues.apache.org/jira/browse/MESOS-2340 
>   
> http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E
>   https://github.com/mesos/elasticsearch/issues/338
>   https://github.com/spark-jobserver/spark-jobserver/issues/267



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Adam Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320614#comment-15320614
 ] 

Adam Roberts commented on SPARK-15821:
--

Agreed with your comment on tests, the above pull request is for README.md and 
building Spark (perhaps better placed in the Building Spark section)

> Should we use mvn -T for multithreaded Spark builds?
> 
>
> Key: SPARK-15821
> URL: https://issues.apache.org/jira/browse/SPARK-15821
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Reporter: Adam Roberts
>Priority: Minor
>
> With Maven we can build Spark in a multithreaded way and benefit from 
> increased build time performance as a result.
> On a machine with eight cores, I noticed the build time reduced from 20-25 
> minutes to five minutes; this is by building with
> mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
> package
> -T 1C says that we'll use one extra thread for each core available, I've 
> never experienced a problem with using this option (ranging from a single 
> cored box to one with 192 cores available)
> Should we use this for building Spark quicker or is the Jenkins job 
> deliberately set up such that each "executor" is needed for each pull request 
> and we wouldn't see an improvement anyway? 
> This can be discovered by checking core utilization across the farm and can 
> potentially reduce our build times.
> Here's more information on the feature: 
> https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
> If this isn't suitable for the current farm then I think we should document 
> it for those building Spark from source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320604#comment-15320604
 ] 

Apache Spark commented on SPARK-15821:
--

User 'a-roberts' has created a pull request for this issue:
https://github.com/apache/spark/pull/13562

> Should we use mvn -T for multithreaded Spark builds?
> 
>
> Key: SPARK-15821
> URL: https://issues.apache.org/jira/browse/SPARK-15821
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Reporter: Adam Roberts
>Priority: Minor
>
> With Maven we can build Spark in a multithreaded way and benefit from 
> increased build time performance as a result.
> On a machine with eight cores, I noticed the build time reduced from 20-25 
> minutes to five minutes; this is by building with
> mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
> package
> -T 1C says that we'll use one extra thread for each core available, I've 
> never experienced a problem with using this option (ranging from a single 
> cored box to one with 192 cores available)
> Should we use this for building Spark quicker or is the Jenkins job 
> deliberately set up such that each "executor" is needed for each pull request 
> and we wouldn't see an improvement anyway? 
> This can be discovered by checking core utilization across the farm and can 
> potentially reduce our build times.
> Here's more information on the feature: 
> https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
> If this isn't suitable for the current farm then I think we should document 
> it for those building Spark from source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15821:


Assignee: (was: Apache Spark)

> Should we use mvn -T for multithreaded Spark builds?
> 
>
> Key: SPARK-15821
> URL: https://issues.apache.org/jira/browse/SPARK-15821
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Reporter: Adam Roberts
>Priority: Minor
>
> With Maven we can build Spark in a multithreaded way and benefit from 
> increased build time performance as a result.
> On a machine with eight cores, I noticed the build time reduced from 20-25 
> minutes to five minutes; this is by building with
> mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
> package
> -T 1C says that we'll use one extra thread for each core available, I've 
> never experienced a problem with using this option (ranging from a single 
> cored box to one with 192 cores available)
> Should we use this for building Spark quicker or is the Jenkins job 
> deliberately set up such that each "executor" is needed for each pull request 
> and we wouldn't see an improvement anyway? 
> This can be discovered by checking core utilization across the farm and can 
> potentially reduce our build times.
> Here's more information on the feature: 
> https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
> If this isn't suitable for the current farm then I think we should document 
> it for those building Spark from source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15821:


Assignee: Apache Spark

> Should we use mvn -T for multithreaded Spark builds?
> 
>
> Key: SPARK-15821
> URL: https://issues.apache.org/jira/browse/SPARK-15821
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Reporter: Adam Roberts
>Assignee: Apache Spark
>Priority: Minor
>
> With Maven we can build Spark in a multithreaded way and benefit from 
> increased build time performance as a result.
> On a machine with eight cores, I noticed the build time reduced from 20-25 
> minutes to five minutes; this is by building with
> mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
> package
> -T 1C says that we'll use one extra thread for each core available, I've 
> never experienced a problem with using this option (ranging from a single 
> cored box to one with 192 cores available)
> Should we use this for building Spark quicker or is the Jenkins job 
> deliberately set up such that each "executor" is needed for each pull request 
> and we wouldn't see an improvement anyway? 
> This can be discovered by checking core utilization across the farm and can 
> potentially reduce our build times.
> Here's more information on the feature: 
> https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
> If this isn't suitable for the current farm then I think we should document 
> it for those building Spark from source



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320495#comment-15320495
 ] 

Sean Owen commented on SPARK-15086:
---

I think we can alter the API if that makes sense. Those really aren't the 
tricky questions here. See above.

> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15811) UDFs do not work in Spark 2.0-preview built with scala 2.10

2016-06-08 Thread Gabor Ratky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320466#comment-15320466
 ] 

Gabor Ratky commented on SPARK-15811:
-

I was able to reproduce the issue on a Databricks cluster using Spark 2.0 
(apache/branch-2.0 preview). I also tested whether 
{{sqlContext.registerFunction}} works around the problem, but the issue 
persisted.
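
For reference, a sketch of the {{registerFunction}} variant described above (the 
temp view name and app name are assumptions, not taken from the report); on an 
affected Scala 2.10 build it reportedly hangs just like the {{udf()}} version in 
the description:

{code}
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.types import IntegerType, Row, StructField, StructType

spark = SparkSession.builder.master('local[4]').appName('udf-repro').getOrCreate()
sqlContext = SQLContext(spark.sparkContext)

schema = StructType([StructField('a', IntegerType(), False)])
df = spark.createDataFrame([Row(a=1), Row(a=2)], schema)
df.createOrReplaceTempView('t')

# Register the UDF through SQLContext.registerFunction instead of
# pyspark.sql.functions.udf, then invoke it from SQL.
sqlContext.registerFunction('add_one', lambda x: x + 1, IntegerType())
spark.sql('SELECT add_one(a) AS incremented FROM t').collect()
{code}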

> UDFs do not work in Spark 2.0-preview built with scala 2.10
> ---
>
> Key: SPARK-15811
> URL: https://issues.apache.org/jira/browse/SPARK-15811
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Franklyn Dsouza
>Priority: Critical
>
> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following
> {code}
> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 
> -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6  -Pyarn -Phive
> {code}
> and then ran the following code in a pyspark shell
> {code}
> from pyspark.sql import SparkSession
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 
> DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()
> {code}
> This never returns a result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15824) Run 'with ... insert ... select' failed when use spark thriftserver

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320456#comment-15320456
 ] 

Apache Spark commented on SPARK-15824:
--

User 'Sephiroth-Lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/13561

> Run 'with ... insert ... select' failed when use spark thriftserver
> ---
>
> Key: SPARK-15824
> URL: https://issues.apache.org/jira/browse/SPARK-15824
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weizhong
>Priority: Minor
>
> {code:sql}
> create table src(k int, v int);
> create table src_parquet(k int, v int);
> with v as (select 1, 2) insert into table src_parquet select * from src;
> {code}
> Will throw exception: spark.sql.execution.id is already set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15824) Run 'with ... insert ... select' failed when use spark thriftserver

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15824:


Assignee: Apache Spark

> Run 'with ... insert ... select' failed when use spark thriftserver
> ---
>
> Key: SPARK-15824
> URL: https://issues.apache.org/jira/browse/SPARK-15824
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weizhong
>Assignee: Apache Spark
>Priority: Minor
>
> {code:sql}
> create table src(k int, v int);
> create table src_parquet(k int, v int);
> with v as (select 1, 2) insert into table src_parquet select * from src;
> {code}
> Will throw exception: spark.sql.execution.id is already set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15824) Run 'with ... insert ... select' failed when use spark thriftserver

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15824:


Assignee: (was: Apache Spark)

> Run 'with ... insert ... select' failed when use spark thriftserver
> ---
>
> Key: SPARK-15824
> URL: https://issues.apache.org/jira/browse/SPARK-15824
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Weizhong
>Priority: Minor
>
> {code:sql}
> create table src(k int, v int);
> create table src_parquet(k int, v int);
> with v as (select 1, 2) insert into table src_parquet select * from src;
> {code}
> Will throw exception: spark.sql.execution.id is already set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13566) Deadlock between MemoryStore and BlockManager

2016-06-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320431#comment-15320431
 ] 

Josef Lindman Hörnlund commented on SPARK-13566:


Do we know if this affects 1.5 as well? 

> Deadlock between MemoryStore and BlockManager
> -
>
> Key: SPARK-13566
> URL: https://issues.apache.org/jira/browse/SPARK-13566
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.0
> Environment: Spark 1.6.0 hadoop2.2.0 jdk1.8.0_65 centOs 6.2
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.6.2
>
>
> ===
> "block-manager-slave-async-thread-pool-1":
> at org.apache.spark.storage.MemoryStore.remove(MemoryStore.scala:216)
> - waiting to lock <0x0005895b09b0> (a 
> org.apache.spark.memory.UnifiedMemoryManager)
> at 
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1114)
> - locked <0x00058ed6aae0> (a org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1101)
> at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> at 
> org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1101)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> at 
> org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:84)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "Executor task launch worker-10":
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1032)
> - waiting to lock <0x00059a0988b8> (a 
> org.apache.spark.storage.BlockInfo)
> at 
> org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:1009)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:460)
> at 
> org.apache.spark.storage.MemoryStore$$anonfun$evictBlocksToFreeSpace$2.apply(MemoryStore.scala:449)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Weichen Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320428#comment-15320428
 ] 

Weichen Xu commented on SPARK-15086:


So, if we have to keep the Java API compatible with older versions, this 
becomes difficult.
Alternatively, could we create a new class such as JavaSparkContextV2 that 
uses the new API? Then we could design each Java method to match its Scala 
counterpart.

> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15816) SQL server based on Postgres protocol

2016-06-08 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320416#comment-15320416
 ] 

Takeshi Yamamuro commented on SPARK-15816:
--

Oh, this is a good starting point for that.

About Q1:
The specification of the PostgreSQL frontend/backend wire protocol (the 
so-called `v3 protocol`) is here:
https://www.postgresql.org/docs/9.5/static/protocol.html

About Q2:
If we limit the scope of the implementation, it looks feasible. Actually, 
`prestogres`, a stand-alone gateway server for Presto, takes the same 
approach; with some workarounds it has succeeded in implementing the gateway 
over the v3 protocol.
https://github.com/treasure-data/prestogres

About Q4:
Since the `postgresql-jdbc` driver implicitly queries PostgreSQL system 
catalogs to answer system commands, we would need to handle those queries as 
well. This seems to be one of the hackier parts to implement.


I also looked for other related implementations and found the H2 database; it 
can be connected to with the `postgresql-jdbc` driver, so it is another 
reference for this discussion.
http://www.h2database.com/html/advanced.html#odbc_driver


> SQL server based on Postgres protocol
> -
>
> Key: SPARK-15816
> URL: https://issues.apache.org/jira/browse/SPARK-15816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>
> At Spark Summit today this idea came up from a discussion: it would be great 
> to investigate the possibility of implementing a new SQL server using 
> Postgres' protocol, in lieu of Hive ThriftServer 2. I'm creating this ticket 
> to track this idea, in case others have feedback.
> This server can have a simpler architecture, and allows users to leverage a 
> wide range of tools that are already available for Postgres (and many 
> commercial database systems based on Postgres).
> Some of the problems we'd need to figure out are:
> 1. What is the Postgres protocol? Is there official documentation for it?
> 2. How difficult would it be to implement that protocol in Spark (on the JVM 
> in particular)?
> 3. How does data type mapping work?
> 4. How do system commands work? Would Spark need to support all of 
> Postgres' commands?
> 5. Are there any restrictions in supporting nested data?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15824) Run 'with ... insert ... select' failed when use spark thriftserver

2016-06-08 Thread Weizhong (JIRA)
Weizhong created SPARK-15824:


 Summary: Run 'with ... insert ... select' failed when use spark 
thriftserver
 Key: SPARK-15824
 URL: https://issues.apache.org/jira/browse/SPARK-15824
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Weizhong
Priority: Minor


{code:sql}
create table src(k int, v int);
create table src_parquet(k int, v int);
with v as (select 1, 2) insert into table src_parquet select * from src;
{code}
Will throw exception: spark.sql.execution.id is already set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15823) Add @property for 'property' in MulticlassMetrics

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15823:


Assignee: Apache Spark

> Add @property for 'property' in MulticlassMetrics
> -
>
> Key: SPARK-15823
> URL: https://issues.apache.org/jira/browse/SPARK-15823
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>
> 'accuracy' should be decorated with `@property` to keep in step with the 
> other members of `pyspark.MulticlassMetrics`, such as `weightedPrecision` 
> and `weightedRecall`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15823) Add @property for 'property' in MulticlassMetrics

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15823:


Assignee: (was: Apache Spark)

> Add @property for 'property' in MulticlassMetrics
> -
>
> Key: SPARK-15823
> URL: https://issues.apache.org/jira/browse/SPARK-15823
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: zhengruifeng
>Priority: Minor
>
> 'accuracy' should be decorated with `@property` to keep in step with the 
> other members of `pyspark.MulticlassMetrics`, such as `weightedPrecision` 
> and `weightedRecall`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15823) Add @property for 'property' in MulticlassMetrics

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320393#comment-15320393
 ] 

Apache Spark commented on SPARK-15823:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/13560

> Add @property for 'property' in MulticlassMetrics
> -
>
> Key: SPARK-15823
> URL: https://issues.apache.org/jira/browse/SPARK-15823
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: zhengruifeng
>Priority: Minor
>
> 'accuracy' should be decorated with `@property` to keep in step with the 
> other members of `pyspark.MulticlassMetrics`, such as `weightedPrecision` 
> and `weightedRecall`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15823) Add @property for 'property' in MulticlassMetrics

2016-06-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320392#comment-15320392
 ] 

Sean Owen commented on SPARK-15823:
---

OK. Can you review these changes for similar issues in one pass?

> Add @property for 'property' in MulticlassMetrics
> -
>
> Key: SPARK-15823
> URL: https://issues.apache.org/jira/browse/SPARK-15823
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: zhengruifeng
>Priority: Minor
>
> 'accuracy' should be decorated with `@property` to keep step with other 
> methods in `pyspark.MulticlassMetrics`, like `weightedPrecision`, 
> `weightedRecall`, etc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15823) Add @property for 'property' in MulticlassMetrics

2016-06-08 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-15823:


 Summary: Add @property for 'property' in MulticlassMetrics
 Key: SPARK-15823
 URL: https://issues.apache.org/jira/browse/SPARK-15823
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: zhengruifeng
Priority: Minor


'accuracy' should be decorated with `@property` to keep in step with the other 
members of `pyspark.MulticlassMetrics`, such as `weightedPrecision` and 
`weightedRecall`.
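
As an illustration of the requested change, here is a minimal, self-contained 
sketch of the attribute-style access that `@property` enables; the class and 
numbers below are hypothetical, while the real patch would apply the decorator 
to `accuracy` in `pyspark.MulticlassMetrics`:

{code}
# Toy example only: ToyMetrics and its values are hypothetical, not pyspark code.
class ToyMetrics(object):
    def __init__(self, correct, total):
        self._correct = correct
        self._total = total

    def accuracy_as_method(self):
        # Without @property, callers must remember the parentheses.
        return float(self._correct) / self._total

    @property
    def accuracy(self):
        # With @property, callers use attribute-style access, consistent with
        # properties such as weightedPrecision and weightedRecall.
        return float(self._correct) / self._total


m = ToyMetrics(correct=8, total=10)
assert m.accuracy_as_method() == 0.8
assert m.accuracy == 0.8  # no parentheses, matching the other metrics
{code}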



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320386#comment-15320386
 ] 

Sean Owen commented on SPARK-15086:
---

Yes, that's clear, but this doesn't address the 3 questions above, which are 
the stickier questions about what to do here.

> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320385#comment-15320385
 ] 

Sean Owen commented on SPARK-15821:
---

For building (i.e. compiling and packaging) -- sure. For testing, not sure, 
since I would bet some tests somewhere actually conflict (i.e. holding the same 
lock on a local DB or something), but, wouldn't hurt to try.

> Should we use mvn -T for multithreaded Spark builds?
> 
>
> Key: SPARK-15821
> URL: https://issues.apache.org/jira/browse/SPARK-15821
> Project: Spark
>  Issue Type: Question
>  Components: Build
>Reporter: Adam Roberts
>Priority: Minor
>
> With Maven we can build Spark in a multithreaded way and benefit from 
> noticeably shorter build times as a result.
> On a machine with eight cores, I saw the build time drop from 20-25 minutes 
> to five minutes by building with
> mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
> package
> -T 1C tells Maven to use one thread per available core; I've never 
> experienced a problem with this option (ranging from a single-core box to 
> one with 192 cores available).
> Should we use this to build Spark more quickly, or is the Jenkins job 
> deliberately set up so that each "executor" is needed for each pull request 
> and we wouldn't see an improvement anyway?
> This can be determined by checking core utilization across the farm, and the 
> option can potentially reduce our build times.
> Here's more information on the feature: 
> https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
> If this isn't suitable for the current farm, then I think we should document 
> it for those building Spark from source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String with spark.memory.offHeap.enabled=true

2016-06-08 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320380#comment-15320380
 ] 

Pete Robbins commented on SPARK-15822:
--

I'm investigating this and will attach the app and config later

> segmentation violation in o.a.s.unsafe.types.UTF8String with 
> spark.memory.offHeap.enabled=true
> --
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Priority: Critical
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> We initially saw this with IBM Java on a PowerPC box, but it is recreatable 
> on Linux with OpenJDK. On Linux with IBM Java 8 we see a null pointer 
> exception at the same code point:
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String with spark.memory.offHeap.enabled=true

2016-06-08 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-15822:


 Summary: segmentation violation in o.a.s.unsafe.types.UTF8String 
with spark.memory.offHeap.enabled=true
 Key: SPARK-15822
 URL: https://issues.apache.org/jira/browse/SPARK-15822
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: linux amd64

openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

Reporter: Pete Robbins
Priority: Critical


Executors fail with segmentation violation while running application with
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 512m

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
#
# JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# J 4816 C2 
org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
 (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]

We initially saw this with IBM Java on a PowerPC box, but it is recreatable on 
Linux with OpenJDK. On Linux with IBM Java 8 we see a null pointer exception 
at the same code point:

16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
java.lang.NullPointerException
at 
org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
at 
org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14146) Imported implicits can't be found in Spark REPL in some cases

2016-06-08 Thread Prashant Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320357#comment-15320357
 ] 

Prashant Sharma commented on SPARK-14146:
-

So I tried that option; it looks like it does not help either. Here is the 
branch:
https://github.com/ScrapCodes/spark/tree/SPARK-14146/import-fix

Some unexplored ideas that may fix this issue:
-Yunused-imports is one unexplored territory.
I am not sure whether replacing semicolons in the input with \n would work, 
because the input can sometimes be an XML (or even scala-xml) literal.

I will think of more options.

Thanks,

> Imported implicits can't be found in Spark REPL in some cases
> -
>
> Key: SPARK-14146
> URL: https://issues.apache.org/jira/browse/SPARK-14146
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>
> {code}
> class I(i: Int) {
>   def double: Int = i * 2
> }
> class Context {
>   implicit def toI(i: Int): I = new I(i)
> }
> val c = new Context
> import c._
> // OK
> 1.double
> // Fail
> class A; 1.double
> {code}
> The above code snippets can work in Scala REPL however.
> This will affect our Dataset functionality, for example:
> {code}
> class A; Seq(1 -> "a").toDS() // fail
> {code}
> or in paste mode:
> {code}
> :paste
> class A
> Seq(1 -> "a").toDS() // fail
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-06-08 Thread Weichen Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320351#comment-15320351
 ] 

Weichen Xu commented on SPARK-15086:


I think the Java API should match the Scala API where possible, so 
JavaSparkContext.longAccumulator should return a LongAccumulator object rather 
than the deprecated Accumulator[Long]; other methods can be changed similarly. 
If such a modification is OK, I can do it. Thanks!

> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Priority: Blocker
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15821) Should we use mvn -T for multithreaded Spark builds?

2016-06-08 Thread Adam Roberts (JIRA)
Adam Roberts created SPARK-15821:


 Summary: Should we use mvn -T for multithreaded Spark builds?
 Key: SPARK-15821
 URL: https://issues.apache.org/jira/browse/SPARK-15821
 Project: Spark
  Issue Type: Question
  Components: Build
Reporter: Adam Roberts
Priority: Minor


With Maven we can build Spark in a multithreaded way and benefit from 
noticeably shorter build times as a result.

On a machine with eight cores, I saw the build time drop from 20-25 minutes to 
five minutes by building with

mvn -T 1C -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean 
package

-T 1C tells Maven to use one thread per available core; I've never experienced 
a problem with this option (ranging from a single-core box to one with 192 
cores available).

Should we use this to build Spark more quickly, or is the Jenkins job 
deliberately set up so that each "executor" is needed for each pull request 
and we wouldn't see an improvement anyway?

This can be determined by checking core utilization across the farm, and the 
option can potentially reduce our build times.

Here's more information on the feature: 
https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3

If this isn't suitable for the current farm, then I think we should document 
it for those building Spark from source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15820) Add Catalog.refreshTable into python API

2016-06-08 Thread Weichen Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu updated SPARK-15820:
---
Summary: Add Catalog.refreshTable into python API  (was: Add spark-SQL 
Catalog.refreshTable into python api)

> Add Catalog.refreshTable into python API
> 
>
> Key: SPARK-15820
> URL: https://issues.apache.org/jira/browse/SPARK-15820
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Catalog.refreshTable API is missing from the Python interface for Spark 
> SQL; add it.
> see also: https://issues.apache.org/jira/browse/SPARK-15367
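
A hedged sketch of what the missing Python wrapper could look like, mirroring 
how the existing methods in python/pyspark/sql/catalog.py forward calls to the 
JVM Catalog object; the internal attribute names used below are assumptions, 
not the merged patch:

{code}
# Sketch only: assumes the pyspark Catalog class keeps a handle to the JVM
# Catalog in self._jcatalog, as its other methods do.
class Catalog(object):
    def __init__(self, sparkSession):
        self._sparkSession = sparkSession
        self._jcatalog = sparkSession._jsparkSession.catalog()

    def refreshTable(self, tableName):
        """Invalidate and refresh all cached data and metadata for the table."""
        self._jcatalog.refreshTable(tableName)
{code}

Once wired into SparkSession.catalog, usage from Python would simply be 
spark.catalog.refreshTable("my_table"), matching the existing Scala API.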



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15820:


Assignee: (was: Apache Spark)

> Add spark-SQL Catalog.refreshTable into python api
> --
>
> Key: SPARK-15820
> URL: https://issues.apache.org/jira/browse/SPARK-15820
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Catalog.refreshTable API is missing from the Python interface for Spark 
> SQL; add it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Weichen Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu updated SPARK-15820:
---
External issue ID:   (was: SPARK-15367)

> Add spark-SQL Catalog.refreshTable into python api
> --
>
> Key: SPARK-15820
> URL: https://issues.apache.org/jira/browse/SPARK-15820
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Catalog.refreshTable API is missing from the Python interface for Spark 
> SQL; add it.
> see also: https://issues.apache.org/jira/browse/SPARK-15367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320304#comment-15320304
 ] 

Apache Spark commented on SPARK-15820:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/13558

> Add spark-SQL Catalog.refreshTable into python api
> --
>
> Key: SPARK-15820
> URL: https://issues.apache.org/jira/browse/SPARK-15820
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Catalog.refreshTable API is missing from the Python interface for Spark 
> SQL; add it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15820) Add spark-SQL Catalog.refreshTable into python api

2016-06-08 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15820:


Assignee: Apache Spark

> Add spark-SQL Catalog.refreshTable into python api
> --
>
> Key: SPARK-15820
> URL: https://issues.apache.org/jira/browse/SPARK-15820
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Weichen Xu
>Assignee: Apache Spark
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The Catalog.refreshTable API is missing in python interface for Spark-SQL, 
> add it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


