[jira] [Assigned] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13891: Assignee: (was: Apache Spark) > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194761#comment-15194761 ] Apache Spark commented on SPARK-13891: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/11714 > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer.
[jira] [Assigned] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13891: Assignee: Apache Spark > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer.
[jira] [Created] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
Xiao Li created SPARK-13891: --- Summary: Issue an exception when hitting max iteration limit in testing Key: SPARK-13891 URL: https://issues.apache.org/jira/browse/SPARK-13891 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li Issue an exception in the unit tests of Spark SQL when hitting the max iteration limit. Then, we can catch the infinite loop bugs in Analyzer and Optimizer.
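The proposal above can be sketched as a fixed-point rule loop that fails fast when tests hit the iteration cap. This is an illustrative sketch only, not Spark's actual RuleExecutor; the names `fixedPoint` and `failOnMaxIterations` are hypothetical:

```scala
// Illustrative sketch of the proposal in SPARK-13891 (hypothetical names,
// not Spark's real RuleExecutor): a fixed-point loop over a rewrite rule
// that throws when the iteration cap is reached, instead of silently
// returning the current result as production code does today.
def fixedPoint[T](start: T, maxIterations: Int, failOnMaxIterations: Boolean)(rule: T => T): T = {
  var current = start
  var iteration = 0
  var continue = true
  while (continue) {
    val next = rule(current)
    iteration += 1
    if (next == current) {
      continue = false                 // reached a fixed point: done
    } else if (iteration >= maxIterations) {
      if (failOnMaxIterations)         // e.g. enabled only under unit tests
        throw new IllegalStateException(
          s"Max iterations ($maxIterations) reached; possible infinite loop in a rule")
      continue = false                 // production: keep the lenient behavior
    }
    current = next
  }
  current
}

// Converges: counts 10 down to 0, then 0 is a fixed point.
val converged = fixedPoint(10, 100, failOnMaxIterations = true)(n => if (n > 0) n - 1 else n)
```

With `failOnMaxIterations = false` the loop keeps today's lenient behavior; enabling it only under the test configuration is the behavior change the issue proposes, so a rule that never converges surfaces as a test failure rather than a silently truncated rewrite.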
[jira] [Commented] (SPARK-13034) PySpark ml.classification support export/import
[ https://issues.apache.org/jira/browse/SPARK-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194746#comment-15194746 ] Apache Spark commented on SPARK-13034: -- User 'GayathriMurali' has created a pull request for this issue: https://github.com/apache/spark/pull/11707 > PySpark ml.classification support export/import > --- > > Key: SPARK-13034 > URL: https://issues.apache.org/jira/browse/SPARK-13034 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Yanbo Liang >Priority: Minor > > Add export/import for all estimators and transformers (which have a Scala > implementation) under pyspark/ml/classification.py. Please refer to the > implementation in SPARK-13032.
[jira] [Resolved] (SPARK-13353) Fast serialization for collecting DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13353. - Resolution: Fixed Assignee: Davies Liu Fix Version/s: 2.0.0 > Fast serialization for collecting DataFrame > --- > > Key: SPARK-13353 > URL: https://issues.apache.org/jira/browse/SPARK-13353 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > UnsafeRowSerializer should be more efficient than JavaSerializer or > KryoSerializer for DataFrame.
[jira] [Resolved] (SPARK-13661) Avoid the copy of UnsafeRow in HashedRelation
[ https://issues.apache.org/jira/browse/SPARK-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13661. - Resolution: Fixed Assignee: Davies Liu Fix Version/s: 2.0.0 > Avoid the copy of UnsafeRow in HashedRelation > - > > Key: SPARK-13661 > URL: https://issues.apache.org/jira/browse/SPARK-13661 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > We usually build the HashedRelation on top of an array of UnsafeRow, so the > copy could be avoided. > The caller of HashedRelation needs to do the copy if it's needed. > Another approach could be making the copy() of UnsafeRow smart so that it > knows when to copy the bytes, which could also be useful for other > components.
[jira] [Assigned] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13889: Assignee: Apache Spark > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang >Assignee: Apache Spark > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Commented] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194737#comment-15194737 ] Apache Spark commented on SPARK-13889: -- User 'carsonwang' has created a pull request for this issue: https://github.com/apache/spark/pull/11713 > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Assigned] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13889: Assignee: (was: Apache Spark) > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Commented] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194735#comment-15194735 ] Apache Spark commented on SPARK-13890: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11712 > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Assigned] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13890: Assignee: Apache Spark (was: Reynold Xin) > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Assigned] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13890: Assignee: Reynold Xin (was: Apache Spark) > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Commented] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194734#comment-15194734 ] Joseph K. Bradley commented on SPARK-13712: --- Hm, my understanding is that ECOC-based ones can achieve performance similar to one-vs-one. See, e.g., Allwein et al. "Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers" JMLR 2001. [http://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf] It's possible those results rely on soft predictions from classifiers, but I don't think they do. I'd need to refresh on that material to recall for sure. > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > The OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
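For intuition, the K*(K-1)/2 figure in the description comes from training one binary model per unordered pair of classes. A minimal sketch (the proposed `OneVsOne` estimator itself does not exist yet; `classPairs` is a hypothetical helper):

```scala
// Enumerate the classifier pairs a one-vs-one meta-learner would train:
// one binary model per unordered pair of class labels.
def classPairs(numClasses: Int): Seq[(Int, Int)] =
  for {
    i <- 0 until numClasses
    j <- (i + 1) until numClasses
  } yield (i, j)

val k = 4
val pairs = classPairs(k)
// pairs: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
// i.e. k * (k - 1) / 2 = 6 models, each trained only on the rows
// belonging to its two classes (roughly 2/K of the data, as noted above).
```

This makes the trade-off in the description concrete: more models than one-vs-rest's K, but each fit on a much smaller, class-balanced subset.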
[jira] [Created] (SPARK-13890) Remove some internal classes' dependency on SQLContext
Reynold Xin created SPARK-13890: --- Summary: Remove some internal classes' dependency on SQLContext Key: SPARK-13890 URL: https://issues.apache.org/jira/browse/SPARK-13890 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin In general it is better for internal classes to not depend on the external class (in this case SQLContext) to reduce coupling between user-facing APIs and the internal implementations.
[jira] [Commented] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194731#comment-15194731 ] Apache Spark commented on SPARK-13888: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11711 > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Assigned] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13888: Assignee: Apache Spark > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu >Assignee: Apache Spark > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Assigned] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13888: Assignee: (was: Apache Spark) > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Created] (SPARK-13888) Remove Streaming Akka docs from Spark
Shixiong Zhu created SPARK-13888: Summary: Remove Streaming Akka docs from Spark Key: SPARK-13888 URL: https://issues.apache.org/jira/browse/SPARK-13888 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Shixiong Zhu I have copied the docs of Streaming Akka to https://github.com/spark-packages/dstream-akka/blob/master/README.md So we can remove them from Spark now.
[jira] [Created] (SPARK-13889) Integer overflow when calculating the max number of executor failure
Carson Wang created SPARK-13889: --- Summary: Integer overflow when calculating the max number of executor failure Key: SPARK-13889 URL: https://issues.apache.org/jira/browse/SPARK-13889 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.6.1, 1.6.0 Reporter: Carson Wang The max number of executor failures before failing the application defaults to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, so this causes an integer overflow and a wrong result.
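The arithmetic behind this bug is easy to reproduce on its own. A minimal sketch of the failure mode, not Spark's code; the `Long`-widening clamp shown is one possible fix, not necessarily the one in the linked pull request:

```scala
// The default for spark.dynamicAllocation.maxExecutors is Int.MaxValue;
// doubling it in Int arithmetic wraps around, so the failure threshold
// "2 * maxExecutors" becomes negative instead of a large positive number.
val maxExecutors = Int.MaxValue
val maxFailures  = 2 * maxExecutors            // wraps to -2

// One possible fix (an assumption, not necessarily Spark's actual patch):
// widen to Long before doubling, then clamp back into Int range.
val safeMaxFailures = math.min(2L * maxExecutors, Int.MaxValue.toLong).toInt
```

With the overflowed value, any executor failure count already exceeds the negative "limit", which is the wrong result the issue describes.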
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Isn't there another way to handle BinaryType by using a Scala type > instead of Array? > I want to contribute by fixing this issue.
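The root cause of the behavior reported above is that `==` on Scala's `Array` is reference equality, so collections containing arrays never compare element-wise. A minimal sketch of the mismatch; `java.util.Arrays.equals` is one standard JDK workaround, not necessarily the fix Spark will adopt:

```scala
// Array[Byte] uses reference equality for ==, so two structurally equal
// byte arrays do not compare equal, and neither do collections holding them.
val x = Array(1.toByte)
val y = Array(1.toByte)

val byReference = x == y                        // false: different references
val byElements  = x.sameElements(y)             // true: element-wise comparison
val byJdk       = java.util.Arrays.equals(x, y) // true: JDK deep comparison

// Seq compares its elements with ==, so the mismatch propagates upward,
// which is what Row.equals hits for an ArrayType of BinaryType.
val seqEqual = Seq(x) == Seq(y)                 // false
```

Any fix therefore needs a deep comparison (e.g. `sameElements` or `java.util.Arrays.equals`) wherever a `BinaryType` value may appear inside an `ArrayType` or `MapType`.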
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? > I think this is to differentiate between (ArrayType of Byte) and (BinaryType). > Isn't there another way to handle BinaryType by using a Scala type instead of > Array? > I want to contribute by fixing this issue.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? All this overhead is here because Array is used and needs special handling. I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? > I think this is to differentiate between (ArrayType of Byte) and (BinaryType). > I want to contribute by fixing this issue.
[jira] [Commented] (SPARK-5433) Spark EC2 doesn't mount local disks for all instance types
[ https://issues.apache.org/jira/browse/SPARK-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194722#comment-15194722 ] Geet Kumar commented on SPARK-5433: --- The spark-ec2 script does not mount disks for the d2.xlarge instance types. > Spark EC2 doesn't mount local disks for all instance types > -- > > Key: SPARK-5433 > URL: https://issues.apache.org/jira/browse/SPARK-5433 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.2.0 >Reporter: Tomer Kaftan >Priority: Critical > > Launching a cluster using spark-ec2 will currently mount all local disks for > the r3 instance types. > Branch 1.3.0 of the ec2 scripts has also been updated to mount one local ssd > disk for the i2 instance types. > At the very least the i2 instance types need to have all local disks mounted. > We also need to find out if there are any other instance types that also need > to be updated.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question: Can the key in MapType be of BinaryType? Question: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? All this overhead is here because Array is used and needs special handling. I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question: Can the key in MapType be of BinaryType? I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question: Can the key in MapType be of BinaryType? > Question: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? > All this overhead is here because Array is used and needs special handling. > I want to contribute by fixing this issue.
[jira] [Created] (SPARK-13887) PyLint should fail fast to make errors easier to discover
holdenk created SPARK-13887: --- Summary: PyLint should fail fast to make errors easier to discover Key: SPARK-13887 URL: https://issues.apache.org/jira/browse/SPARK-13887 Project: Spark Issue Type: Improvement Reporter: holdenk Priority: Minor Right now our PyLint script runs all of the checks and then returns the output. This can make it difficult to find the part that errored, and it complicates the script a bit. We can simplify our script to fail fast, which will both simplify the script and make it easier to discover the errors.
[jira] [Updated] (SPARK-13887) PyLint should fail fast to make errors easier to discover
[ https://issues.apache.org/jira/browse/SPARK-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-13887: Component/s: PySpark Build > PyLint should fail fast to make errors easier to discover > - > > Key: SPARK-13887 > URL: https://issues.apache.org/jira/browse/SPARK-13887 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark > Reporter: holdenk > Priority: Minor > > Right now our PyLint script runs all of the checks and then returns the > output. This can make it difficult to find the part that errored, and it > complicates the script a bit. We can simplify our script to fail fast, which > will both simplify the script and make it easier to discover the errors.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Summary: ArrayType of BinaryType not supported in Row.equals method (was: Binary Type) > ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: MahmoudHanafy > Priority: Minor > > There are multiple types that are supported by Spark SQL. > One of them is ArrayType(Seq), which can have any element type, > so it can be BinaryType(Array\[Byte\]). > The equals method in the Row class has no handling for > ArrayType of BinaryType. > So, for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > I can fix this issue.
[jira] [Created] (SPARK-13886) Binary Type
MahmoudHanafy created SPARK-13886: - Summary: Binary Type Key: SPARK-13886 URL: https://issues.apache.org/jira/browse/SPARK-13886 Project: Spark Issue Type: Bug Components: SQL Reporter: MahmoudHanafy Priority: Minor There are multiple types that are supported by Spark SQL. One of them is ArrayType(Seq), which can have any element type, so it can be BinaryType(Array\[Byte\]). The equals method in the Row class has no handling for ArrayType of BinaryType. So, for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} I can fix this issue.
[jira] [Created] (SPARK-13885) Spark On Yarn attempt id representation regression
Saisai Shao created SPARK-13885: --- Summary: Spark On Yarn attempt id representation regression Key: SPARK-13885 URL: https://issues.apache.org/jira/browse/SPARK-13885 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 2.0.0 Reporter: Saisai Shao Due to the change of attempt id representation in SPARK-11314, the attempt ids that were previously ("1", "2") are now (appattempt-xxx-1, appattempt-xxx-2). This affects every part that uses the attempt id, especially the event log file name and the history server URL link. So we should change back to the original representation to fix this regression.
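For reference, recovering the short numeric form from a full YARN attempt id could look like the sketch below. This is illustrative only, not the SPARK-11314 fix itself, and it assumes the underscore-separated format YARN actually emits (`appattempt_<clusterTimestamp>_<appId>_<attemptNumber>`); the function name is invented.

```scala
// Illustrative only: derive the short attempt id ("1", "2", ...) that the
// event log file name and history server URL used previously, from the
// full YARN attempt id string. Assumed input format:
// appattempt_<clusterTimestamp>_<appId>_<attemptNumber>
def shortAttemptId(full: String): String = {
  val last = full.substring(full.lastIndexOf('_') + 1) // e.g. "000002"
  last.toInt.toString                                  // drop leading zeros
}
```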
[jira] [Commented] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194665#comment-15194665 ] KaiXinXIaoLei commented on SPARK-13102: --- I used F12 to debug, and the error info is: SCRIPT5009: "d3" is undefined, file: spark-dag-viz.js, line: 295, column: 29 > Run query using ThriftServer, and open web using IE11, I click "+detail" in > SQLPage, but no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI > Affects Versions: 1.6.0 > Reporter: KaiXinXIaoLei > Attachments: dag info is blank.png, details in SQLPage.png > > > I run a query using ThriftServer and open the web UI using IE11. Then I click > "+detail" in SQLPage, but there is no response. And I click "DAG Visualization" in > StagesPage, but get nothing.
[jira] [Commented] (SPARK-13877) Consider removing Kafka modules from Spark / Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194653#comment-15194653 ] Saisai Shao commented on SPARK-13877: - I agree with moving this out, so it could easily support different versions. Currently, if we want to introduce Kafka 0.9 support, we either drop the 0.8 support or maintain two modules; neither way is elegant. Maintaining it outside of Spark would be a good choice. > Consider removing Kafka modules from Spark / Spark Streaming > > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming > Affects Versions: 1.6.1 > Reporter: Hari Shreedharan > > Based on the discussion on the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Commented] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194647#comment-15194647 ] Apache Spark commented on SPARK-13884: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11710 > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Reynold Xin > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Assigned] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13884: Assignee: Apache Spark (was: Reynold Xin) > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Apache Spark > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Assigned] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13884: Assignee: Reynold Xin (was: Apache Spark) > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Reynold Xin > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Created] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
Reynold Xin created SPARK-13884: --- Summary: Remove DescribeCommand's dependency on LogicalPlan Key: SPARK-13884 URL: https://issues.apache.org/jira/browse/SPARK-13884 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin DescribeCommand should just take a TableIdentifier and ask the metadata catalog for the table's information.
[jira] [Updated] (SPARK-13034) PySpark ml.classification support export/import
[ https://issues.apache.org/jira/browse/SPARK-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-13034: -- Shepherd: Joseph K. Bradley Target Version/s: 2.0.0 > PySpark ml.classification support export/import > --- > > Key: SPARK-13034 > URL: https://issues.apache.org/jira/browse/SPARK-13034 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark > Reporter: Yanbo Liang > Priority: Minor > > Add export/import for all estimators and transformers (which have a Scala > implementation) under pyspark/ml/classification.py. Please refer to the > implementation in SPARK-13032.
[jira] [Assigned] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13883: Assignee: Michael Armbrust (was: Apache Spark) > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Commented] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194638#comment-15194638 ] Apache Spark commented on SPARK-13883: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/11709 > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Assigned] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13883: Assignee: Apache Spark (was: Michael Armbrust) > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Apache Spark > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Created] (SPARK-13883) buildReader implementation for parquet
Michael Armbrust created SPARK-13883: Summary: buildReader implementation for parquet Key: SPARK-13883 URL: https://issues.apache.org/jira/browse/SPARK-13883 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Port parquet to the new strategy
[jira] [Resolved] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13791. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11625 [https://github.com/apache/spark/pull/11625] > Add MetadataLog and HDFSMetadataLog > --- > > Key: SPARK-13791 > URL: https://issues.apache.org/jira/browse/SPARK-13791 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > - Add a MetadataLog interface for reliable metadata storage. > - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. > - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata > by itself.
[jira] [Updated] (SPARK-10380) Confusing examples in pyspark SQL docs
[ https://issues.apache.org/jira/browse/SPARK-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10380: - Assignee: Reynold Xin > Confusing examples in pyspark SQL docs > -- > > Key: SPARK-10380 > URL: https://issues.apache.org/jira/browse/SPARK-10380 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Michael Armbrust >Assignee: Reynold Xin >Priority: Minor > Labels: docs, starter > Fix For: 2.0.0 > > > There’s an error in the astype() documentation, as it uses cast instead of > astype. It should probably include a mention that astype is an alias for cast > (and vice versa in the cast documentation): > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.astype > > The same error occurs with drop_duplicates and dropDuplicates: > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.drop_duplicates > > The issue here is we are copying the code. According to [~davies] the > easiest way is to copy the method and just add new docs.
[jira] [Resolved] (SPARK-10380) Confusing examples in pyspark SQL docs
[ https://issues.apache.org/jira/browse/SPARK-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10380. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11698 [https://github.com/apache/spark/pull/11698] > Confusing examples in pyspark SQL docs > -- > > Key: SPARK-10380 > URL: https://issues.apache.org/jira/browse/SPARK-10380 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Michael Armbrust >Priority: Minor > Labels: docs, starter > Fix For: 2.0.0 > > > There’s an error in the astype() documentation, as it uses cast instead of > astype. It should probably include a mention that astype is an alias for cast > (and vice versa in the cast documentation): > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.astype > > The same error occurs with drop_duplicates and dropDuplicates: > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.drop_duplicates > > The issue here is we are copying the code. According to [~davies] the > easiest way is to copy the method and just add new docs.
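The "copy the method and just add new docs" approach mentioned above can be sketched in plain Python (a hypothetical illustration, not Spark's actual source; the `Column` stub, `make_alias` helper, and docstrings are invented here). The point is that an alias needs its own function object so the generated API docs can describe it as an alias rather than repeating the original's examples verbatim:

```python
import functools

# Minimal stand-in for a column type with a cast method (hypothetical).
class Column:
    def cast(self, dataType):
        """Convert the column into type ``dataType``."""
        return "CAST(col AS %s)" % dataType

def make_alias(fn, name, doc):
    # Wrap the original so the alias shares its behavior exactly...
    @functools.wraps(fn)
    def alias(self, *args, **kwargs):
        return fn(self, *args, **kwargs)
    # ...but carries its own name and documentation for the doc generator.
    alias.__name__ = name
    alias.__doc__ = doc
    return alias

Column.astype = make_alias(Column.cast, "astype",
                           ":func:`astype` is an alias for :func:`cast`.")

c = Column()
assert c.astype("int") == c.cast("int")  # identical behavior
```

The alias behaves identically to the original but documents itself as an alias, which is what the issue asks the generated pyspark docs to show for `astype`/`cast` and `drop_duplicates`/`dropDuplicates`.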
[jira] [Resolved] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13882. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Resolved] (SPARK-11826) Subtract BlockMatrix
[ https://issues.apache.org/jira/browse/SPARK-11826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-11826. --- Resolution: Fixed Fix Version/s: 2.0.0 > Subtract BlockMatrix > > > Key: SPARK-11826 > URL: https://issues.apache.org/jira/browse/SPARK-11826 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Ehsan Mohyedin Kermani >Assignee: Ehsan Mohyedin Kermani >Priority: Minor > Fix For: 2.0.0 > > > It'd be more convenient to have a subtract method for BlockMatrices.
[jira] [Resolved] (SPARK-13664) Simplify and Speedup HadoopFSRelation
[ https://issues.apache.org/jira/browse/SPARK-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13664. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11646 [https://github.com/apache/spark/pull/11646] > Simplify and Speedup HadoopFSRelation > - > > Key: SPARK-13664 > URL: https://issues.apache.org/jira/browse/SPARK-13664 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust >Priority: Blocker > Fix For: 2.0.0 > > > A majority of Spark SQL queries likely run through {{HadoopFSRelation}}, > however there are currently several complexity and performance problems with > this code path: > - The class mixes the concerns of file management, schema reconciliation, > scan building, bucketing, partitioning, and writing data. > - For very large tables, we are broadcasting the entire list of files to > every executor. [SPARK-11441] > - For partitioned tables, we always do an extra projection. This results > not only in a copy, but also undoes much of the performance gains that we are > going to get from vectorized reads. > This is an umbrella ticket to track a set of improvements to this codepath.
[jira] [Updated] (SPARK-13877) Consider removing Kafka modules from Spark / Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-13877: -- Summary: Consider removing Kafka modules from Spark / Spark Streaming (was: Consider removing Kafka modules from Spark) > Consider removing Kafka modules from Spark / Spark Streaming > > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming >Affects Versions: 1.6.1 >Reporter: Hari Shreedharan > > Based on the discussion in the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Updated] (SPARK-13877) Consider removing Kafka modules from Spark
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-13877: -- Component/s: Streaming > Consider removing Kafka modules from Spark > -- > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming >Affects Versions: 1.6.1 >Reporter: Hari Shreedharan > > Based on the discussion in the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Issue Comment Deleted] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-13712: - Comment: was deleted (was: OK, I have closed the PR. I had also planned to implement ECC after this PR. In general, OneVsOne is the slowest among the three methods, but it generates the highest accuracy. ECC is the fastest one (about log(num_class) submodels) with the lowest accuracy. OneVsRest is in the middle of them in both speed and accuracy. In most cases, num_class is a small number, and so OneVsOne is useful. Suppose there are 3 classes: OneVsOne is even faster than OneVsRest. So I think it may be a useful choice for users.) > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
[jira] [Commented] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194554#comment-15194554 ] zhengruifeng commented on SPARK-13712: -- OK, I have closed the PR. I had also planned to implement ECC after this PR. In general, OneVsOne is the slowest among the three methods, but it generates the highest accuracy. ECC is the fastest one (about log(num_class) submodels) with the lowest accuracy. OneVsRest is in the middle of them in both speed and accuracy. In most cases, num_class is a small number, and so OneVsOne is useful. Suppose there are 3 classes: OneVsOne is even faster than OneVsRest. So I think it may be a useful choice for users. > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
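The trade-off discussed in the comment above can be made concrete with a small plain-Python sketch of the one-vs-one scheme (a hypothetical illustration, not the proposed Spark API; `fit_binary` is a toy nearest-class-mean learner invented here). K classes yield K*(K-1)/2 pairwise models, each trained only on the balanced two-class subset, and prediction is by majority vote among the pairwise winners:

```python
from collections import Counter
from itertools import combinations

def fit_one_vs_one(X, y, fit_binary):
    """Return {(a, b): model}, one binary model per class pair."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Each pairwise model sees only the examples of its two classes.
        subset = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        models[(a, b)] = fit_binary([x for x, _ in subset],
                                    [label for _, label in subset])
    return models

def predict_one_vs_one(models, x):
    # Majority vote over the K*(K-1)/2 pairwise predictions.
    votes = Counter(model(x) for model in models.values())
    return votes.most_common(1)[0][0]

# Toy binary "learner" (hypothetical): predict the class whose mean is
# closest to the input value.
def fit_binary(X, y):
    a, b = sorted(set(y))
    mean = lambda c: sum(x for x, l in zip(X, y) if l == c) / y.count(c)
    ma, mb = mean(a), mean(b)
    return lambda x: a if abs(x - ma) <= abs(x - mb) else b

X = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1]
y = [0, 0, 1, 1, 2, 2]
models = fit_one_vs_one(X, y, fit_binary)
assert len(models) == 3  # 3 classes -> 3*(3-1)/2 = 3 pairwise models
```

With K = 3 the pairwise-model count equals OneVsRest's K models, and each pairwise model trains on only 2/K of the data, which is why the comment notes OneVsOne can even be faster than OneVsRest in that case.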
[jira] [Assigned] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13118: Assignee: (was: Apache Spark) > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194479#comment-15194479 ] Apache Spark commented on SPARK-13118: -- User 'jodersky' has created a pull request for this issue: https://github.com/apache/spark/pull/11708 > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Assigned] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13118: Assignee: Apache Spark > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust >Assignee: Apache Spark > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465 ] Jakob Odersky commented on SPARK-13118: --- Sure, I'll submit a PR with the test > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Updated] (SPARK-12718) SQL generation support for window functions
[ https://issues.apache.org/jira/browse/SPARK-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12718: --- Assignee: Wenchen Fan (was: Xiao Li) > SQL generation support for window functions > --- > > Key: SPARK-12718 > URL: https://issues.apache.org/jira/browse/SPARK-12718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Wenchen Fan > > {{HiveWindowFunctionQuerySuite}} and {{HiveWindowFunctionQueryFileSuite}} can > be useful for bootstrapping test coverage. Please refer to SPARK-11012 for > more details.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194436#comment-15194436 ] Michael Armbrust commented on SPARK-13118: -- It's likely that we have fixed this with other refactorings. If you add that regression test I think we can close this. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Updated] (SPARK-13244) Unify DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13244: Issue Type: Sub-task (was: Improvement) Parent: SPARK-13485 > Unify DataFrame and Dataset API > --- > > Key: SPARK-13244 > URL: https://issues.apache.org/jira/browse/SPARK-13244 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > A {{DataFrame}} is essentially a {{Dataset\[Row\]}}. However, to keep binary > compatibility, {{DataFrame}} didn't extend from {{Dataset\[Row\]}} in 1.6. > In Spark 2.0, they should be unified to minimize concepts.
[jira] [Updated] (SPARK-13843) Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
[ https://issues.apache.org/jira/browse/SPARK-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13843: Summary: Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages (was: Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages) > Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, > streaming-twitter to Spark packages > --- > > Key: SPARK-13843 > URL: https://issues.apache.org/jira/browse/SPARK-13843 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > Currently there are a few sub-projects, each for integrating with different > external sources for Streaming. Now that we have better ability to include > external libraries (Spark packages) and with Spark 2.0 coming up, we can move > the following projects out of Spark to https://github.com/spark-packages > - streaming-flume > - streaming-akka > - streaming-mqtt > - streaming-zeromq > - streaming-twitter > They are just some ancillary packages and considering the overhead of > maintenance, running tests and PR failures, it's better to maintain them out > of Spark. In addition, these projects can have their different release cycles > and we can release them faster.
[jira] [Resolved] (SPARK-13843) Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
[ https://issues.apache.org/jira/browse/SPARK-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13843. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, > streaming-twitter to Spark packages > - > > Key: SPARK-13843 > URL: https://issues.apache.org/jira/browse/SPARK-13843 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > Currently there are a few sub-projects, each for integrating with different > external sources for Streaming. Now that we have better ability to include > external libraries (Spark packages) and with Spark 2.0 coming up, we can move > the following projects out of Spark to https://github.com/spark-packages > - streaming-flume > - streaming-akka > - streaming-mqtt > - streaming-zeromq > - streaming-twitter > They are just some ancillary packages and considering the overhead of > maintenance, running tests and PR failures, it's better to maintain them out > of Spark. In addition, these projects can have their different release cycles > and we can release them faster.
[jira] [Commented] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194404#comment-15194404 ] Apache Spark commented on SPARK-13882: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11705 > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13882: Assignee: Reynold Xin (was: Apache Spark) > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13882: Assignee: Apache Spark (was: Reynold Xin) > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Created] (SPARK-13882) Remove org.apache.spark.sql.execution.local
Reynold Xin created SPARK-13882: --- Summary: Remove org.apache.spark.sql.execution.local Key: SPARK-13882 URL: https://issues.apache.org/jira/browse/SPARK-13882 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin We introduced some local operators in org.apache.spark.sql.execution.local package but never fully wired the engine to actually use these. We still plan to implement a full local mode, but it's probably going to be fairly different from what the current iterator-based local mode would look like. Let's just remove them for now, and we can always re-introduce them in the future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13880: Assignee: Reynold Xin (was: Apache Spark) > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13881: Assignee: Apache Spark (was: Reynold Xin) > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13881: Assignee: Reynold Xin (was: Apache Spark) > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194397#comment-15194397 ] Apache Spark commented on SPARK-13881: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11704 > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13880: Assignee: Apache Spark (was: Reynold Xin) > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194396#comment-15194396 ] Apache Spark commented on SPARK-13880: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11704 > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13881) Remove LegacyFunctions
Reynold Xin created SPARK-13881: --- Summary: Remove LegacyFunctions Key: SPARK-13881 URL: https://issues.apache.org/jira/browse/SPARK-13881 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin It was introduced in Spark 1.6 for backward compatibility. We can remove it now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
Reynold Xin created SPARK-13880: --- Summary: Rename DataFrame.scala as Dataset.scala Key: SPARK-13880 URL: https://issues.apache.org/jira/browse/SPARK-13880 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13879) Decide DDL/DML commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13879: Summary: Decide DDL/DML commands that need Spark native implementation in 2.0 (was: Decide commands that need Spark native implementation in 2.0) > Decide DDL/DML commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Commented] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194382#comment-15194382 ] Yin Huai commented on SPARK-13879: -- Just uploaded the initial doc (https://issues.apache.org/jira/secure/attachment/12793435/Implementing%20native%20DDL%20and%20DML%20statements%20for%20Spark%202.pdf). Please note that for Hive tables, we need to implement some commands that we do not want to support for Data Source tables for now. Those commands will be summarized in another doc. > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Closed] (SPARK-13854) Add constraints to outer join
[ https://issues.apache.org/jira/browse/SPARK-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-13854. --- Resolution: Not A Problem > Add constraints to outer join > - > > Key: SPARK-13854 > URL: https://issues.apache.org/jira/browse/SPARK-13854 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > > Currently, for left outer join we only keep left side constraint. For right > outer join, we only keep right side constraints. For full outer join, the > constraints are empty. > In fact, the constraints are less than the actual constraints for the join > operator. > For example, for left outer join, besides the constraints from left side, the > constraints of right side should be inherited with a bit modification. > Consider a join as following: > {code} > val tr1 = LocalRelation('a.int, 'b.int, 'c.int).subquery('tr1) > val tr2 = LocalRelation('a.int, 'd.int, 'e.int).subquery('tr2) > tr1.where('a.attr > 10) > .join(tr2.where('d.attr < 100), LeftOuter, Some("tr1.a".attr === > "tr2.a".attr)) > {code} > The constraints are not only "a" > 10, "a" is not null. It should also > include ("d" is null || "d" < 100). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
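The constraint argument in SPARK-13854 can be checked outside Spark. The following plain-Python simulation (illustrative only; made-up rows, not Catalyst code) runs the ticket's left outer join and confirms that the left side's constraints survive unchanged while the right side's filter survives only in the weakened form `d IS NULL OR d < 100`:

```python
# Simulate tr1.where(a > 10) LEFT OUTER JOIN tr2.where(d < 100) ON tr1.a = tr2.a
def left_outer_join(left, right, key):
    """Naive left outer join on one key; unmatched left rows get NULL (None)
    for the right side's columns."""
    out = []
    for lrow in left:
        matches = [r for r in right if r[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **r} for r in matches)
        else:
            out.append({**lrow, "d": None, "e": None})
    return out

# Hypothetical rows for tr1(a, b, c) and tr2(a, d, e).
tr1 = [{"a": 11, "b": 1, "c": 1}, {"a": 20, "b": 2, "c": 2}]
tr2 = [{"a": 11, "d": 50, "e": 5}, {"a": 99, "d": 150, "e": 6}]

rows = left_outer_join(
    [r for r in tr1 if r["a"] > 10],     # tr1.where('a > 10)
    [r for r in tr2 if r["d"] < 100],    # tr2.where('d < 100)
    "a",
)

# Left-side constraints hold on every output row...
assert all(r["a"] is not None and r["a"] > 10 for r in rows)
# ...and the right-side constraint holds only in weakened (null-tolerant) form:
assert all(r["d"] is None or r["d"] < 100 for r in rows)
```

This is exactly the ticket's point: the inherited right-side constraint is not `d < 100` but `d IS NULL OR d < 100`, since unmatched left rows null out the right side.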
[jira] [Updated] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13879: - Attachment: Implementing native DDL and DML statements for Spark 2.pdf > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13859) TPCDS query 38 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13859: Description: Testing Spark SQL using TPC queries. Query 38 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 0, answer set reports 107. Actual results: {noformat} [0] {noformat} Expected: {noformat} +-+ | 1 | +-+ | 107 | +-+ {noformat} query used: {noformat} -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION select count(*) from ( select distinct c_last_name, c_first_name, d_date from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp1 JOIN (select distinct c_last_name, c_first_name, d_date from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name = tmp2.c_last_name) and (tmp1.c_first_name = tmp2.c_first_name) and (tmp1.d_date = tmp2.d_date) JOIN ( select distinct c_last_name, c_first_name, d_date from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp3 ON (tmp1.c_last_name = tmp3.c_last_name) and (tmp1.c_first_name = tmp3.c_first_name) and (tmp1.d_date = tmp3.d_date) limit 100 ; -- end query 38 in stream 0 using template query38.tpl {noformat} was: Testing Spark SQL using TPC queries. Query 38 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 0, answer set reports 107. 
Actual results: [0] Expected: +-+ | 1 | +-+ | 107 | +-+ query used: -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION select count(*) from ( select distinct c_last_name, c_first_name, d_date from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp1 JOIN (select distinct c_last_name, c_first_name, d_date from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name = tmp2.c_last_name) and (tmp1.c_first_name = tmp2.c_first_name) and (tmp1.d_date = tmp2.d_date) JOIN ( select distinct c_last_name, c_first_name, d_date from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp3 ON (tmp1.c_last_name = tmp3.c_last_name) and (tmp1.c_first_name = tmp3.c_first_name) and (tmp1.d_date = tmp3.d_date) limit 100 ; -- end query 38 in stream 0 using template query38.tpl > TPCDS query 38 returns wrong results compared to TPC official result set > - > > Key: SPARK-13859 > URL: https://issues.apache.org/jira/browse/SPARK-13859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 38 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 0, answer set reports 107. 
> Actual results: > {noformat} > [0] > {noformat} > Expected: > {noformat} > +-+ > | 1 | > +-+ > | 107 | > +-+ > {noformat} > query used: > {noformat} > -- start query 38 in stream 0 using template query38.tpl and seed > QUALIFICATION > select count(*) from ( > select distinct c_last_name, c_first_name, d_date > from store_sales > JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > where d_month_seq between 1200 and 1200 + 11) tmp1 > JOIN > (select distinct c_last_name, c_first_name, d_date > from catalog_sales > JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk > JOIN customer ON catalog_sales.cs_bill_customer_sk = > customer.c_customer_sk > where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name =
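Query 38's semantics can be restated compactly: each subquery produces a DISTINCT set of (c_last_name, c_first_name, d_date) tuples for one sales channel, and the inner joins on all three columns keep only tuples present in every channel, i.e. a three-way set intersection. A toy sketch (hypothetical tuples, not the TPCDS data) of what a correct engine must compute:

```python
# Distinct (last_name, first_name, d_date) tuples per channel; values invented.
store = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03"),
         ("Diaz", "Cy", "2000-03-04")}
catalog = {("Smith", "Ann", "2000-01-02"), ("Diaz", "Cy", "2000-03-04")}
web = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03")}

# Tuples appearing in all three channels -- the inner joins in query 38.
count = len(store & catalog & web)
assert count == 1  # only ("Smith", "Ann", "2000-01-02")
```

A count of 0 against data where the answer set expects 107 therefore suggests one of the three joins is incorrectly dropping all matching tuples.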
[jira] [Updated] (SPARK-13858) TPCDS query 21 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13858: Description: Testing Spark SQL using TPC queries. Query 21 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL missing at least one row (grep for ABDA) ; I believe 2 other rows are missing as well. Actual results: {noformat} [null,AABD,2565,1922] [null,AAHD,2956,2052] [null,AALA,2042,1793] [null,ACGC,2373,1771] [null,ACKC,2321,1856] [null,ACOB,1504,1397] [null,ADKB,1820,2163] [null,AEAD,2631,1965] [null,AEOC,1659,1798] [null,AFAC,1965,1705] [null,AFAD,1769,1313] [null,AHDE,2700,1985] [null,AHHA,1578,1082] [null,AIEC,1756,1804] [null,AIMC,3603,2951] [null,AJAC,2109,1989] [null,AJKB,2573,3540] [null,ALBE,3458,2992] [null,ALCE,1720,1810] [null,ALEC,2569,1946] [null,ALNB,2552,1750] [null,ANFE,2022,2269] [null,AOIB,2982,2540] [null,APJB,2344,2593] [null,BAPD,2182,2787] [null,BDCE,2844,2069] [null,BDDD,2417,2537] [null,BDJA,1584,1666] [null,BEOD,2141,2649] [null,BFCC,2745,2020] [null,BFMB,1642,1364] [null,BHPC,1923,1780] [null,BIDB,1956,2836] [null,BIGB,2023,2344] [null,BIJB,1977,2728] [null,BJFE,1891,2390] [null,BLDE,1983,1797] [null,BNID,2485,2324] [null,BNLD,2385,2786] [null,BOMB,2291,2092] [null,CAAA,2233,2560] [null,CBCD,1540,2012] [null,CBIA,2394,2122] [null,CBPB,1790,1661] [null,CCMD,2654,2691] [null,CDBC,1804,2072] [null,CFEA,1941,1567] [null,CGFD,2123,2265] [null,CHPC,2933,2174] [null,CIGD,2618,2399] [null,CJCB,2728,2367] [null,CJLA,1350,1732] [null,CLAE,2578,2329] [null,CLGA,1842,1588] [null,CLLB,3418,2657] [null,CLOB,3115,2560] [null,CMAD,1991,2243] [null,CMJA,1261,1855] [null,CMLA,3288,2753] [null,CMPD,1320,1676] [null,CNGB,2340,2118] [null,CNHD,3519,3348] [null,CNPC,2561,1948] [null,DCPC,2664,2627] [null,DDHA,1313,1926] [null,DDND,1109,835] [null,DEAA,2141,1847] [null,DEJA,3142,2723] [null,DFKB,1470,1650] [null,DGCC,2113,2331] [null,DGFC,2201,2928] 
[null,DHPA,2467,2133] [null,DMBA,3085,2087] [null,DPAB,3494,3081] [null,EAEC,2133,2148] [null,EAPA,1560,1275] [null,ECGC,2815,3307] [null,EDPD,2731,1883] [null,EEEC,2024,1902] [null,EEMC,2624,2387] [null,EFFA,2047,1878] [null,EGJA,2403,2633] [null,EGMA,2784,2772] [null,EGOC,2389,1753] [null,EHFD,1940,1420] [null,EHLB,2320,2057] [null,EHPA,1898,1853] [null,EIPB,2930,2326] [null,EJAE,2582,1836] [null,EJIB,2257,1681] [null,EJJA,2791,1941] [null,EJJD,3410,2405] [null,EJNC,2472,2067] [null,EJPD,1219,1229] [null,EKEB,2047,1713] [null,EMEA,2502,1897] [null,EMKC,2362,2042] [null,ENAC,2011,1909] [null,ENFB,2507,2162] [null,ENOD,3371,2709] {noformat} Expected results: {noformat} +--+--++---+ | W_WAREHOUSE_NAME | I_ITEM_ID| INV_BEFORE | INV_AFTER | +--+--++---+ | Bad cards must make. | AACD | 1889 | 2168 | | Bad cards must make. | AAHD | 2739 | 2039 | | Bad cards must make. | ABDA | 1717 | 1782 | | Bad cards must make. | ACGC | 2296 | 2276 | | Bad cards must make. | ACKC | 2443 | 1878 | | Bad cards must make. | ACOB | 2705 | 2428 | | Bad cards must make. | ADGB | 2242 | 2759 | | Bad cards must make. | ADKB | 2138 | 2456 | | Bad cards must make. | AEAD | 2914 | 2237 | | Bad cards must make. | AEOC | 1797 | 2073 | | Bad cards must make. | AFAC | 2058 | 2734 | | Bad cards must make. | AFAD | 2173 | 2515 | | Bad cards must make. |
[jira] [Created] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
Yin Huai created SPARK-13879: Summary: Decide commands that need Spark native implementation in 2.0 Key: SPARK-13879 URL: https://issues.apache.org/jira/browse/SPARK-13879 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai This task aims to decide the commands that we currently delegate to Hive but need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13879: - Target Version/s: 2.0.0 > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13861) TPCDS query 40 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13861: Description: Testing Spark SQL using TPC queries. Query 40 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL missing at least one row (grep for ABBD) ; I believe 5 rows are missing in total. Actual results: {noformat} [TN,AABD,0.0,-82.060899353] [TN,AACD,-216.54000234603882,158.0399932861328] [TN,AAHD,186.54999542236328,0.0] [TN,AALA,0.0,48.2254223633] [TN,ACGC,63.67999863624573,0.0] [TN,ACHC,102.6830517578,51.8838964844] [TN,ACKC,128.9235150146,44.8169482422] [TN,ACLD,205.43999433517456,-948.619930267334] [TN,ACOB,207.32000732421875,24.88389648438] [TN,ACPD,87.75,53.9900016784668] [TN,ADGB,44.310001373291016,222.4800033569336] [TN,ADKB,0.0,-471.8699951171875] [TN,AEAD,58.2400016784668,0.0] [TN,AEOC,19.9084741211,214.7076293945] [TN,AFAC,271.8199977874756,163.1699981689453] [TN,AFAD,2.349046325684,28.3169482422] [TN,AFDC,-378.0499496459961,-303.26999282836914] [TN,AGID,307.6099967956543,-19.29915527344] [TN,AHDE,80.574468689,-476.7200012207031] [TN,AHHA,8.27457763672,155.1276565552] [TN,AHJB,39.23999857902527,0.0] [TN,AIEC,82.3675750732,3.910858306885] [TN,AIEE,20.39618530273,-151.08999633789062] [TN,AIMC,24.46313354492,-150.330517578] [TN,AJAC,49.0915258789,82.084741211] [TN,AJCA,121.18000221252441,63.779998779296875] [TN,AJKB,27.94534057617,8.97267028809] [TN,ALBE,88.2599983215332,30.22542236328] [TN,ALCE,93.5245776367,92.0198092651] [TN,ALEC,64.179019165,15.1584741211] [TN,ALNB,4.19809265137,148.27000427246094] [TN,AMBE,28.44534057617,0.0] [TN,AMPB,0.0,131.92999839782715] [TN,ANFE,0.0,-137.3400115966797] [TN,AOIB,150.40999603271484,254.288058548] [TN,APJB,45.2745776367,334.482015991] [TN,APLA,50.2076293945,29.150001049041748] [TN,APLD,0.0,32.3838964844] [TN,BAPD,93.41999816894531,145.8699951171875] [TN,BBID,296.774577637,30.95084472656] 
[TN,BDCE,-1771.0800704956055,-54.779998779296875] [TN,BDDD,111.12000274658203,280.5899963378906] [TN,BDJA,0.0,79.5423706055] [TN,BEFD,0.0,3.429475479126] [TN,BEOD,269.838964844,297.5800061225891] [TN,BFMB,110.82999801635742,-941.4000930786133] [TN,BFNA,47.8661035156,0.0] [TN,BFOC,46.3415258789,83.5245776367] [TN,BHPC,27.378392334,77.61999893188477] [TN,BIDB,196.6199951171875,5.57171661377] [TN,BIGB,425.3399963378906,0.0] [TN,BIJB,209.6300048828125,0.0] [TN,BJFE,7.32923706055,55.1584741211] [TN,BKFA,0.0,138.14000129699707] [TN,BKMC,27.17076293945,54.970001220703125] [TN,BLDE,170.28999400138855,0.0] [TN,BNHB,58.0594277954,-337.8899841308594] [TN,BNID,54.41525878906,35.01504089355] [TN,BNLA,0.0,168.37999629974365] [TN,BNLD,0.0,96.4084741211] [TN,BNMC,202.40999698638916,49.52999830245972] [TN,BOCC,4.73019073486,69.83999633789062] [TN,BOMB,63.66999816894531,163.49000668525696] [TN,CAAA,121.91000366210938,0.0] [TN,CAAD,-1107.6099338531494,0.0] [TN,CAJC,115.8046594238,173.0519073486] [TN,CBCD,18.94534057617,226.38000106811523] [TN,CBFA,0.0,97.41000366210938] [TN,CBIA,2.14104904175,84.66000366210938] [TN,CBPB,95.44000244140625,26.6830517578] [TN,CCAB,160.43000602722168,135.8661035156] [TN,CCHD,0.0,121.62000274658203] [TN,CCMD,-115.87000274658203,124.37999820709229] [TN,CDBC,16.628392334,3.399910593033] [TN,CDEC,-3114.599931716919,0.0] [TN,CEEA,34.6830517578,26.4084741211] [TN,CELA,130.58999633789062,154.6300048828125] [TN,CELD,0.0,181.07000732421875] [TN,CFEA,3.779713897705,-315.13000106811523] [TN,CGFD,-386.8699951171875,96.92000102996826] [TN,CHHD,143.17000675201416,251.6338964844] [TN,CHPC,0.1700178813934,198.2991552734] [TN,CJCB,-918.6500339508057,270.9600028991699]
[jira] [Commented] (SPARK-13865) TPCDS query 87 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194370#comment-15194370 ] Reynold Xin commented on SPARK-13865: - [~jfc...@us.ibm.com] thanks for filing these. Can you use the noformat tag in the future so the ticket is readable? > TPCDS query 87 returns wrong results compared to TPC official result set > - > > Key: SPARK-13865 > URL: https://issues.apache.org/jira/browse/SPARK-13865 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 87 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 47555, answer set expects 47298. > Actual results: > {noformat} > [47555] > {noformat} > {noformat} > Expected: > +---+ > | 1 | > +---+ > | 47298 | > +---+ > {noformat} > Query used: > {noformat} > -- start query 87 in stream 0 using template query87.tpl and seed > QUALIFICATION > select count(*) > from > (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as > ddate1, 1 as notnull1 >from store_sales > JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp1 >left outer join > (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as > ddate2, 1 as notnull2 >from catalog_sales > JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk > JOIN customer ON catalog_sales.cs_bill_customer_sk = > customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp2 > on (tmp1.cln1 = tmp2.cln2) > and (tmp1.cfn1 = tmp2.cfn2) > and (tmp1.ddate1= tmp2.ddate2) >left outer join > (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as > ddate3, 1 as notnull3 >from web_sales > JOIN date_dim ON web_sales.ws_sold_date_sk = 
date_dim.d_date_sk > JOIN customer ON web_sales.ws_bill_customer_sk = > customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp3 > on (tmp1.cln1 = tmp3.cln3) > and (tmp1.cfn1 = tmp3.cfn3) > and (tmp1.ddate1= tmp3.ddate3) > where > notnull2 is null and notnull3 is null > ; > -- end query 87 in stream 0 using template query87.tpl > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
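Query 87 has the complementary shape to query 38: two LEFT OUTER JOINs followed by `notnull2 IS NULL AND notnull3 IS NULL` keep exactly the store tuples that match neither the catalog set nor the web set, i.e. `store - (catalog ∪ web)`. A toy sketch (hypothetical tuples, not the TPCDS data):

```python
# Distinct (last_name, first_name, d_date) tuples per channel; values invented.
store = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03"),
         ("Diaz", "Cy", "2000-03-04")}
catalog = {("Smith", "Ann", "2000-01-02")}
web = {("Lee", "Bo", "2000-02-03")}

# Store tuples with no match in catalog AND no match in web:
# the two left outer joins plus the IS NULL filters in query 87.
count = len(store - (catalog | web))
assert count == 1  # only ("Diaz", "Cy", "2000-03-04") is store-only
```

Since the expected count (47298) is smaller than the reported one (47555), the symptom is consistent with some store tuples failing to match in the outer joins when they should.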
[jira] [Updated] (SPARK-13863) TPCDS query 66 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13863: Description: Testing Spark SQL using TPC queries. Query 66 returns wrong results compared to official result set. This is at 1GB SF (validation run). Aggregations slightly off -- eg. JAN_SALES column of "Doors canno" row - SparkSQL returns 6355232.185385704, expected 6355232.31 Actual results: {noformat} [null,null,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9597806.850651741,1.1121820530080795E7,8670867.81564045,8994785.945689201,1.088724806326294E7,1.4187671518377304E7,9732598.460139751,1.9798897020946026E7,2.1007842467959404E7,2.149551364927292E7,3.479566905774999E7,3.3122997954660416E7,null,null,null,null,null,null,null,null,null,null,null,null,2.191359469742E7,3.2518476414670944E7,2.48856624883976E7,2.5698343830046654E7,3.373591080598068E7,3.552703167087555E7,2.5465193481492043E7,5.362323870799959E7,5.1409986978201866E7,5.415917383586836E7,9.222704311805725E7,8.343539111531019E7] [Bad cards must make.,621234,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9506753.593884468,8008140.429557085,6116769.711647987,1.1973045160133362E7,7756254.925520897,5352978.574095726,1.373399613500309E7,1.6418794411203384E7,1.7212743279764652E7,1.704270732417488E7,3.43049358570323E7,3.532416421229005E7,15.30301560102066,12.890698882477594,9.846160563729589,19.273003667109915,12.485238936569628,8.61668642427125,22.107605403121994,26.429323590150222,27.707342611261865,27.433635834765774,55.22063482847413,56.86128610521969,3.0534943928382874E7,2.4481686250203133E7,2.217871080008793E7,2.569579825610423E7,2.995490355044937E7,1.8084140250833035E7,3.0805576178061485E7,4.7156887432252884E7,5.115858869637826E7,5.5759943171424866E7,8.625354428184557E7,8.345155532035494E7] [Conventional childr,977787,Fairview,Williamson County,TN,United 
States,DHL,BARIAN,2001,8860645.460736752,1.441581376543355E7,6761497.232810497,1.1820654735879421E7,8246260.600341797,6636877.482845306,1.1434492123092413E7,2.5673812070380323E7,2.307420611785E7,2.1834582007320404E7,2.6894900596512794E7,3.357509177109933E7,9.061938296108202,14.743306840276613,6.9151024024767125,12.08919195681618,8.43359606984118,6.787651587559771,11.694256645969329,26.257060147435304,23.598398219562938,22.330611889215547,27.505888906799534,34.337838170377935,2.3836085704864502E7,3.20733132298584E7,2.503790437837982E7,2.2659895963564873E7,2.175740087420273E7,2.4451608012176514E7,2.1933001734852314E7,5.59967034604629E7,5.737188052299309E7,6.208721474336243E7,8.284991027382469E7,8.897031933202875E7] [Doors canno,294242,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,6355232.185385704,1.0198920296742141E7,1.0246200903741479E7,1.2209716492156029E7,8566998.262890816,8806316.75278151,9789405.6993227,1.646658496404171E7,2.6443785668474197E7,2.701604788320923E7,3.366058958298761E7,2.7462468750599384E7,21.59865751791282,34.66167405313361,34.822360178837414,41.495491779406166,29.115484067165177,29.928823053070296,33.26991285854059,55.96272783641258,89.87087386734116,91.81574310672585,114.39763726112386,93.33293258813964,2.2645142994330406E7,2.448725452685547E7,2.4925759290207863E7,3.0503655031727314E7,2.6558160276379585E7,2.0976233452690125E7,2.9895796101181984E7,5.600219855566597E7,5.348815865275085E7,7.628723580410767E7,8.248374754962921E7,8.808826726185608E7] [Important issues liv,138504,Fairview,Williamson County,TN,United 
States,DHL,BARIAN,2001,1.1748784594717264E7,1.435130566355586E7,9896470.867572784,7990874.805492401,8879247.840401173,7362383.04259038,1.0011144724414349E7,1.7741201390372872E7,2.1346976135887742E7,1.8074978020030975E7,2.967512567988676E7,3.2545325348875403E7,84.8263197793368,103.6165429414014,71.45259969078715,57.694180713137534,64.10824120892663,53.156465102743454,72.28054586448297,128.09161750110374,154.12534032149065,130.5014874662896,214.25464737398747,234.97751219369408,2.7204167203903973E7,2.598037822457385E7,1.9943398915802002E7,2.5710421112384796E7,1.948448105346489E7,2.6346611484448195E7,2.5075158296625137E7,5.409477817043829E7,4.106673223178029E7,5.454705814340496E7,7.246596285337901E7,9.277032812079096E7] {noformat} Expected results: {noformat}
[jira] [Updated] (SPARK-13862) TPCDS query 49 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13862: Description: Testing Spark SQL using TPC queries. Query 49 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL has right answer but in wrong order (and there is an 'order by' in the query). Actual results: {noformat} store,9797,0.8000,2,2] [store,12641,0.81609195402298850575,3,3] [store,6661,0.92207792207792207792,7,7] [store,13013,0.94202898550724637681,8,8] [store,9029,1.,10,10] [web,15597,0.66197183098591549296,3,3] [store,14925,0.96470588235294117647,9,9] [store,4063,1.,10,10] [catalog,8929,0.7625,7,7] [store,11589,0.82653061224489795918,6,6] [store,1171,0.82417582417582417582,5,5] [store,9471,0.7750,1,1] [catalog,12577,0.65591397849462365591,3,3] [web,97,0.90361445783132530120,9,8] [web,85,0.85714285714285714286,8,7] [catalog,361,0.74647887323943661972,5,5] [web,2915,0.69863013698630136986,4,4] [web,117,0.9250,10,9] [catalog,9295,0.77894736842105263158,9,9] [web,3305,0.7375,6,16] [catalog,16215,0.79069767441860465116,10,10] [web,7539,0.5900,1,1] [catalog,17543,0.57142857142857142857,1,1] [catalog,3411,0.71641791044776119403,4,4] [web,11933,0.71717171717171717172,5,5] [catalog,14513,0.63541667,2,2] [store,15839,0.81632653061224489796,4,4] [web,3337,0.62650602409638554217,2,2] [web,5299,0.92708333,11,10] [catalog,8189,0.74698795180722891566,6,6] [catalog,14869,0.77173913043478260870,8,8] [web,483,0.8000,7,6] {noformat} Expected results: {noformat} +-+---++-+---+ | CHANNEL | ITEM | RETURN_RATIO | RETURN_RANK | CURRENCY_RANK | +-+---++-+---+ | catalog | 17543 | .5714285714285714 | 1 | 1 | | catalog | 14513 | .63541666 | 2 | 2 | | catalog | 12577 | .6559139784946236 | 3 | 3 | | catalog | 3411 | .7164179104477611 | 4 | 4 | | catalog | 361 | .7464788732394366 | 5 | 5 | | catalog | 8189 | .7469879518072289 | 6 | 6 | | catalog | 8929 | .7625 | 7 | 7 | | catalog | 14869 | 
.7717391304347826 | 8 | 8 | | catalog | 9295 | .7789473684210526 | 9 | 9 | | catalog | 16215 | .7906976744186046 | 10 |10 | | store | 9471 | .7750 | 1 | 1 | | store | 9797 | .8000 | 2 | 2 | | store | 12641 | .8160919540229885 | 3 | 3 | | store | 15839 | .8163265306122448 | 4 | 4 | | store | 1171 | .8241758241758241 | 5 | 5 | | store | 11589 | .8265306122448979 | 6 | 6 | | store | 6661 | .9220779220779220 | 7 | 7 | | store | 13013 | .9420289855072463 | 8 | 8 | | store | 14925 | .9647058823529411 | 9 | 9 | | store | 4063 | 1. | 10 |10 | | store | 9029 | 1. | 10 |10 | | web | 7539 | .5900 | 1 | 1 | | web | 3337 | .6265060240963855 | 2 | 2 | | web | 15597 | .6619718309859154 | 3 | 3 | | web | 2915 | .6986301369863013 | 4 | 4 | | web | 11933 | .7171717171717171 | 5 | 5 | | web | 3305 | .7375 | 6 |16 | | web | 483 | .8000 | 7 | 6 | | web |85 | .8571428571428571 | 8 | 7 | | web |97 | .9036144578313253 | 9 | 8 | | web | 117 | .9250 | 10 | 9 | | web | 5299 | .92708333 | 11 |10 | +-+---++-+---+ {noformat} Query used: {noformat} -- start query 49 in stream 0 using template query49.tpl and seed QUALIFICATION select 'web' as channel ,web.item ,web.return_ratio ,web.return_rank ,web.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select ws.ws_item_sk as item ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as
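A likely reason correct rows can still arrive in a surprising order: rank() gives tied return_ratio values the same rank, and rows that tie under the query's final sort keys have no defined relative order between them. A minimal Python sketch of SQL rank() semantics (illustrative only, not Spark's implementation):

```python
def sql_rank(values):
    """SQL rank() over (order by v): 1 + number of rows strictly smaller.

    Ties share a rank, and the rank after a tie group skips ahead.
    """
    return [1 + sum(1 for w in values if w < v) for v in values]

# Hypothetical return_ratio values with one tie.
ratios = [0.59, 0.62, 0.66, 0.66, 0.74]
print(sql_rank(ratios))  # [1, 2, 3, 3, 5] -- the tied rows share rank 3
```

Because the two tied rows both rank 3, an ORDER BY on the rank columns alone leaves their relative order up to the engine, which is consistent with "right answer but in wrong order" above.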
[jira] [Updated] (SPARK-13865) TPCDS query 87 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13865: Description: Testing Spark SQL using TPC queries. Query 87 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 47555, answer set expects 47298. Actual results: {noformat} [47555] {noformat} {noformat} Expected: +---+ | 1 | +---+ | 47298 | +---+ {noformat} Query used: {noformat} -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION select count(*) from (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1 from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp1 left outer join (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2 from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp2 on (tmp1.cln1 = tmp2.cln2) and (tmp1.cfn1 = tmp2.cfn2) and (tmp1.ddate1= tmp2.ddate2) left outer join (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as ddate3, 1 as notnull3 from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp3 on (tmp1.cln1 = tmp3.cln3) and (tmp1.cfn1 = tmp3.cfn3) and (tmp1.ddate1= tmp3.ddate3) where notnull2 is null and notnull3 is null ; -- end query 87 in stream 0 using template query87.tpl {noformat} was: Testing Spark SQL using TPC queries. Query 87 returns wrong results compared to official result set. This is at 1GB SF (validation run). 
SparkSQL returns count of 47555, answer set expects 47298. Actual results: [47555] Expected: +---+ | 1 | +---+ | 47298 | +---+ Query used: -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION select count(*) from (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1 from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp1 left outer join (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2 from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp2 on (tmp1.cln1 = tmp2.cln2) and (tmp1.cfn1 = tmp2.cfn2) and (tmp1.ddate1= tmp2.ddate2) left outer join (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as ddate3, 1 as notnull3 from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp3 on (tmp1.cln1 = tmp3.cln3) and (tmp1.cfn1 = tmp3.cfn3) and (tmp1.ddate1= tmp3.ddate3) where notnull2 is null and notnull3 is null ; -- end query 87 in stream 0 using template query87.tpl > TPCDS query 87 returns wrong results compared to TPC official result set > - > > Key: SPARK-13865 > URL: https://issues.apache.org/jira/browse/SPARK-13865 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 87 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 47555, answer set expects 47298. 
> Actual results: > {noformat} > [47555] > {noformat} > {noformat} > Expected: > +---+ > | 1 | > +---+ > | 47298 | > +---+ > {noformat} > Query used: > {noformat} > -- start query 87 in stream 0 using template query87.tpl and seed > QUALIFICATION > select count(*) > from > (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as > ddate1, 1 as notnull1 >from
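The left outer join + "notnull2 is null" pattern in query 87 is a hand-rolled anti-join: keep the distinct store_sales rows that find no match in catalog_sales, then (via the second join) none in web_sales either. A minimal Python sketch of that semantics on toy tuples (row values and the key function are hypothetical):

```python
def left_anti(left, right, key):
    """Rows of `left` with no matching key in `right` -- the semantics the
    left outer join plus IS NULL filter in query 87 expresses."""
    right_keys = {key(r) for r in right}
    return [row for row in left if key(row) not in right_keys]

# Toy (last_name, first_name, date) rows.
store = [("smith", "jan", "2000-01-01"), ("jones", "amy", "2000-01-02")]
catalog = [("smith", "jan", "2000-01-01")]
only_store = left_anti(store, catalog, key=lambda r: r)
print(only_store)  # [('jones', 'amy', '2000-01-02')]
```

One caveat the sketch glosses over: in SQL a NULL c_last_name never compares equal to anything (including another NULL), while a Python tuple containing None does match itself. Differences in NULL join-key semantics are a classic source of off-by-N counts in queries shaped like this one.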
[jira] [Updated] (SPARK-13864) TPCDS query 74 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13864: Description: Testing Spark SQL using TPC queries. Query 74 returns wrong results compared to official result set. This is at 1GB SF (validation run). Spark SQL has right answer but in wrong order (and there is an 'order by' in the query). Actual results: {noformat} [BLEIBAAA,Paula,Wakefield] [DFIEBAAA,John,Gray] [OCLBBAAA,null,null] [PKBCBAAA,Andrea,White] [EJDL,Alice,Wright] [FACE,Priscilla,Miller] [LFKK,Ignacio,Miller] [LJNCBAAA,George,Gamez] [LIOP,Derek,Allen] [EADJ,Ruth,Carroll] [JGMM,Richard,Larson] [PKIK,Wendy,Horvath] [FJHF,Larissa,Roy] [EPOG,Felisha,Mendes] [EKJL,Aisha,Carlson] [HNFH,Rebecca,Wilson] [IBFCBAAA,Ruth,Grantham] [OPDL,Ann,Pence] [NIPL,Eric,Lawrence] [OCIC,Zachary,Pennington] [OFLC,James,Taylor] [GEHI,Tyler,Miller] [CADP,Cristobal,Thomas] [JIAL,Santos,Gutierrez] [PMMBBAAA,Paul,Jordan] [DIIO,David,Carroll] [DFKABAAA,Latoya,Craft] [HMOI,Grace,Henderson] [PPIBBAAA,Candice,Lee] [JONHBAAA,Warren,Orozco] [GNDA,Terry,Mcdowell] [CIJM,Elizabeth,Thomas] [DIJGBAAA,Ruth,Sanders] [NFBDBAAA,Vernice,Fernandez] [IDKF,Michael,Mack] [IMHB,Kathy,Knowles] [LHMC,Brooke,Nelson] [CFCGBAAA,Marcus,Sanders] [NJHCBAAA,Christopher,Schreiber] [PDFB,Terrance,Banks] [ANFA,Philip,Banks] [IADEBAAA,Diane,Aldridge] [ICHF,Linda,Mccoy] [CFEN,Christopher,Dawson] [KOJJ,Gracie,Mendoza] [FOJA,Don,Castillo] [FGPG,Albert,Wadsworth] [KJBK,Georgia,Scott] [EKFP,Annika,Chin] [IBAEBAAA,Sandra,Wilson] [MFFL,Margret,Gray] [KNAK,Gladys,Banks] [CJDI,James,Kerr] [OBADBAAA,Elizabeth,Burnham] [AMGD,Kenneth,Harlan] [HJLA,Audrey,Beltran] [AOPFBAAA,Jerry,Fields] [CNAGBAAA,Virginia,May] [HGOABAAA,Sonia,White] [KBCABAAA,Debra,Bell] [NJAG,Allen,Hood] [MMOBBAAA,Margaret,Smith] [NGDBBAAA,Carlos,Jewell] [FOGI,Michelle,Greene] [JEKFBAAA,Norma,Burkholder] [OCAJ,Jenna,Staton] [PFCL,Felicia,Neville] [DLHBBAAA,Henry,Bertrand] [DBEFBAAA,Bennie,Bowers] 
[DCKO,Robert,Gonzalez] [KKGE,Katie,Dunbar] [GFMDBAAA,Kathleen,Gibson] [IJEM,Charlie,Cummings] [KJBL,Kerry,Davis] [JKBN,Julie,Kern] [MDCA,Louann,Hamel] [EOAK,Molly,Benjamin] [IBHH,Jennifer,Ballard] [PJEN,Ashley,Norton] [KLHHBAAA,Manuel,Castaneda] [IMHHBAAA,Lillian,Davidson] [GHPBBAAA,Nick,Mendez] [BNBB,Irma,Smith] [FBAH,Michael,Williams] [PEHEBAAA,Edith,Molina] [FMHI,Emilio,Darling] [KAEC,Milton,Mackey] [OCDJ,Nina,Sanchez] [FGIG,Eduardo,Miller] [FHACBAAA,null,null] [HMJN,Ryan,Baptiste] [HHCABAAA,William,Stewart] {noformat} Expected results: {noformat} +--+-++ | CUSTOMER_ID | CUSTOMER_FIRST_NAME | CUSTOMER_LAST_NAME | +--+-++ | AMGD | Kenneth | Harlan | | ANFA | Philip | Banks | | AOPFBAAA | Jerry | Fields | | BLEIBAAA | Paula | Wakefield | | BNBB | Irma| Smith | | CADP | Cristobal | Thomas | | CFCGBAAA | Marcus | Sanders| | CFEN | Christopher | Dawson | | CIJM | Elizabeth | Thomas | | CJDI | James | Kerr | | CNAGBAAA | Virginia| May| | DBEFBAAA | Bennie | Bowers | | DCKO | Robert | Gonzalez | | DFIEBAAA | John| Gray | | DFKABAAA | Latoya | Craft | | DIIO | David | Carroll| | DIJGBAAA | Ruth| Sanders| | DLHBBAAA | Henry | Bertrand | | EADJ | Ruth| Carroll| | EJDL |
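As with query 49, the rows here match and only their order differs, so a result checker has to distinguish ordered comparison (what an ORDER BY mandates) from multiset comparison (useful for separating ordering bugs from value bugs). A small sketch of the latter, assuming result rows arrive as tuples:

```python
from collections import Counter

def same_rows(actual, expected):
    """Multiset equality: same rows with the same multiplicities,
    ignoring row order."""
    return Counter(map(tuple, actual)) == Counter(map(tuple, expected))

a = [("AMGD", "Kenneth", "Harlan"), ("ANFA", "Philip", "Banks")]
b = [("ANFA", "Philip", "Banks"), ("AMGD", "Kenneth", "Harlan")]
print(same_rows(a, b))  # True -- same content, different order
```

A harness can then report "content OK, ordering wrong" (this issue and SPARK-13862) separately from genuine value mismatches like SPARK-13865.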
[jira] [Updated] (SPARK-13531) Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects
[ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13531: - Target Version/s: 2.0.0 > Some DataFrame joins stopped working with UnsupportedOperationException: No > size estimation available for objects > - > > Key: SPARK-13531 > URL: https://issues.apache.org/jira/browse/SPARK-13531 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: koert kuipers >Priority: Minor > > this is using spark 2.0.0-SNAPSHOT > dataframe df1: > schema: > {noformat}StructType(StructField(x,IntegerType,true)){noformat} > explain: > {noformat}== Physical Plan == > MapPartitions , obj#135: object, [if (input[0, object].isNullAt) > null else input[0, object].get AS x#128] > +- MapPartitions , createexternalrow(if (isnull(x#9)) null else > x#9), [input[0, object] AS obj#135] >+- WholeStageCodegen > : +- Project [_1#8 AS x#9] > : +- Scan ExistingRDD[_1#8]{noformat} > show: > {noformat}+---+ > | x| > +---+ > | 2| > | 3| > +---+{noformat} > dataframe df2: > schema: > {noformat}StructType(StructField(x,IntegerType,true), > StructField(y,StringType,true)){noformat} > explain: > {noformat}== Physical Plan == > MapPartitions , createexternalrow(x#2, if (isnull(y#3)) null else > y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get > AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class > org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, > object].get, true) AS y#131] > +- WholeStageCodegen >: +- Project [_1#0 AS x#2,_2#1 AS y#3] >: +- Scan ExistingRDD[_1#0,_2#1]{noformat} > show: > {noformat}+---+---+ > | x| y| > +---+---+ > | 1| 1| > | 2| 2| > | 3| 3| > +---+---+{noformat} > i run: > df1.join(df2, Seq("x")).show > i get: > {noformat}java.lang.UnsupportedOperationException: No size estimation > available for objects. 
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323) > at > org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat} > not sure what changed, this ran about a week ago without issues (in our > internal unit tests). it is fully reproducible, however when i tried to > minimize the issue i could not reproduce it by just creating data frames in > the repl with the same contents, so it probably has something to do with the way > these are created (from Row objects and StructTypes).
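The trace shows where this dies: CanBroadcast asks UnaryNode.statistics for a size estimate, which sums defaultSize over the plan's output attributes, and ObjectType (introduced by the typed MapPartitions operators visible in the plans above) defines no size. A toy Python model of that estimation path (illustrative only, not Spark code):

```python
class IntegerType:
    def default_size(self):
        return 4  # bytes per value, as a fixed-width type can report

class ObjectType:
    def default_size(self):
        # Mirrors the ObjectType.defaultSize failure in the trace above.
        raise Exception("No size estimation available for objects.")

def row_size(schema):
    """Rough per-row size estimate: sum of attribute default sizes."""
    return sum(t.default_size() for t in schema)

print(row_size([IntegerType(), IntegerType()]))  # 8
try:
    row_size([IntegerType(), ObjectType()])
except Exception as e:
    print(e)  # planning a broadcast join over such a schema fails the same way
```

This is consistent with the reporter's observation: data frames built straight in the REPL plan without the object-level operators, so the estimate never touches an ObjectType attribute.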
[jira] [Updated] (SPARK-8360) Structured Streaming (aka Streaming DataFrames)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8360: --- Summary: Structured Streaming (aka Streaming DataFrames) (was: Streaming DataFrames) > Structured Streaming (aka Streaming DataFrames) > --- > > Key: SPARK-8360 > URL: https://issues.apache.org/jira/browse/SPARK-8360 > Project: Spark > Issue Type: Umbrella > Components: SQL, Streaming >Reporter: Reynold Xin > Attachments: > StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf > > > Umbrella ticket to track what's needed to make streaming DataFrame a reality.
[jira] [Reopened] (SPARK-13135) Don't print expressions recursively in generated code
[ https://issues.apache.org/jira/browse/SPARK-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reopened SPARK-13135: That PR is not merged. > Don't print expressions recursively in generated code > - > > Key: SPARK-13135 > URL: https://issues.apache.org/jira/browse/SPARK-13135 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > Our code generation currently prints expressions recursively. For example, > for expression "(1 + 1) + 1", we would print the following: > "(1 + 1) + 1" > "(1 + 1)" > "1" > "1" > We should just print the project list once.
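A toy model of the duplication being described: printing a node and then recursing into its children emits every subtree's text again, so leaf fragments like "1" show up repeatedly. The sketch below is illustrative, not Spark's codegen:

```python
class Expr:
    """Hypothetical expression-tree node holding its rendered text."""
    def __init__(self, text, children=()):
        self.text = text
        self.children = children

def print_recursively(expr, out):
    """Emit the node's text, then every subtree's text -- the behavior
    the issue wants replaced by printing the top-level list once."""
    out.append(expr.text)
    for child in expr.children:
        print_recursively(child, out)

one = Expr("1")
tree = Expr("(1 + 1) + 1", [Expr("(1 + 1)", [one, one]), Expr("1")])
lines = []
print_recursively(tree, lines)
print(lines)  # ['(1 + 1) + 1', '(1 + 1)', '1', '1', '1']
```

Printing only the root of each projected expression keeps the generated-code comments linear in the number of expressions rather than in the number of tree nodes.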
[jira] [Resolved] (SPARK-13878) Window functions failed in cluster
[ https://issues.apache.org/jira/browse/SPARK-13878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13878. Resolution: Duplicate > Window functions failed in cluster > -- > > Key: SPARK-13878 > URL: https://issues.apache.org/jira/browse/SPARK-13878 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu > > When cume_dist is used, we got the following error. > {code} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in > stage 102.0 failed 4 times, most recent failure: Lost task 10.3 in stage > 102.0 (TID 8448, ip-10-216-233-112.eu-west-1.compute.internal): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: window__partition__size#206 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at 
scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:249) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) > at >
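Judging by its name, window__partition__size is an internal rows-per-partition column that the window operator introduces when evaluating cume_dist, and the binding failure means that column is missing from the child plan's output in cluster mode. For reference, the value cume_dist computes is simple; a sketch of the SQL semantics (not Spark's implementation):

```python
def cume_dist(values):
    """SQL cume_dist() over (order by v) within one partition:
    fraction of partition rows with value <= the current row's value."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

print(cume_dist([10, 20, 20, 30]))  # [0.25, 0.75, 0.75, 1.0]
```

The denominator is the partition row count, which is why the physical plan needs a partition-size column at all; the bug is in wiring that column through, not in the function's definition.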
[jira] [Updated] (SPARK-13878) Window functions failed in cluster
[ https://issues.apache.org/jira/browse/SPARK-13878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-13878: --- Description: When cume_dist is used, we got the following error. {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 102.0 failed 4 times, most recent failure: Lost task 10.3 in stage 102.0 (TID 8448, ip-10-216-233-112.eu-west-1.compute.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: window__partition__size#206 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at 
scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:249) at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85) at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at