[jira] [Assigned] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13891: Assignee: (was: Apache Spark) > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194761#comment-15194761 ] Apache Spark commented on SPARK-13891: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/11714 > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer.
[jira] [Assigned] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
[ https://issues.apache.org/jira/browse/SPARK-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13891: Assignee: Apache Spark > Issue an exception when hitting max iteration limit in testing > -- > > Key: SPARK-13891 > URL: https://issues.apache.org/jira/browse/SPARK-13891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > Issue an exception in the unit tests of Spark SQL when hitting the max > iteration limit. Then, we can catch the infinite loop bugs in Analyzer and > Optimizer.
[jira] [Created] (SPARK-13891) Issue an exception when hitting max iteration limit in testing
Xiao Li created SPARK-13891: --- Summary: Issue an exception when hitting max iteration limit in testing Key: SPARK-13891 URL: https://issues.apache.org/jira/browse/SPARK-13891 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li Issue an exception in the unit tests of Spark SQL when hitting the max iteration limit. Then, we can catch the infinite loop bugs in Analyzer and Optimizer.
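The proposal above can be sketched as a fixed-point rule loop that fails fast when tests hit the iteration cap. This is an illustrative sketch only, not Spark's actual RuleExecutor; the names `fixedPoint` and `failOnMaxIterations` are hypothetical:

```scala
// Illustrative sketch of the proposal in SPARK-13891 (hypothetical names,
// not Spark's real RuleExecutor): a fixed-point loop over a rewrite rule
// that throws when the iteration cap is reached, instead of silently
// returning the current result as production code does today.
def fixedPoint[T](start: T, maxIterations: Int, failOnMaxIterations: Boolean)(rule: T => T): T = {
  var current = start
  var iteration = 0
  var continue = true
  while (continue) {
    val next = rule(current)
    iteration += 1
    if (next == current) {
      continue = false                 // reached a fixed point: done
    } else if (iteration >= maxIterations) {
      if (failOnMaxIterations)         // e.g. enabled only under unit tests
        throw new IllegalStateException(
          s"Max iterations ($maxIterations) reached; possible infinite loop in a rule")
      continue = false                 // production: keep the lenient behavior
    }
    current = next
  }
  current
}

// Converges: counts 10 down to 0, then 0 is a fixed point.
val converged = fixedPoint(10, 100, failOnMaxIterations = true)(n => if (n > 0) n - 1 else n)
```

With `failOnMaxIterations = false` the loop keeps today's lenient behavior; enabling it only under the test configuration is the behavior change the issue proposes, so a rule that never converges surfaces as a test failure rather than a silently truncated rewrite.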
[jira] [Commented] (SPARK-13034) PySpark ml.classification support export/import
[ https://issues.apache.org/jira/browse/SPARK-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194746#comment-15194746 ] Apache Spark commented on SPARK-13034: -- User 'GayathriMurali' has created a pull request for this issue: https://github.com/apache/spark/pull/11707 > PySpark ml.classification support export/import > --- > > Key: SPARK-13034 > URL: https://issues.apache.org/jira/browse/SPARK-13034 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Reporter: Yanbo Liang >Priority: Minor > > Add export/import for all estimators and transformers (which have a Scala > implementation) under pyspark/ml/classification.py. Please refer to the > implementation in SPARK-13032.
[jira] [Resolved] (SPARK-13353) Fast serialization for collecting DataFrame
[ https://issues.apache.org/jira/browse/SPARK-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13353. - Resolution: Fixed Assignee: Davies Liu Fix Version/s: 2.0.0 > Fast serialization for collecting DataFrame > --- > > Key: SPARK-13353 > URL: https://issues.apache.org/jira/browse/SPARK-13353 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > UnsafeRowSerializer should be more efficient than JavaSerializer or > KryoSerializer for DataFrame.
[jira] [Resolved] (SPARK-13661) Avoid the copy of UnsafeRow in HashedRelation
[ https://issues.apache.org/jira/browse/SPARK-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13661. - Resolution: Fixed Assignee: Davies Liu Fix Version/s: 2.0.0 > Avoid the copy of UnsafeRow in HashedRelation > - > > Key: SPARK-13661 > URL: https://issues.apache.org/jira/browse/SPARK-13661 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > We usually build the HashedRelation on top of an array of UnsafeRow, so the > copy could be avoided. > The caller of HashedRelation needs to do the copy if it's needed. > Another approach could be making the copy() of UnsafeRow smart so that it > knows when to copy the bytes, which could also be useful for other > components.
[jira] [Assigned] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13889: Assignee: Apache Spark > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang >Assignee: Apache Spark > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Commented] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194737#comment-15194737 ] Apache Spark commented on SPARK-13889: -- User 'carsonwang' has created a pull request for this issue: https://github.com/apache/spark/pull/11713 > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Assigned] (SPARK-13889) Integer overflow when calculating the max number of executor failure
[ https://issues.apache.org/jira/browse/SPARK-13889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13889: Assignee: (was: Apache Spark) > Integer overflow when calculating the max number of executor failure > > > Key: SPARK-13889 > URL: https://issues.apache.org/jira/browse/SPARK-13889 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Carson Wang > > The max number of executor failures before failing the application defaults > to twice the maximum number of executors if dynamic allocation is enabled. > The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, > so this causes an integer overflow and a wrong result.
[jira] [Commented] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194735#comment-15194735 ] Apache Spark commented on SPARK-13890: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11712 > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Assigned] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13890: Assignee: Apache Spark (was: Reynold Xin) > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Assigned] (SPARK-13890) Remove some internal classes' dependency on SQLContext
[ https://issues.apache.org/jira/browse/SPARK-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13890: Assignee: Reynold Xin (was: Apache Spark) > Remove some internal classes' dependency on SQLContext > -- > > Key: SPARK-13890 > URL: https://issues.apache.org/jira/browse/SPARK-13890 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > In general it is better for internal classes to not depend on the external > class (in this case SQLContext) to reduce coupling between user-facing APIs > and the internal implementations.
[jira] [Commented] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194734#comment-15194734 ] Joseph K. Bradley commented on SPARK-13712: --- Hm, my understanding is that ECOC-based ones can achieve performance similar to one-vs-one. See, e.g., Allwein et al. "Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers" JMLR 2001. [http://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf] It's possible those results rely on soft predictions from classifiers, but I don't think they do. I'd need to refresh on that material to recall for sure. > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > The OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
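For intuition, the K*(K-1)/2 figure in the description comes from training one binary model per unordered pair of classes. A minimal sketch (the proposed `OneVsOne` estimator itself does not exist yet; `classPairs` is a hypothetical helper):

```scala
// Enumerate the classifier pairs a one-vs-one meta-learner would train:
// one binary model per unordered pair of class labels.
def classPairs(numClasses: Int): Seq[(Int, Int)] =
  for {
    i <- 0 until numClasses
    j <- (i + 1) until numClasses
  } yield (i, j)

val k = 4
val pairs = classPairs(k)
// pairs: (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
// i.e. k * (k - 1) / 2 = 6 models, each trained only on the rows
// belonging to its two classes (roughly 2/K of the data, as noted above).
```

This makes the trade-off in the description concrete: more models than one-vs-rest's K, but each fit on a much smaller, class-balanced subset.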
[jira] [Created] (SPARK-13890) Remove some internal classes' dependency on SQLContext
Reynold Xin created SPARK-13890: --- Summary: Remove some internal classes' dependency on SQLContext Key: SPARK-13890 URL: https://issues.apache.org/jira/browse/SPARK-13890 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin In general it is better for internal classes to not depend on the external class (in this case SQLContext) to reduce coupling between user-facing APIs and the internal implementations.
[jira] [Commented] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194731#comment-15194731 ] Apache Spark commented on SPARK-13888: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11711 > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Assigned] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13888: Assignee: Apache Spark > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu >Assignee: Apache Spark > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Assigned] (SPARK-13888) Remove Streaming Akka docs from Spark
[ https://issues.apache.org/jira/browse/SPARK-13888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13888: Assignee: (was: Apache Spark) > Remove Streaming Akka docs from Spark > - > > Key: SPARK-13888 > URL: https://issues.apache.org/jira/browse/SPARK-13888 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Shixiong Zhu > > I have copied the docs of Streaming Akka to > https://github.com/spark-packages/dstream-akka/blob/master/README.md > So we can remove them from Spark now.
[jira] [Created] (SPARK-13888) Remove Streaming Akka docs from Spark
Shixiong Zhu created SPARK-13888: Summary: Remove Streaming Akka docs from Spark Key: SPARK-13888 URL: https://issues.apache.org/jira/browse/SPARK-13888 Project: Spark Issue Type: Sub-task Components: Documentation Reporter: Shixiong Zhu I have copied the docs of Streaming Akka to https://github.com/spark-packages/dstream-akka/blob/master/README.md So we can remove them from Spark now.
[jira] [Created] (SPARK-13889) Integer overflow when calculating the max number of executor failure
Carson Wang created SPARK-13889: --- Summary: Integer overflow when calculating the max number of executor failure Key: SPARK-13889 URL: https://issues.apache.org/jira/browse/SPARK-13889 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.6.1, 1.6.0 Reporter: Carson Wang The max number of executor failures before failing the application defaults to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, so this causes an integer overflow and a wrong result.
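The arithmetic behind this bug is easy to reproduce on its own. A minimal sketch of the failure mode, not Spark's code; the `Long`-widening clamp shown is one possible fix, not necessarily the one in the linked pull request:

```scala
// The default for spark.dynamicAllocation.maxExecutors is Int.MaxValue;
// doubling it in Int arithmetic wraps around, so the failure threshold
// "2 * maxExecutors" becomes negative instead of a large positive number.
val maxExecutors = Int.MaxValue
val maxFailures  = 2 * maxExecutors            // wraps to -2

// One possible fix (an assumption, not necessarily Spark's actual patch):
// widen to Long before doubling, then clamp back into Int range.
val safeMaxFailures = math.min(2L * maxExecutors, Int.MaxValue.toLong).toInt
```

With the overflowed value, any executor failure count already exceeds the negative "limit", which is the wrong result the issue describes.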
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Isn't there another way to handle BinaryType by using a Scala type > instead of Array? > I want to contribute by fixing this issue.
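The root cause of the behavior reported above is that `==` on Scala's `Array` is reference equality, so collections containing arrays never compare element-wise. A minimal sketch of the mismatch; `java.util.Arrays.equals` is one standard JDK workaround, not necessarily the fix Spark will adopt:

```scala
// Array[Byte] uses reference equality for ==, so two structurally equal
// byte arrays do not compare equal, and neither do collections holding them.
val x = Array(1.toByte)
val y = Array(1.toByte)

val byReference = x == y                        // false: different references
val byElements  = x.sameElements(y)             // true: element-wise comparison
val byJdk       = java.util.Arrays.equals(x, y) // true: JDK deep comparison

// Seq compares its elements with ==, so the mismatch propagates upward,
// which is what Row.equals hits for an ArrayType of BinaryType.
val seqEqual = Seq(x) == Seq(y)                 // false
```

Any fix therefore needs a deep comparison (e.g. `sameElements` or `java.util.Arrays.equals`) wherever a `BinaryType` value may appear inside an `ArrayType` or `MapType`.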
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). Isn't there another way to handle BinaryType by using a Scala type instead of Array? I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? > I think this is to differentiate between (ArrayType of Byte) and (BinaryType). > Isn't there another way to handle BinaryType by using a Scala type instead of > Array? > I want to contribute by fixing this issue.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? I think this is to differentiate between (ArrayType of Byte) and (BinaryType). I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question1: Can the key in MapType be of BinaryType? Question2: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? All this overhead is here because Array is used and needs special handling. I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question1: Can the key in MapType be of BinaryType? > Question2: Why do you use Array\[Byte\] as the binary type? Why not Seq\[Byte\]? > I think this is to differentiate between (ArrayType of Byte) and (BinaryType). > I want to contribute by fixing this issue.
[jira] [Commented] (SPARK-5433) Spark EC2 doesn't mount local disks for all instance types
[ https://issues.apache.org/jira/browse/SPARK-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194722#comment-15194722 ] Geet Kumar commented on SPARK-5433: --- The spark-ec2 script does not mount disks for the d2.xlarge instance types. > Spark EC2 doesn't mount local disks for all instance types > -- > > Key: SPARK-5433 > URL: https://issues.apache.org/jira/browse/SPARK-5433 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.2.0 >Reporter: Tomer Kaftan >Priority: Critical > > Launching a cluster using spark-ec2 will currently mount all local disks for > the r3 instance types. > Branch 1.3.0 of the ec2 scripts has also been updated to mount one local ssd > disk for the i2 instance types. > At the very least the i2 instance types need to have all local disks mounted. > We also need to find out if there are any other instance types that also need > to be updated.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Description: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question: Can the key in MapType be of BinaryType? Question: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? All this overhead is here because Array is used and needs special handling. I want to contribute by fixing this issue. was: There are multiple types that are supported by Spark SQL. One of them is ArrayType (Seq), which can have any element type, so it can be BinaryType (Array\[Byte\]). In the equals method of the Row class, there is no handling for ArrayType of BinaryType. So for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} Also, this doesn't work for MapType of BinaryType. {code:scala} val a = Row( Map(1 -> Array(1.toByte) ) ) val b = Row( Map(1 -> Array(1.toByte) ) ) a.equals(b) // this will return false {code} Question: Can the key in MapType be of BinaryType? I want to contribute by fixing this issue.
> ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: MahmoudHanafy >Priority: Minor > > There are multiple types that are supported by Spark SQL. One of them is > ArrayType (Seq), which can have any element type, so it can be BinaryType > (Array\[Byte\]). > In the equals method of the Row class, there is no handling for ArrayType of > BinaryType. > So for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Also, this doesn't work for MapType of BinaryType. > {code:scala} > val a = Row( Map(1 -> Array(1.toByte) ) ) > val b = Row( Map(1 -> Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > Question: Can the key in MapType be of BinaryType? > Question: Why do you use Array[Byte] as the binary type? Why not Seq[Byte]? > All this overhead is here because Array is used and needs special handling. > I want to contribute by fixing this issue.
[jira] [Created] (SPARK-13887) PyLint should fail fast to make errors easier to discover
holdenk created SPARK-13887: --- Summary: PyLint should fail fast to make errors easier to discover Key: SPARK-13887 URL: https://issues.apache.org/jira/browse/SPARK-13887 Project: Spark Issue Type: Improvement Reporter: holdenk Priority: Minor Right now our PyLint script runs all of the checks and then returns the output. This can make it difficult to find the part that errored, and it complicates the script a bit. We can simplify our script to fail fast, which will both simplify the script and make it easier to discover the errors.
[jira] [Updated] (SPARK-13887) PyLint should fail fast to make errors easier to discover
[ https://issues.apache.org/jira/browse/SPARK-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-13887: Component/s: PySpark Build > PyLint should fail fast to make errors easier to discover > - > > Key: SPARK-13887 > URL: https://issues.apache.org/jira/browse/SPARK-13887 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark > Reporter: holdenk > Priority: Minor > > Right now our PyLint script runs all of the checks and then returns the > output. This can make it difficult to find the part that errored, and it > complicates the script a bit. We can simplify our script to fail fast, which > will both simplify the script and make it easier to discover the errors.
[jira] [Updated] (SPARK-13886) ArrayType of BinaryType not supported in Row.equals method
[ https://issues.apache.org/jira/browse/SPARK-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MahmoudHanafy updated SPARK-13886: -- Summary: ArrayType of BinaryType not supported in Row.equals method (was: Binary Type) > ArrayType of BinaryType not supported in Row.equals method > --- > > Key: SPARK-13886 > URL: https://issues.apache.org/jira/browse/SPARK-13886 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: MahmoudHanafy > Priority: Minor > > There are multiple types that are supported by Spark SQL. > One of them is ArrayType(Seq), which can have any element type, > so it can be BinaryType(Array\[Byte\]). > The equals method in the Row class has no handling for > ArrayType of BinaryType. > So, for example: > {code:scala} > val a = Row( Seq( Array(1.toByte) ) ) > val b = Row( Seq( Array(1.toByte) ) ) > a.equals(b) // this will return false > {code} > I can fix this issue.
[jira] [Created] (SPARK-13886) Binary Type
MahmoudHanafy created SPARK-13886: - Summary: Binary Type Key: SPARK-13886 URL: https://issues.apache.org/jira/browse/SPARK-13886 Project: Spark Issue Type: Bug Components: SQL Reporter: MahmoudHanafy Priority: Minor There are multiple types that are supported by Spark SQL. One of them is ArrayType(Seq), which can have any element type, so it can be BinaryType(Array\[Byte\]). The equals method in the Row class has no handling for ArrayType of BinaryType. So, for example: {code:scala} val a = Row( Seq( Array(1.toByte) ) ) val b = Row( Seq( Array(1.toByte) ) ) a.equals(b) // this will return false {code} I can fix this issue.
[jira] [Created] (SPARK-13885) Spark On Yarn attempt id representation regression
Saisai Shao created SPARK-13885: --- Summary: Spark On Yarn attempt id representation regression Key: SPARK-13885 URL: https://issues.apache.org/jira/browse/SPARK-13885 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 2.0.0 Reporter: Saisai Shao Due to the change of attempt id representation in SPARK-11314, the attempt ids that were previously ("1", "2") are now (appattempt-xxx-1, appattempt-xxx-2). This affects every part that uses the attempt id, especially the event log file name and the history server URL link. So we should change back to the original representation to fix this regression.
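For reference, recovering the short numeric form from a full YARN attempt id could look like the sketch below. This is illustrative only, not the SPARK-11314 fix itself, and it assumes the underscore-separated format YARN actually emits (`appattempt_<clusterTimestamp>_<appId>_<attemptNumber>`); the function name is invented.

```scala
// Illustrative only: derive the short attempt id ("1", "2", ...) that the
// event log file name and history server URL used previously, from the
// full YARN attempt id string. Assumed input format:
// appattempt_<clusterTimestamp>_<appId>_<attemptNumber>
def shortAttemptId(full: String): String = {
  val last = full.substring(full.lastIndexOf('_') + 1) // e.g. "000002"
  last.toInt.toString                                  // drop leading zeros
}
```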
[jira] [Commented] (SPARK-13102) Run query using ThriftServer, and open web using IE11, i click ”+detail" in SQLPage, but not response
[ https://issues.apache.org/jira/browse/SPARK-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194665#comment-15194665 ] KaiXinXIaoLei commented on SPARK-13102: --- I used F12 to debug, and the error info is: SCRIPT5009: "d3" is undefined, file: spark-dag-viz.js, line: 295, column: 29 > Run query using ThriftServer, and open web using IE11, I click "+detail" in > SQLPage, but no response > -- > > Key: SPARK-13102 > URL: https://issues.apache.org/jira/browse/SPARK-13102 > Project: Spark > Issue Type: Bug > Components: Web UI > Affects Versions: 1.6.0 > Reporter: KaiXinXIaoLei > Attachments: dag info is blank.png, details in SQLPage.png > > > I run a query using ThriftServer and open the web UI using IE11. Then I click > "+detail" in SQLPage, but there is no response. And I click "DAG Visualization" in > StagesPage, but get nothing.
[jira] [Commented] (SPARK-13877) Consider removing Kafka modules from Spark / Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194653#comment-15194653 ] Saisai Shao commented on SPARK-13877: - I agree with moving this out, so it could easily support different versions. Currently, if we want to introduce Kafka 0.9 support, we either drop the 0.8 support or maintain two modules; neither way is elegant. Maintaining it outside of Spark would be a good choice. > Consider removing Kafka modules from Spark / Spark Streaming > > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming > Affects Versions: 1.6.1 > Reporter: Hari Shreedharan > > Based on the discussion on the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Commented] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194647#comment-15194647 ] Apache Spark commented on SPARK-13884: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11710 > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Reynold Xin > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Assigned] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13884: Assignee: Apache Spark (was: Reynold Xin) > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Apache Spark > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Assigned] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13884: Assignee: Reynold Xin (was: Apache Spark) > Remove DescribeCommand's dependency on LogicalPlan > -- > > Key: SPARK-13884 > URL: https://issues.apache.org/jira/browse/SPARK-13884 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Reynold Xin > Assignee: Reynold Xin > > DescribeCommand should just take a TableIdentifier and ask the metadata > catalog for the table's information.
[jira] [Created] (SPARK-13884) Remove DescribeCommand's dependency on LogicalPlan
Reynold Xin created SPARK-13884: --- Summary: Remove DescribeCommand's dependency on LogicalPlan Key: SPARK-13884 URL: https://issues.apache.org/jira/browse/SPARK-13884 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin DescribeCommand should just take a TableIdentifier and ask the metadata catalog for the table's information.
[jira] [Updated] (SPARK-13034) PySpark ml.classification support export/import
[ https://issues.apache.org/jira/browse/SPARK-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-13034: -- Shepherd: Joseph K. Bradley Target Version/s: 2.0.0 > PySpark ml.classification support export/import > --- > > Key: SPARK-13034 > URL: https://issues.apache.org/jira/browse/SPARK-13034 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark > Reporter: Yanbo Liang > Priority: Minor > > Add export/import for all estimators and transformers (which have a Scala > implementation) under pyspark/ml/classification.py. Please refer to the > implementation in SPARK-13032.
[jira] [Assigned] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13883: Assignee: Michael Armbrust (was: Apache Spark) > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Commented] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194638#comment-15194638 ] Apache Spark commented on SPARK-13883: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/11709 > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Assigned] (SPARK-13883) buildReader implementation for parquet
[ https://issues.apache.org/jira/browse/SPARK-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13883: Assignee: Apache Spark (was: Michael Armbrust) > buildReader implementation for parquet > -- > > Key: SPARK-13883 > URL: https://issues.apache.org/jira/browse/SPARK-13883 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Michael Armbrust >Assignee: Apache Spark > Fix For: 2.0.0 > > > Port parquet to the new strategy
[jira] [Created] (SPARK-13883) buildReader implementation for parquet
Michael Armbrust created SPARK-13883: Summary: buildReader implementation for parquet Key: SPARK-13883 URL: https://issues.apache.org/jira/browse/SPARK-13883 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Port parquet to the new strategy
[jira] [Resolved] (SPARK-13791) Add MetadataLog and HDFSMetadataLog
[ https://issues.apache.org/jira/browse/SPARK-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13791. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11625 [https://github.com/apache/spark/pull/11625] > Add MetadataLog and HDFSMetadataLog > --- > > Key: SPARK-13791 > URL: https://issues.apache.org/jira/browse/SPARK-13791 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > - Add a MetadataLog interface for reliable metadata storage. > - Add HDFSMetadataLog as a MetadataLog implementation based on HDFS. > - Update FileStreamSource to use HDFSMetadataLog instead of managing metadata > by itself.
[jira] [Updated] (SPARK-10380) Confusing examples in pyspark SQL docs
[ https://issues.apache.org/jira/browse/SPARK-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10380: - Assignee: Reynold Xin > Confusing examples in pyspark SQL docs > -- > > Key: SPARK-10380 > URL: https://issues.apache.org/jira/browse/SPARK-10380 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Michael Armbrust >Assignee: Reynold Xin >Priority: Minor > Labels: docs, starter > Fix For: 2.0.0 > > > There’s an error in the astype() documentation, as it uses cast instead of > astype. It should probably include a mention that astype is an alias for cast > (and vice versa in the cast documentation): > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.astype > > The same error occurs with drop_duplicates and dropDuplicates: > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.drop_duplicates > > The issue here is we are copying the code. According to [~davies] the > easiest way is to copy the method and just add new docs.
[jira] [Resolved] (SPARK-10380) Confusing examples in pyspark SQL docs
[ https://issues.apache.org/jira/browse/SPARK-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10380. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11698 [https://github.com/apache/spark/pull/11698] > Confusing examples in pyspark SQL docs > -- > > Key: SPARK-10380 > URL: https://issues.apache.org/jira/browse/SPARK-10380 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Michael Armbrust >Priority: Minor > Labels: docs, starter > Fix For: 2.0.0 > > > There’s an error in the astype() documentation, as it uses cast instead of > astype. It should probably include a mention that astype is an alias for cast > (and vice versa in the cast documentation): > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.astype > > The same error occurs with drop_duplicates and dropDuplicates: > https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.drop_duplicates > > The issue here is we are copying the code. According to [~davies] the > easiest way is to copy the method and just add new docs.
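The "copy the method and just add new docs" approach mentioned above can be sketched in plain Python (a hypothetical illustration, not Spark's actual source; the `Column` stub, `make_alias` helper, and docstrings are invented here). The point is that an alias needs its own function object so the generated API docs can describe it as an alias rather than repeating the original's examples verbatim:

```python
import functools

# Minimal stand-in for a column type with a cast method (hypothetical).
class Column:
    def cast(self, dataType):
        """Convert the column into type ``dataType``."""
        return "CAST(col AS %s)" % dataType

def make_alias(fn, name, doc):
    # Wrap the original so the alias shares its behavior exactly...
    @functools.wraps(fn)
    def alias(self, *args, **kwargs):
        return fn(self, *args, **kwargs)
    # ...but carries its own name and documentation for the doc generator.
    alias.__name__ = name
    alias.__doc__ = doc
    return alias

Column.astype = make_alias(Column.cast, "astype",
                           ":func:`astype` is an alias for :func:`cast`.")

c = Column()
assert c.astype("int") == c.cast("int")  # identical behavior
```

The alias behaves identically to the original but documents itself as an alias, which is what the issue asks the generated pyspark docs to show for `astype`/`cast` and `drop_duplicates`/`dropDuplicates`.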
[jira] [Resolved] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13882. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Resolved] (SPARK-11826) Subtract BlockMatrix
[ https://issues.apache.org/jira/browse/SPARK-11826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-11826. --- Resolution: Fixed Fix Version/s: 2.0.0 > Subtract BlockMatrix > > > Key: SPARK-11826 > URL: https://issues.apache.org/jira/browse/SPARK-11826 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Ehsan Mohyedin Kermani >Assignee: Ehsan Mohyedin Kermani >Priority: Minor > Fix For: 2.0.0 > > > It'd be more convenient to have a subtract method for BlockMatrices.
[jira] [Resolved] (SPARK-13664) Simplify and Speedup HadoopFSRelation
[ https://issues.apache.org/jira/browse/SPARK-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13664. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11646 [https://github.com/apache/spark/pull/11646] > Simplify and Speedup HadoopFSRelation > - > > Key: SPARK-13664 > URL: https://issues.apache.org/jira/browse/SPARK-13664 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Michael Armbrust >Assignee: Michael Armbrust >Priority: Blocker > Fix For: 2.0.0 > > > A majority of Spark SQL queries likely run through {{HadoopFSRelation}}, > however there are currently several complexity and performance problems with > this code path: > - The class mixes the concerns of file management, schema reconciliation, > scan building, bucketing, partitioning, and writing data. > - For very large tables, we are broadcasting the entire list of files to > every executor. [SPARK-11441] > - For partitioned tables, we always do an extra projection. This results > not only in a copy, but also undoes much of the performance gains that we are > going to get from vectorized reads. > This is an umbrella ticket to track a set of improvements to this codepath.
[jira] [Updated] (SPARK-13877) Consider removing Kafka modules from Spark / Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-13877: -- Summary: Consider removing Kafka modules from Spark / Spark Streaming (was: Consider removing Kafka modules from Spark) > Consider removing Kafka modules from Spark / Spark Streaming > > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming >Affects Versions: 1.6.1 >Reporter: Hari Shreedharan > > Based on the discussion in the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Updated] (SPARK-13877) Consider removing Kafka modules from Spark
[ https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-13877: -- Component/s: Streaming > Consider removing Kafka modules from Spark > -- > > Key: SPARK-13877 > URL: https://issues.apache.org/jira/browse/SPARK-13877 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Streaming >Affects Versions: 1.6.1 >Reporter: Hari Shreedharan > > Based on the discussion in the PR for SPARK-13843 > ([here|https://github.com/apache/spark/pull/11672#issuecomment-196553283]), > we should consider moving the Kafka modules out of Spark as well. > Providing newer functionality (like security) has become painful while > maintaining compatibility with older versions of Kafka. Moving this out > allows more flexibility, allowing users to mix and match Kafka and Spark > versions.
[jira] [Issue Comment Deleted] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-13712: - Comment: was deleted (was: OK, I have closed the PR. I had also planned to implement ECC after this PR. In general, OneVsOne is the slowest among the three methods, but it generates the highest accuracy. ECC is the fastest one (about log(num_class) submodels) with the lowest accuracy. OneVsRest is in the middle of them in both speed and accuracy. In most cases, num_class is a small number, and so OneVsOne is useful. Suppose there are 3 classes: OneVsOne is even faster than OneVsRest. So I think it may be a useful choice for users.) > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
[jira] [Commented] (SPARK-13712) Add OneVsOne to ML
[ https://issues.apache.org/jira/browse/SPARK-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194554#comment-15194554 ] zhengruifeng commented on SPARK-13712: -- OK, I have closed the PR. I had also planned to implement ECC after this PR. In general, OneVsOne is the slowest among the three methods, but it generates the highest accuracy. ECC is the fastest one (about log(num_class) submodels) with the lowest accuracy. OneVsRest is in the middle of them in both speed and accuracy. In most cases, num_class is a small number, and so OneVsOne is useful. Suppose there are 3 classes: OneVsOne is even faster than OneVsRest. So I think it may be a useful choice for users. > Add OneVsOne to ML > -- > > Key: SPARK-13712 > URL: https://issues.apache.org/jira/browse/SPARK-13712 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: zhengruifeng >Priority: Minor > > Another meta method for multi-class classification. > Most classification algorithms were designed for balanced data. > The OneVsRest method will generate K models on imbalanced data. > OneVsOne will train K*(K-1)/2 models on balanced data. > OneVsOne is less sensitive to the problems of imbalanced datasets, and can > usually result in higher precision. > But it is much more computationally expensive, although each model is > trained on a much smaller dataset (2/K of the total). > OneVsOne is implemented the same way OneVsRest is: > val classifier = new LogisticRegression() > val ovo = new OneVsOne() > ovo.setClassifier(classifier) > val ovoModel = ovo.fit(data) > val predictions = ovoModel.transform(data)
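The trade-off discussed in the comment above can be made concrete with a small plain-Python sketch of the one-vs-one scheme (a hypothetical illustration, not the proposed Spark API; `fit_binary` is a toy nearest-class-mean learner invented here). K classes yield K*(K-1)/2 pairwise models, each trained only on the balanced two-class subset, and prediction is by majority vote among the pairwise winners:

```python
from collections import Counter
from itertools import combinations

def fit_one_vs_one(X, y, fit_binary):
    """Return {(a, b): model}, one binary model per class pair."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Each pairwise model sees only the examples of its two classes.
        subset = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        models[(a, b)] = fit_binary([x for x, _ in subset],
                                    [label for _, label in subset])
    return models

def predict_one_vs_one(models, x):
    # Majority vote over the K*(K-1)/2 pairwise predictions.
    votes = Counter(model(x) for model in models.values())
    return votes.most_common(1)[0][0]

# Toy binary "learner" (hypothetical): predict the class whose mean is
# closest to the input value.
def fit_binary(X, y):
    a, b = sorted(set(y))
    mean = lambda c: sum(x for x, l in zip(X, y) if l == c) / y.count(c)
    ma, mb = mean(a), mean(b)
    return lambda x: a if abs(x - ma) <= abs(x - mb) else b

X = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1]
y = [0, 0, 1, 1, 2, 2]
models = fit_one_vs_one(X, y, fit_binary)
assert len(models) == 3  # 3 classes -> 3*(3-1)/2 = 3 pairwise models
```

With K = 3 the pairwise-model count equals OneVsRest's K models, and each pairwise model trains on only 2/K of the data, which is why the comment notes OneVsOne can even be faster than OneVsRest in that case.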
[jira] [Assigned] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13118: Assignee: (was: Apache Spark) > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194479#comment-15194479 ] Apache Spark commented on SPARK-13118: -- User 'jodersky' has created a pull request for this issue: https://github.com/apache/spark/pull/11708 > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Assigned] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13118: Assignee: Apache Spark > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust >Assignee: Apache Spark > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465 ] Jakob Odersky commented on SPARK-13118: --- Sure, I'll submit a PR with the test > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Updated] (SPARK-12718) SQL generation support for window functions
[ https://issues.apache.org/jira/browse/SPARK-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-12718: --- Assignee: Wenchen Fan (was: Xiao Li) > SQL generation support for window functions > --- > > Key: SPARK-12718 > URL: https://issues.apache.org/jira/browse/SPARK-12718 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Wenchen Fan > > {{HiveWindowFunctionQuerySuite}} and {{HiveWindowFunctionQueryFileSuite}} can > be useful for bootstrapping test coverage. Please refer to SPARK-11012 for > more details.
[jira] [Commented] (SPARK-13118) Support for classes defined in package objects
[ https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194436#comment-15194436 ] Michael Armbrust commented on SPARK-13118: -- It's likely that we have fixed this with other refactorings. If you add that regression test I think we can close this. > Support for classes defined in package objects > -- > > Key: SPARK-13118 > URL: https://issues.apache.org/jira/browse/SPARK-13118 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > When you define a class inside of a package object, the name ends up being > something like {{org.mycompany.project.package$MyClass}}. However, when > we reflect on this we try to load {{org.mycompany.project.MyClass}}.
[jira] [Updated] (SPARK-13244) Unify DataFrame and Dataset API
[ https://issues.apache.org/jira/browse/SPARK-13244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13244: Issue Type: Sub-task (was: Improvement) Parent: SPARK-13485 > Unify DataFrame and Dataset API > --- > > Key: SPARK-13244 > URL: https://issues.apache.org/jira/browse/SPARK-13244 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > A {{DataFrame}} is essentially a {{Dataset\[Row\]}}. However, to keep binary > compatibility, {{DataFrame}} didn't extend from {{Dataset\[Row\]}} in 1.6. > In Spark 2.0, they should be unified to minimize concepts.
[jira] [Updated] (SPARK-13843) Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
[ https://issues.apache.org/jira/browse/SPARK-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13843: Summary: Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages (was: Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages) > Move streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, > streaming-twitter to Spark packages > --- > > Key: SPARK-13843 > URL: https://issues.apache.org/jira/browse/SPARK-13843 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > Currently there are a few sub-projects, each for integrating with different > external sources for Streaming. Now that we have better ability to include > external libraries (Spark packages) and with Spark 2.0 coming up, we can move > the following projects out of Spark to https://github.com/spark-packages > - streaming-flume > - streaming-akka > - streaming-mqtt > - streaming-zeromq > - streaming-twitter > They are just some ancillary packages and considering the overhead of > maintenance, running tests and PR failures, it's better to maintain them out > of Spark. In addition, these projects can have their different release cycles > and we can release them faster.
[jira] [Resolved] (SPARK-13843) Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages
[ https://issues.apache.org/jira/browse/SPARK-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13843. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, > streaming-twitter to Spark packages > - > > Key: SPARK-13843 > URL: https://issues.apache.org/jira/browse/SPARK-13843 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > Currently there are a few sub-projects, each for integrating with different > external sources for Streaming. Now that we have better ability to include > external libraries (Spark packages) and with Spark 2.0 coming up, we can move > the following projects out of Spark to https://github.com/spark-packages > - streaming-flume > - streaming-akka > - streaming-mqtt > - streaming-zeromq > - streaming-twitter > They are just some ancillary packages and considering the overhead of > maintenance, running tests and PR failures, it's better to maintain them out > of Spark. In addition, these projects can have their different release cycles > and we can release them faster.
[jira] [Commented] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194404#comment-15194404 ] Apache Spark commented on SPARK-13882: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11705 > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13882: Assignee: Reynold Xin (was: Apache Spark) > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13882) Remove org.apache.spark.sql.execution.local
[ https://issues.apache.org/jira/browse/SPARK-13882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13882: Assignee: Apache Spark (was: Reynold Xin) > Remove org.apache.spark.sql.execution.local > --- > > Key: SPARK-13882 > URL: https://issues.apache.org/jira/browse/SPARK-13882 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > We introduced some local operators in org.apache.spark.sql.execution.local > package but never fully wired the engine to actually use these. We still plan > to implement a full local mode, but it's probably going to be fairly > different from what the current iterator-based local mode would look like. > Let's just remove them for now, and we can always re-introduce them in the > future by looking at branch-1.6.
[jira] [Created] (SPARK-13882) Remove org.apache.spark.sql.execution.local
Reynold Xin created SPARK-13882: --- Summary: Remove org.apache.spark.sql.execution.local Key: SPARK-13882 URL: https://issues.apache.org/jira/browse/SPARK-13882 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin We introduced some local operators in org.apache.spark.sql.execution.local package but never fully wired the engine to actually use these. We still plan to implement a full local mode, but it's probably going to be fairly different from what the current iterator-based local mode would look like. Let's just remove them for now, and we can always re-introduce them in the future by looking at branch-1.6.
[jira] [Assigned] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13880: Assignee: Reynold Xin (was: Apache Spark) > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13881: Assignee: Apache Spark (was: Reynold Xin) > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13881: Assignee: Reynold Xin (was: Apache Spark) > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13881) Remove LegacyFunctions
[ https://issues.apache.org/jira/browse/SPARK-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194397#comment-15194397 ] Apache Spark commented on SPARK-13881: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11704 > Remove LegacyFunctions > -- > > Key: SPARK-13881 > URL: https://issues.apache.org/jira/browse/SPARK-13881 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It was introduced in Spark 1.6 for backward compatibility. We can remove it > now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13880: Assignee: Apache Spark (was: Reynold Xin) > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
[ https://issues.apache.org/jira/browse/SPARK-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194396#comment-15194396 ] Apache Spark commented on SPARK-13880: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/11704 > Rename DataFrame.scala as Dataset.scala > --- > > Key: SPARK-13880 > URL: https://issues.apache.org/jira/browse/SPARK-13880 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13881) Remove LegacyFunctions
Reynold Xin created SPARK-13881: --- Summary: Remove LegacyFunctions Key: SPARK-13881 URL: https://issues.apache.org/jira/browse/SPARK-13881 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin It was introduced in Spark 1.6 for backward compatibility. We can remove it now in Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13880) Rename DataFrame.scala as Dataset.scala
Reynold Xin created SPARK-13880: --- Summary: Rename DataFrame.scala as Dataset.scala Key: SPARK-13880 URL: https://issues.apache.org/jira/browse/SPARK-13880 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13879) Decide DDL/DML commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13879: Summary: Decide DDL/DML commands that need Spark native implementation in 2.0 (was: Decide commands that need Spark native implementation in 2.0) > Decide DDL/DML commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Commented] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194382#comment-15194382 ] Yin Huai commented on SPARK-13879: -- Just uploaded the initial doc (https://issues.apache.org/jira/secure/attachment/12793435/Implementing%20native%20DDL%20and%20DML%20statements%20for%20Spark%202.pdf). Please note that for Hive tables, we need to implement some commands that we do not want to support for Data Source tables for now. Those commands will be summarized in another doc. > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Closed] (SPARK-13854) Add constraints to outer join
[ https://issues.apache.org/jira/browse/SPARK-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh closed SPARK-13854. --- Resolution: Not A Problem > Add constraints to outer join > - > > Key: SPARK-13854 > URL: https://issues.apache.org/jira/browse/SPARK-13854 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Liang-Chi Hsieh > > Currently, for left outer join we only keep left side constraint. For right > outer join, we only keep right side constraints. For full outer join, the > constraints are empty. > In fact, the constraints are less than the actual constraints for the join > operator. > For example, for left outer join, besides the constraints from left side, the > constraints of right side should be inherited with a bit modification. > Consider a join as following: > {code} > val tr1 = LocalRelation('a.int, 'b.int, 'c.int).subquery('tr1) > val tr2 = LocalRelation('a.int, 'd.int, 'e.int).subquery('tr2) > tr1.where('a.attr > 10) > .join(tr2.where('d.attr < 100), LeftOuter, Some("tr1.a".attr === > "tr2.a".attr)) > {code} > The constraints are not only "a" > 10, "a" is not null. It should also > include ("d" is null || "d" < 100). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
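The constraint argument in SPARK-13854 can be checked outside Spark. The following plain-Python simulation (illustrative only; made-up rows, not Catalyst code) runs the ticket's left outer join and confirms that the left side's constraints survive unchanged while the right side's filter survives only in the weakened form `d IS NULL OR d < 100`:

```python
# Simulate tr1.where(a > 10) LEFT OUTER JOIN tr2.where(d < 100) ON tr1.a = tr2.a
def left_outer_join(left, right, key):
    """Naive left outer join on one key; unmatched left rows get NULL (None)
    for the right side's columns."""
    out = []
    for lrow in left:
        matches = [r for r in right if r[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **r} for r in matches)
        else:
            out.append({**lrow, "d": None, "e": None})
    return out

# Hypothetical rows for tr1(a, b, c) and tr2(a, d, e).
tr1 = [{"a": 11, "b": 1, "c": 1}, {"a": 20, "b": 2, "c": 2}]
tr2 = [{"a": 11, "d": 50, "e": 5}, {"a": 99, "d": 150, "e": 6}]

rows = left_outer_join(
    [r for r in tr1 if r["a"] > 10],     # tr1.where('a > 10)
    [r for r in tr2 if r["d"] < 100],    # tr2.where('d < 100)
    "a",
)

# Left-side constraints hold on every output row...
assert all(r["a"] is not None and r["a"] > 10 for r in rows)
# ...and the right-side constraint holds only in weakened (null-tolerant) form:
assert all(r["d"] is None or r["d"] < 100 for r in rows)
```

This is exactly the ticket's point: the inherited right-side constraint is not `d < 100` but `d IS NULL OR d < 100`, since unmatched left rows null out the right side.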
[jira] [Updated] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13879: - Attachment: Implementing native DDL and DML statements for Spark 2.pdf > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > Attachments: Implementing native DDL and DML statements for Spark > 2.pdf > > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13859) TPCDS query 38 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13859: Description: Testing Spark SQL using TPC queries. Query 38 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 0, answer set reports 107. Actual results: {noformat} [0] {noformat} Expected: {noformat} +-+ | 1 | +-+ | 107 | +-+ {noformat} query used: {noformat} -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION select count(*) from ( select distinct c_last_name, c_first_name, d_date from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp1 JOIN (select distinct c_last_name, c_first_name, d_date from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name = tmp2.c_last_name) and (tmp1.c_first_name = tmp2.c_first_name) and (tmp1.d_date = tmp2.d_date) JOIN ( select distinct c_last_name, c_first_name, d_date from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp3 ON (tmp1.c_last_name = tmp3.c_last_name) and (tmp1.c_first_name = tmp3.c_first_name) and (tmp1.d_date = tmp3.d_date) limit 100 ; -- end query 38 in stream 0 using template query38.tpl {noformat} was: Testing Spark SQL using TPC queries. Query 38 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 0, answer set reports 107. 
Actual results: [0] Expected: +-+ | 1 | +-+ | 107 | +-+ query used: -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION select count(*) from ( select distinct c_last_name, c_first_name, d_date from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp1 JOIN (select distinct c_last_name, c_first_name, d_date from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name = tmp2.c_last_name) and (tmp1.c_first_name = tmp2.c_first_name) and (tmp1.d_date = tmp2.d_date) JOIN ( select distinct c_last_name, c_first_name, d_date from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200 + 11) tmp3 ON (tmp1.c_last_name = tmp3.c_last_name) and (tmp1.c_first_name = tmp3.c_first_name) and (tmp1.d_date = tmp3.d_date) limit 100 ; -- end query 38 in stream 0 using template query38.tpl > TPCDS query 38 returns wrong results compared to TPC official result set > - > > Key: SPARK-13859 > URL: https://issues.apache.org/jira/browse/SPARK-13859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 38 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 0, answer set reports 107. 
> Actual results: > {noformat} > [0] > {noformat} > Expected: > {noformat} > +-+ > | 1 | > +-+ > | 107 | > +-+ > {noformat} > query used: > {noformat} > -- start query 38 in stream 0 using template query38.tpl and seed > QUALIFICATION > select count(*) from ( > select distinct c_last_name, c_first_name, d_date > from store_sales > JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk > where d_month_seq between 1200 and 1200 + 11) tmp1 > JOIN > (select distinct c_last_name, c_first_name, d_date > from catalog_sales > JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk > JOIN customer ON catalog_sales.cs_bill_customer_sk = > customer.c_customer_sk > where d_month_seq between 1200 and 1200 + 11) tmp2 ON (tmp1.c_last_name =
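Query 38's semantics can be restated compactly: each subquery produces a DISTINCT set of (c_last_name, c_first_name, d_date) tuples for one sales channel, and the inner joins on all three columns keep only tuples present in every channel, i.e. a three-way set intersection. A toy sketch (hypothetical tuples, not the TPCDS data) of what a correct engine must compute:

```python
# Distinct (last_name, first_name, d_date) tuples per channel; values invented.
store = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03"),
         ("Diaz", "Cy", "2000-03-04")}
catalog = {("Smith", "Ann", "2000-01-02"), ("Diaz", "Cy", "2000-03-04")}
web = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03")}

# Tuples appearing in all three channels -- the inner joins in query 38.
count = len(store & catalog & web)
assert count == 1  # only ("Smith", "Ann", "2000-01-02")
```

A count of 0 against data where the answer set expects 107 therefore suggests one of the three joins is incorrectly dropping all matching tuples.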
[jira] [Updated] (SPARK-13858) TPCDS query 21 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13858: Description: Testing Spark SQL using TPC queries. Query 21 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL missing at least one row (grep for ABDA) ; I believe 2 other rows are missing as well. Actual results: {noformat} [null,AABD,2565,1922] [null,AAHD,2956,2052] [null,AALA,2042,1793] [null,ACGC,2373,1771] [null,ACKC,2321,1856] [null,ACOB,1504,1397] [null,ADKB,1820,2163] [null,AEAD,2631,1965] [null,AEOC,1659,1798] [null,AFAC,1965,1705] [null,AFAD,1769,1313] [null,AHDE,2700,1985] [null,AHHA,1578,1082] [null,AIEC,1756,1804] [null,AIMC,3603,2951] [null,AJAC,2109,1989] [null,AJKB,2573,3540] [null,ALBE,3458,2992] [null,ALCE,1720,1810] [null,ALEC,2569,1946] [null,ALNB,2552,1750] [null,ANFE,2022,2269] [null,AOIB,2982,2540] [null,APJB,2344,2593] [null,BAPD,2182,2787] [null,BDCE,2844,2069] [null,BDDD,2417,2537] [null,BDJA,1584,1666] [null,BEOD,2141,2649] [null,BFCC,2745,2020] [null,BFMB,1642,1364] [null,BHPC,1923,1780] [null,BIDB,1956,2836] [null,BIGB,2023,2344] [null,BIJB,1977,2728] [null,BJFE,1891,2390] [null,BLDE,1983,1797] [null,BNID,2485,2324] [null,BNLD,2385,2786] [null,BOMB,2291,2092] [null,CAAA,2233,2560] [null,CBCD,1540,2012] [null,CBIA,2394,2122] [null,CBPB,1790,1661] [null,CCMD,2654,2691] [null,CDBC,1804,2072] [null,CFEA,1941,1567] [null,CGFD,2123,2265] [null,CHPC,2933,2174] [null,CIGD,2618,2399] [null,CJCB,2728,2367] [null,CJLA,1350,1732] [null,CLAE,2578,2329] [null,CLGA,1842,1588] [null,CLLB,3418,2657] [null,CLOB,3115,2560] [null,CMAD,1991,2243] [null,CMJA,1261,1855] [null,CMLA,3288,2753] [null,CMPD,1320,1676] [null,CNGB,2340,2118] [null,CNHD,3519,3348] [null,CNPC,2561,1948] [null,DCPC,2664,2627] [null,DDHA,1313,1926] [null,DDND,1109,835] [null,DEAA,2141,1847] [null,DEJA,3142,2723] [null,DFKB,1470,1650] [null,DGCC,2113,2331] [null,DGFC,2201,2928] 
[null,DHPA,2467,2133] [null,DMBA,3085,2087] [null,DPAB,3494,3081] [null,EAEC,2133,2148] [null,EAPA,1560,1275] [null,ECGC,2815,3307] [null,EDPD,2731,1883] [null,EEEC,2024,1902] [null,EEMC,2624,2387] [null,EFFA,2047,1878] [null,EGJA,2403,2633] [null,EGMA,2784,2772] [null,EGOC,2389,1753] [null,EHFD,1940,1420] [null,EHLB,2320,2057] [null,EHPA,1898,1853] [null,EIPB,2930,2326] [null,EJAE,2582,1836] [null,EJIB,2257,1681] [null,EJJA,2791,1941] [null,EJJD,3410,2405] [null,EJNC,2472,2067] [null,EJPD,1219,1229] [null,EKEB,2047,1713] [null,EMEA,2502,1897] [null,EMKC,2362,2042] [null,ENAC,2011,1909] [null,ENFB,2507,2162] [null,ENOD,3371,2709] {noformat} Expected results: {noformat} +--+--++---+ | W_WAREHOUSE_NAME | I_ITEM_ID| INV_BEFORE | INV_AFTER | +--+--++---+ | Bad cards must make. | AACD | 1889 | 2168 | | Bad cards must make. | AAHD | 2739 | 2039 | | Bad cards must make. | ABDA | 1717 | 1782 | | Bad cards must make. | ACGC | 2296 | 2276 | | Bad cards must make. | ACKC | 2443 | 1878 | | Bad cards must make. | ACOB | 2705 | 2428 | | Bad cards must make. | ADGB | 2242 | 2759 | | Bad cards must make. | ADKB | 2138 | 2456 | | Bad cards must make. | AEAD | 2914 | 2237 | | Bad cards must make. | AEOC | 1797 | 2073 | | Bad cards must make. | AFAC | 2058 | 2734 | | Bad cards must make. | AFAD | 2173 | 2515 | | Bad cards must make. |
[jira] [Created] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
Yin Huai created SPARK-13879: Summary: Decide commands that need Spark native implementation in 2.0 Key: SPARK-13879 URL: https://issues.apache.org/jira/browse/SPARK-13879 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai This task aims to decide the commands that we currently delegate to Hive but need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13879) Decide commands that need Spark native implementation in 2.0
[ https://issues.apache.org/jira/browse/SPARK-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13879: - Target Version/s: 2.0.0 > Decide commands that need Spark native implementation in 2.0 > > > Key: SPARK-13879 > URL: https://issues.apache.org/jira/browse/SPARK-13879 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > This task aims to decide the commands that we currently delegate to Hive but > need native implementations in Spark 2.0.
[jira] [Updated] (SPARK-13861) TPCDS query 40 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13861: Description: Testing Spark SQL using TPC queries. Query 40 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL missing at least one row (grep for ABBD) ; I believe 5 rows are missing in total. Actual results: {noformat} [TN,AABD,0.0,-82.060899353] [TN,AACD,-216.54000234603882,158.0399932861328] [TN,AAHD,186.54999542236328,0.0] [TN,AALA,0.0,48.2254223633] [TN,ACGC,63.67999863624573,0.0] [TN,ACHC,102.6830517578,51.8838964844] [TN,ACKC,128.9235150146,44.8169482422] [TN,ACLD,205.43999433517456,-948.619930267334] [TN,ACOB,207.32000732421875,24.88389648438] [TN,ACPD,87.75,53.9900016784668] [TN,ADGB,44.310001373291016,222.4800033569336] [TN,ADKB,0.0,-471.8699951171875] [TN,AEAD,58.2400016784668,0.0] [TN,AEOC,19.9084741211,214.7076293945] [TN,AFAC,271.8199977874756,163.1699981689453] [TN,AFAD,2.349046325684,28.3169482422] [TN,AFDC,-378.0499496459961,-303.26999282836914] [TN,AGID,307.6099967956543,-19.29915527344] [TN,AHDE,80.574468689,-476.7200012207031] [TN,AHHA,8.27457763672,155.1276565552] [TN,AHJB,39.23999857902527,0.0] [TN,AIEC,82.3675750732,3.910858306885] [TN,AIEE,20.39618530273,-151.08999633789062] [TN,AIMC,24.46313354492,-150.330517578] [TN,AJAC,49.0915258789,82.084741211] [TN,AJCA,121.18000221252441,63.779998779296875] [TN,AJKB,27.94534057617,8.97267028809] [TN,ALBE,88.2599983215332,30.22542236328] [TN,ALCE,93.5245776367,92.0198092651] [TN,ALEC,64.179019165,15.1584741211] [TN,ALNB,4.19809265137,148.27000427246094] [TN,AMBE,28.44534057617,0.0] [TN,AMPB,0.0,131.92999839782715] [TN,ANFE,0.0,-137.3400115966797] [TN,AOIB,150.40999603271484,254.288058548] [TN,APJB,45.2745776367,334.482015991] [TN,APLA,50.2076293945,29.150001049041748] [TN,APLD,0.0,32.3838964844] [TN,BAPD,93.41999816894531,145.8699951171875] [TN,BBID,296.774577637,30.95084472656] 
[TN,BDCE,-1771.0800704956055,-54.779998779296875] [TN,BDDD,111.12000274658203,280.5899963378906] [TN,BDJA,0.0,79.5423706055] [TN,BEFD,0.0,3.429475479126] [TN,BEOD,269.838964844,297.5800061225891] [TN,BFMB,110.82999801635742,-941.4000930786133] [TN,BFNA,47.8661035156,0.0] [TN,BFOC,46.3415258789,83.5245776367] [TN,BHPC,27.378392334,77.61999893188477] [TN,BIDB,196.6199951171875,5.57171661377] [TN,BIGB,425.3399963378906,0.0] [TN,BIJB,209.6300048828125,0.0] [TN,BJFE,7.32923706055,55.1584741211] [TN,BKFA,0.0,138.14000129699707] [TN,BKMC,27.17076293945,54.970001220703125] [TN,BLDE,170.28999400138855,0.0] [TN,BNHB,58.0594277954,-337.8899841308594] [TN,BNID,54.41525878906,35.01504089355] [TN,BNLA,0.0,168.37999629974365] [TN,BNLD,0.0,96.4084741211] [TN,BNMC,202.40999698638916,49.52999830245972] [TN,BOCC,4.73019073486,69.83999633789062] [TN,BOMB,63.66999816894531,163.49000668525696] [TN,CAAA,121.91000366210938,0.0] [TN,CAAD,-1107.6099338531494,0.0] [TN,CAJC,115.8046594238,173.0519073486] [TN,CBCD,18.94534057617,226.38000106811523] [TN,CBFA,0.0,97.41000366210938] [TN,CBIA,2.14104904175,84.66000366210938] [TN,CBPB,95.44000244140625,26.6830517578] [TN,CCAB,160.43000602722168,135.8661035156] [TN,CCHD,0.0,121.62000274658203] [TN,CCMD,-115.87000274658203,124.37999820709229] [TN,CDBC,16.628392334,3.399910593033] [TN,CDEC,-3114.599931716919,0.0] [TN,CEEA,34.6830517578,26.4084741211] [TN,CELA,130.58999633789062,154.6300048828125] [TN,CELD,0.0,181.07000732421875] [TN,CFEA,3.779713897705,-315.13000106811523] [TN,CGFD,-386.8699951171875,96.92000102996826] [TN,CHHD,143.17000675201416,251.6338964844] [TN,CHPC,0.1700178813934,198.2991552734] [TN,CJCB,-918.6500339508057,270.9600028991699]
[jira] [Commented] (SPARK-13865) TPCDS query 87 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194370#comment-15194370 ] Reynold Xin commented on SPARK-13865: - [~jfc...@us.ibm.com] thanks for filing these. Can you use the noformat tag in the future so the ticket is readable? > TPCDS query 87 returns wrong results compared to TPC official result set > - > > Key: SPARK-13865 > URL: https://issues.apache.org/jira/browse/SPARK-13865 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 87 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 47555, answer set expects 47298. > Actual results: > {noformat} > [47555] > {noformat} > {noformat} > Expected: > +---+ > | 1 | > +---+ > | 47298 | > +---+ > {noformat} > Query used: > {noformat} > -- start query 87 in stream 0 using template query87.tpl and seed > QUALIFICATION > select count(*) > from > (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as > ddate1, 1 as notnull1 >from store_sales > JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk > JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp1 >left outer join > (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as > ddate2, 1 as notnull2 >from catalog_sales > JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk > JOIN customer ON catalog_sales.cs_bill_customer_sk = > customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp2 > on (tmp1.cln1 = tmp2.cln2) > and (tmp1.cfn1 = tmp2.cfn2) > and (tmp1.ddate1= tmp2.ddate2) >left outer join > (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as > ddate3, 1 as notnull3 >from web_sales > JOIN date_dim ON web_sales.ws_sold_date_sk = 
date_dim.d_date_sk > JOIN customer ON web_sales.ws_bill_customer_sk = > customer.c_customer_sk >where > d_month_seq between 1200 and 1200+11 >) tmp3 > on (tmp1.cln1 = tmp3.cln3) > and (tmp1.cfn1 = tmp3.cfn3) > and (tmp1.ddate1= tmp3.ddate3) > where > notnull2 is null and notnull3 is null > ; > -- end query 87 in stream 0 using template query87.tpl > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
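Query 87 has the complementary shape to query 38: two LEFT OUTER JOINs followed by `notnull2 IS NULL AND notnull3 IS NULL` keep exactly the store tuples that match neither the catalog set nor the web set, i.e. `store - (catalog ∪ web)`. A toy sketch (hypothetical tuples, not the TPCDS data):

```python
# Distinct (last_name, first_name, d_date) tuples per channel; values invented.
store = {("Smith", "Ann", "2000-01-02"), ("Lee", "Bo", "2000-02-03"),
         ("Diaz", "Cy", "2000-03-04")}
catalog = {("Smith", "Ann", "2000-01-02")}
web = {("Lee", "Bo", "2000-02-03")}

# Store tuples with no match in catalog AND no match in web:
# the two left outer joins plus the IS NULL filters in query 87.
count = len(store - (catalog | web))
assert count == 1  # only ("Diaz", "Cy", "2000-03-04") is store-only
```

Since the expected count (47298) is smaller than the reported one (47555), the symptom is consistent with some store tuples failing to match in the outer joins when they should.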
[jira] [Updated] (SPARK-13863) TPCDS query 66 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13863: Description: Testing Spark SQL using TPC queries. Query 66 returns wrong results compared to official result set. This is at 1GB SF (validation run). Aggregations slightly off -- eg. JAN_SALES column of "Doors canno" row - SparkSQL returns 6355232.185385704, expected 6355232.31 Actual results: {noformat} [null,null,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9597806.850651741,1.1121820530080795E7,8670867.81564045,8994785.945689201,1.088724806326294E7,1.4187671518377304E7,9732598.460139751,1.9798897020946026E7,2.1007842467959404E7,2.149551364927292E7,3.479566905774999E7,3.3122997954660416E7,null,null,null,null,null,null,null,null,null,null,null,null,2.191359469742E7,3.2518476414670944E7,2.48856624883976E7,2.5698343830046654E7,3.373591080598068E7,3.552703167087555E7,2.5465193481492043E7,5.362323870799959E7,5.1409986978201866E7,5.415917383586836E7,9.222704311805725E7,8.343539111531019E7] [Bad cards must make.,621234,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,9506753.593884468,8008140.429557085,6116769.711647987,1.1973045160133362E7,7756254.925520897,5352978.574095726,1.373399613500309E7,1.6418794411203384E7,1.7212743279764652E7,1.704270732417488E7,3.43049358570323E7,3.532416421229005E7,15.30301560102066,12.890698882477594,9.846160563729589,19.273003667109915,12.485238936569628,8.61668642427125,22.107605403121994,26.429323590150222,27.707342611261865,27.433635834765774,55.22063482847413,56.86128610521969,3.0534943928382874E7,2.4481686250203133E7,2.217871080008793E7,2.569579825610423E7,2.995490355044937E7,1.8084140250833035E7,3.0805576178061485E7,4.7156887432252884E7,5.115858869637826E7,5.5759943171424866E7,8.625354428184557E7,8.345155532035494E7] [Conventional childr,977787,Fairview,Williamson County,TN,United 
States,DHL,BARIAN,2001,8860645.460736752,1.441581376543355E7,6761497.232810497,1.1820654735879421E7,8246260.600341797,6636877.482845306,1.1434492123092413E7,2.5673812070380323E7,2.307420611785E7,2.1834582007320404E7,2.6894900596512794E7,3.357509177109933E7,9.061938296108202,14.743306840276613,6.9151024024767125,12.08919195681618,8.43359606984118,6.787651587559771,11.694256645969329,26.257060147435304,23.598398219562938,22.330611889215547,27.505888906799534,34.337838170377935,2.3836085704864502E7,3.20733132298584E7,2.503790437837982E7,2.2659895963564873E7,2.175740087420273E7,2.4451608012176514E7,2.1933001734852314E7,5.59967034604629E7,5.737188052299309E7,6.208721474336243E7,8.284991027382469E7,8.897031933202875E7] [Doors canno,294242,Fairview,Williamson County,TN,United States,DHL,BARIAN,2001,6355232.185385704,1.0198920296742141E7,1.0246200903741479E7,1.2209716492156029E7,8566998.262890816,8806316.75278151,9789405.6993227,1.646658496404171E7,2.6443785668474197E7,2.701604788320923E7,3.366058958298761E7,2.7462468750599384E7,21.59865751791282,34.66167405313361,34.822360178837414,41.495491779406166,29.115484067165177,29.928823053070296,33.26991285854059,55.96272783641258,89.87087386734116,91.81574310672585,114.39763726112386,93.33293258813964,2.2645142994330406E7,2.448725452685547E7,2.4925759290207863E7,3.0503655031727314E7,2.6558160276379585E7,2.0976233452690125E7,2.9895796101181984E7,5.600219855566597E7,5.348815865275085E7,7.628723580410767E7,8.248374754962921E7,8.808826726185608E7] [Important issues liv,138504,Fairview,Williamson County,TN,United 
States,DHL,BARIAN,2001,1.1748784594717264E7,1.435130566355586E7,9896470.867572784,7990874.805492401,8879247.840401173,7362383.04259038,1.0011144724414349E7,1.7741201390372872E7,2.1346976135887742E7,1.8074978020030975E7,2.967512567988676E7,3.2545325348875403E7,84.8263197793368,103.6165429414014,71.45259969078715,57.694180713137534,64.10824120892663,53.156465102743454,72.28054586448297,128.09161750110374,154.12534032149065,130.5014874662896,214.25464737398747,234.97751219369408,2.7204167203903973E7,2.598037822457385E7,1.9943398915802002E7,2.5710421112384796E7,1.948448105346489E7,2.6346611484448195E7,2.5075158296625137E7,5.409477817043829E7,4.106673223178029E7,5.454705814340496E7,7.246596285337901E7,9.277032812079096E7] {noformat} Expected results: {noformat}
[jira] [Updated] (SPARK-13862) TPCDS query 49 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13862: Description: Testing Spark SQL using TPC queries. Query 49 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL has right answer but in wrong order (and there is an 'order by' in the query). Actual results: {noformat} store,9797,0.8000,2,2] [store,12641,0.81609195402298850575,3,3] [store,6661,0.92207792207792207792,7,7] [store,13013,0.94202898550724637681,8,8] [store,9029,1.,10,10] [web,15597,0.66197183098591549296,3,3] [store,14925,0.96470588235294117647,9,9] [store,4063,1.,10,10] [catalog,8929,0.7625,7,7] [store,11589,0.82653061224489795918,6,6] [store,1171,0.82417582417582417582,5,5] [store,9471,0.7750,1,1] [catalog,12577,0.65591397849462365591,3,3] [web,97,0.90361445783132530120,9,8] [web,85,0.85714285714285714286,8,7] [catalog,361,0.74647887323943661972,5,5] [web,2915,0.69863013698630136986,4,4] [web,117,0.9250,10,9] [catalog,9295,0.77894736842105263158,9,9] [web,3305,0.7375,6,16] [catalog,16215,0.79069767441860465116,10,10] [web,7539,0.5900,1,1] [catalog,17543,0.57142857142857142857,1,1] [catalog,3411,0.71641791044776119403,4,4] [web,11933,0.71717171717171717172,5,5] [catalog,14513,0.63541667,2,2] [store,15839,0.81632653061224489796,4,4] [web,3337,0.62650602409638554217,2,2] [web,5299,0.92708333,11,10] [catalog,8189,0.74698795180722891566,6,6] [catalog,14869,0.77173913043478260870,8,8] [web,483,0.8000,7,6] {noformat} Expected results: {noformat} +-+---++-+---+ | CHANNEL | ITEM | RETURN_RATIO | RETURN_RANK | CURRENCY_RANK | +-+---++-+---+ | catalog | 17543 | .5714285714285714 | 1 | 1 | | catalog | 14513 | .63541666 | 2 | 2 | | catalog | 12577 | .6559139784946236 | 3 | 3 | | catalog | 3411 | .7164179104477611 | 4 | 4 | | catalog | 361 | .7464788732394366 | 5 | 5 | | catalog | 8189 | .7469879518072289 | 6 | 6 | | catalog | 8929 | .7625 | 7 | 7 | | catalog | 14869 | 
.7717391304347826 | 8 | 8 | | catalog | 9295 | .7789473684210526 | 9 | 9 | | catalog | 16215 | .7906976744186046 | 10 |10 | | store | 9471 | .7750 | 1 | 1 | | store | 9797 | .8000 | 2 | 2 | | store | 12641 | .8160919540229885 | 3 | 3 | | store | 15839 | .8163265306122448 | 4 | 4 | | store | 1171 | .8241758241758241 | 5 | 5 | | store | 11589 | .8265306122448979 | 6 | 6 | | store | 6661 | .9220779220779220 | 7 | 7 | | store | 13013 | .9420289855072463 | 8 | 8 | | store | 14925 | .9647058823529411 | 9 | 9 | | store | 4063 | 1. | 10 |10 | | store | 9029 | 1. | 10 |10 | | web | 7539 | .5900 | 1 | 1 | | web | 3337 | .6265060240963855 | 2 | 2 | | web | 15597 | .6619718309859154 | 3 | 3 | | web | 2915 | .6986301369863013 | 4 | 4 | | web | 11933 | .7171717171717171 | 5 | 5 | | web | 3305 | .7375 | 6 |16 | | web | 483 | .8000 | 7 | 6 | | web |85 | .8571428571428571 | 8 | 7 | | web |97 | .9036144578313253 | 9 | 8 | | web | 117 | .9250 | 10 | 9 | | web | 5299 | .92708333 | 11 |10 | +-+---++-+---+ {noformat} Query used: {noformat} -- start query 49 in stream 0 using template query49.tpl and seed QUALIFICATION select 'web' as channel ,web.item ,web.return_ratio ,web.return_rank ,web.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select ws.ws_item_sk as item ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as
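A likely reason correct rows can still arrive in a surprising order: rank() gives tied return_ratio values the same rank, and rows that tie under the query's final sort keys have no defined relative order between them. A minimal Python sketch of SQL rank() semantics (illustrative only, not Spark's implementation):

```python
def sql_rank(values):
    """SQL rank() over (order by v): 1 + number of rows strictly smaller.

    Ties share a rank, and the rank after a tie group skips ahead.
    """
    return [1 + sum(1 for w in values if w < v) for v in values]

# Hypothetical return_ratio values with one tie.
ratios = [0.59, 0.62, 0.66, 0.66, 0.74]
print(sql_rank(ratios))  # [1, 2, 3, 3, 5] -- the tied rows share rank 3
```

Because the two tied rows both rank 3, an ORDER BY on the rank columns alone leaves their relative order up to the engine, which is consistent with "right answer but in wrong order" above.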
[jira] [Updated] (SPARK-13865) TPCDS query 87 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13865: Description: Testing Spark SQL using TPC queries. Query 87 returns wrong results compared to official result set. This is at 1GB SF (validation run). SparkSQL returns count of 47555, answer set expects 47298. Actual results: {noformat} [47555] {noformat} {noformat} Expected: +---+ | 1 | +---+ | 47298 | +---+ {noformat} Query used: {noformat} -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION select count(*) from (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1 from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp1 left outer join (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2 from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp2 on (tmp1.cln1 = tmp2.cln2) and (tmp1.cfn1 = tmp2.cfn2) and (tmp1.ddate1= tmp2.ddate2) left outer join (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as ddate3, 1 as notnull3 from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp3 on (tmp1.cln1 = tmp3.cln3) and (tmp1.cfn1 = tmp3.cfn3) and (tmp1.ddate1= tmp3.ddate3) where notnull2 is null and notnull3 is null ; -- end query 87 in stream 0 using template query87.tpl {noformat} was: Testing Spark SQL using TPC queries. Query 87 returns wrong results compared to official result set. This is at 1GB SF (validation run). 
SparkSQL returns count of 47555, answer set expects 47298. Actual results: [47555] Expected: +---+ | 1 | +---+ | 47298 | +---+ Query used: -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION select count(*) from (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1 from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp1 left outer join (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2 from catalog_sales JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp2 on (tmp1.cln1 = tmp2.cln2) and (tmp1.cfn1 = tmp2.cfn2) and (tmp1.ddate1= tmp2.ddate2) left outer join (select distinct c_last_name as cln3, c_first_name as cfn3 , d_date as ddate3, 1 as notnull3 from web_sales JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk where d_month_seq between 1200 and 1200+11 ) tmp3 on (tmp1.cln1 = tmp3.cln3) and (tmp1.cfn1 = tmp3.cfn3) and (tmp1.ddate1= tmp3.ddate3) where notnull2 is null and notnull3 is null ; -- end query 87 in stream 0 using template query87.tpl > TPCDS query 87 returns wrong results compared to TPC official result set > - > > Key: SPARK-13865 > URL: https://issues.apache.org/jira/browse/SPARK-13865 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: JESSE CHEN > Labels: tpcds-result-mismatch > > Testing Spark SQL using TPC queries. Query 87 returns wrong results compared > to official result set. This is at 1GB SF (validation run). > SparkSQL returns count of 47555, answer set expects 47298. 
> Actual results: > {noformat} > [47555] > {noformat} > {noformat} > Expected: > +---+ > | 1 | > +---+ > | 47298 | > +---+ > {noformat} > Query used: > {noformat} > -- start query 87 in stream 0 using template query87.tpl and seed > QUALIFICATION > select count(*) > from > (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as > ddate1, 1 as notnull1 >from
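The left outer join + "notnull2 is null" pattern in query 87 is a hand-rolled anti-join: keep the distinct store_sales rows that find no match in catalog_sales, then (via the second join) none in web_sales either. A minimal Python sketch of that semantics on toy tuples (row values and the key function are hypothetical):

```python
def left_anti(left, right, key):
    """Rows of `left` with no matching key in `right` -- the semantics the
    left outer join plus IS NULL filter in query 87 expresses."""
    right_keys = {key(r) for r in right}
    return [row for row in left if key(row) not in right_keys]

# Toy (last_name, first_name, date) rows.
store = [("smith", "jan", "2000-01-01"), ("jones", "amy", "2000-01-02")]
catalog = [("smith", "jan", "2000-01-01")]
only_store = left_anti(store, catalog, key=lambda r: r)
print(only_store)  # [('jones', 'amy', '2000-01-02')]
```

One caveat the sketch glosses over: in SQL a NULL c_last_name never compares equal to anything (including another NULL), while a Python tuple containing None does match itself. Differences in NULL join-key semantics are a classic source of off-by-N counts in queries shaped like this one.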
[jira] [Updated] (SPARK-13864) TPCDS query 74 returns wrong results compared to TPC official result set
[ https://issues.apache.org/jira/browse/SPARK-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13864: Description: Testing Spark SQL using TPC queries. Query 74 returns wrong results compared to official result set. This is at 1GB SF (validation run). Spark SQL has right answer but in wrong order (and there is an 'order by' in the query). Actual results: {noformat} [BLEIBAAA,Paula,Wakefield] [DFIEBAAA,John,Gray] [OCLBBAAA,null,null] [PKBCBAAA,Andrea,White] [EJDL,Alice,Wright] [FACE,Priscilla,Miller] [LFKK,Ignacio,Miller] [LJNCBAAA,George,Gamez] [LIOP,Derek,Allen] [EADJ,Ruth,Carroll] [JGMM,Richard,Larson] [PKIK,Wendy,Horvath] [FJHF,Larissa,Roy] [EPOG,Felisha,Mendes] [EKJL,Aisha,Carlson] [HNFH,Rebecca,Wilson] [IBFCBAAA,Ruth,Grantham] [OPDL,Ann,Pence] [NIPL,Eric,Lawrence] [OCIC,Zachary,Pennington] [OFLC,James,Taylor] [GEHI,Tyler,Miller] [CADP,Cristobal,Thomas] [JIAL,Santos,Gutierrez] [PMMBBAAA,Paul,Jordan] [DIIO,David,Carroll] [DFKABAAA,Latoya,Craft] [HMOI,Grace,Henderson] [PPIBBAAA,Candice,Lee] [JONHBAAA,Warren,Orozco] [GNDA,Terry,Mcdowell] [CIJM,Elizabeth,Thomas] [DIJGBAAA,Ruth,Sanders] [NFBDBAAA,Vernice,Fernandez] [IDKF,Michael,Mack] [IMHB,Kathy,Knowles] [LHMC,Brooke,Nelson] [CFCGBAAA,Marcus,Sanders] [NJHCBAAA,Christopher,Schreiber] [PDFB,Terrance,Banks] [ANFA,Philip,Banks] [IADEBAAA,Diane,Aldridge] [ICHF,Linda,Mccoy] [CFEN,Christopher,Dawson] [KOJJ,Gracie,Mendoza] [FOJA,Don,Castillo] [FGPG,Albert,Wadsworth] [KJBK,Georgia,Scott] [EKFP,Annika,Chin] [IBAEBAAA,Sandra,Wilson] [MFFL,Margret,Gray] [KNAK,Gladys,Banks] [CJDI,James,Kerr] [OBADBAAA,Elizabeth,Burnham] [AMGD,Kenneth,Harlan] [HJLA,Audrey,Beltran] [AOPFBAAA,Jerry,Fields] [CNAGBAAA,Virginia,May] [HGOABAAA,Sonia,White] [KBCABAAA,Debra,Bell] [NJAG,Allen,Hood] [MMOBBAAA,Margaret,Smith] [NGDBBAAA,Carlos,Jewell] [FOGI,Michelle,Greene] [JEKFBAAA,Norma,Burkholder] [OCAJ,Jenna,Staton] [PFCL,Felicia,Neville] [DLHBBAAA,Henry,Bertrand] [DBEFBAAA,Bennie,Bowers] 
[DCKO,Robert,Gonzalez] [KKGE,Katie,Dunbar] [GFMDBAAA,Kathleen,Gibson] [IJEM,Charlie,Cummings] [KJBL,Kerry,Davis] [JKBN,Julie,Kern] [MDCA,Louann,Hamel] [EOAK,Molly,Benjamin] [IBHH,Jennifer,Ballard] [PJEN,Ashley,Norton] [KLHHBAAA,Manuel,Castaneda] [IMHHBAAA,Lillian,Davidson] [GHPBBAAA,Nick,Mendez] [BNBB,Irma,Smith] [FBAH,Michael,Williams] [PEHEBAAA,Edith,Molina] [FMHI,Emilio,Darling] [KAEC,Milton,Mackey] [OCDJ,Nina,Sanchez] [FGIG,Eduardo,Miller] [FHACBAAA,null,null] [HMJN,Ryan,Baptiste] [HHCABAAA,William,Stewart] {noformat} Expected results: {noformat} +--+-++ | CUSTOMER_ID | CUSTOMER_FIRST_NAME | CUSTOMER_LAST_NAME | +--+-++ | AMGD | Kenneth | Harlan | | ANFA | Philip | Banks | | AOPFBAAA | Jerry | Fields | | BLEIBAAA | Paula | Wakefield | | BNBB | Irma| Smith | | CADP | Cristobal | Thomas | | CFCGBAAA | Marcus | Sanders| | CFEN | Christopher | Dawson | | CIJM | Elizabeth | Thomas | | CJDI | James | Kerr | | CNAGBAAA | Virginia| May| | DBEFBAAA | Bennie | Bowers | | DCKO | Robert | Gonzalez | | DFIEBAAA | John| Gray | | DFKABAAA | Latoya | Craft | | DIIO | David | Carroll| | DIJGBAAA | Ruth| Sanders| | DLHBBAAA | Henry | Bertrand | | EADJ | Ruth| Carroll| | EJDL |
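As with query 49, the rows here match and only their order differs, so a result checker has to distinguish ordered comparison (what an ORDER BY mandates) from multiset comparison (useful for separating ordering bugs from value bugs). A small sketch of the latter, assuming result rows arrive as tuples:

```python
from collections import Counter

def same_rows(actual, expected):
    """Multiset equality: same rows with the same multiplicities,
    ignoring row order."""
    return Counter(map(tuple, actual)) == Counter(map(tuple, expected))

a = [("AMGD", "Kenneth", "Harlan"), ("ANFA", "Philip", "Banks")]
b = [("ANFA", "Philip", "Banks"), ("AMGD", "Kenneth", "Harlan")]
print(same_rows(a, b))  # True -- same content, different order
```

A harness can then report "content OK, ordering wrong" (this issue and SPARK-13862) separately from genuine value mismatches like SPARK-13865.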
[jira] [Updated] (SPARK-13531) Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects
[ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13531: - Target Version/s: 2.0.0 > Some DataFrame joins stopped working with UnsupportedOperationException: No > size estimation available for objects > - > > Key: SPARK-13531 > URL: https://issues.apache.org/jira/browse/SPARK-13531 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: koert kuipers >Priority: Minor > > this is using spark 2.0.0-SNAPSHOT > dataframe df1: > schema: > {noformat}StructType(StructField(x,IntegerType,true)){noformat} > explain: > {noformat}== Physical Plan == > MapPartitions , obj#135: object, [if (input[0, object].isNullAt) > null else input[0, object].get AS x#128] > +- MapPartitions , createexternalrow(if (isnull(x#9)) null else > x#9), [input[0, object] AS obj#135] >+- WholeStageCodegen > : +- Project [_1#8 AS x#9] > : +- Scan ExistingRDD[_1#8]{noformat} > show: > {noformat}+---+ > | x| > +---+ > | 2| > | 3| > +---+{noformat} > dataframe df2: > schema: > {noformat}StructType(StructField(x,IntegerType,true), > StructField(y,StringType,true)){noformat} > explain: > {noformat}== Physical Plan == > MapPartitions , createexternalrow(x#2, if (isnull(y#3)) null else > y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get > AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class > org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, > object].get, true) AS y#131] > +- WholeStageCodegen >: +- Project [_1#0 AS x#2,_2#1 AS y#3] >: +- Scan ExistingRDD[_1#0,_2#1]{noformat} > show: > {noformat}+---+---+ > | x| y| > +---+---+ > | 1| 1| > | 2| 2| > | 3| 3| > +---+---+{noformat} > i run: > df1.join(df2, Seq("x")).show > i get: > {noformat}java.lang.UnsupportedOperationException: No size estimation > available for objects. 
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323) > at > org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat} > not sure what changed, this ran about a week ago without issues (in our > internal unit tests). it is fully reproducible, however when i tried to > minimize the issue i could not reproduce it by just creating data frames in > the repl with the same contents, so it probably has something to do with the way > these are created (from Row objects and StructTypes).
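The trace shows where this dies: CanBroadcast asks UnaryNode.statistics for a size estimate, which sums defaultSize over the plan's output attributes, and ObjectType (introduced by the typed MapPartitions operators visible in the plans above) defines no size. A toy Python model of that estimation path (illustrative only, not Spark code):

```python
class IntegerType:
    def default_size(self):
        return 4  # bytes per value, as a fixed-width type can report

class ObjectType:
    def default_size(self):
        # Mirrors the ObjectType.defaultSize failure in the trace above.
        raise Exception("No size estimation available for objects.")

def row_size(schema):
    """Rough per-row size estimate: sum of attribute default sizes."""
    return sum(t.default_size() for t in schema)

print(row_size([IntegerType(), IntegerType()]))  # 8
try:
    row_size([IntegerType(), ObjectType()])
except Exception as e:
    print(e)  # planning a broadcast join over such a schema fails the same way
```

This is consistent with the reporter's observation: data frames built straight in the REPL plan without the object-level operators, so the estimate never touches an ObjectType attribute.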
[jira] [Updated] (SPARK-8360) Structured Streaming (aka Streaming DataFrames)
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-8360: --- Summary: Structured Streaming (aka Streaming DataFrames) (was: Streaming DataFrames) > Structured Streaming (aka Streaming DataFrames) > --- > > Key: SPARK-8360 > URL: https://issues.apache.org/jira/browse/SPARK-8360 > Project: Spark > Issue Type: Umbrella > Components: SQL, Streaming >Reporter: Reynold Xin > Attachments: > StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf > > > Umbrella ticket to track what's needed to make streaming DataFrame a reality.
[jira] [Reopened] (SPARK-13135) Don't print expressions recursively in generated code
[ https://issues.apache.org/jira/browse/SPARK-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reopened SPARK-13135: That PR is not merged. > Don't print expressions recursively in generated code > - > > Key: SPARK-13135 > URL: https://issues.apache.org/jira/browse/SPARK-13135 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > Our code generation currently prints expressions recursively. For example, > for expression "(1 + 1) + 1", we would print the following: > "(1 + 1) + 1" > "(1 + 1)" > "1" > "1" > We should just print the project list once.
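A toy model of the duplication being described: printing a node and then recursing into its children emits every subtree's text again, so leaf fragments like "1" show up repeatedly. The sketch below is illustrative, not Spark's codegen:

```python
class Expr:
    """Hypothetical expression-tree node holding its rendered text."""
    def __init__(self, text, children=()):
        self.text = text
        self.children = children

def print_recursively(expr, out):
    """Emit the node's text, then every subtree's text -- the behavior
    the issue wants replaced by printing the top-level list once."""
    out.append(expr.text)
    for child in expr.children:
        print_recursively(child, out)

one = Expr("1")
tree = Expr("(1 + 1) + 1", [Expr("(1 + 1)", [one, one]), Expr("1")])
lines = []
print_recursively(tree, lines)
print(lines)  # ['(1 + 1) + 1', '(1 + 1)', '1', '1', '1']
```

Printing only the root of each projected expression keeps the generated-code comments linear in the number of expressions rather than in the number of tree nodes.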
[jira] [Resolved] (SPARK-13878) Window functions failed in cluster
[ https://issues.apache.org/jira/browse/SPARK-13878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13878. Resolution: Duplicate > Window functions failed in cluster > -- > > Key: SPARK-13878 > URL: https://issues.apache.org/jira/browse/SPARK-13878 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu > > When cume_dist is used, we got the following error. > {code} > org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in > stage 102.0 failed 4 times, most recent failure: Lost task 10.3 in stage > 102.0 (TID 8448, ip-10-216-233-112.eu-west-1.compute.internal): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: window__partition__size#206 > at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at 
scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:249) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) > at >
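Judging by its name, window__partition__size is an internal rows-per-partition column that the window operator introduces when evaluating cume_dist, and the binding failure means that column is missing from the child plan's output in cluster mode. For reference, the value cume_dist computes is simple; a sketch of the SQL semantics (not Spark's implementation):

```python
def cume_dist(values):
    """SQL cume_dist() over (order by v) within one partition:
    fraction of partition rows with value <= the current row's value."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

print(cume_dist([10, 20, 20, 30]))  # [0.25, 0.75, 0.75, 1.0]
```

The denominator is the partition row count, which is why the physical plan needs a partition-size column at all; the bug is in wiring that column through, not in the function's definition.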
[jira] [Updated] (SPARK-13878) Window functions failed in cluster
[ https://issues.apache.org/jira/browse/SPARK-13878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-13878: --- Description: When cume_dist is used, we got the following error. {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 102.0 failed 4 times, most recent failure: Lost task 10.3 in stage 102.0 (TID 8448, ip-10-216-233-112.eu-west-1.compute.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: window__partition__size#206 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86) at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at 
scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264) at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:249) at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85) at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at