[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-03 Thread Sagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270155#comment-15270155
 ] 

Sagar commented on SPARK-15072:
---

[~techaddict]  Yes, it fails because assembly/assembly was removed, and the test 
is ignored right now. Does that mean it is not being considered?

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15107.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Allow running test cases with different iterations in micro-benchmark util
> --
>
> Key: SPARK-15107
> URL: https://issues.apache.org/jira/browse/SPARK-15107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13946) PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions

2016-05-03 Thread Niranjan Molkeri` (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270151#comment-15270151
 ] 

Niranjan Molkeri` commented on SPARK-13946:
---

Hi, I ran the following code:

{noformat}
import numpy as np
import pandas as pd

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="fooAPP")
sqlContext = SQLContext(sc)

df = pd.DataFrame({'foo': np.random.randn(100),
                   'bar': np.random.randn(100)})

sdf = sqlContext.createDataFrame(df)

sdf2 = sdf[sdf.bar > 0]

#sdf.agg(F.count(sdf2.foo)).show()

sdfCount = sdf.count()

sdf2Count = sdf2.count()
{noformat}

sdf.count() returns 100, while sdf2.count() returns around 50 on average.

Can you tell me what "F" is in
{noformat}
sdf.agg(F.count(sdf2.foo)).show()
{noformat}
so that I can test further and look into the issue.

Thank you. 
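
For reference, {{F}} in PySpark snippets conventionally refers to the 
{{pyspark.sql.functions}} module; a minimal sketch, assuming that convention:

{code}
# "F" is assumed to be the conventional alias for pyspark.sql.functions.
from pyspark.sql import functions as F

# With sdf and sdf2 as defined above, this aggregates over sdf while
# referencing a column derived from sdf2, the silently accepted mix of
# table expressions that this issue is about:
sdf.agg(F.count(sdf2.foo)).show()
{code}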


> PySpark DataFrames allows you to silently use aggregate expressions derived 
> from different table expressions
> 
>
> Key: SPARK-13946
> URL: https://issues.apache.org/jira/browse/SPARK-13946
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Wes McKinney
>
> In my opinion, this code should raise an exception rather than silently 
> discarding the predicate:
> {code}
> import numpy as np
> import pandas as pd
> df = pd.DataFrame({'foo': np.random.randn(100),
>                    'bar': np.random.randn(100)})
> sdf = sqlContext.createDataFrame(df)
> sdf2 = sdf[sdf.bar > 0]
> sdf.agg(F.count(sdf2.foo)).show()
> +----------+
> |count(foo)|
> +----------+
> |       100|
> +----------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide

2016-05-03 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270123#comment-15270123
 ] 

Felix Cheung commented on SPARK-14817:
--

Perhaps SPARK-12071 should be included?

> ML, Graph, R 2.0 QA: Programming guide update and migration guide
> -
>
> Key: SPARK-14817
> URL: https://issues.apache.org/jira/browse/SPARK-14817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>
> Before the release, we need to update the MLlib, GraphX, and SparkR 
> Programming Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-13448].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")
> For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, 
> to make it clear the RDD-based API is the older, maintenance-mode one.
> * No docs for spark.mllib will be deleted; they will just be reorganized and 
> put in a subsection.
> * If spark.ml docs are less complete, or if spark.ml docs say "refer to the 
> spark.mllib docs for details," then we should copy those details to the 
> spark.ml docs.  This per-feature work can happen under [SPARK-14815].
> * This big reorganization should be done *after* docs are added for each 
> feature (to minimize merge conflicts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog

2016-05-03 Thread Niranjan Molkeri` (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270121#comment-15270121
 ] 

Niranjan Molkeri` commented on SPARK-14385:
---

Hi, I would like to take a look at this problem. Can you give me further details 
on how to proceed with the bug?

Thank you. 

> Use FunctionIdentifier in FunctionRegistry/SessionCatalog
> -
>
> Key: SPARK-14385
> URL: https://issues.apache.org/jira/browse/SPARK-14385
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> Right now it's confusing what's a qualified name or not. There's little 
> type-safety in this corner of the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-03 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270122#comment-15270122
 ] 

Sandeep Singh commented on SPARK-15072:
---

[~snanda] The first build/sbt run will fail because assembly/assembly was 
removed. Secondly, we don't need to fix this in this PR, since the test is 
ignored right now.

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14539) Fetching delegation tokens in Hive-Thriftserver fails when hive.server2.enable.doAs = True

2016-05-03 Thread Niranjan Molkeri` (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270113#comment-15270113
 ] 

Niranjan Molkeri` commented on SPARK-14539:
---

Hi, can I know which Hive version you are using? 

> Fetching delegation tokens in Hive-Thriftserver fails when 
> hive.server2.enable.doAs = True
> --
>
> Key: SPARK-14539
> URL: https://issues.apache.org/jira/browse/SPARK-14539
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Trystan Leftwich
>
> Similar to https://issues.apache.org/jira/browse/SPARK-13478
> When you are running Hive Thriftserver and have hive.server2.enable.doAs = 
> True you will get 
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-03 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270107#comment-15270107
 ] 

holdenk commented on SPARK-14813:
-

While starting this audit, I noticed that a number of params are missing, but 
I'm assuming this is expected until Spark 2.1 (see 
https://issues.apache.org/jira/browse/SPARK-10931 ).

> ML 2.0 QA: API: Python API coverage
> ---
>
> Key: SPARK-14813
> URL: https://issues.apache.org/jira/browse/SPARK-14813
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Joseph K. Bradley
>
> For new public APIs added to MLlib, we need to check the generated HTML doc 
> and compare the Scala & Python versions.  We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> Please use a *separate* JIRA (linked below) for this list of to-do items.
> UPDATE: This only needs to cover spark.ml since spark.mllib is going into 
> maintenance mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-05-03 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270105#comment-15270105
 ] 

holdenk commented on SPARK-10931:
-

So, just to be certain: for https://issues.apache.org/jira/browse/SPARK-14813 
we won't try to resolve the params not being present in the models?

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.
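
A minimal sketch of the gap described above, assuming the 2.0-era PySpark ML 
API and a hypothetical labeled DataFrame {{training_df}}:

{code}
# Sketch only: `training_df` is a hypothetical DataFrame with "features"
# and "label" columns.
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(training_df)

# Ideally the fitted model would carry the Param values it was trained
# with; per this issue, the Python model wrapper does not copy them from
# the estimator, so its param map stays empty or defaults-only:
print(model.extractParamMap())
{code}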



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270104#comment-15270104
 ] 

Apache Spark commented on SPARK-15110:
--

User 'NarineK' has created a pull request for this issue:
https://github.com/apache/spark/pull/12887

> SparkR - Implement repartitionByColumn on DataFrame
> ---
>
> Key: SPARK-15110
> URL: https://issues.apache.org/jira/browse/SPARK-15110
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>
> Implement repartitionByColumn on DataFrame.
> This will allow us to run R functions on each partition identified by column 
> groups, using the dapply() method.
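
For comparison, a hedged Python sketch of the analogous PySpark behavior 
(assumes an existing 2.0-era SparkSession named {{spark}}):

{code}
# Sketch: PySpark's column-based repartition, which this ticket mirrors
# for SparkR. Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([(1, "a"), (2, "b"), (1, "c")], ["id", "v"])

# Rows sharing the same id are co-located in one partition, so a
# per-partition function sees whole column groups:
parts = df.repartition("id")
print(parts.rdd.glom().map(len).collect())  # rows per partition
{code}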



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15110:


Assignee: Apache Spark

> SparkR - Implement repartitionByColumn on DataFrame
> ---
>
> Key: SPARK-15110
> URL: https://issues.apache.org/jira/browse/SPARK-15110
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>Assignee: Apache Spark
>
> Implement repartitionByColumn on DataFrame.
> This will allow us to run R functions on each partition identified by column 
> groups, using the dapply() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15110:


Assignee: (was: Apache Spark)

> SparkR - Implement repartitionByColumn on DataFrame
> ---
>
> Key: SPARK-15110
> URL: https://issues.apache.org/jira/browse/SPARK-15110
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>
> Implement repartitionByColumn on DataFrame.
> This will allow us to run R functions on each partition identified by column 
> groups, using the dapply() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame

2016-05-03 Thread Narine Kokhlikyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narine Kokhlikyan updated SPARK-15110:
--
Description: 
Implement repartitionByColumn on DataFrame.

This will allow us to run R functions on each partition identified by column 
groups with dapply() method.

  was:
Implement repartitionByColumn on DataFrame.

This will allow us to run R functions on each partition with dapply() method.


> SparkR - Implement repartitionByColumn on DataFrame
> ---
>
> Key: SPARK-15110
> URL: https://issues.apache.org/jira/browse/SPARK-15110
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>
> Implement repartitionByColumn on DataFrame.
> This will allow us to run R functions on each partition identified by column 
> groups, using the dapply() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15110) SparkR - Implement repartitionByColumn on DataFrame

2016-05-03 Thread Narine Kokhlikyan (JIRA)
Narine Kokhlikyan created SPARK-15110:
-

 Summary: SparkR - Implement repartitionByColumn on DataFrame
 Key: SPARK-15110
 URL: https://issues.apache.org/jira/browse/SPARK-15110
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Narine Kokhlikyan


Implement repartitionByColumn on DataFrame.

This will allow us to run R functions on each partition using the dapply() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11148) Unable to create views

2016-05-03 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270069#comment-15270069
 ] 

Yin Huai commented on SPARK-11148:
--

Hi [~lunendl], we have cut the 2.0 branch and we are in the QA period right now. 
Based on our schedule 
(https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage), early June 
is a good estimate.

> Unable to create views
> --
>
> Key: SPARK-11148
> URL: https://issues.apache.org/jira/browse/SPARK-11148
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Ubuntu 14.04
> Spark-1.5.1-bin-hadoop2.6
> (I don't have Hadoop or Hive installed)
> Start spark-all.sh and thriftserver with mysql jar driver
>Reporter: Lunen
>Priority: Critical
> Fix For: 2.0.0
>
>
> I am unable to create views within Spark SQL. 
> Creating tables without specifying the column names work. eg.
> CREATE TABLE trade2 
> USING org.apache.spark.sql.jdbc
> OPTIONS ( 
> url "jdbc:mysql://192.168.30.191:3318/?user=root", 
> dbtable "database.trade", 
> driver "com.mysql.jdbc.Driver" 
> );
> Creating tables with datatypes gives an error:
> CREATE TABLE trade2( 
> COL1 timestamp, 
> COL2 STRING, 
> COL3 STRING) 
> USING org.apache.spark.sql.jdbc 
> OPTIONS (
>   url "jdbc:mysql://192.168.30.191:3318/?user=root",   
>   dbtable "database.trade",   
>   driver "com.mysql.jdbc.Driver" 
> );
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not allow 
> user-specified schemas.; SQLState: null ErrorCode: 0
> Trying to create a VIEW from the table that was created.(The select statement 
> below returns data)
> CREATE VIEW viewtrade as Select Col1 from trade2;
> Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: 
> SemanticException [Error 10004]: Line 1:30 Invalid table alias or column 
> reference 'Col1': (possible column names are: col)
> SQLState:  null
> ErrorCode: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15109) Accept Dataset[_] in joins

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15109:


Assignee: Apache Spark  (was: Reynold Xin)

> Accept Dataset[_] in joins
> --
>
> Key: SPARK-15109
> URL: https://issues.apache.org/jira/browse/SPARK-15109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15109) Accept Dataset[_] in joins

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270068#comment-15270068
 ] 

Apache Spark commented on SPARK-15109:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12886

> Accept Dataset[_] in joins
> --
>
> Key: SPARK-15109
> URL: https://issues.apache.org/jira/browse/SPARK-15109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15109) Accept Dataset[_] in joins

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15109:


Assignee: Reynold Xin  (was: Apache Spark)

> Accept Dataset[_] in joins
> --
>
> Key: SPARK-15109
> URL: https://issues.apache.org/jira/browse/SPARK-15109
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API

2016-05-03 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270067#comment-15270067
 ] 

Andrew Or commented on SPARK-13269:
---

Oops, actually this was already done in SPARK-14069. Closing this as a duplicate.

> Expose more executor stats in stable status API
> ---
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Andrew Or
> Fix For: 2.0.0
>
>
> Currently the stable status API is quite limited; it exposes only a small 
> subset of the things exposed by JobProgressListener. It is useful for very 
> high level querying but falls short when the developer wants to build an 
> application on top of Spark with more integration.
> In this issue I propose that we expose at least two things:
> - Which executors are running tasks, and
> - Which executors cached how much in memory and on disk
> The goal is not to expose exactly these two things, but to expose something 
> that would allow the developer to learn about them. These concepts are very 
> much fundamental in Spark's design so there's almost no chance that they will 
> go away in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13269) Expose more executor stats in stable status API

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-13269.
---
   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 2.0.0

> Expose more executor stats in stable status API
> ---
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Andrew Or
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>
> Currently the stable status API is quite limited; it exposes only a small 
> subset of the things exposed by JobProgressListener. It is useful for very 
> high level querying but falls short when the developer wants to build an 
> application on top of Spark with more integration.
> In this issue I propose that we expose at least two things:
> - Which executors are running tasks, and
> - Which executors cached how much in memory and on disk
> The goal is not to expose exactly these two things, but to expose something 
> that would allow the developer to learn about them. These concepts are very 
> much fundamental in Spark's design so there's almost no chance that they will 
> go away in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15109) Accept Dataset[_] in joins

2016-05-03 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-15109:
---

 Summary: Accept Dataset[_] in joins
 Key: SPARK-15109
 URL: https://issues.apache.org/jira/browse/SPARK-15109
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15108) Function is Not Found when Describe Permanent UDTF

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270039#comment-15270039
 ] 

Apache Spark commented on SPARK-15108:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/12885

> Function is Not Found when Describe Permanent UDTF
> --
>
> Key: SPARK-15108
> URL: https://issues.apache.org/jira/browse/SPARK-15108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When we describe a UDTF, the command returns a wrong result: it is unable to 
> find the function, which has been created and cataloged in the catalog but 
> not in the functionRegistry.
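
A minimal reproduction sketch, assuming a Hive-enabled 2.0-era session 
{{spark}} (the function name and UDF class below are illustrative):

{code}
# Hypothetical repro: register a permanent function (it lands in the
# catalog), then describe it. Per the report, DESCRIBE cannot find it
# because it is not in the functionRegistry.
spark.sql("CREATE FUNCTION myUpper AS "
          "'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'")
spark.sql("DESCRIBE FUNCTION myUpper").show(truncate=False)
{code}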



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15108) Function is Not Found when Describe Permanent UDTF

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15108:


Assignee: Apache Spark

> Function is Not Found when Describe Permanent UDTF
> --
>
> Key: SPARK-15108
> URL: https://issues.apache.org/jira/browse/SPARK-15108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> When we describe a UDTF, the command returns a wrong result: it is unable to 
> find the function, which has been created and cataloged in the catalog but 
> not in the functionRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15108) Function is Not Found when Describe Permanent UDTF

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15108:


Assignee: (was: Apache Spark)

> Function is Not Found when Describe Permanent UDTF
> --
>
> Key: SPARK-15108
> URL: https://issues.apache.org/jira/browse/SPARK-15108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When we describe a UDTF, the command returns a wrong result: it is unable to 
> find the function, which has been created and cataloged in the catalog but 
> not in the functionRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15108) Function is Not Found when Describe Permanent UDTF

2016-05-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15108:

Summary: Function is Not Found when Describe Permanent UDTF  (was: Function 
is Not Found when Describe Permanent UDF)

> Function is Not Found when Describe Permanent UDTF
> --
>
> Key: SPARK-15108
> URL: https://issues.apache.org/jira/browse/SPARK-15108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When we describe a UDF, the command returns a wrong result: it is unable to 
> find the function, which has been created and cataloged in the catalog but 
> not in the functionRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15108) Function is Not Found when Describe Permanent UDTF

2016-05-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15108:

Description: When Describe UDTF, it returns a wrong result. The command is 
unable to find the function, which has been created and cataloged in the 
catalog but not in the functionRegistry.  (was: When Describe UDF, it returns a 
wrong result. The command is unable to find the function, which has been 
created and cataloged in the catalog but not in the functionRegistry.)

> Function is Not Found when Describe Permanent UDTF
> --
>
> Key: SPARK-15108
> URL: https://issues.apache.org/jira/browse/SPARK-15108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> When we describe a UDTF, the command returns a wrong result: it is unable to 
> find the function, which has been created and cataloged in the catalog but 
> not in the functionRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15108) Function is Not Found when Describe Permanent UDF

2016-05-03 Thread Xiao Li (JIRA)
Xiao Li created SPARK-15108:
---

 Summary: Function is Not Found when Describe Permanent UDF
 Key: SPARK-15108
 URL: https://issues.apache.org/jira/browse/SPARK-15108
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


When we describe a UDF, the command returns a wrong result: it is unable to find 
the function, which has been created and cataloged in the catalog but not in the 
functionRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15089) kafka-spark consumer with SSL problem

2016-05-03 Thread JasonChang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270030#comment-15270030
 ] 

JasonChang commented on SPARK-15089:


Hi Sean,
Yes, the broker works with SSL. When I run a plain Kafka consumer it works, but 
the kafka-spark consumer does not.
{code}
public void consume(String topic, BiConsumer<String, String> callback) {
    Properties props = new Properties();
    props.put("bootstrap.servers", kafkaHosts);
    props.put("key.deserializer",
        org.apache.kafka.common.serialization.StringDeserializer.class);
    props.put("value.deserializer",
        org.apache.kafka.common.serialization.StringDeserializer.class);
    props.put("group.id", group);
    props.put("security.protocol", "SSL");
    props.put("ssl.truststore.location", "/opt/cert/client.truststore.jks");
    props.put("ssl.truststore.password", "password");
    props.put("ssl.keystore.location", "/opt/cert/keystore.jks");
    props.put("ssl.keystore.password", "password");
    props.put("ssl.key.password", "password");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topic));

        while (!stopped.get()) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println("<<< " + record.key() + ", " + record.value());
                callback.accept(record.key(), record.value());
            }
        }
        System.out.println("Finishing subscription to topic " + topic);
    }
}
{code}

> kafka-spark consumer with SSL problem
> -
>
> Key: SPARK-15089
> URL: https://issues.apache.org/jira/browse/SPARK-15089
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.1
>Reporter: JasonChang
>
> I am not sure spark streaming support SSL
> I tried to add params to kafkaParams, but it not work
> {code}
> JavaStreamingContext jsc = new JavaStreamingContext(sparkConf, new Duration(1));
> Set<String> topicmap = new HashSet<>();
> topicmap.add(kafkaTopic);
> Map<String, String> kafkaParams = new HashMap<>();
> kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server_url);
> kafkaParams.put("security.protocol", "SSL");
> kafkaParams.put("ssl.keystore.type", "JKS");
> kafkaParams.put("ssl.keystore.location", "/opt/cert/keystore.jks");
> kafkaParams.put("ssl.keystore.password", "password");
> kafkaParams.put("ssl.key.password", "password");
> kafkaParams.put("ssl.truststore.type", "JKS");
> kafkaParams.put("ssl.truststore.location", "/opt/cert/client.truststore.jks");
> kafkaParams.put("ssl.truststore.password", "password");
> kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, kafkaTopic);
> JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(jsc,
>   String.class,
>   String.class,
>   StringDecoder.class,
>   StringDecoder.class,
>   kafkaParams,
>   topicmap
> );
> JavaDStream<String> lines = stream.map(new Function<Tuple2<String, String>, String>() {
>   public String call(Tuple2<String, String> tuple2) {
>     return tuple2._2();
>   }
> });
> {code}
> {code}
> Exception in thread "main" org.apache.spark.SparkException: 
> java.io.EOFException: Received -1 when reading from channel, socket has 
> likely been closed.
>   at 
> org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
>   at 
> org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
>   at scala.util.Either.fold(Either.scala:97)
>   at 
> org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
>   at 
> org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
>   at 
> org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
>   at 
> org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
>   at 
> org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15072) Remove SparkSession.withHiveSupport

2016-05-03 Thread Sagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270032#comment-15270032
 ] 

Sagar commented on SPARK-15072:
---

This helps to build test.jar:

{noformat}
$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive \
    package assembly/assembly streaming-kafka-assembly/assembly \
    streaming-flume-assembly/assembly streaming-mqtt-assembly/assembly \
    streaming-mqtt/test:assembly streaming-kinesis-asl-assembly/assembly
$ cd sql/hive/src/test/resources/regression-test-SPARK-8489/
$ scalac -classpath \
    ~/spark/assembly/target/scala-2.11/spark-assembly-2.0.0-SNAPSHOT-hadoop2.3.0.jar \
    Main.scala MyCoolClass.scala
$ rm test.jar
$ jar cvf test.jar *.class
$ cd ~/spark
$ ~/bin/spark-submit --conf spark.ui.enabled=false \
    --conf spark.master.rest.enabled=false \
    --driver-java-options -Dderby.system.durability=test \
    --class Main sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
{noformat}

Let me know if you are still working on it.

> Remove SparkSession.withHiveSupport
> ---
>
> Key: SPARK-15072
> URL: https://issues.apache.org/jira/browse/SPARK-15072
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Sandeep Singh
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15032) When we create a new JDBC session, we may need to create a new session of executionHive

2016-05-03 Thread Sagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270018#comment-15270018
 ] 

Sagar commented on SPARK-15032:
---

You are right! It is safer to create a new session of executionHive while 
creating a JDBC session, but I think the problem is that it terminates the 
executionHive process. Let me know if you figured out another way; I can work 
on it.

> When we create a new JDBC session, we may need to create a new session of 
> executionHive
> ---
>
> Key: SPARK-15032
> URL: https://issues.apache.org/jira/browse/SPARK-15032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Priority: Critical
>
> Right now, we only use executionHive in thriftserver. When we create a new 
> jdbc session, we probably need to create a new session of executionHive. I am 
> not sure what will break if we leave the code as is. But, I feel it will be 
> safer to create a new session of executionHive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15063) filtering and joining back doesn't work

2016-05-03 Thread Sagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270012#comment-15270012
 ] 

Sagar commented on SPARK-15063:
---

What else is required to do it in a new df for each filter? Can you elaborate?

> filtering and joining back doesn't work
> ---
>
> Key: SPARK-15063
> URL: https://issues.apache.org/jira/browse/SPARK-15063
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Neville Kadwa
>
> I'm trying to filter and join to do a simple pivot but getting very odd 
> results.
> {quote} {noformat}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
> val accounts = Array(
>   (1, "checking", 100.0),
>   (1, "savings", 300.0),
>   (2, "savings", 1000.0),
>   (3, "carloan", 12000.0),
>   (3, "checking", 400.0)
> )
> val t1 = sc.makeRDD(people).toDF("uid", "name")
> val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2c = t2.filter(t2("type") <=> "checking")
> val t2s = t2.filter(t2("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are wrong:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [1,sam,1,checking,100.0,2,savings,1000.0],
>   [2,joe,null,null,null,null,null,null],
>   [3,sally,3,checking,400.0,1,savings,300.0],
>   [3,sally,3,checking,400.0,2,savings,1000.0],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
> The way I can force it to work properly is to create a new df for each filter:
> {quote} {noformat}
> val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
> val t2s = t2a.filter(t2a("type") <=> "savings")
> t1.
>   join(t2c, t1("uid") <=> t2c("uid"), "left").
>   join(t2s, t1("uid") <=> t2s("uid"), "left").
>   take(10)
> {noformat} {quote}
> The results are right:
> {quote} {noformat}
> Array(
>   [1,sam,1,checking,100.0,1,savings,300.0],
>   [2,joe,null,null,null,2,savings,1000.0],
>   [3,sally,3,checking,400.0,null,null,null],
>   [4,joanna,null,null,null,null,null,null]
> )
> {noformat} {quote}
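
A commonly suggested disambiguation for this kind of self-join lineage in the 
DataFrame API is explicit aliasing; a hedged Python sketch of the idea (not 
verified against 1.6.1; assumes a 2.0-era session {{spark}}):

{code}
# Sketch only: alias the shared parent before filtering so each branch
# carries distinct attribute references. Idea only; not verified on 1.6.1.
accounts = [(1, "checking", 100.0), (1, "savings", 300.0),
            (2, "savings", 1000.0), (3, "carloan", 12000.0),
            (3, "checking", 400.0)]
t2 = spark.createDataFrame(accounts, ["uid", "type", "amount"])

t2c = t2.alias("t2c").filter("type = 'checking'")
t2s = t2.alias("t2s").filter("type = 'savings'")
{code}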



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15086) Update Java API once the Scala one is finalized

2016-05-03 Thread Sagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270007#comment-15270007
 ] 

Sagar commented on SPARK-15086:
---

To update the Java API once the Scala one is finalized, please provide more 
information on what else this includes.

> Update Java API once the Scala one is finalized
> ---
>
> Key: SPARK-15086
> URL: https://issues.apache.org/jira/browse/SPARK-15086
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
> Fix For: 2.0.0
>
>
> We should make sure we update the Java API once the Scala one is finalized. 
> This includes adding the equivalent API in Java as well as deprecating the 
> old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270003#comment-15270003
 ] 

Apache Spark commented on SPARK-15107:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12884

> Allow running test cases with different iterations in micro-benchmark util
> --
>
> Key: SPARK-15107
> URL: https://issues.apache.org/jira/browse/SPARK-15107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15107:


Assignee: Reynold Xin  (was: Apache Spark)

> Allow running test cases with different iterations in micro-benchmark util
> --
>
> Key: SPARK-15107
> URL: https://issues.apache.org/jira/browse/SPARK-15107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15107:


Assignee: Apache Spark  (was: Reynold Xin)

> Allow running test cases with different iterations in micro-benchmark util
> --
>
> Key: SPARK-15107
> URL: https://issues.apache.org/jira/browse/SPARK-15107
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15107) Allow running test cases with different iterations in micro-benchmark util

2016-05-03 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-15107:
---

 Summary: Allow running test cases with different iterations in 
micro-benchmark util
 Key: SPARK-15107
 URL: https://issues.apache.org/jira/browse/SPARK-15107
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14645:
--
Assignee: Timothy Chen

> non local Python resource doesn't work with Mesos cluster mode
> --
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
>  Issue Type: Bug
>Reporter: Timothy Chen
>Assignee: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently SparkSubmit explicitly disallows non-local Python resources in 
> cluster mode with Mesos, where they are actually supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14414) Make error messages consistent across DDLs

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14414.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Make error messages consistent across DDLs
> --
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> There are many different error messages right now when the user tries to run 
> something that's not supported. We might throw AnalysisException or 
> ParseException or NoSuchFunctionException etc. We should make all of these 
> consistent before 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15097.
---
  Resolution: Fixed
Assignee: Koert Kuipers
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Import fails for someDataset.sqlContext.implicits._
> ---
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
>Reporter: koert kuipers
>Assignee: Koert Kuipers
> Fix For: 2.0.0
>
>
> With the introduction of SparkSession, SQLContext changed from being a lazy 
> val to a def inside Dataset. However, this is troublesome if you want to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15084) Use builder pattern to create SparkSession in PySpark

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15084.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Use builder pattern to create SparkSession in PySpark
> -
>
> Key: SPARK-15084
> URL: https://issues.apache.org/jira/browse/SPARK-15084
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> This is a Python port of SPARK-15052.
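
For context, a minimal sketch of the builder-pattern usage this ticket ports to 
Python (mirroring the Scala API from SPARK-15052; options shown are illustrative):

{code}
# Sketch of the builder pattern in PySpark (2.0-era API).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[2]")
         .appName("builder-demo")
         .config("spark.sql.shuffle.partitions", "4")
         .getOrCreate())
{code}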



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14645.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> non local Python resource doesn't work with Mesos cluster mode
> --
>
> Key: SPARK-14645
> URL: https://issues.apache.org/jira/browse/SPARK-14645
> Project: Spark
>  Issue Type: Bug
>Reporter: Timothy Chen
> Fix For: 2.0.0
>
>
> Currently SparkSubmit explicitly disallows non-local Python resources in 
> cluster mode with Mesos, where they are actually supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14422) Improve handling of optional configs in SQLConf

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-14422:
--
Assignee: Sandeep Singh

> Improve handling of optional configs in SQLConf
> ---
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Assignee: Sandeep Singh
>Priority: Minor
> Fix For: 2.0.0
>
>
> As Michael showed here: 
> https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We 
> should clean that up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14422) Improve handling of optional configs in SQLConf

2016-05-03 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-14422.
---
  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0

> Improve handling of optional configs in SQLConf
> ---
>
> Key: SPARK-14422
> URL: https://issues.apache.org/jira/browse/SPARK-14422
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
> Fix For: 2.0.0
>
>
> As Michael showed here: 
> https://github.com/apache/spark/pull/12119/files/69aa1a005cc7003ab62d6dfcdef42181b053eaed#r58634150
> Handling of optional configs in SQLConf is a little sub-optimal right now. We 
> should clean that up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15106:


Assignee: (was: Apache Spark)

> Add package documentation for ML and remove BETA from Scala & Java for ML 
> pipeline API.
> ---
>
> Key: SPARK-15106
> URL: https://issues.apache.org/jira/browse/SPARK-15106
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, PySpark
>Reporter: holdenk
>
> As part of the audit (SPARK-14813) I noticed we don't have a package 
> definition for PySpark ML, and the Scaladoc / Javadoc mention "BETA", which 
> should be going away now that we are deprecating MLlib.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269901#comment-15269901
 ] 

Apache Spark commented on SPARK-15106:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/12883

> Add package documentation for ML and remove BETA from Scala & Java for ML 
> pipeline API.
> ---
>
> Key: SPARK-15106
> URL: https://issues.apache.org/jira/browse/SPARK-15106
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, PySpark
>Reporter: holdenk
>
> As part of the audit (SPARK-14813) I noticed we don't have a package 
> definition for PySpark ML, and the Scaladoc / Javadoc mention "BETA", which 
> should be going away now that we are deprecating MLlib.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15106:


Assignee: Apache Spark

> Add package documentation for ML and remove BETA from Scala & Java for ML 
> pipeline API.
> ---
>
> Key: SPARK-15106
> URL: https://issues.apache.org/jira/browse/SPARK-15106
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML, PySpark
>Reporter: holdenk
>Assignee: Apache Spark
>
> As part of the audit (SPARK-14813) I noticed we don't have a package 
> definition for PySpark ML, and the Scaladoc / Javadoc mention "BETA", which 
> should be going away now that we are deprecating MLlib.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15106) Add package documentation for ML and remove BETA from Scala & Java for ML pipeline API.

2016-05-03 Thread holdenk (JIRA)
holdenk created SPARK-15106:
---

 Summary: Add package documentation for ML and remove BETA from 
Scala & Java for ML pipeline API.
 Key: SPARK-15106
 URL: https://issues.apache.org/jira/browse/SPARK-15106
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, ML, PySpark
Reporter: holdenk


As part of the audit (SPARK-14813) I noticed we don't have a package definition 
for PySpark ML, and the Scaladoc / Javadoc mention "BETA", which should be going 
away now that we are deprecating MLlib.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14813) ML 2.0 QA: API: Python API coverage

2016-05-03 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269827#comment-15269827
 ] 

holdenk commented on SPARK-14813:
-

I'm happy to start doing a first pass on this later this week if no one else is 
interested. <3 PySpark

> ML 2.0 QA: API: Python API coverage
> ---
>
> Key: SPARK-14813
> URL: https://issues.apache.org/jira/browse/SPARK-14813
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Joseph K. Bradley
>
> For new public APIs added to MLlib, we need to check the generated HTML doc 
> and compare the Scala & Python versions.  We need to track:
> * Inconsistency: Do class/method/parameter names match?
> * Docs: Is the Python doc missing or just a stub?  We want the Python doc to 
> be as complete as the Scala doc.
> * API breaking changes: These should be very rare but are occasionally either 
> necessary (intentional) or accidental.  These must be recorded and added in 
> the Migration Guide for this release.
> ** Note: If the API change is for an Alpha/Experimental/DeveloperApi 
> component, please note that as well.
> * Missing classes/methods/parameters: We should create to-do JIRAs for 
> functionality missing from Python, to be added in the next release cycle.  
> Please use a *separate* JIRA (linked below) for this list of to-do items.
> UPDATE: This only needs to cover spark.ml since spark.mllib is going into 
> maintenance mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14772) Python ML Params.copy treats uid, paramMaps differently than Scala

2016-05-03 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269824#comment-15269824
 ] 

holdenk commented on SPARK-14772:
-

I can take a look at this if no one else is working on it and it is planned for 2.0.

> Python ML Params.copy treats uid, paramMaps differently than Scala
> --
>
> Key: SPARK-14772
> URL: https://issues.apache.org/jira/browse/SPARK-14772
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> In PySpark, {{ml.param.Params.copy}} does not quite match the Scala 
> implementation:
> * It does not copy the UID
> * It does not respect the difference between defaultParamMap and paramMap.  
> This is an issue with {{_copyValues}}.
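
For reference, a minimal sketch of the Scala-side contract that the PySpark copy 
should mirror (not from the Spark test suite; Binarizer is just an arbitrary 
Params holder):

{noformat}
import org.apache.spark.ml.feature.Binarizer
import org.apache.spark.ml.param.ParamMap

val b = new Binarizer().setThreshold(0.5)
val b2 = b.copy(ParamMap.empty)
assert(b2.uid == b.uid)  // the Scala copy preserves the uid
// The Scala side also keeps default values in defaultParamMap; only
// explicitly set params live in paramMap, which _copyValues should respect.
{noformat}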



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15096) LogisticRegression MultiClassSummarizer numClasses can fail if no valid labels are found

2016-05-03 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269776#comment-15269776
 ] 

Miao Wang commented on SPARK-15096:
---

If nobody is working on this one, I will start on it now.

Thanks!

Miao

> LogisticRegression MultiClassSummarizer numClasses can fail if no valid 
> labels are found
> 
>
> Key: SPARK-15096
> URL: https://issues.apache.org/jira/browse/SPARK-15096
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.0.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> LogisticRegression.train calls labelSummarizer.histogram before it calls 
> labelSummarizer.countInvalid: 
> [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L292]
>   But if there are no valid labels, it is possible to get an Exception from 
> empty.max when numClasses is called here: 
> [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L751]
> Proposed fix: We should fix numClasses to throw a better exception: 
> [https://github.com/apache/spark/blob/f5623b460224ce363316c63f5d28947215078fc5/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L747]
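
A minimal sketch of the failure mode and the proposed guard (names below are 
illustrative, not the actual MultiClassSummarizer fields):

{noformat}
val labelCounts: Map[Double, Long] = Map.empty  // no valid labels were seen

// labelCounts.keys.max throws java.lang.UnsupportedOperationException: empty.max
val numClasses =
  if (labelCounts.isEmpty) {
    throw new IllegalArgumentException(
      "No valid labels found: every row had an invalid label")
  } else {
    labelCounts.keys.max.toInt + 1
  }
{noformat}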



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide

2016-05-03 Thread Xin Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269775#comment-15269775
 ] 

Xin Ren commented on SPARK-14817:
-

OK, I'll start looking for new APIs.

So should I just create new tickets under SPARK-14815?

> ML, Graph, R 2.0 QA: Programming guide update and migration guide
> -
>
> Key: SPARK-14817
> URL: https://issues.apache.org/jira/browse/SPARK-14817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>
> Before the release, we need to update the MLlib, GraphX, and SparkR 
> Programming Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-13448].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")
> For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, 
> to make it clear the RDD-based API is the older, maintenance-mode one.
> * No docs for spark.mllib will be deleted; they will just be reorganized and 
> put in a subsection.
> * If spark.ml docs are less complete, or if spark.ml docs say "refer to the 
> spark.mllib docs for details," then we should copy those details to the 
> spark.ml docs.  This per-feature work can happen under [SPARK-14815].
> * This big reorganization should be done *after* docs are added for each 
> feature (to minimize merge conflicts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15101) Audit: ml.clustering and ml.recommendation

2016-05-03 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269768#comment-15269768
 ] 

Miao Wang commented on SPARK-15101:
---

[~josephkb] I want to know how to work on this kind of JIRA. For example, if 
multiple examples and docs are missing, shall I just file a single PR for this 
JIRA? Or should I just report the missing parts and update the JIRA with further 
subtasks?

Thanks!

Miao

> Audit: ml.clustering and ml.recommendation
> --
>
> Key: SPARK-15101
> URL: https://issues.apache.org/jira/browse/SPARK-15101
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>
> Audit this sub-package for new algorithms which do not have corresponding 
> sections & examples in the user guide.
> See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14900) spark.ml classification metrics should include accuracy

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14900:


Assignee: (was: Apache Spark)

> spark.ml classification metrics should include accuracy
> ---
>
> Key: SPARK-14900
> URL: https://issues.apache.org/jira/browse/SPARK-14900
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> To compute "accuracy" (0/1 classification accuracy), users can use 
> {{precision}} in MulticlassMetrics and 
> MulticlassClassificationEvaluator.metricName.  We should also support 
> "accuracy" directly as an alias to help users familiar with that name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14900) spark.ml classification metrics should include accuracy

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14900:


Assignee: Apache Spark

> spark.ml classification metrics should include accuracy
> ---
>
> Key: SPARK-14900
> URL: https://issues.apache.org/jira/browse/SPARK-14900
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> To compute "accuracy" (0/1 classification accuracy), users can use 
> {{precision}} in MulticlassMetrics and 
> MulticlassClassificationEvaluator.metricName.  We should also support 
> "accuracy" directly as an alias to help users familiar with that name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14900) spark.ml classification metrics should include accuracy

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269754#comment-15269754
 ] 

Apache Spark commented on SPARK-14900:
--

User 'wangmiao1981' has created a pull request for this issue:
https://github.com/apache/spark/pull/12882

> spark.ml classification metrics should include accuracy
> ---
>
> Key: SPARK-14900
> URL: https://issues.apache.org/jira/browse/SPARK-14900
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> To compute "accuracy" (0/1 classification accuracy), users can use 
> {{precision}} in MulticlassMetrics and 
> MulticlassClassificationEvaluator.metricName.  We should also support 
> "accuracy" directly as an alias to help users familiar with that name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13269) Expose more executor stats in stable status API

2016-05-03 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269738#comment-15269738
 ] 

Alex Bozarth commented on SPARK-13269:
--

Hey [~andrewor14], I was interested in this and took a look at the two examples 
you gave, and I'm a bit confused about what exactly you want. You can currently 
see the used memory, used disk space, and active task count for each executor by 
calling /applications/[app-id]/executors or (in code) getting the 
ExecutorSummary class for each executor and checking activeTasks, memoryUsed, 
and diskUsed. Are these numbers different from what you were interested in 
surfacing? I was also unsure how those examples relate to JobProgressListener.
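
For illustration, those per-executor stats can already be pulled from the stable 
REST API (a sketch; the port and application id below are assumptions for a 
local app):

{noformat}
import scala.io.Source

val url = "http://localhost:4040/api/v1/applications/app-20160503000000-0000/executors"
val json = Source.fromURL(url).mkString
// each executor entry carries activeTasks, memoryUsed and diskUsed fields
println(json)
{noformat}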

> Expose more executor stats in stable status API
> ---
>
> Key: SPARK-13269
> URL: https://issues.apache.org/jira/browse/SPARK-13269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Andrew Or
>
> Currently the stable status API is quite limited; it exposes only a small 
> subset of the things exposed by JobProgressListener. It is useful for very 
> high level querying but falls short when the developer wants to build an 
> application on top of Spark with more integration.
> In this issue I propose that we expose at least two things:
> - Which executors are running tasks, and
> - Which executors cached how much in memory and on disk
> The goal is not to expose exactly these two things, but to expose something 
> that would allow the developer to learn about them. These concepts are very 
> much fundamental in Spark's design so there's almost no chance that they will 
> go away in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15095) Drop binary mode in ThriftServer

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269719#comment-15269719
 ] 

Apache Spark commented on SPARK-15095:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/12881

> Drop binary mode in ThriftServer
> 
>
> Key: SPARK-15095
> URL: https://issues.apache.org/jira/browse/SPARK-15095
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15105) Remove HiveSessionHook from ThriftServer

2016-05-03 Thread Davies Liu (JIRA)
Davies Liu created SPARK-15105:
--

 Summary: Remove HiveSessionHook from ThriftServer
 Key: SPARK-15105
 URL: https://issues.apache.org/jira/browse/SPARK-15105
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15102) remove delegation token from ThriftServer

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269700#comment-15269700
 ] 

Apache Spark commented on SPARK-15102:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/12878

> remove delegation token from ThriftServer
> -
>
> Key: SPARK-15102
> URL: https://issues.apache.org/jira/browse/SPARK-15102
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> This feature is only useful for Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15104) Bad spacing in log line

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15104.
-
   Resolution: Fixed
 Assignee: Andrew Ash
Fix Version/s: 2.0.0

> Bad spacing in log line
> ---
>
> Key: SPARK-15104
> URL: https://issues.apache.org/jira/browse/SPARK-15104
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Andrew Ash
>Assignee: Andrew Ash
>Priority: Minor
> Fix For: 2.0.0
>
>
> {noformat}INFO  [2016-05-03 21:18:51,477] 
> org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 
> (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat}
> Should have a space before "NODE_LOCAL"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15102) remove delegation token from ThriftServer

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15102.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> remove delegation token from ThriftServer
> -
>
> Key: SPARK-15102
> URL: https://issues.apache.org/jira/browse/SPARK-15102
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>
> This feature is only useful for Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15104) Bad spacing in log line

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15104:


Assignee: Apache Spark

> Bad spacing in log line
> ---
>
> Key: SPARK-15104
> URL: https://issues.apache.org/jira/browse/SPARK-15104
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Andrew Ash
>Assignee: Apache Spark
>Priority: Minor
>
> {noformat}INFO  [2016-05-03 21:18:51,477] 
> org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 
> (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat}
> Should have a space before "NODE_LOCAL"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15104) Bad spacing in log line

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269648#comment-15269648
 ] 

Apache Spark commented on SPARK-15104:
--

User 'ash211' has created a pull request for this issue:
https://github.com/apache/spark/pull/12880

> Bad spacing in log line
> ---
>
> Key: SPARK-15104
> URL: https://issues.apache.org/jira/browse/SPARK-15104
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Andrew Ash
>Priority: Minor
>
> {noformat}INFO  [2016-05-03 21:18:51,477] 
> org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 
> (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat}
> Should have a space before "NODE_LOCAL"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15104) Bad spacing in log line

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15104:


Assignee: (was: Apache Spark)

> Bad spacing in log line
> ---
>
> Key: SPARK-15104
> URL: https://issues.apache.org/jira/browse/SPARK-15104
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Andrew Ash
>Priority: Minor
>
> {noformat}INFO  [2016-05-03 21:18:51,477] 
> org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 
> (TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat}
> Should have a space before "NODE_LOCAL"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15104) Bad spacing in log line

2016-05-03 Thread Andrew Ash (JIRA)
Andrew Ash created SPARK-15104:
--

 Summary: Bad spacing in log line
 Key: SPARK-15104
 URL: https://issues.apache.org/jira/browse/SPARK-15104
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1
Reporter: Andrew Ash
Priority: Minor


{noformat}INFO  [2016-05-03 21:18:51,477] 
org.apache.spark.scheduler.TaskSetManager: Starting task 0.0 in stage 101.0 
(TID 7029, localhost, partition 0,NODE_LOCAL, 1894 bytes){noformat}

Should have a space before "NODE_LOCAL"
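
The fix is a one-character change; a runnable sketch of the before/after 
formatting (variable names are illustrative, not the TaskSetManager code):

{noformat}
val (partitionId, taskLocality) = (0, "NODE_LOCAL")
println(s"partition $partitionId,$taskLocality")   // current: "partition 0,NODE_LOCAL"
println(s"partition $partitionId, $taskLocality")  // fixed:   "partition 0, NODE_LOCAL"
{noformat}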



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-9466) Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-9466.
--
Resolution: Auto Closed

> Flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite
> ---
>
> Key: SPARK-9466
> URL: https://issues.apache.org/jira/browse/SPARK-9466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>  Labels: flaky-test
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12008) Spark hive security authorization doesn't work as Apache hive's

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-12008.
---
Resolution: Invalid

Marking this as invalid since these are unsupported for now. We might add 
support explicitly for these features in the future.


> Spark hive security authorization doesn't work as Apache hive's
> ---
>
> Key: SPARK-12008
> URL: https://issues.apache.org/jira/browse/SPARK-12008
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: pin_zhang
>
> Spark Hive security authorization isn't consistent with Apache Hive's.
> The same hive-site.xml:
> {noformat}
> <property>
>   <name>hive.security.authorization.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.security.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
> </property>
> <property>
>   <name>hive.security.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
> </property>
> <property>
>   <name>hive.server2.enable.doAs</name>
>   <value>true</value>
> </property>
> {noformat}
> 1. Run Spark's start-thriftserver.sh; running SQL then fails with:
>    SQL standards based authorization should not be enabled from hive cli. 
>    Instead the use of storage based authorization in hive metastore is 
>    recommended. 
>    Set hive.security.authorization.enabled=false to disable authz within cli
> 2. Instead, start start-thriftserver.sh with the Hive configurations passed 
> explicitly:
> ./start-thriftserver.sh --conf 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
>  --conf 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>  
> 3. Connect with Beeline as userA and create table tableA.
> 4. Connect with Beeline as userB and truncate tableA.
>   A) In Apache Hive, the truncate fails with an exception:
>   Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: Principal [name=userB, type=USER] does not have following 
> privileges for operation TRUNCATETABLE [[OBJECT OWNERSHIP] on Object 
> [type=TABLE_OR_VIEW, name=default.tablea]] (state=42000,code=4)
>   B) In Spark's Hive support, any user that can connect can truncate it, as 
> long as the Spark user has privileges.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269632#comment-15269632
 ] 

Apache Spark commented on SPARK-15103:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/12879

> Add support for batch jobs correctly inferring partitions from data written 
> with file stream sink
> -
>
> Key: SPARK-15103
> URL: https://issues.apache.org/jira/browse/SPARK-15103
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> File Stream Sink writes the list of written files in a log. StreamFileCatalog 
> reads the list of the files for processing. However StreamFileCatalog does 
> not infer partitioning like HDFSFileCatalog.
> This JIRA is to enable that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15103:


Assignee: Tathagata Das  (was: Apache Spark)

> Add support for batch jobs correctly inferring partitions from data written 
> with file stream sink
> -
>
> Key: SPARK-15103
> URL: https://issues.apache.org/jira/browse/SPARK-15103
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> File Stream Sink writes the list of written files in a log. StreamFileCatalog 
> reads the list of the files for processing. However StreamFileCatalog does 
> not infer partitioning like HDFSFileCatalog.
> This JIRA is to enable that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15103:


Assignee: Apache Spark  (was: Tathagata Das)

> Add support for batch jobs correctly inferring partitions from data written 
> with file stream sink
> -
>
> Key: SPARK-15103
> URL: https://issues.apache.org/jira/browse/SPARK-15103
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> File Stream Sink writes the list of written files in a log. StreamFileCatalog 
> reads the list of the files for processing. However StreamFileCatalog does 
> not infer partitioning like HDFSFileCatalog.
> This JIRA is to enable that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12066) spark sql throw java.lang.ArrayIndexOutOfBoundsException when use table.* with join

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-12066.
---
Resolution: Cannot Reproduce

Closing as cannot reproduce for now.

> spark sql  throw java.lang.ArrayIndexOutOfBoundsException when use table.* 
> with join 
> -
>
> Key: SPARK-12066
> URL: https://issues.apache.org/jira/browse/SPARK-12066
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.5.2
> Environment: linux 
>Reporter: Ricky Yang
>
> A java.lang.ArrayIndexOutOfBoundsException is thrown when I use the following 
> Spark SQL on Spark standalone or YARN.
>The SQL:
> select ta.* 
> from bi_td.dm_price_seg_td tb 
> join bi_sor.sor_ord_detail_tf ta 
> on 1 = 1 
> where ta.sale_dt = '20140514' 
> and ta.sale_price >= tb.pri_from 
> and ta.sale_price < tb.pri_to limit 10 ; 
> But the result is correct when not using *, as follows:
> select ta.sale_dt 
> from bi_td.dm_price_seg_td tb 
> join bi_sor.sor_ord_detail_tf ta 
> on 1 = 1 
> where ta.sale_dt = '20140514' 
> and ta.sale_price >= tb.pri_from 
> and ta.sale_price < tb.pri_to limit 10 ; 
> The standalone version is 1.4.0 and the Spark-on-YARN version is 1.5.2.
> Error log:
>   
> 15/11/30 14:19:59 ERROR SparkSQLDriver: Failed in [select ta.* 
> from bi_td.dm_price_seg_td tb 
> join bi_sor.sor_ord_detail_tf ta 
> on 1 = 1 
> where ta.sale_dt = '20140514' 
> and ta.sale_price >= tb.pri_from 
> and ta.sale_price < tb.pri_to limit 10 ] 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3, namenode2-sit.cnsuning.com): java.lang.ArrayIndexOutOfBoundsException 
> Driver stacktrace: 
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
>  
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
>  
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
>  
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270) 
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>  
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>  
> at scala.Option.foreach(Option.scala:236) 
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>  
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
>  
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
>  
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
>  
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) 
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) 
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837) 
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850) 
> at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:215) 
> at 
> org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207) 
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:587)
>  
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>  
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:308)
>  
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) 
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) 
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409) 
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425) 
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>  
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>  
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:606) 
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
>  
> at 

[jira] [Commented] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-03 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269620#comment-15269620
 ] 

Dongjoon Hyun commented on SPARK-15037:
---

Sure. Go ahead if you want.

This is still blocked by SPARK-15084.

But I think you can start with Spark/Java.

I am working on SPARK-15084 and SPARK-15031 (for examples).

Thanks.

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Dongjoon Hyun
>
> This issue aims to update the existing test suites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.
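
A minimal sketch of the intended migration in a test suite (paths and names are 
placeholders):

{noformat}
import org.apache.spark.sql.SparkSession

// before:
//   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//   val df = sqlContext.read.json(path)

// after:
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SomeTestSuite")
  .getOrCreate()
val df = spark.read.json("/tmp/people.json")
{noformat}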



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15056) Parse Unsupported Sampling Syntax and Issue Better Exceptions

2016-05-03 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-15056.
---
Resolution: Fixed
  Assignee: Xiao Li

> Parse Unsupported Sampling Syntax and Issue Better Exceptions
> -
>
> Key: SPARK-15056
> URL: https://issues.apache.org/jira/browse/SPARK-15056
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Minor
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
> Compared with the current Spark parser, Hive supports two extra sampling 
> syntaxes:
> 1. In ON clauses, rand() indicates sampling on the entire row instead of an 
> individual column. 
> 2. Users can specify the total length to be read in block_sample.
> We need to parse and capture them, and issue a better error message for these 
> unsupported features.
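
The two Hive-only sampling forms in question, which the parser should recognize 
and reject with a clear message (a sketch; `src` is an assumed bucketed table 
and `spark` an existing SparkSession):

{noformat}
// 1. ON clause with rand(): sample on the entire row
spark.sql("SELECT * FROM src TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s")

// 2. block_sample with a total length to be read
spark.sql("SELECT * FROM src TABLESAMPLE(100M) s")
{noformat}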



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15103) Add support for batch jobs correctly inferring partitions from data written with file stream sink

2016-05-03 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-15103:
-

 Summary: Add support for batch jobs correctly inferring partitions 
from data written with file stream sink
 Key: SPARK-15103
 URL: https://issues.apache.org/jira/browse/SPARK-15103
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das


File Stream Sink writes the list of written files in a log. StreamFileCatalog 
reads the list of the files for processing. However StreamFileCatalog does not 
infer partitioning like HDFSFileCatalog.

This JIRA is to enable that.
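
For context, a sketch of the partition inference that HDFSFileCatalog already 
performs for batch reads, which the stream-sink catalog should match (the 
directory layout and the `spark` session are assumptions):

{noformat}
// data laid out as /data/events/date=2016-05-03/part-00000.parquet
val df = spark.read.parquet("/data/events")
df.printSchema()  // should include an inferred `date` partition column
{noformat}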



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13971) Implicit group by with distinct modifier on having raises an unexpected error

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13971.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Implicit group by with distinct modifier on having raises an unexpected error
> -
>
> Key: SPARK-13971
> URL: https://issues.apache.org/jira/browse/SPARK-13971
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: spark standalone mode installed on Centos7
>Reporter: Javier PĂ©rez
> Fix For: 2.0.0
>
>
> 1. Start the Thrift server
> 2. Connect with Beeline
> 3. Perform the following query over a simple table:
> SELECT COUNT(DISTINCT field1) FROM test_table HAVING COUNT(DISTINCT field1) = 
> 3
> TRACE:
> ERROR SparkExecuteStatementOperation: Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: 
> org.apache.spark.sql.AnalysisException: resolved attribute(s) 
> gid#13616,field1#13617 missing from 
> field1#13612,field2#13611,field2#13608,field3#13610,field4#13613,field5#13609 
> in operator !Expand [List(null, 0, if ((gid#13616 = 1)) field1#13617 else 
> null),List(field2#13608, 1, null)], [field2#13619,gid#13618,if ((gid = 1)) 
> field1 else null#13620];
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:246)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:151)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:164)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
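
For users still on 1.6, a possible workaround sketch is to make the implicit 
global aggregate explicit in a subquery instead of using HAVING (an existing 
`sqlContext` is assumed):

{noformat}
sqlContext.sql("""
  SELECT cnt
  FROM (SELECT COUNT(DISTINCT field1) AS cnt FROM test_table) t
  WHERE cnt = 3
""")
{noformat}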



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-14973.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12825
[https://github.com/apache/spark/pull/12825]

> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading
> -
>
> Key: SPARK-14973
> URL: https://issues.apache.org/jira/browse/SPARK-14973
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Reporter: Xusen Yin
>Assignee: Xusen Yin
> Fix For: 2.0.0
>
>
> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading. Need to fix both Spark side code and test suite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15102) remove delegation token from ThriftServer

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15102:

Issue Type: Sub-task  (was: Bug)
Parent: SPARK-14987

> remove delegation token from ThriftServer
> -
>
> Key: SPARK-15102
> URL: https://issues.apache.org/jira/browse/SPARK-15102
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> This feature is only useful for Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14973:
--
Shepherd: Joseph K. Bradley

> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading
> -
>
> Key: SPARK-14973
> URL: https://issues.apache.org/jira/browse/SPARK-14973
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Reporter: Xusen Yin
>Assignee: Xusen Yin
>
> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading. Need to fix both Spark side code and test suite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15095) Drop binary mode in ThriftServer

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15095:

Issue Type: Sub-task  (was: Bug)
Parent: SPARK-14987

> Drop binary mode in ThriftServer
> 
>
> Key: SPARK-15095
> URL: https://issues.apache.org/jira/browse/SPARK-15095
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14973) The CrossValidator and TrainValidationSplit miss the seed when saving and loading

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14973:
--
Assignee: Xusen Yin

> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading
> -
>
> Key: SPARK-14973
> URL: https://issues.apache.org/jira/browse/SPARK-14973
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Reporter: Xusen Yin
>Assignee: Xusen Yin
>
> The CrossValidator and TrainValidationSplit miss the seed when saving and 
> loading. Need to fix both Spark side code and test suite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-10152) Support Init script for hive-thriftserver

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-10152.
---
Resolution: Won't Fix

> Support Init script for hive-thriftserver
> -
>
> Key: SPARK-10152
> URL: https://issues.apache.org/jira/browse/SPARK-10152
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Navis
>Priority: Trivial
>
> If some queries can be executed on the Thrift server in the initialization 
> stage (mostly for registering functions or macros), things become much easier. 
> It's not a big feature to include in Spark, but I hope someone can make use of 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15102) remove delegation token from ThriftServer

2016-05-03 Thread Davies Liu (JIRA)
Davies Liu created SPARK-15102:
--

 Summary: remove delegation token from ThriftServer
 Key: SPARK-15102
 URL: https://issues.apache.org/jira/browse/SPARK-15102
 Project: Spark
  Issue Type: Bug
Reporter: Davies Liu
Assignee: Davies Liu


This feature is only useful for Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15095) Drop binary mode in ThriftServer

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15095.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Drop binary mode in ThriftServer
> 
>
> Key: SPARK-15095
> URL: https://issues.apache.org/jira/browse/SPARK-15095
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15100) Audit: ml.feature

2016-05-03 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-15100:
-

 Summary: Audit: ml.feature
 Key: SPARK-15100
 URL: https://issues.apache.org/jira/browse/SPARK-15100
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, ML
Reporter: Joseph K. Bradley


Audit this sub-package for new algorithms which do not have corresponding 
sections & examples in the user guide.

See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14817) ML, Graph, R 2.0 QA: Programming guide update and migration guide

2016-05-03 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269588#comment-15269588
 ] 

Joseph K. Bradley commented on SPARK-14817:
---

[~BenFradet] [~iamshrek] [~podongfeng] [~wm624] If you'd like to begin, could 
you please help with the initial audit tasks on [SPARK-14815]?  That will let 
us identify missing programming guide items which we need to add.  Thank you!

> ML, Graph, R 2.0 QA: Programming guide update and migration guide
> -
>
> Key: SPARK-14817
> URL: https://issues.apache.org/jira/browse/SPARK-14817
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>
> Before the release, we need to update the MLlib, GraphX, and SparkR 
> Programming Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-13448].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")
> For MLlib, we will make the DataFrame-based API (spark.ml) front-and-center, 
> to make it clear the RDD-based API is the older, maintenance-mode one.
> * No docs for spark.mllib will be deleted; they will just be reorganized and 
> put in a subsection.
> * If spark.ml docs are less complete, or if spark.ml docs say "refer to the 
> spark.mllib docs for details," then we should copy those details to the 
> spark.ml docs.  This per-feature work can happen under [SPARK-14815].
> * This big reorganization should be done *after* docs are added for each 
> feature (to minimize merge conflicts).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15101) Audit: ml.clustering and ml.recommendation

2016-05-03 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-15101:
-

 Summary: Audit: ml.clustering and ml.recommendation
 Key: SPARK-15101
 URL: https://issues.apache.org/jira/browse/SPARK-15101
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, ML
Reporter: Joseph K. Bradley


Audit this sub-package for new algorithms which do not have corresponding 
sections & examples in the user guide.

See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15098) Audit: ml.classification

2016-05-03 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-15098:
-

 Summary: Audit: ml.classification
 Key: SPARK-15098
 URL: https://issues.apache.org/jira/browse/SPARK-15098
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, ML
Reporter: Joseph K. Bradley


Audit this sub-package for new algorithms which do not have corresponding 
sections & examples in the user guide.

See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15099) Audit: ml.regression

2016-05-03 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-15099:
-

 Summary: Audit: ml.regression
 Key: SPARK-15099
 URL: https://issues.apache.org/jira/browse/SPARK-15099
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, ML
Reporter: Joseph K. Bradley


Audit this sub-package for new algorithms which do not have corresponding 
sections & examples in the user guide.

See parent issue for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14815) ML, Graph, R 2.0 QA: Update user guide for new features & APIs

2016-05-03 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269581#comment-15269581
 ] 

Joseph K. Bradley commented on SPARK-14815:
---

I'll go ahead and create subtasks for auditing various parts of the API.  
Please check those subpackages to see if the user guide is missing sections.

> ML, Graph, R 2.0 QA: Update user guide for new features & APIs
> --
>
> Key: SPARK-14815
> URL: https://issues.apache.org/jira/browse/SPARK-14815
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>
> Check the user guide vs. a list of new APIs (classes, methods, data members) 
> to see what items require updates to the user guide.
> For each feature missing user guide doc:
> * Create a JIRA for that feature, and assign it to the author of the feature
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> For MLlib:
> * This task does not include major reorganizations for the programming guide; 
> that will be under [SPARK-14817].
> * We should now begin copying algorithm details from the spark.mllib guide to 
> spark.ml as needed, rather than just linking back to the corresponding 
> algorithms in the spark.mllib user guide.
> If you would like to work on this task, please comment, and we can create & 
> link JIRAs for parts of this work (which should be broken into pieces for 
> this larger 2.0 release).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14808) Spark MLlib, GraphX, SparkR 2.0 QA umbrella

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14808:
--
Description: 
This JIRA lists tasks for the next Spark release's QA period for MLlib, GraphX, 
and SparkR.

The list below gives an overview of what is involved, and the corresponding 
JIRA issues are linked below that.

h2. API

* Check binary API compatibility for Scala/Java
* Audit new public APIs (from the generated html doc)
** Scala
** Java compatibility
** Python coverage
** R
* Check Experimental, DeveloperApi tags

h2. Algorithms and performance

*Performance*
* _List any other missing performance tests from spark-perf here_
* perf-tests for transformers (SPARK-2838)
* MultilayerPerceptron (SPARK-11911)

h2. Documentation and example code

* For new algorithms, create JIRAs for updating the user guide sections & 
examples
* Update Programming Guide
* Update website


  was:
This JIRA lists tasks for the next Spark release's QA period for MLlib, GraphX, 
and SparkR.

The list below gives an overview of what is involved, and the corresponding 
JIRA issues are linked below that.

h2. API

* Check binary API compatibility for Scala/Java
* Audit new public APIs (from the generated html doc)
** Scala
** Java compatibility
** Python coverage
** R
* Check Experimental, DeveloperApi tags

h2. Algorithms and performance

*Performance*
* _List any other missing performance tests from spark-perf here_
* perf-tests for transformers (SPARK-2838)
* MultilayerPerceptron (SPARK-11911)

h2. Documentation and example code

* For new algorithms, create JIRAs for updating the user guide
* For major components, create JIRAs for example code
* Update Programming Guide
* Update website



> Spark MLlib, GraphX, SparkR 2.0 QA umbrella
> ---
>
> Key: SPARK-14808
> URL: https://issues.apache.org/jira/browse/SPARK-14808
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
>
> This JIRA lists tasks for the next Spark release's QA period for MLlib, 
> GraphX, and SparkR.
> The list below gives an overview of what is involved, and the corresponding 
> JIRA issues are linked below that.
> h2. API
> * Check binary API compatibility for Scala/Java
> * Audit new public APIs (from the generated html doc)
> ** Scala
> ** Java compatibility
> ** Python coverage
> ** R
> * Check Experimental, DeveloperApi tags
> h2. Algorithms and performance
> *Performance*
> * _List any other missing performance tests from spark-perf here_
> * perf-tests for transformers (SPARK-2838)
> * MultilayerPerceptron (SPARK-11911)
> h2. Documentation and example code
> * For new algorithms, create JIRAs for updating the user guide sections & 
> examples
> * Update Programming Guide
> * Update website



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-14809) Examples: Check for new APIs requiring example code in 2.0

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley closed SPARK-14809.
-
Resolution: Duplicate

This used to be relevant when examples & the user guide were separate, but it 
can now be contained within [SPARK-14815].

> Examples: Check for new APIs requiring example code in 2.0
> --
>
> Key: SPARK-14809
> URL: https://issues.apache.org/jira/browse/SPARK-14809
> Project: Spark
>  Issue Type: Sub-task
>  Components: GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Audit list of new features added to MLlib, GraphX & SparkR, and see which 
> major items are missing example code (in the examples folder).  We do not 
> need examples for everything, only for major items such as new algorithms.
> For any such items:
> * Create a JIRA for that feature, and assign it to the author of the feature 
> (or yourself if interested).
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14809) Examples: Check for new APIs requiring example code in 2.0

2016-05-03 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-14809:
--
Issue Type: Documentation  (was: Sub-task)
Parent: (was: SPARK-14808)

> Examples: Check for new APIs requiring example code in 2.0
> --
>
> Key: SPARK-14809
> URL: https://issues.apache.org/jira/browse/SPARK-14809
> Project: Spark
>  Issue Type: Documentation
>  Components: GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Audit list of new features added to MLlib, GraphX & SparkR, and see which 
> major items are missing example code (in the examples folder).  We do not 
> need examples for everything, only for major items such as new algorithms.
> For any such items:
> * Create a JIRA for that feature, and assign it to the author of the feature 
> (or yourself if interested).
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15073) Make SparkSession constructors private

2016-05-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15073.
-
Resolution: Fixed

> Make SparkSession constructors private
> --
>
> Key: SPARK-15073
> URL: https://issues.apache.org/jira/browse/SPARK-15073
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Andrew Or
> Fix For: 2.0.0
>
>
> So users have to use the Builder pattern.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15097:


Assignee: (was: Apache Spark)

> Import fails for someDataset.sqlContext.implicits._
> ---
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
>Reporter: koert kuipers
>
> With the introduction of SparkSession, the sqlContext member of Dataset 
> changed from a lazy val to a def. However, this is troublesome if you want 
> to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.
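
For anyone hitting the same compile error, a minimal sketch of the failure and the usual workaround, which binds the context to a val so the import prefix is a stable identifier; someDataset here is illustrative:

{code}
val someDataset = spark.range(5)   // any Dataset; spark is a SparkSession

// Does not compile while sqlContext is a def rather than a stable path:
//   import someDataset.sqlContext.implicits._
//   error: stable identifier required, but someDataset.sqlContext.implicits found.

// Workaround: pin the context to a val first, making the prefix stable.
val sqlContext = someDataset.sqlContext
import sqlContext.implicits._
{code}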



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._

2016-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269548#comment-15269548
 ] 

Apache Spark commented on SPARK-15097:
--

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/12877

> Import fails for someDataset.sqlContext.implicits._
> ---
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
>Reporter: koert kuipers
>
> With the introduction of SparkSession, the sqlContext member of Dataset 
> changed from a lazy val to a def. However, this is troublesome if you want 
> to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15097) Import fails for someDataset.sqlContext.implicits._

2016-05-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15097:


Assignee: Apache Spark

> Import fails for someDataset.sqlContext.implicits._
> ---
>
> Key: SPARK-15097
> URL: https://issues.apache.org/jira/browse/SPARK-15097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
> Environment: spark-2.0.0-SNAPSHOT
>Reporter: koert kuipers
>Assignee: Apache Spark
>
> With the introduction of SparkSession, the sqlContext member of Dataset 
> changed from a lazy val to a def. However, this is troublesome if you want 
> to do:
> import someDataset.sqlContext.implicits._
> You get this error:
> stable identifier required, but someDataset.sqlContext.implicits found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11316) coalesce doesn't handle UnionRDD with partial locality properly

2016-05-03 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-11316.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11327
[https://github.com/apache/spark/pull/11327]

> coalesce doesn't handle UnionRDD with partial locality properly
> ---
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
> Fix For: 2.0.0
>
>
> I haven't fully debugged this yet, but I'm reporting what I'm seeing and what 
> I think might be going on.
> I have a graph processing job that sees a huge slowdown in setupGroups in 
> the location iterator, where it is getting the preferred locations for the 
> coalesce. It is coalescing from 2400 partitions down to 1200, and the 
> calculation was taking 17+ hours; I killed it at that point, so I don't know 
> the total time.
> The job does an isEmpty call, a bunch of other transformations, then a 
> coalesce (which is where it takes so long), more transformations, and 
> finally a count to trigger execution.
> It appears that only one node is found in the setupGroups call, and to get 
> to that node it first has to go through this while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000. It finds very few partitions, or 
> none, in this loop.
> Then it runs the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
>     nxt_part = rotIt.next()._2
>     tries += 1
>   }
>   numCreated += 1
> }
> This loop contains an inner while loop, and both run up to 1200 times: 
> 1200*1200 iterations, which takes a very long time.
> The user can work around the issue by adding a count() call right after the 
> isEmpty call, before the coalesce is called. I also tried putting a take(1) 
> right before the isEmpty call, and it seems to work around the issue as 
> well, but it took 1 hour with the take versus a few minutes with the count().
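
A minimal sketch of the count() workaround described above; the pipeline, data, and partition counts are illustrative rather than the reporter's actual job, and sc is assumed to be a live SparkContext (as in spark-shell):

{code}
val rdd = sc.parallelize(1 to 1000000, 2400)

if (!rdd.isEmpty()) {                    // the isEmpty preceding the slow path
  val mapped = rdd.map(_ * 2)            // stand-in for the other transformations
  mapped.count()                         // workaround: force evaluation before coalescing
  val coalesced = mapped.coalesce(1200)
  coalesced.count()                      // the action that triggers the job
}
{code}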



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS

2016-05-03 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-14521.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12598
[https://github.com/apache/spark/pull/12598]

> StackOverflowError in Kryo when executing TPC-DS
> 
>
> Key: SPARK-14521
> URL: https://issues.apache.org/jira/browse/SPARK-14521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Rajesh Balamohan
>Priority: Critical
> Fix For: 2.0.0
>
>
> Build details: Spark built from the master branch (Apr-10)
> Dataset: TPC-DS at 200 GB scale in Parquet format, stored in Hive
> Client: $SPARK_HOME/bin/beeline
> Query: TPC-DS Query27
> spark.sql.sources.fileScan=true (this is the default value anyway)
> Exception:
> {noformat}
> Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> at 
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
> at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
> at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
> {noformat}
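
Not the actual fix (that is in the resolving pull request above), but a common stopgap for this class of deep-recursion StackOverflowError is a larger thread stack; the -Xss value below is illustrative:

{code}
import org.apache.spark.SparkConf

// Illustrative mitigation only: Kryo recurses once per level of the object
// graph, so a bigger stack can sidestep the overflow on deep plans.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-Xss16m")
// The equivalent driver-side setting, spark.driver.extraJavaOptions, must be
// supplied at launch time (e.g. via spark-submit --conf) to take effect.
{code}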



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14234) Executor crashes for TaskRunner thread interruption

2016-05-03 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-14234.
--
   Resolution: Fixed
 Assignee: Devaraj K
Fix Version/s: 2.0.0

> Executor crashes for TaskRunner thread interruption
> ---
>
> Key: SPARK-14234
> URL: https://issues.apache.org/jira/browse/SPARK-14234
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Devaraj K
>Assignee: Devaraj K
> Fix For: 2.0.0
>
>
> If the TaskRunner thread gets interrupted while running, due to a task kill 
> or any other reason, the interrupted thread tries to update the task status 
> as part of the exception handling and fails with the exception below. This 
> happens in the statusUpdate calls of all of these catch blocks; the 
> corresponding exceptions for each catch case follow.
> {code:title=Executor.scala|borderStyle=solid}
> case _: TaskKilledException | _: InterruptedException if task.killed 
> =>
>  ..
> case cDE: CommitDeniedException =>
>  ..
> case t: Throwable =>
>  ..
> {code}
> {code:xml}
> 16/03/29 17:32:33 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
> thread Thread[Executor task launch worker-2,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at 
> java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
>   at 
> org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
>   at 
> org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1204)
>   at 
> org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
>   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:513)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:135)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   ... 2 more
> {code}
> {code:xml}
> 16/03/29 08:00:29 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
> thread Thread[Executor task launch worker-4,5,main]
> java.lang.Error: java.nio.channels.ClosedByInterruptException
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
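
The ClosedByInterruptException in these traces can be reproduced outside Spark; below is a minimal, self-contained sketch of the underlying mechanism (an interrupted thread's next write on an interruptible channel fails), not Spark's actual code path:

{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, ClosedByInterruptException}

object InterruptDemo extends App {
  val worker = new Thread(new Runnable {
    override def run(): Unit = {
      // Channels.newChannel wraps the stream in an interruptible channel,
      // the same kind that appears in the stack traces above.
      val channel = Channels.newChannel(new ByteArrayOutputStream())
      try {
        while (true) channel.write(ByteBuffer.wrap(new Array[Byte](1024)))
      } catch {
        case e: ClosedByInterruptException =>
          println(s"write failed after interrupt: $e")
      }
    }
  })
  worker.start()
  Thread.sleep(100)
  worker.interrupt()   // closes the channel; the next write throws
  worker.join()
}
{code}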

[jira] [Comment Edited] (SPARK-15037) Use SparkSession instead of SQLContext in testsuites

2016-05-03 Thread Sandeep Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269475#comment-15269475
 ] 

Sandeep Singh edited comment on SPARK-15037 at 5/3/16 8:22 PM:
---

[~dongjoon] If it's ok with you, I can work on this one.


was (Author: techaddict):
cc: [~rxin] [~dongjoon] If you want I can work on this one.

> Use SparkSession instead of SQLContext in testsuites
> 
>
> Key: SPARK-15037
> URL: https://issues.apache.org/jira/browse/SPARK-15037
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Dongjoon Hyun
>
> This issue aims to update the existing test suites to use `SparkSession` 
> instead of `SQLContext`, since `SQLContext` exists just for backward 
> compatibility.
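
A minimal sketch of what the migration looks like in a single suite; the suite name and test body are illustrative:

{code}
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

class ExampleSuite extends FunSuite {
  // Before: suites built a SQLContext around a SparkContext.
  // After: a SparkSession is the single entry point.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[2]")
    .appName("ExampleSuite")
    .getOrCreate()

  test("range count") {
    assert(spark.range(10).count() == 10L)
  }
}
{code}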



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


