[jira] [Resolved] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-13482.
-
   Resolution: Fixed
 Assignee: SaintBacchus
Fix Version/s: 2.0.0
   1.6.1

> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.6.1, 2.0.0
>
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> *TransportConf#memoryMapBytes*.
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!
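
A minimal standalone sketch of the mismatch (no Spark dependency; the raw long
parse stands in for the failing TransportConf#memoryMapBytes path, and the
suffix-aware parser for what the documented "2m" form expects):

{code}
object MemoryMapThresholdSketch {
  // suffix-aware parse: accepts both the documented "2m" form and a raw count
  def parseAsBytes(v: String): Long = v.toLowerCase match {
    case s if s.endsWith("m") => s.dropRight(1).toLong * 1024 * 1024
    case s if s.endsWith("k") => s.dropRight(1).toLong * 1024
    case s                    => s.toLong // plain byte count, e.g. "2097152"
  }

  def main(args: Array[String]): Unit = {
    println(parseAsBytes("2m"))      // 2097152 -- the documented string form
    println(parseAsBytes("2097152")) // 2097152 -- the integer form
    // the failing path: a raw long parse cannot handle the size suffix
    try java.lang.Long.parseLong("2m")
    catch { case e: NumberFormatException => println(s"raw parse fails: $e") }
  }
}
{code}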






[jira] [Created] (SPARK-13489) GSoC 2016 project ideas for MLlib

2016-02-24 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-13489:
-

 Summary: GSoC 2016 project ideas for MLlib
 Key: SPARK-13489
 URL: https://issues.apache.org/jira/browse/SPARK-13489
 Project: Spark
  Issue Type: Brainstorming
  Components: ML
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Minor


I want to use this JIRA to collect GSoC project ideas for MLlib. Ideally, 
the student should already have contributed to Spark, and the project should be 
divisible into small functional pieces so that it won't stall if the mentor is 
temporarily unavailable.






[jira] [Resolved] (SPARK-7106) Support model save/load in Python's FPGrowth

2016-02-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-7106.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11321
[https://github.com/apache/spark/pull/11321]

> Support model save/load in Python's FPGrowth
> 
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Kai Jiang
>Priority: Minor
> Fix For: 2.0.0
>
>







[jira] [Created] (SPARK-13488) PairDStreamFunctions.mapWithState fails with java.util.NoSuchElementException: None.get when a timeout is set

2016-02-24 Thread NITESH VERMA (JIRA)
NITESH VERMA created SPARK-13488:


 Summary: PairDStreamFunctions.mapWithState fails with 
java.util.NoSuchElementException: None.get when a timeout is set
 Key: SPARK-13488
 URL: https://issues.apache.org/jira/browse/SPARK-13488
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: NITESH VERMA


Using the new Spark mapWithState API, I've encountered an issue when setting a 
timeout for mapWithState.
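
A usage sketch of the reported scenario, assuming a typical 1.6-style StateSpec
with a timeout (the stream and its types here are hypothetical):

{code}
import org.apache.spark.streaming._

// sums values per key and keeps the running total in state
val mappingFunc = (key: String, value: Option[Int], state: State[Int]) => {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  if (!state.isTimingOut()) state.update(sum) // a timing-out state cannot be updated
  (key, sum)
}

val spec = StateSpec.function(mappingFunc).timeout(Seconds(30)) // the timeout in question

// pairDStream.mapWithState(spec)  // pairDStream: DStream[(String, Int)], assumed
{code}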






[jira] [Resolved] (SPARK-13479) Python API for DataFrame approxQuantile

2016-02-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-13479.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11356
[https://github.com/apache/spark/pull/11356]

> Python API for DataFrame approxQuantile
> ---
>
> Key: SPARK-13479
> URL: https://issues.apache.org/jira/browse/SPARK-13479
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SQL
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
> Fix For: 2.0.0
>
>
> Add Python API for approxQuantile DataFrame stat function.
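
For context, a usage sketch of the Scala-side method the Python API would
mirror (the DataFrame is illustrative; signature per Spark 2.0's
DataFrameStatFunctions):

{code}
val df = sqlContext.range(1000).toDF("x")
// approxQuantile(column, probabilities, relativeError) => Array[Double]
val Array(median) = df.stat.approxQuantile("x", Array(0.5), 0.01)
{code}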






[jira] [Assigned] (SPARK-13487) User-facing RuntimeConfig interface

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13487:


Assignee: Reynold Xin  (was: Apache Spark)

> User-facing RuntimeConfig interface
> ---
>
> Key: SPARK-13487
> URL: https://issues.apache.org/jira/browse/SPARK-13487
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>







[jira] [Assigned] (SPARK-13487) User-facing RuntimeConfig interface

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13487:


Assignee: Apache Spark  (was: Reynold Xin)

> User-facing RuntimeConfig interface
> ---
>
> Key: SPARK-13487
> URL: https://issues.apache.org/jira/browse/SPARK-13487
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-13487) User-facing RuntimeConfig interface

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166800#comment-15166800
 ] 

Apache Spark commented on SPARK-13487:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11364

> User-facing RuntimeConfig interface
> ---
>
> Key: SPARK-13487
> URL: https://issues.apache.org/jira/browse/SPARK-13487
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>







[jira] [Commented] (SPARK-13486) Move SQLConf into an internal package

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166784#comment-15166784
 ] 

Apache Spark commented on SPARK-13486:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11363

> Move SQLConf into an internal package
> -
>
> Key: SPARK-13486
> URL: https://issues.apache.org/jira/browse/SPARK-13486
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> To improve the project structure, it would be better if the top-level packages 
> contained only public classes. For private ones such as SQLConf, we can move 
> them into org.apache.spark.sql.internal.






[jira] [Assigned] (SPARK-13486) Move SQLConf into an internal package

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13486:


Assignee: Reynold Xin  (was: Apache Spark)

> Move SQLConf into an internal package
> -
>
> Key: SPARK-13486
> URL: https://issues.apache.org/jira/browse/SPARK-13486
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> To improve the project structure, it would be better if the top-level packages 
> contained only public classes. For private ones such as SQLConf, we can move 
> them into org.apache.spark.sql.internal.






[jira] [Assigned] (SPARK-13486) Move SQLConf into an internal package

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13486:


Assignee: Apache Spark  (was: Reynold Xin)

> Move SQLConf into an internal package
> -
>
> Key: SPARK-13486
> URL: https://issues.apache.org/jira/browse/SPARK-13486
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> To improve the project structure, it would be better if the top-level packages 
> contained only public classes. For private ones such as SQLConf, we can move 
> them into org.apache.spark.sql.internal.






[jira] [Updated] (SPARK-13477) User-facing catalog API

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13477:

Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-13485

> User-facing catalog API
> ---
>
> Key: SPARK-13477
> URL: https://issues.apache.org/jira/browse/SPARK-13477
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Attachments: User-facingCatalogAPI.pdf
>
>
> This ticket proposes introducing a user-facing catalog API in Spark 2.0. 
> Please see the attached design doc for more information.






[jira] [Created] (SPARK-13487) User-facing RuntimeConfig interface

2016-02-24 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13487:
---

 Summary: User-facing RuntimeConfig interface
 Key: SPARK-13487
 URL: https://issues.apache.org/jira/browse/SPARK-13487
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin









[jira] [Created] (SPARK-13486) Move SQLConf into an internal package

2016-02-24 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13486:
---

 Summary: Move SQLConf into an internal package
 Key: SPARK-13486
 URL: https://issues.apache.org/jira/browse/SPARK-13486
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


To improve the project structure, it would be better if the top-level packages 
contained only public classes. For private ones such as SQLConf, we can move them 
into org.apache.spark.sql.internal.
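
A sketch of the proposed layout (illustrative; the point is only that private
classes move under an internal subpackage while the public API stays at the
top level):

{code}
// before: org.apache.spark.sql.SQLConf sits in the public package
// after:
package org.apache.spark.sql.internal

private[sql] class SQLConf {
  // ... existing settings, unchanged ...
}
{code}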








[jira] [Created] (SPARK-13485) Dataset API foundation in Spark 2.0

2016-02-24 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13485:
---

 Summary: Dataset API foundation in Spark 2.0
 Key: SPARK-13485
 URL: https://issues.apache.org/jira/browse/SPARK-13485
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


As part of Spark 2.0, we want to create a stable API foundation for Dataset to 
become the main user-facing API in Spark. This ticket tracks various tasks 
related to that.

The main high level changes are:

1. Merge Dataset/DataFrame
2. Create a more natural entry point for Dataset (SQLContext is not ideal 
because of the name "SQL")
3. First-class support for sessions
4. First-class support for a system catalog









[jira] [Updated] (SPARK-13447) Fix AM failure situation for dynamic allocation disabled situation

2016-02-24 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-13447:

Summary: Fix AM failure situation for dynamic allocation disabled situation 
 (was: Fix AM failure situation for dynamic allocation diabled situation)

> Fix AM failure situation for dynamic allocation disabled situation
> --
>
> Key: SPARK-13447
> URL: https://issues.apache.org/jira/browse/SPARK-13447
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.6.0
>Reporter: Saisai Shao
>
> Because of the lag of executor disconnection events, stale state in the 
> {{CoarseGrainedSchedulerBackend}} can ruin newly registered executor 
> information. This situation is already handled when dynamic allocation is 
> enabled; we should apply a similar fix to the dynamic-allocation-disabled 
> scenario.






[jira] [Assigned] (SPARK-13292) QuantileDiscretizer should take random seed in PySpark

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13292:


Assignee: Apache Spark  (was: Yu Ishikawa)

> QuantileDiscretizer should take random seed in PySpark
> --
>
> Key: SPARK-13292
> URL: https://issues.apache.org/jira/browse/SPARK-13292
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-11515 for the Python API.






[jira] [Assigned] (SPARK-13292) QuantileDiscretizer should take random seed in PySpark

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13292:


Assignee: Yu Ishikawa  (was: Apache Spark)

> QuantileDiscretizer should take random seed in PySpark
> --
>
> Key: SPARK-13292
> URL: https://issues.apache.org/jira/browse/SPARK-13292
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Yu Ishikawa
>Priority: Minor
>
> SPARK-11515 for the Python API.






[jira] [Commented] (SPARK-13292) QuantileDiscretizer should take random seed in PySpark

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166704#comment-15166704
 ] 

Apache Spark commented on SPARK-13292:
--

User 'yu-iskw' has created a pull request for this issue:
https://github.com/apache/spark/pull/11362

> QuantileDiscretizer should take random seed in PySpark
> --
>
> Key: SPARK-13292
> URL: https://issues.apache.org/jira/browse/SPARK-13292
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Yu Ishikawa
>Priority: Minor
>
> SPARK-11515 for the Python API.






[jira] [Updated] (SPARK-13484) Filter outer joined result using a non-nullable column from the right table

2016-02-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-13484:
--
Summary: Filter outer joined result using a non-nullable column from the 
right table  (was: Filter outer joined result using a non-nullable column)

> Filter outer joined result using a non-nullable column from the right table
> ---
>
> Key: SPARK-13484
> URL: https://issues.apache.org/jira/browse/SPARK-13484
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0, 2.0.0
>Reporter: Xiangrui Meng
>
> Technically speaking, this is not a bug. But
> {code}
> val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
> val b = sqlContext.range(10).select((col("id") % 
> 3).as("id")).groupBy("id").count()
> a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull).show()
> {code}
> returns nothing. This is because `b("count")` is not nullable, so static 
> analysis folds the filter condition to a constant false. However, it is common 
> for users to use `a(...)` and `b(...)` to filter the joined result.






[jira] [Created] (SPARK-13484) Filter outer joined result using a non-nullable column

2016-02-24 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-13484:
-

 Summary: Filter outer joined result using a non-nullable column
 Key: SPARK-13484
 URL: https://issues.apache.org/jira/browse/SPARK-13484
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0, 1.5.2, 2.0.0
Reporter: Xiangrui Meng


Technically speaking, this is not a bug. But

{code}
val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
val b = sqlContext.range(10).select((col("id") % 
3).as("id")).groupBy("id").count()
a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull).show()
{code}

returns nothing. This is because `b("count")` is not nullable, so static 
analysis folds the filter condition to a constant false. However, it is common 
for users to use `a(...)` and `b(...)` to filter the joined result.
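
A workaround sketch under the same setup (the idea, which is an assumption
about how users can sidestep the constant folding, is to resolve the column
against the joined plan, whose left outer join schema marks right-side columns
as nullable):

{code}
import org.apache.spark.sql.functions.col

val b2 = b.withColumnRenamed("count", "b_count") // avoid clashing with a("count")
val joined = a.join(b2, a("id") === b2("id"), "left_outer")
// b_count is nullable in joined's schema, so isNull is not folded to false
joined.filter(col("b_count").isNull).show()      // rows of `a` unmatched in `b`
{code}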






[jira] [Updated] (SPARK-13483) URL address error in Spark web ui in YARN mode

2016-02-24 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated SPARK-13483:

Summary: URL address error in Spark web ui in YARN mode  (was: URL address 
error in Spark web ui on YARN model)

> URL address error in Spark web ui in YARN mode
> --
>
> Key: SPARK-13483
> URL: https://issues.apache.org/jira/browse/SPARK-13483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: yangping wu
>Priority: Minor
>
> On YARN, when you run a Spark Streaming job, the Spark web UI records the 
> *Active Jobs* and *Completed Jobs* on the 
> http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
> !https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
> but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
> the base URI (_APPLICATION_WEB_PROXY_BASE_ or _spark.ui.proxyBase_) is 
> provided and has to be on all links, so the right URL is 
> +/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
> not +/streaming/batch/?id=1456370893000+
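
A minimal sketch of the intended behavior (assumption: every generated link is
prefixed with the proxy base, read from _spark.ui.proxyBase_ or
_APPLICATION_WEB_PROXY_BASE_):

{code}
def withProxyBase(path: String): String = {
  val base = sys.props.get("spark.ui.proxyBase")
    .orElse(sys.env.get("APPLICATION_WEB_PROXY_BASE"))
    .getOrElse("")
  base.stripSuffix("/") + path
}

// withProxyBase("/streaming/batch/?id=1456370893000")
// => "/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000"
//    when the proxy base is "/proxy/application_1453101066555_1416734"
{code}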






[jira] [Updated] (SPARK-13483) URL address error in Spark web ui on YARN model

2016-02-24 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated SPARK-13483:

Description: 
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI (_APPLICATION_WEB_PROXY_BASE_ or _spark.ui.proxyBase_) is 
provided and has to be on all links, so the right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
not +/streaming/batch/?id=1456370893000+

  was:
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI ({code}APPLICATION_WEB_PROXY_BASE{code} or 
{code}spark.ui.proxyBase{code}) is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+


> URL address error in Spark web ui on YARN model
> ---
>
> Key: SPARK-13483
> URL: https://issues.apache.org/jira/browse/SPARK-13483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: yangping wu
>Priority: Minor
>
> On YARN, when you run a Spark Streaming job, the Spark web UI records the 
> *Active Jobs* and *Completed Jobs* on the 
> http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
> !https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
> but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
> the base URI (_APPLICATION_WEB_PROXY_BASE_ or _spark.ui.proxyBase_) is 
> provided and has to be on all links, so the right URL is 
> +/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
> not +/streaming/batch/?id=1456370893000+






[jira] [Commented] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166685#comment-15166685
 ] 

Apache Spark commented on SPARK-13321:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/11361

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>
> The following SQL can not be parsed with current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.






[jira] [Updated] (SPARK-13483) URL address error in Spark web ui on YARN model

2016-02-24 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated SPARK-13483:

Description: 
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI ({code}APPLICATION_WEB_PROXY_BASE{code} or 
{code}spark.ui.proxyBase{code}) is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+

  was:
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+


> URL address error in Spark web ui on YARN model
> ---
>
> Key: SPARK-13483
> URL: https://issues.apache.org/jira/browse/SPARK-13483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: yangping wu
>Priority: Minor
>
> On YARN, when you run a Spark Streaming job, the Spark web UI records the 
> *Active Jobs* and *Completed Jobs* on the 
> http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
> !https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
> but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
> the base URI ({code}APPLICATION_WEB_PROXY_BASE{code} or 
> {code}spark.ui.proxyBase{code}) is provided and has to be on all links, so the 
> right URL is 
> +/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
> and not +/streaming/batch/?id=1456370893000+






[jira] [Updated] (SPARK-13483) URL address error in Spark web ui on YARN model

2016-02-24 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated SPARK-13483:

Description: 
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+

  was:
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
+http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/+ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+


> URL address error in Spark web ui on YARN model
> ---
>
> Key: SPARK-13483
> URL: https://issues.apache.org/jira/browse/SPARK-13483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: yangping wu
>Priority: Minor
>
> On YARN, when you run a Spark Streaming job, the Spark web UI records the 
> *Active Jobs* and *Completed Jobs* on the 
> http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/ page: 
> !https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
> but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
> the base URI is provided and has to be on all links, so 
> the right URL is 
> +/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
> and not +/streaming/batch/?id=1456370893000+






[jira] [Updated] (SPARK-13483) URL address error in Spark web ui on YARN model

2016-02-24 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated SPARK-13483:

Description: 
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
+http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/+ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!

but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+

  was:
On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
+http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/+ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+


> URL address error in Spark web ui on YARN model
> ---
>
> Key: SPARK-13483
> URL: https://issues.apache.org/jira/browse/SPARK-13483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.5.2, 1.6.0
>Reporter: yangping wu
>Priority: Minor
>
> On YARN, when you run a Spark Streaming job, the Spark web UI records the 
> *Active Jobs* and *Completed Jobs* on the 
> +http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/+ page: 
> !https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
> but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
> the base URI is provided and has to be on all links, so 
> the right URL is 
> +/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
> and not +/streaming/batch/?id=1456370893000+






[jira] [Resolved] (SPARK-13092) Track constraints in ExpressionSet

2016-02-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13092.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11338
[https://github.com/apache/spark/pull/11338]

> Track constraints in ExpressionSet
> --
>
> Key: SPARK-13092
> URL: https://issues.apache.org/jira/browse/SPARK-13092
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Sameer Agarwal
>Assignee: Michael Armbrust
> Fix For: 2.0.0
>
>
> Create a new ExpressionSet that operates similarly to an AttributeSet for 
> keeping track of constraints. A nice addition would be to have it perform 
> other types of canonicalization (e.g., don't allow both a = b and b = a).
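
A toy sketch of the canonicalization idea (hypothetical simplified
expressions; the real ExpressionSet works over Catalyst expression trees):

{code}
case class Eq(l: String, r: String) {
  // order operands deterministically so Eq("a","b") and Eq("b","a") coincide
  def canonical: Eq = if (l <= r) this else Eq(r, l)
}

val constraints = Set(Eq("a", "b").canonical, Eq("b", "a").canonical)
// constraints.size == 1
{code}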






[jira] [Created] (SPARK-13483) URL address error in Spark web ui on YARN model

2016-02-24 Thread yangping wu (JIRA)
yangping wu created SPARK-13483:
---

 Summary: URL address error in Spark web ui on YARN model
 Key: SPARK-13483
 URL: https://issues.apache.org/jira/browse/SPARK-13483
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.6.0, 1.5.2
Reporter: yangping wu
Priority: Minor


On YARN, when you run a Spark Streaming job, the Spark web UI records the 
*Active Jobs* and *Completed Jobs* on the 
+http://l-hadoop-proxy-server:9981/proxy/application_XXX/jobs/+ page: 
!https://raw.githubusercontent.com/397090770/iteblog.github.com/master/sparkStreaming.png!
but the URL for a streaming batch is wrong. Because YARN goes through a proxy, 
the base URI is provided and has to be on all links, so the 
right URL is 
+/proxy/application_1453101066555_1416734/streaming/batch/?id=1456370893000+ 
and not +/streaming/batch/?id=1456370893000+






[jira] [Updated] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-13482:
-
Description: 
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
*TransportConf#memoryMapBytes*.
Usage of `spark.storage.memoryMapThreshold`:
!https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!


  was:
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
Usage of `spark.storage.memoryMapThreshold`:
!https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!



> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> *TransportConf#memoryMapBytes*.
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Commented] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1512#comment-1512
 ] 

Apache Spark commented on SPARK-13482:
--

User 'SaintBacchus' has created a pull request for this issue:
https://github.com/apache/spark/pull/11360

> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Assigned] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13482:


Assignee: (was: Apache Spark)

> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Assigned] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13482:


Assignee: Apache Spark

> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
>Assignee: Apache Spark
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Updated] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-13482:
-
Description: 
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
!https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!


  was:
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
!!



> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Updated] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-13482:
-
Description: 
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
Usage of `spark.storage.memoryMapThreshold`:
!https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!


  was:
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
!https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!



> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> Usage of `spark.storage.memoryMapThreshold`:
> !https://issues.apache.org/jira/secure/attachment/12789859/2016-02-25_10-41-37.jpg!






[jira] [Updated] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-13482:
-
Description: 
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
"TransportConf#memoryMapBytes".
!!


  was:
`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
TransportConf#memoryMapBytes


> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> "TransportConf#memoryMapBytes".
> !!






[jira] [Updated] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SaintBacchus updated SPARK-13482:
-
Attachment: 2016-02-25_10-41-37.jpg

> `spark.storage.memoryMapThreshold` has two kinds of values.
> -
>
> Key: SPARK-13482
> URL: https://issues.apache.org/jira/browse/SPARK-13482
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 1.5.1, 1.6.0, 1.6.1, 2.0.0
>Reporter: SaintBacchus
> Attachments: 2016-02-25_10-41-37.jpg
>
>
> `spark.storage.memoryMapThreshold` has two kinds of values: one is 
> 2*1024*1024 as an integer, and the other is '2m' as a string.
> "2m" is recommended in the documentation, but it will fail if the code goes into 
> TransportConf#memoryMapBytes






[jira] [Created] (SPARK-13482) `spark.storage.memoryMapThreshold` has two kinds of values.

2016-02-24 Thread SaintBacchus (JIRA)
SaintBacchus created SPARK-13482:


 Summary: `spark.storage.memoryMapThreshold` has two kinds of 
values.
 Key: SPARK-13482
 URL: https://issues.apache.org/jira/browse/SPARK-13482
 Project: Spark
  Issue Type: Bug
  Components: Block Manager
Affects Versions: 1.6.0, 1.5.1, 1.6.1, 2.0.0
Reporter: SaintBacchus


`spark.storage.memoryMapThreshold` has two kinds of values: one is 
2*1024*1024 as an integer, and the other is '2m' as a string.
"2m" is recommended in the documentation, but it will fail if the code goes into 
TransportConf#memoryMapBytes






[jira] [Updated] (SPARK-13383) Keep broadcast hint after column pruning

2016-02-24 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-13383:
-
Assignee: Liang-Chi Hsieh

> Keep broadcast hint after column pruning
> 
>
> Key: SPARK-13383
> URL: https://issues.apache.org/jira/browse/SPARK-13383
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> When we do column pruning in the Optimizer, we put an additional Project on 
> top of a logical plan. However, when a BroadcastHint already wraps the logical 
> plan, the added Project hides the BroadcastHint from later planning.
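
A conceptual sketch of the problem in toy plan notation (not the actual
Catalyst classes, just the shape of the plan before and after pruning):

{code}
BroadcastHint(b)                  // planner sees the hint and broadcasts b
Project(cols, BroadcastHint(b))   // after pruning, the hint is buried under the
                                  // added Project, so a rule that matches only
                                  // the top node misses it
{code}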






[jira] [Commented] (SPARK-13383) Keep broadcast hint after column pruning

2016-02-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166611#comment-15166611
 ] 

Liang-Chi Hsieh commented on SPARK-13383:
-

[~marmbrus] Can you help set the Assignee field? Thanks!

> Keep broadcast hint after column pruning
> 
>
> Key: SPARK-13383
> URL: https://issues.apache.org/jira/browse/SPARK-13383
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> When we do column pruning in the Optimizer, we put an additional Project on 
> top of a logical plan. However, when a BroadcastHint already wraps the logical 
> plan, the added Project hides the BroadcastHint from later planning.






[jira] [Commented] (SPARK-13123) Add wholestage codegen for sort

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166594#comment-15166594
 ] 

Apache Spark commented on SPARK-13123:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/11359

> Add wholestage codegen for sort
> ---
>
> Key: SPARK-13123
> URL: https://issues.apache.org/jira/browse/SPARK-13123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Nong Li
>Assignee: Nong Li
>
> It should just implement CodegenSupport. It's future work to have this 
> operator use codegen more effectively.






[jira] [Assigned] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13478:


Assignee: (was: Apache Spark)

> Fetching delegation tokens for Hive fails when using proxy users
> 
>
> Key: SPARK-13478
> URL: https://issues.apache.org/jira/browse/SPARK-13478
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> If you use spark-submit's proxy user support, the code that fetches 
> delegation tokens for the Hive Metastore fails. It seems like the Hive 
> library tries to connect to the Metastore as the proxy user, and it doesn't 
> have a Kerberos TGT for that user, so it fails.
> I don't know whether the same issue exists in the HBase code, but I'll make a 
> similar change so that both behave similarly.
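
A sketch of the usual remedy (an assumption about the shape of the fix, not
the actual patch: obtain the token while running as the TGT-holding real user
via Hadoop's UserGroupInformation; fetchHiveDelegationToken() is a
hypothetical stand-in for the Hive client call):

{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// for a proxy-user UGI, getRealUser returns the user that actually holds the TGT
val realUser = UserGroupInformation.getCurrentUser.getRealUser
val token = realUser.doAs(new PrivilegedExceptionAction[String] {
  override def run(): String = fetchHiveDelegationToken() // hypothetical helper
})
{code}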






[jira] [Commented] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166586#comment-15166586
 ] 

Apache Spark commented on SPARK-13478:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/11358

> Fetching delegation tokens for Hive fails when using proxy users
> 
>
> Key: SPARK-13478
> URL: https://issues.apache.org/jira/browse/SPARK-13478
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> If you use spark-submit's proxy user support, the code that fetches 
> delegation tokens for the Hive Metastore fails. It seems like the Hive 
> library tries to connect to the Metastore as the proxy user, and it doesn't 
> have a Kerberos TGT for that user, so it fails.
> I don't know whether the same issue exists in the HBase code, but I'll make a 
> similar change so that both behave similarly.






[jira] [Assigned] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13478:


Assignee: Apache Spark

> Fetching delegation tokens for Hive fails when using proxy users
> 
>
> Key: SPARK-13478
> URL: https://issues.apache.org/jira/browse/SPARK-13478
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> If you use spark-submit's proxy user support, the code that fetches 
> delegation tokens for the Hive Metastore fails. It seems like the Hive 
> library tries to connect to the Metastore as the proxy user, and it doesn't 
> have a Kerberos TGT for that user, so it fails.
> I don't know whether the same issue exists in the HBase code, but I'll make a 
> similar change so that both behave similarly.






[jira] [Commented] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166570#comment-15166570
 ] 

Marcelo Vanzin commented on SPARK-13478:


For the record, here's the exception you get:

{noformat}
16/02/24 18:06:48 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
[...plus lots of other stuff]
{noformat}

> Fetching delegation tokens for Hive fails when using proxy users
> 
>
> Key: SPARK-13478
> URL: https://issues.apache.org/jira/browse/SPARK-13478
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> If you use spark-submit's proxy user support, the code that fetches 
> delegation tokens for the Hive Metastore fails. It seems like the Hive 
> library tries to connect to the Metastore as the proxy user, and it doesn't 
> have a Kerberos TGT for that user, so it fails.
> I don't know whether the same issue exists in the HBase code, but I'll make a 
> similar change so that both behave similarly.






[jira] [Updated] (SPARK-13174) Add API and options for csv data sources

2016-02-24 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-13174:
-
Affects Version/s: 2.0.0

> Add API and options for csv data sources
> 
>
> Key: SPARK-13174
> URL: https://issues.apache.org/jira/browse/SPARK-13174
> Project: Spark
>  Issue Type: New Feature
>  Components: Input/Output
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>
> We should have an API to load CSV data sources (with some options as 
> arguments), similar to json() and jdbc().
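
A sketch of the kind of API being proposed (hypothetical option names, modeled
on the existing json() reader):

{code}
val df = sqlContext.read
  .option("header", "true")   // assumed option name
  .option("delimiter", ",")   // assumed option name
  .csv("/path/to/data.csv")   // the proposed csv() entry point
{code}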






[jira] [Updated] (SPARK-13184) Support minPartitions parameter for JSON and CSV datasources as options

2016-02-24 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-13184:
-
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-12420

> Support minPartitions parameter for JSON and CSV datasources as options
> ---
>
> Key: SPARK-13184
> URL: https://issues.apache.org/jira/browse/SPARK-13184
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> After looking through the pull requests below on the Spark CSV data source,
> https://github.com/databricks/spark-csv/pull/256
> https://github.com/databricks/spark-csv/issues/141
> https://github.com/databricks/spark-csv/pull/186
> It looks like Spark might need to be able to set {{minPartitions}}.
> {{repartition()}} or {{coalesce()}} can be alternatives, but they need to 
> shuffle the data in most cases.
> Although I am still not sure if this is needed, I will open this ticket just 
> for discussion.
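
For context, a minimal Scala sketch of the RDD-level knob this ticket asks to
mirror in the data source readers (the path and partition counts are made up
for illustration):

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("minPartitions-sketch"))

// The RDD API exposes minPartitions directly: the input is split into at
// least this many partitions at read time, with no extra shuffle.
val rdd = sc.textFile("/tmp/data.csv", 16)

// The alternatives mentioned above only operate after the read:
val narrowed = rdd.coalesce(4)        // shrinks the partition count, no shuffle
val reshuffled = rdd.repartition(64)  // always performs a full shuffle
{code}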



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13174) Add API and options for csv data sources

2016-02-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166563#comment-15166563
 ] 

Hyukjin Kwon commented on SPARK-13174:
--

[~davies] I carelessly opened what I think is the same issue and resolved it. 
Would you close this one if you think it is the same issue as SPARK-13381?

> Add API and options for csv data sources
> 
>
> Key: SPARK-13174
> URL: https://issues.apache.org/jira/browse/SPARK-13174
> Project: Spark
>  Issue Type: New Feature
>  Components: Input/Output
>Reporter: Davies Liu
>
> We should have an API to load CSV data sources (with some options as 
> arguments), similar to json() and jdbc().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13481:


Assignee: (was: Apache Spark)

> History server page with a default sorting as "desc" time.
> --
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhuo Liu
>Priority: Minor
>
> Now, by default, applications are shown in ascending order of appId. We might 
> prefer descending order by default, which shows the latest application at 
> the top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13481:


Assignee: Apache Spark

> History server page with a default sorting as "desc" time.
> --
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhuo Liu
>Assignee: Apache Spark
>Priority: Minor
>
> Now, by default, applications are shown in ascending order of appId. We might 
> prefer descending order by default, which shows the latest application at 
> the top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166562#comment-15166562
 ] 

Apache Spark commented on SPARK-13481:
--

User 'zhuoliu' has created a pull request for this issue:
https://github.com/apache/spark/pull/11357

> History server page with a default sorting as "desc" time.
> --
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhuo Liu
>Priority: Minor
>
> Now, by default, applications are shown in ascending order of appId. We might 
> prefer descending order by default, which shows the latest application at 
> the top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166544#comment-15166544
 ] 

Marcelo Vanzin commented on SPARK-13478:


Actually d'oh, I had secure HBase for the testing anyway, so I just checked and 
HBase doesn't have the same problem.

> Fetching delegation tokens for Hive fails when using proxy users
> 
>
> Key: SPARK-13478
> URL: https://issues.apache.org/jira/browse/SPARK-13478
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 2.0.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> If you use spark-submit's proxy user support, the code that fetches 
> delegation tokens for the Hive Metastore fails. It seems like the Hive 
> library tries to connect to the Metastore as the proxy user, and it doesn't 
> have a Kerberos TGT for that user, so it fails.
> I don't know whether the same issue exists in the HBase code, but I'll make a 
> similar change so that both behave similarly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-24 Thread Zhuo Liu (JIRA)
Zhuo Liu created SPARK-13481:


 Summary: History server page with a default sorting as "desc" time.
 Key: SPARK-13481
 URL: https://issues.apache.org/jira/browse/SPARK-13481
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Zhuo Liu
Priority: Minor


Now, by default, applications are shown in ascending order of appId. We might 
prefer descending order by default, which shows the latest application at the 
top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13250) Make vectorized parquet reader work as the build side of a broadcast join

2016-02-24 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-13250.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11141
[https://github.com/apache/spark/pull/11141]

> Make vectorized parquet reader work as the build side of a broadcast join
> -
>
> Key: SPARK-13250
> URL: https://issues.apache.org/jira/browse/SPARK-13250
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Nong Li
> Fix For: 2.0.0
>
>
> The issue is that the build side requires UnsafeRows in certain 
> optimizations. The vectorized parquet reader explicitly does not want to 
> produce unsafe rows in general.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13479) Python API for DataFrame approxQuantile

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166460#comment-15166460
 ] 

Apache Spark commented on SPARK-13479:
--

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/11356

> Python API for DataFrame approxQuantile
> ---
>
> Key: SPARK-13479
> URL: https://issues.apache.org/jira/browse/SPARK-13479
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SQL
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Add Python API for approxQuantile DataFrame stat function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13479) Python API for DataFrame approxQuantile

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13479:


Assignee: Joseph K. Bradley  (was: Apache Spark)

> Python API for DataFrame approxQuantile
> ---
>
> Key: SPARK-13479
> URL: https://issues.apache.org/jira/browse/SPARK-13479
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SQL
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Minor
>
> Add Python API for approxQuantile DataFrame stat function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13479) Python API for DataFrame approxQuantile

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13479:


Assignee: Apache Spark  (was: Joseph K. Bradley)

> Python API for DataFrame approxQuantile
> ---
>
> Key: SPARK-13479
> URL: https://issues.apache.org/jira/browse/SPARK-13479
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SQL
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> Add Python API for approxQuantile DataFrame stat function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6160) ChiSqSelector should keep test statistic info

2016-02-24 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166419#comment-15166419
 ] 

Gayathri Murali commented on SPARK-6160:


Is anyone working on this? If not, I can.

> ChiSqSelector should keep test statistic info
> -
>
> Key: SPARK-6160
> URL: https://issues.apache.org/jira/browse/SPARK-6160
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> It is useful to have the test statistics explaining selected features, but 
> these data are thrown out when constructing the ChiSqSelectorModel.  The data 
> are expensive to recompute, so the ChiSqSelectorModel should store and expose 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13480) Regression with percentile() + function in GROUP BY

2016-02-24 Thread Jaka Jancar (JIRA)
Jaka Jancar created SPARK-13480:
---

 Summary: Regression with percentile() + function in GROUP BY
 Key: SPARK-13480
 URL: https://issues.apache.org/jira/browse/SPARK-13480
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jaka Jancar


{code}
SELECT
  percentile(load_time, 0.50)
FROM
  (
select '2000-01-01' queued_at, 100 load_time
union all
select '2000-01-01' queued_at, 110 load_time
union all
select '2000-01-01' queued_at, 120 load_time
  ) t
GROUP BY
  year(queued_at)
{code}

fails with

{code}
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 
0 in stage 6067.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
6067.0 (TID 268774, ip-10-0-163-203.ec2.internal): 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: year(cast(queued_at#78201 as date))#78209
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:86)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:243)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:242)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:233)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:85)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:62)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:62)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:234)
at 
org.apache.spark.sql.execution.Exchange.org$apache$spark$sql$execution$Exchange$$getPartitionKeyExtractor$1(Exchange.scala:197)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:209)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:208)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Couldn't find year(cast(queued_at#78201 
as date))#78209 in [queued_at#78201,load_time#78202]
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:92)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:86)
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 33 more
{code}

This used to work (not sure whether on 1.5 or 1.4).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Created] (SPARK-13479) Python API for DataFrame approxQuantile

2016-02-24 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-13479:
-

 Summary: Python API for DataFrame approxQuantile
 Key: SPARK-13479
 URL: https://issues.apache.org/jira/browse/SPARK-13479
 Project: Spark
  Issue Type: New Feature
  Components: PySpark, SQL
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Minor


Add Python API for approxQuantile DataFrame stat function.
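
For reference, a rough Scala sketch of the JVM-side API the Python wrapper
would mirror (the column and probabilities below are illustrative, and {{sc}}
is assumed from a shell session):

{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.range(0L, 1000L).toDF("value")

// approxQuantile(column, probabilities, relativeError) returns one
// approximate quantile per requested probability.
val quartiles: Array[Double] =
  df.stat.approxQuantile("value", Array(0.25, 0.5, 0.75), 0.01)
{code}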



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13383) Keep broadcast hint after column pruning

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166366#comment-15166366
 ] 

Apache Spark commented on SPARK-13383:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/11355

> Keep broadcast hint after column pruning
> 
>
> Key: SPARK-13383
> URL: https://issues.apache.org/jira/browse/SPARK-13383
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
> Fix For: 2.0.0
>
>
> When we do column pruning in the Optimizer, we put an additional Project on 
> top of a logical plan. However, when a BroadcastHint already wraps a logical 
> plan, the added Project hides the BroadcastHint from later planning.
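
To make the report concrete, a hedged Scala sketch of the user-facing hint
involved (the two DataFrames and the join key are invented for illustration,
and {{sqlContext}} is assumed from a shell session):

{code}
import org.apache.spark.sql.functions.broadcast

val largeDf = sqlContext.range(0L, 1000000L).toDF("key")
val smallDf = sqlContext.range(0L, 100L).toDF("key")

// broadcast() wraps smallDf's logical plan in a BroadcastHint node.
val joined = largeDf.join(broadcast(smallDf), "key")

// Per the report, if column pruning later inserts a Project above the hinted
// plan, the planner no longer sees the hint and may fall back to a shuffle
// join instead of a broadcast join.
joined.explain()
{code}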



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13478) Fetching delegation tokens for Hive fails when using proxy users

2016-02-24 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-13478:
--

 Summary: Fetching delegation tokens for Hive fails when using 
proxy users
 Key: SPARK-13478
 URL: https://issues.apache.org/jira/browse/SPARK-13478
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.6.0, 2.0.0
Reporter: Marcelo Vanzin
Priority: Minor


If you use spark-submit's proxy user support, the code that fetches delegation 
tokens for the Hive Metastore fails. It seems like the Hive library tries to 
connect to the Metastore as the proxy user, and it doesn't have a Kerberos TGT 
for that user, so it fails.

I don't know whether the same issue exists in the HBase code, but I'll make a 
similar change so that both behave similarly.
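
A hedged sketch of the usual Hadoop pattern for this kind of fix (not the
actual patch): run the token fetch inside the real, TGT-holding user's context
rather than the proxy user's. The Metastore call itself is left as a comment,
since the exact Spark helper is not shown here:

{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

val creds = new Credentials()

// The login user is the one that actually holds a Kerberos TGT; the proxy
// user does not, which is why the SASL handshake fails.
val realUser = UserGroupInformation.getLoginUser
realUser.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // fetch the Hive Metastore delegation token here, adding it to `creds`,
    // which can then be handed to the proxy user's UGI
  }
})
{code}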



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11381) Replace example code in mllib-linear-methods.md using include_example

2016-02-24 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166269#comment-15166269
 ] 

Dongjoon Hyun commented on SPARK-11381:
---

Thank you for assigning me, [~mengxr]!

> Replace example code in mllib-linear-methods.md using include_example
> -
>
> Key: SPARK-11381
> URL: https://issues.apache.org/jira/browse/SPARK-11381
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib
>Reporter: Xusen Yin
>Assignee: Dongjoon Hyun
>  Labels: starter
>
> This is similar to SPARK-11289 but for the example code in 
> mllib-linear-methods.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13396) Stop using our internal deprecated .metrics on ExceptionFailure instead use accumUpdates

2016-02-24 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163925#comment-15163925
 ] 

Gayathri Murali commented on SPARK-13396:
-

I can work on this

> Stop using our internal deprecated .metrics on ExceptionFailure instead use 
> accumUpdates
> 
>
> Key: SPARK-13396
> URL: https://issues.apache.org/jira/browse/SPARK-13396
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: holdenk
>Priority: Minor
>
> src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala:385: value 
> metrics in class ExceptionFailure is deprecated: use accumUpdates instead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-732) Recomputation of RDDs may result in duplicated accumulator updates

2016-02-24 Thread Jim Lohse (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163921#comment-15163921
 ] 

Jim Lohse commented on SPARK-732:
-

Affects Versions only goes up to 1.1.0; presumably this is still an issue? Is it 
correct that this is only an issue in transformations, but that actions will 
work correctly? That idea seems to be supported by the docs under 
https://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka:
 

"In Java, Spark also supports the more general Accumulable interface to 
accumulate data where the resulting type is not the same as the elements added 
(e.g. build a list by collecting together elements).

For accumulator updates performed inside actions only, Spark guarantees that 
each task’s update to the accumulator will only be applied once, i.e. restarted 
tasks will not update the value. In transformations, users should be aware of 
that each task’s update may be applied more than once if tasks or job stages 
are re-executed."
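
A small Scala sketch of the distinction those docs draw, assuming a shell
session's {{sc}} (the data and increments are illustrative):

{code}
val acc = sc.accumulator(0)
val data = sc.parallelize(1 to 100)

// Transformation: if this stage is re-executed (lost executor, dropped cache
// block), the increments below can be applied more than once.
val mapped = data.map { x => acc += 1; x * 2 }
mapped.count()  // the increments actually happen here, when the stage runs

// Action: Spark guarantees each task's update is applied exactly once, even
// if tasks are retried.
data.foreach(x => acc += x)
{code}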

> Recomputation of RDDs may result in duplicated accumulator updates
> --
>
> Key: SPARK-732
> URL: https://issues.apache.org/jira/browse/SPARK-732
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 0.6.2, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.8.2, 
> 0.9.0, 1.0.1, 1.1.0
>Reporter: Josh Rosen
>Assignee: Nan Zhu
>Priority: Blocker
>
> Currently, Spark doesn't guard against duplicated updates to the same 
> accumulator due to recomputations of an RDD.  For example:
> {code}
> val acc = sc.accumulator(0)
> data.map { x => acc += 1; f(x) }
> data.count()
> // acc should equal data.count() here
> data.foreach{...}
> // Now, acc = 2 * data.count() because the map() was recomputed.
> {code}
> I think that this behavior is incorrect, especially because it 
> allows the addition or removal of a cache() call to affect the outcome of a 
> computation.
> There's an old TODO to fix this duplicate update issue in the [DAGScheduler 
> code|https://github.com/mesos/spark/blob/ec5e553b418be43aa3f0ccc24e0d5ca9d63504b2/core/src/main/scala/spark/scheduler/DAGScheduler.scala#L494].
> I haven't tested whether recomputation due to blocks being dropped from the 
> cache can trigger duplicate accumulator updates.
> Hypothetically someone could be relying on the current behavior to implement 
> performance counters that track the actual number of computations performed 
> (including recomputations).  To be safe, we could add an explicit warning in 
> the release notes that documents the change in behavior when we fix this.
> Ignoring duplicate updates shouldn't be too hard, but there are a few 
> subtleties.  Currently, we allow accumulators to be used in multiple 
> transformations, so we'd need to detect duplicate updates at the 
> per-transformation level.  I haven't dug too deeply into the scheduler 
> internals, but we might also run into problems where pipelining causes what 
> is logically one set of accumulator updates to show up in two different tasks 
> (e.g. rdd.map(accum += x; ...) and rdd.map(accum += x; ...).count() may cause 
> what's logically the same accumulator update to be applied from two different 
> contexts, complicating the detection of duplicate updates).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13476) Generate does not always output UnsafeRow

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13476:


Assignee: Apache Spark  (was: Davies Liu)

> Generate does not always output UnsafeRow
> -
>
> Key: SPARK-13476
> URL: https://issues.apache.org/jira/browse/SPARK-13476
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> Generate does not output UnsafeRow when join is true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13376) Improve column pruning

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163915#comment-15163915
 ] 

Apache Spark commented on SPARK-13376:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/11354

> Improve column pruning
> --
>
> Key: SPARK-13376
> URL: https://issues.apache.org/jira/browse/SPARK-13376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Column pruning can help to skip columns that are not used by any 
> following operators.
> The current implementation only works with a few logical plans; we should 
> improve it to support all of them.
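
A quick, illustrative way to check whether pruning kicked in for a given plan
(the Parquet path and column name are invented):

{code}
val df = sqlContext.read.parquet("/tmp/events")

// With pruning in effect, the physical scan should request only `user_id`
// from the files; a plan shape the rule does not cover forces the scan to
// read every column instead.
df.select("user_id").explain()
{code}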



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13476) Generate does not always output UnsafeRow

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13476:


Assignee: Davies Liu  (was: Apache Spark)

> Generate does not always output UnsafeRow
> -
>
> Key: SPARK-13476
> URL: https://issues.apache.org/jira/browse/SPARK-13476
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Generate does not output UnsafeRow when join is true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13476) Generate does not always output UnsafeRow

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163917#comment-15163917
 ] 

Apache Spark commented on SPARK-13476:
--

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/11354

> Generate does not always output UnsafeRow
> -
>
> Key: SPARK-13476
> URL: https://issues.apache.org/jira/browse/SPARK-13476
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Generate does not output UnsafeRow when join is true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12634) Make Parameter Descriptions Consistent for PySpark MLlib Tree

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163912#comment-15163912
 ] 

Apache Spark commented on SPARK-12634:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/11353

> Make Parameter Descriptions Consistent for PySpark MLlib Tree
> -
>
> Key: SPARK-12634
> URL: https://issues.apache.org/jira/browse/SPARK-12634
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Assignee: Vijay Kiran
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up tree.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163902#comment-15163902
 ] 

Jakob Odersky commented on SPARK-13118:
---

Ah, I just realized the context of this issue; it's part of the Dataset API 
super-ticket.

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when we 
> reflect on this, we try to load {{org.mycompany.project.MyClass}}.
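
A minimal reproduction of the naming wrinkle, using the package names from the
description above:

{code}
package org.mycompany

package object project {
  // The compiled binary name of this class is
  // org.mycompany.project.package$MyClass, while Scala-level reflection
  // reports it as org.mycompany.project.MyClass.
  case class MyClass(a: Int)
}
{code}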



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13477) User-facing catalog API

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13477:

Description: 
This ticket proposes introducing a user-facing catalog API in Spark 2.0. Please 
see the attached design doc for more information.



> User-facing catalog API
> ---
>
> Key: SPARK-13477
> URL: https://issues.apache.org/jira/browse/SPARK-13477
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Attachments: User-facingCatalogAPI.pdf
>
>
> This ticket proposes introducing a user-facing catalog API in Spark 2.0. 
> Please see the attached design doc for more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13477) User-facing catalog API

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13477:

Issue Type: New Feature  (was: Bug)

> User-facing catalog API
> ---
>
> Key: SPARK-13477
> URL: https://issues.apache.org/jira/browse/SPARK-13477
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Attachments: User-facingCatalogAPI.pdf
>
>
> This ticket proposes introducing a user-facing catalog API in Spark 2.0. 
> Please see the attached design doc for more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13477) User-facing catalog API

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13477:

Attachment: User-facingCatalogAPI.pdf

design doc

> User-facing catalog API
> ---
>
> Key: SPARK-13477
> URL: https://issues.apache.org/jira/browse/SPARK-13477
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Attachments: User-facingCatalogAPI.pdf
>
>
> This ticket proposes introducing a user-facing catalog API in Spark 2.0. 
> Please see the attached design doc for more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13477) User-facing catalog API

2016-02-24 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-13477:
---

 Summary: User-facing catalog API
 Key: SPARK-13477
 URL: https://issues.apache.org/jira/browse/SPARK-13477
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13476) Generate does not always output UnsafeRow

2016-02-24 Thread Davies Liu (JIRA)
Davies Liu created SPARK-13476:
--

 Summary: Generate does not always output UnsafeRow
 Key: SPARK-13476
 URL: https://issues.apache.org/jira/browse/SPARK-13476
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu


Generate does not output UnsafeRow when join is true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13475) HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-13475.
--
   Resolution: Fixed
Fix Version/s: 1.6.1
   2.0.0

Issue resolved by pull request 11351
[https://github.com/apache/spark/pull/11351]

> HiveCompatibilitySuite should still run in PR builder even if a PR only 
> changes sql/core
> 
>
> Key: SPARK-13475
> URL: https://issues.apache.org/jira/browse/SPARK-13475
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Assignee: Yin Huai
> Fix For: 2.0.0, 1.6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12344) Remove env-based configurations

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-12344:

Assignee: (was: Reynold Xin)

> Remove env-based configurations
> ---
>
> Key: SPARK-12344
> URL: https://issues.apache.org/jira/browse/SPARK-12344
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, YARN
>Reporter: Marcelo Vanzin
>
> We should remove as many env-based configurations as makes sense, since 
> they are deprecated and we prefer to use Spark's configuration.
> Tools available through the command line should consistently support both a 
> properties file with configuration keys and the {{--conf}} command-line 
> argument, such as the one SparkSubmit supports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13475) HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-13475:
-
Summary: HiveCompatibilitySuite should still run in PR builder even if a PR 
only changes sql/core  (was: HiveCompatibility should still run in PR builder 
even if a PR only changes sql/core)

> HiveCompatibilitySuite should still run in PR builder even if a PR only 
> changes sql/core
> 
>
> Key: SPARK-13475
> URL: https://issues.apache.org/jira/browse/SPARK-13475
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Assignee: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13475) HiveCompatibility should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163782#comment-15163782
 ] 

Apache Spark commented on SPARK-13475:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/11351

> HiveCompatibility should still run in PR builder even if a PR only changes 
> sql/core
> ---
>
> Key: SPARK-13475
> URL: https://issues.apache.org/jira/browse/SPARK-13475
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Assignee: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13475) HiveCompatibility should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13475:


Assignee: Yin Huai  (was: Apache Spark)

> HiveCompatibility should still run in PR builder even if a PR only changes 
> sql/core
> ---
>
> Key: SPARK-13475
> URL: https://issues.apache.org/jira/browse/SPARK-13475
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Assignee: Yin Huai
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13475) HiveCompatibility should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13475:


Assignee: Apache Spark  (was: Yin Huai)

> HiveCompatibility should still run in PR builder even if a PR only changes 
> sql/core
> ---
>
> Key: SPARK-13475
> URL: https://issues.apache.org/jira/browse/SPARK-13475
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Reporter: Yin Huai
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13475) HiveCompatibility should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread Yin Huai (JIRA)
Yin Huai created SPARK-13475:


 Summary: HiveCompatibility should still run in PR builder even if 
a PR only changes sql/core
 Key: SPARK-13475
 URL: https://issues.apache.org/jira/browse/SPARK-13475
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Reporter: Yin Huai
Assignee: Yin Huai






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13467) abstract python function to simplify pyspark code

2016-02-24 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-13467.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 11342
[https://github.com/apache/spark/pull/11342]

> abstract python function to simplify pyspark code
> -
>
> Key: SPARK-13467
> URL: https://issues.apache.org/jira/browse/SPARK-13467
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Priority: Trivial
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13118) Support for classes defined in package objects

2016-02-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163755#comment-15163755
 ] 

Jakob Odersky commented on SPARK-13118:
---

Hi Michael,
what's the concrete issue you encounter? Is it a (de-)serialization bug?
I ran a simple test with DataFrames containing classes defined in package 
objects and everything worked out fine.

I also quickly checked {{o.a.s.sql.catalyst.ScalaReflection}} but it seems that 
type names are always accessed via native scala reflection utilities.

> Support for classes defined in package objects
> --
>
> Key: SPARK-13118
> URL: https://issues.apache.org/jira/browse/SPARK-13118
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Michael Armbrust
>
> When you define a class inside of a package object, the name ends up being 
> something like {{org.mycompany.project.package$MyClass}}.  However, when we 
> reflect on this, we try to load {{org.mycompany.project.MyClass}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13321:


Assignee: Liang-Chi Hsieh  (was: Apache Spark)

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>
> The following SQL cannot be parsed with the current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13321:


Assignee: Apache Spark  (was: Liang-Chi Hsieh)

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> The following SQL cannot be parsed with the current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13321:

Target Version/s: 2.0.0

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>
> The following SQL cannot be parsed with the current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reopened SPARK-13321:
-

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>
> The following SQL cannot be parsed with the current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13321) Support nested UNION in parser

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13321:

Fix Version/s: (was: 2.0.0)

> Support nested UNION in parser
> --
>
> Key: SPARK-13321
> URL: https://issues.apache.org/jira/browse/SPARK-13321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>
> The following SQL cannot be parsed with the current parser:
> {code}
> SELECT  `u_1`.`id` FROM (((SELECT  `t0`.`id` FROM `default`.`t0`) UNION ALL 
> (SELECT  `t0`.`id` FROM `default`.`t0`)) UNION ALL (SELECT  `t0`.`id` FROM 
> `default`.`t0`)) AS u_1
> {code}
> We should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13474) Update packaging scripts to stage artifacts to home.apache.org

2016-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163735#comment-15163735
 ] 

Apache Spark commented on SPARK-13474:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11350

> Update packaging scripts to stage artifacts to home.apache.org
> --
>
> Key: SPARK-13474
> URL: https://issues.apache.org/jira/browse/SPARK-13474
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Due to the people.apache.org -> home.apache.org migration, we need to update 
> our packaging scripts to publish artifacts to the new server. Because the new 
> server only supports sftp instead of ssh, we need to update the scripts to 
> use lftp instead of ssh + rsync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13474) Update packaging scripts to stage artifacts to home.apache.org

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13474:


Assignee: Apache Spark  (was: Josh Rosen)

> Update packaging scripts to stage artifacts to home.apache.org
> --
>
> Key: SPARK-13474
> URL: https://issues.apache.org/jira/browse/SPARK-13474
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> Due to the people.apache.org -> home.apache.org migration, we need to update 
> our packaging scripts to publish artifacts to the new server. Because the new 
> server only supports sftp instead of ssh, we need to update the scripts to 
> use lftp instead of ssh + rsync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13474) Update packaging scripts to stage artifacts to home.apache.org

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13474:


Assignee: Josh Rosen  (was: Apache Spark)

> Update packaging scripts to stage artifacts to home.apache.org
> --
>
> Key: SPARK-13474
> URL: https://issues.apache.org/jira/browse/SPARK-13474
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> Due to the people.apache.org -> home.apache.org migration, we need to update 
> our packaging scripts to publish artifacts to the new server. Because the new 
> server only supports sftp instead of ssh, we need to update the scripts to 
> use lftp instead of ssh + rsync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13474) Update packaging scripts to stage artifacts to home.apache.org

2016-02-24 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-13474:
--

 Summary: Update packaging scripts to stage artifacts to 
home.apache.org
 Key: SPARK-13474
 URL: https://issues.apache.org/jira/browse/SPARK-13474
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Reporter: Josh Rosen
Assignee: Josh Rosen


Due to the people.apache.org -> home.apache.org migration, we need to update 
our packaging scripts to publish artifacts to the new server. Because the new 
server only supports sftp instead of ssh, we need to update the scripts to use 
lftp instead of ssh + rsync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11432) Personalized PageRank shouldn't use uniform initialization

2016-02-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-11432:
--
Fix Version/s: 1.5.3
   1.4.2

> Personalized PageRank shouldn't use uniform initialization
> --
>
> Key: SPARK-11432
> URL: https://issues.apache.org/jira/browse/SPARK-11432
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 1.5.1
>Reporter: Yves Raimond
>Assignee: Yves Raimond
>Priority: Minor
> Fix For: 1.4.2, 1.5.3, 1.6.0
>
>
> The current implementation of personalized PageRank in GraphX uses uniform 
> initialization over the full graph - every vertex is initially activated.
> For example:
> {code}
> import org.apache.spark._
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> val users: RDD[(VertexId, (String, String))] =
>   sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", 
> "postdoc")),
>(5L, ("franklin", "prof")), (2L, ("istoica", "prof"
> val relationships: RDD[Edge[String]] =
>   sc.parallelize(Array(Edge(3L, 7L, "collab"),Edge(5L, 3L, "advisor"),
>Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
> val defaultUser = ("John Doe", "Missing")
> val graph = Graph(users, relationships, defaultUser)
> graph.staticPersonalizedPageRank(3L, 0, 
> 0.15).vertices.collect.foreach(println)
> {code}
> This leads to all vertices being set to resetProb (0.15), which is different 
> from the behavior described in SPARK-5854, where only the source node should 
> be activated. 
> The risk is that, after a few iterations, the most activated nodes are the 
> source node and the nodes that were untouched by the propagation. In the 
> example above, vertex 2L will always have an activation of 0.15:
> {code}
> graph.personalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This leads to a higher score for 2L than for 7L and 5L, even though 
> there's no outbound path from 3L to 2L.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13376) Improve column pruning

2016-02-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13376:

Fix Version/s: (was: 2.0.0)

> Improve column pruning
> --
>
> Key: SPARK-13376
> URL: https://issues.apache.org/jira/browse/SPARK-13376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Column pruning can help to skip columns that are not used by any 
> following operators.
> The current implementation only works with a few logical plans; we should 
> improve it to support all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12713) UI Executor page should keep links around to executors that died

2016-02-24 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163718#comment-15163718
 ] 

Alex Bozarth commented on SPARK-12713:
--

SPARK-7729 has fixed this

> UI Executor page should keep links around to executors that died
> 
>
> Key: SPARK-12713
> URL: https://issues.apache.org/jira/browse/SPARK-12713
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.2
>Reporter: Thomas Graves
>
> When an executor dies, the web UI no longer shows it on the executors page, 
> which makes getting to the logs to see what happened very difficult.  I'm 
> running on YARN, so I'm not sure whether the behavior is different in 
> standalone mode.
> We should figure out a way to keep links around to the executors that died 
> so we can show stats and log links.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13376) Improve column pruning

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13376:


Assignee: Apache Spark  (was: Davies Liu)

> Improve column pruning
> --
>
> Key: SPARK-13376
> URL: https://issues.apache.org/jira/browse/SPARK-13376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> Column pruning can help to skip columns that are not used by any 
> following operators.
> The current implementation only works with a few logical plans; we should 
> improve it to support all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8172) Driver UI should enable viewing of dead executors' logs

2016-02-24 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163717#comment-15163717
 ] 

Alex Bozarth commented on SPARK-8172:
-

I believe this has been fixed by SPARK-7729

> Driver UI should enable viewing of dead executors' logs
> ---
>
> Key: SPARK-8172
> URL: https://issues.apache.org/jira/browse/SPARK-8172
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Reporter: Josh Rosen
>
> If possible, the Spark driver UI's executor page should include a list of 
> dead executors (perhaps of bounded size) and should have log viewer links for 
> viewing those dead executors' logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-13376) Improve column pruning

2016-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-13376:


Assignee: Davies Liu  (was: Apache Spark)

> Improve column pruning
> --
>
> Key: SPARK-13376
> URL: https://issues.apache.org/jira/browse/SPARK-13376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Davies Liu
>Assignee: Davies Liu
>
> Column pruning can help to skip columns that are not used by any 
> following operators.
> The current implementation only works with a few logical plans; we should 
> improve it to support all of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


