[jira] [Updated] (SPARK-16286) Implement stack table generating function

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16286:

Summary: Implement stack table generating function  (was: Implement stack 
SQL function)

> Implement stack table generating function
> -
>
> Key: SPARK-16286
> URL: https://issues.apache.org/jira/browse/SPARK-16286
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16281) Implement parse_url SQL function

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354572#comment-15354572
 ] 

Peter Lee commented on SPARK-16281:
---

I can work on this one too.


> Implement parse_url SQL function
> 
>
> Key: SPARK-16281
> URL: https://issues.apache.org/jira/browse/SPARK-16281
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16286) Implement stack SQL function

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354571#comment-15354571
 ] 

Peter Lee commented on SPARK-16286:
---

This is actually a table generating function.
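For reference, a minimal PySpark sketch of the Hive-compatible semantics this sub-task 
targets (the constant values are illustrative): stack(n, v1, ..., vk) lays its k values 
out as n rows, which is why it has to be a generator rather than a scalar expression.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# stack(2, 'a', 1, 'b', 2) produces two rows: ('a', 1) and ('b', 2).
spark.sql("SELECT stack(2, 'a', 1, 'b', 2)").show()
{code}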


> Implement stack SQL function
> 
>
> Key: SPARK-16286
> URL: https://issues.apache.org/jira/browse/SPARK-16286
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16277) Implement java_method SQL function

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354568#comment-15354568
 ] 

Peter Lee commented on SPARK-16277:
---

I can work on this.


> Implement java_method SQL function
> --
>
> Key: SPARK-16277
> URL: https://issues.apache.org/jira/browse/SPARK-16277
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16284) Implement reflect SQL function

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354569#comment-15354569
 ] 

Peter Lee commented on SPARK-16284:
---

I can work on this.


> Implement reflect SQL function
> --
>
> Key: SPARK-16284
> URL: https://issues.apache.org/jira/browse/SPARK-16284
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16276) Implement elt SQL function

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16276:


Assignee: (was: Apache Spark)

> Implement elt SQL function
> --
>
> Key: SPARK-16276
> URL: https://issues.apache.org/jira/browse/SPARK-16276
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16276) Implement elt SQL function

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354565#comment-15354565
 ] 

Apache Spark commented on SPARK-16276:
--

User 'petermaxlee' has created a pull request for this issue:
https://github.com/apache/spark/pull/13966

> Implement elt SQL function
> --
>
> Key: SPARK-16276
> URL: https://issues.apache.org/jira/browse/SPARK-16276
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16276) Implement elt SQL function

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16276:


Assignee: Apache Spark

> Implement elt SQL function
> --
>
> Key: SPARK-16276
> URL: https://issues.apache.org/jira/browse/SPARK-16276
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16276) Implement elt SQL function

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354551#comment-15354551
 ] 

Peter Lee commented on SPARK-16276:
---

I can work on this.


> Implement elt SQL function
> --
>
> Key: SPARK-16276
> URL: https://issues.apache.org/jira/browse/SPARK-16276
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16286) Implement stack SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16286:
---

 Summary: Implement stack SQL function
 Key: SPARK-16286
 URL: https://issues.apache.org/jira/browse/SPARK-16286
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16285) Implement sentences SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16285:
---

 Summary: Implement sentences SQL function
 Key: SPARK-16285
 URL: https://issues.apache.org/jira/browse/SPARK-16285
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16288) Implement inline table generating function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16288:
---

 Summary: Implement inline table generating function
 Key: SPARK-16288
 URL: https://issues.apache.org/jira/browse/SPARK-16288
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16289) Implement posexplode table generating function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16289:
---

 Summary: Implement posexplode table generating function
 Key: SPARK-16289
 URL: https://issues.apache.org/jira/browse/SPARK-16289
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16287) Implement str_to_map SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16287:
---

 Summary: Implement str_to_map SQL function
 Key: SPARK-16287
 URL: https://issues.apache.org/jira/browse/SPARK-16287
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16284) Implement reflect SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16284:
---

 Summary: Implement reflect SQL function
 Key: SPARK-16284
 URL: https://issues.apache.org/jira/browse/SPARK-16284
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16283) Implement percentile_approx SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16283:
---

 Summary: Implement percentile_approx SQL function
 Key: SPARK-16283
 URL: https://issues.apache.org/jira/browse/SPARK-16283
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16282) Implement percentile SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16282:
---

 Summary: Implement percentile SQL function
 Key: SPARK-16282
 URL: https://issues.apache.org/jira/browse/SPARK-16282
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16281) Implement parse_url SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16281:
---

 Summary: Implement parse_url SQL function
 Key: SPARK-16281
 URL: https://issues.apache.org/jira/browse/SPARK-16281
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16279) Implement map_values SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16279:
---

 Summary: Implement map_values SQL function
 Key: SPARK-16279
 URL: https://issues.apache.org/jira/browse/SPARK-16279
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16278) Implement map_keys SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16278:
---

 Summary: Implement map_keys SQL function
 Key: SPARK-16278
 URL: https://issues.apache.org/jira/browse/SPARK-16278
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16280) Implement histogram_numeric SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16280:
---

 Summary: Implement histogram_numeric SQL function
 Key: SPARK-16280
 URL: https://issues.apache.org/jira/browse/SPARK-16280
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16276) Implement elt SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16276:
---

 Summary: Implement elt SQL function
 Key: SPARK-16276
 URL: https://issues.apache.org/jira/browse/SPARK-16276
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16277) Implement java_method SQL function

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16277:
---

 Summary: Implement java_method SQL function
 Key: SPARK-16277
 URL: https://issues.apache.org/jira/browse/SPARK-16277
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16236) Add Path Option back to Load API in DataFrameReader

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354532#comment-15354532
 ] 

Apache Spark commented on SPARK-16236:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13965

> Add Path Option back to Load API in DataFrameReader
> ---
>
> Key: SPARK-16236
> URL: https://issues.apache.org/jira/browse/SPARK-16236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> @koertkuipers identified that the PR https://github.com/apache/spark/pull/13727/ 
> changed the behavior of the `load` API. After the change, the `load` API does not 
> add the value of `path` into the `options`. Thank you!
> We should add the option `path` back to the `load()` API in `DataFrameReader`, if 
> and only if users specify exactly one path in the load API.
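To make the intended contract concrete, a minimal PySpark sketch (the file path is 
hypothetical): calling `load` with a single path should be equivalent to passing that 
path through the "path" option.

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A single-path load should also record the path as the "path" option ...
df1 = spark.read.format("json").load("/tmp/example.json")

# ... making it equivalent to setting the option explicitly:
df2 = spark.read.format("json").option("path", "/tmp/example.json").load()
{code}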



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16275) Implement all the Hive fallback functions

2016-06-28 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-16275:
---

 Summary: Implement all the Hive fallback functions
 Key: SPARK-16275
 URL: https://issues.apache.org/jira/browse/SPARK-16275
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin


As of Spark 2.0, Spark falls back to Hive for only the following built-in 
functions:

{code}
"elt", "hash", "java_method", "histogram_numeric",
"map_keys", "map_values",
"parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
"stack", "str_to_map",
"xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
"xpath_long",
"xpath_number", "xpath_short", "xpath_string",

// table generating function
"inline", "posexplode"
{code}

The goal of this ticket is to implement all of these in Spark so we don't need 
to fall back to Hive's UDFs.
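For a sense of what these do, a small PySpark sketch of a few of the scalar ones as 
they behave through the current Hive fallback (expected results in comments, assuming 
Hive-compatible semantics):

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("SELECT elt(2, 'scala', 'java', 'python')").show()              # java
spark.sql("SELECT parse_url('http://example.com/a?k=v', 'HOST')").show()  # example.com
spark.sql("SELECT str_to_map('a:1,b:2', ',', ':')").show()                # {a -> 1, b -> 2}
{code}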




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-28 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354509#comment-15354509
 ] 

Peng Zhang commented on SPARK-16095:


[~tgraves] [~sowen]
I have added a test case to explain this issue; please take a look.

> Yarn cluster mode should return consistent result for command line and 
> SparkLauncher
> 
>
> Key: SPARK-16095
> URL: https://issues.apache.org/jira/browse/SPARK-16095
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Peng Zhang
>
> For an application with YarnApplicationState.FINISHED and 
> FinalApplicationStatus.FAILED, invoking spark-submit from the command line will 
> get an Exception, while submitting with SparkLauncher will get the state FINISHED, 
> which implies the app succeeded.
> Also, because of the above, an assert with a false condition in the 
> YarnClusterSuite test will not fail the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16274) Implement xpath_boolean

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354504#comment-15354504
 ] 

Apache Spark commented on SPARK-16274:
--

User 'petermaxlee' has created a pull request for this issue:
https://github.com/apache/spark/pull/13964

> Implement xpath_boolean
> ---
>
> Key: SPARK-16274
> URL: https://issues.apache.org/jira/browse/SPARK-16274
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16274) Implement xpath_boolean

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16274:


Assignee: (was: Apache Spark)

> Implement xpath_boolean
> ---
>
> Key: SPARK-16274
> URL: https://issues.apache.org/jira/browse/SPARK-16274
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16274) Implement xpath_boolean

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16274:


Assignee: Apache Spark

> Implement xpath_boolean
> ---
>
> Key: SPARK-16274
> URL: https://issues.apache.org/jira/browse/SPARK-16274
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16271) Implement Hive's UDFXPathUtil

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16271.
-
   Resolution: Fixed
 Assignee: Peter Lee
Fix Version/s: 2.1.0

> Implement Hive's UDFXPathUtil
> -
>
> Key: SPARK-16271
> URL: https://issues.apache.org/jira/browse/SPARK-16271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>Assignee: Peter Lee
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16250) Can't use escapeQuotes option in DataFrameWriter.csv()

2016-06-28 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354496#comment-15354496
 ] 

Hyukjin Kwon commented on SPARK-16250:
--

This was fixed together with SPARK-16259.

> Can't use escapeQuotes option in DataFrameWriter.csv()
> --
>
> Key: SPARK-16250
> URL: https://issues.apache.org/jira/browse/SPARK-16250
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>Priority: Trivial
>
> I think it was done by mistake as below:
> {code}
> if nullValue is not None:
> self.option("nullValue", nullValue)
> if escapeQuotes is not None:
> self.option("escapeQuotes", nullValue)
> {code}
> This is using {{nullValue}} for the {{escapeQuotes}} option.
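Presumably the fix is just to pass the matching variable through; a sketch of the 
corrected branch:

{code}
if escapeQuotes is not None:
    self.option("escapeQuotes", escapeQuotes)
{code}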



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16273) NoSuchMethodError on SparkContext.rddToPairRDDFunctions

2016-06-28 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354486#comment-15354486
 ] 

Shixiong Zhu commented on SPARK-16273:
--

This method has been moved into the RDD object. You need to recompile your 
application against Spark 2.0.

> NoSuchMethodError on SparkContext.rddToPairRDDFunctions
> ---
>
> Key: SPARK-16273
> URL: https://issues.apache.org/jira/browse/SPARK-16273
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
> Environment: Ubuntu Server 14.04.3 LTS
>Reporter: Fu Zhouwang
>
> The exception occurs when I run TeraSort 
> (https://github.com/ehiggs/spark-terasort) on the preview release of Spark 2.0, 
> when _saveAsNewAPIHadoopFile_ or _sc.newAPIHadoopFile_ is called.
> The exception is:
> bq. java.lang.NoSuchMethodError: 
> org.apache.spark.SparkContext$.rddToPairRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;Lscala/math/Ordering;)Lorg/apache/spark/rdd/PairRDDFunctions;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16168) Spark sql can not read ORC table

2016-06-28 Thread AnfengYuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AnfengYuan closed SPARK-16168.
--
Resolution: Duplicate

> Spark sql can not read ORC table
> 
>
> Key: SPARK-16168
> URL: https://issues.apache.org/jira/browse/SPARK-16168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
>Reporter: AnfengYuan
>
> When using the spark-sql shell to query an ORC table, the exceptions below are thrown.
> My table was generated by the tool in 
> https://github.com/hortonworks/hive-testbench
> {code}
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1429)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1417)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1416)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1416)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1638)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1597)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1586)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1872)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1885)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1898)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
>   at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:310)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:131)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:323)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:239)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: Field "i_item_sk" does not 
> exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:254)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:254)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:253)
>   at 
> 

[jira] [Commented] (SPARK-16168) Spark sql can not read ORC table

2016-06-28 Thread AnfengYuan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354478#comment-15354478
 ] 

AnfengYuan commented on SPARK-16168:


I'm closing this now since it is a known issue 
[https://issues.apache.org/jira/browse/SPARK-16168] that is still not resolved.

> Spark sql can not read ORC table
> 
>
> Key: SPARK-16168
> URL: https://issues.apache.org/jira/browse/SPARK-16168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
>Reporter: AnfengYuan
>
> When using the spark-sql shell to query an ORC table, the exceptions below are thrown.
> My table was generated by the tool in 
> https://github.com/hortonworks/hive-testbench
> {code}
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1429)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1417)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1416)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1416)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1638)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1597)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1586)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1872)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1885)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1898)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
>   at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:310)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:131)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:323)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:239)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: Field "i_item_sk" does not 
> exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:254)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:254)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at 
> 

[jira] [Comment Edited] (SPARK-16168) Spark sql can not read ORC table

2016-06-28 Thread AnfengYuan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354478#comment-15354478
 ] 

AnfengYuan edited comment on SPARK-16168 at 6/29/16 3:33 AM:
-

I'm closing this now since it is a known issue 
[https://issues.apache.org/jira/browse/SPARK-14387] that is still not resolved.


was (Author: yuananf):
I'm closing this now since it is a known issue 
[https://issues.apache.org/jira/browse/SPARK-16168] that is still not resolved.

> Spark sql can not read ORC table
> 
>
> Key: SPARK-16168
> URL: https://issues.apache.org/jira/browse/SPARK-16168
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
>Reporter: AnfengYuan
>
> When using the spark-sql shell to query an ORC table, the exceptions below are thrown.
> My table was generated by the tool in 
> https://github.com/hortonworks/hive-testbench
> {code}
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1429)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1417)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1416)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1416)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1638)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1597)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1586)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1872)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1885)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1898)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
>   at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:310)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:131)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$3.apply(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:130)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:323)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:239)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.IllegalArgumentException: Field "i_item_sk" does not 
> exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:254)
>   at 
> 

[jira] [Created] (SPARK-16274) Implement xpath_boolean

2016-06-28 Thread Peter Lee (JIRA)
Peter Lee created SPARK-16274:
-

 Summary: Implement xpath_boolean
 Key: SPARK-16274
 URL: https://issues.apache.org/jira/browse/SPARK-16274
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Peter Lee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16274) Implement xpath_boolean

2016-06-28 Thread Peter Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354465#comment-15354465
 ] 

Peter Lee commented on SPARK-16274:
---

I will have a pull request as soon as 
https://github.com/apache/spark/pull/13961 is merged.


> Implement xpath_boolean
> ---
>
> Key: SPARK-16274
> URL: https://issues.apache.org/jira/browse/SPARK-16274
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16273) NoSuchMethodError on SparkContext.rddToPairRDDFunctions

2016-06-28 Thread Fu Zhouwang (JIRA)
Fu Zhouwang created SPARK-16273:
---

 Summary: NoSuchMethodError on SparkContext.rddToPairRDDFunctions
 Key: SPARK-16273
 URL: https://issues.apache.org/jira/browse/SPARK-16273
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
 Environment: Ubuntu Server 14.04.3 LTS
Reporter: Fu Zhouwang


The exception occurs when I run TeraSort 
(https://github.com/ehiggs/spark-terasort) on the preview release of Spark 2.0, when 
_saveAsNewAPIHadoopFile_ or _sc.newAPIHadoopFile_ is called.

The exception is:
bq. java.lang.NoSuchMethodError: 
org.apache.spark.SparkContext$.rddToPairRDDFunctions(Lorg/apache/spark/rdd/RDD;Lscala/reflect/ClassTag;Lscala/reflect/ClassTag;Lscala/math/Ordering;)Lorg/apache/spark/rdd/PairRDDFunctions;




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16245) model loading backward compatibility for ml.feature.PCA

2016-06-28 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-16245:
--
Assignee: Yanbo Liang

> model loading backward compatibility for ml.feature.PCA
> ---
>
> Key: SPARK-16245
> URL: https://issues.apache.org/jira/browse/SPARK-16245
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> model loading backward compatibility for ml.feature.PCA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16245) model loading backward compatibility for ml.feature.PCA

2016-06-28 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-16245.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 13937
[https://github.com/apache/spark/pull/13937]

> model loading backward compatibility for ml.feature.PCA
> ---
>
> Key: SPARK-16245
> URL: https://issues.apache.org/jira/browse/SPARK-16245
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Yanbo Liang
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> model loading backward compatibility for ml.feature.PCA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16095:


Assignee: (was: Apache Spark)

> Yarn cluster mode should return consistent result for command line and 
> SparkLauncher
> 
>
> Key: SPARK-16095
> URL: https://issues.apache.org/jira/browse/SPARK-16095
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Peng Zhang
>
> For an application with YarnApplicationState.FINISHED and 
> FinalApplicationStatus.FAILED, invoking spark-submit from the command line will 
> get an Exception, while submitting with SparkLauncher will get the state FINISHED, 
> which implies the app succeeded.
> Also, because of the above, an assert with a false condition in the 
> YarnClusterSuite test will not fail the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354425#comment-15354425
 ] 

Apache Spark commented on SPARK-16095:
--

User 'renozhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/13962

> Yarn cluster mode should return consistent result for command line and 
> SparkLauncher
> 
>
> Key: SPARK-16095
> URL: https://issues.apache.org/jira/browse/SPARK-16095
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Peng Zhang
>
> For an application with YarnApplicationState.FINISHED and 
> FinalApplicationStatus.FAILED, invoking spark-submit from the command line will 
> get an Exception, while submitting with SparkLauncher will get the state FINISHED, 
> which implies the app succeeded.
> Also, because of the above, an assert with a false condition in the 
> YarnClusterSuite test will not fail the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16095:


Assignee: Apache Spark

> Yarn cluster mode should return consistent result for command line and 
> SparkLauncher
> 
>
> Key: SPARK-16095
> URL: https://issues.apache.org/jira/browse/SPARK-16095
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Peng Zhang
>Assignee: Apache Spark
>
> For an application with YarnApplicationState.FINISHED and 
> FinalApplicationStatus.FAILED, invoking spark-submit from the command line will 
> get an Exception, while submitting with SparkLauncher will get the state FINISHED, 
> which implies the app succeeded.
> Also, because of the above, an assert with a false condition in the 
> YarnClusterSuite test will not fail the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16248) Whitelist the list of Hive fallback functions

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16248.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Whitelist the list of Hive fallback functions
> -
>
> Key: SPARK-16248
> URL: https://issues.apache.org/jira/browse/SPARK-16248
> Project: Spark
>  Issue Type: Improvement
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> This patch removes the blind fallback into Hive for functions. Instead, it 
> creates a whitelist and adds only a small number of functions to the 
> whitelist, i.e. the ones we intend to support in the long run in Spark. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16230) Executors self-killing after being assigned tasks while still in init

2016-06-28 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354280#comment-15354280
 ] 

Saisai Shao commented on SPARK-16230:
-

I'm not sure which version of Spark you are using. There is a similar JIRA, 
SPARK-13112, that addresses the same issue and is already fixed (2.0.0). If you're 
using a lower version of Spark, you could backport this patch to see if the issue 
still occurs.

> Executors self-killing after being assigned tasks while still in init
> -
>
> Key: SPARK-16230
> URL: https://issues.apache.org/jira/browse/SPARK-16230
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Tejas Patil
>Priority: Minor
>
> I see this happening frequently in our prod clusters:
> * EXECUTOR:   
> [CoarseGrainedExecutorBackend|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L61]
>  sends request to register itself to the driver.
> * DRIVER: Registers executor and 
> [replies|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L179]
> * EXECUTOR:  ExecutorBackend receives ACK and [starts creating an 
> Executor|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L81]
> * DRIVER:  Tries to launch a task as it knows there is a new executor. Sends 
> a 
> [LaunchTask|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L268]
>  to this new executor.
> * EXECUTOR:  Executor is not init'ed (one of the reasons I have seen is 
> because it was still trying to register to local external shuffle service). 
> Meanwhile, receives a `LaunchTask`. [Kills 
> itself|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L90]
>  as Executor is not init'ed.
> The driver assumes that the Executor is ready to accept tasks as soon as it is 
> registered, but that's not true.
> How this affects jobs / the cluster:
> * We waste time and resources on these executors, but they don't do any 
> meaningful computation.
> * The driver thinks that the executor has started running the task, but since the 
> Executor has killed itself, it does not tell the driver (BTW: this is another 
> issue that I think could be fixed separately). The driver waits for 10 minutes and 
> then declares the executor dead. This adds to the latency of the job. 
> Plus, the failure count also gets bumped up for the tasks even though they 
> were never started. For unlucky tasks, this might cause the job to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16272) Allow configs to reference other configs, env and system properties

2016-06-28 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354211#comment-15354211
 ] 

Marcelo Vanzin commented on SPARK-16272:


BTW, I have the code mostly ready, just needs more testing and some cleanup. 
I'll hook up the config variable I mentioned above as an example, since that's 
the one that I ran into, and afterwards we can modify other configs to also use 
the functionality.

> Allow configs to reference other configs, env and system properties
> ---
>
> Key: SPARK-16272
> URL: https://issues.apache.org/jira/browse/SPARK-16272
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Currently, Spark's configuration is static; it is whatever is written to the 
> config file, with some rare exceptions (such as some YARN code that does 
> expansion of Hadoop configuration).
> But there are a few use cases that don't work well in that situation. For 
> example, consider {{spark.sql.hive.metastore.jars}}. It references a list of 
> paths containing the classpath for accessing Hive's metastore. If you're 
> launching an application in cluster mode, it means that whatever is in the 
> configuration of the edge node needs to match the configuration of the random 
> node in the cluster where the driver will actually run.
> This would be easily solved if there were a way to reference system properties 
> or env variables; for example, when YARN launches a container, a bunch of env 
> variables are set, which could be used to modify that path to match the 
> correct location on the node.
> So I'm proposing a change where config properties can opt in to this 
> variable expansion feature; it's opt-in to avoid breaking existing code (who 
> knows) and to avoid the extra cost of doing the variable expansion on every 
> config read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16272) Allow configs to reference other configs, env and system properties

2016-06-28 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-16272:
--

 Summary: Allow configs to reference other configs, env and system 
properties
 Key: SPARK-16272
 URL: https://issues.apache.org/jira/browse/SPARK-16272
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Reporter: Marcelo Vanzin
Priority: Minor


Currently, Spark's configuration is static; it is whatever is written to the 
config file, with some rare exceptions (such as some YARN code that does 
expansion of Hadoop configuration).

But there are a few use cases that don't work well in that situation. For 
example, consider {{spark.sql.hive.metastore.jars}}. It references a list of 
paths containing the classpath for accessing Hive's metastore. If you're 
launching an application in cluster mode, it means that whatever is in the 
configuration of the edge node needs to match the configuration of the random 
node in the cluster where the driver will actually run.

This would be easily solved if there were a way to reference system properties 
or env variables; for example, when YARN launches a container, a bunch of env 
variables are set, which could be used to modify that path to match the correct 
location on the node.

So I'm proposing a change where config properties can opt in to this 
variable expansion feature; it's opt-in to avoid breaking existing code (who 
knows) and to avoid the extra cost of doing the variable expansion on every 
config read.
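To make the gap concrete: today any expansion has to happen in user code on the 
submitting machine, as in the sketch below (the HIVE_LIB variable and paths are 
hypothetical), which only helps in client mode; in cluster mode the driver lands on 
an arbitrary node, which is exactly what the proposed opt-in expansion would address.

{code}
import os
from pyspark.sql import SparkSession

# Resolve the metastore classpath from an env variable on the submitting
# machine and bake it into the config before the session starts.
hive_lib = os.environ.get("HIVE_LIB", "/opt/hive/lib")
spark = (SparkSession.builder
         .config("spark.sql.hive.metastore.jars", hive_lib + "/*")
         .enableHiveSupport()
         .getOrCreate())
{code}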




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354207#comment-15354207
 ] 

Saisai Shao commented on SPARK-16265:
-

If you want to run Spark on a different JVM than the one YARN uses, you could set 
{{spark.yarn.appMasterEnv.JAVA_HOME=}} and 
{{spark.executorEnv.JAVA_HOME=}}; the prerequisite is that the JDK is 
already installed on the cluster. This could partially address your problem.
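A sketch of that workaround from client mode (the JDK path is hypothetical and must 
already be present on every node); for cluster mode the same properties would 
typically go into spark-defaults.conf or be passed with --conf to spark-submit.

{code}
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.yarn.appMasterEnv.JAVA_HOME", "/opt/jdk1.8.0")  # hypothetical path
        .set("spark.executorEnv.JAVA_HOME", "/opt/jdk1.8.0"))
sc = SparkContext(conf=conf)
{code}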

> Add option to SparkSubmit to ship driver JRE to YARN
> 
>
> Key: SPARK-16265
> URL: https://issues.apache.org/jira/browse/SPARK-16265
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.2
>Reporter: Andrew Duffy
> Fix For: 2.1.0
>
>
> Add an option to {{SparkSubmit}} to allow the driver to package up its 
> version of the JRE to be shipped to a YARN cluster. This allows deploying 
> Spark applications whose required Java version need 
> not match one of the versions already installed on the YARN cluster, useful 
> in situations in which the Spark application developer does not have 
> administrative access to the YARN cluster (e.g., a school or corporate 
> environment) but still wants to use certain language features in their code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16271) Implement Hive's UDFXPathUtil

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354198#comment-15354198
 ] 

Apache Spark commented on SPARK-16271:
--

User 'petermaxlee' has created a pull request for this issue:
https://github.com/apache/spark/pull/13961

> Implement Hive's UDFXPathUtil
> -
>
> Key: SPARK-16271
> URL: https://issues.apache.org/jira/browse/SPARK-16271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16271) Implement Hive's UDFXPathUtil

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16271:


Assignee: Apache Spark

> Implement Hive's UDFXPathUtil
> -
>
> Key: SPARK-16271
> URL: https://issues.apache.org/jira/browse/SPARK-16271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16271) Implement Hive's UDFXPathUtil

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16271:


Assignee: (was: Apache Spark)

> Implement Hive's UDFXPathUtil
> -
>
> Key: SPARK-16271
> URL: https://issues.apache.org/jira/browse/SPARK-16271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Peter Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16271) Implement Hive's UDFXPathUtil

2016-06-28 Thread Peter Lee (JIRA)
Peter Lee created SPARK-16271:
-

 Summary: Implement Hive's UDFXPathUtil
 Key: SPARK-16271
 URL: https://issues.apache.org/jira/browse/SPARK-16271
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Peter Lee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16270) Implement xpath user defined functions

2016-06-28 Thread Peter Lee (JIRA)
Peter Lee created SPARK-16270:
-

 Summary: Implement xpath user defined functions
 Key: SPARK-16270
 URL: https://issues.apache.org/jira/browse/SPARK-16270
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Peter Lee


Spark SQL currently falls back to Hive for xpath related functions.
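
For context, a minimal example of the Hive-style xpath functions referred to here, assuming a Hive-enabled session (the outputs follow the standard Hive semantics these built-ins are expected to match):

{code}
// Currently answered via the Hive fallback; the goal is a native implementation.
spark.sql("SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', 'a/b[1]')").show()   // b1
spark.sql("SELECT xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()')").show()      // [b1, b2]
{code}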



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16269) Support null handling for vectorized hashmap during hash aggregate

2016-06-28 Thread Qifan Pu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Pu updated SPARK-16269:
-
External issue URL: https://github.com/apache/spark/pull/13960

> Support null handling for vectorized hashmap during hash aggregate
> --
>
> Key: SPARK-16269
> URL: https://issues.apache.org/jira/browse/SPARK-16269
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Qifan Pu
>Priority: Minor
>
> The current impl of vectorized hashmap does not support null keys. This patch 
> fixes the problem by adding a `generateFindOrInsertWithNullable()` method in 
> `VectorizedHashMapGenerator.scala`, which code-generates another version of 
> `findOrInsert` that handles null keys. We need null support so the aggregate 
> logic does not have to fall back to BytesToBytesMap. This would also allow us 
> to remove BytesToBytesMap completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16269) Support null handling for vectorized hashmap during hash aggregate

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354187#comment-15354187
 ] 

Apache Spark commented on SPARK-16269:
--

User 'ooq' has created a pull request for this issue:
https://github.com/apache/spark/pull/13960

> Support null handling for vectorized hashmap during hash aggregate
> --
>
> Key: SPARK-16269
> URL: https://issues.apache.org/jira/browse/SPARK-16269
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Qifan Pu
>Priority: Minor
>
> The current impl of vectorized hashmap does not support null keys. This patch 
> fixes the problem by adding a `generateFindOrInsertWithNullable()` method in 
> `VectorizedHashMapGenerator.scala`, which code-generates another version of 
> `findOrInsert` that handles null keys. We need null support so the aggregate 
> logic does not have to fall back to BytesToBytesMap. This would also allow us 
> to remove BytesToBytesMap completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16269) Support null handling for vectorized hashmap during hash aggregate

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16269:


Assignee: (was: Apache Spark)

> Support null handling for vectorized hashmap during hash aggregate
> --
>
> Key: SPARK-16269
> URL: https://issues.apache.org/jira/browse/SPARK-16269
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Qifan Pu
>Priority: Minor
>
> The current impl of vectorized hashmap does not support null keys. This patch 
> fixes the problem by adding a `generateFindOrInsertWithNullable()` method in 
> `VectorizedHashMapGenerator.scala`, which code-generates another version of 
> `findOrInsert` that handles null keys. We need null support so the aggregate 
> logic does not have to fall back to BytesToBytesMap. This would also allow us 
> to remove BytesToBytesMap completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16269) Support null handling for vectorized hashmap during hash aggregate

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16269:


Assignee: Apache Spark

> Support null handling for vectorized hashmap during hash aggregate
> --
>
> Key: SPARK-16269
> URL: https://issues.apache.org/jira/browse/SPARK-16269
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Qifan Pu
>Assignee: Apache Spark
>Priority: Minor
>
> The current impl of vectorized hashmap does not support null keys. This patch 
> fixes the problem by adding a `generateFindOrInsertWithNullable()` method in 
> `VectorizedHashMapGenerator.scala`, which code-generates another version of 
> `findOrInsert` that handles null keys. We need null support so the aggregate 
> logic does not have to fall back to BytesToBytesMap. This would also allow us 
> to remove BytesToBytesMap completely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16269) Support null handling for vectorized hashmap during hash aggregate

2016-06-28 Thread Qifan Pu (JIRA)
Qifan Pu created SPARK-16269:


 Summary: Support null handling for vectorized hashmap during hash 
aggregate
 Key: SPARK-16269
 URL: https://issues.apache.org/jira/browse/SPARK-16269
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Qifan Pu
Priority: Minor


The current impl of vectorized hashmap does not support null keys. This patch 
fixes the problem by adding a `generateFindOrInsertWithNullable()` method in 
`VectorizedHashMapGenerator.scala`, which code-generates another version of 
`findOrInsert` that handles null keys. We need null support so the aggregate 
logic does not have to fall back to BytesToBytesMap. This would also allow us 
to remove BytesToBytesMap completely.
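
A hand-written conceptual sketch of what the nullable variant adds on top of the plain find-or-insert path. The real code is generated by `VectorizedHashMapGenerator`; the class and method below are illustrative only, not Spark's actual internals.

{code}
import scala.collection.mutable

// Illustrative only: keep a dedicated aggregation buffer for the null key instead
// of hashing it, so the hash aggregate never needs the BytesToBytesMap fallback.
class IllustrativeAggMap {
  private val buckets = mutable.HashMap.empty[Long, Array[Long]]
  private var nullBucket: Array[Long] = null

  def findOrInsertWithNullable(key: Option[Long]): Array[Long] = key match {
    case None =>
      if (nullBucket == null) nullBucket = Array(0L)
      nullBucket
    case Some(k) =>
      buckets.getOrElseUpdate(k, Array(0L))
  }
}
{code}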



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16268) SQLContext should import DataStreamReader

2016-06-28 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-16268.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13958
[https://github.com/apache/spark/pull/13958]

> SQLContext should import DataStreamReader
> -
>
> Key: SPARK-16268
> URL: https://issues.apache.org/jira/browse/SPARK-16268
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16188) Spark sql create a lot of small files

2016-06-28 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust closed SPARK-16188.

Resolution: Not A Bug

This is by design and changes would likely be too disruptive.  The correct 
solution is to use coalesce as suggested earlier.
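
For reference, a minimal sketch of that workaround (output path and partition count are illustrative):

{code}
// Shrink the number of output files before writing instead of writing one file per partition.
val result = sqlContext.sql("SELECT ...")        // small result spread over many partitions
result.coalesce(1).write.parquet("/tmp/output")  // a single output file
{code}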

> Spark sql create a lot of small files
> -
>
> Key: SPARK-16188
> URL: https://issues.apache.org/jira/browse/SPARK-16188
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: spark 1.6.1
>Reporter: cen yuhai
>
> I find that Spark SQL will create as many files as there are partitions. When 
> the results are small, there will be too many small files and most of them are 
> empty. 
> Hive has a feature to detect the average file size: if the average file size is 
> smaller than "hive.merge.smallfiles.avgsize", Hive will add a job to merge the 
> files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14351) Optimize ImpurityAggregator for decision trees

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353985#comment-15353985
 ] 

Apache Spark commented on SPARK-14351:
--

User 'MechCoder' has created a pull request for this issue:
https://github.com/apache/spark/pull/13959

> Optimize ImpurityAggregator for decision trees
> --
>
> Key: SPARK-14351
> URL: https://issues.apache.org/jira/browse/SPARK-14351
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> {{RandomForest.binsToBestSplit}} currently takes a large amount of time.  
> Based on some quick profiling, I believe a big chunk of this is spent in 
> {{ImpurityAggregator.getCalculator}} (which seems to make unnecessary Array 
> copies) and {{RandomForest.calculateImpurityStats}}.
> This JIRA is for:
> * Doing more profiling to confirm that unnecessary time is being spent in 
> some of these methods.
> * Optimizing the implementation
> * Profiling again to confirm the speedups
> Local profiling for large enough examples should suffice, especially since 
> the optimizations should not need to change the amount of data communicated.
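
One flavour of the copy-avoidance being suggested, sketched generically (the names below are illustrative, not Spark's actual internals):

{code}
// Illustrative only: read impurity stats at an offset into the flat buffer
// rather than materializing a copied slice for every (node, feature, bin) lookup.
class FlatStats(allStats: Array[Double], statsSize: Int) {
  // copying version: allocates a new array per lookup
  def statsCopy(offset: Int): Array[Double] =
    java.util.Arrays.copyOfRange(allStats, offset, offset + statsSize)

  // copy-free version: index directly into the shared buffer
  def stat(offset: Int, i: Int): Double = allStats(offset + i)
}
{code}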



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14351) Optimize ImpurityAggregator for decision trees

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14351:


Assignee: (was: Apache Spark)

> Optimize ImpurityAggregator for decision trees
> --
>
> Key: SPARK-14351
> URL: https://issues.apache.org/jira/browse/SPARK-14351
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> {{RandomForest.binsToBestSplit}} currently takes a large amount of time.  
> Based on some quick profiling, I believe a big chunk of this is spent in 
> {{ImpurityAggregator.getCalculator}} (which seems to make unnecessary Array 
> copies) and {{RandomForest.calculateImpurityStats}}.
> This JIRA is for:
> * Doing more profiling to confirm that unnecessary time is being spent in 
> some of these methods.
> * Optimizing the implementation
> * Profiling again to confirm the speedups
> Local profiling for large enough examples should suffice, especially since 
> the optimizations should not need to change the amount of data communicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14351) Optimize ImpurityAggregator for decision trees

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14351:


Assignee: Apache Spark

> Optimize ImpurityAggregator for decision trees
> --
>
> Key: SPARK-14351
> URL: https://issues.apache.org/jira/browse/SPARK-14351
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> {{RandomForest.binsToBestSplit}} currently takes a large amount of time.  
> Based on some quick profiling, I believe a big chunk of this is spent in 
> {{ImpurityAggregator.getCalculator}} (which seems to make unnecessary Array 
> copies) and {{RandomForest.calculateImpurityStats}}.
> This JIRA is for:
> * Doing more profiling to confirm that unnecessary time is being spent in 
> some of these methods.
> * Optimizing the implementation
> * Profiling again to confirm the speedups
> Local profiling for large enough examples should suffice, especially since 
> the optimizations should not need to change the amount of data communicated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16239) SQL issues with cast from date to string around daylight savings time

2016-06-28 Thread Glen Maisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353984#comment-15353984
 ] 

Glen Maisey commented on SPARK-16239:
-

As a layman, I would not expect a Date type to need a timezone. I would not 
expect Date types to be affected by timezone or daylight savings considerations 
at all (I like to think of dates as an integer offset from a certain date in 
the past). 

I would expect this sort of behaviour if I were using a Timestamp or Datetime 
datatype.

> SQL issues with cast from date to string around daylight savings time
> -
>
> Key: SPARK-16239
> URL: https://issues.apache.org/jira/browse/SPARK-16239
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Glen Maisey
>Priority: Critical
>
> Hi all,
> I have a dataframe with a date column. When I cast to a string using the 
> spark sql cast function it converts it to the wrong date on certain days. 
> Looking into it, it occurs once a year when summer daylight savings starts.
> I've tried to show this issue with the code below. The toString() function 
> works correctly whereas the cast does not.
> Unfortunately my users are using SQL code rather than scala dataframes and 
> therefore this workaround does not apply. This was actually picked up where a 
> user was writing something like "SELECT date1 UNION ALL select date2" where 
> date1 was a string and date2 was a date type. It must be implicitly 
> converting the date to a string which gives this error.
> I'm in the Australia/Sydney timezone (see the time changes here 
> http://www.timeanddate.com/time/zone/australia/sydney) 
> val dates = Array("2014-10-03", "2014-10-04", "2014-10-05", "2014-10-06",
>   "2015-10-02", "2015-10-03", "2015-10-04", "2015-10-05")
> val df = sc.parallelize(dates)
>   .toDF("txn_date")
>   .select(col("txn_date").cast("Date"))
> df.select(
>     col("txn_date"),
>     col("txn_date").cast("Timestamp").alias("txn_date_timestamp"),
>     col("txn_date").cast("String").alias("txn_date_str_cast"),
>     col("txn_date".toString()).alias("txn_date_str_toString")
>   )
>   .show()
> +----------+--------------------+-----------------+---------------------+
> |  txn_date|  txn_date_timestamp|txn_date_str_cast|txn_date_str_toString|
> +----------+--------------------+-----------------+---------------------+
> |2014-10-03|2014-10-02 14:00:...|       2014-10-03|           2014-10-03|
> |2014-10-04|2014-10-03 14:00:...|       2014-10-04|           2014-10-04|
> |2014-10-05|2014-10-04 13:00:...|       2014-10-04|           2014-10-05|
> |2014-10-06|2014-10-05 13:00:...|       2014-10-06|           2014-10-06|
> |2015-10-02|2015-10-01 14:00:...|       2015-10-02|           2015-10-02|
> |2015-10-03|2015-10-02 14:00:...|       2015-10-03|           2015-10-03|
> |2015-10-04|2015-10-03 13:00:...|       2015-10-03|           2015-10-04|
> |2015-10-05|2015-10-04 13:00:...|       2015-10-05|           2015-10-05|
> +----------+--------------------+-----------------+---------------------+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16235) "evaluateEachIteration" is returning wrong results when calculated for classification model.

2016-06-28 Thread Mahmoud Rawas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353979#comment-15353979
 ] 

Mahmoud Rawas commented on SPARK-16235:
---

I don't fully agree with the statement 'it has no clear meaning': mathematically 
it is still a quite acceptable representation of the error incurred on the 
predicted value; please see the quick graph created here 
(https://docs.google.com/spreadsheets/d/1VWr0-IO4KZkPwLdzji9gCd-yvRuKbZ3Hc5xnPfWnhsE/edit?usp=sharing).
Returning to the log-loss that is implemented in Spark, I agree that it only 
works within the range of [-1,1], but as a developer I would prefer to do the 
value transformation inside the loss calculator, so the mapping from [0,1] into 
[-1,1] would happen inside the LogLoss class. That way both measures would be 
satisfied, and the user could then decide which measure to use. 
Your thoughts?

> "evaluateEachIteration" is returning wrong results when calculated for 
> classification model.
> 
>
> Key: SPARK-16235
> URL: https://issues.apache.org/jira/browse/SPARK-16235
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Mahmoud Rawas
>
> Basically, within the mentioned function there is code to map the actual 
> value, which is supposed to be in the range of \[0,1], into the range of 
> \[-1,1], in order to make it compatible with the predicted value produced by a 
> classification model. 
> {code}
> val remappedData = algo match {
>   case Classification => data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
>   case _ => data
> }
> {code}
> The problem with this approach is that it calculates an incorrect error: the 
> remapping doubles the label range, so, for example, the MSE will be 4 times 
> larger than the actual expected MSE. 
> Instead we should map the predicted value into a probability value in [0,1].
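
A tiny worked example of the scale effect described above (numbers are illustrative):

{code}
// Remapping labels from {0,1} to {-1,1} doubles every residual, so squared errors grow 4x.
val label = 1.0
val pred  = 0.75
val seOriginal = math.pow(label - pred, 2)                      // 0.0625
val seRemapped = math.pow((label * 2 - 1) - (pred * 2 - 1), 2)  // 0.25 == 4 * 0.0625
{code}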



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16268) SQLContext should import DataStreamReader

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16268:


Assignee: Apache Spark  (was: Shixiong Zhu)

> SQLContext should import DataStreamReader
> -
>
> Key: SPARK-16268
> URL: https://issues.apache.org/jira/browse/SPARK-16268
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16268) SQLContext should import DataStreamReader

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16268:


Assignee: Shixiong Zhu  (was: Apache Spark)

> SQLContext should import DataStreamReader
> -
>
> Key: SPARK-16268
> URL: https://issues.apache.org/jira/browse/SPARK-16268
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16268) SQLContext should import DataStreamReader

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353975#comment-15353975
 ] 

Apache Spark commented on SPARK-16268:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/13958

> SQLContext should import DataStreamReader
> -
>
> Key: SPARK-16268
> URL: https://issues.apache.org/jira/browse/SPARK-16268
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16268) SQLContext should import DataStreamReader

2016-06-28 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-16268:


 Summary: SQLContext should import DataStreamReader
 Key: SPARK-16268
 URL: https://issues.apache.org/jira/browse/SPARK-16268
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2016-06-28 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353969#comment-15353969
 ] 

Sean Zhong edited comment on SPARK-14083 at 6/29/16 12:17 AM:
--

For a typed operation like map, Spark will first de-serialize the InternalRow to 
type T, apply the operation, and then serialize T back to an InternalRow.
For an un-typed operation like select("column"), it operates directly on the 
InternalRow.

If the end user defines a custom serializer like Kryo, then it is not possible to 
map a typed operation to an un-typed operation.


{code}
scala> case class C(c: Int)
scala> val ds: Dataset[C] = Seq(C(1)).toDS
scala> ds.select("c")   // <- Return correct result when using default encoder.
res1: org.apache.spark.sql.DataFrame = [c: int]

scala> implicit val encoder: Encoder[C] = Encoders.kryo[C]   // <- Define a 
Kryo encoder
scala> val ds2: Dataset[C] = Seq(C(1)).toDS
ds2: org.apache.spark.sql.Dataset[C] = [value: binary]  // <- Row is encoded as 
binary by using Kryo encoder

scala> ds2.select("c")  // <- Fails even if "c" is an existing field in class C!
org.apache.spark.sql.AnalysisException: cannot resolve '`c`' given input 
columns: [value];
  ...

{code}


was (Author: clockfly):

For typed operation like map, it will first de-serialize InternalRow to type T, 
apply the operation, and then serialize T back to InternalRow.
For un-typed operation like select("column"), it directly operates on 
InternalRow.

If end user defines a custom serializer like Kryo, then it is not possible to 
map typed operation to un-typed operation.


```
scala> case class C(c: Int)
scala> val ds: Dataset[C] = Seq(C(1)).toDS
scala> ds.select("c")   // <- Return correct result when using default encoder.
res1: org.apache.spark.sql.DataFrame = [c: int]

scala> implicit val encoder: Encoder[C] = Encoders.kryo[C]   // <- Define a 
Kryo encoder
scala> val ds2: Dataset[C] = Seq(C(1)).toDS
ds2: org.apache.spark.sql.Dataset[C] = [value: binary]  // <- Row is encoded as 
binary by using Kryo encoder

scala> ds2.select("c")  // <- Fails even if "c" is an existing field in class C!
org.apache.spark.sql.AnalysisException: cannot resolve '`c`' given input 
columns: [value];
  ...

```

> Analyze JVM bytecode and turn closures into Catalyst expressions
> 
>
> Key: SPARK-14083
> URL: https://issues.apache.org/jira/browse/SPARK-14083
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> One big advantage of the Dataset API is the type safety, at the cost of 
> performance due to heavy reliance on user-defined closures/lambdas. These 
> closures are typically slower than expressions because we have more 
> flexibility to optimize expressions (known data types, no virtual function 
> calls, etc). In many cases, it's actually not going to be very difficult to 
> look into the byte code of these closures and figure out what they are trying 
> to do. If we can understand them, then we can turn them directly into 
> Catalyst expressions for more optimized executions.
> Some examples are:
> {code}
> df.map(_.name)  // equivalent to expression col("name")
> ds.groupBy(_.gender)  // equivalent to expression col("gender")
> df.filter(_.age > 18)  // equivalent to expression GreaterThan(col("age"), lit(18))
> df.map(_.id + 1)  // equivalent to Add(col("id"), lit(1))
> {code}
> The goal of this ticket is to design a small framework for byte code analysis 
> and use that to convert closures/lambdas into Catalyst expressions in order 
> to speed up Dataset execution. It is a little bit futuristic, but I believe 
> it is very doable. The framework should be easy to reason about (e.g. similar 
> to Catalyst).
> Note that there is a big emphasis on "small" and "easy to reason about". A patch 
> should be rejected if it is too complicated or difficult to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2016-06-28 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353969#comment-15353969
 ] 

Sean Zhong commented on SPARK-14083:



For a typed operation like map, Spark will first de-serialize the InternalRow to 
type T, apply the operation, and then serialize T back to an InternalRow.
For an un-typed operation like select("column"), it operates directly on the 
InternalRow.

If the end user defines a custom serializer like Kryo, then it is not possible to 
map a typed operation to an un-typed operation.


```
scala> case class C(c: Int)
scala> val ds: Dataset[C] = Seq(C(1)).toDS
scala> ds.select("c")   // <- Return correct result when using default encoder.
res1: org.apache.spark.sql.DataFrame = [c: int]

scala> implicit val encoder: Encoder[C] = Encoders.kryo[C]   // <- Define a 
Kryo encoder
scala> val ds2: Dataset[C] = Seq(C(1)).toDS
ds2: org.apache.spark.sql.Dataset[C] = [value: binary]  // <- Row is encoded as 
binary by using Kryo encoder

scala> ds2.select("c")  // <- Fails even if "c" is an existing field in class C!
org.apache.spark.sql.AnalysisException: cannot resolve '`c`' given input 
columns: [value];
  ...

```

> Analyze JVM bytecode and turn closures into Catalyst expressions
> 
>
> Key: SPARK-14083
> URL: https://issues.apache.org/jira/browse/SPARK-14083
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> One big advantage of the Dataset API is the type safety, at the cost of 
> performance due to heavy reliance on user-defined closures/lambdas. These 
> closures are typically slower than expressions because we have more 
> flexibility to optimize expressions (known data types, no virtual function 
> calls, etc). In many cases, it's actually not going to be very difficult to 
> look into the byte code of these closures and figure out what they are trying 
> to do. If we can understand them, then we can turn them directly into 
> Catalyst expressions for more optimized executions.
> Some examples are:
> {code}
> df.map(_.name)  // equivalent to expression col("name")
> ds.groupBy(_.gender)  // equivalent to expression col("gender")
> df.filter(_.age > 18)  // equivalent to expression GreaterThan(col("age"), lit(18))
> df.map(_.id + 1)  // equivalent to Add(col("id"), lit(1))
> {code}
> The goal of this ticket is to design a small framework for byte code analysis 
> and use that to convert closures/lambdas into Catalyst expressions in order 
> to speed up Dataset execution. It is a little bit futuristic, but I believe 
> it is very doable. The framework should be easy to reason about (e.g. similar 
> to Catalyst).
> Note that there is a big emphasis on "small" and "easy to reason about". A patch 
> should be rejected if it is too complicated or difficult to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13944) Separate out local linear algebra as a standalone module without Spark dependency

2016-06-28 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-13944:
--
Description: 
Separate out linear algebra as a standalone module without Spark dependency to 
simplify production deployment. We can call the new module mllib-local, which 
might contain local models in the future.

The major issue is to remove dependencies on user-defined types.

The package name will be changed from mllib to ml. For example, Vector will be 
changed from `org.apache.spark.mllib.linalg.Vector` to 
`org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML 
pipeline will be the one in ML package; however, the existing mllib code will 
not be touched. As a result, this will potentially break the API. Also, when 
a vector is loaded from an mllib vector by Spark SQL, it will automatically be 
converted into the one in the ml package.


  was:
Separate out linear algebra as a standalone module without Spark dependency to 
simplify production deployment. We can call the new module spark-mllib-local, 
which might contain local models in the future.

The major issue is to remove dependencies on user-defined types.

The package name will be changed from mllib to ml. For example, Vector will be 
changed from `org.apache.spark.mllib.linalg.Vector` to 
`org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML 
pipeline will be the one in ML package; however, the existing mllib code will 
not be touched. As a result, this will potentially break the API. Also, when 
the vector is loaded from mllib vector by Spark SQL, the vector will 
automatically converted into the one in ml package.



> Separate out local linear algebra as a standalone module without Spark 
> dependency
> -
>
> Key: SPARK-13944
> URL: https://issues.apache.org/jira/browse/SPARK-13944
> Project: Spark
>  Issue Type: New Feature
>  Components: Build, ML
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: DB Tsai
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Separate out linear algebra as a standalone module without Spark dependency 
> to simplify production deployment. We can call the new module mllib-local, 
> which might contain local models in the future.
> The major issue is to remove dependencies on user-defined types.
> The package name will be changed from mllib to ml. For example, Vector will 
> be changed from `org.apache.spark.mllib.linalg.Vector` to 
> `org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML 
> pipeline will be the one in ML package; however, the existing mllib code will 
> not be touched. As a result, this will potentially break the API. Also, when 
> a vector is loaded from an mllib vector by Spark SQL, it will automatically be 
> converted into the one in the ml package.
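
For illustration, the user-visible part of the move is the import path; the construction API stays the same:

{code}
// Before: local linear algebra lived in the mllib package
// import org.apache.spark.mllib.linalg.{Vector, Vectors}
// After: the mllib-local module exposes it under the ml package
import org.apache.spark.ml.linalg.{Vector, Vectors}

val v: Vector = Vectors.dense(1.0, 0.0, 3.0)
{code}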



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16267) Replace deprecated `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353939#comment-15353939
 ] 

Apache Spark commented on SPARK-16267:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13956

> Replace deprecated `CREATE TEMPORARY TABLE` from testsuites
> ---
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite`, `SQLQuerySuite`, 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16267) Replace deprecated `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16267:


Assignee: Apache Spark

> Replace deprecated `CREATE TEMPORARY TABLE` from testsuites
> ---
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite`, `SQLQuerySuite`, 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16267) Replace deprecated `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16267:


Assignee: (was: Apache Spark)

> Replace deprecated `CREATE TEMPORARY TABLE` from testsuites
> ---
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite`, `SQLQuerySuite`, 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16267) Replace deprecated `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-16267:
--
Description: 
After SPARK-15674, `DDLStrategy` prints out the following deprecation messages 
in the testsuites.

{code}
12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
CREATE TEMPORARY VIEW viewName USING... instead
{code}

- JDBCWriteSuite: 14
- DDLSuite: 6
- TableScanSuite: 6
- ParquetSourceSuite: 5
- OrcSourceSuite: 2
- SQLQuerySuite: 2
- HiveCommandSuite: 2
- JsonSuite: 1
- PrunedScanSuite: 1
- FilteredScanSuite  1

This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in order 
to remove the deprecation messages except `DDLSuite`, `SQLQuerySuite`, 
`HiveCommandSuite`.


  was:
After SPARK-15674, `DDLStrategy` prints out the following deprecation messages 
in the testsuites.

{code}
12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
CREATE TEMPORARY VIEW viewName USING... instead
{code}

- JDBCWriteSuite: 14
- DDLSuite: 6
- TableScanSuite: 6
- ParquetSourceSuite: 5
- OrcSourceSuite: 2
- SQLQuerySuite: 2
- HiveCommandSuite: 2
- JsonSuite: 1
- PrunedScanSuite: 1
- FilteredScanSuite  1

This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in order 
to remove the deprecation messages except `DDLSuite` and `HiveCommandSuite`.



> Replace deprecated `CREATE TEMPORARY TABLE` from testsuites
> ---
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite`, `SQLQuerySuite`, 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16267) Replace deprecate `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-16267:
--
Summary: Replace deprecate `CREATE TEMPORARY TABLE` from testsuites  (was: 
Remove deprecate `CREATE TEMPORARY TABLE` from testsuites)

> Replace deprecate `CREATE TEMPORARY TABLE` from testsuites
> --
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite` and 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16267) Remove deprecate `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-16267:
-

 Summary: Remove deprecate `CREATE TEMPORARY TABLE` from testsuites
 Key: SPARK-16267
 URL: https://issues.apache.org/jira/browse/SPARK-16267
 Project: Spark
  Issue Type: Test
  Components: Tests
Reporter: Dongjoon Hyun
Priority: Trivial


After SPARK-15674, `DDLStrategy` prints out the following deprecation messages 
in the testsuites.

{code}
12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
CREATE TEMPORARY VIEW viewName USING... instead
{code}

- JDBCWriteSuite: 14
- DDLSuite: 6
- TableScanSuite: 6
- ParquetSourceSuite: 5
- OrcSourceSuite: 2
- SQLQuerySuite: 2
- HiveCommandSuite: 2
- JsonSuite: 1
- PrunedScanSuite: 1
- FilteredScanSuite  1

This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in order 
to remove the deprecation messages except `DDLSuite` and `HiveCommandSuite`.
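
The replacement in the test suites is mechanical; an illustrative before/after (data source and path are hypothetical):

{code}
// deprecated form that triggers the warning
spark.sql("CREATE TEMPORARY TABLE normal_orc_source USING orc OPTIONS (path '/tmp/orc_src')")
// replacement that silences it
spark.sql("CREATE TEMPORARY VIEW normal_orc_source USING orc OPTIONS (path '/tmp/orc_src')")
{code}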




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16267) Replace deprecated `CREATE TEMPORARY TABLE` from testsuites

2016-06-28 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-16267:
--
Summary: Replace deprecated `CREATE TEMPORARY TABLE` from testsuites  (was: 
Replace deprecate `CREATE TEMPORARY TABLE` from testsuites)

> Replace deprecated `CREATE TEMPORARY TABLE` from testsuites
> ---
>
> Key: SPARK-16267
> URL: https://issues.apache.org/jira/browse/SPARK-16267
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> After SPARK-15674, `DDLStrategy` prints out the following deprecation 
> messages in the testsuites.
> {code}
> 12:10:53.284 WARN org.apache.spark.sql.execution.SparkStrategies$DDLStrategy: 
> CREATE TEMPORARY TABLE normal_orc_source USING... is deprecated, please use 
> CREATE TEMPORARY VIEW viewName USING... instead
> {code}
> - JDBCWriteSuite: 14
> - DDLSuite: 6
> - TableScanSuite: 6
> - ParquetSourceSuite: 5
> - OrcSourceSuite: 2
> - SQLQuerySuite: 2
> - HiveCommandSuite: 2
> - JsonSuite: 1
> - PrunedScanSuite: 1
> - FilteredScanSuite  1
> This PR replaces `CREATE TEMPORARY TABLE` with `CREATE TEMPORARY VIEW` in 
> order to remove the deprecation messages except `DDLSuite` and 
> `HiveCommandSuite`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16006) Attempting to write empty DataFrame with no fields throws non-intuitive exception

2016-06-28 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353914#comment-15353914
 ] 

Dongjoon Hyun commented on SPARK-16006:
---

Hi, [~tdas].
Could you review the PR again?

> Attempting to write empty DataFrame with no fields throws non-intuitive 
> exception
> ---
>
> Key: SPARK-16006
> URL: https://issues.apache.org/jira/browse/SPARK-16006
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Tathagata Das
>Priority: Minor
>
> Attempting to write an emptyDataFrame created with 
> {{sparkSession.emptyDataFrame.write.text("p")}} fails with the following 
> exception
> {code}
> org.apache.spark.sql.AnalysisException: Cannot use all columns for partition 
> columns;
>   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.validatePartitionColumn(PartitioningUtils.scala:355)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:435)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:213)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:196)
>   at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:525)
>   ... 48 elided
> {code}
> This is because # fields == # partitioning columns == 0 at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.validatePartitionColumn(PartitioningUtils.scala:355).
> This is a non-intuitive error message. A better error message would be "Cannot 
> write a dataset with no fields".
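
A minimal sketch of the kind of check being asked for (illustrative only, not the actual PartitioningUtils code):

{code}
// Report the real problem (no fields at all) before the partition-column check.
def validateWrite(fieldNames: Seq[String], partitionColumns: Seq[String]): Unit = {
  if (fieldNames.isEmpty) {
    throw new IllegalArgumentException("Cannot write a dataset with no fields")
  }
  if (partitionColumns.size == fieldNames.size) {
    throw new IllegalArgumentException("Cannot use all columns for partition columns")
  }
}
{code}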



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16114) Add network word count example

2016-06-28 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-16114.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13816
[https://github.com/apache/spark/pull/13816]

> Add network word count example
> --
>
> Key: SPARK-16114
> URL: https://issues.apache.org/jira/browse/SPARK-16114
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: James Thomas
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-28 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-16100.

   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 13835
[https://github.com/apache/spark/pull/13835]

> Aggregator fails with Tungsten error when complex types are used for results 
> and partial sum
> 
>
> Key: SPARK-16100
> URL: https://issues.apache.org/jira/browse/SPARK-16100
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Deenar Toraskar
>Assignee: Wenchen Fan
> Fix For: 2.0.1, 2.1.0
>
>
> I get a similar error when using complex types in Aggregator. Not sure if 
> this is the same issue or something else.
> {code:Agg.scala}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.TypedColumn
> import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
> import org.apache.spark.sql.expressions.Aggregator
> import org.apache.spark.sql.{Encoder, Row}
> import sqlContext.implicits._
>
> object CustomSummer extends Aggregator[Valuation, Map[Int, Seq[Double]], Seq[Seq[Double]]] with Serializable {
>   def zero: Map[Int, Seq[Double]] = Map()
>
>   def reduce(b: Map[Int, Seq[Double]], a: Valuation): Map[Int, Seq[Double]] = {
>     val timeInterval: Int = a.timeInterval
>     val currentSum: Seq[Double] = b.get(timeInterval).getOrElse(Nil)
>     val currentRow: Seq[Double] = a.pvs
>     b.updated(timeInterval, sumArray(currentSum, currentRow))
>   }
>
>   def sumArray(a: Seq[Double], b: Seq[Double]): Seq[Double] = Nil
>
>   def merge(b1: Map[Int, Seq[Double]], b2: Map[Int, Seq[Double]]): Map[Int, Seq[Double]] = {
>     /* merges two maps together; ++ replaces any (k,v) from the map on the left
>        side of ++ (here map1) by (k,v) from the right side map, if (k,_) already
>        exists in the left side map (here map1), e.g. Map(1->1) ++ Map(1->2)
>        results in Map(1->2) */
>     b1 ++ b2.map { case (timeInterval, exposures) =>
>       timeInterval -> sumArray(exposures, b1.getOrElse(timeInterval, Nil))
>     }
>   }
>
>   def finish(exposures: Map[Int, Seq[Double]]): Seq[Seq[Double]] = {
>     exposures.size match {
>       case 0 => null
>       case _ => {
>         val range = exposures.keySet.max
>         // convert map to 2-dimensional array, (timeInterval x Seq[expScn1, expScn2, ...])
>         (0 to range).map(x => exposures.getOrElse(x, Nil))
>       }
>     }
>   }
>
>   override def bufferEncoder: Encoder[Map[Int, Seq[Double]]] = ExpressionEncoder()
>   override def outputEncoder: Encoder[Seq[Seq[Double]]] = ExpressionEncoder()
> }
>
> case class Valuation(timeInterval: Int, pvs: Seq[Double])
>
> val valns = sc.parallelize(Seq(
>   Valuation(0, Seq(1.0, 2.0, 3.0)),
>   Valuation(2, Seq(1.0, 2.0, 3.0)),
>   Valuation(1, Seq(1.0, 2.0, 3.0)),
>   Valuation(2, Seq(1.0, 2.0, 3.0)),
>   Valuation(0, Seq(1.0, 2.0, 3.0)))).toDS
>
> val g_c1 = valns.groupByKey(_.timeInterval).agg(CustomSummer.toColumn).show(false)
> {code}
> I get the following error
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 10.0 failed 1 times, most recent failure: Lost task 1.0 in stage 10.0 (TID 19, localhost): java.lang.IndexOutOfBoundsException: 0
> at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
> at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
> at scala.collection.mutable.ArrayBuffer.remove(ArrayBuffer.scala:167)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:244)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
> at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:214)
> at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized$lzycompute(Expression.scala:156)
> at org.apache.spark.sql.catalyst.expressions.Expression.canonicalized(Expression.scala:154)
> at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
> at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$1.apply(Expression.scala:155)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at 
> 

[jira] [Commented] (SPARK-16266) Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming package

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353815#comment-15353815
 ] 

Apache Spark commented on SPARK-16266:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/13955

> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package
> -
>
> Key: SPARK-16266
> URL: https://issues.apache.org/jira/browse/SPARK-16266
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package to be consistent with the scala packaging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16266) Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming package

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16266:


Assignee: Tathagata Das  (was: Apache Spark)

> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package
> -
>
> Key: SPARK-16266
> URL: https://issues.apache.org/jira/browse/SPARK-16266
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package to be consistent with the scala packaging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16266) Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming package

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16266:


Assignee: Apache Spark  (was: Tathagata Das)

> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package
> -
>
> Key: SPARK-16266
> URL: https://issues.apache.org/jira/browse/SPARK-16266
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Streaming
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
> package to be consistent with the scala packaging. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16266) Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming package

2016-06-28 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-16266:
-

 Summary: Move python DataStreamReader/Writer from pyspark.sql to 
pyspark.sql.streaming package
 Key: SPARK-16266
 URL: https://issues.apache.org/jira/browse/SPARK-16266
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das


Move python DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming 
package to be consistent with the scala packaging. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16236) Add Path Option back to Load API in DataFrameReader

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16236.
-
   Resolution: Fixed
 Assignee: Xiao Li
Fix Version/s: 2.0.0

> Add Path Option back to Load API in DataFrameReader
> ---
>
> Key: SPARK-16236
> URL: https://issues.apache.org/jira/browse/SPARK-16236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.0
>
>
> @koertkuipers identified that the PR https://github.com/apache/spark/pull/13727/ 
> changed the behavior of the `load` API. After the change, the `load` API does 
> not add the value of `path` into the `options`. Thank you!
> We should add the option `path` back to the `load()` API in `DataFrameReader`, 
> if and only if users specify one and only one path in the load API.
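
An illustrative statement of the restored behaviour (file path is hypothetical; this is not the actual DataFrameReader code):

{code}
// With exactly one path, load(path) should be equivalent to passing it as an option.
val df1 = spark.read.format("json").load("/data/events.json")
val df2 = spark.read.format("json").option("path", "/data/events.json").load()
// Both should resolve to the same relation once `path` is put back into the options map.
{code}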



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-28 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353790#comment-15353790
 ] 

Yin Huai commented on SPARK-16032:
--

Thank you [~rdblue] for the detailed reply. Those are super helpful. Let's 
think about it more and improve it together :)

> Audit semantics of various insertion operations related to partitioned tables
> -
>
> Key: SPARK-16032
> URL: https://issues.apache.org/jira/browse/SPARK-16032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Critical
> Attachments: [SPARK-16032] Spark SQL table insertion auditing - 
> Google Docs.pdf
>
>
> We found that the semantics of various insertion operations related to 
> partitioned tables can be inconsistent. This is an umbrella ticket for all 
> related tickets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16081) Disallow using "l" as variable name

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin closed SPARK-16081.
---
Resolution: Won't Fix

See discussion on PR https://github.com/apache/spark/pull/13915

> Disallow using "l" as variable name
> ---
>
> Key: SPARK-16081
> URL: https://issues.apache.org/jira/browse/SPARK-16081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16174) Improve `OptimizeIn` optimizer to remove deterministic repetitions

2016-06-28 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-16174:
--
Summary: Improve `OptimizeIn` optimizer to remove deterministic repetitions 
 (was: Improve OptimizeIn optimizer to remove deterministic repetitions)

> Improve `OptimizeIn` optimizer to remove deterministic repetitions
> --
>
> Key: SPARK-16174
> URL: https://issues.apache.org/jira/browse/SPARK-16174
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue improves the `OptimizeIn` optimizer to remove deterministic 
> repetitions from SQL `IN` predicates. This optimization guards against user 
> mistakes and can also speed up some queries, such as 
> [TPCDS-36|https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q36.sql#L19].
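
A small PySpark illustration of the duplicate-removal idea; the actual rewrite 
happens inside Catalyst's OptimizeIn rule, so this only shows the kind of 
query that should benefit:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimize-in-sketch").getOrCreate()
spark.range(10).createOrReplaceTempView("t")

# The IN list repeats deterministic literals; after the improvement the
# optimized plan should keep each value only once, as if the query had been
# written with IN (1, 2, 3).
spark.sql("SELECT id FROM t WHERE id IN (1, 1, 2, 2, 3, 3)").explain(True)
{code}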



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16175) Handle None for all Python UDT

2016-06-28 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-16175.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13878
[https://github.com/apache/spark/pull/13878]

> Handle None for all Python UDT
> --
>
> Key: SPARK-16175
> URL: https://issues.apache.org/jira/browse/SPARK-16175
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.0.0
>
> Attachments: nullvector.dbc
>
>
> For a Scala UDT, we do not call serialize()/deserialize() on null values; we 
> should do the same in Python.
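
A minimal sketch of the null-handling guard being asked for, with hypothetical 
helper names; the point is simply that the conversion paths should skip 
serialize()/deserialize() when the value is None, as the Scala side already 
does:

{code:python}
def to_sql_value(udt, obj):
    # A null column value stays null; never hand None to the UDT itself.
    return None if obj is None else udt.serialize(obj)

def from_sql_value(udt, datum):
    return None if datum is None else udt.deserialize(datum)
{code}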



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353709#comment-15353709
 ] 

Apache Spark commented on SPARK-16265:
--

User 'andreweduffy' has created a pull request for this issue:
https://github.com/apache/spark/pull/13953

> Add option to SparkSubmit to ship driver JRE to YARN
> 
>
> Key: SPARK-16265
> URL: https://issues.apache.org/jira/browse/SPARK-16265
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.2
>Reporter: Andrew Duffy
> Fix For: 2.1.0
>
>
> Add an option to {{SparkSubmit}} that lets the driver package up its version 
> of the JRE and ship it to a YARN cluster. This allows a Spark application to 
> be deployed to a YARN cluster whose installed Java versions do not include 
> the version the application requires, which is useful when the Spark 
> application developer does not have administrative access to the YARN cluster 
> (e.g. a school or corporate environment) but still wants to use certain 
> language features in their code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16265:


Assignee: Apache Spark

> Add option to SparkSubmit to ship driver JRE to YARN
> 
>
> Key: SPARK-16265
> URL: https://issues.apache.org/jira/browse/SPARK-16265
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.2
>Reporter: Andrew Duffy
>Assignee: Apache Spark
> Fix For: 2.1.0
>
>
> Add an option to {{SparkSubmit}} that lets the driver package up its version 
> of the JRE and ship it to a YARN cluster. This allows a Spark application to 
> be deployed to a YARN cluster whose installed Java versions do not include 
> the version the application requires, which is useful when the Spark 
> application developer does not have administrative access to the YARN cluster 
> (e.g. a school or corporate environment) but still wants to use certain 
> language features in their code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16265:


Assignee: (was: Apache Spark)

> Add option to SparkSubmit to ship driver JRE to YARN
> 
>
> Key: SPARK-16265
> URL: https://issues.apache.org/jira/browse/SPARK-16265
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.2
>Reporter: Andrew Duffy
> Fix For: 2.1.0
>
>
> Add an option to {{SparkSubmit}} that lets the driver package up its version 
> of the JRE and ship it to a YARN cluster. This allows a Spark application to 
> be deployed to a YARN cluster whose installed Java versions do not include 
> the version the application requires, which is useful when the Spark 
> application developer does not have administrative access to the YARN cluster 
> (e.g. a school or corporate environment) but still wants to use certain 
> language features in their code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16264) Allow the user to use operators on the received DataFrame

2016-06-28 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353695#comment-15353695
 ] 

Shixiong Zhu commented on SPARK-16264:
--

Yep, probably this one will be "Won't Fix"

> Allow the user to use operators on the received DataFrame
> -
>
> Key: SPARK-16264
> URL: https://issues.apache.org/jira/browse/SPARK-16264
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Shixiong Zhu
>
> Currently a Sink cannot apply any operators to the DataFrame it receives, 
> because the new DataFrame created by the operator will use QueryExecution 
> rather than IncrementalExecution.
> There are two options to fix this:
> 1. Merge IncrementalExecution into QueryExecution so that QueryExecution can 
> also deal with streaming operators.
> 2. Make Dataset operators inherit the QueryExecution (IncrementalExecution is 
> just a subclass of QueryExecution) from their parent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16175) Handle None for all Python UDT

2016-06-28 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353684#comment-15353684
 ] 

Josh Rosen commented on SPARK-16175:


Here's a published copy of the error reproduction notebook, which should be 
publicly viewable: 
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7035702064814356/2718009245086785/1395282846718893/latest.html

> Handle None for all Python UDT
> --
>
> Key: SPARK-16175
> URL: https://issues.apache.org/jira/browse/SPARK-16175
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Attachments: nullvector.dbc
>
>
> For a Scala UDT, we do not call serialize()/deserialize() on null values; we 
> should do the same in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353682#comment-15353682
 ] 

Sean Owen commented on SPARK-16265:
---

I don't think this is in scope for Spark, since I assume you mean that your 
cluster has Java 6, and Spark requires Java 7. Spark simply doesn't support 
that, and the answer really is: upgrade. You have other problems besides this 
one in that scenario anyway. If you just want to run Java 8 on a Java 7 
cluster, that does not seem compelling compared to the added complexity.

> Add option to SparkSubmit to ship driver JRE to YARN
> 
>
> Key: SPARK-16265
> URL: https://issues.apache.org/jira/browse/SPARK-16265
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.2
>Reporter: Andrew Duffy
> Fix For: 2.1.0
>
>
> Add an option to {{SparkSubmit}} that lets the driver package up its version 
> of the JRE and ship it to a YARN cluster. This allows a Spark application to 
> be deployed to a YARN cluster whose installed Java versions do not include 
> the version the application requires, which is useful when the Spark 
> application developer does not have administrative access to the YARN cluster 
> (e.g. a school or corporate environment) but still wants to use certain 
> language features in their code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16259) Cleanup options for DataFrame reader API in Python

2016-06-28 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16259.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Cleanup options for DataFrame reader API in Python
> --
>
> Key: SPARK-16259
> URL: https://issues.apache.org/jira/browse/SPARK-16259
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 2.1.0
>
>
> There is some duplicated code for options; we should simplify it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16265) Add option to SparkSubmit to ship driver JRE to YARN

2016-06-28 Thread Andrew Duffy (JIRA)
Andrew Duffy created SPARK-16265:


 Summary: Add option to SparkSubmit to ship driver JRE to YARN
 Key: SPARK-16265
 URL: https://issues.apache.org/jira/browse/SPARK-16265
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.6.2
Reporter: Andrew Duffy
 Fix For: 2.1.0


Add an option to {{SparkSubmit}} that lets the driver package up its version 
of the JRE and ship it to a YARN cluster. This allows a Spark application to 
be deployed to a YARN cluster whose installed Java versions do not include the 
version the application requires, which is useful when the Spark application 
developer does not have administrative access to the YARN cluster (e.g. a 
school or corporate environment) but still wants to use certain language 
features in their code.
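
The proposed {{SparkSubmit}} option does not exist; as a rough sketch of what 
can be approximated today with existing YARN-related configs (the archive 
location and JAVA_HOME paths below are hypothetical), one can ship an archived 
JRE and point the containers at it:

{code:python}
from pyspark import SparkConf

conf = (SparkConf()
        # Ship an archived JRE to every container; YARN unpacks it as ./jre
        .set("spark.yarn.dist.archives", "hdfs:///tools/jre8.tar.gz#jre")
        # In yarn-cluster mode the driver runs inside the AM container
        .set("spark.yarn.appMasterEnv.JAVA_HOME", "./jre/jre1.8.0")
        .set("spark.executorEnv.JAVA_HOME", "./jre/jre1.8.0"))
{code}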



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16148) TaskLocation does not allow for Executor ID's with underscores

2016-06-28 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-16148:
-
Assignee: Tom Magrino

> TaskLocation does not allow for Executor ID's with underscores
> --
>
> Key: SPARK-16148
> URL: https://issues.apache.org/jira/browse/SPARK-16148
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.1
>Reporter: Tom Magrino
>Assignee: Tom Magrino
>Priority: Minor
> Fix For: 1.6.3, 2.0.1, 2.1.0
>
>
> Currently, the logic in TaskLocation does not allow for Executor IDs which 
> contain underscores, leading to an IllegalArgumentException being thrown from 
> core/src/scala/org/apache/spark/scheduler/TaskLocation.scala in the apply 
> method.
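
A conceptual sketch of the parsing pitfall in plain Python (the real code is 
Scala), assuming location strings of the form executor_<host>_<executorId>; 
the sample values are hypothetical:

{code:python}
loc = "executor_host1.example.com_app_123_exec_7"

# Splitting on every underscore tears the executor ID apart:
broken = loc.split("_")  # ['executor', 'host1.example.com', 'app', '123', 'exec', '7']

# Stripping the known prefix and splitting only at the first remaining
# underscore keeps the full executor ID intact:
host, executor_id = loc[len("executor_"):].split("_", 1)
print(host, executor_id)  # host1.example.com app_123_exec_7
{code}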



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


