[jira] [Issue Comment Deleted] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2017-08-25 Thread Lev Katzav (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lev Katzav updated SPARK-8582:
--
Comment: was deleted

(was: Any update on this?
what are the plans for spark 2?

thanks)

> Optimize checkpointing to avoid computing an RDD twice
> --
>
> Key: SPARK-8582
> URL: https://issues.apache.org/jira/browse/SPARK-8582
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Shixiong Zhu
>
> In Spark, checkpointing allows the user to truncate the lineage of an RDD 
> and save the intermediate contents to HDFS for fault tolerance. However, this 
> is not currently implemented very efficiently:
> Every time we checkpoint an RDD, we actually compute it twice: once during 
> the action that triggered the checkpointing in the first place, and once 
> while we checkpoint (we iterate through the RDD's partitions and write them to 
> disk). See this line for more detail: 
> https://github.com/apache/spark/blob/0401cbaa8ee51c71f43604f338b65022a479da0a/core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala#L102.
> Instead, we should have a `CheckpointingIterator` that writes checkpoint 
> data to HDFS while we run the action. This would speed up many usages of 
> `RDD#checkpoint` by 2x.
> (Alternatively, the user can cache the RDD before checkpointing it, but 
> this is not always viable for very large input data. It's also not a great 
> API to use in general.)
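
A minimal sketch of the proposed approach (illustrative only; the 
`CheckpointingIterator` name comes from the issue, but the stream type and 
Java serialization here are assumptions, not Spark's actual checkpoint 
writer): wrap the computed partition's iterator and persist each element as 
the action consumes it, so the partition is computed only once.

{code:scala}
import java.io.ObjectOutputStream

// Wraps the iterator produced for a partition. Every element the action
// pulls is also written to the checkpoint stream, as a side effect.
class CheckpointingIterator[T](underlying: Iterator[T], out: ObjectOutputStream)
  extends Iterator[T] {

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more) out.close() // end of partition: flush and finalize the file
    more
  }

  override def next(): T = {
    val elem = underlying.next()
    out.writeObject(elem) // persist while computing instead of recomputing later
    elem
  }
}
{code}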



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21751:


Assignee: Apache Spark

> CodeGenerator.splitExpressions counts code size more precisely
> -
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements if their 
> total length is more than 1200 characters, but that count may include 
> comments and empty lines.
> It would be good to exclude comments and empty lines from the count, to 
> reduce the number of generated methods in a class. 
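
A minimal sketch of the proposed counting (an assumed helper, not the actual 
{{CodeGenerator}} code): measure only non-blank, non-comment lines before 
deciding whether to split.

{code:scala}
// Length of generated code counting only effective lines: blank lines
// and line comments are excluded from the total.
def effectiveCodeLength(code: String): Int =
  code.split("\n")
    .map(_.trim)
    .filterNot(line => line.isEmpty || line.startsWith("//"))
    .map(_.length)
    .sum

// Split only when the effective length exceeds the 1200-character threshold.
def shouldSplit(code: String): Boolean = effectiveCodeLength(code) > 1200
{code}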



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21751:


Assignee: (was: Apache Spark)

> CodeGenerator.splitExpressions counts code size more precisely
> -
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements if their 
> total length is more than 1200 characters, but that count may include 
> comments and empty lines.
> It would be good to exclude comments and empty lines from the count, to 
> reduce the number of generated methods in a class. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21781:


Assignee: (was: Apache Spark)

> Modify DataSourceScanExec to use concrete ColumnVector type.
> 
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takuya Ueshin
>
> As mentioned at 
> https://github.com/apache/spark/pull/18680#issuecomment-316820409, when we 
> have more {{ColumnVector}} implementations, it might (or might not) have huge 
> performance implications, because it might disable inlining or force virtual 
> dispatches.
> On the read path, one of the major paths is the one generated by 
> {{ColumnBatchScan}}. Currently it refers to {{ColumnVector}}, so the penalty 
> grows as we add more classes, but we can know the concrete type from its 
> usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can 
> use the concrete type directly in the generated code to avoid the penalty.
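
A self-contained illustration of the dispatch concern (toy classes, not 
Spark's {{ColumnVector}} hierarchy): once several subclasses are in use, 
calls through the abstract type become virtual call sites the JIT may stop 
inlining, while calls through the concrete type stay monomorphic.

{code:scala}
// Toy stand-ins for the abstract and concrete vector types.
abstract class AbstractVector { def getInt(i: Int): Int }
final class OnHeapVector(data: Array[Int]) extends AbstractVector {
  override def getInt(i: Int): Int = data(i)
}

object ScanDemo {
  // Loop in the style of the generated scan, referring to the abstract
  // type: getInt is a virtual call once multiple subclasses are loaded.
  def sumViaAbstract(v: AbstractVector, n: Int): Int = {
    var s = 0; var i = 0
    while (i < n) { s += v.getInt(i); i += 1 }
    s
  }

  // The same loop referring to the concrete type: the call site is
  // monomorphic and can be inlined.
  def sumViaConcrete(v: OnHeapVector, n: Int): Int = {
    var s = 0; var i = 0
    while (i < n) { s += v.getInt(i); i += 1 }
    s
  }
}
{code}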



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21781:


Assignee: Apache Spark

> Modify DataSourceScanExec to use concrete ColumnVector type.
> 
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>
> As mentioned at 
> https://github.com/apache/spark/pull/18680#issuecomment-316820409, when we 
> have more {{ColumnVector}} implementations, it might (or might not) have huge 
> performance implications, because it might disable inlining or force virtual 
> dispatches.
> On the read path, one of the major paths is the one generated by 
> {{ColumnBatchScan}}. Currently it refers to {{ColumnVector}}, so the penalty 
> grows as we add more classes, but we can know the concrete type from its 
> usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can 
> use the concrete type directly in the generated code to avoid the penalty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142606#comment-16142606
 ] 

Apache Spark commented on SPARK-21781:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/18989

> Modify DataSourceScanExec to use concrete ColumnVector type.
> 
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Takuya Ueshin
>
> As mentioned at 
> https://github.com/apache/spark/pull/18680#issuecomment-316820409, when we 
> have more {{ColumnVector}} implementations, it might (or might not) have huge 
> performance implications, because it might disable inlining or force virtual 
> dispatches.
> On the read path, one of the major paths is the one generated by 
> {{ColumnBatchScan}}. Currently it refers to {{ColumnVector}}, so the penalty 
> grows as we add more classes, but we can know the concrete type from its 
> usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can 
> use the concrete type directly in the generated code to avoid the penalty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21832) Merge SQLBuilderTest into ExpressionSQLBuilderSuite

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142580#comment-16142580
 ] 

Apache Spark commented on SPARK-21832:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/19044

> Merge SQLBuilderTest into ExpressionSQLBuilderSuite
> ---
>
> Key: SPARK-21832
> URL: https://issues.apache.org/jira/browse/SPARK-21832
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> After SPARK-19025, there is no need to keep SQLBuilderTest; 
> ExpressionSQLBuilderSuite is the only place that uses it.
> This issue aims to remove SQLBuilderTest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21832) Merge SQLBuilderTest into ExpressionSQLBuilderSuite

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142579#comment-16142579
 ] 

Apache Spark commented on SPARK-21832:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/19044

> Merge SQLBuilderTest into ExpressionSQLBuilderSuite
> ---
>
> Key: SPARK-21832
> URL: https://issues.apache.org/jira/browse/SPARK-21832
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> After SPARK-19025, there is no need to keep SQLBuilderTest; 
> ExpressionSQLBuilderSuite is the only place that uses it.
> This issue aims to remove SQLBuilderTest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-21831.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.3.0

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
> 
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in 
> HiveCompatibilitySuite.
> {code}
> // Ensures that the plans generation use metastore relation and not 
> OrcRelation
> // Was done because SqlBuilder does not work with plans having logical 
> relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21831:


Assignee: (was: Apache Spark)

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
> 
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.3.0
>
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in 
> HiveCompatibilitySuite.
> {code}
> // Ensures that the plans generation use metastore relation and not 
> OrcRelation
> // Was done because SqlBuilder does not work with plans having logical 
> relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21831:


Assignee: Apache Spark

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
> 
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.3.0
>
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in 
> HiveCompatibilitySuite.
> {code}
> // Ensures that the plans generation use metastore relation and not 
> OrcRelation
> // Was done because SqlBuilder does not work with plans having logical 
> relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21831:


Assignee: (was: Apache Spark)

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
> 
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in 
> HiveCompatibilitySuite.
> {code}
> // Ensures that the plans generation use metastore relation and not 
> OrcRelation
> // Was done because SqlBuilder does not work with plans having logical 
> relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21831:


Assignee: Apache Spark

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
> 
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in 
> HiveCompatibilitySuite.
> {code}
> // Ensures that the plans generation use metastore relation and not 
> OrcRelation
> // Was done because SqlBuilder does not work with plans having logical 
> relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21834) Incorrect executor request in case of dynamic allocation

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21834:


Assignee: (was: Apache Spark)

> Incorrect executor request in case of dynamic allocation
> 
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.2.0
>Reporter: Sital Kedia
>
> The killExecutor API currently does not allow killing an executor without 
> updating the total number of executors needed. When dynamic allocation is 
> turned on and the allocator tries to kill an executor, the scheduler reduces 
> the total number of executors needed (see 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635),
> which is incorrect because the allocator already takes care of setting the 
> required number of executors itself. 
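
A self-contained toy of the fix direction (the {{adjustTarget}} flag and the 
class here are assumptions for illustration, not Spark's API): let the caller 
decide whether a kill should also lower the executor target, so the 
dynamic-allocation manager keeps ownership of that number.

{code:scala}
import scala.collection.mutable

// Toy backend state: live executors plus the requested total that the
// cluster manager tries to satisfy.
class ToyBackend(var targetNumExecutors: Int) {
  val executors: mutable.Set[String] = mutable.Set.empty

  // When dynamic allocation manages the target itself, callers pass
  // adjustTarget = false so killing does not shrink the request.
  def killExecutor(id: String, adjustTarget: Boolean): Unit = {
    if (executors.remove(id) && adjustTarget) {
      targetNumExecutors -= 1
    }
  }
}
{code}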



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21834) Incorrect executor request in case of dynamic allocation

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21834:


Assignee: Apache Spark

> Incorrect executor request in case of dynamic allocation
> 
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.2.0
>Reporter: Sital Kedia
>Assignee: Apache Spark
>
> The killExecutor API currently does not allow killing an executor without 
> updating the total number of executors needed. When dynamic allocation is 
> turned on and the allocator tries to kill an executor, the scheduler reduces 
> the total number of executors needed (see 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635),
> which is incorrect because the allocator already takes care of setting the 
> required number of executors itself. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142535#comment-16142535
 ] 

Apache Spark commented on SPARK-21751:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/18966

> CodeGenerator.splitExpressions counts code size more precisely
> -
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements if their 
> total length is more than 1200 characters, but that count may include 
> comments and empty lines.
> It would be good to exclude comments and empty lines from the count, to 
> reduce the number of generated methods in a class. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21843:


Assignee: (was: Apache Spark)

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" 
> in ExchangeCoordinatorSuite
> -
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: iamhumanbeing
>Priority: Minor
> Fix For: 2.3.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; 
> it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in 
> ExchangeCoordinatorSuite.
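
A one-line sketch of the intended fix (assumed shape; the suite's actual 
code may differ): derive the note from the parameter instead of 
hard-coding 3.

{code:scala}
// Interpolate the configured partition count into the test name.
def testNameNote(numPartitions: Int): String =
  s"(minNumPostShufflePartitions: $numPartitions)"
{code}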



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21843:


Assignee: Apache Spark

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" 
> in ExchangeCoordinatorSuite
> -
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: iamhumanbeing
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.3.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; 
> it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in 
> ExchangeCoordinatorSuite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142505#comment-16142505
 ] 

Apache Spark commented on SPARK-21843:
--

User 'iamhumanbeing' has created a pull request for this issue:
https://github.com/apache/spark/pull/19057

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" 
> in ExchangeCoordinatorSuite
> -
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: iamhumanbeing
>Priority: Minor
> Fix For: 2.3.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; 
> it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in 
> ExchangeCoordinatorSuite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21834) Incorrect executor request in case of dynamic allocation

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142468#comment-16142468
 ] 

Apache Spark commented on SPARK-21834:
--

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/19048

> Incorrect executor request in case of dynamic allocation
> 
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.2.0
>Reporter: Sital Kedia
>
> The killExecutor API currently does not allow killing an executor without 
> updating the total number of executors needed. When dynamic allocation is 
> turned on and the allocator tries to kill an executor, the scheduler reduces 
> the total number of executors needed (see 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635),
> which is incorrect because the allocator already takes care of setting the 
> required number of executors itself. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21772) HiveException unable to move results from srcf to destf in InsertIntoHiveTable

2017-08-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-21772.
---
Resolution: Cannot Reproduce

> HiveException unable to move results from srcf to destf in 
> InsertIntoHiveTable 
> ---
>
> Key: SPARK-21772
> URL: https://issues.apache.org/jira/browse/SPARK-21772
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
> Environment: JDK1.7
> CentOS 6.3
> Spark2.1
>Reporter: liupengcheng
>  Labels: sql
>
> Currently, executing {code:java} create table as select {code} returns an 
> exception:
> {code:java}
> 2017-08-17,16:14:18,792 ERROR 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:346)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:770)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:770)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:770)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:316)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:262)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:261)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:769)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:765)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:763)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:763)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:100)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:763)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:323)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:119)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:92)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> 

[jira] [Commented] (SPARK-21691) Accessing canonicalized plan for query with limit throws exception

2017-08-25 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142395#comment-16142395
 ] 

Xiao Li commented on SPARK-21691:
-

If you really need to call `canonicalized`, you can do it like this:
{noformat}
session.sql("select * from (values 0, 1) limit 1").queryExecution.analyzed.canonicalized
{noformat}


> Accessing canonicalized plan for query with limit throws exception
> --
>
> Key: SPARK-21691
> URL: https://issues.apache.org/jira/browse/SPARK-21691
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bjoern Toldbod
>
> Accessing the logical, canonicalized plan fails for queries with limits.
> The following demonstrates the issue:
> {code:java}
> val session = SparkSession.builder.master("local").getOrCreate()
> // This works
> session.sql("select * from (values 0, 1)").queryExecution.logical.canonicalized
> // This fails
> session.sql("select * from (values 0, 1) limit 1").queryExecution.logical.canonicalized
> {code}
> The message in the thrown exception is somewhat confusing (or at least not 
> directly related to the limit):
> "Invalid call to toAttribute on unresolved object, tree: *"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21691) Accessing canonicalized plan for query with limit throws exception

2017-08-25 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142391#comment-16142391
 ] 

Xiao Li commented on SPARK-21691:
-

This is a pretty internal API. We do not expect users to call it. Could you 
show us why you need it?

> Accessing canonicalized plan for query with limit throws exception
> --
>
> Key: SPARK-21691
> URL: https://issues.apache.org/jira/browse/SPARK-21691
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bjoern Toldbod
>
> Accessing the logical, canonicalized plan fails for queries with limits.
> The following demonstrates the issue:
> {code:java}
> val session = SparkSession.builder.master("local").getOrCreate()
> // This works
> session.sql("select * from (values 0, 1)").queryExecution.logical.canonicalized
> // This fails
> session.sql("select * from (values 0, 1) limit 1").queryExecution.logical.canonicalized
> {code}
> The message in the thrown exception is somewhat confusing (or at least not 
> directly related to the limit):
> "Invalid call to toAttribute on unresolved object, tree: *"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite

2017-08-25 Thread iamhumanbeing (JIRA)
iamhumanbeing created SPARK-21843:
-

 Summary: testNameNote should be "(minNumPostShufflePartitions: " + 
numPartitions + ")" in ExchangeCoordinatorSuite
 Key: SPARK-21843
 URL: https://issues.apache.org/jira/browse/SPARK-21843
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 2.3.0
Reporter: iamhumanbeing
Priority: Minor
 Fix For: 2.3.0


testNameNote = "(minNumPostShufflePartitions: 3)" is not correct. 
It should be "(minNumPostShufflePartitions: " + numPartitions + ")" in 
ExchangeCoordinatorSuite.
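
For clarity, the fix is just to interpolate the loop variable instead of 
hard-coding it; a minimal sketch:

{code:java}
val numPartitions = 5
// hard-coded today:
val wrong = "(minNumPostShufflePartitions: 3)"
// intended:
val right = "(minNumPostShufflePartitions: " + numPartitions + ")"
// equivalently, with string interpolation:
val right2 = s"(minNumPostShufflePartitions: $numPartitions)"
{code}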



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21798:


Assignee: Apache Spark

> No config to replace deprecated SPARK_CLASSPATH config for launching daemons 
> like History Server
> 
>
> Key: SPARK-21798
> URL: https://issues.apache.org/jira/browse/SPARK-21798
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sanket Reddy
>Assignee: Apache Spark
>Priority: Minor
>
> The History Server launch uses SparkClassCommandBuilder for launching the 
> server. SPARK_CLASSPATH has been deprecated and removed. For spark-submit 
> this takes a different route: spark.driver.extraClassPath takes care of 
> specifying additional jars on the classpath that were previously specified 
> in SPARK_CLASSPATH. Right now the only way to specify additional jars when 
> launching daemons such as the History Server is SPARK_DIST_CLASSPATH 
> (https://spark.apache.org/docs/latest/hadoop-provided.html), but that is, I 
> presume, a distribution classpath. It would be nice to have a config similar 
> to spark.driver.extraClassPath for launching daemons such as the History 
> Server. 
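
As a stopgap under the current behavior, the extra jars can go on the 
distribution classpath before starting the daemon; a sketch (the jar path is a 
placeholder):

{noformat}
# Workaround today: extend SPARK_DIST_CLASSPATH, then start the daemon.
export SPARK_DIST_CLASSPATH="/path/to/extra/jars/*:$SPARK_DIST_CLASSPATH"
./sbin/start-history-server.sh
{noformat}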



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21798:


Assignee: (was: Apache Spark)

> No config to replace deprecated SPARK_CLASSPATH config for launching daemons 
> like History Server
> 
>
> Key: SPARK-21798
> URL: https://issues.apache.org/jira/browse/SPARK-21798
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sanket Reddy
>Priority: Minor
>
> The History Server launch uses SparkClassCommandBuilder for launching the 
> server. SPARK_CLASSPATH has been deprecated and removed. For spark-submit 
> this takes a different route: spark.driver.extraClassPath takes care of 
> specifying additional jars on the classpath that were previously specified 
> in SPARK_CLASSPATH. Right now the only way to specify additional jars when 
> launching daemons such as the History Server is SPARK_DIST_CLASSPATH 
> (https://spark.apache.org/docs/latest/hadoop-provided.html), but that is, I 
> presume, a distribution classpath. It would be nice to have a config similar 
> to spark.driver.extraClassPath for launching daemons such as the History 
> Server. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142357#comment-16142357
 ] 

Apache Spark commented on SPARK-21798:
--

User 'pgandhi999' has created a pull request for this issue:
https://github.com/apache/spark/pull/19047

> No config to replace deprecated SPARK_CLASSPATH config for launching daemons 
> like History Server
> 
>
> Key: SPARK-21798
> URL: https://issues.apache.org/jira/browse/SPARK-21798
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sanket Reddy
>Priority: Minor
>
> The History Server launch uses SparkClassCommandBuilder for launching the 
> server. SPARK_CLASSPATH has been deprecated and removed. For spark-submit 
> this takes a different route: spark.driver.extraClassPath takes care of 
> specifying additional jars on the classpath that were previously specified 
> in SPARK_CLASSPATH. Right now the only way to specify additional jars when 
> launching daemons such as the History Server is SPARK_DIST_CLASSPATH 
> (https://spark.apache.org/docs/latest/hadoop-provided.html), but that is, I 
> presume, a distribution classpath. It would be nice to have a config similar 
> to spark.driver.extraClassPath for launching daemons such as the History 
> Server. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.
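
Assuming the key lands under the proposed name, usage would mirror the Parquet 
setting; today a similar per-write effect is available via the ORC compression 
option. A sketch (spark and df are assumed to be in scope):

{code:java}
// Proposed session-level default (key name as suggested in this issue):
spark.conf.set("spark.sql.orc.compression.codec", "zlib")
df.write.orc("/tmp/orc_default_codec")

// Existing per-write equivalent:
df.write.option("compression", "zlib").orc("/tmp/orc_explicit_codec")
{code}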



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec` like 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-08-25 Thread Arthur Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142337#comment-16142337
 ] 

Arthur Rand commented on SPARK-16742:
-

Gotcha, https://issues.apache.org/jira/browse/SPARK-21842 tracks this work. 

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>Assignee: Arthur Rand
> Fix For: 2.3.0
>
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21842) Support Kerberos ticket renewal and creation in Mesos

2017-08-25 Thread Arthur Rand (JIRA)
Arthur Rand created SPARK-21842:
---

 Summary: Support Kerberos ticket renewal and creation in Mesos 
 Key: SPARK-21842
 URL: https://issues.apache.org/jira/browse/SPARK-21842
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.3.0
Reporter: Arthur Rand
 Fix For: 2.3.0


We at Mesosphere have written Kerberos support for Spark on Mesos. The code to 
use Kerberos on a Mesos cluster has been added to Apache Spark (SPARK-16742). 
This ticket is to complete the implementation and allow for ticket renewal and 
creation, specifically for long-running and streaming jobs.

Mesosphere design doc (needs revision, wip): 
https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142227#comment-16142227
 ] 

Apache Spark commented on SPARK-17321:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/19032

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0, 2.1.1
>Reporter: yunjiong zhao
>
> We run Spark on YARN. After enabling Spark dynamic allocation, we noticed 
> some Spark applications failing randomly due to YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose 
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good 
> disk if the first one is broken?
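
A minimal sketch of the fallback being asked for here (illustration only, not 
the actual YarnShuffleService code):

{code:java}
import java.io.File

// Pick the first healthy dir from yarn.nodemanager.local-dirs instead of
// failing when dirs(0) is broken.
def firstUsableDir(localDirs: Seq[String]): Option[File] =
  localDirs.iterator
    .map(new File(_))
    .find(d => d.isDirectory && d.canRead && d.canWrite)
{code}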



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21841) Spark SQL doesn't pick up column added in hive when table created with saveAsTable

2017-08-25 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-21841:
-

 Summary: Spark SQL doesn't pick up column added in hive when table 
created with saveAsTable
 Key: SPARK-21841
 URL: https://issues.apache.org/jira/browse/SPARK-21841
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Thomas Graves


If you create a table in Spark SQL but then modify the table in Hive to add 
a column, Spark SQL doesn't pick up the new column.

Basic example:
{code}
t1 = spark.sql("select ip_address from mydb.test_table limit 1")
t1.show()

+------------+
|  ip_address|
+------------+
|   1.30.25.5|
+------------+

t1.write.saveAsTable('mydb.t1')

In Hive:
alter table mydb.t1 add columns (bcookie string)

t1 = spark.table("mydb.t1")
t1.show()
+------------+
|  ip_address|
+------------+
|   1.30.25.5|
+------------+
{code}

It looks like it's because Spark SQL is picking up the schema from 
spark.sql.sources.schema.part.0 rather than from Hive. 

Interestingly enough, it appears that if you create the table differently, like:
spark.sql("create table mydb.t1 select ip_address from mydb.test_table limit 1")
then run your alter table on mydb.t1 and
val t1 = spark.table("mydb.t1")

it works properly.

It looks like the difference is that, when it doesn't work, 
spark.sql.sources.provider=parquet is set.
It does this from createDataSourceTable, where the provider is parquet.
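
One way to see which path a table took is to inspect its table properties; a 
sketch, assuming the tables from the example above exist:

{code:java}
// If spark.sql.sources.provider=parquet shows up here, Spark SQL reads the
// schema from the spark.sql.sources.schema.part.* properties, not from Hive.
spark.sql("SHOW TBLPROPERTIES mydb.t1").show(truncate = false)
{code}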



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20812:


Assignee: (was: Apache Spark)

> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>
> Mesos 1.4 will support secrets. In order to send keytabs, or any other 
> secret, through the Spark Dispatcher, we need to integrate this with the 
> dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.
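
For illustration only, a submission could carry a secret roughly like this; 
the property names and paths below are hypothetical, since the final naming 
was still being settled:

{noformat}
# Hypothetical configuration, for illustration:
spark.mesos.driver.secret.names=keytab
spark.mesos.driver.secret.filenames=/mnt/secrets/user.keytab   # file-based
spark.mesos.driver.secret.envkeys=KRB5_KEYTAB                  # env-based
{noformat}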



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20812:


Assignee: Apache Spark

> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>Assignee: Apache Spark
>
> Mesos 1.4 will support secrets. In order to send keytabs, or any other 
> secret, through the Spark Dispatcher, we need to integrate this with the 
> dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142201#comment-16142201
 ] 

Apache Spark commented on SPARK-20812:
--

User 'ArtRand' has created a pull request for this issue:
https://github.com/apache/spark/pull/18837

> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>
> Mesos 1.4 will support secrets. In order to send keytabs, or any other 
> secret, through the Spark Dispatcher, we need to integrate this with the 
> dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21806:


Assignee: (was: Apache Spark)

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> --
>
> Key: SPARK-21806
> URL: https://issues.apache.org/jira/browse/SPARK-21806
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.2.0
>Reporter: Marc Kaminski
>Priority: Minor
> Attachments: PRROC_example.jpeg
>
>
> I would like to reference a [discussion in scikit-learn| 
> https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior 
> is probably based on the scikit-learn implementation. 
> Summary: 
> Currently, the y-axis intercept of the precision-recall curve is set to (0.0, 
> 1.0). This behavior is not ideal in certain edge cases (see the example 
> below) and can also have an impact on cross-validation when the optimization 
> metric is set to "areaUnderPR". 
> Please consider [blucena's 
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
>  for possible alternatives. 
> Edge case example: 
> Consider a bad classifier that assigns a high probability to all samples. A 
> possible output might look like this: 
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (first line set by default): 
> ||Threshold || Recall ||Precision ||
> |1.0 | 0.0 | 1.0 | 
> |0.95| 1.0 | 0.2 |
> |0.0| 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated 
> probability assignment will falsely be assumed to perform worse with regard 
> to this auPRC.
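
The edge case is easy to reproduce with MLlib directly; a minimal sketch 
mirroring the (score, label) table above (sc is assumed to be a SparkContext):

{code:java}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// (score, label): 2 positives and 8 negatives at 1.0, 2 negatives at 0.95.
val scoreAndLabels = sc.parallelize(
  Seq((1.0, 1.0), (1.0, 1.0)) ++ Seq.fill(8)((1.0, 0.0)) ++ Seq.fill(2)((0.95, 0.0)))
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
metrics.pr().collect().foreach(println)  // first point is the inserted (0.0, 1.0)
println(metrics.areaUnderPR())           // inflated by that first point
{code}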



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21765) Ensure all leaf nodes that are derived from streaming sources have isStreaming=true

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142171#comment-16142171
 ] 

Apache Spark commented on SPARK-21765:
--

User 'joseph-torres' has created a pull request for this issue:
https://github.com/apache/spark/pull/19056

> Ensure all leaf nodes that are derived from streaming sources have 
> isStreaming=true
> ---
>
> Key: SPARK-21765
> URL: https://issues.apache.org/jira/browse/SPARK-21765
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jose Torres
> Fix For: 3.0.0
>
>
> LogicalPlan has an isStreaming bit, but it's incompletely implemented. Some 
> streaming sources don't set the bit, and the bit can sometimes be lost in 
> rewriting. Setting the bit for all plans that are logically streaming will 
> help us simplify the logic around checking query plan validity.
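
The bit in question is the one surfaced as Dataset.isStreaming; a quick check, 
assuming a Spark 2.2+ session named spark:

{code:java}
// A leaf node derived from a streaming source should report isStreaming = true.
val streamingDf = spark.readStream.format("rate").load()
println(streamingDf.isStreaming)      // true
println(spark.range(10).isStreaming)  // false: batch-only plan
{code}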



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21806:


Assignee: Apache Spark

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> --
>
> Key: SPARK-21806
> URL: https://issues.apache.org/jira/browse/SPARK-21806
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.2.0
>Reporter: Marc Kaminski
>Assignee: Apache Spark
>Priority: Minor
> Attachments: PRROC_example.jpeg
>
>
> I would like to reference a [discussion in scikit-learn| 
> https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior 
> is probably based on the scikit-learn implementation. 
> Summary: 
> Currently, the y-axis intercept of the precision-recall curve is set to (0.0, 
> 1.0). This behavior is not ideal in certain edge cases (see the example 
> below) and can also have an impact on cross-validation when the optimization 
> metric is set to "areaUnderPR". 
> Please consider [blucena's 
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
>  for possible alternatives. 
> Edge case example: 
> Consider a bad classifier that assigns a high probability to all samples. A 
> possible output might look like this: 
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (first line set by default): 
> ||Threshold || Recall ||Precision ||
> |1.0 | 0.0 | 1.0 | 
> |0.95| 1.0 | 0.2 |
> |0.0| 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated 
> probability assignment will falsely be assumed to perform worse with regard 
> to this auPRC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21806:


Assignee: Apache Spark

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> --
>
> Key: SPARK-21806
> URL: https://issues.apache.org/jira/browse/SPARK-21806
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.2.0
>Reporter: Marc Kaminski
>Assignee: Apache Spark
>Priority: Minor
> Attachments: PRROC_example.jpeg
>
>
> I would like to reference to a [discussion in scikit-learn| 
> https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior 
> is probably based on the scikit implementation. 
> Summary: 
> Currently, the y-axis intercept of the precision recall curve is set to (0.0, 
> 1.0). This behavior is not ideal in certain edge cases (see example below) 
> and can also have an impact on cross validation, when optimization metric is 
> set to "areaUnderPR". 
> Please consider [blucena's 
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
>  for possible alternatives. 
> Edge case example: 
> Consider a bad classifier, that assigns a high probability to all samples. A 
> possible output might look like this: 
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following pr points (first line set by default): 
> ||Threshold || Recall ||Precision ||
> |1.0 | 0.0 | 1.0 | 
> |0.95| 1.0 | 0.2 |
> |0.0| 1.0 | 0,16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated 
> probability assignment  will be falsely assumed to perform worse in regard to 
> this auPRC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21806:


Assignee: (was: Apache Spark)

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> --
>
> Key: SPARK-21806
> URL: https://issues.apache.org/jira/browse/SPARK-21806
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.2.0
>Reporter: Marc Kaminski
>Priority: Minor
> Attachments: PRROC_example.jpeg
>
>
> I would like to reference a [discussion in 
> scikit-learn|https://github.com/scikit-learn/scikit-learn/issues/4223], as 
> this behavior is probably based on the scikit-learn implementation. 
> Summary: 
> Currently, the y-axis intercept of the precision-recall curve is set to (0.0, 
> 1.0). This behavior is not ideal in certain edge cases (see example below) 
> and can also have an impact on cross validation when the optimization metric 
> is set to "areaUnderPR". 
> Please consider [blucena's 
> post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
>  for possible alternatives. 
> Edge case example: 
> Consider a bad classifier that assigns a high probability to all samples. A 
> possible output might look like this: 
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (the first line is set by default): 
> ||Threshold || Recall || Precision ||
> |1.0 | 0.0 | 1.0 |
> |0.95 | 1.0 | 0.2 |
> |0.0 | 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated 
> probability assignment will be falsely assumed to perform worse with regard to 
> this auPRC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21728:


Assignee: Apache Spark

> Allow SparkSubmit to use logging
> 
>
> Key: SPARK-21728
> URL: https://issues.apache.org/jira/browse/SPARK-21728
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, code in {{SparkSubmit}} cannot call classes or methods that 
> initialize the Spark {{Logging}} framework. That is because at that time 
> {{SparkSubmit}} doesn't yet know which application will run, and logging is 
> initialized differently for certain special applications (notably, the 
> shells).
> It would be better if either {{SparkSubmit}} did logging initialization 
> earlier based on the application to be run, or did it in a way that could be 
> overridden later when the app initializes.
> Without this, there are currently a few parts of {{SparkSubmit}} that 
> duplicate code from other parts of Spark just to avoid logging. For example:
> * 
> [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860]
>  replicates code from Utils.scala
> * 
> [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54]
>  replicates code from Utils.scala and installs its own shutdown hook
> * a few parts of the code could use {{SparkConf}} but can't right now because 
> of the logging issue.
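
As a rough illustration of the "overridable later" option, here is a hedged sketch; the object and method names below are hypothetical, not Spark's actual {{Logging}} API:

{code}
// Hypothetical sketch: submit-time logging defaults that the application
// can later discard and replace with its own configuration.
object SubmitLogging {
  @volatile private var initializedBySubmit = false

  // Applied lazily the first time anything logs during spark-submit.
  def initializeDefaults(): Unit = synchronized {
    if (!initializedBySubmit) {
      // ... install conservative log4j defaults here ...
      initializedBySubmit = true
    }
  }

  // Called once the real application (e.g. a shell) knows its own setup;
  // drops the submit-time defaults so the app can reapply its own.
  def resetForApplication(): Unit = synchronized {
    initializedBySubmit = false
  }
}
{code}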



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21728:


Assignee: (was: Apache Spark)

> Allow SparkSubmit to use logging
> 
>
> Key: SPARK-21728
> URL: https://issues.apache.org/jira/browse/SPARK-21728
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Currently, code in {{SparkSubmit}} cannot call classes or methods that 
> initialize the Spark {{Logging}} framework. That is because at that time 
> {{SparkSubmit}} doesn't yet know which application will run, and logging is 
> initialized differently for certain special applications (notably, the 
> shells).
> It would be better if either {{SparkSubmit}} did logging initialization 
> earlier based on the application to be run, or did it in a way that could be 
> overridden later when the app initializes.
> Without this, there are currently a few parts of {{SparkSubmit}} that 
> duplicate code from other parts of Spark just to avoid logging. For example:
> * 
> [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860]
>  replicates code from Utils.scala
> * 
> [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54]
>  replicates code from Utils.scala and installs its own shutdown hook
> * a few parts of the code could use {{SparkConf}} but can't right now because 
> of the logging issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142151#comment-16142151
 ] 

Apache Spark commented on SPARK-21728:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/19013

> Allow SparkSubmit to use logging
> 
>
> Key: SPARK-21728
> URL: https://issues.apache.org/jira/browse/SPARK-21728
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Currently, code in {{SparkSubmit}} cannot call classes or methods that 
> initialize the Spark {{Logging}} framework. That is because at that time 
> {{SparkSubmit}} doesn't yet know which application will run, and logging is 
> initialized differently for certain special applications (notably, the 
> shells).
> It would be better if either {{SparkSubmit}} did logging initialization 
> earlier based on the application to be run, or did it in a way that could be 
> overridden later when the app initializes.
> Without this, there are currently a few parts of {{SparkSubmit}} that 
> duplicate code from other parts of Spark just to avoid logging. For example:
> * 
> [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860]
>  replicates code from Utils.scala
> * 
> [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54]
>  replicates code from Utils.scala and installs its own shutdown hook
> * a few parts of the code could use {{SparkConf}} but can't right now because 
> of the logging issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142149#comment-16142149
 ] 

Apache Spark commented on SPARK-21728:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/19013

> Allow SparkSubmit to use logging
> 
>
> Key: SPARK-21728
> URL: https://issues.apache.org/jira/browse/SPARK-21728
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Currently, code in {{SparkSubmit}} cannot call classes or methods that 
> initialize the Spark {{Logging}} framework. That is because at that time 
> {{SparkSubmit}} doesn't yet know which application will run, and logging is 
> initialized differently for certain special applications (notably, the 
> shells).
> It would be better if either {{SparkSubmit}} did logging initialization 
> earlier based on the application to be run, or did it in a way that could be 
> overridden later when the app initializes.
> Without this, there are currently a few parts of {{SparkSubmit}} that 
> duplicate code from other parts of Spark just to avoid logging. For example:
> * 
> [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860]
>  replicates code from Utils.scala
> * 
> [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54]
>  replicates code from Utils.scala and installs its own shutdown hook
> * a few parts of the code could use {{SparkConf}} but can't right now because 
> of the logging issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21837) UserDefinedTypeSuite local UDFs not actually testing what it intends

2017-08-25 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21837.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> UserDefinedTypeSuite local UDFs not actually testing what it intends
> 
>
> Key: SPARK-21837
> URL: https://issues.apache.org/jira/browse/SPARK-21837
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.3.0
>
>
> Consider this test in {{UserDefinedTypeSuite}}:
> {code}
>   test("Local UDTs") {
> val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec")
> df.collect()(0).getAs[UDT.MyDenseVector](1)
> df.take(1)(0).getAs[UDT.MyDenseVector](1)
> 
> df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0)
> df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
>   .getAs[UDT.MyDenseVector](0)
>   }
> {code}
> I claim the last two lines can't be right, because they say that the first 
> column in the aggregation is the vector, when it is the grouping key (int). 
> But it passes! 
> But it started failing when I made seemingly unrelated changes in 
> https://github.com/apache/spark/pull/18645 like:
> {code}
> [info] - Local UDTs *** FAILED *** (144 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
> {code}
> I modified the test to actually assert that the vector that results in each 
> case is the expected one, and it began failing with the same error, in 
> master. Therefore I am pretty sure the test is not quite doing what it seems 
> to want to, and the result of these expressions just happened to not be fully 
> evaluated or checked.
> CC [~marmbrus] for the discussion at 
> https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11##commitcomment-23320234
>  and apologies if I'm still really missing something here. I'll open a PR to 
> show you what I mean.
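
For illustration, a hedged sketch of the assertion the corrected test would make: the grouping key 'int is column 0 and the aggregated vector is column 1, so the vector must be read from index 1 and compared against the expected value (assuming the suite's existing {{df}} and {{UDT.MyDenseVector}}):

{code}
// Sketch only: assert the aggregated vector explicitly rather than calling
// getAs on the wrong column and discarding the result.
val expected = new UDT.MyDenseVector(Array(0.1, 1.0))
val row = df.limit(1).groupBy('int).agg(first('vec)).collect()(0)
assert(row.getAs[Int](0) == 1)                       // column 0: grouping key
assert(row.getAs[UDT.MyDenseVector](1) == expected)  // column 1: first('vec)
{code}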



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21837) UserDefinedTypeSuite local UDFs not actually testing what it intends

2017-08-25 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21837:


Assignee: Sean Owen  (was: Apache Spark)

> UserDefinedTypeSuite local UDFs not actually testing what it intends
> 
>
> Key: SPARK-21837
> URL: https://issues.apache.org/jira/browse/SPARK-21837
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Consider this test in {{UserDefinedTypeSuite}}:
> {code}
>   test("Local UDTs") {
> val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec")
> df.collect()(0).getAs[UDT.MyDenseVector](1)
> df.take(1)(0).getAs[UDT.MyDenseVector](1)
> 
> df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0)
> df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
>   .getAs[UDT.MyDenseVector](0)
>   }
> {code}
> I claim the last two lines can't be right, because they say that the first 
> column in the aggregation is the vector, when it is the grouping key (int). 
> But it passes! 
> But it started failing when I made seemingly unrelated changes in 
> https://github.com/apache/spark/pull/18645 like:
> {code}
> [info] - Local UDTs *** FAILED *** (144 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
> {code}
> I modified the test to actually assert that the vector that results in each 
> case is the expected one, and it began failing with the same error, in 
> master. Therefore I am pretty sure the test is not quite doing what it seems 
> to want to, and the result of these expressions just happened to not be fully 
> evaluated or checked.
> CC [~marmbrus] for the discussion at 
> https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11##commitcomment-23320234
>  and apologies if I'm still really missing something here. I'll open a PR to 
> show you what I mean.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20682) Support a new faster ORC data source based on Apache ORC

2017-08-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142138#comment-16142138
 ] 

Apache Spark commented on SPARK-20682:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/18953

> Support a new faster ORC data source based on Apache ORC
> 
>
> Key: SPARK-20682
> URL: https://issues.apache.org/jira/browse/SPARK-20682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.2, 1.6.3, 2.1.1, 2.2.0
>Reporter: Dongjoon Hyun
>
> Since SPARK-2883, Apache Spark has supported Apache ORC inside the `sql/hive` 
> module with a Hive dependency. This issue aims to add a new and faster ORC data 
> source inside `sql/core` and to replace the old ORC data source eventually. In 
> this issue, the latest Apache ORC 1.4.0 (released yesterday) is used.
> There are four key benefits.
> - Speed: Uses Spark `ColumnarBatch` and ORC `RowBatch` together, which is 
> faster than the current implementation in Spark.
> - Stability: Apache ORC 1.4.0 has many fixes, and we can depend on the ORC 
> community more.
> - Usability: Users can use the `ORC` data source without the hive module, 
> i.e., without `-Phive`.
> - Maintainability: Reduces the Hive dependency and allows removing old legacy 
> code later.
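
For context, a minimal usage sketch of the ORC path through the data source API (a hedged example: the output path and session setup are assumptions, not part of this issue):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-example").getOrCreate()
import spark.implicits._

// Write a small DataFrame as ORC, then read it back through the same API.
val df = spark.range(10).withColumn("squared", $"id" * $"id")
df.write.mode("overwrite").orc("/tmp/example_orc")

val readBack = spark.read.orc("/tmp/example_orc")
readBack.show()
{code}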



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


