[jira] [Issue Comment Deleted] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice
[ https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lev Katzav updated SPARK-8582:
------------------------------
    Comment: was deleted

(was: Any update on this? What are the plans for Spark 2? Thanks)

> Optimize checkpointing to avoid computing an RDD twice
>
> Key: SPARK-8582
> URL: https://issues.apache.org/jira/browse/SPARK-8582
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Andrew Or
> Assignee: Shixiong Zhu
>
> In Spark, checkpointing allows the user to truncate the lineage of an RDD and save the intermediate contents to HDFS for fault tolerance. However, this is not currently implemented efficiently: every time we checkpoint an RDD, we actually compute it twice, once during the action that triggered the checkpointing in the first place, and once while we checkpoint (we iterate through the RDD's partitions and write them to disk). See this line for more detail: https://github.com/apache/spark/blob/0401cbaa8ee51c71f43604f338b65022a479da0a/core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala#L102
> Instead, we should have a `CheckpointingIterator` that writes checkpoint data to HDFS while we run the action. This would speed up many usages of `RDD#checkpoint` by up to 2x.
> (Alternatively, the user can cache the RDD before checkpointing it, but this is not always viable for very large input data. It's also not a great API to use in general.)

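The proposal is easiest to see as a "tee" iterator. Below is a minimal sketch in Scala of that idea (the class and its shape are hypothetical illustrations, not the actual Spark patch): it wraps the partition iterator the action consumes and serializes each element to the checkpoint stream as it passes through, so the partition is computed only once.

{code:scala}
import java.io.{ObjectOutputStream, OutputStream}

// Hypothetical sketch, not the actual Spark implementation: wrap the
// partition iterator that the action consumes and write each element to a
// checkpoint stream while the action iterates over it.
class CheckpointingIterator[T <: AnyRef](
    underlying: Iterator[T],
    checkpointStream: OutputStream) extends Iterator[T] {

  private val out = new ObjectOutputStream(checkpointStream)

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more) out.close() // finalize the checkpoint file once exhausted
    more
  }

  override def next(): T = {
    val elem = underlying.next()
    out.writeObject(elem) // persist the element as it flows to the action
    elem
  }
}
{code}
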
[jira] [Assigned] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely
[ https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21751:
------------------------------------
    Assignee: Apache Spark

> CodeGenerator.splitExpressions counts code size more precisely
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Kazuaki Ishizaki
> Assignee: Apache Spark
> Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements when their total length exceeds 1200 characters. That length may include comments or empty lines. It would be good to exclude comments and empty lines, to reduce the number of generated methods in a class.

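As a rough illustration of the proposed counting rule (a sketch only; {{effectiveCodeSize}} is a hypothetical helper, not the actual patch), the length check could ignore blank lines and comment lines when deciding whether to split:

{code:scala}
// Hypothetical helper: count only the characters of lines that carry real
// code, so comments and blank lines no longer push a block over the
// 1200-character split threshold.
def effectiveCodeSize(statements: Seq[String]): Int =
  statements
    .flatMap(_.split("\n"))
    .map(_.trim)
    .filterNot(line =>
      line.isEmpty || line.startsWith("//") ||
      line.startsWith("/*") || line.startsWith("*"))
    .map(_.length)
    .sum
{code}
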
[jira] [Assigned] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely
[ https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21751:
------------------------------------
    Assignee: (was: Apache Spark)

> CodeGenerator.splitExpressions counts code size more precisely
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Kazuaki Ishizaki
> Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements when their total length exceeds 1200 characters. That length may include comments or empty lines. It would be good to exclude comments and empty lines, to reduce the number of generated methods in a class.

[jira] [Assigned] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.
[ https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21781:
------------------------------------
    Assignee: (was: Apache Spark)

> Modify DataSourceScanExec to use concrete ColumnVector type.
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Takuya Ueshin
>
> As mentioned at https://github.com/apache/spark/pull/18680#issuecomment-316820409, having more {{ColumnVector}} implementations might (or might not) have significant performance implications, because it might disable inlining or force virtual dispatches.
> On the read path, one of the major paths is the code generated by {{ColumnBatchScan}}. Currently it refers to the abstract {{ColumnVector}}, so the penalty grows as more classes are added, but we can determine the concrete type from the usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can use the concrete type directly in the generated code to avoid the penalty.

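To make the devirtualization point concrete, here is a hedged sketch in Scala of what the codegen change amounts to (the variable names and the surrounding template are illustrative assumptions, not the actual DataSourceScanExec code): the scan picks the concrete vector class it knows its reader produces and splices that class name into the generated Java, giving the JIT a monomorphic call site.

{code:scala}
// Illustrative only: pick the concrete ColumnVector class known from the
// reader, then splice that class name into the generated Java instead of
// the abstract type, so the call sites stay monomorphic.
val usesOffHeapMemory = false // assumption for this sketch

val vectorClass =
  if (usesOffHeapMemory) "org.apache.spark.sql.execution.vectorized.OffHeapColumnVector"
  else "org.apache.spark.sql.execution.vectorized.OnHeapColumnVector"

val generatedCode =
  s"""
     |$vectorClass col0 = ($vectorClass) batch.column(0);
     |int v = col0.getInt(rowIdx);  // monomorphic call site, inlinable by the JIT
     |""".stripMargin
{code}
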
[jira] [Assigned] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.
[ https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21781:
------------------------------------
    Assignee: Apache Spark

> Modify DataSourceScanExec to use concrete ColumnVector type.
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Takuya Ueshin
> Assignee: Apache Spark
>
> As mentioned at https://github.com/apache/spark/pull/18680#issuecomment-316820409, having more {{ColumnVector}} implementations might (or might not) have significant performance implications, because it might disable inlining or force virtual dispatches.
> On the read path, one of the major paths is the code generated by {{ColumnBatchScan}}. Currently it refers to the abstract {{ColumnVector}}, so the penalty grows as more classes are added, but we can determine the concrete type from the usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can use the concrete type directly in the generated code to avoid the penalty.

[jira] [Commented] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.
[ https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142606#comment-16142606 ]

Apache Spark commented on SPARK-21781:
--------------------------------------
User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/18989

> Modify DataSourceScanExec to use concrete ColumnVector type.
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Takuya Ueshin
>
> As mentioned at https://github.com/apache/spark/pull/18680#issuecomment-316820409, having more {{ColumnVector}} implementations might (or might not) have significant performance implications, because it might disable inlining or force virtual dispatches.
> On the read path, one of the major paths is the code generated by {{ColumnBatchScan}}. Currently it refers to the abstract {{ColumnVector}}, so the penalty grows as more classes are added, but we can determine the concrete type from the usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can use the concrete type directly in the generated code to avoid the penalty.

[jira] [Commented] (SPARK-21832) Merge SQLBuilderTest into ExpressionSQLBuilderSuite
[ https://issues.apache.org/jira/browse/SPARK-21832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142580#comment-16142580 ]

Apache Spark commented on SPARK-21832:
--------------------------------------
User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/19044

> Merge SQLBuilderTest into ExpressionSQLBuilderSuite
>
> Key: SPARK-21832
> URL: https://issues.apache.org/jira/browse/SPARK-21832
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.3.0
>
> After SPARK-19025, there is no need to keep SQLBuilderTest; ExpressionSQLBuilderSuite is the only place that uses it. This issue aims to remove SQLBuilderTest.

[jira] [Closed] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li closed SPARK-21831.
---------------------------
    Resolution: Fixed
    Assignee: Dongjoon Hyun
    Fix Version/s: 2.3.0

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.3.0
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in HiveCompatibilitySuite:
> {code}
> // Ensures that the plans generation use metastore relation and not OrcRelation
> // Was done because SqlBuilder does not work with plans having logical relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}

[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21831:
------------------------------------
    Assignee: (was: Apache Spark)

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Dongjoon Hyun
> Priority: Minor
> Fix For: 2.3.0
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in HiveCompatibilitySuite:
> {code}
> // Ensures that the plans generation use metastore relation and not OrcRelation
> // Was done because SqlBuilder does not work with plans having logical relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}

[jira] [Assigned] (SPARK-21831) Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
[ https://issues.apache.org/jira/browse/SPARK-21831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21831:
------------------------------------
    Assignee: Apache Spark

> Remove `spark.sql.hive.convertMetastoreOrc` config in HiveCompatibilitySuite
>
> Key: SPARK-21831
> URL: https://issues.apache.org/jira/browse/SPARK-21831
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Minor
> Fix For: 2.3.0
>
> SPARK-19025 removes SQLBuilder, so we need to remove the following in HiveCompatibilitySuite:
> {code}
> // Ensures that the plans generation use metastore relation and not OrcRelation
> // Was done because SqlBuilder does not work with plans having logical relation
> TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
> {code}

[jira] [Assigned] (SPARK-21834) Incorrect executor request in case of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21834:
------------------------------------
    Assignee: (was: Apache Spark)

> Incorrect executor request in case of dynamic allocation
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.2.0
> Reporter: Sital Kedia
>
> The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself.

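One way to read the report is that the kill path needs a flag separating "kill and shrink the target" from "kill only". The Scala sketch below is a simplified illustration under that assumption (all names hypothetical; this is not the actual CoarseGrainedSchedulerBackend code): the dynamic-allocation manager, which tracks its own target, would pass {{adjustTargetNumExecutors = false}} and avoid the double accounting.

{code:scala}
// Hypothetical sketch of the fix direction: let the caller decide whether
// killing executors should also lower the scheduler's target executor count.
object SchedulerBackendSketch {
  private var requestedTotalExecutors: Int = 10

  def killExecutors(
      executorIds: Seq[String],
      adjustTargetNumExecutors: Boolean): Seq[String] = {
    if (adjustTargetNumExecutors) {
      // A user-initiated kill shrinks the requested total...
      requestedTotalExecutors = math.max(0, requestedTotalExecutors - executorIds.size)
    }
    // ...while a kill from the dynamic allocator leaves it untouched.
    // (Actually signaling the cluster manager is elided in this sketch.)
    executorIds
  }
}
{code}
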
[jira] [Assigned] (SPARK-21834) Incorrect executor request in case of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21834:
------------------------------------
    Assignee: Apache Spark

> Incorrect executor request in case of dynamic allocation
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.2.0
> Reporter: Sital Kedia
> Assignee: Apache Spark
>
> The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself.

[jira] [Commented] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely
[ https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142535#comment-16142535 ]

Apache Spark commented on SPARK-21751:
--------------------------------------
User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/18966

> CodeGenerator.splitExpressions counts code size more precisely
>
> Key: SPARK-21751
> URL: https://issues.apache.org/jira/browse/SPARK-21751
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Kazuaki Ishizaki
> Priority: Minor
>
> Currently, {{CodeGenerator.splitExpressions}} splits statements when their total length exceeds 1200 characters. That length may include comments or empty lines. It would be good to exclude comments and empty lines, to reduce the number of generated methods in a class.

[jira] [Assigned] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
[ https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21843:
------------------------------------
    Assignee: (was: Apache Spark)

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: iamhumanbeing
> Priority: Minor
> Fix For: 2.3.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite.

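In isolation, the reported problem and the suggested correction look like this (a sketch; the surrounding ExchangeCoordinatorSuite setup is elided):

{code:scala}
// Example partition count a test might configure.
val numPartitions = 5

// Before: the note always claims 3, whatever the test actually configured.
val buggyNote = "(minNumPostShufflePartitions: 3)"

// After: the note interpolates the real value.
val testNameNote = "(minNumPostShufflePartitions: " + numPartitions + ")"
{code}
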
[jira] [Assigned] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
[ https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21843:
------------------------------------
    Assignee: Apache Spark

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: iamhumanbeing
> Assignee: Apache Spark
> Priority: Minor
> Fix For: 2.3.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite.

[jira] [Commented] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
[ https://issues.apache.org/jira/browse/SPARK-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142505#comment-16142505 ]

Apache Spark commented on SPARK-21843:
--------------------------------------
User 'iamhumanbeing' has created a pull request for this issue:
https://github.com/apache/spark/pull/19057

> testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
>
> Key: SPARK-21843
> URL: https://issues.apache.org/jira/browse/SPARK-21843
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: iamhumanbeing
> Priority: Minor
> Fix For: 2.3.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> testNameNote = "(minNumPostShufflePartitions: 3)" is not correct; it should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite.

[jira] [Commented] (SPARK-21834) Incorrect executor request in case of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-21834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142468#comment-16142468 ]

Apache Spark commented on SPARK-21834:
--------------------------------------
User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/19048

> Incorrect executor request in case of dynamic allocation
>
> Key: SPARK-21834
> URL: https://issues.apache.org/jira/browse/SPARK-21834
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 2.2.0
> Reporter: Sital Kedia
>
> The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself.

[jira] [Closed] (SPARK-21772) HiveException unable to move results from srcf to destf in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-21772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li closed SPARK-21772.
---------------------------
    Resolution: Cannot Reproduce

> HiveException unable to move results from srcf to destf in InsertIntoHiveTable
>
> Key: SPARK-21772
> URL: https://issues.apache.org/jira/browse/SPARK-21772
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.2.0
> Environment: JDK 1.7, CentOS 6.3, Spark 2.1
> Reporter: liupengcheng
> Labels: sql
>
> Currently, executing {{create table as select}} would return the following exception:
> {code:java}
> 2017-08-17,16:14:18,792 ERROR org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:346)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:770)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:770)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:770)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:316)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:262)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:261)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:769)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:765)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:763)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:763)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:100)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:763)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:323)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)
>         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:119)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>         at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:92)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at

[jira] [Commented] (SPARK-21691) Accessing canonicalized plan for query with limit throws exception
[ https://issues.apache.org/jira/browse/SPARK-21691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142395#comment-16142395 ] Xiao Li commented on SPARK-21691: - If you really need to call `canonicalized`, you can do it like this: {noformat} session.sql("select * from (values 0, 1) limit 1").queryExecution.analyzed.canonicalized {noformat} > Accessing canonicalized plan for query with limit throws exception > -- > > Key: SPARK-21691 > URL: https://issues.apache.org/jira/browse/SPARK-21691 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Bjoern Toldbod > > Accessing the logical, canonicalized plan fails for queries with limits. > The following demonstrates the issue: > {code:java} > val session = SparkSession.builder.master("local").getOrCreate() > // This works > session.sql("select * from (values 0, > 1)").queryExecution.logical.canonicalized > // This fails > session.sql("select * from (values 0, 1) limit > 1").queryExecution.logical.canonicalized > {code} > The message in the thrown exception is somewhat confusing (or at least not > directly related to the limit): > "Invalid call to toAttribute on unresolved object, tree: *" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21691) Accessing canonicalized plan for query with limit throws exception
[ https://issues.apache.org/jira/browse/SPARK-21691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142391#comment-16142391 ] Xiao Li commented on SPARK-21691: - This is a pretty internal API. We do not expect users to call it. Could you show us why you need it? > Accessing canonicalized plan for query with limit throws exception > -- > > Key: SPARK-21691 > URL: https://issues.apache.org/jira/browse/SPARK-21691 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Bjoern Toldbod > > Accessing the logical, canonicalized plan fails for queries with limits. > The following demonstrates the issue: > {code:java} > val session = SparkSession.builder.master("local").getOrCreate() > // This works > session.sql("select * from (values 0, > 1)").queryExecution.logical.canonicalized > // This fails > session.sql("select * from (values 0, 1) limit > 1").queryExecution.logical.canonicalized > {code} > The message in the thrown exception is somewhat confusing (or at least not > directly related to the limit): > "Invalid call to toAttribute on unresolved object, tree: *" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21843) testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite
iamhumanbeing created SPARK-21843: - Summary: testNameNote should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite Key: SPARK-21843 URL: https://issues.apache.org/jira/browse/SPARK-21843 Project: Spark Issue Type: Test Components: Tests Affects Versions: 2.3.0 Reporter: iamhumanbeing Priority: Minor Fix For: 2.3.0 testNameNote = "(minNumPostShufflePartitions: 3)" is not correct. It should be "(minNumPostShufflePartitions: " + numPartitions + ")" in ExchangeCoordinatorSuite. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
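A minimal sketch of the fix described above, assuming {{numPartitions}} is in scope as in the suite's loop; the interpolated form is an equivalent, more idiomatic Scala spelling:
{code:scala}
val numPartitions = 3 // stand-in; the suite iterates over several values

// concatenation, as the ticket proposes
val testNameNote = "(minNumPostShufflePartitions: " + numPartitions + ")"

// equivalent string interpolation
val testNameNote2 = s"(minNumPostShufflePartitions: $numPartitions)"
{code}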
[jira] [Assigned] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
[ https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21798: Assignee: Apache Spark > No config to replace deprecated SPARK_CLASSPATH config for launching daemons > like History Server > > > Key: SPARK-21798 > URL: https://issues.apache.org/jira/browse/SPARK-21798 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sanket Reddy >Assignee: Apache Spark >Priority: Minor > > The History Server launch uses SparkClassCommandBuilder for launching the server. > SPARK_CLASSPATH has been deprecated and removed. For > spark-submit this takes a different route: spark.driver.extraClasspath > takes care of specifying additional jars on the classpath that were > previously specified in SPARK_CLASSPATH. Right now the only way to specify > additional jars for launching daemons such as the history server is > SPARK_DIST_CLASSPATH > (https://spark.apache.org/docs/latest/hadoop-provided.html), but I presume > this is meant as a distribution classpath. It would be nice to have a > config like spark.driver.extraClasspath for launching daemons such as the > history server. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
[ https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21798: Assignee: (was: Apache Spark) > No config to replace deprecated SPARK_CLASSPATH config for launching daemons > like History Server > > > Key: SPARK-21798 > URL: https://issues.apache.org/jira/browse/SPARK-21798 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sanket Reddy >Priority: Minor > > The History Server launch uses SparkClassCommandBuilder for launching the server. > SPARK_CLASSPATH has been deprecated and removed. For > spark-submit this takes a different route: spark.driver.extraClasspath > takes care of specifying additional jars on the classpath that were > previously specified in SPARK_CLASSPATH. Right now the only way to specify > additional jars for launching daemons such as the history server is > SPARK_DIST_CLASSPATH > (https://spark.apache.org/docs/latest/hadoop-provided.html), but I presume > this is meant as a distribution classpath. It would be nice to have a > config like spark.driver.extraClasspath for launching daemons such as the > history server. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21798) No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
[ https://issues.apache.org/jira/browse/SPARK-21798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142357#comment-16142357 ] Apache Spark commented on SPARK-21798: -- User 'pgandhi999' has created a pull request for this issue: https://github.com/apache/spark/pull/19047 > No config to replace deprecated SPARK_CLASSPATH config for launching daemons > like History Server > > > Key: SPARK-21798 > URL: https://issues.apache.org/jira/browse/SPARK-21798 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sanket Reddy >Priority: Minor > > The History Server launch uses SparkClassCommandBuilder for launching the server. > SPARK_CLASSPATH has been deprecated and removed. For > spark-submit this takes a different route: spark.driver.extraClasspath > takes care of specifying additional jars on the classpath that were > previously specified in SPARK_CLASSPATH. Right now the only way to specify > additional jars for launching daemons such as the history server is > SPARK_DIST_CLASSPATH > (https://spark.apache.org/docs/latest/hadoop-provided.html), but I presume > this is meant as a distribution classpath. It would be nice to have a > config like spark.driver.extraClasspath for launching daemons such as the > history server. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
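For context, a hedged sketch of the spark-submit analog the ticket refers to: {{spark.driver.extraClassPath}} exists today for applications, while a comparable daemon-side knob is exactly what the ticket proposes and does not exist yet:
{code:scala}
import org.apache.spark.SparkConf

// application path: extra jars reach the driver classpath via this config
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/opt/extra/jars/*") // existing config

// a daemon-side equivalent (history server etc.) is the ticket's proposal
{code}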
[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression
[ https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21839: Assignee: Apache Spark > Support SQL config for ORC compression > --- > > Key: SPARK-21839 > URL: https://issues.apache.org/jira/browse/SPARK-21839 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark > > This issue aims to provide `spark.sql.orc.compression.codec` like > `spark.sql.parquet.compression.codec`. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression
[ https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21839: Assignee: (was: Apache Spark) > Support SQL config for ORC compression > --- > > Key: SPARK-21839 > URL: https://issues.apache.org/jira/browse/SPARK-21839 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Dongjoon Hyun > > This issue aims to provide `spark.sql.orc.compression.codec` like > `spark.sql.parquet.compression.codec`. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
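A sketch of the intended usage, assuming an ORC-capable session; the config name and value are assumptions mirroring the existing Parquet analog until the ticket lands, whereas the per-write {{compression}} option exists today:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
val df = spark.range(10).toDF("id")

// today: the codec is chosen per write
df.write.format("orc").option("compression", "zlib").save("/tmp/orc_zlib")

// with the proposed config (assumed, mirroring spark.sql.parquet.compression.codec):
spark.conf.set("spark.sql.orc.compression.codec", "zlib")
df.write.format("orc").save("/tmp/orc_default_codec")
{code}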
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142337#comment-16142337 ] Arthur Rand commented on SPARK-16742: - Gotcha, https://issues.apache.org/jira/browse/SPARK-21842 is the ticket to track that work. > Kerberos support for Spark on Mesos > --- > > Key: SPARK-16742 > URL: https://issues.apache.org/jira/browse/SPARK-16742 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Michael Gummelt >Assignee: Arthur Rand > Fix For: 2.3.0 > > > We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be > contributing it to Apache Spark soon. > Mesosphere design doc: > https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6 > Mesosphere code: > https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21842) Support Kerberos ticket renewal and creation in Mesos
Arthur Rand created SPARK-21842: --- Summary: Support Kerberos ticket renewal and creation in Mesos Key: SPARK-21842 URL: https://issues.apache.org/jira/browse/SPARK-21842 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 2.3.0 Reporter: Arthur Rand Fix For: 2.3.0 We at Mesosphere have written Kerberos support for Spark on Mesos. The code to use Kerberos on a Mesos cluster has been added to Apache Spark (SPARK-16742). This ticket is to complete the implementation and allow for ticket renewal and creation, specifically for long-running and streaming jobs. Mesosphere design doc (needs revision, wip): https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
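As an illustration of the renewal mechanics involved, a sketch using Hadoop's UserGroupInformation API; the principal, keytab path, and hourly schedule are assumptions, not the ticket's actual design:
{code:scala}
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.security.UserGroupInformation

// log in once from a keytab (hypothetical principal and path)
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "spark/host@EXAMPLE.COM", "/etc/security/spark.keytab")

// re-login periodically so long-running and streaming jobs keep a valid TGT
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = ugi.checkTGTAndReloginFromKeytab()
}, 1, 1, TimeUnit.HOURS)
{code}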
[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
[ https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142227#comment-16142227 ] Apache Spark commented on SPARK-17321: -- User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/19032 > YARN shuffle service should use good disk from yarn.nodemanager.local-dirs > -- > > Key: SPARK-17321 > URL: https://issues.apache.org/jira/browse/SPARK-17321 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.2, 2.0.0, 2.1.1 >Reporter: yunjiong zhao > > We run Spark on YARN. After enabling Spark dynamic allocation, we noticed > some Spark applications failed randomly due to the YarnShuffleService. > From the log I found: > {quote} > 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: > Error while initializing Netty pipeline > java.lang.NullPointerException > at > org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77) > at > org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159) > at > org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116) > at > io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > {quote} > This was caused by the first disk in yarn.nodemanager.local-dirs being broken. > If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose > hundreds of nodes, which is unacceptable. > We have 12 disks in yarn.nodemanager.local-dirs, so why not use the other good > disks if the first one is broken? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
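A minimal sketch of the disk-selection idea; the helper is hypothetical (not the PR's code) and only illustrates probing each configured dir instead of blindly using the first:
{code:scala}
import java.io.File

// hypothetical helper: first usable dir from yarn.nodemanager.local-dirs
def firstGoodLocalDir(localDirs: Seq[String]): Option[File] =
  localDirs.iterator
    .map(new File(_))
    .find(d => d.isDirectory && d.canRead && d.canWrite)

// e.g. on a YARN node: firstGoodLocalDir(sys.env("LOCAL_DIRS").split(","))
{code}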
[jira] [Created] (SPARK-21841) Spark SQL doesn't pick up column added in hive when table created with saveAsTable
Thomas Graves created SPARK-21841: - Summary: Spark SQL doesn't pick up column added in hive when table created with saveAsTable Key: SPARK-21841 URL: https://issues.apache.org/jira/browse/SPARK-21841 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Thomas Graves If you create a table in Spark SQL but then modify the table in Hive to add a column, Spark SQL doesn't pick up the new column. Basic example: {code} t1 = spark.sql("select ip_address from mydb.test_table limit 1") t1.show() ++ | ip_address| ++ |1.30.25.5| ++ t1.write.saveAsTable('mydb.t1') In Hive: alter table mydb.t1 add columns (bcookie string) t1 = spark.table("mydb.t1") t1.show() ++ | ip_address| ++ |1.30.25.5| ++ {code} It looks like it's because Spark SQL is picking up the schema from spark.sql.sources.schema.part.0 rather than from Hive. Interestingly enough, it appears that if you create the table differently, like: spark.sql("create table mydb.t1 select ip_address from mydb.test_table limit 1"), then run your alter table on mydb.t1 and val t1 = spark.table("mydb.t1"), it works properly. It looks like the difference is that when it doesn't work, spark.sql.sources.provider=parquet is set. It's doing this from createDataSourceTable, where the provider is parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
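The working path the reporter describes, condensed into a sketch (table names are the report's own examples; the key difference is creating the table through a SQL CTAS rather than {{saveAsTable}}):
{code:scala}
// create the table via SQL CTAS instead of DataFrameWriter.saveAsTable
spark.sql("create table mydb.t1 as select ip_address from mydb.test_table limit 1")

// ... alter table mydb.t1 add columns (bcookie string) runs in Hive ...

// per the report, this path picks up the Hive-added column
spark.table("mydb.t1").printSchema()
{code}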
[jira] [Assigned] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20812: Assignee: (was: Apache Spark) > Add Mesos Secrets support to the spark dispatcher > - > > Key: SPARK-20812 > URL: https://issues.apache.org/jira/browse/SPARK-20812 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Michael Gummelt > > Mesos 1.4 will support secrets. In order to support sending keytabs, or any > other secret, through the Spark Dispatcher, we need to integrate Mesos > secrets with the Spark Dispatcher. > The integration should include support for both file-based and env-based > secrets. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20812: Assignee: Apache Spark > Add Mesos Secrets support to the spark dispatcher > - > > Key: SPARK-20812 > URL: https://issues.apache.org/jira/browse/SPARK-20812 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Michael Gummelt >Assignee: Apache Spark > > Mesos 1.4 will support secrets. In order to support sending keytabs, or any > other secret, through the Spark Dispatcher, we need to integrate Mesos > secrets with the Spark Dispatcher. > The integration should include support for both file-based and env-based > secrets. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142201#comment-16142201 ] Apache Spark commented on SPARK-20812: -- User 'ArtRand' has created a pull request for this issue: https://github.com/apache/spark/pull/18837 > Add Mesos Secrets support to the spark dispatcher > - > > Key: SPARK-20812 > URL: https://issues.apache.org/jira/browse/SPARK-20812 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Michael Gummelt > > Mesos 1.4 will support secrets. In order to support sending keytabs, or any > other secret, through the Spark Dispatcher, we need to integrate Mesos > secrets with the Spark Dispatcher. > The integration should include support for both file-based and env-based > secrets. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
[ https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21806: Assignee: (was: Apache Spark) > BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading > -- > > Key: SPARK-21806 > URL: https://issues.apache.org/jira/browse/SPARK-21806 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.2.0 >Reporter: Marc Kaminski >Priority: Minor > Attachments: PRROC_example.jpeg > > > I would like to refer to a [discussion in > scikit-learn|https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior > is probably based on the scikit-learn implementation. > Summary: > Currently, the y-axis intercept of the precision-recall curve is set to (0.0, > 1.0). This behavior is not ideal in certain edge cases (see example below) > and can also have an impact on cross validation, when the optimization metric is > set to "areaUnderPR". > Please consider [blucena's > post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613] > for possible alternatives. > Edge case example: > Consider a bad classifier that assigns a high probability to all samples. A > possible output might look like this: > ||Real label || Score || > |1.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 0.95 | > |0.0 | 0.95 | > |1.0 | 1.0 | > This results in the following pr points (first line set by default): > ||Threshold || Recall ||Precision || > |1.0 | 0.0 | 1.0 | > |0.95| 1.0 | 0.2 | > |0.0| 1.0 | 0.16 | > The auPRC would be around 0.6. Classifiers with a more differentiated > probability assignment will be falsely assumed to perform worse with regard to > this auPRC. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
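To see the behavior concretely, a small sketch against the MLlib API using the edge-case scores and labels from the table above; the synthetic (0.0, 1.0) point shows up as the first element of {{pr()}}:
{code:scala}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
// (score, label) pairs matching the description's table
val scoreAndLabels = spark.sparkContext.parallelize(Seq(
  (1.0, 1.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0),
  (1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0),
  (0.95, 0.0), (0.95, 0.0), (1.0, 1.0)))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
metrics.pr().collect().foreach(println) // first point is (0.0,1.0)
println(metrics.areaUnderPR())          // inflated by that synthetic point
{code}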
[jira] [Commented] (SPARK-21765) Ensure all leaf nodes that are derived from streaming sources have isStreaming=true
[ https://issues.apache.org/jira/browse/SPARK-21765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142171#comment-16142171 ] Apache Spark commented on SPARK-21765: -- User 'joseph-torres' has created a pull request for this issue: https://github.com/apache/spark/pull/19056 > Ensure all leaf nodes that are derived from streaming sources have > isStreaming=true > --- > > Key: SPARK-21765 > URL: https://issues.apache.org/jira/browse/SPARK-21765 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jose Torres > Fix For: 3.0.0 > > > LogicalPlan has an isStreaming bit, but it's incompletely implemented. Some > streaming sources don't set the bit, and the bit can sometimes be lost in > rewriting. Setting the bit for all plans that are logically streaming will > help us simplify the logic around checking query plan validity. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
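For reference, a small sketch of the bit in question; {{Dataset.isStreaming}} surfaces the plan's flag, and the rate source is just a convenient streaming leaf:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()

val streamDf = spark.readStream.format("rate").load()
assert(streamDf.isStreaming)                        // leaf node sets the bit
assert(streamDf.queryExecution.logical.isStreaming) // same flag on the plan

val batchDf = spark.range(10).toDF("id")
assert(!batchDf.isStreaming)
{code}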
[jira] [Assigned] (SPARK-21806) BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
[ https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21806: Assignee: Apache Spark > BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading > -- > > Key: SPARK-21806 > URL: https://issues.apache.org/jira/browse/SPARK-21806 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 2.2.0 >Reporter: Marc Kaminski >Assignee: Apache Spark >Priority: Minor > Attachments: PRROC_example.jpeg > > > I would like to refer to a [discussion in > scikit-learn|https://github.com/scikit-learn/scikit-learn/issues/4223], as this behavior > is probably based on the scikit-learn implementation. > Summary: > Currently, the y-axis intercept of the precision-recall curve is set to (0.0, > 1.0). This behavior is not ideal in certain edge cases (see example below) > and can also have an impact on cross validation, when the optimization metric is > set to "areaUnderPR". > Please consider [blucena's > post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613] > for possible alternatives. > Edge case example: > Consider a bad classifier that assigns a high probability to all samples. A > possible output might look like this: > ||Real label || Score || > |1.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 1.0 | > |0.0 | 0.95 | > |0.0 | 0.95 | > |1.0 | 1.0 | > This results in the following pr points (first line set by default): > ||Threshold || Recall ||Precision || > |1.0 | 0.0 | 1.0 | > |0.95| 1.0 | 0.2 | > |0.0| 1.0 | 0.16 | > The auPRC would be around 0.6. Classifiers with a more differentiated > probability assignment will be falsely assumed to perform worse with regard to > this auPRC. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21728) Allow SparkSubmit to use logging
[ https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21728: Assignee: Apache Spark > Allow SparkSubmit to use logging > > > Key: SPARK-21728 > URL: https://issues.apache.org/jira/browse/SPARK-21728 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Assignee: Apache Spark >Priority: Minor > > Currently, code in {{SparkSubmit}} cannot call classes or methods that > initialize the Spark {{Logging}} framework. That is because at that time > {{SparkSubmit}} doesn't yet know which application will run, and logging is > initialized differently for certain special applications (notably, the > shells). > It would be better if either {{SparkSubmit}} did logging initialization > earlier based on the application to be run, or did it in a way that could be > overridden later when the app initializes. > Without this, there are currently a few parts of {{SparkSubmit}} that > duplicate code from other parts of Spark just to avoid logging. For example: > * > [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860] > replicates code from Utils.scala > * > [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54] > replicates code from Utils.scala and installs its own shutdown hook > * a few parts of the code could use {{SparkConf}} but can't right now because > of the logging issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21728) Allow SparkSubmit to use logging
[ https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21728: Assignee: (was: Apache Spark) > Allow SparkSubmit to use logging > > > Key: SPARK-21728 > URL: https://issues.apache.org/jira/browse/SPARK-21728 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Priority: Minor > > Currently, code in {{SparkSubmit}} cannot call classes or methods that initialize the Spark {{Logging}} framework. That is because at that time {{SparkSubmit}} doesn't yet know which application will run, and logging is initialized differently for certain special applications (notably, the shells). > It would be better if either {{SparkSubmit}} did logging initialization earlier based on the application to be run, or did it in a way that could be overridden later when the app initializes. > Without this, there are currently a few parts of {{SparkSubmit}} that duplicate code from other parts of Spark just to avoid logging. For example: > * [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860] replicates code from Utils.scala > * [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54] replicates code from Utils.scala and installs its own shutdown hook > * a few parts of the code could use {{SparkConf}} but can't right now because of the logging issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging
[ https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142151#comment-16142151 ] Apache Spark commented on SPARK-21728: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/19013 > Allow SparkSubmit to use logging > > > Key: SPARK-21728 > URL: https://issues.apache.org/jira/browse/SPARK-21728 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Priority: Minor > > Currently, code in {{SparkSubmit}} cannot call classes or methods that initialize the Spark {{Logging}} framework. That is because at that time {{SparkSubmit}} doesn't yet know which application will run, and logging is initialized differently for certain special applications (notably, the shells). > It would be better if either {{SparkSubmit}} did logging initialization earlier based on the application to be run, or did it in a way that could be overridden later when the app initializes. > Without this, there are currently a few parts of {{SparkSubmit}} that duplicate code from other parts of Spark just to avoid logging. For example: > * [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860] replicates code from Utils.scala > * [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54] replicates code from Utils.scala and installs its own shutdown hook > * a few parts of the code could use {{SparkConf}} but can't right now because of the logging issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging
[ https://issues.apache.org/jira/browse/SPARK-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142149#comment-16142149 ] Apache Spark commented on SPARK-21728: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/19013 > Allow SparkSubmit to use logging > > > Key: SPARK-21728 > URL: https://issues.apache.org/jira/browse/SPARK-21728 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Marcelo Vanzin >Priority: Minor > > Currently, code in {{SparkSubmit}} cannot call classes or methods that initialize the Spark {{Logging}} framework. That is because at that time {{SparkSubmit}} doesn't yet know which application will run, and logging is initialized differently for certain special applications (notably, the shells). > It would be better if either {{SparkSubmit}} did logging initialization earlier based on the application to be run, or did it in a way that could be overridden later when the app initializes. > Without this, there are currently a few parts of {{SparkSubmit}} that duplicate code from other parts of Spark just to avoid logging. For example: > * [downloadFiles|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L860] replicates code from Utils.scala > * [createTempDir|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala#L54] replicates code from Utils.scala and installs its own shutdown hook > * a few parts of the code could use {{SparkConf}} but can't right now because of the logging issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21837) UserDefinedTypeSuite local UDTs not actually testing what it intends
[ https://issues.apache.org/jira/browse/SPARK-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21837. - Resolution: Fixed Fix Version/s: 2.3.0 > UserDefinedTypeSuite local UDTs not actually testing what it intends > > > Key: SPARK-21837 > URL: https://issues.apache.org/jira/browse/SPARK-21837 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.2.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 2.3.0 > > > Consider this test in {{UserDefinedTypeSuite}}: > {code} > test("Local UDTs") { > val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec") > df.collect()(0).getAs[UDT.MyDenseVector](1) > df.take(1)(0).getAs[UDT.MyDenseVector](1) > df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0) > df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0) > } > {code} > I claim the last two lines can't be right, because they say that the first column in the aggregation is the vector, when it is the grouping key (int). But it passes! > But it started failing when I made seemingly unrelated changes in https://github.com/apache/spark/pull/18645, like: > {code} > [info] - Local UDTs *** FAILED *** (144 milliseconds) > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector > [info] at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211) > [info] at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205) > {code} > I modified the test to actually assert that the vector that results in each case is the expected one, and it began failing with the same error, in master. Therefore I am pretty sure the test is not quite doing what it seems to want to, and the result of these expressions just happened not to be fully evaluated or checked. > CC [~marmbrus] for the discussion at https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11#commitcomment-23320234 and apologies if I'm still really missing something here. I'll open a PR to show you what I mean. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
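A hedged sketch of what the strengthened test might look like (the actual PR may differ): each expression's result is compared to the expected vector, and the aggregated vector is read from column 1, since column 0 is the grouping key.
{code}
// Hypothetical strengthened test body; assumes the suite's implicits and
// UDT.MyDenseVector (which is assumed to define structural equality) are in scope.
import org.apache.spark.sql.functions.first

val expected = new UDT.MyDenseVector(Array(0.1, 1.0))
val df = Seq((1, expected)).toDF("int", "vec")

assert(df.collect()(0).getAs[UDT.MyDenseVector](1) == expected)
assert(df.take(1)(0).getAs[UDT.MyDenseVector](1) == expected)
// Column 0 of the aggregation is the grouping key 'int; the vector is column 1.
assert(df.limit(1).groupBy('int).agg(first('vec)).collect()(0)
  .getAs[UDT.MyDenseVector](1) == expected)
assert(df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
  .getAs[UDT.MyDenseVector](1) == expected)
{code}
Written this way, a mismatch in either the column index or the deserialized value fails the test immediately, instead of depending on whether the cast happens to be evaluated.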
[jira] [Assigned] (SPARK-21837) UserDefinedTypeSuite local UDTs not actually testing what it intends
[ https://issues.apache.org/jira/browse/SPARK-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21837: Assignee: Sean Owen (was: Apache Spark) > UserDefinedTypeSuite local UDTs not actually testing what it intends > > > Key: SPARK-21837 > URL: https://issues.apache.org/jira/browse/SPARK-21837 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.2.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > Consider this test in {{UserDefinedTypeSuite}}: > {code} > test("Local UDTs") { > val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec") > df.collect()(0).getAs[UDT.MyDenseVector](1) > df.take(1)(0).getAs[UDT.MyDenseVector](1) > df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0) > df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0) > } > {code} > I claim the last two lines can't be right, because they say that the first column in the aggregation is the vector, when it is the grouping key (int). But it passes! > But it started failing when I made seemingly unrelated changes in https://github.com/apache/spark/pull/18645, like: > {code} > [info] - Local UDTs *** FAILED *** (144 milliseconds) > [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector > [info] at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211) > [info] at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205) > {code} > I modified the test to actually assert that the vector that results in each case is the expected one, and it began failing with the same error, in master. Therefore I am pretty sure the test is not quite doing what it seems to want to, and the result of these expressions just happened not to be fully evaluated or checked. > CC [~marmbrus] for the discussion at https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11#commitcomment-23320234 and apologies if I'm still really missing something here. I'll open a PR to show you what I mean. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20682) Support a new faster ORC data source based on Apache ORC
[ https://issues.apache.org/jira/browse/SPARK-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142138#comment-16142138 ] Apache Spark commented on SPARK-20682: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/18953 > Support a new faster ORC data source based on Apache ORC > > > Key: SPARK-20682 > URL: https://issues.apache.org/jira/browse/SPARK-20682 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.2, 1.6.3, 2.1.1, 2.2.0 >Reporter: Dongjoon Hyun > > Since SPARK-2883, Apache Spark has supported Apache ORC inside the `sql/hive` module with a Hive dependency. This issue aims to add a new, faster ORC data source inside `sql/core` and to replace the old ORC data source eventually. In this issue, the latest Apache ORC 1.4.0 (released yesterday) is used. > There are four key benefits. > - Speed: Use both Spark `ColumnarBatch` and ORC `RowBatch` together. This is faster than the current implementation in Spark. > - Stability: Apache ORC 1.4.0 has many fixes, and we can depend on the ORC community more. > - Usability: Users can use `ORC` data sources without the hive module, i.e., without building with `-Phive`. > - Maintainability: Reduce the Hive dependency and remove old legacy code later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
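For context, a minimal usage sketch follows. It assumes the new source keeps the existing `DataFrameReader`/`DataFrameWriter` ORC API, since the ticket's goal is a drop-in replacement; the path is illustrative.
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("orc-demo").getOrCreate()
import spark.implicits._

// Write a small dataset as ORC and read it back; under the proposed
// sql/core implementation this would no longer require building with -Phive.
Seq((1, "a"), (2, "b")).toDF("id", "value")
  .write.mode("overwrite").orc("/tmp/orc-demo")

spark.read.orc("/tmp/orc-demo").show()
spark.stop()
{code}
The reader path is where the `ColumnarBatch`/`RowBatch` pairing mentioned above would pay off, by filling Spark's columnar batches directly from ORC's vectorized row batches instead of going through Hive's row-at-a-time SerDe.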