[jira] [Updated] (SPARK-14445) Support native execution of SHOW COLUMNS and SHOW PARTITIONS command
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-14118

> Support native execution of SHOW COLUMNS and SHOW PARTITIONS command
> --------------------------------------------------------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
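Native execution of these commands starts with recognizing the Hive syntax linked above, e.g. SHOW COLUMNS (FROM | IN) table_name [(FROM | IN) db_name]. The following is a minimal illustrative sketch of such a recognizer in plain Python; it is not Spark's actual parser, and the function name is hypothetical.

```python
import re

# Hedged sketch: recognize the Hive-style SHOW COLUMNS syntax,
#   SHOW COLUMNS (FROM | IN) table_name [(FROM | IN) db_name]
# This regex is an illustration only, not Spark's real grammar.
SHOW_COLUMNS = re.compile(
    r"^SHOW\s+COLUMNS\s+(?:FROM|IN)\s+(\w+)(?:\s+(?:FROM|IN)\s+(\w+))?$",
    re.IGNORECASE,
)

def parse_show_columns(sql: str):
    """Return (db, table) for a SHOW COLUMNS statement, or None on no match."""
    m = SHOW_COLUMNS.match(sql.strip())
    if not m:
        return None
    table, db = m.group(1), m.group(2)  # db is None when not given
    return (db, table)
```

A native implementation would map a successful match to a dedicated runnable command instead of delegating the statement to Hive.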
[jira] [Updated] (SPARK-14445) Show columns/partitions
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Summary: Show columns/partitions  (was: show columns/partitions)

> Show columns/partitions
> -----------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show
[jira] [Updated] (SPARK-14445) show columns/partitions
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Summary: show columns/partitions  (was: Support native execution of SHOW COLUMNS and SHOW PARTITIONS command)

> show columns/partitions
> -----------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show
[jira] [Updated] (SPARK-14499) Add tests to make sure drop partitions of an external table will not delete data
[ https://issues.apache.org/jira/browse/SPARK-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14499:
------------------------------
    Assignee: Xiao Li

> Add tests to make sure drop partitions of an external table will not delete data
> --------------------------------------------------------------------------------
>
> Key: SPARK-14499
> URL: https://issues.apache.org/jira/browse/SPARK-14499
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Xiao Li
>
> This is a follow-up of SPARK-14132
> (https://github.com/apache/spark/pull/12220#issuecomment-207625166) to
> address https://github.com/apache/spark/pull/12220#issuecomment-207612627.
[jira] [Assigned] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14441:
---------------------------------
    Assignee: Andrew Or

> Consolidate DDL tests
> ---------------------
>
> Key: SPARK-14441
> URL: https://issues.apache.org/jira/browse/SPARK-14441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing
> whether a test should exist in one or the other. It also makes it less clear
> whether our test coverage is comprehensive. Ideally we should consolidate
> these files as much as possible.
[jira] [Reopened] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reopened SPARK-14414:
-------------------------------

> Make error messages consistent across DDLs
> ------------------------------------------
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> There are many different error messages right now when the user tries to run
> something that's not supported. We might throw AnalysisException or
> ParseException or NoSuchFunctionException etc. We should make all of these
> consistent before 2.0.
[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14357:
------------------------------
    Assignee: Jason Moore

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-14357
> URL: https://issues.apache.org/jira/browse/SPARK-14357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2, 1.6.0, 1.6.1
> Reporter: Jason Moore
> Assignee: Jason Moore
> Priority: Critical
> Fix For: 1.5.2, 1.6.2, 2.0.0
>
> Speculation can often result in a CommitDeniedException, but ideally this
> shouldn't result in the job failing. So changes were made along with
> SPARK-8167 to ensure that the CommitDeniedException is caught and given a
> failure reason that doesn't increment the failure count.
> However, I'm still noticing that this exception is causing jobs to fail using
> the 1.6.1 release version.
> {noformat}
> 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 (TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: Task failed while writing rows.
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
>     ... 8 more
> Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201604041136_0315_m_18_8: Not committed because the driver did not authorize commit
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
>     at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
>     ... 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped in a
> RuntimeException at
> [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
> and then in a SparkException at
> [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154],
> which prevents it from being handled properly at
> [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290].
> The solution might be for this catch block to type-match on the innermost
> cause of the error.
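The proposed fix above, matching on the innermost cause instead of the outermost wrapper, can be sketched in a few lines. This is an illustrative Python stand-in, not Spark's Scala code: CommitDenied here stands in for Spark's CommitDeniedException, and the cause chain mimics the RuntimeException/SparkException wrapping shown in the stack trace.

```python
# Hedged sketch: walk the exception cause chain and type-match on the
# innermost cause, rather than on the outermost wrapper exception.

class CommitDenied(Exception):
    """Stand-in for org.apache.spark.executor.CommitDeniedException."""

def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ links to the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

def is_commit_denied(exc: BaseException) -> bool:
    # True even when CommitDenied is buried under RuntimeError-style wrappers.
    return isinstance(root_cause(exc), CommitDenied)
```

With this, a handler that previously only matched the outer RuntimeException/SparkException can still recognize a denied commit and avoid counting it as a task failure.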
[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14357:
------------------------------
    Target Version/s: 1.5.2, 1.6.2, 2.0.0
    Fix Version/s: 1.5.2
                   2.0.0
                   1.6.2

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-14357
> URL: https://issues.apache.org/jira/browse/SPARK-14357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2, 1.6.0, 1.6.1
> Reporter: Jason Moore
> Priority: Critical
> Fix For: 1.5.2, 1.6.2, 2.0.0
>
> Speculation can often result in a CommitDeniedException, but ideally this
> shouldn't result in the job failing. So changes were made along with
> SPARK-8167 to ensure that the CommitDeniedException is caught and given a
> failure reason that doesn't increment the failure count.
> However, I'm still noticing that this exception is causing jobs to fail using
> the 1.6.1 release version.
> {noformat}
> 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 (TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: Task failed while writing rows.
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
>     ... 8 more
> Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201604041136_0315_m_18_8: Not committed because the driver did not authorize commit
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
>     at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
>     ... 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped in a
> RuntimeException at
> [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
> and then in a SparkException at
> [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154],
> which prevents it from being handled properly at
> [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290].
> The solution might be for this catch block to type-match on the innermost
> cause of the error.
[jira] [Resolved] (SPARK-14455) ReceiverTracker#allocatedExecutors throw NPE for receiver-less streaming application
[ https://issues.apache.org/jira/browse/SPARK-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14455.
-------------------------------
    Resolution: Fixed
    Assignee: Saisai Shao
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> ReceiverTracker#allocatedExecutors throw NPE for receiver-less streaming application
> ------------------------------------------------------------------------------------
>
> Key: SPARK-14455
> URL: https://issues.apache.org/jira/browse/SPARK-14455
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 2.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Fix For: 2.0.0
>
> When testing streaming dynamic allocation with direct Kafka, the streaming
> {{ExecutorAllocationManager}} throws an NPE because the {{ReceiverTracker}}
> was never started; this happens when running a receiver-less streaming
> application such as direct Kafka. Here is the exception log:
> {noformat}
> Exception in thread "RecurringTimer - streaming-executor-allocation-manager" java.lang.NullPointerException
>     at org.apache.spark.streaming.scheduler.ReceiverTracker.allocatedExecutors(ReceiverTracker.scala:244)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager.killExecutor(ExecutorAllocationManager.scala:124)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager.org$apache$spark$streaming$scheduler$ExecutorAllocationManager$$manageAllocation(ExecutorAllocationManager.scala:100)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager$$anonfun$1.apply$mcVJ$sp(ExecutorAllocationManager.scala:66)
>     at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:94)
>     at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
>     at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
> {noformat}
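The shape of the fix for an NPE like the one above is to make the tracker safe to query before it has started. A hedged Python sketch of the idea, with illustrative names (not Spark's actual ReceiverTracker code):

```python
# Hedged sketch: a tracker whose allocated_executors() is safe to call even
# when start() was never invoked, as in receiver-less (direct Kafka) apps.

class ReceiverTracker:
    def __init__(self):
        self.endpoint = None  # only set by start(), which receiver-less apps skip

    def start(self):
        self.endpoint = object()  # stand-in for the real tracker endpoint

    def allocated_executors(self):
        if self.endpoint is None:
            # Tracker never started: there are no receivers, so nothing
            # is allocated. Return an empty mapping instead of crashing.
            return {}
        return {"receiver-0": "executor-1"}  # illustrative allocation
```

The key design choice is returning an empty result for the "never started" state rather than dereferencing an uninitialized endpoint.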
[jira] [Resolved] (SPARK-14506) HiveClientImpl's toHiveTable misses a table property for external tables
[ https://issues.apache.org/jira/browse/SPARK-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14506.
-------------------------------
    Resolution: Fixed
    Assignee: Yin Huai
    Fix Version/s: 2.0.0

> HiveClientImpl's toHiveTable misses a table property for external tables
> ------------------------------------------------------------------------
>
> Key: SPARK-14506
> URL: https://issues.apache.org/jira/browse/SPARK-14506
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 2.0.0
>
> For an external table stored in the Hive metastore, its table properties need
> to include a field called EXTERNAL whose value is TRUE.
[jira] [Resolved] (SPARK-14468) Always enable OutputCommitCoordinator
[ https://issues.apache.org/jira/browse/SPARK-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14468.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.5.2
                   2.0.0
                   1.6.2
                   1.4.2
    Target Version/s: 1.5.2, 1.4.2, 1.6.2, 2.0.0  (was: 1.4.2, 1.5.2, 1.6.2, 2.0.0)

> Always enable OutputCommitCoordinator
> -------------------------------------
>
> Key: SPARK-14468
> URL: https://issues.apache.org/jira/browse/SPARK-14468
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 1.4.2, 1.6.2, 2.0.0, 1.5.2
>
> The OutputCommitCoordinator was originally introduced in SPARK-4879 because
> speculation causes the output of some partitions to be deleted. However, as
> we can see in SPARK-10063, speculation is not the only case where this can
> happen.
> More specifically, when we retry a stage we're not guaranteed to kill the
> tasks that are still running (we don't even interrupt their threads), so we
> may end up with multiple concurrent task attempts for the same task. This
> leads to problems like SPARK-8029, and that fix alone is necessary but not
> sufficient.
> In general, when we run into situations like these, we need the
> OutputCommitCoordinator because we don't control what the underlying file
> system does. Enabling this doesn't incur heavy performance costs, so there's
> little reason not to always enable it to ensure correctness.
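The coordination described above boils down to one rule: for each output partition, the first task attempt to ask for commit permission wins, and every other concurrent attempt (speculative or from a retried stage) is denied. A minimal illustrative sketch of that rule, not Spark's actual OutputCommitCoordinator implementation:

```python
# Hedged sketch: first-attempt-wins commit authorization per partition.

class OutputCommitCoordinator:
    def __init__(self):
        # partition id -> attempt id that is authorized to commit
        self._authorized = {}

    def can_commit(self, partition: int, attempt: int) -> bool:
        # setdefault records the first requester and returns the winner;
        # the same attempt may ask again and still be authorized.
        winner = self._authorized.setdefault(partition, attempt)
        return winner == attempt
```

Because losing attempts are denied rather than failed outright, a denied commit can be reported with a reason that does not increment the task failure count.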
[jira] [Assigned] (SPARK-14388) Create Table
[ https://issues.apache.org/jira/browse/SPARK-14388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14388:
---------------------------------
    Assignee: Andrew Or

> Create Table
> ------------
>
> Key: SPARK-14388
> URL: https://issues.apache.org/jira/browse/SPARK-14388
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
>
> For now, we still ask Hive to handle creating Hive tables. We should handle
> them ourselves.
[jira] [Issue Comment Deleted] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14410:
------------------------------
    Comment: was deleted

(was: User 'rekhajoshm' has created a pull request for this issue:
https://github.com/apache/spark/pull/12183)

> SessionCatalog needs to check function existence
> ------------------------------------------------
>
> Key: SPARK-14410
> URL: https://issues.apache.org/jira/browse/SPARK-14410
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> Right now, operations on an existing function in SessionCatalog do not
> really check whether the function exists. We should add this check and avoid
> doing it in the command.
[jira] [Created] (SPARK-14468) Always enable OutputCommitCoordinator
Andrew Or created SPARK-14468:
------------------------------

    Summary: Always enable OutputCommitCoordinator
    Key: SPARK-14468
    URL: https://issues.apache.org/jira/browse/SPARK-14468
    Project: Spark
    Issue Type: Bug
    Components: Spark Core
    Reporter: Andrew Or
    Assignee: Andrew Or

The OutputCommitCoordinator was originally introduced in SPARK-4879 because
speculation causes the output of some partitions to be deleted. However, as we
can see in SPARK-10063, speculation is not the only case where this can happen.

More specifically, when we retry a stage we're not guaranteed to kill the tasks
that are still running (we don't even interrupt their threads), so we may end
up with multiple concurrent task attempts for the same task. This leads to
problems like SPARK-8029, and that fix alone is necessary but not sufficient.

In general, when we run into situations like these, we need the
OutputCommitCoordinator because we don't control what the underlying file
system does. Enabling this doesn't incur heavy performance costs, so there's
little reason not to always enable it to ensure correctness.
[jira] [Assigned] (SPARK-14130) [Table related commands] Alter column
[ https://issues.apache.org/jira/browse/SPARK-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14130:
---------------------------------
    Assignee: Andrew Or

> [Table related commands] Alter column
> -------------------------------------
>
> Key: SPARK-14130
> URL: https://issues.apache.org/jira/browse/SPARK-14130
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
>
> For the alter column command, we have the following tokens:
> TOK_ALTERTABLE_RENAMECOL
> TOK_ALTERTABLE_ADDCOLS
> TOK_ALTERTABLE_REPLACECOLS
> For data source tables, we should throw exceptions. For Hive tables, we
> should support them. *For Hive tables, we should check Hive's behavior to see
> if there is any file format that does not support any of the above commands.*
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> is a good reference for Hive's behavior.
> Also, for a Hive table stored in some format, we need to make sure that Spark
> can still read the table after an alter column operation. If we cannot read
> the table, we should throw an exception even if Hive allows the alter column
> operation. For example, if renaming a column of a Hive parquet table makes
> the renamed column inaccessible (we cannot read its values), we should not
> allow the renaming operation.
[jira] [Resolved] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready
[ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13112.
-------------------------------
    Resolution: Fixed
    Assignee: Shixiong Zhu
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> CoarsedExecutorBackend register to driver should wait Executor was ready
> ------------------------------------------------------------------------
>
> Key: SPARK-13112
> URL: https://issues.apache.org/jira/browse/SPARK-13112
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: SuYan
> Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
> Description:
> Because some hosts' disks are busy, the executor fails with a
> TimeoutException while trying to register with the shuffle server on that
> host, and then exits(1) when a task is launched on a null executor.
> The YARN cluster resources are also a little busy; YARN thinks the host is
> idle and prefers to allocate executors on the same host, so there is a chance
> that one task fails 4 times on the same host.
> Currently, CoarseGrainedExecutorBackend registers with the driver first, and
> only after registration succeeds does it initialize the Executor.
> If an exception occurs during Executor initialization, the driver doesn't
> know about that event and will still launch tasks on that executor, which
> then calls System.exit(1).
> {code}
> override def receive: PartialFunction[Any, Unit] = {
>   case RegisteredExecutor(hostname) =>
>     logInfo("Successfully registered with driver")
>     executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
>     ..
>   case LaunchTask(data) =>
>     if (executor == null) {
>       logError("Received LaunchTask command but executor was null")
>       System.exit(1)
> {code}
> It would be more reasonable to register with the driver after the Executor is
> ready, and to make the register timeout configurable.
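The race described above is an ordering problem: if registration completes before the Executor is constructed, a LaunchTask message can observe a null executor and kill the process. A hedged Python sketch of the safe ordering, with illustrative names (not the actual CoarseGrainedExecutorBackend code):

```python
# Hedged sketch: construct the executor as part of handling registration,
# so a later LaunchTask can never observe executor == None.

class Backend:
    def __init__(self):
        self.executor = None

    def on_registered(self, hostname: str):
        # Executor is fully built before any task message is processed.
        self.executor = f"Executor({hostname})"

    def on_launch_task(self, task: str) -> str:
        if self.executor is None:
            # Mirrors the fatal path quoted above from the {code} block.
            raise RuntimeError("Received LaunchTask command but executor was null")
        return f"{self.executor} running {task}"
```

The fix in spirit is to close the window between "registered with driver" and "executor ready" so the error branch becomes unreachable in normal operation.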
[jira] [Resolved] (SPARK-14444) Add a new scalastyle `NoScalaDoc` to prevent ScalaDoc-style multiline comments
[ https://issues.apache.org/jira/browse/SPARK-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14444.
-------------------------------
    Resolution: Fixed
    Assignee: Dongjoon Hyun
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Add a new scalastyle `NoScalaDoc` to prevent ScalaDoc-style multiline comments
> ------------------------------------------------------------------------------
>
> Key: SPARK-14444
> URL: https://issues.apache.org/jira/browse/SPARK-14444
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
> According to the [Spark Code Style
> Guide|https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation],
> this issue adds a new scalastyle rule to prevent the following:
> {code}
> /** In Spark, we don't use the ScalaDoc style so this
>  * is not correct.
>  */
> {code}
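The disallowed pattern in the {code} block above is a multiline comment that puts text on the same line as the opening /**. A hedged Python sketch of one way such a check could be expressed; this is an illustration of the idea, not the actual scalastyle NoScalaDoc rule:

```python
# Hedged sketch: flag multiline comments whose opening line is "/**" followed
# by text on the same line, as in the disallowed example above.

def violates_no_scaladoc(comment: str) -> bool:
    """Illustrative check only; the real rule is a scalastyle regex."""
    lines = comment.splitlines()
    if not lines:
        return False
    first = lines[0].strip()
    # Multiline comment with content after the /** on the first line.
    return len(lines) > 1 and first.startswith("/**") and first != "/**"
```

In practice scalastyle rules of this kind are configured as regular-expression checks in scalastyle-config.xml rather than as standalone functions.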
[jira] [Resolved] (SPARK-14424) spark-class and related (spark-shell, etc.) no longer work with sbt build as documented
[ https://issues.apache.org/jira/browse/SPARK-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14424.
-------------------------------
    Resolution: Fixed
    Assignee: holdenk
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> spark-class and related (spark-shell, etc.) no longer work with sbt build as documented
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-14424
> URL: https://issues.apache.org/jira/browse/SPARK-14424
> Project: Spark
> Issue Type: Bug
> Components: Build
> Reporter: holdenk
> Assignee: holdenk
> Priority: Minor
> Fix For: 2.0.0
>
> SPARK-13579 disabled building the assembly artifacts. Our shell scripts
> (specifically spark-class, but everything depends on it) aren't yet ready for
> this change.
> Repro steps:
> ./build/sbt clean compile assembly && ./bin/spark-shell
> Result:
> "
> Failed to find Spark jars directory (.../assembly/target/scala-2.11/jars).
> You need to build Spark before running this program.
> "
> Follow-up: fix the docs and clarify that the assembly requirement has been
> replaced with package.
[jira] [Updated] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14391:
------------------------------
    Fix Version/s: (was: 1.6.2)

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Updated] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14391:
------------------------------
    Target Version/s: 2.0.0  (was: 1.6.2, 2.0.0)

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Resolved] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14391.
-------------------------------
    Resolution: Fixed
    Assignee: Marcelo Vanzin
    Fix Version/s: 2.0.0
                   1.6.2
    Target Version/s: 1.6.2, 2.0.0

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 1.6.2, 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Updated] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14127:
------------------------------
    Assignee: (was: Andrew Or)

> [Table related commands] Describe table
> ---------------------------------------
>
> Key: SPARK-14127
> URL: https://issues.apache.org/jira/browse/SPARK-14127
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
>
> TOK_DESCTABLE
> Describe a column/table/partition (see here and here). It seems we support
> DESCRIBE and DESCRIBE EXTENDED. It would be good to also support other
> syntaxes (and check whether we are missing anything).
[jira] [Resolved] (SPARK-14384) Remove alterFunction from SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14384.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove alterFunction from SessionCatalog
> ----------------------------------------
>
> Key: SPARK-14384
> URL: https://issues.apache.org/jira/browse/SPARK-14384
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 2.0.0
>
> It seems there is no way to call alterFunction (it is not allowed in SQL
> right now). So, let's remove it for now. When we want to support it, we can
> add it back and make sure that we implement the right semantics.
[jira] [Updated] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14441: -- Description: Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing whether a test should exist in one or the other. It also makes it less clear whether our test coverage is comprehensive. Ideally we should consolidate these files as much as possible. > Consolidate DDL tests > - > > Key: SPARK-14441 > URL: https://issues.apache.org/jira/browse/SPARK-14441 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing > whether a test should exist in one or the other. It also makes it less clear > whether our test coverage is comprehensive. Ideally we should consolidate > these files as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14441) Consolidate DDL tests
Andrew Or created SPARK-14441: - Summary: Consolidate DDL tests Key: SPARK-14441 URL: https://issues.apache.org/jira/browse/SPARK-14441 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 2.0.0 Reporter: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14132: -- Description: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support add and drop partitions. For now, it should be fine to throw an exception for the rest. was: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support add and drop partitions. For now, it > should be fine to throw an exception for the rest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14132: -- Description: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. was: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_ALTERTABLE_CLUSTER_SORT TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support them. For now, it should be fine to throw > an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14383) Missing "|" in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14383. --- Resolution: Fixed Assignee: Bo Meng Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Missing "|" in the g4 definition > > > Key: SPARK-14383 > URL: https://issues.apache.org/jira/browse/SPARK-14383 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng >Priority: Trivial > Fix For: 2.0.0 > > > It is really a trivial bug in the g4 file I found. It is missing a "|" > between DISTRIBUTE and UNSET. I will post the PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14429) Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14429. --- Resolution: Fixed Assignee: Bo Meng Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng >Priority: Minor > Fix For: 2.0.0 > > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
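The pattern semantics SPARK-14429 asks for (trim surrounding whitespace, match case-insensitively, treat `|` as alternatives and `*` as a wildcard, sort the result) can be sketched like this. This is an illustrative Python model of the intended behavior, not Spark's actual Scala implementation; the function name and signature are made up for the example:

```python
import re

def show_tables_like(tables, pattern):
    """Match table names against a Hive-style SHOW TABLES pattern.

    '*' is a wildcard, '|' separates alternatives; matching is
    case-insensitive and surrounding whitespace is trimmed, per the
    behavior the JIRA asks Spark to adopt from Hive.
    """
    regexes = [
        # Replace only the '*' wildcard with '.*'; a real implementation
        # would also escape other regex metacharacters first.
        re.compile(p.strip().replace("*", ".*") + "$", re.IGNORECASE)
        for p in pattern.strip().split("|")
    ]
    # Sort the result, as the JIRA's third point suggests.
    return sorted(t for t in tables if any(r.match(t) for r in regexes))
```

With tables t1, t2, t3, the pattern `' T* '` from the JIRA's example then lists all the t-tables despite the case and the padding.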
[jira] [Resolved] (SPARK-14426) Merge PerserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14426. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Merge PerserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14426) Merge PerserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14426: -- Assignee: Kousuke Saruta > Merge PerserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14426) Merge ParserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14426: -- Summary: Merge ParserUtils and ParseUtils (was: Merge PerserUtils and ParseUtils) > Merge ParserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
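The unescaping fault called out in SPARK-14426 is that a "\u0061"-style literal should decode to the character "a". A minimal Python sketch of the expected behavior (the helper name is hypothetical; Spark's actual fix lives in its Scala unescapeSQLString):

```python
import re

def unescape_unicode(s):
    """Decode \\uXXXX escape sequences in a SQL string literal.

    Illustrates the behavior SPARK-14426 expects: the four hex digits
    after \\u are converted to the corresponding character.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),  # 0x0061 -> 'a'
        s,
    )
```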
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228529#comment-15228529 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay I should have some time to take a look tomorrow Kind regards Andy From: "Vijay Parmar (JIRA)" Date: Tuesday, April 5, 2016 at 3:23 PM To: Andrew Davidson Subject: [jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC; however, when we load the > data into spark data frames, the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center into their local macs. The data centers run > UTC. Their computers are typically in PST or EST. > It is possible to hack around this problem, > but it causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg: > https://www.mail-archive.com/user@spark.apache.org/msg48121.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
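The naive-vs-aware distinction behind SPARK-14057 can be illustrated with Python's standard datetime module. This is a sketch of the general workaround pattern (explicitly tagging values as UTC on ingest), not Spark's own timestamp handling:

```python
from datetime import datetime, timezone

# A timestamp string from the data center, which is in UTC but
# carries no zone information of its own.
raw = "2016-04-05 15:23:00"

# Parsed naively, no zone is attached -- analogous to the behavior
# described above, where downstream code assumes local time.
naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# The workaround: explicitly tag the value as UTC when loading it,
# so a laptop in PST or EST interprets it correctly.
aware = naive.replace(tzinfo=timezone.utc)
```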
[jira] [Assigned] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14410: - Assignee: Andrew Or > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > Right now, operations on an existing function in SessionCatalog do not > really check if the function exists. We should add this check and avoid > doing the check in each command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14410: -- Summary: SessionCatalog needs to check function existence (was: SessionCatalog needs to check database/table/function existence ) > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai > > Right now, operations for an existing db/table/function in SessionCatalog do > not really check if the db/table/function exists. We should add this check > and avoid of doing the check in command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14410: -- Description: Right now, operations for an existing functions in SessionCatalog do not really check if the function exists. We should add this check and avoid of doing the check in command. (was: Right now, operations for an existing db/table/function in SessionCatalog do not really check if the db/table/function exists. We should add this check and avoid of doing the check in command.) > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai > > Right now, operations for an existing functions in SessionCatalog do not > really check if the function exists. We should add this check and avoid of > doing the check in command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14132: - Assignee: Andrew Or > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_ALTERTABLE_CLUSTER_SORT > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support them. For now, it should be fine to throw > an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14252) Executors do not try to download remote cached blocks
[ https://issues.apache.org/jira/browse/SPARK-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14252. --- Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.0.0 > Executors do not try to download remote cached blocks > - > > Key: SPARK-14252 > URL: https://issues.apache.org/jira/browse/SPARK-14252 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Assignee: Eric Liang > Fix For: 2.0.0 > > > I noticed this when taking a look at the root cause of SPARK-14209. 2.0.0 > includes SPARK-12817, which changed the caching code a bit to remove > duplication. But it seems to have removed the part where executors check > whether other executors contain the needed cached block. > In 1.6, that was done by the call to {{BlockManager.get}} in > {{CacheManager.getOrCompute}}. But in the new version, {{RDD.iterator}} calls > {{BlockManager.getOrElseUpdate}}, which never calls {{BlockManager.get}}, and > thus the executor never gets block that are cached by other executors, > causing the blocks to be instead recomputed locally. > I wrote a small program that shows this. In 1.6, running with > {{--num-executors 2}}, I get 5 blocks cached on each executor, and messages > like these in the logs: > {noformat} > 16/03/29 13:18:01 DEBUG spark.CacheManager: Looking for partition rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting local block rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Block rdd_0_7 not registered > locally > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting remote block rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting remote block rdd_0_7 > from BlockManagerId(1, blah, 58831) > 1 > {noformat} > On 2.0, I get (almost) all the 10 partitions cached on *both* executors, > because once the second one fails to find a block locally it just recomputes > it and caches it. 
It never tries to download the block from the other > executor. The log messages above, which still exist in the code, don't show > up anywhere. > Here's the code I used for the above (trimmed of some other stuff from my > little test harness, so might not compile as is): > {code} > val sc = new SparkContext(conf) > try { > val rdd = sc.parallelize(1 to 1000, 10) > rdd.cache() > rdd.count() > // Create a single task that will sleep and block, so that a particular > executor is busy. > // This should force future tasks to download cached data from that > executor. > println("Running sleep job..") > val thread = new Thread(new Runnable() { > override def run(): Unit = { > rdd.mapPartitionsWithIndex { (i, iter) => > if (i == 0) { > Thread.sleep(TimeUnit.MINUTES.toMillis(10)) > } > iter > }.count() > } > }) > thread.setDaemon(true) > thread.start() > // Wait a few seconds to make sure the task is running (too lazy for > listeners) > println("Waiting for tasks to start...") > TimeUnit.SECONDS.sleep(10) > // Now run a job that will touch everything and should use the cached > data. > val cnt = rdd.map(_*2).count() > println(s"Counted $cnt elements.") > println("Killing sleep job.") > thread.interrupt() > thread.join() > } finally { > sc.stop() > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
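The lookup order that SPARK-14252 says regressed (local cache, then other executors' caches, then recompute) can be modeled in a few lines. This is an illustrative Python sketch with dictionaries standing in for block stores; it is not BlockManager's actual Scala logic, and the function name is made up:

```python
def get_or_else_update(local_cache, remote_caches, block_id, compute):
    """Return a cached block, preferring local, then remote, then recompute.

    Models the 1.6 behavior: BlockManager.get checked other executors
    before recomputing. The bug report says the 2.0 getOrElseUpdate path
    skipped the remote step and went straight to recomputation.
    """
    if block_id in local_cache:
        return local_cache[block_id]
    for remote in remote_caches:      # the step the 2.0 code path skipped
        if block_id in remote:
            value = remote[block_id]
            local_cache[block_id] = value
            return value
    value = compute()                 # the buggy path fell through to here
    local_cache[block_id] = value
    return value
```

With the remote step present, a block such as rdd_0_7 cached on another executor is fetched rather than recomputed, which is exactly what the missing "Getting remote block" log lines indicate no longer happens.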
[jira] [Resolved] (SPARK-14416) Add thread-safe comments for CoarseGrainedSchedulerBackend's fields
[ https://issues.apache.org/jira/browse/SPARK-14416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14416. --- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.0.0 > Add thread-safe comments for CoarseGrainedSchedulerBackend's fields > --- > > Key: SPARK-14416 > URL: https://issues.apache.org/jira/browse/SPARK-14416 > Project: Spark > Issue Type: Improvement >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > While I was reviewing https://github.com/apache/spark/pull/12078, I found > most of CoarseGrainedSchedulerBackend's mutable fields doesn't have any > comments about the thread-safe assumptions and it's hard for people to figure > out which part of codes should be protected by the lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14416) Add thread-safe comments for CoarseGrainedSchedulerBackend's fields
[ https://issues.apache.org/jira/browse/SPARK-14416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14416: -- Target Version/s: 2.0.0 > Add thread-safe comments for CoarseGrainedSchedulerBackend's fields > --- > > Key: SPARK-14416 > URL: https://issues.apache.org/jira/browse/SPARK-14416 > Project: Spark > Issue Type: Improvement >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > While I was reviewing https://github.com/apache/spark/pull/12078, I found > most of CoarseGrainedSchedulerBackend's mutable fields doesn't have any > comments about the thread-safe assumptions and it's hard for people to figure > out which part of codes should be protected by the lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14414) Make error messages consistent across DDLs
Andrew Or created SPARK-14414: - Summary: Make error messages consistent across DDLs Key: SPARK-14414 URL: https://issues.apache.org/jira/browse/SPARK-14414 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or There are many different error messages right now when the user tries to run something that's not supported. We might throw AnalysisException or ParseException or NoSuchFunctionException etc. We should make all of these consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14243. --- Resolution: Fixed > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the method invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* update correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Fix Version/s: 1.6.2 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the method invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* update correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14123) Function related commands
[ https://issues.apache.org/jira/browse/SPARK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14123. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 > Function related commands > - > > Key: SPARK-14123 > URL: https://issues.apache.org/jira/browse/SPARK-14123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > We should support TOK_CREATEFUNCTION/TOK_DROPFUNCTION. > For now, we can throw exceptions for TOK_CREATEMACRO/TOK_DROPMACRO. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog
Andrew Or created SPARK-14385: - Summary: Use FunctionIdentifier in FunctionRegistry/SessionCatalog Key: SPARK-14385 URL: https://issues.apache.org/jira/browse/SPARK-14385 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Yin Huai Right now it's confusing what's a qualified name or not. There's little type-safety in this corner of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14385: -- Assignee: (was: Yin Huai) > Use FunctionIdentifier in FunctionRegistry/SessionCatalog > - > > Key: SPARK-14385 > URL: https://issues.apache.org/jira/browse/SPARK-14385 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Right now it's confusing what's a qualified name or not. There's little > type-safety in this corner of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-11327: -- Fix Version/s: 1.6.2 > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Jo Voordeckers > Fix For: 1.6.2, 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar;, > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar;, > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08 INFO SparkContext: Running
[jira] [Updated] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-11327: -- Target Version/s: 1.6.2, 2.0.0 (was: 2.0.0) > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Jo Voordeckers > Fix For: 1.6.2, 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar", > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar", > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08
[jira] [Resolved] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener
[ https://issues.apache.org/jira/browse/SPARK-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14358. --- Resolution: Fixed Fix Version/s: 2.0.0 > Change SparkListener from a trait to an abstract class, and remove > JavaSparkListener > > > Key: SPARK-14358 > URL: https://issues.apache.org/jira/browse/SPARK-14358 > Project: Spark > Issue Type: Sub-task > Components: Scheduler, Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > Scala traits are difficult to maintain binary compatibility on, and as a > result we had to introduce JavaSparkListener. In Spark 2.0 we can change > SparkListener from a trait to an abstract class and then remove > JavaSparkListener. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
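The binary-compatibility argument behind SPARK-14358 can be illustrated with a small sketch (the class and callback names below are illustrative, not Spark's actual API): an abstract class whose callbacks are concrete no-ops lets new callbacks be added in later releases without recompiling existing subclasses, which is hard to guarantee for Scala traits.

```java
// Hypothetical listener base class in the style the issue describes:
// every callback has a concrete no-op body, so subclasses override only
// what they care about, and new callbacks can be added later without
// breaking binary compatibility for existing subclasses.
abstract class ListenerSketch {
    public void onJobStart(String jobId) { }      // no-op default
    public void onJobEnd(String jobId) { }        // no-op default
    // A callback added in a later release: old subclasses simply
    // inherit this no-op instead of failing to link.
    public void onStageCompleted(String stageId) { }
}

class CountingListener extends ListenerSketch {
    int jobsSeen = 0;
    @Override public void onJobStart(String jobId) { jobsSeen++; }
}

public class Main {
    public static void main(String[] args) {
        CountingListener l = new CountingListener();
        l.onJobStart("job-0");
        l.onStageCompleted("stage-0"); // inherited no-op, no breakage
        System.out.println(l.jobsSeen); // prints 1
    }
}
```

With a trait, each newly added method changes the synthetic forwarding the Scala compiler emits into implementing classes, which is why the interface route needed a separate JavaSparkListener adapter in the first place.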
[jira] [Resolved] (SPARK-14364) HeartbeatReceiver object should be private[spark]
[ https://issues.apache.org/jira/browse/SPARK-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14364. --- Resolution: Fixed Fix Version/s: 2.0.0 > HeartbeatReceiver object should be private[spark] > - > > Key: SPARK-14364 > URL: https://issues.apache.org/jira/browse/SPARK-14364 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > It's a mistake that HeartbeatReceiver object was made public in Spark 1.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222370#comment-15222370 ] Andrew Or commented on SPARK-13845: --- Thanks josh > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > We have a streaming job using *FlumePollInputStream* that always hits driver OOM after > a few days; here is some of the driver heap dump before the OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOMs in the > end. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14333) Duration of task should be the total time (not just computation time)
[ https://issues.apache.org/jira/browse/SPARK-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222189#comment-15222189 ] Andrew Or commented on SPARK-14333: --- can you post a screenshot too > Duration of task should be the total time (not just computation time) > - > > Key: SPARK-14333 > URL: https://issues.apache.org/jira/browse/SPARK-14333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Davies Liu > > Right now, the duration of a task in Stage page is task computation time, it > should be total time (including serialization and fetching results). We > should also have a separate column for computation time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-14294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-14294. - Resolution: Duplicate Assignee: Andrew Or > Support native execution of ALTER TABLE ... RENAME TO > - > > Key: SPARK-14294 > URL: https://issues.apache.org/jira/browse/SPARK-14294 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Assignee: Andrew Or >Priority: Minor > > Support native execution of ALTER TABLE ... RENAME TO > The syntax for ALTER TABLE ... RENAME TO commands is described as following: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14129) [Table related commands] Alter table
[ https://issues.apache.org/jira/browse/SPARK-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14129: - Assignee: Andrew Or > [Table related commands] Alter table > > > Key: SPARK-14129 > URL: https://issues.apache.org/jira/browse/SPARK-14129 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter table command, we have the following tokens. > TOK_ALTERTABLE_RENAME > TOK_ALTERTABLE_LOCATION > TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES > TOK_ALTERTABLE_SERIALIZER > TOK_ALTERTABLE_SERDEPROPERTIES > TOK_ALTERTABLE_CLUSTER_SORT > TOK_ALTERTABLE_SKEWED > For a data source table, let's implement TOK_ALTERTABLE_RENAME, > TOK_ALTERTABLE_LOCATION, and TOK_ALTERTABLE_SERDEPROPERTIES. We need to > decide what we do for > TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. It will be used to > allow users to correct the data format (e.g. changing csv to > com.databricks.spark.csv to allow the table to be accessed by older versions > of spark). > For a Hive table, we should implement all commands supported by the data > source table and TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. > For TOK_ALTERTABLE_CLUSTER_SORT and TOK_ALTERTABLE_SKEWED, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14304) Fix tests that don't create temp files in the `java.io.tmpdir` folder
[ https://issues.apache.org/jira/browse/SPARK-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14304. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Fix tests that don't create temp files in the `java.io.tmpdir` folder > - > > Key: SPARK-14304 > URL: https://issues.apache.org/jira/browse/SPARK-14304 > Project: Spark > Issue Type: Improvement > Components: Tests >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > If I press `CTRL-C` when running these tests, the temp files will be left in > `sql/core` folder and I need to delete them manually. It's annoying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
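The convention SPARK-14304 moves tests toward can be sketched as follows (a generic JDK illustration, not Spark's actual test utilities): creating scratch files via `File.createTempFile` with no explicit directory places them under `java.io.tmpdir`, so an interrupted run leaves its litter in the system temp folder instead of the source tree.

```java
import java.io.File;
import java.io.IOException;

public class TempFileSketch {
    public static void main(String[] args) throws IOException {
        // With no directory argument, createTempFile uses java.io.tmpdir,
        // not the current working directory (e.g. the sql/core folder).
        File scratch = File.createTempFile("spark-test-", ".tmp");
        scratch.deleteOnExit(); // best-effort cleanup on normal JVM exit
        System.out.println(scratch.getName().startsWith("spark-test-")); // prints true
    }
}
```

A CTRL-C still skips `deleteOnExit`, but the leftovers then sit where the OS is expected to clean them up eventually, rather than in the repository checkout.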
[jira] [Resolved] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-11327. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite > Fix For: 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar", > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar", > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08 INFO
[jira] [Resolved] (SPARK-14069) Improve SparkStatusTracker to also track executor information
[ https://issues.apache.org/jira/browse/SPARK-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14069. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve SparkStatusTracker to also track executor information > - > > Key: SPARK-14069 > URL: https://issues.apache.org/jira/browse/SPARK-14069 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Fix Version/s: 2.0.0 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-14243: --- > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14243. --- Resolution: Fixed Target Version/s: 1.6.2, 2.0.0 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14182) Parse DDL command: Alter View
[ https://issues.apache.org/jira/browse/SPARK-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14182. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Parse DDL command: Alter View > - > > Key: SPARK-14182 > URL: https://issues.apache.org/jira/browse/SPARK-14182 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > Based on the Hive DDL document > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL > and > https://cwiki.apache.org/confluence/display/Hive/PartitionedViews > The syntax of DDL command for {{ALTER VIEW}} include > {code} > ALTER VIEW view_name AS select_statement > ALTER VIEW view_name RENAME TO new_view_name > ALTER VIEW view_name SET TBLPROPERTIES ('comment' = new_comment); > ALTER VIEW view_name UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key') > ALTER VIEW view_name ADD [IF NOT EXISTS] PARTITION spec1[, PARTITION spec2, > ...] > ALTER VIEW view_name DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13796: -- Target Version/s: 2.0.0 Fix Version/s: 2.0.0 > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi >Priority: Minor > Fix For: 2.0.0 > > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13796: -- Assignee: Nishkam Ravi > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi >Assignee: Nishkam Ravi >Priority: Minor > Fix For: 2.0.0 > > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14127: - Assignee: Andrew Or > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-14125) View related commands
[ https://issues.apache.org/jira/browse/SPARK-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14125: -- Comment: was deleted (was: User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12014) > View related commands > - > > Key: SPARK-14125 > URL: https://issues.apache.org/jira/browse/SPARK-14125 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > We should support the following commands. > TOK_ALTERVIEW_AS > TOK_ALTERVIEW_PROPERTIES > TOK_ALTERVIEW_RENAME > TOK_DROPVIEW > TOK_DROPVIEW_PROPERTIES > TOK_DESCTABLE > For TOK_ALTERVIEW_ADDPARTS/TOK_ALTERVIEW_DROPPARTS, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14125) View related commands
[ https://issues.apache.org/jira/browse/SPARK-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14125: -- Assignee: Xiao Li > View related commands > - > > Key: SPARK-14125 > URL: https://issues.apache.org/jira/browse/SPARK-14125 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > > We should support the following commands. > TOK_ALTERVIEW_AS > TOK_ALTERVIEW_PROPERTIES > TOK_ALTERVIEW_RENAME > TOK_DROPVIEW > TOK_DROPVIEW_PROPERTIES > TOK_DESCTABLE > For TOK_ALTERVIEW_ADDPARTS/TOK_ALTERVIEW_DROPPARTS, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14124) Database related commands
[ https://issues.apache.org/jira/browse/SPARK-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14124. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 > Database related commands > - > > Key: SPARK-14124 > URL: https://issues.apache.org/jira/browse/SPARK-14124 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > Fix For: 2.0.0 > > > We should support the following commands. > TOK_CREATEDATABASE > TOK_DESCDATABASE > TOK_DROPDATABASE > TOK_ALTERDATABASE_PROPERTIES > For TOK_ALTERDATABASE_OWNER, let's throw an exception for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14253) Avoid registering temporary functions in Hive
Andrew Or created SPARK-14253: - Summary: Avoid registering temporary functions in Hive Key: SPARK-14253 URL: https://issues.apache.org/jira/browse/SPARK-14253 Project: Spark Issue Type: Bug Components: SQL Reporter: Andrew Or Assignee: Andrew Or Spark should just handle all temporary functions itself instead of passing them to Hive, which is what the HiveFunctionRegistry does at the moment. The extra call to Hive is unnecessary and potentially slow, and it makes the semantics of a "temporary function" confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
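The idea in SPARK-14253 can be sketched with a hypothetical session-local registry (all names and shapes below are illustrative, not Spark's HiveFunctionRegistry API): temporary functions live in an in-memory map owned by the session, so registration and lookup never round-trip to an external Hive catalog.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a session-local temporary-function registry.
class TempFunctionRegistry {
    // Session-local storage; nothing here is sent to an external catalog.
    private final Map<String, Function<Object[], Object>> fns = new HashMap<>();

    void registerTemp(String name, Function<Object[], Object> fn) {
        fns.put(name.toLowerCase(), fn); // no round-trip to Hive
    }

    Object invoke(String name, Object... args) {
        Function<Object[], Object> fn = fns.get(name.toLowerCase());
        if (fn == null) {
            throw new IllegalArgumentException("undefined function: " + name);
        }
        return fn.apply(args);
    }
}

public class Main {
    public static void main(String[] args) {
        TempFunctionRegistry reg = new TempFunctionRegistry();
        reg.registerTemp("strlen", a -> ((String) a[0]).length());
        // Lookup is case-insensitive and stays entirely in-process.
        System.out.println(reg.invoke("STRLEN", "spark")); // prints 5
    }
}
```

Keeping the lifetime inside the session also makes "temporary" unambiguous: the function disappears with the session rather than lingering in Hive's state.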
[jira] [Resolved] (SPARK-14120) Import/Export commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14120. --- Resolution: Fixed Assignee: Andrew Or Fix Version/s: 2.0.0 > Import/Export commands (Exception) > -- > > Key: SPARK-14120 > URL: https://issues.apache.org/jira/browse/SPARK-14120 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > Tokens: TOK_IMPORT/TOK_EXPORT (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14119) Role management commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14119. --- Resolution: Fixed Fix Version/s: 2.0.0 > Role management commands (Exception) > > > Key: SPARK-14119 > URL: https://issues.apache.org/jira/browse/SPARK-14119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > All role management commands should throw exceptions. > TOK_CREATEROLE (Exception) > TOK_DROPROLE (Exception) > TOK_GRANT (Exception) > TOK_GRANT_ROLE (Exception) > TOK_SHOW_GRANT (Exception) > TOK_SHOW_ROLE_GRANT (Exception) > TOK_SHOW_ROLE_PRINCIPALS (Exception) > TOK_SHOW_ROLES (Exception) > TOK_SHOW_SET_ROLE (Exception) > TOK_REVOKE (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14119) Role management commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14119: - Assignee: Andrew Or > Role management commands (Exception) > > > Key: SPARK-14119 > URL: https://issues.apache.org/jira/browse/SPARK-14119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > All role management commands should throw exceptions. > TOK_CREATEROLE (Exception) > TOK_DROPROLE (Exception) > TOK_GRANT (Exception) > TOK_GRANT_ROLE (Exception) > TOK_SHOW_GRANT (Exception) > TOK_SHOW_ROLE_GRANT (Exception) > TOK_SHOW_ROLE_PRINCIPALS (Exception) > TOK_SHOW_ROLES (Exception) > TOK_SHOW_SET_ROLE (Exception) > TOK_REVOKE (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14122) Show commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14122. --- Resolution: Fixed Assignee: Andrew Or Fix Version/s: 2.0.0 > Show commands (Exception) > - > > Key: SPARK-14122 > URL: https://issues.apache.org/jira/browse/SPARK-14122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > For the following tokens, we should throw exceptions. > TOK_SHOW_CREATETABLE (TBD) > TOK_SHOW_COMPACTIONS (Exception) > TOK_SHOW_TRANSACTIONS (Exception) > TOK_SHOWINDEXES (Exception) > TOK_SHOWLOCKS (Exception)
[jira] [Resolved] (SPARK-10570) Add Spark version endpoint to standalone JSON API
[ https://issues.apache.org/jira/browse/SPARK-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10570. --- Resolution: Fixed Assignee: Jakob Odersky Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add Spark version endpoint to standalone JSON API > - > > Key: SPARK-10570 > URL: https://issues.apache.org/jira/browse/SPARK-10570 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 1.4.1 >Reporter: Matt Cheah >Assignee: Jakob Odersky >Priority: Minor > Labels: api > Fix For: 2.0.0 > > > Let's have an ability to navigate to /api/vX/version to get the Spark version > running on the cluster. > The use case I had in mind is verifying that my Spark application is > compatible with the version of the cluster that my user has installed. I want > to do this without starting a SparkContext and having the context > initialization fail. > This is a strictly additive change, so I think it can just be added to the v1 > API.
[jira] [Resolved] (SPARK-14232) Event timeline on job page doesn't show if an executor is removed with multiple line reason
[ https://issues.apache.org/jira/browse/SPARK-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14232. --- Resolution: Fixed Fix Version/s: 2.0.0 1.6.2 Target Version/s: 1.6.2, 2.0.0 > Event timeline on job page doesn't show if an executor is removed with > multiple line reason > --- > > Key: SPARK-14232 > URL: https://issues.apache.org/jira/browse/SPARK-14232 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.1 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The event timeline doesn't show on job page if an executor is removed with > multiple line reason. For example, the html looks like below. > {code} > { > 'className': 'executor removed', > 'group': 'executors', > 'start': new Date(1459221687861), > 'content': ' 'data-toggle="tooltip" data-placement="bottom"' + > 'data-title="Executor 21' + > 'Removed at 2016/03/29 11:21:27' + > 'Reason: Container marked as failed: > container_1459220038649_0005_01_22 on host: sr560. Exit status: 137. > Diagnostics: Container killed on request. Exit code is 137 > Container exited with a non-zero exit code 137 > Killed by external signal > "' + > 'data-html="true">Executor 21 removed' > } > {code}
[jira] [Updated] (SPARK-14232) Event timeline on job page doesn't show if an executor is removed with multiple line reason
[ https://issues.apache.org/jira/browse/SPARK-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14232: -- Assignee: Carson Wang > Event timeline on job page doesn't show if an executor is removed with > multiple line reason > --- > > Key: SPARK-14232 > URL: https://issues.apache.org/jira/browse/SPARK-14232 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.1 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Minor > > The event timeline doesn't show on job page if an executor is removed with > multiple line reason. For example, the html looks like below. > {code} > { > 'className': 'executor removed', > 'group': 'executors', > 'start': new Date(1459221687861), > 'content': ' 'data-toggle="tooltip" data-placement="bottom"' + > 'data-title="Executor 21' + > 'Removed at 2016/03/29 11:21:27' + > 'Reason: Container marked as failed: > container_1459220038649_0005_01_22 on host: sr560. Exit status: 137. > Diagnostics: Container killed on request. Exit code is 137 > Container exited with a non-zero exit code 137 > Killed by external signal > "' + > 'data-html="true">Executor 21 removed' > } > {code}
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Assignee: jeanlyn > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Assignee: jeanlyn > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager
[jira] [Commented] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216448#comment-15216448 ] Andrew Or commented on SPARK-14243: --- Thanks, I've assigned this to you. > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216293#comment-15216293 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay, I am fairly new to this also. I think I would contact Russell about his analysis; in the spark-cassandra-connector, potentially making a change to the way dates are handled could break a lot of existing apps. I think the key is to figure out how to come up with a solution that is backwards compatible. I have filed several bugs in the past, but only after discussion on the users email group. I would suggest that before you do a lot of work you see what the user group thinks. Many thanks for taking this on. Andy > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC, however when we load the > data into Spark data frames the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center onto their local Macs. The data centers run > UTC; their computers are typically in PST or EST. > It is possible to hack around this problem. > This causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg > https://www.mail-archive.com/user@spark.apache.org/msg48121.html
[jira] [Resolved] (SPARK-13447) Fix AM failure situation for dynamic allocation disabled situation
[ https://issues.apache.org/jira/browse/SPARK-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13447. --- Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Fix AM failure situation for dynamic allocation disabled situation > -- > > Key: SPARK-13447 > URL: https://issues.apache.org/jira/browse/SPARK-13447 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.6.0 >Reporter: Saisai Shao >Assignee: Saisai Shao > Fix For: 2.0.0 > > > Because of the lag of executor disconnection events, stale state in the > {{CoarseGrainedSchedulerBackend}} will ruin the newly registered executor > information. This situation is handled for the dynamic allocation enabled > case; for the dynamic allocation disabled scenario, we should also have a > similar fix.
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Target Version/s: 1.6.2, 2.0.0 (was: 1.6.1, 2.0.0) > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Target Version/s: 1.6.1, 2.0.0 Fix Version/s: 2.0.0 > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Assigned] (SPARK-14013) Properly implement temporary functions in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14013: - Assignee: Andrew Or > Properly implement temporary functions in SessionCatalog > > > Key: SPARK-14013 > URL: https://issues.apache.org/jira/browse/SPARK-14013 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Andrew Or >Assignee: Andrew Or > > Right now `SessionCatalog` just contains `CatalogFunction`, which is > metadata. In the future the catalog should probably take in a function > registry or something.
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214334#comment-15214334 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay, here is some more info from the email thread mentioned above. Russell is very involved in the development of Cassandra and the spark-cassandra-connector. He has a suggestion for how to fix this bug. Kind regards, Andy http://www.slideshare.net/RussellSpitzer On Fri, Mar 18, 2016 at 11:35 AM Russell Spitzer wrote: > Unfortunately part of Spark SQL. They have based their type on > java.sql.timestamp (and date) which adjusts to the client timezone when > displaying and storing. > See discussions > http://stackoverflow.com/questions/9202857/timezones-in-sql-date-vs-java-sql-date > And Code > https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93 > > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC, however when we load the > data into Spark data frames the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center onto their local Macs. The data centers run > UTC; their computers are typically in PST or EST. > It is possible to hack around this problem. > This causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg > https://www.mail-archive.com/user@spark.apache.org/msg48121.html
[jira] [Commented] (SPARK-14079) Limit the number of queries on SQL UI
[ https://issues.apache.org/jira/browse/SPARK-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207275#comment-15207275 ] Andrew Or commented on SPARK-14079: --- we should do `maxRetainedQueries` or something, similar to what we already do for the All Jobs page. > Limit the number of queries on SQL UI > - > > Key: SPARK-14079 > URL: https://issues.apache.org/jira/browse/SPARK-14079 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > > The SQL UI becomes very, very slow if there are hundreds of SQL queries on it.
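The proposed `maxRetainedQueries` cap mirrors the eviction Spark already applies elsewhere in the UI (e.g. retained jobs on the All Jobs page). A minimal sketch of that bounded-retention scheme; the class and parameter names here are hypothetical, not Spark's actual implementation:

```python
from collections import deque

class RetainedQueries:
    """Keep at most max_retained query entries, evicting the oldest first."""

    def __init__(self, max_retained):
        self.max_retained = max_retained
        self._queries = deque()

    def add(self, query_id):
        self._queries.append(query_id)
        # Once the cap is exceeded, drop the oldest entries so the UI
        # page never has to render an unbounded history.
        while len(self._queries) > self.max_retained:
            self._queries.popleft()

    def retained(self):
        return list(self._queries)

store = RetainedQueries(max_retained=3)
for qid in range(5):
    store.add(qid)
print(store.retained())  # [2, 3, 4] -- queries 0 and 1 were evicted
```

The same effect could be had with `deque(maxlen=3)` directly; the explicit loop just makes the eviction policy visible.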
[jira] [Created] (SPARK-14057) sql time stamps do not respect time zones
Andrew Davidson created SPARK-14057: --- Summary: sql time stamps do not respect time zones Key: SPARK-14057 URL: https://issues.apache.org/jira/browse/SPARK-14057 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Andrew Davidson Priority: Minor we have time stamp data. The time stamp data is UTC, however when we load the data into Spark data frames the system assumes the time stamps are in the local time zone. This causes problems for our data scientists. Often they pull data from our data center onto their local Macs. The data centers run UTC; their computers are typically in PST or EST. It is possible to hack around this problem. This causes a lot of errors in their analysis. A complete description of this issue can be found in the following mail msg https://www.mail-archive.com/user@spark.apache.org/msg48121.html
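The behavior reported above is the general pitfall of rendering one instant in the client's zone rather than the zone the data was written in. A small plain-Python sketch (not Spark) showing the same instant printing differently under UTC and a fixed UTC-5 offset, which is why UTC data read on an EST laptop looks shifted:

```python
from datetime import datetime, timezone, timedelta

# One well-defined instant: the Unix epoch.
instant = datetime.fromtimestamp(0, tz=timezone.utc)

# Rendered in UTC, as the data center wrote it.
print(instant.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 00:00:00

# Rendered in a fixed UTC-5 zone (an "EST laptop", ignoring DST):
est = timezone(timedelta(hours=-5))
print(instant.astimezone(est).strftime("%Y-%m-%d %H:%M:%S"))  # 1969-12-31 19:00:00
```

The instant never changed; only the zone used for display did. Types that always display in the client's default zone (as `java.sql.Timestamp` does) produce exactly this kind of apparent shift.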
[jira] [Created] (SPARK-14013) Properly implement temporary functions in SessionCatalog
Andrew Or created SPARK-14013: - Summary: Properly implement temporary functions in SessionCatalog Key: SPARK-14013 URL: https://issues.apache.org/jira/browse/SPARK-14013 Project: Spark Issue Type: Bug Reporter: Andrew Or Right now `SessionCatalog` just contains `CatalogFunction`, which is metadata. In the future the catalog should probably take in a function registry or something.
[jira] [Created] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog
Andrew Or created SPARK-14014: - Summary: Replace existing analysis.Catalog with SessionCatalog Key: SPARK-14014 URL: https://issues.apache.org/jira/browse/SPARK-14014 Project: Spark Issue Type: Bug Reporter: Andrew Or Assignee: Andrew Or As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will have two high level catalogs only: SessionCatalog and ExternalCatalog. SessionCatalog (implemented in SPARK-13923) keeps track of temporary functions and tables and delegates other operations to ExternalCatalog. At the same time, there's this legacy catalog called `analysis.Catalog` that also tracks temporary functions and tables. The goal is to get rid of this legacy catalog and replace it with SessionCatalog, which is the new thing.
[jira] [Created] (SPARK-13923) Implement SessionCatalog to manage temp functions and tables
Andrew Or created SPARK-13923: - Summary: Implement SessionCatalog to manage temp functions and tables Key: SPARK-13923 URL: https://issues.apache.org/jira/browse/SPARK-13923 Project: Spark Issue Type: Bug Components: SQL Reporter: Andrew Or Assignee: Andrew Or Today, we have ExternalCatalog, which is dead code. As part of the effort of merging SQLContext/HiveContext we'll parse Hive commands and call the corresponding methods in ExternalCatalog. However, this handles only persisted things. We need something in addition to that to handle temporary things. The new catalog is called SessionCatalog and will internally call ExternalCatalog.
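The division of labor described in these tickets (SessionCatalog tracks session-scoped temporary objects and delegates persisted ones to ExternalCatalog) can be sketched roughly as follows. This is an illustrative Python assumption, not Spark's Scala implementation, and the method names are invented:

```python
class ExternalCatalog:
    """Stands in for the persistent catalog (e.g. a metastore)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, schema):
        self._tables[name] = schema

    def get_table(self, name):
        return self._tables.get(name)

class SessionCatalog:
    """Holds temp tables for one session; everything else is delegated."""

    def __init__(self, external):
        self._external = external
        self._temp_tables = {}

    def create_temp_table(self, name, schema):
        self._temp_tables[name] = schema

    def lookup(self, name):
        # Temporary tables shadow persistent ones; otherwise delegate
        # to the external (persistent) catalog.
        if name in self._temp_tables:
            return self._temp_tables[name]
        return self._external.get_table(name)

ext = ExternalCatalog()
ext.create_table("sales", "id INT, amount DOUBLE")
session = SessionCatalog(ext)
session.create_temp_table("sales", "id INT")  # shadows the persistent table
print(session.lookup("sales"))   # id INT
print(ext.get_table("sales"))    # id INT, amount DOUBLE
```

The key property is that the persistent catalog is untouched by session-scoped registrations, which is what lets SessionCatalog subsume the legacy `analysis.Catalog` role.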
[jira] [Resolved] (SPARK-6157) Unrolling with MEMORY_AND_DISK should always release memory
[ https://issues.apache.org/jira/browse/SPARK-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-6157. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Unrolling with MEMORY_AND_DISK should always release memory > --- > > Key: SPARK-6157 > URL: https://issues.apache.org/jira/browse/SPARK-6157 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 1.2.1 >Reporter: SuYan >Assignee: Josh Rosen > Fix For: 2.0.0 > > > === EDIT by andrewor14 === > The existing description was somewhat confusing, so here's a more succinct > version of it. > If unrolling a block with MEMORY_AND_DISK was unsuccessful, we will drop the > block to disk > directly. After doing so, however, we don't need the underlying array that > held the partial > values anymore, so we should release the pending unroll memory for other > tasks on the same > executor. Otherwise, other tasks may unnecessarily drop their blocks to disk > due to the lack > of unroll space, resulting in worse performance. > === Original comment === > Current code: > Now we want to cache a Memory_and_disk level block > 1. Try to put in memory and unroll unsuccessful. then reserved unroll memory > because we got a iterator from an unroll Array > 2. Then put into disk. > 3. Get value from get(blockId), and iterator from that value, and then > nothing with an unroll Array. So here we should release the reserved unroll > memory instead will release until the task is end. > and also, have somebody already pull a request, for get Memory_and_disk level > block, while cache in memory from disk, we should, use file.length to check > if we can put in memory store instead just allocate a file.length buffer, may > lead to OOM.
[jira] [Resolved] (SPARK-10907) Get rid of pending unroll memory
[ https://issues.apache.org/jira/browse/SPARK-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10907. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Get rid of pending unroll memory > > > Key: SPARK-10907 > URL: https://issues.apache.org/jira/browse/SPARK-10907 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Josh Rosen > Fix For: 2.0.0 > > > It's incredibly complicated to have both unroll memory and pending unroll > memory in MemoryStore.scala. We can probably express it with only unroll > memory through some minor refactoring.
[jira] [Resolved] (SPARK-13054) Always post TaskEnd event for tasks in cancelled stages
[ https://issues.apache.org/jira/browse/SPARK-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13054. --- Resolution: Fixed Assignee: Thomas Graves (was: Andrew Or) Fix Version/s: 2.0.0 > Always post TaskEnd event for tasks in cancelled stages > --- > > Key: SPARK-13054 > URL: https://issues.apache.org/jira/browse/SPARK-13054 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Thomas Graves > Fix For: 2.0.0 > > > {code} > // The success case is dealt with separately below. > // TODO: Why post it only for failed tasks in cancelled stages? Clarify > semantics here. > if (event.reason != Success) { > val attemptId = task.stageAttemptId > listenerBus.post(SparkListenerTaskEnd( > stageId, attemptId, taskType, event.reason, event.taskInfo, > taskMetrics)) > } > {code} > Today we only post task end events for canceled stages if the task failed. > There is no reason why we shouldn't just post it for all the tasks, including > the ones that succeeded. If we do that we will be able to simplify another > branch in the DAGScheduler, which needs a lot of simplification.
[jira] [Updated] (SPARK-12583) spark shuffle fails with mesos after 2mins
[ https://issues.apache.org/jira/browse/SPARK-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12583: -- Assignee: Bertrand Bossy > spark shuffle fails with mesos after 2mins > -- > > Key: SPARK-12583 > URL: https://issues.apache.org/jira/browse/SPARK-12583 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 1.6.0 >Reporter: Adrian Bridgett >Assignee: Bertrand Bossy > Fix For: 2.0.0 > > > See user mailing list "Executor deregistered after 2mins" for more details. > As of 1.6, the driver registers with each shuffle manager via > MesosExternalShuffleClient. Once this disconnects, the shuffle manager > automatically cleans up the data associated with that driver. > However, the connection is terminated before this happens as it's idle. > Looking at a packet trace, after 120secs the shuffle manager is sending a FIN > packet to the driver. The only way to delay this is to increase > spark.shuffle.io.connectionTimeout=3600s on the shuffle manager. > I patched the MesosExternalShuffleClient (and ExternalShuffleClient) with > newbie Scala skills to call the TransportContext call with > closeIdleConnections "false" and this didn't help (hadn't done the network > trace first).
[jira] [Resolved] (SPARK-12583) spark shuffle fails with mesos after 2mins
[ https://issues.apache.org/jira/browse/SPARK-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12583. --- Resolution: Fixed Fix Version/s: 2.0.0 > spark shuffle fails with mesos after 2mins > -- > > Key: SPARK-12583 > URL: https://issues.apache.org/jira/browse/SPARK-12583 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 1.6.0 >Reporter: Adrian Bridgett > Fix For: 2.0.0 > > > See user mailing list "Executor deregistered after 2mins" for more details. > As of 1.6, the driver registers with each shuffle manager via > MesosExternalShuffleClient. Once this disconnects, the shuffle manager > automatically cleans up the data associated with that driver. > However, the connection is terminated before this happens as it's idle. > Looking at a packet trace, after 120secs the shuffle manager is sending a FIN > packet to the driver. The only way to delay this is to increase > spark.shuffle.io.connectionTimeout=3600s on the shuffle manager. > I patched the MesosExternalShuffleClient (and ExternalShuffleClient) with > newbie Scala skills to call the TransportContext call with > closeIdleConnections "false" and this didn't help (hadn't done the network > trace first).
[jira] [Resolved] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13833. --- Resolution: Fixed Fix Version/s: 2.0.0 > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Apache Spark > Fix For: 2.0.0 > > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory.
[jira] [Resolved] (SPARK-13328) Possible poor read performance for broadcast variables with dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13328. --- Resolution: Fixed Assignee: Nezih Yigitbasi Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Possible poor read performance for broadcast variables with dynamic resource > allocation > --- > > Key: SPARK-13328 > URL: https://issues.apache.org/jira/browse/SPARK-13328 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Nezih Yigitbasi >Assignee: Nezih Yigitbasi > Fix For: 2.0.0 > > > When dynamic resource allocation is enabled fetching broadcast variables from > removed executors were causing job failures and SPARK-9591 fixed this problem > by trying all locations of a block before giving up. However, the locations > of a block is retrieved only once from the driver in this process and the > locations in this list can be stale due to dynamic resource allocation. This > situation gets worse when running on a large cluster as the size of this > location list can be in the order of several hundreds out of which there may > be tens of stale entries. What we have observed is with the default settings > of 3 max retries and 5s between retries (that's 15s per location) the time it > takes to read a broadcast variable can be as high as ~17m (below log shows > the failed 70th block fetch attempt where each attempt takes 15s) > {code} > ... > 16/02/13 01:02:27 WARN storage.BlockManager: Failed to fetch remote block > broadcast_18_piece0 from BlockManagerId(8, ip-10-178-77-38.ec2.internal, > 60675) (failed attempt 70) > ... > 16/02/13 01:02:27 INFO broadcast.TorrentBroadcast: Reading broadcast variable > 18 took 1051049 ms > {code}
[jira] [Resolved] (SPARK-13604) Sync worker's state after registering with master
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13604.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Sync worker's state after registering with master
> -------------------------------------------------
>
>                 Key: SPARK-13604
>                 URL: https://issues.apache.org/jira/browse/SPARK-13604
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>             Fix For: 2.0.0
>
> If the Master cannot talk to a Worker for a while and the network then comes back, the Worker may leak existing executors and drivers. We should sync the worker's state after it registers with the master.
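The fix described above amounts to a reconciliation step after (re)registration: the worker reports what it is actually running, and the master diffs that against its own view. A rough sketch of the idea (all class and method names here are hypothetical, not Spark's actual RPC messages):

```python
# Hypothetical sketch of master/worker state reconciliation after re-registration.
# None of these names correspond to Spark's actual RPC message types.

class Master:
    def __init__(self):
        self.known_executors = {}  # worker_id -> set of executor ids the master expects

    def handle_worker_state_report(self, worker_id, reported_executors):
        # Executors the worker reports but the master no longer tracks are
        # leaked and should be killed; executors the master expects but the
        # worker lost must be rescheduled.
        expected = self.known_executors.get(worker_id, set())
        to_kill = reported_executors - expected
        to_reschedule = expected - reported_executors
        return to_kill, to_reschedule

m = Master()
m.known_executors["worker-1"] = {"exec-1", "exec-2"}
kill, resched = m.handle_worker_state_report("worker-1", {"exec-2", "exec-3"})
print(kill, resched)  # {'exec-3'} {'exec-1'}
```

Without this sync, the "leaked" set on the worker side is never computed, which is exactly the bug the issue describes.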
[jira] [Updated] (SPARK-3854) Scala style: require spaces before `{`
[ https://issues.apache.org/jira/browse/SPARK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-3854:
-----------------------------
    Assignee: Dongjoon Hyun

> Scala style: require spaces before `{`
> --------------------------------------
>
>                 Key: SPARK-3854
>                 URL: https://issues.apache.org/jira/browse/SPARK-3854
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>            Reporter: Josh Rosen
>            Assignee: Dongjoon Hyun
>             Fix For: 2.0.0
>
> We should require spaces before opening curly braces. This isn't in the style guide, but it probably should be:
> {code}
> // Correct:
> if (true) {
>   println("Wow!")
> }
>
> // Incorrect:
> if (true){
>   println("Wow!")
> }
> {code}
> See https://github.com/apache/spark/pull/1658#discussion-diff-18611791 for an example "in the wild."
> {{git grep "){"}} shows only a few occurrences of this style.
[jira] [Resolved] (SPARK-3854) Scala style: require spaces before `{`
[ https://issues.apache.org/jira/browse/SPARK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-3854.
------------------------------
          Resolution: Fixed
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Scala style: require spaces before `{`
> --------------------------------------
>
>                 Key: SPARK-3854
>                 URL: https://issues.apache.org/jira/browse/SPARK-3854
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>            Reporter: Josh Rosen
>             Fix For: 2.0.0
>
> We should require spaces before opening curly braces. This isn't in the style guide, but it probably should be:
> {code}
> // Correct:
> if (true) {
>   println("Wow!")
> }
>
> // Incorrect:
> if (true){
>   println("Wow!")
> }
> {code}
> See https://github.com/apache/spark/pull/1658#discussion-diff-18611791 for an example "in the wild."
> {{git grep "){"}} shows only a few occurrences of this style.
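The `){` pattern flagged by the `git grep` above can be approximated with a few lines of code. The real enforcement lives in Spark's scalastyle configuration; this regex scan is only an illustration of the check:

```python
import re

# Flag `){` with no space before the opening brace, as in `if (true){`.
# Simplified illustration: unlike a real linter, it ignores the possibility
# of the pattern appearing inside strings or comments.
PATTERN = re.compile(r'\)\{')

def offending_lines(source):
    """Return 1-based line numbers containing `){`."""
    return [n for n, line in enumerate(source.splitlines(), 1)
            if PATTERN.search(line)]

code = 'if (true){\n  println("Wow!")\n}\nif (true) {\n  println("ok")\n}\n'
print(offending_lines(code))  # [1]
```

A proper style rule would additionally skip string literals and comments, which is why the issue was resolved through the project's style tooling rather than an ad-hoc grep.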
[jira] [Resolved] (SPARK-13696) Remove BlockStore interface to more cleanly reflect different memory and disk store responsibilities
[ https://issues.apache.org/jira/browse/SPARK-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13696.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove BlockStore interface to more cleanly reflect different memory and disk store responsibilities
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13696
>                 URL: https://issues.apache.org/jira/browse/SPARK-13696
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
> Today, both the MemoryStore and DiskStore implement a common BlockStore API, but I feel that this API is inappropriate because it abstracts away important distinctions between the behavior of these two stores.
> For instance, the disk store doesn't have a notion of storing deserialized objects, so it's confusing for it to expose object-based APIs like putIterator() and getValues() instead of exposing only binary APIs and pushing the responsibilities of serialization and deserialization to the client.
> As part of a larger BlockManager interface cleanup, I'd like to remove the BlockStore API and refine the MemoryStore and DiskStore interfaces to reflect narrower sets of responsibilities for those components.
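The split proposed in the issue can be pictured as two narrower interfaces in place of one shared BlockStore: the memory store keeps object-based APIs, while the disk store deals only in bytes and pushes serialization to the caller. A hedged sketch of the shape (method names loosely follow the ones quoted above, not Spark's exact signatures):

```python
# Sketch of the proposed narrowing. The memory store exposes object-based
# APIs; the disk store exposes only binary APIs, so (de)serialization is
# the client's responsibility. Not Spark's actual class definitions.

class MemoryStore:
    def __init__(self):
        self._entries = {}

    def put_iterator(self, block_id, values):
        # Stores deserialized objects directly; only meaningful in memory.
        self._entries[block_id] = list(values)

    def get_values(self, block_id):
        return iter(self._entries[block_id])

class DiskStore:
    def __init__(self):
        self._files = {}

    def put_bytes(self, block_id, data: bytes):
        # Binary-only API: no notion of deserialized objects on disk.
        self._files[block_id] = data

    def get_bytes(self, block_id) -> bytes:
        return self._files[block_id]

mem, disk = MemoryStore(), DiskStore()
mem.put_iterator("b0", [1, 2, 3])
disk.put_bytes("b1", b"\x01\x02")
print(list(mem.get_values("b0")), disk.get_bytes("b1"))
```

With no shared base class, the disk store simply cannot be handed an iterator of objects, which makes the confusion described in the issue impossible to express at the type level.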