[jira] [Updated] (SPARK-14445) Support native execution of SHOW COLUMNS and SHOW PARTITIONS command
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-14118

> Support native execution of SHOW COLUMNS and SHOW PARTITIONS command
> --------------------------------------------------------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
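Native execution of these commands starts with recognizing the Hive syntax linked above, e.g. SHOW COLUMNS (FROM | IN) table_name [(FROM | IN) db_name]. The following is a minimal illustrative sketch of such a recognizer in plain Python; it is not Spark's actual parser, and the function name is hypothetical.

```python
import re

# Hedged sketch: recognize the Hive-style SHOW COLUMNS syntax,
#   SHOW COLUMNS (FROM | IN) table_name [(FROM | IN) db_name]
# This regex is an illustration only, not Spark's real grammar.
SHOW_COLUMNS = re.compile(
    r"^SHOW\s+COLUMNS\s+(?:FROM|IN)\s+(\w+)(?:\s+(?:FROM|IN)\s+(\w+))?$",
    re.IGNORECASE,
)

def parse_show_columns(sql: str):
    """Return (db, table) for a SHOW COLUMNS statement, or None on no match."""
    m = SHOW_COLUMNS.match(sql.strip())
    if not m:
        return None
    table, db = m.group(1), m.group(2)  # db is None when not given
    return (db, table)
```

A native implementation would map a successful match to a dedicated runnable command instead of delegating the statement to Hive.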
[jira] [Updated] (SPARK-14445) Show columns/partitions
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Summary: Show columns/partitions  (was: show columns/partitions)

> Show columns/partitions
> -----------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show
[jira] [Updated] (SPARK-14445) show columns/partitions
[ https://issues.apache.org/jira/browse/SPARK-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14445:
------------------------------
    Summary: show columns/partitions  (was: Support native execution of SHOW COLUMNS and SHOW PARTITIONS command)

> show columns/partitions
> -----------------------
>
> Key: SPARK-14445
> URL: https://issues.apache.org/jira/browse/SPARK-14445
> Project: Spark
> Issue Type: Sub-task
> Reporter: Dilip Biswal
>
> 1. Support native execution of SHOW COLUMNS
> 2. Support native execution of SHOW PARTITIONS
> The syntax of the SHOW commands is described at the following link:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Show
[jira] [Updated] (SPARK-14499) Add tests to make sure drop partitions of an external table will not delete data
[ https://issues.apache.org/jira/browse/SPARK-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14499:
------------------------------
    Assignee: Xiao Li

> Add tests to make sure drop partitions of an external table will not delete data
> --------------------------------------------------------------------------------
>
> Key: SPARK-14499
> URL: https://issues.apache.org/jira/browse/SPARK-14499
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Xiao Li
>
> This is a follow-up of SPARK-14132
> (https://github.com/apache/spark/pull/12220#issuecomment-207625166) to
> address https://github.com/apache/spark/pull/12220#issuecomment-207612627.
[jira] [Assigned] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14441:
---------------------------------
    Assignee: Andrew Or

> Consolidate DDL tests
> ---------------------
>
> Key: SPARK-14441
> URL: https://issues.apache.org/jira/browse/SPARK-14441
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing
> whether a test should exist in one or the other. It also makes it less clear
> whether our test coverage is comprehensive. Ideally we should consolidate
> these files as much as possible.
[jira] [Reopened] (SPARK-14414) Make error messages consistent across DDLs
[ https://issues.apache.org/jira/browse/SPARK-14414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reopened SPARK-14414:
-------------------------------

> Make error messages consistent across DDLs
> ------------------------------------------
>
> Key: SPARK-14414
> URL: https://issues.apache.org/jira/browse/SPARK-14414
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> There are many different error messages right now when the user tries to run
> something that's not supported. We might throw AnalysisException or
> ParseException or NoSuchFunctionException etc. We should make all of these
> consistent before 2.0.
[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14357:
------------------------------
    Assignee: Jason Moore

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-14357
> URL: https://issues.apache.org/jira/browse/SPARK-14357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2, 1.6.0, 1.6.1
> Reporter: Jason Moore
> Assignee: Jason Moore
> Priority: Critical
> Fix For: 1.5.2, 1.6.2, 2.0.0
>
> Speculation can often result in a CommitDeniedException, but ideally this
> shouldn't result in the job failing. So changes were made along with
> SPARK-8167 to ensure that the CommitDeniedException is caught and given a
> failure reason that doesn't increment the failure count.
> However, I'm still noticing that this exception is causing jobs to fail using
> the 1.6.1 release version.
> {noformat}
> 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 (TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: Task failed while writing rows.
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
>     ... 8 more
> Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201604041136_0315_m_18_8: Not committed because the driver did not authorize commit
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
>     at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
>     ... 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped in a
> RuntimeException at
> [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
> and then in a SparkException at
> [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154],
> which prevents it from being handled properly at
> [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290].
> The solution might be for this catch block to type-match on the innermost
> cause of the error.
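The proposed fix above, matching on the innermost cause instead of the outermost wrapper, can be sketched in a few lines. This is an illustrative Python stand-in, not Spark's Scala code: CommitDenied here stands in for Spark's CommitDeniedException, and the cause chain mimics the RuntimeException/SparkException wrapping shown in the stack trace.

```python
# Hedged sketch: walk the exception cause chain and type-match on the
# innermost cause, rather than on the outermost wrapper exception.

class CommitDenied(Exception):
    """Stand-in for org.apache.spark.executor.CommitDeniedException."""

def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ links to the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc

def is_commit_denied(exc: BaseException) -> bool:
    # True even when CommitDenied is buried under RuntimeError-style wrappers.
    return isinstance(root_cause(exc), CommitDenied)
```

With this, a handler that previously only matched the outer RuntimeException/SparkException can still recognize a denied commit and avoid counting it as a task failure.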
[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[ https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14357:
------------------------------
    Target Version/s: 1.5.2, 1.6.2, 2.0.0
    Fix Version/s: 1.5.2
                   2.0.0
                   1.6.2

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-14357
> URL: https://issues.apache.org/jira/browse/SPARK-14357
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.5.2, 1.6.0, 1.6.1
> Reporter: Jason Moore
> Priority: Critical
> Fix For: 1.5.2, 1.6.2, 2.0.0
>
> Speculation can often result in a CommitDeniedException, but ideally this
> shouldn't result in the job failing. So changes were made along with
> SPARK-8167 to ensure that the CommitDeniedException is caught and given a
> failure reason that doesn't increment the failure count.
> However, I'm still noticing that this exception is causing jobs to fail using
> the 1.6.1 release version.
> {noformat}
> 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 (TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: Task failed while writing rows.
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
>     ... 8 more
> Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201604041136_0315_m_18_8: Not committed because the driver did not authorize commit
>     at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
>     at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
>     ... 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped in a
> RuntimeException at
> [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
> and then in a SparkException at
> [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154],
> which prevents it from being handled properly at
> [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290].
> The solution might be for this catch block to type-match on the innermost
> cause of the error.
[jira] [Resolved] (SPARK-14455) ReceiverTracker#allocatedExecutors throw NPE for receiver-less streaming application
[ https://issues.apache.org/jira/browse/SPARK-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14455.
-------------------------------
    Resolution: Fixed
    Assignee: Saisai Shao
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> ReceiverTracker#allocatedExecutors throw NPE for receiver-less streaming application
> ------------------------------------------------------------------------------------
>
> Key: SPARK-14455
> URL: https://issues.apache.org/jira/browse/SPARK-14455
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 2.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Fix For: 2.0.0
>
> When testing streaming dynamic allocation with direct Kafka, the streaming
> {{ExecutorAllocationManager}} throws an NPE because the {{ReceiverTracker}}
> was never started; this happens when running a receiver-less streaming
> application such as direct Kafka. Here is the exception log:
> {noformat}
> Exception in thread "RecurringTimer - streaming-executor-allocation-manager" java.lang.NullPointerException
>     at org.apache.spark.streaming.scheduler.ReceiverTracker.allocatedExecutors(ReceiverTracker.scala:244)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager.killExecutor(ExecutorAllocationManager.scala:124)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager.org$apache$spark$streaming$scheduler$ExecutorAllocationManager$$manageAllocation(ExecutorAllocationManager.scala:100)
>     at org.apache.spark.streaming.scheduler.ExecutorAllocationManager$$anonfun$1.apply$mcVJ$sp(ExecutorAllocationManager.scala:66)
>     at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:94)
>     at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
>     at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
> {noformat}
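The shape of the fix for an NPE like the one above is to make the tracker safe to query before it has started. A hedged Python sketch of the idea, with illustrative names (not Spark's actual ReceiverTracker code):

```python
# Hedged sketch: a tracker whose allocated_executors() is safe to call even
# when start() was never invoked, as in receiver-less (direct Kafka) apps.

class ReceiverTracker:
    def __init__(self):
        self.endpoint = None  # only set by start(), which receiver-less apps skip

    def start(self):
        self.endpoint = object()  # stand-in for the real tracker endpoint

    def allocated_executors(self):
        if self.endpoint is None:
            # Tracker never started: there are no receivers, so nothing
            # is allocated. Return an empty mapping instead of crashing.
            return {}
        return {"receiver-0": "executor-1"}  # illustrative allocation
```

The key design choice is returning an empty result for the "never started" state rather than dereferencing an uninitialized endpoint.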
[jira] [Resolved] (SPARK-14506) HiveClientImpl's toHiveTable misses a table property for external tables
[ https://issues.apache.org/jira/browse/SPARK-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14506.
-------------------------------
    Resolution: Fixed
    Assignee: Yin Huai
    Fix Version/s: 2.0.0

> HiveClientImpl's toHiveTable misses a table property for external tables
> ------------------------------------------------------------------------
>
> Key: SPARK-14506
> URL: https://issues.apache.org/jira/browse/SPARK-14506
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 2.0.0
>
> For an external table stored in the Hive metastore, its table properties need
> to include a field called EXTERNAL whose value is TRUE.
[jira] [Resolved] (SPARK-14468) Always enable OutputCommitCoordinator
[ https://issues.apache.org/jira/browse/SPARK-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14468.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.5.2
                   2.0.0
                   1.6.2
                   1.4.2
    Target Version/s: 1.5.2, 1.4.2, 1.6.2, 2.0.0  (was: 1.4.2, 1.5.2, 1.6.2, 2.0.0)

> Always enable OutputCommitCoordinator
> -------------------------------------
>
> Key: SPARK-14468
> URL: https://issues.apache.org/jira/browse/SPARK-14468
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Andrew Or
> Assignee: Andrew Or
> Fix For: 1.4.2, 1.6.2, 2.0.0, 1.5.2
>
> The OutputCommitCoordinator was originally introduced in SPARK-4879 because
> speculation causes the output of some partitions to be deleted. However, as
> we can see in SPARK-10063, speculation is not the only case where this can
> happen.
> More specifically, when we retry a stage we're not guaranteed to kill the
> tasks that are still running (we don't even interrupt their threads), so we
> may end up with multiple concurrent task attempts for the same task. This
> leads to problems like SPARK-8029, and that fix alone is necessary but not
> sufficient.
> In general, when we run into situations like these, we need the
> OutputCommitCoordinator because we don't control what the underlying file
> system does. Enabling this doesn't incur heavy performance costs, so there's
> little reason not to always enable it to ensure correctness.
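The coordination described above boils down to one rule: for each output partition, the first task attempt to ask for commit permission wins, and every other concurrent attempt (speculative or from a retried stage) is denied. A minimal illustrative sketch of that rule, not Spark's actual OutputCommitCoordinator implementation:

```python
# Hedged sketch: first-attempt-wins commit authorization per partition.

class OutputCommitCoordinator:
    def __init__(self):
        # partition id -> attempt id that is authorized to commit
        self._authorized = {}

    def can_commit(self, partition: int, attempt: int) -> bool:
        # setdefault records the first requester and returns the winner;
        # the same attempt may ask again and still be authorized.
        winner = self._authorized.setdefault(partition, attempt)
        return winner == attempt
```

Because losing attempts are denied rather than failed outright, a denied commit can be reported with a reason that does not increment the task failure count.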
[jira] [Assigned] (SPARK-14388) Create Table
[ https://issues.apache.org/jira/browse/SPARK-14388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14388:
---------------------------------
    Assignee: Andrew Or

> Create Table
> ------------
>
> Key: SPARK-14388
> URL: https://issues.apache.org/jira/browse/SPARK-14388
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
>
> For now, we still ask Hive to handle creating Hive tables. We should handle
> them ourselves.
[jira] [Issue Comment Deleted] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14410:
------------------------------
    Comment: was deleted

(was: User 'rekhajoshm' has created a pull request for this issue:
https://github.com/apache/spark/pull/12183)

> SessionCatalog needs to check function existence
> ------------------------------------------------
>
> Key: SPARK-14410
> URL: https://issues.apache.org/jira/browse/SPARK-14410
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
> Fix For: 2.0.0
>
> Right now, operations on an existing function in SessionCatalog do not
> really check whether the function exists. We should add this check and avoid
> doing it in the command.
[jira] [Created] (SPARK-14468) Always enable OutputCommitCoordinator
Andrew Or created SPARK-14468:
------------------------------

    Summary: Always enable OutputCommitCoordinator
    Key: SPARK-14468
    URL: https://issues.apache.org/jira/browse/SPARK-14468
    Project: Spark
    Issue Type: Bug
    Components: Spark Core
    Reporter: Andrew Or
    Assignee: Andrew Or

The OutputCommitCoordinator was originally introduced in SPARK-4879 because
speculation causes the output of some partitions to be deleted. However, as we
can see in SPARK-10063, speculation is not the only case where this can happen.

More specifically, when we retry a stage we're not guaranteed to kill the tasks
that are still running (we don't even interrupt their threads), so we may end
up with multiple concurrent task attempts for the same task. This leads to
problems like SPARK-8029, and that fix alone is necessary but not sufficient.

In general, when we run into situations like these, we need the
OutputCommitCoordinator because we don't control what the underlying file
system does. Enabling this doesn't incur heavy performance costs, so there's
little reason not to always enable it to ensure correctness.
[jira] [Assigned] (SPARK-14130) [Table related commands] Alter column
[ https://issues.apache.org/jira/browse/SPARK-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-14130:
---------------------------------
    Assignee: Andrew Or

> [Table related commands] Alter column
> -------------------------------------
>
> Key: SPARK-14130
> URL: https://issues.apache.org/jira/browse/SPARK-14130
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Andrew Or
>
> For the alter column command, we have the following tokens:
> TOK_ALTERTABLE_RENAMECOL
> TOK_ALTERTABLE_ADDCOLS
> TOK_ALTERTABLE_REPLACECOLS
> For data source tables, we should throw exceptions. For Hive tables, we
> should support them. *For Hive tables, we should check Hive's behavior to see
> if there is any file format that does not support any of the above commands.*
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
> is a good reference for Hive's behavior.
> Also, for a Hive table stored in some format, we need to make sure that Spark
> can still read the table after an alter column operation. If we cannot read
> the table, we should throw an exception even if Hive allows the alter column
> operation. For example, if renaming a column of a Hive parquet table makes
> the renamed column inaccessible (we cannot read its values), we should not
> allow the renaming operation.
[jira] [Resolved] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready
[ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13112.
-------------------------------
    Resolution: Fixed
    Assignee: Shixiong Zhu
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> CoarsedExecutorBackend register to driver should wait Executor was ready
> ------------------------------------------------------------------------
>
> Key: SPARK-13112
> URL: https://issues.apache.org/jira/browse/SPARK-13112
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: SuYan
> Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
> Description:
> Because some hosts' disks are busy, the executor fails with a
> TimeoutException while trying to register with the shuffle server on that
> host, and then exits(1) when a task is launched on a null executor.
> The YARN cluster resources are also a little busy; YARN thinks the host is
> idle and prefers to allocate executors on the same host, so there is a chance
> that one task fails 4 times on the same host.
> Currently, CoarseGrainedExecutorBackend registers with the driver first, and
> only after registration succeeds does it initialize the Executor.
> If an exception occurs during Executor initialization, the driver doesn't
> know about that event and will still launch tasks on that executor, which
> then calls System.exit(1).
> {code}
> override def receive: PartialFunction[Any, Unit] = {
>   case RegisteredExecutor(hostname) =>
>     logInfo("Successfully registered with driver")
>     executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
>     ..
>   case LaunchTask(data) =>
>     if (executor == null) {
>       logError("Received LaunchTask command but executor was null")
>       System.exit(1)
> {code}
> It would be more reasonable to register with the driver after the Executor is
> ready, and to make the register timeout configurable.
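The race described above is an ordering problem: if registration completes before the Executor is constructed, a LaunchTask message can observe a null executor and kill the process. A hedged Python sketch of the safe ordering, with illustrative names (not the actual CoarseGrainedExecutorBackend code):

```python
# Hedged sketch: construct the executor as part of handling registration,
# so a later LaunchTask can never observe executor == None.

class Backend:
    def __init__(self):
        self.executor = None

    def on_registered(self, hostname: str):
        # Executor is fully built before any task message is processed.
        self.executor = f"Executor({hostname})"

    def on_launch_task(self, task: str) -> str:
        if self.executor is None:
            # Mirrors the fatal path quoted above from the {code} block.
            raise RuntimeError("Received LaunchTask command but executor was null")
        return f"{self.executor} running {task}"
```

The fix in spirit is to close the window between "registered with driver" and "executor ready" so the error branch becomes unreachable in normal operation.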
[jira] [Resolved] (SPARK-14444) Add a new scalastyle `NoScalaDoc` to prevent ScalaDoc-style multiline comments
[ https://issues.apache.org/jira/browse/SPARK-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14444.
-------------------------------
    Resolution: Fixed
    Assignee: Dongjoon Hyun
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Add a new scalastyle `NoScalaDoc` to prevent ScalaDoc-style multiline comments
> ------------------------------------------------------------------------------
>
> Key: SPARK-14444
> URL: https://issues.apache.org/jira/browse/SPARK-14444
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
> According to the [Spark Code Style
> Guide|https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation],
> this issue adds a new scalastyle rule to prevent the following:
> {code}
> /** In Spark, we don't use the ScalaDoc style so this
>  * is not correct.
>  */
> {code}
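The disallowed pattern in the {code} block above is a multiline comment that puts text on the same line as the opening /**. A hedged Python sketch of one way such a check could be expressed; this is an illustration of the idea, not the actual scalastyle NoScalaDoc rule:

```python
# Hedged sketch: flag multiline comments whose opening line is "/**" followed
# by text on the same line, as in the disallowed example above.

def violates_no_scaladoc(comment: str) -> bool:
    """Illustrative check only; the real rule is a scalastyle regex."""
    lines = comment.splitlines()
    if not lines:
        return False
    first = lines[0].strip()
    # Multiline comment with content after the /** on the first line.
    return len(lines) > 1 and first.startswith("/**") and first != "/**"
```

In practice scalastyle rules of this kind are configured as regular-expression checks in scalastyle-config.xml rather than as standalone functions.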
[jira] [Resolved] (SPARK-14424) spark-class and related (spark-shell, etc.) no longer work with sbt build as documented
[ https://issues.apache.org/jira/browse/SPARK-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14424.
-------------------------------
    Resolution: Fixed
    Assignee: holdenk
    Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> spark-class and related (spark-shell, etc.) no longer work with sbt build as documented
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-14424
> URL: https://issues.apache.org/jira/browse/SPARK-14424
> Project: Spark
> Issue Type: Bug
> Components: Build
> Reporter: holdenk
> Assignee: holdenk
> Priority: Minor
> Fix For: 2.0.0
>
> SPARK-13579 disabled building the assembly artifacts. Our shell scripts
> (specifically spark-class, but everything depends on it) aren't yet ready for
> this change.
> Repro steps:
> ./build/sbt clean compile assembly && ./bin/spark-shell
> Result:
> "
> Failed to find Spark jars directory (.../assembly/target/scala-2.11/jars).
> You need to build Spark before running this program.
> "
> Follow-up: fix the docs and clarify that the assembly requirement has been
> replaced with package.
[jira] [Updated] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14391:
------------------------------
    Fix Version/s: (was: 1.6.2)

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Updated] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14391:
------------------------------
    Target Version/s: 2.0.0  (was: 1.6.2, 2.0.0)

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Resolved] (SPARK-14391) Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
[ https://issues.apache.org/jira/browse/SPARK-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14391.
-------------------------------
    Resolution: Fixed
    Assignee: Marcelo Vanzin
    Fix Version/s: 2.0.0
                   1.6.2
    Target Version/s: 1.6.2, 2.0.0

> Flaky Test org.apache.spark.launcher.LauncherServerSuite.testCommunication
> --------------------------------------------------------------------------
>
> Key: SPARK-14391
> URL: https://issues.apache.org/jira/browse/SPARK-14391
> Project: Spark
> Issue Type: Test
> Components: Spark Submit
> Reporter: Burak Yavuz
> Assignee: Marcelo Vanzin
> Labels: flaky-test
> Fix For: 1.6.2, 2.0.0
>
> Several failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54893
> https://spark-tests.appspot.com/tests/org.apache.spark.launcher.LauncherServerSuite/testCommunication
[jira] [Updated] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-14127:
------------------------------
    Assignee: (was: Andrew Or)

> [Table related commands] Describe table
> ---------------------------------------
>
> Key: SPARK-14127
> URL: https://issues.apache.org/jira/browse/SPARK-14127
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
>
> TOK_DESCTABLE
> Describe a column/table/partition (see here and here). It seems we support
> DESCRIBE and DESCRIBE EXTENDED. It would be good to also support other
> syntaxes (and check whether we are missing anything).
[jira] [Resolved] (SPARK-14384) Remove alterFunction from SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-14384.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove alterFunction from SessionCatalog
> ----------------------------------------
>
> Key: SPARK-14384
> URL: https://issues.apache.org/jira/browse/SPARK-14384
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 2.0.0
>
> It seems there is no way to call alterFunction (it is not allowed in SQL
> right now). So, let's remove it for now. When we want to support it, we can
> add it back and make sure that we implement the right semantics.
[jira] [Updated] (SPARK-14441) Consolidate DDL tests
[ https://issues.apache.org/jira/browse/SPARK-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14441: -- Description: Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing whether a test should exist in one or the other. It also makes it less clear whether our test coverage is comprehensive. Ideally we should consolidate these files as much as possible. > Consolidate DDL tests > - > > Key: SPARK-14441 > URL: https://issues.apache.org/jira/browse/SPARK-14441 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Today we have DDLSuite, DDLCommandSuite, HiveDDLCommandSuite. It's confusing > whether a test should exist in one or the other. It also makes it less clear > whether our test coverage is comprehensive. Ideally we should consolidate > these files as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14441) Consolidate DDL tests
Andrew Or created SPARK-14441: - Summary: Consolidate DDL tests Key: SPARK-14441 URL: https://issues.apache.org/jira/browse/SPARK-14441 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 2.0.0 Reporter: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14132: -- Description: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support add and drop partitions. For now, it should be fine to throw an exception for the rest. was: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support add and drop partitions. For now, it > should be fine to throw an exception for the rest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14132: -- Description: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. was: For alter column command, we have the following tokens. TOK_ALTERTABLE_ADDPARTS TOK_ALTERTABLE_DROPPARTS TOK_ALTERTABLE_CLUSTER_SORT TOK_MSCK TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE For data source tables, we should throw exceptions. For Hive tables, we should support them. For now, it should be fine to throw an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support them. For now, it should be fine to throw > an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14383) Missing "|" in the g4 definition
[ https://issues.apache.org/jira/browse/SPARK-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14383. --- Resolution: Fixed Assignee: Bo Meng Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Missing "|" in the g4 definition > > > Key: SPARK-14383 > URL: https://issues.apache.org/jira/browse/SPARK-14383 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng >Priority: Trivial > Fix For: 2.0.0 > > > It is really a trivial bug in the g4 file I found. It is missing a "|" > between DISTRIBUTE and UNSET. I will post the PR shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14429) Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL
[ https://issues.apache.org/jira/browse/SPARK-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14429. --- Resolution: Fixed Assignee: Bo Meng Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE " DDL > > > Key: SPARK-14429 > URL: https://issues.apache.org/jira/browse/SPARK-14429 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Bo Meng >Assignee: Bo Meng >Priority: Minor > Fix For: 2.0.0 > > > LIKE is commonly used in {code:SQL} SHOW TABLES / FUNCTIONS {code} > etc DDL. In the pattern, user can use `|` or `\*` as wildcards. > I'd like to address a few issues in this JIRA: > # Currently, we used `replaceAll()` to replace `\*` with `.\*`, but the > replacement was scattered in several places; it is good to have one place to > do the same thing; > # Consistency with Hive: the pattern is case insensitive in Hive and white > spaces will be trimmed, but current pattern matching does not do that. For > example, suppose we have tables t1, t2, t3, {code:SQL}SHOW TABLES LIKE ' T* > '; {code} will list all the t-tables. > # Sort the result. > Please use Hive to verify it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
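The pattern semantics SPARK-14429 asks for (trim surrounding whitespace, match case-insensitively, treat `|` as alternatives and `*` as a wildcard, sort the result) can be sketched like this. This is an illustrative Python model of the intended behavior, not Spark's actual Scala implementation; the function name and signature are made up for the example:

```python
import re

def show_tables_like(tables, pattern):
    """Match table names against a Hive-style SHOW TABLES pattern.

    '*' is a wildcard, '|' separates alternatives; matching is
    case-insensitive and surrounding whitespace is trimmed, per the
    behavior the JIRA asks Spark to adopt from Hive.
    """
    regexes = [
        # Replace only the '*' wildcard with '.*'; a real implementation
        # would also escape other regex metacharacters first.
        re.compile(p.strip().replace("*", ".*") + "$", re.IGNORECASE)
        for p in pattern.strip().split("|")
    ]
    # Sort the result, as the JIRA's third point suggests.
    return sorted(t for t in tables if any(r.match(t) for r in regexes))
```

With tables t1, t2, t3, the pattern `' T* '` from the JIRA's example then lists all the t-tables despite the case and the padding.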
[jira] [Resolved] (SPARK-14426) Merge PerserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14426. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Merge PerserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14426) Merge PerserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14426: -- Assignee: Kousuke Saruta > Merge PerserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14426) Merge ParserUtils and ParseUtils
[ https://issues.apache.org/jira/browse/SPARK-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14426: -- Summary: Merge ParserUtils and ParseUtils (was: Merge PerserUtils and ParseUtils) > Merge ParserUtils and ParseUtils > > > Key: SPARK-14426 > URL: https://issues.apache.org/jira/browse/SPARK-14426 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta > Fix For: 2.0.0 > > > We have ParserUtils and ParseUtils which are both utility collections for use > during the parsing process. > Those name and what they are used for is very similar so I think we can merge > them. > Also, the original unescapeSQLString method may have a fault. When "\u0061" > style character literals are passed to the method, it's not unescaped > successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
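The unescaping fault called out in SPARK-14426 is that a "\u0061"-style literal should decode to the character "a". A minimal Python sketch of the expected behavior (the helper name is hypothetical; Spark's actual fix lives in its Scala unescapeSQLString):

```python
import re

def unescape_unicode(s):
    """Decode \\uXXXX escape sequences in a SQL string literal.

    Illustrates the behavior SPARK-14426 expects: the four hex digits
    after \\u are converted to the corresponding character.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),  # 0x0061 -> 'a'
        s,
    )
```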
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228529#comment-15228529 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay I should have some time to take a look tomorrow Kind regards Andy From: "Vijay Parmar (JIRA)" Date: Tuesday, April 5, 2016 at 3:23 PM To: Andrew Davidson Subject: [jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC; however, when we load the > data into spark data frames, the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center into their local macs. The data centers run > UTC. Their computers are typically in PST or EST. > It is possible to hack around this problem, > but it causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg: > https://www.mail-archive.com/user@spark.apache.org/msg48121.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
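The naive-vs-aware distinction behind SPARK-14057 can be illustrated with Python's standard datetime module. This is a sketch of the general workaround pattern (explicitly tagging values as UTC on ingest), not Spark's own timestamp handling:

```python
from datetime import datetime, timezone

# A timestamp string from the data center, which is in UTC but
# carries no zone information of its own.
raw = "2016-04-05 15:23:00"

# Parsed naively, no zone is attached -- analogous to the behavior
# described above, where downstream code assumes local time.
naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# The workaround: explicitly tag the value as UTC when loading it,
# so a laptop in PST or EST interprets it correctly.
aware = naive.replace(tzinfo=timezone.utc)
```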
[jira] [Assigned] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14410: - Assignee: Andrew Or > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > Right now, operations on an existing function in SessionCatalog do not > really check if the function exists. We should add this check and avoid > doing the check in each command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14410: -- Summary: SessionCatalog needs to check function existence (was: SessionCatalog needs to check database/table/function existence ) > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai > > Right now, operations for an existing db/table/function in SessionCatalog do > not really check if the db/table/function exists. We should add this check > and avoid of doing the check in command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14410) SessionCatalog needs to check function existence
[ https://issues.apache.org/jira/browse/SPARK-14410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14410: -- Description: Right now, operations for an existing functions in SessionCatalog do not really check if the function exists. We should add this check and avoid of doing the check in command. (was: Right now, operations for an existing db/table/function in SessionCatalog do not really check if the db/table/function exists. We should add this check and avoid of doing the check in command.) > SessionCatalog needs to check function existence > - > > Key: SPARK-14410 > URL: https://issues.apache.org/jira/browse/SPARK-14410 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai > > Right now, operations for an existing functions in SessionCatalog do not > really check if the function exists. We should add this check and avoid of > doing the check in command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14132) [Table related commands] Alter partition
[ https://issues.apache.org/jira/browse/SPARK-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14132: - Assignee: Andrew Or > [Table related commands] Alter partition > > > Key: SPARK-14132 > URL: https://issues.apache.org/jira/browse/SPARK-14132 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter column command, we have the following tokens. > TOK_ALTERTABLE_ADDPARTS > TOK_ALTERTABLE_DROPPARTS > TOK_ALTERTABLE_CLUSTER_SORT > TOK_MSCK > TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE > For data source tables, we should throw exceptions. > For Hive tables, we should support them. For now, it should be fine to throw > an exception for TOK_ALTERTABLE_ARCHIVE/TOK_ALTERTABLE_UNARCHIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14252) Executors do not try to download remote cached blocks
[ https://issues.apache.org/jira/browse/SPARK-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14252. --- Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.0.0 > Executors do not try to download remote cached blocks > - > > Key: SPARK-14252 > URL: https://issues.apache.org/jira/browse/SPARK-14252 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Assignee: Eric Liang > Fix For: 2.0.0 > > > I noticed this when taking a look at the root cause of SPARK-14209. 2.0.0 > includes SPARK-12817, which changed the caching code a bit to remove > duplication. But it seems to have removed the part where executors check > whether other executors contain the needed cached block. > In 1.6, that was done by the call to {{BlockManager.get}} in > {{CacheManager.getOrCompute}}. But in the new version, {{RDD.iterator}} calls > {{BlockManager.getOrElseUpdate}}, which never calls {{BlockManager.get}}, and > thus the executor never gets block that are cached by other executors, > causing the blocks to be instead recomputed locally. > I wrote a small program that shows this. In 1.6, running with > {{--num-executors 2}}, I get 5 blocks cached on each executor, and messages > like these in the logs: > {noformat} > 16/03/29 13:18:01 DEBUG spark.CacheManager: Looking for partition rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting local block rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Block rdd_0_7 not registered > locally > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting remote block rdd_0_7 > 16/03/29 13:18:01 DEBUG storage.BlockManager: Getting remote block rdd_0_7 > from BlockManagerId(1, blah, 58831) > 1 > {noformat} > On 2.0, I get (almost) all the 10 partitions cached on *both* executors, > because once the second one fails to find a block locally it just recomputes > it and caches it. 
It never tries to download the block from the other > executor. The log messages above, which still exist in the code, don't show > up anywhere. > Here's the code I used for the above (trimmed of some other stuff from my > little test harness, so might not compile as is): > {code} > val sc = new SparkContext(conf) > try { > val rdd = sc.parallelize(1 to 1000, 10) > rdd.cache() > rdd.count() > // Create a single task that will sleep and block, so that a particular > executor is busy. > // This should force future tasks to download cached data from that > executor. > println("Running sleep job..") > val thread = new Thread(new Runnable() { > override def run(): Unit = { > rdd.mapPartitionsWithIndex { (i, iter) => > if (i == 0) { > Thread.sleep(TimeUnit.MINUTES.toMillis(10)) > } > iter > }.count() > } > }) > thread.setDaemon(true) > thread.start() > // Wait a few seconds to make sure the task is running (too lazy for > listeners) > println("Waiting for tasks to start...") > TimeUnit.SECONDS.sleep(10) > // Now run a job that will touch everything and should use the cached > data. > val cnt = rdd.map(_*2).count() > println(s"Counted $cnt elements.") > println("Killing sleep job.") > thread.interrupt() > thread.join() > } finally { > sc.stop() > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
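The lookup order that SPARK-14252 says regressed (local cache, then other executors' caches, then recompute) can be modeled in a few lines. This is an illustrative Python sketch with dictionaries standing in for block stores; it is not BlockManager's actual Scala logic, and the function name is made up:

```python
def get_or_else_update(local_cache, remote_caches, block_id, compute):
    """Return a cached block, preferring local, then remote, then recompute.

    Models the 1.6 behavior: BlockManager.get checked other executors
    before recomputing. The bug report says the 2.0 getOrElseUpdate path
    skipped the remote step and went straight to recomputation.
    """
    if block_id in local_cache:
        return local_cache[block_id]
    for remote in remote_caches:      # the step the 2.0 code path skipped
        if block_id in remote:
            value = remote[block_id]
            local_cache[block_id] = value
            return value
    value = compute()                 # the buggy path fell through to here
    local_cache[block_id] = value
    return value
```

With the remote step present, a block such as rdd_0_7 cached on another executor is fetched rather than recomputed, which is exactly what the missing "Getting remote block" log lines indicate no longer happens.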
[jira] [Resolved] (SPARK-14416) Add thread-safe comments for CoarseGrainedSchedulerBackend's fields
[ https://issues.apache.org/jira/browse/SPARK-14416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14416. --- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.0.0 > Add thread-safe comments for CoarseGrainedSchedulerBackend's fields > --- > > Key: SPARK-14416 > URL: https://issues.apache.org/jira/browse/SPARK-14416 > Project: Spark > Issue Type: Improvement >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > While I was reviewing https://github.com/apache/spark/pull/12078, I found > most of CoarseGrainedSchedulerBackend's mutable fields doesn't have any > comments about the thread-safe assumptions and it's hard for people to figure > out which part of codes should be protected by the lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14416) Add thread-safe comments for CoarseGrainedSchedulerBackend's fields
[ https://issues.apache.org/jira/browse/SPARK-14416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14416: -- Target Version/s: 2.0.0 > Add thread-safe comments for CoarseGrainedSchedulerBackend's fields > --- > > Key: SPARK-14416 > URL: https://issues.apache.org/jira/browse/SPARK-14416 > Project: Spark > Issue Type: Improvement >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > While I was reviewing https://github.com/apache/spark/pull/12078, I found > most of CoarseGrainedSchedulerBackend's mutable fields doesn't have any > comments about the thread-safe assumptions and it's hard for people to figure > out which part of codes should be protected by the lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14414) Make error messages consistent across DDLs
Andrew Or created SPARK-14414: - Summary: Make error messages consistent across DDLs Key: SPARK-14414 URL: https://issues.apache.org/jira/browse/SPARK-14414 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or There are many different error messages right now when the user tries to run something that's not supported. We might throw AnalysisException or ParseException or NoSuchFunctionException etc. We should make all of these consistent before 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14243. --- Resolution: Fixed > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the method invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* update correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Fix Version/s: 1.6.2 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the method invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* update correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14123) Function related commands
[ https://issues.apache.org/jira/browse/SPARK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14123. --- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.0.0 > Function related commands > - > > Key: SPARK-14123 > URL: https://issues.apache.org/jira/browse/SPARK-14123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Liang-Chi Hsieh > Fix For: 2.0.0 > > > We should support TOK_CREATEFUNCTION/TOK_DROPFUNCTION. > For now, we can throw exceptions for TOK_CREATEMACRO/TOK_DROPMACRO. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog
Andrew Or created SPARK-14385: - Summary: Use FunctionIdentifier in FunctionRegistry/SessionCatalog Key: SPARK-14385 URL: https://issues.apache.org/jira/browse/SPARK-14385 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Yin Huai Right now it's confusing what's a qualified name or not. There's little type-safety in this corner of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14385) Use FunctionIdentifier in FunctionRegistry/SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14385: -- Assignee: (was: Yin Huai) > Use FunctionIdentifier in FunctionRegistry/SessionCatalog > - > > Key: SPARK-14385 > URL: https://issues.apache.org/jira/browse/SPARK-14385 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or > > Right now it's confusing what's a qualified name or not. There's little > type-safety in this corner of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-11327: -- Fix Version/s: 1.6.2 > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Jo Voordeckers > Fix For: 1.6.2, 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar;, > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar;, > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08 INFO SparkContext: Running
[jira] [Updated] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-11327: -- Target Version/s: 1.6.2, 2.0.0 (was: 2.0.0) > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Jo Voordeckers > Fix For: 1.6.2, 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar", > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar", > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08
[jira] [Resolved] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener
[ https://issues.apache.org/jira/browse/SPARK-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14358. --- Resolution: Fixed Fix Version/s: 2.0.0 > Change SparkListener from a trait to an abstract class, and remove > JavaSparkListener > > > Key: SPARK-14358 > URL: https://issues.apache.org/jira/browse/SPARK-14358 > Project: Spark > Issue Type: Sub-task > Components: Scheduler, Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > Scala traits are difficult to maintain binary compatibility on, and as a > result we had to introduce JavaSparkListener. In Spark 2.0 we can change > SparkListener from a trait to an abstract class and then remove > JavaSparkListener. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
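The binary-compatibility argument behind SPARK-14358 can be illustrated with a small sketch (the class and callback names below are illustrative, not Spark's actual API): an abstract class whose callbacks are concrete no-ops lets new callbacks be added in later releases without recompiling existing subclasses, which is hard to guarantee for Scala traits.

```java
// Hypothetical listener base class in the style the issue describes:
// every callback has a concrete no-op body, so subclasses override only
// what they care about, and new callbacks can be added later without
// breaking binary compatibility for existing subclasses.
abstract class ListenerSketch {
    public void onJobStart(String jobId) { }      // no-op default
    public void onJobEnd(String jobId) { }        // no-op default
    // A callback added in a later release: old subclasses simply
    // inherit this no-op instead of failing to link.
    public void onStageCompleted(String stageId) { }
}

class CountingListener extends ListenerSketch {
    int jobsSeen = 0;
    @Override public void onJobStart(String jobId) { jobsSeen++; }
}

public class Main {
    public static void main(String[] args) {
        CountingListener l = new CountingListener();
        l.onJobStart("job-0");
        l.onStageCompleted("stage-0"); // inherited no-op, no breakage
        System.out.println(l.jobsSeen); // prints 1
    }
}
```

With a trait, each newly added method changes the synthetic forwarding the Scala compiler emits into implementing classes, which is why the interface route needed a separate JavaSparkListener adapter in the first place.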
[jira] [Resolved] (SPARK-14364) HeartbeatReceiver object should be private[spark]
[ https://issues.apache.org/jira/browse/SPARK-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14364. --- Resolution: Fixed Fix Version/s: 2.0.0 > HeartbeatReceiver object should be private[spark] > - > > Key: SPARK-14364 > URL: https://issues.apache.org/jira/browse/SPARK-14364 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > It's a mistake that HeartbeatReceiver object was made public in Spark 1.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222370#comment-15222370 ] Andrew Or commented on SPARK-13845: --- Thanks josh > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 1.6.2, 2.0.0 > > > We have a streaming job using *FlumePollInputStream* that always hits driver OOM after > a few days; here is some of the driver heap dump before the OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOMs in the > end. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14333) Duration of task should be the total time (not just computation time)
[ https://issues.apache.org/jira/browse/SPARK-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222189#comment-15222189 ] Andrew Or commented on SPARK-14333: --- can you post a screenshot too > Duration of task should be the total time (not just computation time) > - > > Key: SPARK-14333 > URL: https://issues.apache.org/jira/browse/SPARK-14333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Davies Liu > > Right now, the duration of a task in Stage page is task computation time, it > should be total time (including serialization and fetching results). We > should also have a separate column for computation time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14294) Support native execution of ALTER TABLE ... RENAME TO
[ https://issues.apache.org/jira/browse/SPARK-14294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-14294. - Resolution: Duplicate Assignee: Andrew Or > Support native execution of ALTER TABLE ... RENAME TO > - > > Key: SPARK-14294 > URL: https://issues.apache.org/jira/browse/SPARK-14294 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Bo Meng >Assignee: Andrew Or >Priority: Minor > > Support native execution of ALTER TABLE ... RENAME TO > The syntax for ALTER TABLE ... RENAME TO commands is described as following: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14129) [Table related commands] Alter table
[ https://issues.apache.org/jira/browse/SPARK-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14129: - Assignee: Andrew Or > [Table related commands] Alter table > > > Key: SPARK-14129 > URL: https://issues.apache.org/jira/browse/SPARK-14129 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > For alter table command, we have the following tokens. > TOK_ALTERTABLE_RENAME > TOK_ALTERTABLE_LOCATION > TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES > TOK_ALTERTABLE_SERIALIZER > TOK_ALTERTABLE_SERDEPROPERTIES > TOK_ALTERTABLE_CLUSTER_SORT > TOK_ALTERTABLE_SKEWED > For a data source table, let's implement TOK_ALTERTABLE_RENAME, > TOK_ALTERTABLE_LOCATION, and TOK_ALTERTABLE_SERDEPROPERTIES. We need to > decide what we do for > TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. It will be used to > allow users to correct the data format (e.g. changing csv to > com.databricks.spark.csv to allow the table to be accessed by older versions > of spark). > For a Hive table, we should implement all commands supported by the data > source table and TOK_ALTERTABLE_PROPERTIES/TOK_ALTERTABLE_DROPPROPERTIES. > For TOK_ALTERTABLE_CLUSTER_SORT and TOK_ALTERTABLE_SKEWED, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14304) Fix tests that don't create temp files in the `java.io.tmpdir` folder
[ https://issues.apache.org/jira/browse/SPARK-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14304. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Fix tests that don't create temp files in the `java.io.tmpdir` folder > - > > Key: SPARK-14304 > URL: https://issues.apache.org/jira/browse/SPARK-14304 > Project: Spark > Issue Type: Improvement > Components: Tests >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.0.0 > > > If I press `CTRL-C` when running these tests, the temp files will be left in > `sql/core` folder and I need to delete them manually. It's annoying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
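The convention SPARK-14304 moves tests toward can be sketched as follows (a generic JDK illustration, not Spark's actual test utilities): creating scratch files via `File.createTempFile` with no explicit directory places them under `java.io.tmpdir`, so an interrupted run leaves its litter in the system temp folder instead of the source tree.

```java
import java.io.File;
import java.io.IOException;

public class TempFileSketch {
    public static void main(String[] args) throws IOException {
        // With no directory argument, createTempFile uses java.io.tmpdir,
        // not the current working directory (e.g. the sql/core folder).
        File scratch = File.createTempFile("spark-test-", ".tmp");
        scratch.deleteOnExit(); // best-effort cleanup on normal JVM exit
        System.out.println(scratch.getName().startsWith("spark-test-")); // prints true
    }
}
```

A CTRL-C still skips `deleteOnExit`, but the leftovers then sit where the OS is expected to clean them up eventually, rather than in the repository checkout.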
[jira] [Resolved] (SPARK-11327) spark-dispatcher doesn't pass along some spark properties
[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-11327. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > spark-dispatcher doesn't pass along some spark properties > - > > Key: SPARK-11327 > URL: https://issues.apache.org/jira/browse/SPARK-11327 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Alan Braithwaite > Fix For: 2.0.0 > > > I haven't figured out exactly what's going on yet, but there's something in > the spark-dispatcher which is failing to pass along properties to the > spark-driver when using spark-submit in a clustered mesos docker environment. > Most importantly, it's not passing along spark.mesos.executor.docker.image. > cli: > {code} > docker run -t -i --rm --net=host > --entrypoint=/usr/local/spark/bin/spark-submit > docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf > spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master > mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster > --properties-file /usr/local/spark/conf/spark-defaults.conf --class > com.example.spark.streaming.MyApp > http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 > spark-testing my-stream 40 > {code} > submit output: > {code} > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch > an application in mesos://compute1.example.com:31262. 
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server > at http://compute1.example.com:31262/v1/submissions/create: > { > "action" : "CreateSubmissionRequest", > "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ], > "appResource" : "http://jarserver.example.com:8000/sparkapp.jar", > "clientSparkVersion" : "1.5.0", > "environmentVariables" : { > "SPARK_SCALA_VERSION" : "2.10", > "SPARK_CONF_DIR" : "/usr/local/spark/conf", > "SPARK_HOME" : "/usr/local/spark", > "SPARK_ENV_LOADED" : "1" > }, > "mainClass" : "com.example.spark.streaming.MyApp", > "sparkProperties" : { > "spark.serializer" : "org.apache.spark.serializer.KryoSerializer", > "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : > "/usr/local/lib/libmesos.so", > "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs", > "spark.eventLog.enabled" : "true", > "spark.driver.maxResultSize" : "0", > "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER", > "spark.mesos.deploy.zookeeper.url" : > "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181", > "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar", > "spark.driver.supervise" : "false", > "spark.app.name" : "com.example.spark.streaming.MyApp", > "spark.driver.memory" : "8G", > "spark.logConf" : "true", > "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher", > "spark.mesos.executor.docker.image" : > "docker.example.com/spark-prod:2015.10.2", > "spark.submit.deployMode" : "cluster", > "spark.master" : "mesos://compute1.example.com:31262", > "spark.executor.memory" : "8G", > "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs", > "spark.mesos.docker.executor.network" : "HOST", > "spark.mesos.executor.home" : "/usr/local/spark" > } > } > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > 
"success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created > as driver-20151026220353-0011. Polling submission state... > 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the > status of submission driver-20151026220353-0011 in > mesos://compute1.example.com:31262. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server > at > http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011. > 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server: > { > "action" : "SubmissionStatusResponse", > "driverState" : "QUEUED", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver > driver-20151026220353-0011 is now QUEUED. > 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with > CreateSubmissionResponse: > { > "action" : "CreateSubmissionResponse", > "serverSparkVersion" : "1.5.0", > "submissionId" : "driver-20151026220353-0011", > "success" : true > } > {code} > driver log: > {code} > 15/10/26 22:08:08 INFO
[jira] [Resolved] (SPARK-14069) Improve SparkStatusTracker to also track executor information
[ https://issues.apache.org/jira/browse/SPARK-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14069. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve SparkStatusTracker to also track executor information > - > > Key: SPARK-14069 > URL: https://issues.apache.org/jira/browse/SPARK-14069 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Fix Version/s: 2.0.0 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-14243: --- > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14243. --- Resolution: Fixed Target Version/s: 1.6.2, 2.0.0 > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14182) Parse DDL command: Alter View
[ https://issues.apache.org/jira/browse/SPARK-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14182. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Parse DDL command: Alter View > - > > Key: SPARK-14182 > URL: https://issues.apache.org/jira/browse/SPARK-14182 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > Based on the Hive DDL document > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL > and > https://cwiki.apache.org/confluence/display/Hive/PartitionedViews > The syntax of DDL command for {{ALTER VIEW}} include > {code} > ALTER VIEW view_name AS select_statement > ALTER VIEW view_name RENAME TO new_view_name > ALTER VIEW view_name SET TBLPROPERTIES ('comment' = new_comment); > ALTER VIEW view_name UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key') > ALTER VIEW view_name ADD [IF NOT EXISTS] PARTITION spec1[, PARTITION spec2, > ...] > ALTER VIEW view_name DROP [IF EXISTS] PARTITION spec1[, PARTITION spec2, ...] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13796: -- Target Version/s: 2.0.0 Fix Version/s: 2.0.0 > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi >Priority: Minor > Fix For: 2.0.0 > > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13796) Lock release errors occur frequently in executor logs
[ https://issues.apache.org/jira/browse/SPARK-13796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13796: -- Assignee: Nishkam Ravi > Lock release errors occur frequently in executor logs > - > > Key: SPARK-13796 > URL: https://issues.apache.org/jira/browse/SPARK-13796 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Nishkam Ravi >Assignee: Nishkam Ravi >Priority: Minor > Fix For: 2.0.0 > > > Executor logs contain a lot of these error messages (irrespective of the > workload): > 16/03/08 17:53:07 ERROR executor.Executor: 1 block locks were not released by > TID = 1119 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14127) [Table related commands] Describe table
[ https://issues.apache.org/jira/browse/SPARK-14127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14127: - Assignee: Andrew Or > [Table related commands] Describe table > --- > > Key: SPARK-14127 > URL: https://issues.apache.org/jira/browse/SPARK-14127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > > TOK_DESCTABLE > Describe a column/table/partition (see here and here). Seems we support > DESCRIBE and DESCRIBE EXTENDED. It will be good to also support other > syntaxes (and check if we are missing anything). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-14125) View related commands
[ https://issues.apache.org/jira/browse/SPARK-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14125: -- Comment: was deleted (was: User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12014) > View related commands > - > > Key: SPARK-14125 > URL: https://issues.apache.org/jira/browse/SPARK-14125 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > We should support the following commands. > TOK_ALTERVIEW_AS > TOK_ALTERVIEW_PROPERTIES > TOK_ALTERVIEW_RENAME > TOK_DROPVIEW > TOK_DROPVIEW_PROPERTIES > TOK_DESCTABLE > For TOK_ALTERVIEW_ADDPARTS/TOK_ALTERVIEW_DROPPARTS, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14125) View related commands
[ https://issues.apache.org/jira/browse/SPARK-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14125: -- Assignee: Xiao Li > View related commands > - > > Key: SPARK-14125 > URL: https://issues.apache.org/jira/browse/SPARK-14125 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > > We should support the following commands. > TOK_ALTERVIEW_AS > TOK_ALTERVIEW_PROPERTIES > TOK_ALTERVIEW_RENAME > TOK_DROPVIEW > TOK_DROPVIEW_PROPERTIES > TOK_DESCTABLE > For TOK_ALTERVIEW_ADDPARTS/TOK_ALTERVIEW_DROPPARTS, we should throw > exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14124) Database related commands
[ https://issues.apache.org/jira/browse/SPARK-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14124. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 > Database related commands > - > > Key: SPARK-14124 > URL: https://issues.apache.org/jira/browse/SPARK-14124 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Xiao Li > Fix For: 2.0.0 > > > We should support the following commands. > TOK_CREATEDATABASE > TOK_DESCDATABASE > TOK_DROPDATABASE > TOK_ALTERDATABASE_PROPERTIES > For TOK_ALTERDATABASE_OWNER, let's throw an exception for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14253) Avoid registering temporary functions in Hive
Andrew Or created SPARK-14253: - Summary: Avoid registering temporary functions in Hive Key: SPARK-14253 URL: https://issues.apache.org/jira/browse/SPARK-14253 Project: Spark Issue Type: Bug Components: SQL Reporter: Andrew Or Assignee: Andrew Or Spark should just handle all temporary functions itself instead of passing them to Hive, which is what the HiveFunctionRegistry does at the moment. The extra call to Hive is unnecessary and potentially slow, and it makes the semantics of a "temporary function" confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
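The idea in SPARK-14253 can be sketched with a hypothetical session-local registry (all names and shapes below are illustrative, not Spark's HiveFunctionRegistry API): temporary functions live in an in-memory map owned by the session, so registration and lookup never round-trip to an external Hive catalog.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a session-local temporary-function registry.
class TempFunctionRegistry {
    // Session-local storage; nothing here is sent to an external catalog.
    private final Map<String, Function<Object[], Object>> fns = new HashMap<>();

    void registerTemp(String name, Function<Object[], Object> fn) {
        fns.put(name.toLowerCase(), fn); // no round-trip to Hive
    }

    Object invoke(String name, Object... args) {
        Function<Object[], Object> fn = fns.get(name.toLowerCase());
        if (fn == null) {
            throw new IllegalArgumentException("undefined function: " + name);
        }
        return fn.apply(args);
    }
}

public class Main {
    public static void main(String[] args) {
        TempFunctionRegistry reg = new TempFunctionRegistry();
        reg.registerTemp("strlen", a -> ((String) a[0]).length());
        // Lookup is case-insensitive and stays entirely in-process.
        System.out.println(reg.invoke("STRLEN", "spark")); // prints 5
    }
}
```

Keeping the lifetime inside the session also makes "temporary" unambiguous: the function disappears with the session rather than lingering in Hive's state.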
[jira] [Resolved] (SPARK-14120) Import/Export commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14120. --- Resolution: Fixed Assignee: Andrew Or Fix Version/s: 2.0.0 > Import/Export commands (Exception) > -- > > Key: SPARK-14120 > URL: https://issues.apache.org/jira/browse/SPARK-14120 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > Tokens: TOK_IMPORT/TOK_EXPORT (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14119) Role management commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14119. --- Resolution: Fixed Fix Version/s: 2.0.0 > Role management commands (Exception) > > > Key: SPARK-14119 > URL: https://issues.apache.org/jira/browse/SPARK-14119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > All role management commands should throw exceptions. > TOK_CREATEROLE (Exception) > TOK_DROPROLE (Exception) > TOK_GRANT (Exception) > TOK_GRANT_ROLE (Exception) > TOK_SHOW_GRANT (Exception) > TOK_SHOW_ROLE_GRANT (Exception) > TOK_SHOW_ROLE_PRINCIPALS (Exception) > TOK_SHOW_ROLES (Exception) > TOK_SHOW_SET_ROLE (Exception) > TOK_REVOKE (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14119) Role management commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14119: - Assignee: Andrew Or > Role management commands (Exception) > > > Key: SPARK-14119 > URL: https://issues.apache.org/jira/browse/SPARK-14119 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > All role management commands should throw exceptions. > TOK_CREATEROLE (Exception) > TOK_DROPROLE (Exception) > TOK_GRANT (Exception) > TOK_GRANT_ROLE (Exception) > TOK_SHOW_GRANT (Exception) > TOK_SHOW_ROLE_GRANT (Exception) > TOK_SHOW_ROLE_PRINCIPALS (Exception) > TOK_SHOW_ROLES (Exception) > TOK_SHOW_SET_ROLE (Exception) > TOK_REVOKE (Exception) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14122) Show commands (Exception)
[ https://issues.apache.org/jira/browse/SPARK-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14122. --- Resolution: Fixed Assignee: Andrew Or Fix Version/s: 2.0.0 > Show commands (Exception) > - > > Key: SPARK-14122 > URL: https://issues.apache.org/jira/browse/SPARK-14122 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Andrew Or > Fix For: 2.0.0 > > > For the following tokens, we should throw exceptions. > TOK_SHOW_CREATETABLE (TBD) > TOK_SHOW_COMPACTIONS (Exception) > TOK_SHOW_TRANSACTIONS (Exception) > TOK_SHOWINDEXES (Exception) > TOK_SHOWLOCKS (Exception)
[jira] [Resolved] (SPARK-10570) Add Spark version endpoint to standalone JSON API
[ https://issues.apache.org/jira/browse/SPARK-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10570. --- Resolution: Fixed Assignee: Jakob Odersky Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add Spark version endpoint to standalone JSON API > - > > Key: SPARK-10570 > URL: https://issues.apache.org/jira/browse/SPARK-10570 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 1.4.1 >Reporter: Matt Cheah >Assignee: Jakob Odersky >Priority: Minor > Labels: api > Fix For: 2.0.0 > > > Let's have an ability to navigate to /api/vX/version to get the Spark version > running on the cluster. > The use case I had in mind is verifying that my Spark application is > compatible with the version of the cluster that my user has installed. I want > to do this without starting a SparkContext and having the context > initialization fail. > This is a strictly additive change, so I think it can just be added to the v1 > API.
[jira] [Resolved] (SPARK-14232) Event timeline on job page doesn't show if an executor is removed with multiple line reason
[ https://issues.apache.org/jira/browse/SPARK-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-14232. --- Resolution: Fixed Fix Version/s: 2.0.0 1.6.2 Target Version/s: 1.6.2, 2.0.0 > Event timeline on job page doesn't show if an executor is removed with > multiple line reason > --- > > Key: SPARK-14232 > URL: https://issues.apache.org/jira/browse/SPARK-14232 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.1 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > The event timeline doesn't show on job page if an executor is removed with > multiple line reason. For example, the html looks like below. > {code} > { > 'className': 'executor removed', > 'group': 'executors', > 'start': new Date(1459221687861), > 'content': ' 'data-toggle="tooltip" data-placement="bottom"' + > 'data-title="Executor 21' + > 'Removed at 2016/03/29 11:21:27' + > 'Reason: Container marked as failed: > container_1459220038649_0005_01_22 on host: sr560. Exit status: 137. > Diagnostics: Container killed on request. Exit code is 137 > Container exited with a non-zero exit code 137 > Killed by external signal > "' + > 'data-html="true">Executor 21 removed' > } > {code}
[jira] [Updated] (SPARK-14232) Event timeline on job page doesn't show if an executor is removed with multiple line reason
[ https://issues.apache.org/jira/browse/SPARK-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14232: -- Assignee: Carson Wang > Event timeline on job page doesn't show if an executor is removed with > multiple line reason > --- > > Key: SPARK-14232 > URL: https://issues.apache.org/jira/browse/SPARK-14232 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.1 >Reporter: Carson Wang >Assignee: Carson Wang >Priority: Minor > > The event timeline doesn't show on job page if an executor is removed with > multiple line reason. For example, the html looks like below. > {code} > { > 'className': 'executor removed', > 'group': 'executors', > 'start': new Date(1459221687861), > 'content': ' 'data-toggle="tooltip" data-placement="bottom"' + > 'data-title="Executor 21' + > 'Removed at 2016/03/29 11:21:27' + > 'Reason: Container marked as failed: > container_1459220038649_0005_01_22 on host: sr560. Exit status: 137. > Diagnostics: Container killed on request. Exit code is 137 > Container exited with a non-zero exit code 137 > Killed by external signal > "' + > 'data-html="true">Executor 21 removed' > } > {code}
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Assignee: jeanlyn > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Updated] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-14243: -- Assignee: jeanlyn > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager
[jira] [Commented] (SPARK-14243) updatedBlockStatuses does not update correctly when removing blocks
[ https://issues.apache.org/jira/browse/SPARK-14243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216448#comment-15216448 ] Andrew Or commented on SPARK-14243: --- Thanks, I've assigned this to you. > updatedBlockStatuses does not update correctly when removing blocks > --- > > Key: SPARK-14243 > URL: https://issues.apache.org/jira/browse/SPARK-14243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn >Assignee: jeanlyn > > Currently, *updatedBlockStatuses* of *TaskMetrics* does not update correctly > when removing blocks in *BlockManager.removeBlock* and the methods that invoke > *removeBlock*. See: > branch-1.6:https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1108 > branch-1.5:https://github.com/apache/spark/blob/branch-1.5/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1101 > We should make sure *updatedBlockStatuses* updates correctly when: > * Block removed from BlockManager > * Block dropped from memory to disk > * Block added to BlockManager
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216293#comment-15216293 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay, I am fairly new to this also. I think I would contact Russell about his analysis; in the spark-cassandra-connector, potentially making a change to the way dates are handled could break a lot of existing apps. I think the key is to figure out how to come up with a solution that is backwards compatible. I have filed several bugs in the past, but only after discussion on the users email group. I would suggest that before you do a lot of work you see what the user group thinks. Many thanks for taking this on. Andy > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC, however when we load the > data into Spark data frames the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center onto their local Macs. The data centers run > UTC; their computers are typically in PST or EST. > It is possible to hack around this problem. > This causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg > https://www.mail-archive.com/user@spark.apache.org/msg48121.html
[jira] [Resolved] (SPARK-13447) Fix AM failure situation for dynamic allocation disabled situation
[ https://issues.apache.org/jira/browse/SPARK-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13447. --- Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Fix AM failure situation for dynamic allocation disabled situation > -- > > Key: SPARK-13447 > URL: https://issues.apache.org/jira/browse/SPARK-13447 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 1.6.0 >Reporter: Saisai Shao >Assignee: Saisai Shao > Fix For: 2.0.0 > > > Because of the lag of executor disconnection events, stale state in the > {{CoarseGrainedSchedulerBackend}} will ruin the newly registered executor > information. This situation is handled for the dynamic allocation enabled > case; for the dynamic allocation disabled scenario, we should also have a > similar fix.
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Target Version/s: 1.6.2, 2.0.0 (was: 1.6.1, 2.0.0) > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Updated] (SPARK-13845) BlockStatus and StreamBlockId keep on growing result driver OOM
[ https://issues.apache.org/jira/browse/SPARK-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13845: -- Target Version/s: 1.6.1, 2.0.0 Fix Version/s: 2.0.0 > BlockStatus and StreamBlockId keep on growing result driver OOM > --- > > Key: SPARK-13845 > URL: https://issues.apache.org/jira/browse/SPARK-13845 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 1.6.1 >Reporter: jeanlyn > Fix For: 2.0.0 > > > We have a streaming job using *FlumePollInputStream* always driver OOM after > few days, here is some driver heap dump before OOM > {noformat} > num #instances #bytes class name > -- >1: 13845916 553836640 org.apache.spark.storage.BlockStatus >2: 14020324 336487776 org.apache.spark.storage.StreamBlockId >3: 13883881 333213144 scala.collection.mutable.DefaultEntry >4: 8907 89043952 [Lscala.collection.mutable.HashEntry; >5: 62360 65107352 [B >6:163368 24453904 [Ljava.lang.Object; >7:293651 20342664 [C > ... > {noformat} > *BlockStatus* and *StreamBlockId* keep on growing, and the driver OOM in the > end.
[jira] [Assigned] (SPARK-14013) Properly implement temporary functions in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-14013: - Assignee: Andrew Or > Properly implement temporary functions in SessionCatalog > > > Key: SPARK-14013 > URL: https://issues.apache.org/jira/browse/SPARK-14013 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Andrew Or >Assignee: Andrew Or > > Right now `SessionCatalog` just contains `CatalogFunction`, which is > metadata. In the future the catalog should probably take in a function > registry or something.
[jira] [Commented] (SPARK-14057) sql time stamps do not respect time zones
[ https://issues.apache.org/jira/browse/SPARK-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214334#comment-15214334 ] Andrew Davidson commented on SPARK-14057: - Hi Vijay, here is some more info from the email thread mentioned above. Russell is very involved in the development of Cassandra and the spark-cassandra-connector. He has a suggestion for how to fix this bug. Kind regards, Andy http://www.slideshare.net/RussellSpitzer On Fri, Mar 18, 2016 at 11:35 AM Russell Spitzer wrote: > Unfortunately part of Spark SQL. They have based their type on > java.sql.timestamp (and date) which adjusts to the client timezone when > displaying and storing. > See discussions > http://stackoverflow.com/questions/9202857/timezones-in-sql-date-vs-java-sql-date > And Code > https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93 > > sql time stamps do not respect time zones > - > > Key: SPARK-14057 > URL: https://issues.apache.org/jira/browse/SPARK-14057 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > we have time stamp data. The time stamp data is UTC, however when we load the > data into Spark data frames the system assumes the time stamps are in the > local time zone. This causes problems for our data scientists. Often they > pull data from our data center onto their local Macs. The data centers run > UTC; their computers are typically in PST or EST. > It is possible to hack around this problem. > This causes a lot of errors in their analysis. > A complete description of this issue can be found in the following mail msg > https://www.mail-archive.com/user@spark.apache.org/msg48121.html
[jira] [Commented] (SPARK-14079) Limit the number of queries on SQL UI
[ https://issues.apache.org/jira/browse/SPARK-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207275#comment-15207275 ] Andrew Or commented on SPARK-14079: --- we should do `maxRetainedQueries` or something, similar to what we already do for the All Jobs page. > Limit the number of queries on SQL UI > - > > Key: SPARK-14079 > URL: https://issues.apache.org/jira/browse/SPARK-14079 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > > The SQL UI becomes very, very slow if there are hundreds of SQL queries on it.
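The proposed `maxRetainedQueries` cap mirrors the eviction Spark already applies elsewhere in the UI (e.g. retained jobs on the All Jobs page). A minimal sketch of that bounded-retention scheme; the class and parameter names here are hypothetical, not Spark's actual implementation:

```python
from collections import deque

class RetainedQueries:
    """Keep at most max_retained query entries, evicting the oldest first."""

    def __init__(self, max_retained):
        self.max_retained = max_retained
        self._queries = deque()

    def add(self, query_id):
        self._queries.append(query_id)
        # Once the cap is exceeded, drop the oldest entries so the UI
        # page never has to render an unbounded history.
        while len(self._queries) > self.max_retained:
            self._queries.popleft()

    def retained(self):
        return list(self._queries)

store = RetainedQueries(max_retained=3)
for qid in range(5):
    store.add(qid)
print(store.retained())  # [2, 3, 4] -- queries 0 and 1 were evicted
```

The same effect could be had with `deque(maxlen=3)` directly; the explicit loop just makes the eviction policy visible.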
[jira] [Created] (SPARK-14057) sql time stamps do not respect time zones
Andrew Davidson created SPARK-14057: --- Summary: sql time stamps do not respect time zones Key: SPARK-14057 URL: https://issues.apache.org/jira/browse/SPARK-14057 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Andrew Davidson Priority: Minor we have time stamp data. The time stamp data is UTC, however when we load the data into Spark data frames the system assumes the time stamps are in the local time zone. This causes problems for our data scientists. Often they pull data from our data center onto their local Macs. The data centers run UTC; their computers are typically in PST or EST. It is possible to hack around this problem. This causes a lot of errors in their analysis. A complete description of this issue can be found in the following mail msg https://www.mail-archive.com/user@spark.apache.org/msg48121.html
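The behavior reported above is the general pitfall of rendering one instant in the client's zone rather than the zone the data was written in. A small plain-Python sketch (not Spark) showing the same instant printing differently under UTC and a fixed UTC-5 offset, which is why UTC data read on an EST laptop looks shifted:

```python
from datetime import datetime, timezone, timedelta

# One well-defined instant: the Unix epoch.
instant = datetime.fromtimestamp(0, tz=timezone.utc)

# Rendered in UTC, as the data center wrote it.
print(instant.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 00:00:00

# Rendered in a fixed UTC-5 zone (an "EST laptop", ignoring DST):
est = timezone(timedelta(hours=-5))
print(instant.astimezone(est).strftime("%Y-%m-%d %H:%M:%S"))  # 1969-12-31 19:00:00
```

The instant never changed; only the zone used for display did. Types that always display in the client's default zone (as `java.sql.Timestamp` does) produce exactly this kind of apparent shift.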
[jira] [Created] (SPARK-14013) Properly implement temporary functions in SessionCatalog
Andrew Or created SPARK-14013: - Summary: Properly implement temporary functions in SessionCatalog Key: SPARK-14013 URL: https://issues.apache.org/jira/browse/SPARK-14013 Project: Spark Issue Type: Bug Reporter: Andrew Or Right now `SessionCatalog` just contains `CatalogFunction`, which is metadata. In the future the catalog should probably take in a function registry or something.
[jira] [Created] (SPARK-14014) Replace existing analysis.Catalog with SessionCatalog
Andrew Or created SPARK-14014: - Summary: Replace existing analysis.Catalog with SessionCatalog Key: SPARK-14014 URL: https://issues.apache.org/jira/browse/SPARK-14014 Project: Spark Issue Type: Bug Reporter: Andrew Or Assignee: Andrew Or As of this moment, there exist many catalogs in Spark. For Spark 2.0, we will have two high level catalogs only: SessionCatalog and ExternalCatalog. SessionCatalog (implemented in SPARK-13923) keeps track of temporary functions and tables and delegates other operations to ExternalCatalog. At the same time, there's this legacy catalog called `analysis.Catalog` that also tracks temporary functions and tables. The goal is to get rid of this legacy catalog and replace it with SessionCatalog, which is the new thing.
[jira] [Created] (SPARK-13923) Implement SessionCatalog to manage temp functions and tables
Andrew Or created SPARK-13923: - Summary: Implement SessionCatalog to manage temp functions and tables Key: SPARK-13923 URL: https://issues.apache.org/jira/browse/SPARK-13923 Project: Spark Issue Type: Bug Components: SQL Reporter: Andrew Or Assignee: Andrew Or Today, we have ExternalCatalog, which is dead code. As part of the effort of merging SQLContext/HiveContext we'll parse Hive commands and call the corresponding methods in ExternalCatalog. However, this handles only persisted things. We need something in addition to that to handle temporary things. The new catalog is called SessionCatalog and will internally call ExternalCatalog.
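The division of labor described in these tickets (SessionCatalog tracks session-scoped temporary objects and delegates persisted ones to ExternalCatalog) can be sketched roughly as follows. This is an illustrative Python assumption, not Spark's Scala implementation, and the method names are invented:

```python
class ExternalCatalog:
    """Stands in for the persistent catalog (e.g. a metastore)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, schema):
        self._tables[name] = schema

    def get_table(self, name):
        return self._tables.get(name)

class SessionCatalog:
    """Holds temp tables for one session; everything else is delegated."""

    def __init__(self, external):
        self._external = external
        self._temp_tables = {}

    def create_temp_table(self, name, schema):
        self._temp_tables[name] = schema

    def lookup(self, name):
        # Temporary tables shadow persistent ones; otherwise delegate
        # to the external (persistent) catalog.
        if name in self._temp_tables:
            return self._temp_tables[name]
        return self._external.get_table(name)

ext = ExternalCatalog()
ext.create_table("sales", "id INT, amount DOUBLE")
session = SessionCatalog(ext)
session.create_temp_table("sales", "id INT")  # shadows the persistent table
print(session.lookup("sales"))   # id INT
print(ext.get_table("sales"))    # id INT, amount DOUBLE
```

The key property is that the persistent catalog is untouched by session-scoped registrations, which is what lets SessionCatalog subsume the legacy `analysis.Catalog` role.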
[jira] [Resolved] (SPARK-6157) Unrolling with MEMORY_AND_DISK should always release memory
[ https://issues.apache.org/jira/browse/SPARK-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-6157. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Unrolling with MEMORY_AND_DISK should always release memory > --- > > Key: SPARK-6157 > URL: https://issues.apache.org/jira/browse/SPARK-6157 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 1.2.1 >Reporter: SuYan >Assignee: Josh Rosen > Fix For: 2.0.0 > > > === EDIT by andrewor14 === > The existing description was somewhat confusing, so here's a more succinct > version of it. > If unrolling a block with MEMORY_AND_DISK was unsuccessful, we will drop the > block to disk > directly. After doing so, however, we don't need the underlying array that > held the partial > values anymore, so we should release the pending unroll memory for other > tasks on the same > executor. Otherwise, other tasks may unnecessarily drop their blocks to disk > due to the lack > of unroll space, resulting in worse performance. > === Original comment === > Current code: > Now we want to cache a Memory_and_disk level block > 1. Try to put in memory and unroll unsuccessful. then reserved unroll memory > because we got a iterator from an unroll Array > 2. Then put into disk. > 3. Get value from get(blockId), and iterator from that value, and then > nothing with an unroll Array. So here we should release the reserved unroll > memory instead will release until the task is end. > and also, have somebody already pull a request, for get Memory_and_disk level > block, while cache in memory from disk, we should, use file.length to check > if we can put in memory store instead just allocate a file.length buffer, may > lead to OOM.
[jira] [Resolved] (SPARK-10907) Get rid of pending unroll memory
[ https://issues.apache.org/jira/browse/SPARK-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10907. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Get rid of pending unroll memory > > > Key: SPARK-10907 > URL: https://issues.apache.org/jira/browse/SPARK-10907 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 1.4.0 >Reporter: Andrew Or >Assignee: Josh Rosen > Fix For: 2.0.0 > > > It's incredibly complicated to have both unroll memory and pending unroll > memory in MemoryStore.scala. We can probably express it with only unroll > memory through some minor refactoring.
[jira] [Resolved] (SPARK-13054) Always post TaskEnd event for tasks in cancelled stages
[ https://issues.apache.org/jira/browse/SPARK-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13054. --- Resolution: Fixed Assignee: Thomas Graves (was: Andrew Or) Fix Version/s: 2.0.0 > Always post TaskEnd event for tasks in cancelled stages > --- > > Key: SPARK-13054 > URL: https://issues.apache.org/jira/browse/SPARK-13054 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Thomas Graves > Fix For: 2.0.0 > > > {code} > // The success case is dealt with separately below. > // TODO: Why post it only for failed tasks in cancelled stages? Clarify > semantics here. > if (event.reason != Success) { > val attemptId = task.stageAttemptId > listenerBus.post(SparkListenerTaskEnd( > stageId, attemptId, taskType, event.reason, event.taskInfo, > taskMetrics)) > } > {code} > Today we only post task end events for canceled stages if the task failed. > There is no reason why we shouldn't just post it for all the tasks, including > the ones that succeeded. If we do that we will be able to simplify another > branch in the DAGScheduler, which needs a lot of simplification.
[jira] [Updated] (SPARK-12583) spark shuffle fails with mesos after 2mins
[ https://issues.apache.org/jira/browse/SPARK-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12583: -- Assignee: Bertrand Bossy > spark shuffle fails with mesos after 2mins > -- > > Key: SPARK-12583 > URL: https://issues.apache.org/jira/browse/SPARK-12583 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 1.6.0 >Reporter: Adrian Bridgett >Assignee: Bertrand Bossy > Fix For: 2.0.0 > > > See user mailing list "Executor deregistered after 2mins" for more details. > As of 1.6, the driver registers with each shuffle manager via > MesosExternalShuffleClient. Once this disconnects, the shuffle manager > automatically cleans up the data associated with that driver. > However, the connection is terminated before this happens as it's idle. > Looking at a packet trace, after 120secs the shuffle manager is sending a FIN > packet to the driver. The only way to delay this is to increase > spark.shuffle.io.connectionTimeout=3600s on the shuffle manager. > I patched the MesosExternalShuffleClient (and ExternalShuffleClient) with > newbie Scala skills to call the TransportContext call with > closeIdleConnections "false" and this didn't help (hadn't done the network > trace first).
[jira] [Resolved] (SPARK-12583) spark shuffle fails with mesos after 2mins
[ https://issues.apache.org/jira/browse/SPARK-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12583. --- Resolution: Fixed Fix Version/s: 2.0.0 > spark shuffle fails with mesos after 2mins > -- > > Key: SPARK-12583 > URL: https://issues.apache.org/jira/browse/SPARK-12583 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 1.6.0 >Reporter: Adrian Bridgett > Fix For: 2.0.0 > > > See user mailing list "Executor deregistered after 2mins" for more details. > As of 1.6, the driver registers with each shuffle manager via > MesosExternalShuffleClient. Once this disconnects, the shuffle manager > automatically cleans up the data associated with that driver. > However, the connection is terminated before this happens as it's idle. > Looking at a packet trace, after 120secs the shuffle manager is sending a FIN > packet to the driver. The only way to delay this is to increase > spark.shuffle.io.connectionTimeout=3600s on the shuffle manager. > I patched the MesosExternalShuffleClient (and ExternalShuffleClient) with > newbie Scala skills to call the TransportContext call with > closeIdleConnections "false" and this didn't help (hadn't done the network > trace first).
[jira] [Resolved] (SPARK-13833) Guard against race condition when re-caching spilled bytes in memory
[ https://issues.apache.org/jira/browse/SPARK-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13833. --- Resolution: Fixed Fix Version/s: 2.0.0 > Guard against race condition when re-caching spilled bytes in memory > > > Key: SPARK-13833 > URL: https://issues.apache.org/jira/browse/SPARK-13833 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Apache Spark > Fix For: 2.0.0 > > > When reading data from the DiskStore and attempting to cache it back into the > memory store, we should guard against race conditions where multiple readers > are attempting to re-cache the same block in memory.
[jira] [Resolved] (SPARK-13328) Possible poor read performance for broadcast variables with dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-13328. --- Resolution: Fixed Assignee: Nezih Yigitbasi Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Possible poor read performance for broadcast variables with dynamic resource > allocation > --- > > Key: SPARK-13328 > URL: https://issues.apache.org/jira/browse/SPARK-13328 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Nezih Yigitbasi >Assignee: Nezih Yigitbasi > Fix For: 2.0.0 > > > When dynamic resource allocation is enabled fetching broadcast variables from > removed executors were causing job failures and SPARK-9591 fixed this problem > by trying all locations of a block before giving up. However, the locations > of a block is retrieved only once from the driver in this process and the > locations in this list can be stale due to dynamic resource allocation. This > situation gets worse when running on a large cluster as the size of this > location list can be in the order of several hundreds out of which there may > be tens of stale entries. What we have observed is with the default settings > of 3 max retries and 5s between retries (that's 15s per location) the time it > takes to read a broadcast variable can be as high as ~17m (below log shows > the failed 70th block fetch attempt where each attempt takes 15s) > {code} > ... > 16/02/13 01:02:27 WARN storage.BlockManager: Failed to fetch remote block > broadcast_18_piece0 from BlockManagerId(8, ip-10-178-77-38.ec2.internal, > 60675) (failed attempt 70) > ... > 16/02/13 01:02:27 INFO broadcast.TorrentBroadcast: Reading broadcast variable > 18 took 1051049 ms > {code}
[jira] [Resolved] (SPARK-13604) Sync worker's state after registering with master
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13604.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Sync worker's state after registering with master
> -------------------------------------------------
>
>                 Key: SPARK-13604
>                 URL: https://issues.apache.org/jira/browse/SPARK-13604
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>             Fix For: 2.0.0
>
> If the Master cannot talk to a Worker for a while and the network then comes back, the Worker may leak existing executors and drivers. We should sync the worker's state after it registers with the master.
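The fix described above amounts to a reconciliation step after (re)registration: the worker reports what it is actually running, and the master diffs that against its own view. A rough sketch of the idea (all class and method names here are hypothetical, not Spark's actual RPC messages):

```python
# Hypothetical sketch of master/worker state reconciliation after re-registration.
# None of these names correspond to Spark's actual RPC message types.

class Master:
    def __init__(self):
        self.known_executors = {}  # worker_id -> set of executor ids the master expects

    def handle_worker_state_report(self, worker_id, reported_executors):
        # Executors the worker reports but the master no longer tracks are
        # leaked and should be killed; executors the master expects but the
        # worker lost must be rescheduled.
        expected = self.known_executors.get(worker_id, set())
        to_kill = reported_executors - expected
        to_reschedule = expected - reported_executors
        return to_kill, to_reschedule

m = Master()
m.known_executors["worker-1"] = {"exec-1", "exec-2"}
kill, resched = m.handle_worker_state_report("worker-1", {"exec-2", "exec-3"})
print(kill, resched)  # {'exec-3'} {'exec-1'}
```

Without this sync, the "leaked" set on the worker side is never computed, which is exactly the bug the issue describes.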
[jira] [Updated] (SPARK-3854) Scala style: require spaces before `{`
[ https://issues.apache.org/jira/browse/SPARK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-3854:
-----------------------------
    Assignee: Dongjoon Hyun

> Scala style: require spaces before `{`
> --------------------------------------
>
>                 Key: SPARK-3854
>                 URL: https://issues.apache.org/jira/browse/SPARK-3854
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>            Reporter: Josh Rosen
>            Assignee: Dongjoon Hyun
>             Fix For: 2.0.0
>
> We should require spaces before opening curly braces. This isn't in the style guide, but it probably should be:
> {code}
> // Correct:
> if (true) {
>   println("Wow!")
> }
>
> // Incorrect:
> if (true){
>   println("Wow!")
> }
> {code}
> See https://github.com/apache/spark/pull/1658#discussion-diff-18611791 for an example "in the wild."
> {{git grep "){"}} shows only a few occurrences of this style.
[jira] [Resolved] (SPARK-3854) Scala style: require spaces before `{`
[ https://issues.apache.org/jira/browse/SPARK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-3854.
------------------------------
          Resolution: Fixed
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Scala style: require spaces before `{`
> --------------------------------------
>
>                 Key: SPARK-3854
>                 URL: https://issues.apache.org/jira/browse/SPARK-3854
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>            Reporter: Josh Rosen
>             Fix For: 2.0.0
>
> We should require spaces before opening curly braces. This isn't in the style guide, but it probably should be:
> {code}
> // Correct:
> if (true) {
>   println("Wow!")
> }
>
> // Incorrect:
> if (true){
>   println("Wow!")
> }
> {code}
> See https://github.com/apache/spark/pull/1658#discussion-diff-18611791 for an example "in the wild."
> {{git grep "){"}} shows only a few occurrences of this style.
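The `){` pattern flagged by the `git grep` above can be approximated with a few lines of code. The real enforcement lives in Spark's scalastyle configuration; this regex scan is only an illustration of the check:

```python
import re

# Flag `){` with no space before the opening brace, as in `if (true){`.
# Simplified illustration: unlike a real linter, it ignores the possibility
# of the pattern appearing inside strings or comments.
PATTERN = re.compile(r'\)\{')

def offending_lines(source):
    """Return 1-based line numbers containing `){`."""
    return [n for n, line in enumerate(source.splitlines(), 1)
            if PATTERN.search(line)]

code = 'if (true){\n  println("Wow!")\n}\nif (true) {\n  println("ok")\n}\n'
print(offending_lines(code))  # [1]
```

A proper style rule would additionally skip string literals and comments, which is why the issue was resolved through the project's style tooling rather than an ad-hoc grep.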
[jira] [Resolved] (SPARK-13696) Remove BlockStore interface to more cleanly reflect different memory and disk store responsibilities
[ https://issues.apache.org/jira/browse/SPARK-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-13696.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Remove BlockStore interface to more cleanly reflect different memory and disk store responsibilities
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13696
>                 URL: https://issues.apache.org/jira/browse/SPARK-13696
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
> Today, both the MemoryStore and DiskStore implement a common BlockStore API, but I feel that this API is inappropriate because it abstracts away important distinctions between the behavior of these two stores.
> For instance, the disk store doesn't have a notion of storing deserialized objects, so it's confusing for it to expose object-based APIs like putIterator() and getValues() instead of exposing only binary APIs and pushing the responsibilities of serialization and deserialization to the client.
> As part of a larger BlockManager interface cleanup, I'd like to remove the BlockStore API and refine the MemoryStore and DiskStore interfaces to reflect narrower sets of responsibilities for those components.
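The split proposed in the issue can be pictured as two narrower interfaces in place of one shared BlockStore: the memory store keeps object-based APIs, while the disk store deals only in bytes and pushes serialization to the caller. A hedged sketch of the shape (method names loosely follow the ones quoted above, not Spark's exact signatures):

```python
# Sketch of the proposed narrowing. The memory store exposes object-based
# APIs; the disk store exposes only binary APIs, so (de)serialization is
# the client's responsibility. Not Spark's actual class definitions.

class MemoryStore:
    def __init__(self):
        self._entries = {}

    def put_iterator(self, block_id, values):
        # Stores deserialized objects directly; only meaningful in memory.
        self._entries[block_id] = list(values)

    def get_values(self, block_id):
        return iter(self._entries[block_id])

class DiskStore:
    def __init__(self):
        self._files = {}

    def put_bytes(self, block_id, data: bytes):
        # Binary-only API: no notion of deserialized objects on disk.
        self._files[block_id] = data

    def get_bytes(self, block_id) -> bytes:
        return self._files[block_id]

mem, disk = MemoryStore(), DiskStore()
mem.put_iterator("b0", [1, 2, 3])
disk.put_bytes("b1", b"\x01\x02")
print(list(mem.get_values("b0")), disk.get_bytes("b1"))
```

With no shared base class, the disk store simply cannot be handed an iterator of objects, which makes the confusion described in the issue impossible to express at the type level.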