[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-23 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222593#comment-14222593
 ] 

Xuefu Zhang commented on SPARK-2321:


I have created SPARK-4567 to track the request of making SparkJobInfo and 
SparkStageInfo serializable.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-19 Thread Aniket Bhatnagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218063#comment-14218063
 ] 

Aniket Bhatnagar commented on SPARK-2321:
-

Just a quick question, will the API be usable from job submitter process in 
yarn-cluster mode (i.e. when the driver is running as a separate YARN process?)?

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-19 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218502#comment-14218502
 ] 

Patrick Wendell commented on SPARK-2321:


Currently this a programmatic API for Spark applications to know their own 
progress. The task in this JIRA was just to stabilize existing interfaces. It's 
not a reporting API for external services to query progress. That is probably 
best to roll-up with a general REST API for exposing specifics of a Spark 
application.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217485#comment-14217485
 ] 

Rui Li commented on SPARK-2321:
---

Hi [~joshrosen],

Shall we make {{SparkJobInfo}} and {{SparkStageInfo}} serializable? Mainly for 
the case when spark context runs remotely. What's your opinion?

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214223#comment-14214223
 ] 

Rui Li commented on SPARK-2321:
---

Hey [~joshrosen],

Thanks a lot for the update! I created SPARK-4440 for the enhancement.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212000#comment-14212000
 ] 

Rui Li commented on SPARK-2321:
---

Hi [~joshrosen],

The new API is quite useful. But the information exposed is relatively limited 
at the moment. Do you have any plan to enhance it? For example, submission and 
completion time is not available in {{SparkStageInfo}}, while they're provided 
in {{StageInfo}}.
Thanks!

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212884#comment-14212884
 ] 

Apache Spark commented on SPARK-2321:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3197

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.2.0


 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-16 Thread Dev Lakhani (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173619#comment-14173619
 ] 

Dev Lakhani commented on SPARK-2321:


There are some issues and bugs under the webui component that are active. 
Should we incorporate these into this Jira or is it best to work on them 
separately and then merge these (2321) changes later?

https://issues.apache.org/jira/browse/SPARK/component/12322616


 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162614#comment-14162614
 ] 

Apache Spark commented on SPARK-2321:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2696

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-07 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162615#comment-14162615
 ] 

Josh Rosen commented on SPARK-2321:
---

I've opened a WIP pull request in order to discuss the design / implementation 
of a pull-based progress / status API: 
https://github.com/apache/spark/pull/2696.  I'd like to focus on discussing the 
most high-level interface / API design decisions now; once we're happy with 
those decisions, we can focus on the details of which pieces of data to expose, 
etc.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143191#comment-14143191
 ] 

Chengxiang Li commented on SPARK-2321:
--

I agree that a stable, immutable, and Java-friendly *info classes should be 
part of this new API design, and between register a new private SparkListener 
to collect JobInfo and get it from SparkContext, I think there are 2 more 
issues which should be resolved.
# The TaskInfo/StageInfo should be collected by job, while current Stage/Task 
SparkListener events does not include any job id info, Scheduler have the 
information to connect TaskInfo/StageInfo with job, maybe we should redesign 
the SparkListener event API, and add job id info into Stage/Task event in 
Scheduler before post it to listener bus.
# get job id after submit job and get job info by job id. Currently we could 
only get job id through a very limited way, execute spark async actions with 
SimpleFutureAction, 

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143425#comment-14143425
 ] 

Josh Rosen commented on SPARK-2321:
---

{quote}
... maybe we should redesign the SparkListener event API, and add job id info 
into Stage/Task event in Scheduler before post it to listener bus.
{quote}

A stage may be used by multiple jobs, so we'd have to think carefully about how 
the API should reflect this.  It looks like DAGScheduler's internal {{Stage}} 
class tracks the id of the job that first submitted the stage, and 
{{activeJobForStage}} finds the earliest-created active job that needs the 
stage.  It might make sense to associate Stage/Task start events with the list 
of active jobs that depend on them.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Mark Hamstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143814#comment-14143814
 ] 

Mark Hamstra commented on SPARK-2321:
-

Which would be kind of the opposite half of the SparkListenerJobStart event, 
which includes an array of the StageIds in a Job.  I included that way back 
when as a suggestion of at least some of what might be needed to implement 
better job-based progress reporting.  I'd have to look, but I don't believe 
anything is actually using that stage-reporting on JobStart right now.  In any 
event, any proper progress reporting should rationalize, extend or eliminate 
that part of SparkListenerJobStart.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-21 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142772#comment-14142772
 ] 

Josh Rosen commented on SPARK-2321:
---

The scheduler has some data structures like StageInfo, TaskInfo, RDDInfo, etc. 
that expose some of the information that we might want in a user-facing 
progress API, but we can't  expose these classes in their current form since 
they're marked @DeveloperAPI and are full of public, mutable fields (the 
responses returned from our progress / status API need to be immutable).

Maybe we should stabilize these scheduler.*Info classes' public interfaces, 
make them immutable, and add a JobInfo class for capturing per-job information. 
 We can then register a new, private SparkListener for maintaining a view of 
stage progress and add methods to SparkContext that provide stable, pull-based 
access to the snapshots of job/stage/task state.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-17 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138242#comment-14138242
 ] 

Josh Rosen commented on SPARK-2321:
---

I agree that this should be a pull API.  A pull-based API will be easier to 
expose in Python and Java and might be sufficient for many of the common 
use-cases.

With a push-based API, we might have to worry about things like callback 
processing speed, rate-limiting, etc. (the Rx frameworks have interesting ways 
of addressing these concerns, but I'm not sure whether we want to add those 
dependencies); the pull approach won't face these issues.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121085#comment-14121085
 ] 

Chengxiang Li commented on SPARK-2321:
--

I collect some hive side requirement here, which should be helpful for spark 
job status and statistic API design.

Hive should be able to get the following job status information through Spark 
job status API.
1. job identifier
2. current job execution state, should include RUNNING/SUCCEEDED/FAILED/KILLED.
3. running/failed/killed/total task number on job level.
4. stage identifier
5. stage state, should include RUNNING/SUCCEEDED/FAILED/KILLED
6. running/failed/killed/total task number on stage level.

MR/Tez use Counter to collect statistic information, similiar to MR/Tez 
Counter, it would be better if Spark job statistic API organize statistic 
information with:
1. group same kind statistic information by groupName.
2. displayName for both group and statistic information which would uniform 
print string for frontend(Web UI/Hive CLI/...).


 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121094#comment-14121094
 ] 

Reynold Xin commented on SPARK-2321:


What about pull vs push? i.e. should this be a listener like API, or some 
service with states that the caller can poll to ask?

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121173#comment-14121173
 ] 

Chengxiang Li commented on SPARK-2321:
--

I'm not sure whether i understand you right, here is my thought about the API 
design:
# The JobStatus/JobStatistic API only contains getter method.
# JobProgressListener contains variables of JobStatusImpl/JobStatisticImpl.
# DagScheduler post events to JobProgressListener through listener bus.
# Caller get JobStatusImpl/JobStatisticImpl from JobProgressListener with 
updated state.

So i think it should be a pull style API.

 Design a proper progress reporting  event listener API
 ---

 Key: SPARK-2321
 URL: https://issues.apache.org/jira/browse/SPARK-2321
 Project: Spark
  Issue Type: Improvement
  Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Reynold Xin
Priority: Critical

 This is a ticket to track progress on redesigning the SparkListener and 
 JobProgressListener API.
 There are multiple problems with the current design, including:
 0. I'm not sure if the API is usable in Java (there are at least some enums 
 we used in Scala and a bunch of case classes that might complicate things).
 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of 
 attention to it yet. Something as important as progress reporting deserves a 
 more stable API.
 2. There is no easy way to connect jobs with stages. Similarly, there is no 
 easy way to connect job groups with jobs / stages.
 3. JobProgressListener itself has no encapsulation at all. States can be 
 arbitrarily mutated by external programs. Variable names are sort of randomly 
 decided and inconsistent. 
 We should just revisit these and propose a new, concrete design. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org