[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-23 Thread Xuefu Zhang (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222593#comment-14222593
]

Xuefu Zhang commented on SPARK-2321:

I have created SPARK-4567 to track the request to make SparkJobInfo and
SparkStageInfo serializable.

Design a proper progress reporting event listener API
---

Key: SPARK-2321
URL: https://issues.apache.org/jira/browse/SPARK-2321
Project: Spark
Issue Type: Improvement
Components: Java API, Spark Core
Affects Versions: 1.0.0
Reporter: Reynold Xin
Assignee: Josh Rosen
Priority: Critical
Fix For: 1.2.0

This is a ticket to track progress on redesigning the SparkListener and
JobProgressListener API.
There are multiple problems with the current design, including:
0. I'm not sure if the API is usable in Java (there are at least some enums
we used in Scala and a bunch of case classes that might complicate things).
1. The whole API is marked as DeveloperApi, because we haven't paid a lot of
attention to it yet. Something as important as progress reporting deserves a
more stable API.
2. There is no easy way to connect jobs with stages. Similarly, there is no
easy way to connect job groups with jobs / stages.
3. JobProgressListener itself has no encapsulation at all. States can be
arbitrarily mutated by external programs. Variable names are sort of randomly
decided and inconsistent.
We should just revisit these and propose a new, concrete design.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-19 Thread Aniket Bhatnagar (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218063#comment-14218063
]

Aniket Bhatnagar commented on SPARK-2321:
-

Just a quick question: will the API be usable from the job submitter process in
yarn-cluster mode (i.e., when the driver is running as a separate YARN process)?

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-19 Thread Patrick Wendell (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218502#comment-14218502
]

Patrick Wendell commented on SPARK-2321:

Currently this is a programmatic API for Spark applications to know their own
progress. The task in this JIRA was just to stabilize existing interfaces. It's
not a reporting API for external services to query progress. That is probably
best rolled up into a general REST API for exposing specifics of a Spark
application.

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-18 Thread Rui Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217485#comment-14217485
]

Rui Li commented on SPARK-2321:
---

Hi [~joshrosen],

Shall we make {{SparkJobInfo}} and {{SparkStageInfo}} serializable? This is
mainly for the case where the SparkContext runs remotely. What's your opinion?

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-16 Thread Rui Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214223#comment-14214223
]

Rui Li commented on SPARK-2321:
---

Hey [~joshrosen],

Thanks a lot for the update! I created SPARK-4440 for the enhancement.

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-14 Thread Rui Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212000#comment-14212000
]

Rui Li commented on SPARK-2321:
---

Hi [~joshrosen],

The new API is quite useful, but the information it exposes is relatively
limited at the moment. Do you have any plans to enhance it? For example,
submission and completion times are not available in {{SparkStageInfo}},
although they are provided in {{StageInfo}}.
Thanks!

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-11-14 Thread Apache Spark (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212884#comment-14212884
]

Apache Spark commented on SPARK-2321:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3197

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-16 Thread Dev Lakhani (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173619#comment-14173619
]

Dev Lakhani commented on SPARK-2321:

There are some issues and bugs under the webui component that are active.
Should we incorporate these into this Jira or is it best to work on them
separately and then merge these (2321) changes later?

https://issues.apache.org/jira/browse/SPARK/component/12322616

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-07 Thread Apache Spark (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162614#comment-14162614
]

Apache Spark commented on SPARK-2321:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2696

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-10-07 Thread Josh Rosen (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162615#comment-14162615
]

Josh Rosen commented on SPARK-2321:
---

I've opened a WIP pull request in order to discuss the design / implementation
of a pull-based progress / status API:
https://github.com/apache/spark/pull/2696. I'd like to focus on discussing the
most high-level interface / API design decisions now; once we're happy with
those decisions, we can focus on the details of which pieces of data to expose,
etc.

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Chengxiang Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143191#comment-14143191
]

Chengxiang Li commented on SPARK-2321:
--

I agree that stable, immutable, and Java-friendly *Info classes should be
part of this new API design. Beyond registering a new private SparkListener
to collect JobInfo and exposing it through SparkContext, I think there are two
more issues that should be resolved.
# TaskInfo/StageInfo should be collected per job, but the current Stage/Task
SparkListener events do not include any job ID info. The scheduler has the
information to connect TaskInfo/StageInfo with a job, so maybe we should
redesign the SparkListener event API and add job ID info to Stage/Task events
in the scheduler before posting them to the listener bus.
# Getting the job ID after submitting a job, and getting job info by job ID.
Currently we can only get the job ID in a very limited way: by executing Spark
async actions with SimpleFutureAction,

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Josh Rosen (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143425#comment-14143425
]

Josh Rosen commented on SPARK-2321:
---

{quote}
... maybe we should redesign the SparkListener event API and add job ID info
to Stage/Task events in the scheduler before posting them to the listener bus.
{quote}

A stage may be used by multiple jobs, so we'd have to think carefully about how
the API should reflect this. It looks like DAGScheduler's internal {{Stage}}
class tracks the id of the job that first submitted the stage, and
{{activeJobForStage}} finds the earliest-created active job that needs the
stage. It might make sense to associate Stage/Task start events with the list
of active jobs that depend on them.
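
A rough way to picture that association (a minimal Python sketch, not the
DAGScheduler's actual bookkeeping): if each job start reports the stage IDs it
needs, the mapping can be inverted so that a stage event can be tagged with
every active job that depends on it. The names ActiveJobIndex, job_started,
job_ended, and jobs_for_stage are hypothetical and exist only for this
illustration.

```python
from collections import defaultdict


class ActiveJobIndex:
    """Hypothetical helper: tracks which active jobs depend on each stage."""

    def __init__(self):
        self._stages_by_job = {}                # job id -> frozenset of stage ids
        self._jobs_by_stage = defaultdict(set)  # stage id -> set of active job ids

    def job_started(self, job_id, stage_ids):
        """Record a new active job together with the stages it needs."""
        stages = frozenset(stage_ids)
        self._stages_by_job[job_id] = stages
        for stage_id in stages:
            self._jobs_by_stage[stage_id].add(job_id)

    def job_ended(self, job_id):
        """Forget a finished job; stages with no remaining jobs are dropped."""
        for stage_id in self._stages_by_job.pop(job_id, frozenset()):
            jobs = self._jobs_by_stage[stage_id]
            jobs.discard(job_id)
            if not jobs:
                del self._jobs_by_stage[stage_id]

    def jobs_for_stage(self, stage_id):
        """Sorted ids of the active jobs that need this stage (may be empty)."""
        return sorted(self._jobs_by_stage.get(stage_id, ()))
```

With jobs 0 and 1 both started and both listing stage 2, jobs_for_stage(2)
reports both job IDs until one of them ends.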

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Mark Hamstra (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143814#comment-14143814
]

Mark Hamstra commented on SPARK-2321:
-

Which would be kind of the opposite half of the SparkListenerJobStart event,
which includes an array of the StageIds in a Job. I included that way back
when as a suggestion of at least some of what might be needed to implement
better job-based progress reporting. I'd have to look, but I don't believe
anything is actually using that stage-reporting on JobStart right now. In any
event, any proper progress reporting should rationalize, extend or eliminate
that part of SparkListenerJobStart.

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-21 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142772#comment-14142772
 ] 

Josh Rosen commented on SPARK-2321:
---

The scheduler has some data structures like StageInfo, TaskInfo, RDDInfo, etc.
that expose some of the information that we might want in a user-facing
progress API, but we can't expose these classes in their current form since
they're marked @DeveloperApi and are full of public, mutable fields (the
responses returned from our progress / status API need to be immutable).

Maybe we should stabilize these scheduler.*Info classes' public interfaces,
make them immutable, and add a JobInfo class for capturing per-job information.
We can then register a new, private SparkListener for maintaining a view of
stage progress and add methods to SparkContext that provide stable, pull-based
access to the snapshots of job/stage/task state.
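
To make the shape of that proposal concrete, here is a minimal sketch in
Python (chosen only for brevity; it is not Spark code). StageSnapshot,
StageProgressTracker, and their methods are hypothetical names. The tracker
plays the role of the private listener: it replaces snapshots as events
arrive, and callers pull immutable snapshots whenever they like.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageSnapshot:
    """Immutable view of one stage; the fields are illustrative only."""
    stage_id: int
    num_tasks: int
    num_completed_tasks: int


class StageProgressTracker:
    """Hypothetical stand-in for a private listener plus pull-based accessors."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stages = {}  # stage id -> latest StageSnapshot

    # Listener side: called as scheduler events arrive.
    def on_stage_submitted(self, stage_id: int, num_tasks: int) -> None:
        with self._lock:
            self._stages[stage_id] = StageSnapshot(stage_id, num_tasks, 0)

    def on_task_end(self, stage_id: int) -> None:
        with self._lock:
            old = self._stages[stage_id]
            # Replace rather than mutate, so snapshots already handed out
            # to callers never change underneath them.
            self._stages[stage_id] = StageSnapshot(
                old.stage_id, old.num_tasks, old.num_completed_tasks + 1)

    # Pull side: callers ask for the current state at their own pace.
    def get_stage_info(self, stage_id: int) -> Optional[StageSnapshot]:
        with self._lock:
            return self._stages.get(stage_id)
```

Because each update swaps in a new frozen object, a caller holding an older
snapshot keeps a consistent view, and unknown stage IDs simply return None.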

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-17 Thread Josh Rosen (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138242#comment-14138242
]

Josh Rosen commented on SPARK-2321:
---

I agree that this should be a pull API. A pull-based API will be easier to
expose in Python and Java and might be sufficient for many of the common
use-cases.

With a push-based API, we might have to worry about things like callback
processing speed, rate-limiting, etc. (the Rx frameworks have interesting ways
of addressing these concerns, but I'm not sure whether we want to add those
dependencies); the pull approach won't face these issues.
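
As a small illustration of why pulling sidesteps those concerns, here is a
hedged Python sketch of a polling helper. poll_until_done and its parameters
are hypothetical, not part of any Spark API. The caller chooses the polling
interval, so a slow consumer only delays itself and nothing queues up on the
producer side.

```python
import time


def poll_until_done(get_status, is_done, interval_seconds=0.5,
                    max_polls=None, sleep=time.sleep):
    """Call get_status() repeatedly until is_done(status) is true.

    Returns the last status seen, which may not be "done" if max_polls
    was reached first. The sleep function is injectable for testing.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if is_done(status) or (max_polls is not None and polls >= max_polls):
            return status
        sleep(interval_seconds)
```

A caller would pass a function that reads the current job state and a
predicate such as "state is no longer RUNNING".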

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121085#comment-14121085
]

Chengxiang Li commented on SPARK-2321:
--

I have collected some Hive-side requirements here, which should be helpful for
the design of the Spark job status and statistics API.

Hive should be able to get the following job status information through the
Spark job status API:
1. Job identifier.
2. Current job execution state, which should include
RUNNING/SUCCEEDED/FAILED/KILLED.
3. Running/failed/killed/total task counts at the job level.
4. Stage identifier.
5. Stage state, which should include RUNNING/SUCCEEDED/FAILED/KILLED.
6. Running/failed/killed/total task counts at the stage level.
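
The six items above could be modeled roughly as follows. This is only a Python
sketch of the data shape: ExecutionState, TaskCounts, StageStatusView, and
JobStatusView are hypothetical names, and deriving the job-level counts by
summing over stages is an assumption made for the example (it ignores stages
shared between jobs).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ExecutionState(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    KILLED = "KILLED"


@dataclass(frozen=True)
class TaskCounts:
    running: int
    failed: int
    killed: int
    total: int


@dataclass(frozen=True)
class StageStatusView:
    stage_id: int            # item 4
    state: ExecutionState    # item 5
    tasks: TaskCounts        # item 6


@dataclass(frozen=True)
class JobStatusView:
    job_id: int                          # item 1
    state: ExecutionState                # item 2
    stages: Tuple[StageStatusView, ...]

    @property
    def tasks(self) -> TaskCounts:
        """Item 3: job-level counts, assumed here to be the sum over stages."""
        return TaskCounts(
            running=sum(s.tasks.running for s in self.stages),
            failed=sum(s.tasks.failed for s in self.stages),
            killed=sum(s.tasks.killed for s in self.stages),
            total=sum(s.tasks.total for s in self.stages),
        )
```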

MR/Tez use Counters to collect statistics. Similar to MR/Tez Counters, it
would be better if the Spark job statistics API organized statistics as
follows:
1. Group statistics of the same kind by groupName.
2. Provide a displayName for both groups and individual statistics, which
would give front ends (Web UI, Hive CLI, ...) a uniform string to print.
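
A minimal Python sketch of that organization, assuming nothing about how
Spark would actually store statistics: Statistic, StatisticGroup, and render
are hypothetical names, and the output format is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Statistic:
    name: str
    display_name: str
    value: int = 0


@dataclass
class StatisticGroup:
    group_name: str
    display_name: str
    statistics: Dict[str, Statistic] = field(default_factory=dict)

    def increment(self, name: str, display_name: str, delta: int = 1) -> None:
        """Add delta to the named statistic, creating it on first use."""
        stat = self.statistics.setdefault(name, Statistic(name, display_name))
        stat.value += delta


def render(groups: Iterable[StatisticGroup]) -> str:
    """One uniform text rendering that any front end could print."""
    lines = []
    for group in groups:
        lines.append(group.display_name)
        for stat in group.statistics.values():
            lines.append(f"  {stat.display_name}={stat.value}")
    return "\n".join(lines)
```

Keeping the display names next to the values means the Web UI and a CLI can
print the same labels without each maintaining its own lookup table.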

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Reynold Xin (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121094#comment-14121094
]

Reynold Xin commented on SPARK-2321:

What about pull vs. push? That is, should this be a listener-like API, or some
service with state that the caller can poll?

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-04 Thread Chengxiang Li (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121173#comment-14121173
]

Chengxiang Li commented on SPARK-2321:
--

I'm not sure whether I understand you correctly; here are my thoughts on the
API design:
# The JobStatus/JobStatistic API contains only getter methods.
# JobProgressListener holds variables of type JobStatusImpl/JobStatisticImpl.
# DAGScheduler posts events to JobProgressListener through the listener bus.
# The caller gets JobStatusImpl/JobStatisticImpl, with updated state, from
JobProgressListener.

So I think it should be a pull-style API.
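
The four steps can be sketched in Python as follows. This is an illustrative
sketch only: the class names follow the comment above, but the bodies, the
tuple-shaped events, and the synchronous ListenerBus are assumptions made for
the example and do not reflect Spark's implementation.

```python
from abc import ABC, abstractmethod


class JobStatus(ABC):
    """Step 1: the caller-facing API exposes getters only."""

    @property
    @abstractmethod
    def job_id(self) -> int: ...

    @property
    @abstractmethod
    def state(self) -> str: ...


class JobStatusImpl(JobStatus):
    def __init__(self, job_id: int):
        self._job_id = job_id
        self._state = "RUNNING"

    @property
    def job_id(self) -> int:
        return self._job_id

    @property
    def state(self) -> str:
        return self._state

    def _update(self, state: str) -> None:
        # Intended to be called only by the listener.
        self._state = state


class JobProgressListener:
    """Step 2: the listener owns the JobStatusImpl instances."""

    def __init__(self):
        self._jobs = {}

    def on_event(self, event) -> None:
        # Hypothetical event shape: (kind, job_id, final_state_or_None).
        kind, job_id, final_state = event
        if kind == "job_start":
            self._jobs[job_id] = JobStatusImpl(job_id)
        elif kind == "job_end":
            self._jobs[job_id]._update(final_state)

    def get_job_status(self, job_id: int) -> JobStatus:
        """Step 4: callers pull the current status by job id."""
        return self._jobs[job_id]


class ListenerBus:
    """Step 3: the scheduler posts events; this toy bus delivers them inline."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener) -> None:
        self._listeners.append(listener)

    def post(self, event) -> None:
        for listener in self._listeners:
            listener.on_event(event)
```

In this variant the caller holds a live object whose state changes as events
arrive, in contrast to handing out immutable snapshots; either way the caller
only ever reads.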
