[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-09 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Thanks a lot @srowen .



---




[GitHub] spark issue #23160: [SPARK-26196][SPARK-26281][WebUI] Total tasks title in t...

2018-12-07 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23160
  
Thanks @srowen. This issue happens only in the master branch.

Thank you all for the review and comments 


---




[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Thanks @srowen .


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239618167
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Updated the code


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239614561
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @vanzin for the review. 

It seems we need both methods in the ZstdCompressionCodec class, as in the previous change.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239534469
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException =>
         // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
-        logWarning(s"Failed to read Spark event log: $sourceName")
       case ioe: IOException =>
-        throw ioe
+        if (maybeTruncated) {
--- End diff --

Thanks. updated


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239530668
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException =>
         // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
-        logWarning(s"Failed to read Spark event log: $sourceName")
       case ioe: IOException =>
-        throw ioe
+        if (maybeTruncated) {
--- End diff --

Yes, I think I have simplified it into one block.
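
For reference, a minimal sketch of what that single combined block could look like, following the diff quoted above (a hedged illustration of the approach, not necessarily the exact merged code):

```scala
// Fragment of the ReplayListenerBus match, per the diff above:
case ioe: IOException =>
  if (maybeTruncated) {
    // An in-progress event log can legitimately end mid-record, so only warn.
    logWarning(s"Failed to read Spark event log: $sourceName")
  } else {
    throw ioe
  }
```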


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239521593
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

I have updated the code.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239516496
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @srowen . 

> Is it actually desirable to not fail on a partial frame? I'm not sure. We shouldn't encounter it elsewhere.

Yes, ideally it shouldn't fail. Even for the EventLoggingListener, once the application finishes the frame is closed (which is why this only applies to running applications). After analyzing the zstd code again, the impact comes down to "either throw an exception or read the open frame", and the latter seems better.
I can update the code.


---




[GitHub] spark issue #23160: [SPARK-26196][SPARK-26281][WebUI] Total tasks title in t...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23160
  
cc @tgravescs @srowen 


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239496266
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala ---
@@ -118,8 +118,6 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
       case e: HaltReplayException =>
         // Just stop replay.
       case _: EOFException if maybeTruncated =>
-      case _: IOException if maybeTruncated =>
--- End diff --

This was added for the zstd incomplete-frame reading issue. But after this change, that issue no longer happens. Yes, we can keep it as it is.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239494478
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Hi @srowen, thanks for the comment.
Yes, this parameter (false by default) is intended for continuous stream reading. So if a class doesn't need to read the data continuously, do we really need to set `isContinuous` to true?

This method is called by other classes, such as 'UnsafeShuffleWriter', which are performance sensitive. If they tried to read from open frames, the same issue (a read-error exception) would show up there as well. But this issue hasn't been reported anywhere other than 'EventLoggingListener', which is why I tried to limit the change to the EventLoggingListener call.

Yes, if we set 'continuous' to true for everything, then this code becomes much simpler.
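
As a rough sketch of that limiting approach (a hypothetical helper; the method name for the open-frame variant is illustrative, while `CompressionCodec` and `InputStream` are the existing Spark and java.io types):

```scala
// Hypothetical: only the event-log replay path opts in to open-frame reads;
// shuffle and other performance-sensitive callers keep the default stream.
def wrapEventLogStream(
    codec: CompressionCodec,
    in: InputStream,
    maybeTruncated: Boolean): InputStream = {
  if (maybeTruncated) {
    codec.compressedContinuousInputStream(in) // tolerate an open zstd frame
  } else {
    codec.compressedInputStream(in)
  }
}
```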



---




[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-06 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Jenkins, retest this please


---




[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Jenkins, retest this please


---




[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
retest this please


---




[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading open frames of zstd, ...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Thanks @vanzin, I updated the title.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239232597
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

@srowen I updated the PR.


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23240
  
Thanks. I will update the PR title


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
Jenkins, retest this please


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239219205
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks. I will update the PR.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239216282
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
     // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
     new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks @srowen for the review.
Since the CompressionCodec class is used by many callers, we need to check whether any of them have a use case for reading open frames in the zstd case. As far as the EventLoggingListener class is concerned, it does need the open-frame data. So I tried to keep the change as minimal as possible without impacting the other calls.


> I think that if we introduce a new method we might try to make it a little more general, like: compressedInputStreamForPartialFile or something. It would be good to avoid the isInstanceOf below.

Yeah, that is a cleaner solution. Thanks.



---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23241
  
cc @vanzin @srowen  Kindly review


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23241

[SPARK-26283][CORE] When zstd compression is enabled, an in-progress application in the history server app UI shows finished jobs as running

## What changes were proposed in this pull request?
Root cause: Prior to Spark 2.4, when zstd was enabled for event log compression, opening the application UI from the history server always threw an exception. Since 2.4, the UI is rendered from the completed frames of the zstd-compressed event log, but open frames of an in-progress application are not read.
In this PR, we call 'setContinuous(true)' on the input stream used to read the event log, so that open frames can be read as well. (By default 'isContinuous=false' for the zstd input stream, and when we try to read an open frame it throws a truncated error.)
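
A self-contained sketch of the codec-side change described above, assuming the zstd-jni `ZstdInputStream` API (the class and method names here are illustrative, not the final Spark API):

```scala
import java.io.{BufferedInputStream, InputStream}

import com.github.luben.zstd.ZstdInputStream

// Illustrative sketch only: a default read path plus an event-log read path.
class ZstdCodecSketch(bufferSize: Int = 32 * 1024) {

  // Default path, used by performance-sensitive callers such as the shuffle code.
  def compressedInputStream(s: InputStream): InputStream =
    new BufferedInputStream(new ZstdInputStream(s), bufferSize)

  // Event-log path: setContinuous(true) lets the stream return data from a frame
  // whose end marker has not been written yet (an in-progress application).
  def compressedContinuousInputStream(s: InputStream): InputStream =
    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
}
```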

## How was this patch tested?
Test steps:
1) Add the configurations in the spark-defaults.conf
   (i) spark.eventLog.compress true
   (ii) spark.io.compression.codec zstd 
2) Restart history server
3) bin/spark-shell
4) sc.parallelize(1 to 1000, 1000).count
5) Open app UI from the history server UI

**Before fix**
![screenshot from 2018-12-06 
00-01-38](https://user-images.githubusercontent.com/23054875/49537340-bfe28b00-f8ee-11e8-9fca-6d42fdc89e1a.png)

**After fix:**
![screenshot from 2018-12-06 
00-34-39](https://user-images.githubusercontent.com/23054875/49537353-ca9d2000-f8ee-11e8-803d-645897b9153b.png)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark zstdEventLog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23241


commit 7d06f353dedc641956a6fd6ead5174d6733c1360
Author: Shahid 
Date:   2018-12-05T17:37:13Z

zstd eventLog




---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23240
  
Hi @gengliangwang, it seems this was already handled in the PR https://github.com/apache/spark/pull/23160


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-04 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Thanks @vanzin @srowen 


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-04 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
retest this please


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Retest this please


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Hi @vanzin, I have opened a JIRA for the disk store case: https://issues.apache.org/jira/browse/SPARK-26260. I will try to work on it.


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r238492653
  
--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,34 @@ class AppStatusStoreSuite extends SparkFunSuite {
     assert(store.count(classOf[CachedQuantile]) === 2)
   }
 
+  test("only successfull task have taskSummary") {
+    val store = new InMemoryStore()
+    (0 until 5).foreach { i => store.write(newTaskData(i, "FAILED")) }
--- End diff --

Done.


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r238492582
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -148,11 +148,20 @@ private[spark] class AppStatusStore(
     // cheaper for disk stores (avoids deserialization).
     val count = {
       Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(TaskIndexNames.EXEC_RUN_TIME)
-          .first(0L)
-          .closeableIterator()
+        if (store.isInstanceOf[LevelDB]) {
--- End diff --

Done. Thanks


---




[GitHub] spark pull request #23191: [SPARK-26219][CORE][branch-2.4] Executor summary ...

2018-12-03 Thread shahidki31
Github user shahidki31 closed the pull request at:

https://github.com/apache/spark/pull/23191


---




[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23191
  
Thanks a lot @vanzin @srowen 


---




[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...

2018-12-03 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23205
  
Thanks @pgandhi999 :+1: 


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-02 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Thanks @pgandhi999 


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-02 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r238135277
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala ---
@@ -148,11 +148,20 @@ private[spark] class AppStatusStore(
     // cheaper for disk stores (avoids deserialization).
     val count = {
       Utils.tryWithResource(
-        store.view(classOf[TaskDataWrapper])
-          .parent(stageKey)
-          .index(TaskIndexNames.EXEC_RUN_TIME)
-          .first(0L)
-          .closeableIterator()
+        if (store.isInstanceOf[LevelDB]) {
--- End diff --

Yes. Now, for the disk store case it finds the total task count, and for the in-memory case only the successful task count.

This 'count' is used to find the quantile indices for all the task metrics.

https://github.com/apache/spark/blob/676bbb2446af1f281b8f76a5428b7ba75b7588b3/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala#L222
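
For context, a tiny self-contained illustration of how that count feeds the quantile indices (the values below are made up):

```scala
// Illustration only: the task count determines which rows of the sorted
// metric column are sampled for the summary table.
val count = 100L
val quantiles = Array(0.0, 0.25, 0.5, 0.75, 1.0)
val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }
// indices: Array(0, 25, 50, 75, 99)
```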


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-02 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Jenkins, retest this please


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Hi @pgandhi999, it seems that after your check-ins, when there are no summary metrics, an empty table is displayed rather than the message shown in the PR title. Could you please help me fix that? Thanks.


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Hi @srowen. Yes, for the disk store case we currently need more optimized code.

> While it makes some sense I have two concerns: different answers based on disk vs memory store which shouldn't really affect things. But would a user ever have both and see both side by side and be confused?

This configurable store is only for the history server, so a user can configure only one of them at a time there. But the live UI (opened from the YARN UI) goes through the same code flow and uses the 'ElementTrackingStore', which is also in-memory. So if a user configures the disk store for the history server and opens both the live UI and the in-progress history UI, the summary metrics will differ.

> changing the way the indexing works, so that you can index by specific metrics for successful and failed tasks differently, would be tricky, and also would require changing the disk store version (to invalidate old stores).

I think @vanzin's suggestion should work, but I need time to try it out and test it. Maybe we can add a "TODO" for the disk store case or open a separate JIRA for it.

> Second is, that seems like it should still entail pushing down all the quantile logic into the KVStore, to be clean, right? and that's a bigger change.

Thanks @srowen for the suggestion. Probably @vanzin can answer this better.

I have modified the code for the in-memory case. The disk store still uses the old code.


---




[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23160
  
> So as far as I can think, I expect that if you sort the Duration column, it will perform sorting on row.duration instead of row.taskMetrics.executorRunTime, thus, not getting the desired results.

@pgandhi999 Actually, we mapped "Duration" to "executorRunTime" for sorting in the PR I mentioned above. So after that PR the "Duration" column sorts correctly.


---




[GitHub] spark pull request #23160: [SPARK-26196][WebUI] Total tasks title in the sta...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23160#discussion_r238060278
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -661,8 +662,8 @@ $(document).ready(function () {
             {data : "launchTime", name: "Launch Time", render: formatDate},
             {
                 data : function (row, type) {
-                    if (row.duration) {
-                        return type === 'display' ? formatDuration(row.duration) : row.duration;
+                    if (row.taskMetrics && row.taskMetrics.executorRunTime) {
+                        return type === 'display' ? formatDuration(row.taskMetrics.executorRunTime) : row.taskMetrics.executorRunTime;
--- End diff --

Thanks removed.


---




[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23191
  
cc @vanzin . Kindly review


---




[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23160
  
Hi @pgandhi999, there is a small nit here. Recently we fixed a bug related to the "duration" metric in the tasks table (see https://github.com/apache/spark/pull/23081), but that hasn't been reflected here. I have updated the code.


---




[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23191
  
Jenkins, retest this please


---




[GitHub] spark issue #23191: [SPARK-26219][CORE][branch-2.4] Executor summary should ...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23191
  
Jenkins, test this please


---




[GitHub] spark pull request #23191: [SPARK-26219][CORE][branch-2.4] Executor summary ...

2018-11-30 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23191

[SPARK-26219][CORE][branch-2.4] Executor summary should get updated for 
failure jobs in the history server UI

## What changes were proposed in this pull request?
Backport the commit https://github.com/apache/spark/pull/23181 into the Spark 2.4 branch.


## How was this patch tested?
Added UT




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark branch-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23191


commit 590d580bd7e6a1a9b1f7d5645d60671c0d93decc
Author: Shahid 
Date:   2018-12-01T04:05:39Z

[SPARK-26219][CORE][branch-2.4] Executor summary should get updated for 
failure jobs in the history server UI




---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Thanks @vanzin. I will open a PR in 2.4 branch


---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Jenkins, retest this please


---




[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23181#discussion_r237959492
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite 
with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match 
total stage tasks") {
-val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
 
-val listener = new AppStatusListener(store, testConf, true)
+val isLiveSeq = Seq(true, false)
 
-val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new 
Properties()))
+isLiveSeq.foreach { live: Boolean =>
+  val testConf = if (live) {
+conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
--- End diff --

Done


---




[GitHub] spark pull request #23181: [SPARK-26219][CORE] Executor summary should get u...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23181#discussion_r237959457
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1274,47 +1274,69 @@ class AppStatusListenerSuite extends SparkFunSuite 
with BeforeAndAfter {
   }
 
   test("SPARK-25451: total tasks in the executor summary should match 
total stage tasks") {
-val testConf = conf.clone.set(LIVE_ENTITY_UPDATE_PERIOD, Long.MaxValue)
 
-val listener = new AppStatusListener(store, testConf, true)
+val isLiveSeq = Seq(true, false)
 
-val stage = new StageInfo(1, 0, "stage", 4, Nil, Nil, "details")
-listener.onJobStart(SparkListenerJobStart(1, time, Seq(stage), null))
-listener.onStageSubmitted(SparkListenerStageSubmitted(stage, new 
Properties()))
+isLiveSeq.foreach { live: Boolean =>
--- End diff --

Thanks. Updated.


---




[GitHub] spark pull request #23160: [SPARK-26196][WebUI] Total tasks title in the sta...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23160#discussion_r237907702
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js ---
@@ -610,7 +610,8 @@ $(document).ready(function () {
         $("#accumulator-table").DataTable(accumulatorConf);
 
         // building tasks table that uses server side functionality
-        var totalTasksToShow = responseBody.numCompleteTasks + responseBody.numActiveTasks;
+        var totalTasksToShow = responseBody.numCompleteTasks + responseBody.numActiveTasks +
--- End diff --

Yes @pgandhi999. All the failed tasks are displayed in the tasks table. Thanks for your great work.



---




[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...

2018-11-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23160
  
@tgravescs Yes, the task table shows all the tasks. Sorry, I didn't include the whole screenshot.


---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Jenkins, retest this please


---




[GitHub] spark issue #23181: [SPARK-26219][CORE] Executor summary should get updated ...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Jenkins, retest this please


---




[GitHub] spark pull request #23168: [SPARK-26207][doc]add PowerIterationClustering (P...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23168#discussion_r237637501
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,38 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by <a href=http://www.icml2010.org/papers/387.pdf>Lin and Cohen</a>.
--- End diff --

Hi @huaxingao, it seems the link is not accessible now. This link (http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf) seems to work.


---




[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
cc @vanzin Kindly review


---




[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Before patch:
![screenshot from 2018-11-29 
22-13-34](https://user-images.githubusercontent.com/23054875/49246338-a21ead00-f43a-11e8-8214-f1020420be52.png)

After patch:
![screenshot from 2018-11-30 
00-54-49](https://user-images.githubusercontent.com/23054875/49246353-aa76e800-f43a-11e8-98ef-7faecaa7a50e.png)



---




[GitHub] spark issue #23181: [SPARK-26100][CORE] Executor summary should get updated ...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23181
  
Jenkins, retest this please


---




[GitHub] spark pull request #23181: Executor summary should update for history events

2018-11-29 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23181

Executor summary should update for history events

## What changes were proposed in this pull request?
The root cause of the problem is that whenever a taskEnd event arrives after the stageCompleted event, the executor summary is updated only for the live UI; we need to update it for the history UI too.

To see the previous discussion, refer: PR for 
https://github.com/apache/spark/pull/23038, 
https://issues.apache.org/jira/browse/SPARK-26100.

## How was this patch tested?

Added UT. Manually verified


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark executorUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23181


commit ae71eba254235c6317c91064e9a7b2e55c1c1cce
Author: Shahid 
Date:   2018-11-29T18:09:45Z

Executor summary should update for history events




---




[GitHub] spark pull request #23180: Executor summary should update for history events

2018-11-29 Thread shahidki31
Github user shahidki31 closed the pull request at:

https://github.com/apache/spark/pull/23180


---




[GitHub] spark pull request #23180: Executor summary should update for history events

2018-11-29 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23180

Executor summary should update for history events

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark executorUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23180


commit ae71eba254235c6317c91064e9a7b2e55c1c1cce
Author: Shahid 
Date:   2018-11-29T18:09:45Z

Executor summary should update for history events




---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
Thanks a lot @vanzin 


---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...

2018-11-29 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
It is a random failure. Jenkins, retest this please


---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
Jenkins, retest this please


---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184][CORE] Last updated time is no...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
retest this please


---




[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237295187
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
+val log = newLogFile("inProgressApp1", None, inProgress = true)
+writeFile(log, true, None,
+  SparkListenerApplicationStart(
+"inProgressApp1", Some("inProgressApp1"), 3L, "test", 
Some("attempt1"))
+)
+clock.setTime(firstFileModifiedTime)
+provider.checkForLogs()
--- End diff --

I see, Thanks.


---




[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237292237
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
+val log = newLogFile("inProgressApp1", None, inProgress = true)
+writeFile(log, true, None,
+  SparkListenerApplicationStart(
+"inProgressApp1", Some("inProgressApp1"), 3L, "test", 
Some("attempt1"))
+)
+clock.setTime(firstFileModifiedTime)
+provider.checkForLogs()
--- End diff --

Added, thanks.
But for an in-progress application, do we really need to set the log file's last modified time? The cleaner checks only the application's lastUpdated time, which we update whenever the size of the log file changes.


---




[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23158#discussion_r237291446
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
 ---
@@ -334,6 +334,42 @@ class FsHistoryProviderSuite extends SparkFunSuite 
with BeforeAndAfter with Matc
 assert(!log2.exists())
   }
 
+  test("should not clean inprogress application with lastUpdated time less 
the maxTime") {
+val firstFileModifiedTime = TimeUnit.DAYS.toMillis(1)
+val secondFileModifiedTime = TimeUnit.DAYS.toMillis(6)
+val maxAge = TimeUnit.DAYS.toMillis(7)
+val clock = new ManualClock(0)
+val provider = new FsHistoryProvider(
+  createTestConf().set("spark.history.fs.cleaner.maxAge", 
s"${maxAge}ms"), clock)
--- End diff --

Done. Thanks


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-28 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r237154542
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

@srowen Yes, everything is loaded in "sorted" order based on the index, and then we do the filtering. For the in-memory case this doesn't cause any issue, but for the disk store there is extra deserialization overhead.

Maybe one possible solution for the disk store case could be to read the data only once and sort it based on the corresponding indices to compute the quantiles.

If that solution seems too complicated, then we can tell the user that the summary metrics display the quantile summary of all tasks instead of only completed ones.

Correct me if I am wrong.
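
A sketch of the in-memory approach being discussed, following the diff quoted elsewhere in this thread (`store`, `stageKey`, `index`, and `indices` are the existing values in `AppStatusStore.scanTasks` and are assumed here):

```scala
import scala.collection.JavaConverters._

// Walk the index-sorted view once, keep only successful tasks, and pick the
// rows that land on the pre-computed quantile indices.
val quantileTasks = store.view(classOf[TaskDataWrapper])
  .parent(stageKey)
  .index(index)
  .first(0L)
  .asScala
  .filter(_.status == "SUCCESS")
  .zipWithIndex
  .collect { case (task, idx) if indices.contains(idx.toLong) => task }
  .toIndexedSeq
```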


---




[GitHub] spark pull request #23160: [SPARK-26196]Total tasks title in the stage page ...

2018-11-27 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23160

[SPARK-26196]Total tasks title in the stage page is incorrect when there 
are failed or killed tasks

## What changes were proposed in this pull request?

Total tasks  = numCompleteTasks +  numActiveTasks + numKilledTasks + 
numFailedTasks;

## How was this patch tested?
test step:
```
bin/spark-shell 
scala > sc.parallelize(1 to 100, 10).count
```
![screenshot from 2018-11-28 
07-26-00](https://user-images.githubusercontent.com/23054875/49123523-e2691880-f2de-11e8-9c16-60d1865e6e77.png)



After patch:
![screenshot from 2018-11-28 
07-24-31](https://user-images.githubusercontent.com/23054875/49123525-e432dc00-f2de-11e8-89ca-4a53e19c9c18.png)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark totalTasks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23160


commit 37bcd6231816c7bc2b2561bff10955b822934ac6
Author: Shahid 
Date:   2018-11-28T01:41:50Z

Total tasks title in the stage page is incorrect




---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184] Last updated time is not gett...

2018-11-27 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
Expiry is determined by the `lastUpdateTime`, which we need to update whenever an event log update happens:

https://github.com/apache/spark/blob/2d89d109e19d1e84c4ada3c9d5d48cfcf3d997ea/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L788-L793


https://github.com/apache/spark/blob/2d89d109e19d1e84c4ada3c9d5d48cfcf3d997ea/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L1129-L1130
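
Put differently, a minimal sketch of the cleaner decision (the helper name is assumed; the real logic lives in the FsHistoryProvider code linked above):

```scala
// An attempt is expired only when its lastUpdated time falls before now - maxAge.
// In-progress logs keep refreshing lastUpdated as their size changes, so they
// are not cleaned while still growing.
def isExpired(lastUpdatedMs: Long, nowMs: Long, maxAgeMs: Long): Boolean =
  lastUpdatedMs < nowMs - maxAgeMs
```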


---




[GitHub] spark issue #23158: [SPARK-26186][SPARK-26184] Last updated time is not gett...

2018-11-27 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23158
  
cc @vanzin @srowen Kindly review


---




[GitHub] spark pull request #23158: [SPARK-26186][SPARK-26184] Last updated time is n...

2018-11-27 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23158

[SPARK-26186][SPARK-26184] Last updated time is not getting updated for the 
Inprogress application

## What changes were proposed in this pull request?

When 'spark.history.fs.inProgressOptimization.enabled' is true, an in-progress application's last updated time does not get updated in the history UI. Also, at cleaning time an in-progress application gets removed from the listing even if its last updated time is within the cleaning threshold.

## How was this patch tested?
Added UT, attached screen shot.
Before patch:
![screenshot from 2018-11-27 
23-22-38](https://user-images.githubusercontent.com/23054875/49101600-9b5a3380-f29c-11e8-8efc-3fb594e4279a.png)
![screenshot from 2018-11-27 
23-20-11](https://user-images.githubusercontent.com/23054875/49101601-9c8b6080-f29c-11e8-928e-643a8c8f4477.png)


After Patch:
![screenshot from 2018-11-27 
23-37-10](https://user-images.githubusercontent.com/23054875/49101911-669aac00-f29d-11e8-8181-663e4a08ab0e.png)
![screenshot from 2018-11-27 
23-39-04](https://user-images.githubusercontent.com/23054875/49102010-a5306680-f29d-11e8-947a-e8a2a09a785a.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark HistoryLastUpdateTime

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23158


commit 985470c0e12a3c67df47b5174748652c6e6f6e57
Author: Shahid 
Date:   2018-11-27T13:32:58Z

update lastUpdateTime for inprogress application




---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236455725
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

@vanzin Yes, it seems loading the stage page takes a lot more time if we enable the disk store.
The in-memory store seems okay.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23038
  
Oh, I could have rebased and tested locally. Thanks @vanzin for the fix.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23038
  
Thank you @vanzin 


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409746
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -95,10 +123,18 @@ class AppStatusStoreSuite extends SparkFunSuite {
 
   private def newTaskData(i: Int): TaskDataWrapper = {
 new TaskDataWrapper(
-  i, i, i, i, i, i, i.toString, i.toString, i.toString, i.toString, 
false, Nil, None,
+  i, i, i, i, i, i, i.toString, i.toString, "SUCCESS", i.toString, 
false, Nil, None,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, i, i, i, i, i, i,
   i, i, i, i, stageId, attemptId)
   }
 
+  private def failedTaskData(i: Int): TaskDataWrapper = {
--- End diff --

Thanks. Done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409661
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status == "SUCCESS") // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
--- End diff --

Thanks. I have modified it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236409557
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Thank you @vanzin for the review. I will check the time with a large number 
of tasks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236234523
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -150,8 +150,9 @@ private[spark] class AppStatusStore(
   Utils.tryWithResource(
 store.view(classOf[TaskDataWrapper])
   .parent(stageKey)
-  .index(TaskIndexNames.EXEC_RUN_TIME)
-  .first(0L)
+  .index(TaskIndexNames.STATUS)
+  .first("SUCCESS")
+  .last("SUCCESS")
--- End diff --

Here the index is "status", so the view reads tasks in sorted order by 
status: the "SUCCESS" tasks form one group, the "FAILED" tasks come after that, 
and so on. So if we scan from first("SUCCESS") to last("SUCCESS"), it counts 
only the successful tasks, not all tasks between the first and last successful one.

Also, in the UT I have added, the even indices have "SUCCESS" tasks and the odd 
indices have failed tasks, but the count is still 3, i.e. (0, 2, 4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-26 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236234400
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Thank you @srowen for the suggestion. I will try adding the filter method.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236120634
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Yes. If we do, "if (status == "SUCCESS")" for every iterator value, we 
can't do the skip function.
Becuase, earlier we know the exact index we need to take. ie. we can 
directly skip to 25th percentile, 50th percentile and so on.  Now, we don't 
know which index has the 25th percentile of the "SUCCESS" value, unless we 
iterate each.

Otherwise, we have to filter the "SUCCESS" the tasks  prior, like I have 
done in the PR.
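
For reference, a condensed sketch of the filter-based approach from the PR 
(same names as the diff above; "count", "quantiles" and "fn" come from the 
surrounding scanTasks code, and .asScala needs scala.collection.JavaConverters._):

    // Quantile positions are taken over the filtered (SUCCESS-only) sequence,
    // since we can no longer skip() straight to the Nth raw index.
    val indices = quantiles.map { q => math.min((q * count).toLong, count - 1) }

    val quantileValues = store.view(classOf[TaskDataWrapper])
      .parent(stageKey)
      .index(index)
      .first(0L)
      .asScala
      .filter(_.status == "SUCCESS")                     // keep only successful tasks
      .zipWithIndex
      .filter { case (_, idx) => indices.contains(idx) } // keep the quantile positions
      .map { case (task, _) => fn(task).toDouble }
      .toIndexedSeq
    // (duplicate quantile indices need extra handling, as discussed in the review)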


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087784
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
+
+  if(quantileTasks.size > indices.length) {
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087746
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,30 @@ class AppStatusStoreSuite extends SparkFunSuite {
 assert(store.count(classOf[CachedQuantile]) === 2)
   }
 
+  test("only successfull task have taskSummary") {
+val store = new InMemoryStore()
+(0 until 5).foreach { i => store.write(failedTaskData(i)) }
+val appStore = new AppStatusStore(store).taskSummary(stageId, 
attemptId, uiQuantiles)
+assert(appStore.size === 0)
+  }
+
+  test("summary should contain task metrics of only successfull tasks") {
+val store = new InMemoryStore()
+(0 until 5).foreach { i =>
+  if (i % 2 == 1) store.write(failedTaskData(i))
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087749
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ---
@@ -77,6 +77,30 @@ class AppStatusStoreSuite extends SparkFunSuite {
 assert(store.count(classOf[CachedQuantile]) === 2)
   }
 
+  test("only successfull task have taskSummary") {
+val store = new InMemoryStore()
+(0 until 5).foreach { i => store.write(failedTaskData(i)) }
+val appStore = new AppStatusStore(store).taskSummary(stageId, 
attemptId, uiQuantiles)
+assert(appStore.size === 0)
+  }
+
+  test("summary should contain task metrics of only successfull tasks") {
+val store = new InMemoryStore()
+(0 until 5).foreach { i =>
+  if (i % 2 == 1) store.write(failedTaskData(i))
+  else store.write(newTaskData(i))
+}
+val summary = new AppStatusStore(store).taskSummary(stageId, 
attemptId, uiQuantiles).get
+
+val values = (0 to 2).map( i =>
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087731
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
--- End diff --

Yes. That was the original intent. Unfortunately, after converting to the 
Scala collection, the skip() functionality is no longer available. Also, the 
KVStore doesn't have any filter API to filter the "SUCCESS" tasks.

The PR was for reducing the computational time for loading the stage page 
from the disk store (for the history server) by avoiding in-memory sorting.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087741
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
+.zipWithIndex
+.filter(x => indices.contains(x._2))
+
+  if(quantileTasks.size > indices.length) {
+  quantileTasks.map(task => fn(task._1).toDouble).toIndexedSeq
+  } else {
+indices.map( index =>
+ fn(quantileTasks.filter(_._2 == 
index).head._1).toDouble).toIndexedSeq
--- End diff --

Modified. Thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23088#discussion_r236087736
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -222,29 +223,20 @@ private[spark] class AppStatusStore(
 val indices = quantiles.map { q => math.min((q * count).toLong, count 
- 1) }
 
 def scanTasks(index: String)(fn: TaskDataWrapper => Long): 
IndexedSeq[Double] = {
-  Utils.tryWithResource(
-store.view(classOf[TaskDataWrapper])
-  .parent(stageKey)
-  .index(index)
-  .first(0L)
-  .closeableIterator()
-  ) { it =>
-var last = Double.NaN
-var currentIdx = -1L
-indices.map { idx =>
-  if (idx == currentIdx) {
-last
-  } else {
-val diff = idx - currentIdx
-currentIdx = idx
-if (it.skip(diff - 1)) {
-  last = fn(it.next()).toDouble
-  last
-} else {
-  Double.NaN
-}
-  }
-}.toIndexedSeq
+  val quantileTasks = store.view(classOf[TaskDataWrapper])
+.parent(stageKey)
+.index(index)
+.first(0L)
+.asScala
+.filter(_.status.startsWith("S")) // Filter "SUCCESS" tasks
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Thank you @srowen for the review. I will update the PR.


> When did the behavior change? you said you found the commit.

The behavior changed after the PR corresponding to 
https://issues.apache.org/jira/browse/SPARK-20657 was merged. Also, the PR 
discussion doesn't mention whether the change was intentional or not.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23134: [SPARK-25504][Docs] Update doc about retained tas...

2018-11-25 Thread shahidki31
Github user shahidki31 closed the pull request at:

https://github.com/apache/spark/pull/23134


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23134: [SPARK-25504][Docs] Update doc about retained tasks, job...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23134
  
In the JIRA, the user wanted retainedTasks to be 6, but around 58000 were 
actually retained. 
As per the code, it retains between 0.9*retainedTasks and retainedTasks. If 
we clearly specify that threshold, the confusion doesn't arise.
If the current documentation is considered okay as-is, I can close the PR.
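
Roughly, the trimming works like this (an illustrative sketch of the threshold 
behavior, not the exact AppStatusListener code):

    // Cleanup only kicks in once the stored count exceeds the limit, and then
    // about 10% of the limit is dropped in one go, so the store ends up holding
    // between roughly 0.9 * retainedTasks and retainedTasks entries.
    def numberToRemove(currentCount: Long, retainedTasks: Long): Long =
      if (currentCount > retainedTasks) {
        math.max(retainedTasks / 10L, currentCount - retainedTasks)
      } else {
        0L
      }

    // e.g. numberToRemove(100001, 100000) == 10000, leaving ~90000 tasks retained.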


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23134: [SPARK-25504][Docs] Update doc about retained tasks, job...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23134
  
Hi @srowen, some of the users seem to be getting confused by the threshold 
values; is it worth updating the docs? Thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23134: [SPARK-25504][Docs] Update doc about retained tas...

2018-11-25 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/23134

[SPARK-25504][Docs] Update doc about retained tasks, jobs and executions

## What changes were proposed in this pull request?

Updated the documentation about spark.ui.retainedTasks, 
spark.ui.retainedJobs, spark.ui.retainedStages

## How was this patch tested?
NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark docUpdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23134


commit ca62b7ef2dc1582ae021b63d1aecd518f3deb8a8
Author: Shahid 
Date:   2018-11-25T16:02:03Z

update doc about retained tasks, jobs and executions




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23038
  
@vanzin could you please check the updated changes, thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
@srowen , could you please review the PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23016
  
> I am currently using it with spark 2.3 as
> 
> org.apache.spark
> spark-mllib_2.11
> 
> How can i get this fix?

You can cherry-pick the commit from the master branch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...

2018-11-25 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23016
  
@idlevi Actually, the input and output of PrefixSpan are RDDs. Earlier the 
intermediate RDD was cached; now the final RDD is cached and materialized. So 
when you materialize the model, it previously had to recompute from the 
intermediate representation, whereas now it reads directly from the final RDD.
 I ran all the UTs in PrefixSpanSuite, and there is hardly any time 
difference with/without the patch.
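
A rough illustration of the caching change with generic RDDs (the real names in 
PrefixSpan differ; "sc" is assumed to be an existing SparkContext):

    import org.apache.spark.storage.StorageLevel

    // Before: only the intermediate RDD was cached, so materializing the final
    // results recomputed from that level. After: the final RDD is cached and
    // counted once, and the intermediate cache can then be released.
    val intermediate = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_AND_DISK)
    val results = intermediate.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)
    results.count()            // materialize the final RDD once
    intermediate.unpersist()   // safe now: later reads come straight from results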


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [WIP][SPARK-26119][CORE][WEBUI]Task summary table should...

2018-11-21 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
(Still need to test more things, so changed to WIP)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [WIP][SPARK-26119][CORE][WEBUI]Task summary table should...

2018-11-21 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
@srowen Yes. I did check the commit history, and the PR which modified the 
behavior didn't mention this behavior change. If it is intentional, then 
we should change the title of the table from "summary metrics of 
completed tasks" to "summary metrics of all tasks".


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-11-21 Thread shahidki31
GitHub user shahidki31 reopened a pull request:

https://github.com/apache/spark/pull/23088

[SPARK-26119][CORE][WEBUI]Task summary table should contain only successful 
tasks' metrics

## What changes were proposed in this pull request?

The task summary table in the stage page currently displays the summary of all 
the tasks. However, we should display the task summary of only successful 
tasks, to follow the behavior of previous versions of Spark.


## How was this patch tested?
Added UT. attached screenshot
Before patch:
![screenshot from 2018-11-20 00-36-18](https://user-images.githubusercontent.com/23054875/48729339-62e3a580-ec5d-11e8-81f0-0d191a234ffe.png)

![screenshot from 2018-11-20 01-18-37](https://user-images.githubusercontent.com/23054875/48731112-41d18380-ec62-11e8-8c31-1ffbfa04e746.png)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark summaryMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23088


commit cfe2d5f3744f2d10d917883713cd78678b5157a1
Author: Shahid 
Date:   2018-11-19T18:23:57Z

task summary should contain only successful tasks

commit 8f4498e0c67f8a83401f3b0e06aef4922ef49c20
Author: Shahid 
Date:   2018-11-21T18:37:01Z

update PR




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23081: [SPARK-26109][WebUI]Duration in the task summary metrics...

2018-11-21 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23081
  
Thank you @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


