Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/7770#issuecomment-127066305
I took a quick pass through the current diff.
One high-level question:
The tooltip comment says that this will only be used if Tungsten is
enabled, but I noticed that there are also peak memory consumption tests for
several non-Tungsten operators, including the existing external sorter,
ExternalAppendOnlyMap, etc. Is the concern that, for non-Tungsten jobs, showing
memory usage for only those operators would be confusing? The same confusion
could arise when a user has Tungsten enabled but is running non-SQL jobs. Given
that, I wonder whether it makes sense to always show the metric, irrespective
of whether Tungsten is used.
---
Reviewed 7 of 35 files at r1, 8 of 11 files at r2, 2 of 4 files at r3, 4 of
24 files at r4, 1 of 1 files at r5, 4 of 10 files at r6, 1 of 13 files at r7, 3
of 4 files at r9, 1 of 3 files at r10.
Review status: 31 of 49 files reviewed at latest revision, 17 unresolved
discussions, some commit checks failed.
---
<sup>**[core/src/main/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriter.java,
line 455
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-JvjyFnipw8r_uM6llmJ)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriter.java#L455)):</sup>
This could possibly be null due to mocking. Do you remember which tests
this was null in?
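For context, a minimal sketch of how mocking yields a null here (the class is
hypothetical, standing in for whatever the test mocks): unstubbed methods with
reference return types on a Mockito mock return null.
```scala
import org.mockito.Mockito.mock

// Hypothetical dependency, standing in for whatever the test mocks.
class TaskDeps { def metrics(): Object = new Object }

object MockNullDemo extends App {
  val deps = mock(classOf[TaskDeps])
  // No stubbing, so Mockito's default answer kicks in: null for objects.
  assert(deps.metrics() == null)
}
```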
---
<sup>**[core/src/main/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriter.java,
line 459
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-JvjySxUYG3yq_glTmT-)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriter.java#L459)):</sup>
Why is this Java conversion necessary? As far as I know, you can still call
methods on Scala maps from Java, although you might end up with some
weird-looking imports.
---
<sup>**[core/src/main/scala/org/apache/spark/Accumulators.scala, line 157
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvjz0QAGo9oQR2jYhzP)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/Accumulators.scala#L157)):</sup>
I fear that this could mask bugs if TaskContext is null when we're trying
to deserialize an external accumulator. Instead of doing a null check here,
could you write out the `isInternal` flag and check it to decide whether to
register? If you do that, could you also add a comment cross-referencing the
place where internal accumulators are registered?
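Roughly what I have in mind (the field and hook names below are a sketch, not
`Accumulable`'s actual internals): serialize the flag and branch on it during
deserialization instead of null-checking `TaskContext`.
```scala
import java.io.ObjectInputStream

// Sketch: `isInternal` is a plain (non-transient) field, so it survives
// serialization and can drive the registration decision on the read side.
class AccumulableSketch(val isInternal: Boolean) extends Serializable {
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    if (!isInternal) {
      // External accumulators register with the running task here.
      // Internal accumulators are registered elsewhere -- the real code
      // should carry a comment cross-referencing that registration site.
    }
  }
}
```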
---
<sup>**[core/src/main/scala/org/apache/spark/Accumulators.scala, line 264
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-JvjzLl6KJl6UVVA1CmA)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/Accumulators.scala#L264)):</sup>
Name boolean parameters? IntelliJ likes to complain about this.
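For illustration (the helper below is hypothetical, just to show the call-site
difference):
```scala
object RegistrationSketch {
  // Hypothetical helper with a boolean flag, like the call sites here.
  def register(name: String, internal: Boolean): Unit = ()

  register("peakExecutionMemory", true)             // opaque at the call site
  register("peakExecutionMemory", internal = true)  // self-documenting
}
```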
---
<sup>**[core/src/main/scala/org/apache/spark/Accumulators.scala, line 268
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-JvjzK9DHuqJ_ogH2Wg4)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/Accumulators.scala#L268)):</sup>
Name boolean parameters?
---
<sup>**[core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala,
line 791
\[r1\]](https://reviewable.io:443/reviews/apache/spark/7770#-JvVHinFI4nT0TEsx8sk-r1-791)**
([raw
file](https://github.com/apache/spark/blob/5b5e6f36b8a0e37f1953e12c438e01c58872e5fa/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L791)):</sup>
Will this change break any user programs that may have relied on the old
behavior? Was the old behavior specified?
---
<sup>**[core/src/main/scala/org/apache/spark/scheduler/Stage.scala, line 78
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk-IpC7oolKQcwsJeO)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/scheduler/Stage.scala#L78)):</sup>
Should this comment describe what happens during partial stage
recomputations?
---
<sup>**[core/src/main/scala/org/apache/spark/TaskContext.scala, line 65
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk-YWmoTLAUs2KfDN3)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/TaskContext.scala#L65)):</sup>
Could use a `@VisibleForTesting` annotation here.
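For example (the member name below is hypothetical), Guava's annotation
documents that the wider visibility exists only for tests:
```scala
package org.apache.spark

import com.google.common.annotations.VisibleForTesting

object TaskContextSketch {
  // Signals that this is exposed only so tests can call it.
  @VisibleForTesting
  private[spark] def unset(): Unit = ()
}
```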
---
<sup>**[core/src/main/scala/org/apache/spark/TaskContext.scala, line 67
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk-blSJZ0ZULb49BEx)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/main/scala/org/apache/spark/TaskContext.scala#L67)):</sup>
I thought that you could call `private[spark]` and `protected[spark]`
methods from Java?
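For reference, the reason this generally works (names in the sketch are
hypothetical): the JVM has no equivalent of Scala's package-qualified
modifiers, so scalac emits these members as public bytecode, which Java
callers can reach.
```scala
package org.apache.spark

// Both the class and the method compile to *public* bytecode, because the
// JVM cannot express `private[spark]`. Java code in the project can call
// `new InteropHelper().reset()` directly; only Scala sources outside
// org.apache.spark are stopped, and that check happens at compile time.
private[spark] class InteropHelper {
  private[spark] def reset(): Unit = ()
}
```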
---
<sup>**[core/src/test/scala/org/apache/spark/CacheManagerSuite.scala, line
89
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk1DKhxW_LXgEZcwle)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/core/src/test/scala/org/apache/spark/CacheManagerSuite.scala#L89)):</sup>
Given that we removed local execution in 1.5, we might be able to remove
this code as well. That doesn't need to happen in this PR; I just wanted to
note it since I happened to spot it here.
---
<sup>**[sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala,
line 1621
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk1l80_3dvUKbI95U6)**
([raw
file](https://github.com/apache/spark/blob/6aa2f7a8c2f4eb1de6281593326dce5a92d5c1e3/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L1621)):</sup>
Instead of using `originalValue` and a `finally` block, this can be
slightly simplified by using the new `withSQLConf` helper method from
`SQLTestUtils` (which is mixed into this suite). Take a look at the other uses
in this file; it should be a straightforward cleanup.
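Sketch of the suggested cleanup (the config key is a placeholder for whichever
one this test toggles); `withSQLConf` applies the override and restores the
previous value in a `finally` for you:
```scala
// Inside a suite that mixes in SQLTestUtils:
test("query under the overridden conf") {
  withSQLConf("spark.sql.someFlag" -> "true") {
    // ... run the query and assertions; no manual originalValue bookkeeping.
  }
}
```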
---
<sup>**[unsafe/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java,
line 343
\[r11\]](https://reviewable.io:443/reviews/apache/spark/7770#-Jvk20Ae8fgwebtW_UOT)**
([raw
file](https://github.com/apache/spark/blob/82f47b811607a1eeeecba437fe0ffc15d4e5f9ec/unsafe/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java#L343)):</sup>
This change now conflicts with the refactoring in Reynold's latest patch.
---
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/7770)