GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/19227
[SPARK-20060][CORE] Support accessing secure Hadoop cluster in standalone
client mode
## What changes were proposed in this pull request?
This PR leverages the facility of SPARK-16742
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19135
So it somehow suggests that CPU core contention is the main cause of memory pre-occupation, am I right?
AFAIK from our customers, we usually don't allocate so many cores to one executor
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19160
@squito, would you please help review this PR? Thanks a lot.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19210
You should also update the files under `dev/deps`.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19132#discussion_r138533547
--- Diff:
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -47,7 +47,8 @@ private[v1] class AllStagesResource(ui
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19132#discussion_r138526050
--- Diff:
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -142,7 +142,7 @@ private[v1] object AllStagesResource
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19132#discussion_r138510683
--- Diff:
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -142,7 +142,7 @@ private[v1] object AllStagesResource
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/17387
@yaooqinn, I think the patch here is quite old and cannot be merged anymore; can you please close it?
If you still want to address this issue, can you please create a new PR? Thanks.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
Overall LGTM. @ajbozarth, can you please review again?
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19141#discussion_r138505323
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
---
@@ -565,7 +565,6 @@ private[spark] class Client
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
Thanks @HyukjinKwon, I will ping Josh about this thing.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19130
Jenkins, retest this please.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19141#discussion_r138262309
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
---
@@ -565,7 +565,6 @@ private[spark] class Client
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19130
@tgravescs, thanks for your comments. Can you review again to check if it is what you expected?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
Looks like I don't have the Jenkins permission to trigger the UT. Let me
ping @srowen to trigger the test.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19132#discussion_r138240053
--- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ---
@@ -50,6 +50,7 @@ private[spark] class SparkUI private (
val
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
ok to test.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19174
This doesn't seem like a big problem; all the temp files are created under
`target/tmp`, which can be cleaned by `mvn clean` or `sbt clean`
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
I still have a question about the history server: was your event log
incomplete or complete when you met this issue
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19184#discussion_r137981999
--- Diff:
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
---
@@ -104,6 +124,10 @@ public void loadNext
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
Your PR description is quite confusing; would you please elaborate your
problem in detail and describe how to reproduce the issue
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19142
NVM. I mean that in the Spark code there are some intentionally empty "else" branches;
are you going to add trace
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18948
The patch here is not solid; we will not merge it unless you have a better
solution.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19142
I'm -1 on this PR. It actually fixes nothing beyond adding one trace log;
also, users usually don't enable trace logging, so this one-line fix is not
very helpful.
You can find
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19160
Jenkins, retest this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19132
Is this a problem only in the History UI, or does it also have issues in the Live UI?
From my understanding you only pass a last-update time for the history UI, so is that
intended?
Also you mentioned "
GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/19160
[SPARK-21934][CORE] Expose Shuffle Netty memory usage to MetricsSystem
## What changes were proposed in this pull request?
This is a followup work of SPARK-9104 to expose the Netty
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19150
Merging to master, thanks @dongjoon-hyun .
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19130
@vanzin @tgravescs, would you please help review? Thanks!
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19131
Jenkins, test this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19150
LGTM. There's a typo in the PR description: it should be "Timeouts is deprecated.",
not "TimeLimits".
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19141#discussion_r137427105
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
---
@@ -565,7 +565,6 @@ private[spark] class Client
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19141
OK to test.
(I may not have the permission to trigger the Jenkins test)
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19141
I see, thanks for the explanation.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19131
Personally I'm not fond of such a fix; it will break lots of existing PRs
and force them to rebase again. Usually this could be addressed when fixing
other issues. IMHO I don't encourage
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19141
Also, it looks like this is not a Spark 2.2 issue; would you please fix the PR
title to be more accurate about the problem?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19141
Can you please describe your usage scenario and the steps to reproduce your
issue? From my understanding, did you configure your default FS to a local FS
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19131
What about other components? Here you only fixed the sql and core modules.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19134
I see, @felixcheung. Since we have a solution to turn off the Python Kafka unit
test, as mentioned by @vanzin, it is fine to just mark it as deprecated and not
remove the code.
Another thing is
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19131
Jenkins, test this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19140
@redsanket, can you please test this with a secure Hadoop environment using
spark-submit (not Oozie)? I don't want to bring in any regression
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18628
Thanks @jiangxb1987 , let me merge it to master.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19135
Sorry, I'm not so familiar with this part, but from the test result it seems
that the performance only improved a little. I doubt the way you generate the
RDD, `0 until Integer.MAX_VALUE`, might
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19134
Yes, the Python kafka.py itself is OK to leave without being called, but the UT
will involve the Scala Kafka module to do the test.
Currently I don't know how to address this issue. Ideal
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19134
@srowen, how do you handle the Python kafka.py? Should it also be opt-in? As far
as I understand, it looks like you don't address it in th
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19130
Jenkins, retest this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19121
Sorry, I didn't state the problem clearly. But IMO the changes you made are
really not necessary.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18519
@ArtRand @vanzin, does this only work in client deploy mode? Am I
understanding correctly? I don't see code to ship tokens from the local client to the
remote d
GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/19130
[SPARK-21917][CORE][YARN] Supporting Download http(s) resources in yarn mode
## What changes were proposed in this pull request?
In the current Spark, when submitting application on YARN with
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19121
No, I don't agree with you.
SPARK_USER is set in SparkContext with the driver's current UGI, and this env
variable will be propagated to executors to create the executor's UGI with
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19121
UGI is only used for security; normally it is used for a Spark application to
communicate with Hadoop as the correct user.
doAs already wraps the whole `CoarseGrainedExecutorBackend` process
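For context, the UGI/doAs pattern being discussed can be sketched roughly as follows. This is an illustrative sketch assuming Hadoop's `UserGroupInformation` API, not the actual executor startup code; the user name resolution here is an assumption:

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Create a UGI from the propagated user name (e.g. the SPARK_USER env
// variable mentioned above) and run the subsequent work as that user.
// Hadoop RPC calls made inside doAs are issued with this identity.
val sparkUser = sys.env.getOrElse("SPARK_USER", "spark") // illustrative fallback
val ugi = UserGroupInformation.createRemoteUser(sparkUser)

ugi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // ... start the backend / talk to HDFS as `sparkUser` ...
  }
})
```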
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19121
Can you please elaborate on the problem you met? Did you see any unexpected
behavior?
The changes here get rid of the env variable "SPARK_USER"; this might be OK for a
yarn application
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18628
@cloud-fan @jiangxb1987, what do you think about this PR? I think it mostly
copies from HS2, and it is quite isolated unless we enable SPNEGO, so it
should be safe to merge.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19115
Please create a new PR against the master branch and close this one. If the
issue doesn't exist in the master branch, then consider backporting the fix to the 2.2
branch.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19115
@awarrior, please follow the
[doc](https://spark.apache.org/contributing.html) to submit a patch.
You need to change the PR title like other PRs, adding the JIRA id and
component tag
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19079
Please close this PR, @lgrcyanny. Thanks!
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19103
@vanzin, from my understanding it seems like a workaround to avoid
issuing new HDFS tokens (since this user credential already has HDFS
tokens). But how to handle the HBase/Hive thing without
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19103
@tgravescs, I think it is in `AMCredentialRenewer` that we explicitly create a
new `Credentials` every time when issuing new tokens.
```
// HACK:
// HDFS will not issue new
```
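The pattern being referred to can be sketched roughly like this. This is an illustrative reconstruction, not the actual `AMCredentialRenewer` code; it only assumes Hadoop's `Credentials` and `FileSystem.addDelegationTokens` APIs, and the method name is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Start from a fresh, empty Credentials object on every renewal, so that
// HDFS issues brand-new delegation tokens instead of reusing the ones
// already cached in the current UGI.
def obtainFreshTokens(renewer: String, hadoopConf: Configuration): Credentials = {
  val creds = new Credentials()
  val fs = FileSystem.get(hadoopConf)
  fs.addDelegationTokens(renewer, creds) // fetch new HDFS tokens into creds
  creds
}
```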
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19103
> Oozie client gets the necessary tokens the application needs before
launching. It passes those tokens along to the Oozie launcher job (MR job)
which will then actually call the Spark client
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19077
This PR generally looks fine to me; my concern is whether this change will
have a subtle impact on the code that leverages it.
CC @JoshRosen to take a review.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18935
@squito can you please review again? Thanks.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19077#discussion_r136332974
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
---
@@ -47,23 +47,29 @@ private boolean shouldPool(long
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/16803
@windpiger, can you please rebase the code? It seems too old to review.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19032
Merge to master branch.
Github user jerryshao closed the pull request at:
https://github.com/apache/spark/pull/19074
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
Thanks @vanzin, it should pass now.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/9518
Merge to master branch, thanks @xflin !
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19032#discussion_r136223332
--- Diff:
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
---
@@ -321,6 +326,7 @@ public ByteBuffer getMetaData
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/9518
Jenkins, test this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/9518
This LGTM. Since it is a quite independent PR and doesn't bring in other
dependencies, I think it is good to be merged.
Pinging others: do you have further comments?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19077
Can you please add some unit tests to verify your changes?
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19077#discussion_r136069800
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
---
@@ -47,23 +47,29 @@ private boolean shouldPool(long
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19079#discussion_r135973949
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -481,7 +481,7 @@ object SparkSubmit extends CommandLineUtils
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19079#discussion_r135973543
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -481,7 +481,7 @@ object SparkSubmit extends CommandLineUtils
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19079#discussion_r135958924
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -481,7 +481,7 @@ object SparkSubmit extends CommandLineUtils
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19079
Currently, for a Spark yarn-client application, we don't support fetching
files using the above `SparkFiles.get` API. Since you already know where the file
is in client mode, maybe you don'
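For reference, a minimal sketch of the `SparkFiles.get` pattern under discussion; `sc` is an existing `SparkContext` and the file path is illustrative:

```scala
import org.apache.spark.SparkFiles

// On the driver: distribute a file to every node of the application.
sc.addFile("hdfs:///tmp/data.txt") // illustrative path

// On the executors: resolve the local path of the distributed copy.
val localPaths = sc.parallelize(1 to 4).map { _ =>
  SparkFiles.get("data.txt") // local path to the downloaded file
}.collect()
```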
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
@vanzin @srowen, I pushed another commit to change the 2.10 repl code. I tested
locally with the 2.10 code; please review.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
OK, so I will do the test locally.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19032
@vanzin @tgravescs do you have any further comment?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19032
Jenkins, retest this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
Ohh, sorry about that, I forgot to fix it in the 2.10 repl code. I will push a
fix soon.
BTW, how do we trigger the Scala 2.10 build on Jenkins?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
Yes, maybe. You can give it a try locally.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
Jenkins, retest this please.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
I think it is already in the master branch, @caneGuy.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
@caneGuy, can you please check this JIRA
(https://issues.apache.org/jira/browse/SPARK-14423)? I remember I fixed this
issue before.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
I tried locally with the same-name jar uploaded twice; the Spark application
can be started. Can you please paste your exception here?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
But as I remember, a file with the same name will be ignored when encountered
again. This should not be a fatal issue, right?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19073
@caneGuy why do you think it is misleading?
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19074
CC @vanzin @tgravescs, please review. Since 2.2 and master are quite
different in this part of the code, the backport changes a lot.
GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/19074
[SPARK-21714][CORE][BACKPORT-2.2] Avoiding re-uploading remote resources in
yarn client mode
## What changes were proposed in this pull request?
This is a backport PR to fix issue of
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19061
> If such a thing as a non-Spark repl-like application exists, it wouldn't
be getting the progress bar by default, for example, because its default log
level is "INFO" in Spa
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/18962
Sorry, I missed the comments; I will file another PR against branch 2.2.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19032#discussion_r135466826
--- Diff:
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
---
@@ -73,6 +75,8 @@
public class
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19061
@dongjoon-hyun, I'm just thinking that other repl-like projects may actually
require this; your changes here make them fail to leverage this feature. Did you
see any issue with this feature on i
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19012
LGTM, I tried locally. Looks like the NPE is now gone in the yarn UT; thanks
for the fix.
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19047#discussion_r135247004
--- Diff:
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -136,7 +136,8 @@ void addOptionString(List cmd, String
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19034
Agree with @vanzin; if you really want to fix this issue, I think you
should find out the root cause and fix the code in Spark.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19039
The changes you made in `BlacklistTracker` seem to break the design purpose
of the blacklist. The blacklist in Spark, as well as in MR/Tez, assumes bad
nodes/executors will be back to normal in several
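This back-to-normal assumption shows up in the blacklist timeout configuration. A minimal sketch, with property names as found in Spark 2.x and illustrative values:

```scala
import org.apache.spark.SparkConf

// Blacklisted executors/nodes are automatically un-blacklisted after the
// configured timeout, on the assumption that they recover over time.
val conf = new SparkConf()
  .set("spark.blacklist.enabled", "true")
  .set("spark.blacklist.timeout", "1h") // considered back to normal after 1 hour
```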
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19034
My thinking is that different users may have different deployments and
environments, so usually they will maintain their own scripts for such a
purpose. This seems not a Spark problem from my
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19034
Wouldn't it be better to maintain it in your in-house env rather than in
community Spark? IMHO such a script does not seem proper to maintain in the Spark codebase.