Add option to Spark UI to proxy to the executors?

2021-08-20 Thread Holden Karau
Hi Folks,

I'm wondering what people think about the idea of having the Spark UI
(optionally) act as a proxy to the executors? This could help with exec UI
access in some deployment environments.

Cheers,

Holden :)

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Observer Namenode and Committer Algorithm V1

2021-08-20 Thread Steve Loughran
ooh, this is fun,

v2 isn't safe to use unless every task attempt generates files with exactly
the same names and it is okay to intermingle the output of two task
attempts.

This is because a task commit can fail partway through (or worse, the
process can pause for a full GC) and a second attempt then be committed.
Spark's commit algorithm assumes that it's OK to commit a second attempt if
the first attempt failed, timed out, etc. That holds for v1, but not for v2.

Therefore: a (non-binding) -1 to any proposal to switch to v2. You would only
be trading one problem for another.
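
For reference, a minimal sketch of pinning the classic v1 algorithm
explicitly on the Spark side (spark.hadoop.* options are simply passed
through to the Hadoop configuration; untested, adjust as needed):

    import org.apache.spark.sql.SparkSession

    // Keep the v1 FileOutputCommitter algorithm (Spark's own default)
    // rather than relying on whatever the Hadoop-side default is.
    val spark = SparkSession.builder()
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
      .getOrCreate()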


I think the best fix here is to do it in the FileOutputCommitter. Be aware
that we are all scared of that class and always want to do the minimum
necessary.

I will certainly add it to the manifest committer, whose "call for reviewers
and testing" is still open, especially for testing all the way through Spark:
https://github.com/apache/hadoop/pull/2971

That committer works with HDFS too; I'd be interested in anyone benchmarking
it on queries with deep/wide directory trees and with different tasks all
generating output for the same destination directories (i.e. file rename
dominates in job commit, not task rename). I'm not optimising it for HDFS -
it's trying to deal with cloud storage quirks like non-atomic directory
rename (GCS), slow list/file rename performance (everywhere), deep directory
delete timeouts, and other cloud-storage-specific issues.


Further reading on the commit problem in general
https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_release_2021-05-17

-Steve



On Tue, 17 Aug 2021 at 17:39, Adam Binford  wrote:

> Hi,
>
> We ran into an interesting issue that I wanted to share as well as get
> thoughts on if anything should be done about this. We run our own Hadoop
> cluster and recently deployed an Observer Namenode to take some burden off
> of our Active Namenode. We mostly use Delta Lake as our format, and
> everything seemed great. But when running some one-off analytics we ran
> into an issue. Specifically, we did something like:
>
> "df..repartition(1).write.csv()"
>
> This is our quick way of creating a CSV we can download and do other
> things with when our result is some small aggregation. However, we kept
> getting an empty output directory (just a _SUCCESS file and nothing else),
> even though in the Spark UI it says it wrote some positive number of rows.
> Eventually traced it back to our update to use the
> ObserverReadProxyProvider in our notebook sessions. I finally figured out
> it was due to the "Maintaining Client Consistency" section talked about in
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
> .
>
> After setting the auto msync period to a low value, the writes started
> working. I kept digging in and realized it's due to how the
> FileOutputCommitter v1 algorithm works. During the commitJob phase, the
> AM/driver does a file system listing on the output directory to find all
> the finished task output files it needs to move to the top level output
> directory. But since this is a read, the observer can serve this request,
> but it can be out of date and not see the newly written files that just
> finished from the executors. The auto msync fixed it because it forced the
> driver to do an msync before the read took place. However, frequent auto
> msyncs can defeat some of the performance benefits of the Observer.
>
> The v2 algorithm shouldn't have this issue because the tasks themselves
> copy the output to the final directory when they finish, and the driver
> simply adds the _SUCCESS at the end. And Hadoop's default is v2, but Spark
> overrides that to use v1 by default because of potential correctness
> issues, which is fair. While this is mostly an issue with Hadoop, the fact
> that Spark defaults to the v1 algorithm makes it somewhat of a Spark
> problem. Also, things like Delta Lake (or even regular structured streaming
> output, I think) shouldn't have issues because they write files directly and
> rely on a transaction log, so no file moving on the driver is involved.
>
> So I mostly wanted to share that in case anyone else runs into this same
> issue. But also wanted to get thoughts on if anything should be done about
> this to prevent it from happening. Several ideas in no particular order:
>
> - Perform an msync during Spark's commitJob before calling the parent
> commitJob. Since this is only available in newer APIs, probably isn't even
> possible while maintaining compatibility with older Hadoop versions.
> - Attempt to get an msync added upstream in Hadoop's v1 committer's
> commitJob
> - Attempt to detect the use of the ObserverReadProxyProvider and either
> force using v2 committer on the spark side or just print out a warning that
> you either need to use the v2 committer or you need to set the auto msync
> period very low or 0 to guarantee correct output.
> - Simply add something to the Spark docs somewhere about things to know
> when using the ObserverReadProxyProvider
> - Assume that if 
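
(Re the first option above - an msync in Spark's commitJob before calling the
parent commitJob - a rough, untested sketch of the shape that could take on
the Spark side, assuming Hadoop 3.3+ where FileSystem.msync() exists; the
class name is hypothetical:)

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapreduce.JobContext
    import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
    import org.apache.spark.internal.io.HadoopMapReduceCommitProtocol

    // Hypothetical commit protocol: force the HDFS client to catch up with
    // the Active NameNode before the v1 commitJob listing runs, so a read
    // served by an Observer cannot miss freshly committed task output.
    class MsyncCommitProtocol(jobId: String, path: String)
      extends HadoopMapReduceCommitProtocol(jobId, path) {

      override def commitJob(jobContext: JobContext,
                             taskCommits: Seq[TaskCommitMessage]): Unit = {
        val fs = new Path(path).getFileSystem(jobContext.getConfiguration)
        try {
          fs.msync() // DistributedFileSystem supports this; others may not
        } catch {
          case _: UnsupportedOperationException => // non-HDFS filesystems
        }
        super.commitJob(jobContext, taskCommits)
      }
    }

(It would then have to be wired in via spark.sql.sources.commitProtocolClass;
whether that beats fixing it inside FileOutputCommitter itself is the open
question.)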

[VOTE] Release Spark 3.2.0 (RC1)

2021-08-20 Thread Gengliang Wang
Please vote on releasing the following candidate as Apache Spark version
3.2.0.

The vote is open until 11:59pm Pacific time Aug 25 and passes if a majority
of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.0-rc1 (commit
6bb3523d8e838bd2082fb90d7f3741339245c044):
https://github.com/apache/spark/tree/v3.2.0-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1388

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/

The list of bug fixes going into 3.2.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12349407

This release is using the release script of the tag v3.2.0-rc1.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can add
the staging repository to your project's resolvers and test with the RC (make
sure to clean up the artifact cache before/after so you don't end up building
with an out-of-date RC going forward).
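
For the Java/Scala route, a minimal build.sbt sketch against the staging
repository (assuming the staged version string is plain 3.2.0):

    // Point sbt at the RC1 staging repo and pull the RC artifacts from it.
    resolvers += "Apache Spark 3.2.0 RC1 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1388/"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0"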

===
What should happen to JIRA tickets still targeting 3.2.0?
===
The current list of open tickets targeted at 3.2.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: Observer Namenode and Committer Algorithm V1

2021-08-20 Thread Adam Binford
So it turns out Delta Lake isn't compatible out of the box due to its mixed
use of the FileContext API for writes and the FileSystem API for reads on the
driver. Bringing that up with those devs now, but in the meantime the
auto-msync-only-on-driver trick is already coming in handy, thanks!
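
For anyone else wanting the same setup: per Erik's tip quoted below, it boils
down to something along these lines in spark-defaults.conf (the nameservice
ID is a placeholder), so that only the driver forces an msync and the
executors keep the cheaper observer reads:

    spark.driver.defaultJavaOptions  -Ddfs.client.failover.observer.auto-msync-period.<nameservice ID>=0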

On Wed, Aug 18, 2021 at 10:52 AM Adam Binford  wrote:

> Ahhh we don't do any RDD checkpointing but that makes sense too. Thanks
> for the tip on setting that on the driver only, I didn't know that was
> possible but it makes a lot of sense.
>
> I couldn't tell you the first thing about reflection but good to know it's
> actually something possible to implement on the Spark side. I only really
> know enough Scala to get my way around the Spark repo, so I probably
> couldn't help much implementing the fixes, but I'm happy to test or bounce
> ideas off of. We'll probably stick to the v2 committer for ad-hoc cases with
> auto msync turned off and see if we run into any other issues. Are there any
> issues filed for this yet? If we encounter anything else I can report it there.
>
> Adam
>
> On Tue, Aug 17, 2021 at 4:17 PM Erik Krogen  wrote:
>
>> Hi Adam,
>>
>> Thanks for this great writeup of the issue. We (LinkedIn) also operate
>> Observer NameNodes, and have observed the same issues, but have not yet
>> gotten around to implementing a proper fix.
>>
>> To add a bit of context from our side, there is at least one other place
>> besides the committer v1 algorithm where this can occur, specifically
>> around RDD checkpointing. In this situation, executors write out data to
>> HDFS, then communicate their status back to the driver, which then tries to
>> gather metadata about those files on HDFS (a listing operation). For the
>> time being, we have worked around this by enabling auto-msync mode (as
>> described in HDFS-14211) with
>> dfs.client.failover.observer.auto-msync-period.=0. We set
>> this in our default configurations *on the driver only*, which helps to
>> make sure we get most of the scalability benefits of the observer reads. We
>> achieve this by putting the config as a System property in
>> spark.driver.defaultJavaOptions. This can cause performance issues with
>> operations which perform many metadata operations serially, but it's a
>> tradeoff we find acceptable for now in terms of correctness vs. performance.
>>
>> Long-term, we believe adding appropriate msync() commands to Spark is the
>> right way forward (the first option you mentioned). I think the
>> documentation mentioned in your 4th option is a good short-term addition,
>> but in the long run, targeted msync() operations will be a more performant
>> fix that can work out-of-the-box. We can hide the calls behind reflection
>> to mitigate concerns around compatibility if needed. There is interest from
>> our side in pursuing this work, and certainly we would be happy to
>> collaborate if there is interest from you or others as well.
>>
>> On Tue, Aug 17, 2021 at 9:40 AM Adam Binford  wrote:
>>
>>> Hi,
>>>
>>> We ran into an interesting issue that I wanted to share as well as get
>>> thoughts on if anything should be done about this. We run our own Hadoop
>>> cluster and recently deployed an Observer Namenode to take some burden off
>>> of our Active Namenode. We mostly use Delta Lake as our format, and
>>> everything seemed great. But when running some one-off analytics we ran
>>> into an issue. Specifically, we did something like:
>>>
>>> "df..repartition(1).write.csv()"
>>>
>>> This is our quick way of creating a CSV we can download and do other
>>> things with when our result is some small aggregation. However, we kept
>>> getting an empty output directory (just a _SUCCESS file and nothing else),
>>> even though in the Spark UI it says it wrote some positive number of rows.
>>> Eventually traced it back to our update to use the
>>> ObserverReadProxyProvider in our notebook sessions. I finally figured out
>>> it was due to the "Maintaining Client Consistency" section talked about in
>>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
>>> .
>>>
>>> After setting the auto msync period to a low value, the writes started
>>> working. I kept digging in and realized it's due to how the
>>> FileOutputCommitter v1 algorithm works. During the commitJob phase, the
>>> AM/driver does a file system listing on the output directory to find all
>>> the finished task output files it needs to move to the top level output
>>> directory. But since this is a read, the observer can serve this request,
>>> but it can be out of date and not see the newly written files that just
>>> finished from the executors. The auto msync fixed it because it forced the
>>> driver to do an msync before the read took place. However, frequent auto
>>> msyncs can defeat some of the performance benefits of the Observer.
>>>
>>> The v2 algorithm shouldn't have this issue because the tasks themselves
>>> 

Spark Thriftserver is failing for when submitting command from beeline

2021-08-20 Thread Pralabh Kumar
Hi Dev

Environment details

Hadoop 3.2
Hive 3.1
Spark 3.0.3

Cluster: Kerberized.

1) Hive server is running fine
2) Spark SQL, spark-shell, spark-submit: everything is working as expected.
3) Connecting to Hive through beeline works fine (after kinit):
beeline -u "jdbc:hive2://:/default;principal=

Now I launched the Spark Thrift Server (STS) and tried to connect to it
through beeline.

The beeline client connects to STS perfectly.

4) beeline -u "jdbc:hive2://:/default;principal=
   a) The log says connected to:
      Spark SQL
      Driver: Hive JDBC


Now when I run any command ("show tables") it fails. The log in STS says:

21/08/19 19:30:12 DEBUG UserGroupInformation: PrivilegedAction as:
(auth:PROXY) via  (auth:KERBEROS)
from:org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Client.createClientTransport(HadoopThriftAuthBridge.java:208)
21/08/19 19:30:12 DEBUG UserGroupInformation: PrivilegedAction as:
(auth:PROXY) via  (auth:KERBEROS)
from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
21/08/19 19:30:12 DEBUG TSaslTransport: opening transport
org.apache.thrift.transport.TSaslClientTransport@f43fd2f
21/08/19 19:30:12 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:95)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:38)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at
org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:480)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
at
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1556)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1545)
at
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$databaseExists$1(HiveClientImpl.scala:384)




My guess is that authorization through the proxy user is not working.



Please help


Regards
Pralabh Kumar


Re: -1s on committed but not released code?

2021-08-20 Thread Tom Graves
So personally I think it's fine to comment post-merge, but I think an issue
should also be filed (that might just be me though). This change was reviewed
and committed, so if someone found a problem with it, then it should be
officially tracked as a bug.
I would think a -1 on an already committed change is very rare, and the person
who gave it should give technical reasons for it. From those reasons it should
be fairly clear: if it's a functional bug, just fix it as a bug; if it's
something else with the design, then I think it has to be discussed further.
In my opinion it has been committed and is valid until that discussion comes
to a conclusion. The one argument against that is if something is pushed in
very quickly and people aren't given time to adequately review. I can see in
that case where you might revert it more quickly.
Tom

On Thursday, August 19, 2021, 08:25:14 PM CDT, Hyukjin Kwon wrote:
 
Yeah, I think we can discuss and revert it (or fix it) per the veto that was
set. Often problems are found later, after code is merged.


On Fri, Aug 20, 2021 at 4:08 AM, Mridul Muralidharan wrote:

Hi Holden,
  In the past, I have seen discussions on the merged PR to thrash out the
details. Usually it would be clear whether to revert and reformulate the
change, or concerns get addressed and possibly result in follow-up work.
This is usually helped by the fact that we typically are conservative and
don't merge changes too quickly: giving folks sufficient time to review and
opine.
Regards,
Mridul
On Thu, Aug 19, 2021 at 1:36 PM Holden Karau  wrote:

Hi Y'all,
This just recently came up but I'm not super sure on how we want to handle this 
in general. If code was committed under the lazy consensus model and then a 
committer or PMC -1s it post merge, what do we want to do?
I know we had some previous discussion around -1s, but that was largely focused 
on pre-commit -1s.
Cheers,
Holden :)

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
YouTube Live Streams: https://www.youtube.com/user/holdenkarau