Re: [SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-16 Thread shane knapp ☠
alright, my canary build w/skipping the PV integration test passed w/the
docker driver:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-k8s-clone/20/

i'll put together a PR for this over the weekend (it's a one-liner) and
once we merge i can get the remaining workers upgraded early next week.
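
(for the curious, one plausible shape for that one-liner, assuming the PV
checks are plain ScalaTest cases; the suite and test names below are
illustrative, not the real ones:

    import org.scalatest.funsuite.AnyFunSuite

    class PVTestSuite extends AnyFunSuite {
      // hypothetical: switching test(...) to ignore(...) is the one-line skip
      ignore("PVs with local storage") {
        // ... existing persistent volume test body ...
      }
    }
)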

On Thu, Apr 15, 2021 at 3:05 PM shane knapp ☠  wrote:

> i'm all for that...  and once they're turned off, we can finish the
> minikube/k8s/move-to-docker project in a couple of hours max.
>
> On Thu, Apr 15, 2021 at 3:00 PM Holden Karau  wrote:
>
>> What about if we just turn off the PV tests for now?
>> I'd be happy to help with the debugging/upgrading.
>>
>> On Thu, Apr 15, 2021 at 2:28 AM Rob Vesse  wrote:
>> >
>> > There’s at least one test (the persistent volumes one) that relies on
>> > some Minikube-specific functionality. We know this because we run the
>> > integration tests for our $dayjob Spark image builds using Docker for
>> > Desktop instead, and that one test fails. That test could be refactored:
>> > I think it’s just adding a minimal Ceph cluster to the K8S cluster,
>> > which can be done on any K8S cluster in principle.
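>> >
>> > As a rough sketch of that direction (illustrative only: it uses the
>> > fabric8 client the integration tests already depend on, and a plain
>> > hostPath PV rather than the Ceph setup, so the names and paths here
>> > are assumptions):
>> >
>> >     import io.fabric8.kubernetes.api.model.{PersistentVolumeBuilder, Quantity}
>> >     import io.fabric8.kubernetes.client.DefaultKubernetesClient
>> >
>> >     val client = new DefaultKubernetesClient()
>> >     // a hypothetical PV that any conformant K8S cluster can satisfy
>> >     val pv = new PersistentVolumeBuilder()
>> >       .withNewMetadata().withName("spark-test-pv").endMetadata()
>> >       .withNewSpec()
>> >         .addToCapacity("storage", new Quantity("1Gi"))
>> >         .withAccessModes("ReadWriteOnce")
>> >         .withNewHostPath().withPath("/tmp/spark-test-data").endHostPath()
>> >       .endSpec()
>> >       .build()
>> >     client.persistentVolumes().create(pv)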
>> >
>> >
>> >
>> > Rob
>> >
>> >
>> >
>> > From: shane knapp ☠ 
>> > Date: Wednesday, 14 April 2021 at 18:56
>> > To: Frank Luo 
>> > Cc: dev , Brian K Shiratsuki 
>> > Subject: Re: [SPARK-34738] issues w/k8s+minikube and PV tests
>> >
>> >
>> >
>> > On Wed, Apr 14, 2021 at 10:32 AM Frank Luo  wrote:
>> >
>> > Is there any hard dependency on minikube (e.g., GPU settings)? kind (
>> > https://kind.sigs.k8s.io/) is a more stable and simpler k8s cluster env
>> > on a single machine (it only requires docker), and it’s been widely used
>> > for testing by k8s projects.
>> >
>> >
>> >
>> > there are no hard deps on minikube...  it installs happily and
>> > successfully runs every integration test except for persistent volumes.
>> >
>> >
>> >
>> > i haven't tried kind yet, but my time is super limited on this and i'd
>> > rather not venture down another rabbit hole unless we absolutely have to.
>> >
>> >
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [DISCUSS] Add error IDs

2021-04-16 Thread Yuming Wang
+1 for this proposal.

On Fri, Apr 16, 2021 at 5:15 AM Karen  wrote:

> We could leave space in the numbering system, but a more flexible method
> may be to have the severity as a field associated with the error class -
> the same way we would associate error ID with SQLSTATE, or with whether an
> error is user-facing or internal. As you noted, I don't believe there is a
> standard framework for hints/warnings in Spark today. I propose that we
> leave out severity as a field until there is sufficient demand. We will
> leave room in the format for other fields.
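>
> As a rough sketch of what that could look like (the field names here are
> illustrative, not a committed design):
>
>     // hypothetical error-class metadata: severity is just another
>     // optional field alongside SQLSTATE and the user-facing flag
>     case class ErrorInfo(
>         errorClass: String,         // e.g. "TABLE_OR_VIEW_NOT_FOUND"
>         sqlState: Option[String],   // e.g. Some("42704")
>         userFacing: Boolean,
>         severity: Option[String])   // left out until there is demand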
>
> On Thu, Apr 15, 2021 at 3:18 AM Steve Loughran 
> wrote:
>
>>
>> Machine-readable logs are always good, especially if you can load the
>> entire logs into an SQL query.
>>
>> It might be good to use some specific differentiation between
>> hint/warn/fatal errors in the numbering, so that any automated analysis
>> of the logs can identify the severity class of an error even if it's an
>> error it doesn't actually recognise. See the VMS docs for an example of
>> this; the Windows scheme is apparently based on their work:
>> https://www.stsci.edu/ftp/documents/system-docs/vms-guide/html/VUG_19.html.
>> Even if things are only errors for now, leaving room in the format for
>> other levels is wise.
>>
>> The trend in cloud infras is always to have some string "NoSuchBucket"
>> which is (a) guaranteed to be maintained over time and (b) searchable in
>> google.
>>
>> (That said, AWS has every service not just making up its own values but
>> not even returning consistent responses for the same problem. S3
>> throttling: 503. DynamoDB: 500 + one of two different messages. See
>> com.amazonaws.retry.RetryUtils for the details.)
>>
>> On Wed, 14 Apr 2021 at 20:04, Karen  wrote:
>>
>>> Hi all,
>>>
>>> We would like to kick off a discussion on adding error IDs to Spark.
>>>
>>> Proposal:
>>>
>>> Add error IDs to provide a language-agnostic, locale-agnostic, specific,
>>> and succinct answer as to which class a problem falls under. When
>>> partnered with a text-based error class (e.g. 12345
>>> TABLE_OR_VIEW_NOT_FOUND), error IDs can provide meaningful
>>> categorization. They are useful for all Spark personas: from users, to
>>> support engineers, to developers.
>>>
>>> Add SQLSTATEs. As discussed in #32013, SQLSTATEs are portable error
>>> codes that are part of the ANSI/ISO SQL-99 standard and are especially
>>> useful for JDBC/ODBC users. They are not mutually exclusive with adding
>>> product-specific error IDs, which can be more specific; for example,
>>> MySQL uses an N-1 mapping from error IDs to SQLSTATEs:
>>> https://dev.mysql.com/doc/refman/8.0/en/error-message-elements.html.
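>>>
>>> As a toy illustration of that N-1 shape (the Spark-style IDs below are
>>> made up):
>>>
>>>     // several specific error IDs can share a single SQLSTATE
>>>     val sqlStateOf: Map[String, String] = Map(
>>>       "SPK-12345" -> "42704",  // COLUMN_NOT_FOUND
>>>       "SPK-12346" -> "42704",  // TABLE_OR_VIEW_NOT_FOUND
>>>       "SPK-20001" -> "22012")  // a division-by-zero class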
>>>
>>> Uniquely link error IDs to error messages (1-1). This simplifies the
>>> auditing process and ensures that we uphold quality standards, as outlined
>>> in SPIP: Standardize Error Message in Spark (
>>> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit
>>> ).
>>>
>>> Requirements:
>>>
>>> Changes are backwards compatible; developers should still be able to
>>> throw exceptions in the existing style (e.g. throw new
>>> AnalysisException(“Arbitrary error message.”)). Adding error IDs will be a
>>> gradual process, as there are thousands of exceptions thrown across the
>>> code base.
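>>>
>>> One backwards-compatible shape (a sketch under the proposal's
>>> assumptions, not the actual Spark class) would make the new fields
>>> optional:
>>>
>>>     // hypothetical: existing call sites keep compiling unchanged
>>>     class AnalysisException(
>>>         message: String,
>>>         errorId: Option[String] = None,
>>>         sqlState: Option[String] = None)
>>>       extends Exception(message)
>>>
>>>     val oldStyle = new AnalysisException("Arbitrary error message.")
>>>     val newStyle = new AnalysisException(
>>>       "Cannot find column 'fakeColumn'",
>>>       errorId = Some("SPK-12345"), sqlState = Some("42704"))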
>>>
>>> Optional:
>>>
>>> Label errors as user-facing or internal. Internal errors should be
>>> logged, and end-users should be aware that they likely cannot fix the error
>>> themselves.
>>>
>>> End result:
>>>
>>> Before:
>>>
>>> AnalysisException: Cannot find column ‘fakeColumn’; line 1 pos 14;
>>>
>>> After:
>>>
>>> AnalysisException: SPK-12345 COLUMN_NOT_FOUND: Cannot find column
>>> ‘fakeColumn’; line 1 pos 14; (SQLSTATE 42704)
>>>
>>> Please let us know what you think about this proposal! We’d love to
>>> hear your feedback.
>>>
>>> Best,
>>>
>>> Karen Feng
>>>
>>