Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
> Which Python version will run that stored procedure?
>
> All Python versions supported in PySpark
>

Where does the stored procedure define the exact Python version that will run
the code? That was the question.
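
For context: in PySpark today the interpreter is chosen by cluster or job
configuration rather than by the code itself (e.g. PYSPARK_PYTHON or
spark.pyspark.python), and a stored procedure would presumably just inherit
whatever the cluster provides. A minimal sketch of that existing knob, with an
illustrative interpreter path that is not taken from this thread:

    import os
    from pyspark.sql import SparkSession

    # Pin the executor interpreter before the session starts; the path below
    # is an assumption for illustration only.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.10"

    spark = SparkSession.builder.getOrCreate()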


> How to manage external dependencies?
>
> Existing way we have
> https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
> .
> In fact, this will use the external dependencies within your Python
> interpreter so you can use all existing conda or venvs.
>
The current proposal doesn't solve this issue at all (the stored code provides
no manifest of its dependencies or of what is required to run it). So it feels
like it's better to stay with UDFs, since they are under our control and their
behaviour is predictable. Did I miss something?
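
For reference, the packaging approach that page describes generally amounts to
shipping an archived environment alongside the job rather than declaring
anything in the stored code itself. A minimal sketch of the conda-pack variant;
the archive name and paths are illustrative:

    import os
    from pyspark.sql import SparkSession

    # Assumes `conda pack -o pyspark_conda_env.tar.gz` was run beforehand.
    # Executors unpack the archive under the alias "environment" and use its
    # interpreter; the names here are illustrative only.
    os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

    spark = (
        SparkSession.builder
        .config("spark.archives", "pyspark_conda_env.tar.gz#environment")
        .getOrCreate()
    )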

> How to test it via a common CI process?
>
> Existing way of PySpark unittests, see
> https://github.com/apache/spark/tree/master/python/pyspark/tests
>
Sorry, but this wouldn't work, since the stored procedure requires a specific
definition and the code will not be stored as regular Python code. Do you have
any examples of how to test stored Python procedures as a unit, e.g. without
Spark?
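
To make that concern concrete: the procedure body can at least be unit-tested
as plain Python when it is kept in an importable module, without Spark and
without whatever registration syntax the SPIP settles on. A hypothetical sketch
(the function and test names are made up for illustration):

    import unittest

    def drop_missing(rows):
        # Plain Python logic a stored procedure might wrap: drop None entries.
        return [r for r in rows if r is not None]

    class DropMissingTest(unittest.TestCase):
        def test_drops_nones(self):
            self.assertEqual(drop_missing([1, None, 2]), [1, 2])

    if __name__ == "__main__":
        unittest.main()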

> How to manage versions and do upgrades? Migrations?
>
> This is a new feature so no migration is needed. We will keep the
> compatibility according to the semver we follow.
>
The question was not about Spark, but about stored procedures themselves. Are
there any guidelines that will avoid copying the flaws of other systems?

> The current Python UDF solution handles these problems well, since it
> delegates them to the project level.
>
> Current UDF solution cannot handle stored procedures because UDF is on the
> worker side. This is Driver side.
>
How so? It currently works, and we have never faced such an issue. Maybe you
should have the same Python code on the driver side as well? But such a trivial
idea doesn't require a new feature in Spark, since you already have to ship
that code somehow.
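
For comparison, this is the project-level pattern being defended here: the same
importable function serves both as a worker-side UDF and as ordinary
driver-side code, so the project (not a catalog) owns the Python version, the
dependencies, and the tests. A short sketch with illustrative names:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def normalize(s):
        # Plain Python, importable and testable on its own.
        return s.strip().lower() if s is not None else None

    spark = SparkSession.builder.getOrCreate()
    normalize_udf = udf(normalize, StringType())

    df = spark.createDataFrame([(" Foo ",), (None,)], ["raw"])
    df.select(normalize_udf("raw").alias("clean")).show()

    # The very same function is available on the driver side as well:
    print(normalize("  Bar  "))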

--
,,,^..^,,,


Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
-1

Great idea to ignore the experience of others and copy bad practices back
for nothing.

If you are familiar with the Python ecosystem, then you should answer these
questions:
1. Which Python version will run that stored procedure?
2. How to manage external dependencies?
3. How to test it via a common CI process?
4. How to manage versions and do upgrades? Migrations?

The current Python UDF solution handles these problems well, since it
delegates them to the project level.

--
,,,^..^,,,


On Thu, Aug 31, 2023 at 1:29 AM Allison Wang
 wrote:

> Hi all,
>
> I would like to start a discussion on “Python Stored Procedures".
>
> This proposal aims to extend Spark SQL by introducing support for stored
> procedures, starting with Python as the procedural language. This will
> enable users to run complex logic using Python within their SQL workflows
> and save these routines in catalogs like HMS for future use.
>
> *SPIP*:
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/edit?usp=sharing
> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45023
>
> Looking forward to your feedback!
>
> Thanks,
> Allison
>
>


Re: Renaming blacklisting feature input

2020-08-04 Thread Alexander Shorin
Hi Sean!

Your point is good and I accept it, but I thought it worth reminding yet again
that the ASF isn't limited to the US and the world is not limited to the
English language, and as a result it shouldn't be limited by some people's
personal issues - there are specialists around who can help with these.

P.S. Sorry if my response was a bit disrespectful, but the intention was to
remind everyone that blindly renaming everything is not a solution - let's
think worldwide, or at least about the compatibility issues which somehow have
to be handled. And what will be the motivation for people to handle them?

On Tue, Aug 4, 2020 at 5:57 PM Sean Owen  wrote:

> I know this kind of argument has bounced around not just within the
> ASF but outside too. While we should feel open to debate here, even if
> I don't think it will get anywhere new, let me suggest it won't matter
> to the decision process here, so, not worth it.
>
> We should discuss this type of change like any other. If a portion of
> the community, and committers/PMC accept that this is at the least a
> small, positive change for some, then we start with a legitimate
> proposal: there is an identifiable positive. Arguments against should
> be grounded in specific reasons there are more significant harms than
> benefits. (One clear issue: API compatibility, which I believe is
> still intended to be entirely preserved).
>
> I'd merely say that if one's position is only "meh, this does not
> matter to me, this change doesn't improve my world", it's not worth
> arguing with the people to which it matters at least a little. Changes
> happen here all the time that I don't care about or even distantly
> make my work a little harder. Doesn't mean either position is right-er
> even, we don't need to decide that.
>
> On Tue, Aug 4, 2020 at 9:33 AM Alexander Shorin  wrote:
> >
> >
> > Just no changes? The name presents no issues and is pretty clear about its
> intentions. The links to racism are quite overblown.
> >
> > --
> > ,,^..^,,
> >
> >
> > On Tue, Aug 4, 2020 at 5:19 PM Tom Graves 
> wrote:
> >>
> >> Hey Folks,
> >>
> >> We have jira https://issues.apache.org/jira/browse/SPARK-32037 to
> rename the blacklisting feature.  It would be nice to come to a consensus
> on what we want to call that.
> >> It doesn't look like we have any references to whitelist other than
> from other components.  There is some discussion on the jira and I linked
> to what some other projects have done so please take a look at that.
> >>
> >> A few options:
> >>  - blocklist
> >>  - denylist
> >>  - healthy /HealthTracker
> >>  - quarantined
> >>  - benched
> >>  - exiled
> >>  - banlist
> >>
> >> Please let me know thoughts and suggestions.
> >>
> >>
> >> Thanks,
> >> Tom
>


Re: Renaming blacklisting feature input

2020-08-04 Thread Alexander Shorin
Just no changes? The name presents no issues and is pretty clear about its
intentions. The links to racism are quite overblown.

--
,,^..^,,


On Tue, Aug 4, 2020 at 5:19 PM Tom Graves 
wrote:

> Hey Folks,
>
> We have jira https://issues.apache.org/jira/browse/SPARK-32037 to rename
> the blacklisting feature.  It would be nice to come to a consensus on what
> we want to call that.
> It doesn't look like we have any references to whitelist other than from
> other components.  There is some discussion on the jira and I linked to
> what some other projects have done so please take a look at that.
>
> A few options:
>  - blocklist
>  - denylist
>  - healthy /HealthTracker
>  - quarantined
>  - benched
>  - exiled
>  - banlist
>
> Please let me know thoughts and suggestions.
>
>
> Thanks,
> Tom
>


Re: Python friendly API for Spark 3.0

2018-09-15 Thread Alexander Shorin
When is the release of Apache Spark 3.0 due? Will it be tomorrow, or somewhere
around the middle of 2019?

I think we shouldn't care much about Python 2.x today, since quite soon its
support turns into a pumpkin. For today's projects I hope nobody takes 2.7
support into account unless there is some legacy still to carry, but do we want
to take that baggage into the Apache Spark 3.x era? The next time we could drop
it would only be the 4.0 release, because it is a breaking change.

--
,,,^..^,,,
On Sat, Sep 15, 2018 at 2:21 PM Maciej Szymkiewicz
 wrote:
>
> There is no need to ditch Python 2. There are basically two options
>
> - Use stub files and limit yourself to Python 3 support only. Python 3
>   users benefit from type hints, Python 2 users don't, but no core
>   functionality is affected. This is the approach I've used with
>   https://github.com/zero323/pyspark-stubs/.
> - Use comment-based inline syntax or stub files and don't use backward
>   incompatible features (primarily the typing module -
>   https://docs.python.org/3/library/typing.html). Both Python 2 and 3 are
>   supported, but more advanced components are not. Small win for Python 2
>   users, moderate loss for Python 3 users.
>
>
>
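
As a concrete illustration of the two options quoted above (a toy function, not
PySpark code): option 1 is a separate .pyi stub file that only type checkers
read, using the Python-3-only annotation syntax; option 2 is a PEP 484 type
comment that both Python 2 and Python 3 can execute.

    # Option 1: a stub file, e.g. mymodule.pyi, shipped next to mymodule.py.
    # Only type checkers read it, so runtime behaviour on Python 2 is unchanged:
    #
    #     def to_upper(s: str) -> str: ...
    #
    # Option 2: a comment-based annotation inside mymodule.py itself,
    # valid syntax under both Python 2 and Python 3:

    def to_upper(s):
        # type: (str) -> str
        return s.upper()

    print(to_upper("spark"))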
> On Sat, 15 Sep 2018 at 02:38, Nicholas Chammas  
> wrote:
>>
>> Do we need to ditch Python 2 support to provide type hints? I don’t think so.
>>
>> Python lets you specify typing stubs that provide the same benefit without 
>> forcing Python 3.
>>
>> On Fri, Sep 14, 2018 at 8:01 PM, Holden Karau wrote:
>>>
>>>
>>>
>>> On Fri, Sep 14, 2018, 3:26 PM Erik Erlandson  wrote:

 To be clear, is this about "python-friendly API" or "friendly python API" ?
>>>
>>> Well what would you consider to be different between those two statements? 
>>> I think it would be good to be a bit more explicit, but I don't think we 
>>> should necessarily limit ourselves.


 On the python side, it might be nice to take advantage of static typing. 
 Requires python 3.6 but with python 2 going EOL, a spark-3.0 might be a 
 good opportunity to jump the python-3-only train.
>>>
>>> I think we can make types sort of work without ditching 2 (the types would
>>> only work in 3, but it would still function in 2). Ditching 2 entirely
>>> would be a big thing to consider; I honestly hadn't been considering that,
>>> but that could just be from spending so much time maintaining a 2/3 code
>>> base. I'd suggest reaching out to user@ before making that kind of change.


 On Fri, Sep 14, 2018 at 12:15 PM, Holden Karau  
 wrote:
>
> Since we're talking about Spark 3.0 in the near future (and since some 
> recent conversation on a proposed change reminded me) I wanted to open up 
> the floor and see if folks have any ideas on how we could make a more 
> Python friendly API for 3.0? I'm planning on taking some time to look at 
> other systems in the solution space and see what we might want to learn 
> from them but I'd love to hear what other folks are thinking too.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): 
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau


>
>
