Re: Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Hyukjin Kwon
BTW, I vaguely remember that adding a new version changes the default
version the merge script uses for JIRA resolution, e.g., it's 3.3.0 now
but it would become 4.0.0.
It would be good to double-check how it's affected.
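
A minimal sketch of the effect described above, assuming (hypothetically)
that the default is simply the highest unreleased version known to JIRA.
The function names and the version lists are illustrative only and are not
taken from the actual merge script.

```python
def parse_version(name):
    """Turn a version string such as '3.3.0' into a comparable tuple."""
    return tuple(int(part) for part in name.split("."))


def default_fix_version(unreleased_versions):
    """Hypothetical rule: default to the highest unreleased version."""
    return max(unreleased_versions, key=parse_version)


# Illustrative version lists, before and after adding a 4.0.0 placeholder.
before = ["3.1.3", "3.2.0", "3.3.0"]
after = before + ["4.0.0"]

print(default_fix_version(before))  # 3.3.0
print(default_fix_version(after))   # 4.0.0
```

Under that assumption, adding the placeholder alone is enough to change the
default, which is why it seems worth checking the script first.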

On Tue, Sep 14, 2021 at 1:32 PM, Dongjoon Hyun wrote:

> I'm fine with having the version number, but breaking API compatibility
> should be discussed separately in the community.
> We decided to strive to avoid breaking APIs even in major versions and
> made a policy for that.
>
> https://spark.apache.org/versioning-policy.html
> > The Spark project strives to avoid breaking APIs or silently changing
> > behavior, even at major versions.
>
>
>
> On Mon, Sep 13, 2021 at 9:00 PM Senthil Kumar wrote:
>
>> We could add a feature (a new tab) to the Spark UI for data, so that we
>> can use it to display data-related metrics and detect skew in the data.
>> It would help users understand their data in a better/deeper way.
>>
>> On Tue, Sep 14, 2021 at 4:07 AM Sean Owen wrote:
>>
>>> Sure, doesn't hurt to have a placeholder.
>>>
>>> On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> I'm going through the Spark 3.2 tickets just to make sure we're not
>>>> missing anything important, and I was wondering what folks' thoughts
>>>> are on adding Spark 4 so we can target API-breaking changes to the
>>>> next major version and avoid losing track of the issue.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> Holden :)
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>
>>
>> --
>> Senthil kumar
>>
>


Re: Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Dongjoon Hyun
I'm fine with having the version number, but breaking API compatibility
should be discussed separately in the community.
We decided to strive to avoid breaking APIs even in major versions and made
a policy for that.

https://spark.apache.org/versioning-policy.html
> The Spark project strives to avoid breaking APIs or silently changing
> behavior, even at major versions.



On Mon, Sep 13, 2021 at 9:00 PM Senthil Kumar wrote:

> We could add a feature (a new tab) to the Spark UI for data, so that we
> can use it to display data-related metrics and detect skew in the data.
> It would help users understand their data in a better/deeper way.
>
> On Tue, Sep 14, 2021 at 4:07 AM Sean Owen wrote:
>
>> Sure, doesn't hurt to have a placeholder.
>>
>> On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote:
>>
>>> Hi Folks,
>>>
>>> I'm going through the Spark 3.2 tickets just to make sure we're not
>>> missing anything important, and I was wondering what folks' thoughts
>>> are on adding Spark 4 so we can target API-breaking changes to the
>>> next major version and avoid losing track of the issue.
>>>
>>> Cheers,
>>>
>>>
>>> Holden :)
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>
>
> --
> Senthil kumar
>


Re: Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Senthil Kumar
We could add a feature (a new tab) to the Spark UI for data, so that we can
use it to display data-related metrics and detect skew in the data. It
would help users understand their data in a better/deeper way.

On Tue, Sep 14, 2021 at 4:07 AM Sean Owen wrote:

> Sure, doesn't hurt to have a placeholder.
>
> On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote:
>
>> Hi Folks,
>>
>> I'm going through the Spark 3.2 tickets just to make sure we're not
>> missing anything important, and I was wondering what folks' thoughts
>> are on adding Spark 4 so we can target API-breaking changes to the
>> next major version and avoid losing track of the issue.
>>
>> Cheers,
>>
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>

-- 
Senthil kumar


Re: Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Sean Owen
Sure, doesn't hurt to have a placeholder.

On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote:

> Hi Folks,
>
> I'm going through the Spark 3.2 tickets just to make sure we're not
> missing anything important, and I was wondering what folks' thoughts are
> on adding Spark 4 so we can target API-breaking changes to the next major
> version and avoid losing track of the issue.
>
> Cheers,
>
>
> Holden :)
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Adding Spark 4 to JIRA for targeted versions

2021-09-13 Thread Holden Karau
Hi Folks,

I'm going through the Spark 3.2 tickets just to make sure we're not missing
anything important, and I was wondering what folks' thoughts are on adding
Spark 4 so we can target API-breaking changes to the next major version and
avoid losing track of the issue.

Cheers,


Holden :)

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


regex_column_names users feedback

2021-09-13 Thread Pablo Langa Blanco
Hi Spark devs & users,

I’m writing to get some feedback from the users of the regex_column_names
feature (spark.sql.parser.quotedRegexColumnNames) (
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html)

Currently, queries like SELECT `col_.*`/col_b FROM (SELECT 3 AS col_a, 1 AS
col_b) are not allowed, but in cases where the regular expression matches
only one column, the query could be resolved.

For example:

   - SELECT `col_.*`/exp FROM (SELECT 3 AS col_a, 1 AS exp) --> could be
     resolved to SELECT col_a/exp FROM (SELECT 3 AS col_a, 1 AS exp)
   - SELECT `col_a`/exp FROM (SELECT 3 AS col_a, 1 AS exp) --> could be
     resolved to SELECT col_a/exp FROM (SELECT 3 AS col_a, 1 AS exp)
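
The proposed rule can be sketched roughly as follows. This is only an
illustration in Python, not Spark code: the helper `resolve_regex_column` is
hypothetical, and it assumes the quoted pattern must match the whole column
name and that matching is case-sensitive.

```python
import re


def resolve_regex_column(pattern, columns):
    """Return the single column whose full name matches pattern, else raise.

    Hypothetical helper: inside an expression, a quoted regex is accepted
    only when it matches exactly one column.
    """
    matches = [c for c in columns if re.fullmatch(pattern, c)]
    if len(matches) != 1:
        raise ValueError(
            f"regex {pattern!r} matched {len(matches)} columns: {matches}"
        )
    return matches[0]


# One match: `col_.*`/exp could be rewritten as col_a/exp.
print(resolve_regex_column("col_.*", ["col_a", "exp"]))

# Two matches (col_a and col_b): the query would still fail.
try:
    resolve_regex_column("col_.*", ["col_a", "col_b"])
except ValueError as err:
    print("fails:", err)
```

With this rule, whether a query resolves depends on which columns exist at
analysis time, which is the possibly confusing part.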


Does this make sense to you? Or is it confusing, so that it's preferable to
fail?

Thanks

Regards