Re: LLM script for error message improvement

2023-08-04 Thread Maciej
Besides, in case a separate discussion doesn't happen, our core 
responsibility is to follow the ASF guidelines, including the ASF 
Generative Tooling Guidance 
(https://www.apache.org/legal/generative-tooling.html).


As far as I understand it, neither the first acceptance condition (which 
explicitly mentions ChatGPT) nor the third is satisfied by this PR or 
the other one mentioned.


On a side note, we should probably take a closer look at the following:

'When providing contributions authored using generative AI tooling, a 
recommended practice is for contributors to indicate the tooling used to 
create the contribution. This should be included as a token in the 
source control commit message, for example including the phrase 
“Generated-by: ”.'


and consider adjusting the PR template and the merge tool accordingly.

Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

On 8/3/23 22:14, Maciej wrote:
I am sitting on the fence about that. In the linked PR Xiao wrote the
following

> We published the error guideline a few years ago, but not all
> contributors adhered to it, resulting in variable quality in error
> messages.

If a policy exists but is not enforced (if that's indeed the case, I
didn't go through the source to confirm that) it might be useful to
learn the reasons why it happens. Normally, I'd expect

- Policy is too complex to enforce. In such case, additional tooling
  can be useful.
- Policy is not well known, and the people responsible for introducing
  it are not committed to enforcing it.
- Policy or some of its components don't really reflect community
  values and expectations.

If the problem of suspected violations was never raised on our
standard communication channel, and as far as I can tell, it has not,
then introducing a new tool to enforce the policy seems a bit premature.

If these were the only considerations, I'd say that improving the
overall consistency of the project outweighs possible risks, even if
the case for such might be poorly supported.

However, there is an elephant in the room. It is another attempt,
after SPARK-44546, to embed generative tools directly within the Spark
dev workflow. By principle, I am not against such tools. In fact, it
is pretty clear that they are already used by Spark committers, and
even if we wanted to, there is little we can do to prevent that. In
such cases, decisions which tools, if any, to use, to what extent and
how to treat their output are the sole responsibility of contributors.

In contrast, these proposals try to push a proprietary tool burdened
with serious privacy and ethical issues and likely to introduce
unclear liabilities as a standard or even required developer tool.

I can't speak for others, but personally, I'm quite uneasy about it.
If we go this way, I strongly believe that it should be preceded by a
serious discussion, if not the development of a formal policy, about
what categories of tools, to what capacity, to what extent are
acceptable within the project. Ideally, with an official opinion from
the ASF as the copyright owner.

WDYT All? Shall we start a separate discussion?
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC
On 8/3/23 18:33, Haejoon Lee wrote:


Additional information:

Please check https://issues.apache.org/jira/browse/SPARK-37935 if you 
want to start contributing to improving error messages.


You can create sub-tasks if you believe there are error messages that 
need improvement, in addition to the tasks listed in the umbrella JIRA.


You can also refer to https://github.com/apache/spark/pull/41504, 
https://github.com/apache/spark/pull/41455 as an example PR.



On Thu, Aug 3, 2023 at 1:10 PM Ruifeng Zheng wrote:

+1 from my side, I'm fine to have it as a helper script

On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote:

I think adding that dev tool script to improve the error
message is fine.

On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:

Dear contributors, I hope you are doing well!

I see there are contributors who are interested in
working on error message improvements and persistent
contribution, so I want to share an llm-based error
message improvement script for helping your contribution.

You can find a detail for the script at
https://github.com/apache/spark/pull/41711. I believe
this can help your error message improvement work, so I
encourage you to take a look at the pull request and
leverage the script.

Please let me know if you have any questions or concerns.

Thanks all for your time and contributions!

Best regards,

Haejoon





Re: LLM script for error message improvement

2023-08-03 Thread Maciej
I am sitting on the fence about that. In the linked PR, Xiao wrote the
following:

> We published the error guideline a few years ago, but not all
> contributors adhered to it, resulting in variable quality in error messages.

If a policy exists but is not enforced (if that's indeed the case; I
didn't go through the source to confirm that), it might be useful to
learn the reasons why. Normally, I'd expect one of the following:

- The policy is too complex to enforce. In such a case, additional tooling
  can be useful.
- The policy is not well known, and the people responsible for introducing
  it are not committed to enforcing it.
- The policy or some of its components don't really reflect community
  values and expectations.

If the problem of suspected violations was never raised on our standard
communication channels, and as far as I can tell it has not been, then
introducing a new tool to enforce the policy seems a bit premature.

If these were the only considerations, I'd say that improving the
overall consistency of the project outweighs the possible risks, even if
the case for it might be poorly supported.

However, there is an elephant in the room. This is another attempt, after
SPARK-44546, to embed generative tools directly within the Spark dev
workflow. In principle, I am not against such tools. In fact, it is
pretty clear that they are already used by Spark committers, and even if
we wanted to, there is little we can do to prevent that. In such cases,
decisions about which tools, if any, to use, to what extent, and how to
treat their output are the sole responsibility of contributors.

In contrast, these proposals try to push a proprietary tool, burdened
with serious privacy and ethical issues and likely to introduce unclear
liabilities, as a standard or even required developer tool.

I can't speak for others, but personally, I'm quite uneasy about it. If
we go this way, I strongly believe that it should be preceded by a
serious discussion, if not the development of a formal policy, about
which categories of tools are acceptable within the project, in what
capacity, and to what extent. Ideally, with an official opinion from
the ASF as the copyright owner.

WDYT All? Shall we start a separate discussion?

Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

On 8/3/23 18:33, Haejoon Lee wrote:


Additional information:

Please check https://issues.apache.org/jira/browse/SPARK-37935 if you 
want to start contributing to improving error messages.


You can create sub-tasks if you believe there are error messages that 
need improvement, in addition to the tasks listed in the umbrella JIRA.


You can also refer to https://github.com/apache/spark/pull/41504, 
https://github.com/apache/spark/pull/41455 as an example PR.



On Thu, Aug 3, 2023 at 1:10 PM Ruifeng Zheng wrote:

+1 from my side, I'm fine to have it as a helper script

On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote:

I think adding that dev tool script to improve the error
message is fine.

On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:

Dear contributors, I hope you are doing well!

I see there are contributors who are interested in working
on error message improvements and persistent contribution,
so I want to share an llm-based error message improvement
script for helping your contribution.

You can find a detail for the script at
https://github.com/apache/spark/pull/41711. I believe this
can help your error message improvement work, so I
encourage you to take a look at the pull request and
leverage the script.

Please let me know if you have any questions or concerns.

Thanks all for your time and contributions!

Best regards,

Haejoon





Re: LLM script for error message improvement

2023-08-03 Thread Haejoon Lee
Additional information:

Please check https://issues.apache.org/jira/browse/SPARK-37935 if you want
to start contributing to improving error messages.

You can create sub-tasks if you believe there are error messages that need
improvement, in addition to the tasks listed in the umbrella JIRA.

You can also refer to https://github.com/apache/spark/pull/41504 and
https://github.com/apache/spark/pull/41455 as example PRs.

On Thu, Aug 3, 2023 at 1:10 PM Ruifeng Zheng wrote:

> +1 from my side, I'm fine to have it as a helper script
>
> On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote:
>
>> I think adding that dev tool script to improve the error message is fine.
>>
>> On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:
>>
>>> Dear contributors, I hope you are doing well!
>>>
>>> I see there are contributors who are interested in working on error
>>> message improvements and persistent contribution, so I want to share an
>>> llm-based error message improvement script for helping your contribution.
>>>
>>> You can find a detail for the script at
>>> https://github.com/apache/spark/pull/41711. I believe this can help
>>> your error message improvement work, so I encourage you to take a look at
>>> the pull request and leverage the script.
>>>
>>> Please let me know if you have any questions or concerns.
>>>
>>> Thanks all for your time and contributions!
>>>
>>> Best regards,
>>>
>>> Haejoon
>>>
>>


Re: LLM script for error message improvement

2023-08-02 Thread Ruifeng Zheng
+1 from my side, I'm fine to have it as a helper script

On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote:

> I think adding that dev tool script to improve the error message is fine.
>
> On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:
>
>> Dear contributors, I hope you are doing well!
>>
>> I see there are contributors who are interested in working on error
>> message improvements and persistent contribution, so I want to share an
>> llm-based error message improvement script for helping your contribution.
>>
>> You can find a detail for the script at
>> https://github.com/apache/spark/pull/41711. I believe this can help your
>> error message improvement work, so I encourage you to take a look at the
>> pull request and leverage the script.
>>
>> Please let me know if you have any questions or concerns.
>>
>> Thanks all for your time and contributions!
>>
>> Best regards,
>>
>> Haejoon
>>
>


Re: LLM script for error message improvement

2023-08-02 Thread Hyukjin Kwon
I think adding that dev tool script to improve the error message is fine.

On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote:

> Dear contributors, I hope you are doing well!
>
> I see there are contributors who are interested in working on error
> message improvements and persistent contribution, so I want to share an
> llm-based error message improvement script for helping your contribution.
>
> You can find a detail for the script at
> https://github.com/apache/spark/pull/41711. I believe this can help your
> error message improvement work, so I encourage you to take a look at the
> pull request and leverage the script.
>
> Please let me know if you have any questions or concerns.
>
> Thanks all for your time and contributions!
>
> Best regards,
>
> Haejoon
>