Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-10 Thread Gengliang Wang
Thanks everyone for the valuable feedback!

Given the generally positive feedback received, I plan to move forward by
initiating the voting thread. I encourage you to participate in the
upcoming thread.

Warm regards,
Gengliang

On Sat, Mar 9, 2024 at 12:55 PM Mich Talebzadeh 
wrote:

> Splendid. Thanks Gengliang
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Mich Talebzadeh
Splendid. Thanks Gengliang

Mich Talebzadeh


On Sat, 9 Mar 2024 at 18:10, Gengliang Wang  wrote:

> Hi Mich,
>
> Thanks for your suggestions. I agree that we should avoid confusion with
> Spark Structured Streaming.
>
> So, I'll go with "Structured Logging Framework for Apache Spark". This
> keeps the standard term "Structured Logging" and distinguishes it from
> "Structured Streaming" clearly.
>
> Thanks for helping shape this!
>
> Best,
> Gengliang


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Gengliang Wang
Hi Mich,

Thanks for your suggestions. I agree that we should avoid confusion with
Spark Structured Streaming.

So, I'll go with "Structured Logging Framework for Apache Spark". This
keeps the standard term "Structured Logging" and distinguishes it from
"Structured Streaming" clearly.

Thanks for helping shape this!

Best,
Gengliang

On Sat, Mar 2, 2024 at 12:19 PM Mich Talebzadeh 
wrote:

> Hi Gengliang,
>
> Thanks for taking the initiative to improve the Spark logging system.
> Transitioning to structured logs seems like a worthwhile way to improve
> both the analysis and troubleshooting of Spark jobs and, hopefully, future
> integration with cloud logging systems. While "Structured Spark Logging"
> sounds good, I was wondering if we could consider an alternative name.
> Since we already use "Spark Structured Streaming", there might be some
> initial confusion with the terminology. I must confess that was my own
> initial reaction.
>
> Here are a few alternative names I came up with, if I may:
>
>    - Spark Log Schema Initiative
>    - Centralized Logging with Structured Data for Spark
>    - Enhanced Spark Logging with Queryable Format
>
> These options all highlight the key aspects of your proposal, namely
> schema, centralized logging, and queryability, and might be even clearer for
> everyone at first glance.
>
> Cheers
>
> Mich Talebzadeh


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mich Talebzadeh
Hi Gengliang,

Thanks for taking the initiative to improve the Spark logging system.
Transitioning to structured logs seems like a worthwhile way to improve
both the analysis and troubleshooting of Spark jobs and, hopefully, future
integration with cloud logging systems. While "Structured Spark Logging"
sounds good, I was wondering if we could consider an alternative name.
Since we already use "Spark Structured Streaming", there might be some
initial confusion with the terminology. I must confess that was my own
initial reaction.

Here are a few alternative names I came up with, if I may:

   - Spark Log Schema Initiative
   - Centralized Logging with Structured Data for Spark
   - Enhanced Spark Logging with Queryable Format

These options all highlight the key aspects of your proposal, namely
schema, centralized logging, and queryability, and might be even clearer for
everyone at first glance.

Cheers

Mich Talebzadeh


On Fri, 1 Mar 2024 at 10:07, Gengliang Wang  wrote:

> Hi All,
>
> I propose to enhance our logging system by transitioning to structured
> logs. This initiative is designed to tackle the challenges of analyzing
> distributed logs from drivers, workers, and executors by allowing them to
> be queried using a fixed schema. The goal is to improve the informativeness
> and accessibility of logs, making it significantly easier to diagnose
> issues.
>
> Key benefits include:
>
>    - Clarity and queryability of distributed log files.
>    - Continued support for log4j, allowing users to switch back to
>    traditional text logging if preferred.
>
> The improvement will simplify debugging and enhance productivity without
> disrupting existing logging practices. The implementation is estimated to
> take around 3 months.
>
> *SPIP*:
> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
> *JIRA*: SPARK-47240 
>
> Your comments and feedback would be greatly appreciated.
>


Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mridul Muralidharan
Hi Gengliang,

Thanks for sharing this!
I added a few queries to the proposal doc, and we can continue discussing
there, but overall I am in favor of this.

Regards,
Mridul


On Fri, Mar 1, 2024 at 1:35 AM Gengliang Wang  wrote:

> Hi All,
>
> I propose to enhance our logging system by transitioning to structured
> logs. This initiative is designed to tackle the challenges of analyzing
> distributed logs from drivers, workers, and executors by allowing them to
> be queried using a fixed schema. The goal is to improve the informativeness
> and accessibility of logs, making it significantly easier to diagnose
> issues.
>
> Key benefits include:
>
>    - Clarity and queryability of distributed log files.
>    - Continued support for log4j, allowing users to switch back to
>    traditional text logging if preferred.
>
> The improvement will simplify debugging and enhance productivity without
> disrupting existing logging practices. The implementation is estimated to
> take around 3 months.
>
> *SPIP*:
> https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
> *JIRA*: SPARK-47240 
>
> Your comments and feedback would be greatly appreciated.
>


[DISCUSS] SPIP: Structured Spark Logging

2024-02-29 Thread Gengliang Wang
Hi All,

I propose to enhance our logging system by transitioning to structured
logs. This initiative is designed to tackle the challenges of analyzing
distributed logs from drivers, workers, and executors by allowing them to
be queried using a fixed schema. The goal is to improve the informativeness
and accessibility of logs, making it significantly easier to diagnose
issues.

Key benefits include:

   - Clarity and queryability of distributed log files.
   - Continued support for log4j, allowing users to switch back to
   traditional text logging if preferred.

The improvement will simplify debugging and enhance productivity without
disrupting existing logging practices. The implementation is estimated to
take around 3 months.

*SPIP*:
https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
*JIRA*: SPARK-47240 

Your comments and feedback would be greatly appreciated.
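
As a rough illustration of what "queried using a fixed schema" could mean in
practice, here is a minimal sketch in plain Python. It assumes each log line
is emitted as one JSON object; the field names (ts, level, msg, executor_id)
and the sample records are hypothetical and are not taken from the SPIP.

```python
import json

# Hypothetical structured log lines, one JSON object per line.
# The field names (ts, level, msg, executor_id) are illustrative
# assumptions, not the schema defined in the SPIP.
LOG_LINES = [
    '{"ts": "2024-03-01T10:07:00Z", "level": "INFO", "msg": "Task started", "executor_id": "1"}',
    '{"ts": "2024-03-01T10:07:05Z", "level": "ERROR", "msg": "Lost executor", "executor_id": "2"}',
    '{"ts": "2024-03-01T10:07:09Z", "level": "ERROR", "msg": "Fetch failed", "executor_id": "1"}',
]


def parse_records(lines):
    """Parse JSON-lines log output into a list of dicts."""
    return [json.loads(line) for line in lines]


def errors_by_executor(records):
    """Count ERROR records per executor by reading the fixed fields."""
    counts = {}
    for rec in records:
        if rec["level"] == "ERROR":
            key = rec["executor_id"]
            counts[key] = counts.get(key, 0) + 1
    return counts


records = parse_records(LOG_LINES)
print(errors_by_executor(records))
```

Because every record carries the same fields, the filter reads named fields
directly instead of relying on a regular expression over free-form text,
which is the kind of analysis the proposal aims to make easier.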