Re: Enhanced Console Sink for Structured Streaming

Neil Ramaswamy Tue, 06 Feb 2024 11:08:35 -0800

Jungtaek and Raghu, thanks for the input. I'm happy with the verbose mode
being off by default.

I think it's reasonable to have 1 or 2 levels of verbosity:

   1. The first verbose mode could target new users, and take a highly
   opinionated view on what's important to understand streaming semantics.
   This would include printing the sink rows, watermark, number of dropped
   rows (if any), and state data. For state data, we should print for all
   state stores (for multiple stateful operators), but for joins, I think
   rendering just the KeyWithIndexToValueStore(s) is reasonable. Timestamps
   would render as durations (see original message) to make small examples
   easy to understand.
   2. The second verbose mode could target more advanced users trying to
   create a reproduction. In addition to everything in the first verbose
   mode, it would print the other join state store and the number of rows
   evicted due to the watermark, and it would render timestamps as extended
   ISO 8601 strings (same as today).
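
To make the two levels concrete, here is a rough sketch in plain Python
(not Spark code). Every name in it (Verbosity, sections, render_timestamp)
is hypothetical and is not an existing or proposed Spark API. It also
assumes that "durations" means time elapsed since the Unix epoch, which is
my reading of the original message rather than a settled design.

```python
from datetime import datetime, timezone
from enum import IntEnum


class Verbosity(IntEnum):
    """Hypothetical verbosity levels for the console sink."""
    OFF = 0    # default: sink rows only, same as today
    BASIC = 1  # first verbose mode, aimed at new users
    FULL = 2   # second verbose mode, aimed at building reproductions


def sections(level):
    """Return the names of the sections printed for one micro-batch."""
    out = ["sink rows"]
    if level >= Verbosity.BASIC:
        out += ["watermark", "dropped rows", "state stores"]
    if level >= Verbosity.FULL:
        out += ["other join state store", "evicted rows"]
    return out


def render_timestamp(ts, level):
    """Render a timezone-aware timestamp for the given verbosity level."""
    if level == Verbosity.BASIC:
        # Assumption: a duration is the number of whole seconds since the epoch.
        return f"{int(ts.timestamp())} seconds"
    # OFF and FULL keep the extended ISO 8601 rendering.
    return ts.isoformat()


ts = datetime(1970, 1, 1, 0, 0, 10, tzinfo=timezone.utc)
print(sections(Verbosity.BASIC))
print(render_timestamp(ts, Verbosity.BASIC))  # 10 seconds
print(render_timestamp(ts, Verbosity.FULL))   # 1970-01-01T00:00:10+00:00
```

Each level is a strict superset of the one below it, so the second mode only
adds sections and switches the timestamp rendering back to ISO 8601.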

Rather than implementing both, I would prefer to implement the first level,
and evaluate later if the second would be useful.

Mich, can you elaborate on why you don't think it's useful? To reiterate,
this proposal is to bring to light certain metrics/values that are
essential for understanding SS micro-batching semantics. It's to help users
go from 0 to 1, not 1 to 100. (And the Spark UI can't be the place for
rendering sink data or state store values—there should be no sensitive user
data there.)

On Mon, Feb 5, 2024 at 11:32 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> I don't think adding this to the streaming flow (at micro level) will be
> that useful.
>
> However, this can be added to Spark UI as an enhancement to the Streaming
> Query Statistics page.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> On Tue, 6 Feb 2024 at 03:49, Raghu Angadi <raghu.ang...@databricks.com>
> wrote:
>
>> Agree, the default behavior does not need to change.
>>
>> Neil, how about separating it into two sections:
>>
>>    - Actual rows in the sink (same as current output)
>>    - Followed by metadata
>>
>>
