Jungtaek and Raghu, thanks for the input. I'm happy with the verbose mode being off by default.
I think it's reasonable to have one or two levels of verbosity:

1. The first verbose mode could target new users and take a highly opinionated view of what's important for understanding streaming semantics. This would include printing the sink rows, the watermark, the number of dropped rows (if any), and state data. For state data, we should print all state stores (for queries with multiple stateful operators), but for joins, I think rendering just the KeyWithIndexToValueStore(s) is reasonable. Timestamps would render as durations (see original message) to make small examples easy to understand.

2. The second verbose mode could target more advanced users trying to create a reproduction. In addition to everything in the first verbose mode, it would also print the other join state store and the number of rows evicted due to the watermark, and it would print timestamps as extended ISO 8601 strings (same as today).

Rather than implementing both, I would prefer to implement the first level and evaluate later whether the second would be useful.

Mich, can you elaborate on why you don't think it's useful? To reiterate, this proposal is to bring to light certain metrics/values that are essential for understanding SS micro-batching semantics. It's to help users go from 0 to 1, not 1 to 100. (And the Spark UI can't be the place for rendering sink data or state store values: there should be no sensitive user data there.)

On Mon, Feb 5, 2024 at 11:32 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I don't think adding this to the streaming flow (at micro level) will be
> that useful
>
> However, this can be added to Spark UI as an enhancement to the Streaming
> Query Statistics page.
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 6 Feb 2024 at 03:49, Raghu Angadi <raghu.ang...@databricks.com>
> wrote:
>
>> Agree, the default behavior does not need to change.
>>
>> Neil, how about separating it into two sections:
>>
>> - Actual rows in the sink (same as current output)
>> - Followed by metadata data
>>