neilramaswamy opened a new pull request, #47864:
URL: https://github.com/apache/spark/pull/47864

   ### What changes were proposed in this pull request?
   
   These changes break the Structured Streaming Programming Guide into smaller 
sub-pages **without changing any content**.
   
   I split the guide into pages at each `h1` heading; within each page, the sub-sections in the left menu correspond to `h2` headings. The Structured Streaming programming guide now resembles the SQL programming guide and the MLlib programming guide.
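The page/menu mapping described above can be sketched in Python. This is a minimal illustration of the rule (one page per `h1`, one left-menu entry per `h2`), not the tooling this PR used. It assumes headings are written as `# ` / `## ` at the start of a line, that code blocks use backtick fences (so a `# comment` inside one is not a heading), and that text before the first `h1` can be ignored. The sample headings are illustrative.

```python
FENCE = "`" * 3  # built this way so the example can itself sit inside a Markdown fence


def split_by_h1(markdown: str) -> list[dict]:
    """Split Markdown into pages at each `# ` (h1) heading.

    Each page keeps its title, its raw lines, and the `## ` (h2) headings
    it contains, which would become that page's left-menu entries.
    Text before the first h1 is dropped (an assumption of this sketch).
    """
    pages = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced code block
        elif not in_fence and line.startswith("# "):
            # A new h1 starts a new page.
            pages.append({"title": line[2:].strip(), "menu": [], "lines": [line]})
            continue
        elif not in_fence and line.startswith("## ") and pages:
            # An h2 becomes a menu entry on the current page.
            pages[-1]["menu"].append(line[3:].strip())
        if pages:
            pages[-1]["lines"].append(line)
    return pages


sample = "\n".join([
    "# Overview",
    "Intro text.",
    "# Programming Model",
    "## Basic Concepts",
    "## Handling Event-time and Late Data",
    "# API using Datasets and DataFrames",
    "## Creating streaming DataFrames",
    FENCE + "python",
    "# a Python comment inside a code block, not a heading",
    FENCE,
])

for page in split_by_h1(sample):
    print(page["title"], page["menu"])
```

With this sample, the loop prints three pages: `Overview` with no menu entries, `Programming Model` with two, and `API using Datasets and DataFrames` with one. The commented line inside the code block stays part of the third page instead of starting a fourth.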
   
   Additionally, to avoid cluttering the top-level namespace (there are dozens of `sql-*` files for the SQL reference), we nest all streaming docs one directory deeper, under `/streaming/`. This has the side effect of breaking links in our `_layouts`, which assume a flat top-level namespace. To fix this, URLs in the global layout files now all use absolute paths.
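To illustrate why nesting pages breaks relative links, here is a minimal Python sketch using the standard library's `urllib.parse.urljoin`, which resolves an `href` against the URL of the page containing it, as a browser does. The new page URL (`streaming/index.html`) and both `href` values are hypothetical; only the `/docs/latest/` prefix comes from the existing guide URL. The actual layout files may build their absolute paths from a site-level setting rather than a hard-coded prefix.

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve an href relative to the page that contains it, like a browser."""
    return urljoin(page_url, href)


# Existing top-level page, and a hypothetical page after the move into /streaming/.
page_at_root = "https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html"
page_in_subdir = "https://spark.apache.org/docs/latest/streaming/index.html"

# Hypothetical links a shared layout might emit.
relative_href = "sql-programming-guide.html"               # assumes a flat namespace
absolute_href = "/docs/latest/sql-programming-guide.html"  # absolute path

# From a top-level page, the relative href reaches the intended target.
print(resolve(page_at_root, relative_href))
# → https://spark.apache.org/docs/latest/sql-programming-guide.html

# From a page one directory deeper, the same href now points inside /streaming/.
print(resolve(page_in_subdir, relative_href))
# → https://spark.apache.org/docs/latest/streaming/sql-programming-guide.html

# The absolute-path href ignores the page's directory and stays correct.
print(resolve(page_in_subdir, absolute_href))
# → https://spark.apache.org/docs/latest/sql-programming-guide.html
```

The trade-off is that absolute paths tie the layout to a known URL prefix, whereas relative links only work while every page sits at the same depth.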
   
   Because of the move to `/streaming/`, existing bookmarks of `https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html` will no longer point to the actual programming guide content. In anticipation of this, I have kept a page at every existing URL, with links to the content's new location. This includes the new state data source guide and the Kafka integration guide.
   
   In the future, we'll be able to easily (and in parallel) break the programming guide apart further; this PR does all of the plumbing needed to make that work.
   
   
![image](https://github.com/user-attachments/assets/3eca87d4-9fb7-453c-a74a-20bd5c504d87)
   
   Fixing the oddly sized left navigation bar for our menus is left as future work.
   
   ### Why are the changes needed?
   
   One of the major hurdles that users have with Structured Streaming is that our guide is exceptionally long. It feels insurmountable, especially compared to the documentation of other engines like Flink, which has many sub-pages.
   
   Google also has trouble indexing the single large page. If you Google "[structured streaming output mode](https://www.google.com/search?q=structured+streaming+output+mode)" and click the link to our programming guide, nothing happens: you aren't taken to the relevant content, since Google has trouble linking to specific headings within a page.
   
   ### Does this PR introduce _any_ user-facing change?
   
   The structure of the website's Structured Streaming pages has changed. See the earlier parts of this PR description for the specific changes.
   
   However, **no** content is changed. This should make reviewing the changes 
much easier.
   
   ### How was this patch tested?
   
   I used automated tools (e.g. [Lychee](https://github.com/lycheeverse/lychee)) and manual verification (i.e. clicking on every link) to make sure that I didn't break any links. This isn't foolproof, though.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
