Re: [PR] [SPARK-54895][SDP][DOCS] Fix markdown issues in Spark Declarative Pipelines Programming Guide [spark]

via GitHub Tue, 13 Jan 2026 08:48:44 -0800


sryza commented on code in PR #53671:
URL: https://github.com/apache/spark/pull/53671#discussion_r2687226804



##########
docs/declarative-pipelines-programming-guide.md:
##########
@@ -24,22 +24,29 @@ license: |
 
 ## What is Spark Declarative Pipelines (SDP)?
 
-Spark Declarative Pipelines (SDP) is a declarative framework for building reliable, maintainable, and testable data pipelines on Apache Spark. SDP simplifies ETL development by allowing you to focus on the transformations you want to apply to your data, rather than the mechanics of pipeline execution.
+<!-- rumdl-disable MD013 -->

Review Comment:
   In that case, my opinion would be to avoid introducing anything that 
requires docs contributors to split up paragraphs across multiple lines, limit 
the length of lines, or use special flags to allow lines to go over a limit. It 
would add additional burden to docs contributors without any user-facing 
benefit.



##########
docs/declarative-pipelines-programming-guide.md:
##########
@@ -453,22 +476,24 @@ SELECT * FROM STREAM(customers_us_east);
 
 ### Python Considerations
 
-- SDP evaluates the code that defines a pipeline multiple times during planning and pipeline runs. Python functions that define datasets should include only the code required to define the table or view.
-- The function used to define a dataset must return a `pyspark.sql.DataFrame`.
-- Never use methods that save or write to files or tables as part of your SDP dataset code.
-- When using the `for` loop pattern to define datasets in Python, ensure that the list of values passed to the `for` loop is always additive.
+* SDP evaluates the code that defines a pipeline multiple times during planning and pipeline runs.
+    Python functions that define datasets should include only the code required to define the table or view.
+* The function used to define a dataset must return a `pyspark.sql.DataFrame`.
+* Never use methods that save or write to files or tables as part of your SDP dataset code.
+* When using the `for` loop pattern to define datasets in Python,
+    ensure that the list of values passed to the `for` loop is always additive.
 
 Examples of Spark SQL operations that should never be used in SDP code:
 
-- `collect()`
-- `count()`
-- `pivot()`
-- `toPandas()`
-- `save()`
-- `saveAsTable()`
-- `start()`
-- `toTable()`
+* `collect()`

Review Comment:
   IMO it's more important to have a consistent convention within Spark docs 
than to satisfy an external linter. If there's some particular reason that 
asterisks are better than dashes (e.g. if they avoid some common sort of bug), 
then I could see the value of a change, but otherwise I'd bias towards avoiding 
unnecessary changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


