kokes edited a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792625344


   Given that we should redirect anchor links as well (thanks, @HyukjinKwon), 
we'll need to do this a bit differently. We need to check if there's an anchor 
value in the current URL and if so, change both the <meta> redirect and the 
fallback link in the HTML body itself.
   
   The implementation will then behave like so:
   - pyspark.*.html will redirect to the new section homepages
   - pyspark.*.html#some_function will redirect to the new page at 
api/reference/some_function.html
   - if the user doesn't have JavaScript (incl. some bots), 
pyspark.*.html#some_function will redirect to the new section homepage
   - if the user doesn't have redirects enabled (rare), they can click the 
link, which contains the same URL
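
   The rules above can be sketched as a small function. This is only an 
illustration in Python of the intended mapping, not the code in the PR (the 
actual redirect has to run in the browser). The names `redirect_target`, 
`section_homepage` and `has_javascript` are hypothetical, and the homepage 
value is passed in because its real path is not spelled out here.

```python
from urllib.parse import urlsplit


def redirect_target(old_url: str, section_homepage: str,
                    has_javascript: bool = True) -> str:
    """Return the relative URL an old pyspark.*.html page should lead to."""
    # The fragment is the part after '#'; any query string is ignored.
    anchor = urlsplit(old_url).fragment
    # Without JavaScript the fragment cannot be read, so only the static
    # redirect to the section homepage is available.
    if not has_javascript or not anchor:
        return section_homepage
    # Assumption taken from the list above: one page per anchor.
    return f"api/reference/{anchor}.html"
```

   The fallback link in the page body would carry the same value, so a click 
ends up where the automatic redirect would have gone.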
   
   I tested this locally (`python3 -m http.server` in build/html) and it 
works, both for the automatic redirects and for clicking the links:
   
   - http://localhost:8000/pyspark.sql.html?highlight=from_json#pyspark.sql.functions.from_json
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.DataFrameStatFunctions.crosstab
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.exception
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.id
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer
   
   **BUT**, I found some anchors that don't have a corresponding new HTML 
page. Why do some methods have their own pages and some don't?
   
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer.inputCols
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils
 (here the class doesn't have its own doc page, but its methods do...)
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.InitialPositionInStream
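
   To find such gaps systematically, one option is to check each old anchor 
against the generated files. A rough sketch, assuming the new pages are named 
`<anchor>.html` inside a single directory of the build output. The helper name 
`anchors_without_page` and the `reference_dir` argument are made up for this 
example.

```python
from pathlib import Path
from typing import Iterable, List


def anchors_without_page(anchors: Iterable[str],
                         reference_dir: Path) -> List[str]:
    """Return the anchors that have no <anchor>.html file in reference_dir."""
    return [
        anchor
        for anchor in anchors
        if not (reference_dir / f"{anchor}.html").is_file()
    ]
```

   Any anchor it returns would need a different target, for example the 
section homepage.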
   
   Last but not least: I don't sanitise the anchor value in any way and use it 
as it is. I can't think of any injection that could happen there since it's a 
relative link to a reference page, but feel free to suggest some regexp check 
that the hash contains only [a-zA-Z0-9_.-] or something.
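
   For reference, such a check could look roughly like this. It is a sketch in 
Python rather than the browser-side code, and `SAFE_ANCHOR` and 
`is_safe_anchor` are names invented for the example. The hyphen sits last in 
the character class so it is read as a literal; a hyphen placed between `_` 
and `0` would be parsed as a reversed range and rejected by the regex engine.

```python
import re

# Letters, digits, underscore, dot and hyphen only.
SAFE_ANCHOR = re.compile(r"[a-zA-Z0-9_.-]+")


def is_safe_anchor(anchor: str) -> bool:
    """True if the whole anchor consists only of allowed characters."""
    # fullmatch requires the entire string to match; an empty string fails.
    return SAFE_ANCHOR.fullmatch(anchor) is not None
```

   An anchor that fails the check could simply fall back to the section 
homepage.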

