[jira] [Commented] (SPARK-34606) New PySpark documentation has different URLs
[ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296795#comment-17296795 ]

Apache Spark commented on SPARK-34606:
--

User 'kokes' has created a pull request for this issue:
https://github.com/apache/spark/pull/31770

> New PySpark documentation has different URLs
>
> Key: SPARK-34606
> URL: https://issues.apache.org/jira/browse/SPARK-34606
> Project: Spark
> Issue Type: Bug
> Components: Documentation, PySpark
> Affects Versions: 3.1.1
> Reporter: Ondrej Kokes
> Priority: Minor
>
> The new documentation site moved some subsites to different URLs, notably the PySpark API reference ([see here|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html]). (Note the new `/reference/` bit in the new URL.)
> It's the first hit when you google "pyspark sql functions", and you'll also land there if you search for individual functions or modules (e.g. "pyspark streaming").
> I looked through various JIRA tickets and pull requests, but couldn't find a mention of this. Even the pull request introducing the new documentation site says the only change visible to users is the design, not the location.
> Possible resolutions:
> * let search engines refresh the links and live with dead links in various places (Stack Overflow, emails, bookmarks, ...)
> * identify the missing pages and provide 301 redirects for them (they could be found in logs or Google Analytics, or we could list all assets generated before and now and diff them)
> * change the Sphinx configuration to produce the same links as before
> Links to potentially relevant tickets and PRs:
> * https://issues.apache.org/jira/browse/SPARK-31851
> * https://github.com/apache/spark/pull/29188
> * https://issues.apache.org/jira/browse/SPARK-32188

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
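The second resolution in the description above, 301 redirects for the missing pages, could be sketched as follows. This sketch assumes the site is served by Apache httpd with `.htaccess` files and mod_alias enabled. The old-to-new mapping is hypothetical and goes no further than the `/reference/` prefix the description mentions. In practice the page list would come from logs or from a diff of the old and new build outputs.

```python
# Hypothetical sketch: emit Apache mod_alias "Redirect 301" directives for
# pages that moved under /reference/ in the new PySpark docs. Assumes the
# docs are served by Apache httpd honoring .htaccess; the mapping below is
# illustrative, not an authoritative list of moved pages.

PREFIX = "/docs/latest/api/python"  # assumed URL-path of the PySpark docs

# Old page name -> new page path, both relative to PREFIX (assumed mapping).
MOVED_PAGES = {
    "pyspark.sql.html": "reference/pyspark.sql.html",
    "pyspark.ml.html": "reference/pyspark.ml.html",
    "pyspark.mllib.html": "reference/pyspark.mllib.html",
    "pyspark.streaming.html": "reference/pyspark.streaming.html",
}


def redirect_directives(prefix, moved):
    """Return one 'Redirect 301 <old> <new>' line per moved page, sorted."""
    return [
        f"Redirect 301 {prefix}/{old} {prefix}/{new}"
        for old, new in sorted(moved.items())
    ]


if __name__ == "__main__":
    # Print the directives so they can be pasted into an .htaccess file.
    print("\n".join(redirect_directives(PREFIX, MOVED_PAGES)))
```

A permanent (301) status tells search engines to replace the old URL in their index. That addresses the dead-link problem more directly than waiting for a re-crawl.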
[jira] [Commented] (SPARK-34606) New PySpark documentation has different URLs
[ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296794#comment-17296794 ]

Ondrej Kokes commented on SPARK-34606:
--

[~hyukjin.kwon] gave it a go and [submitted a PR|https://github.com/apache/spark/pull/31770]
[jira] [Commented] (SPARK-34606) New PySpark documentation has different URLs
[ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296749#comment-17296749 ]

Hyukjin Kwon commented on SPARK-34606:
--

Yeah, I noticed this problem too. One simple solution is to at least redirect to the root page of the new documentation. I don't think it's feasible to map each legacy link to its new location.
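Hyukjin's fallback above sends every legacy URL to the root of the new documentation. If server-side redirects are not available, it can also be done with static files: write a small HTML stub at each old path that forwards the browser with a `meta refresh`. This is a minimal sketch. The stub layout, the `DOCS_ROOT` path and the helper names are assumptions, not what the eventual PR does. Unlike a real 301, this is a client-side redirect, although the `rel="canonical"` link gives search engines a hint about the new location.

```python
# Hypothetical sketch: HTML stubs that forward legacy PySpark doc URLs to a
# new location via a client-side meta refresh (not a server-side 301).
# Pages without a known new location fall back to the docs root.
import html
import os

# Assumed absolute URL-path of the new docs root. It is absolute so that
# stubs written in nested directories (e.g. _modules/pyspark/) still work.
DOCS_ROOT = "/docs/latest/api/python/index.html"

STUB_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={url}">
    <link rel="canonical" href="{url}">
    <title>Page moved</title>
  </head>
  <body>
    <p>This page has moved to <a href="{url}">{url}</a>.</p>
  </body>
</html>
"""


def resolve_target(page, moves, root=DOCS_ROOT):
    """Return the new URL for a legacy page, or the docs root if unmapped."""
    return moves.get(page, root)


def make_stub(target_url):
    """Return the HTML of a redirect stub pointing at target_url."""
    return STUB_TEMPLATE.format(url=html.escape(target_url, quote=True))


def write_stubs(out_dir, legacy_pages, moves=None, root=DOCS_ROOT):
    """Write one stub per legacy page (a path relative to out_dir)."""
    moves = moves or {}
    written = []
    for page in legacy_pages:
        path = os.path.join(out_dir, page)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(make_stub(resolve_target(page, moves, root)))
        written.append(path)
    return written
```

For example, `write_stubs("build/html", ["pyspark.sql.html", "_modules/pyspark/status.html"])` would send both pages to the root. Adding entries to `moves` refines individual pages without changing the fallback.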
[jira] [Commented] (SPARK-34606) New PySpark documentation has different URLs
[ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296750#comment-17296750 ]

Hyukjin Kwon commented on SPARK-34606:
--

[~ondrej], are you working on this? Any PR will be very welcome on this.
[jira] [Commented] (SPARK-34606) New PySpark documentation has different URLs
[ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294423#comment-17294423 ]

Ondrej Kokes commented on SPARK-34606:
--

I tried building the HTML docs for PySpark 2.4.7 and the current master, and here's the one-way diff (set(2.4.7) - set(master)).

Missing docs:
* build/html/pyspark.html
* build/html/pyspark.ml.html
* build/html/pyspark.mllib.html
* build/html/pyspark.sql.html
* build/html/pyspark.streaming.html

Other pages not present (module code):
* build/html/_modules/pyspark/profiler.html
* build/html/_modules/pyspark/serializers.html
* build/html/_modules/pyspark/sql/catalog.html
* build/html/_modules/pyspark/sql/context.html
* build/html/_modules/pyspark/sql/udf.html
* build/html/_modules/pyspark/status.html
* build/html/_modules/pyspark/streaming/flume.html
* build/html/_modules/pyspark/streaming/kafka.html
* build/html/_modules/pyspark/streaming/listener.html
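The one-way diff in the comment above lists pages present in the 2.4.7 build but absent from master. It can be reproduced with a short script. This sketch assumes two local Sphinx HTML output directories. The script name and directory arguments in the usage line are hypothetical.

```python
# Sketch: pages generated by an old docs build that are missing from a new
# build, i.e. set(old) - set(new) over HTML paths relative to each build dir.
import sys
from pathlib import Path


def html_pages(build_dir):
    """Return every *.html file under build_dir as a relative POSIX path."""
    root = Path(build_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*.html")}


def one_way_diff(old_pages, new_pages):
    """Pages in old_pages but not in new_pages, sorted for stable output."""
    return sorted(set(old_pages) - set(new_pages))


def missing_pages(old_build, new_build):
    """One-way diff between two Sphinx HTML output directories."""
    return one_way_diff(html_pages(old_build), html_pages(new_build))


def main(argv):
    """Print the missing pages; returns a process-style status code."""
    if len(argv) != 3:
        print("usage: missing_pages.py OLD_BUILD_DIR NEW_BUILD_DIR", file=sys.stderr)
        return 2
    for page in missing_pages(argv[1], argv[2]):
        print(page)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Paths are made relative to each build directory, so the two builds can sit in different locations. The output (e.g. `pyspark.sql.html`) maps directly onto the URL-paths that would need redirects.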