[jira] [Assigned] (SPARK-43295) Make DataFrameGroupBy.sum support for string type columns
[ https://issues.apache.org/jira/browse/SPARK-43295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43295: - Assignee: Haejoon Lee > Make DataFrameGroupBy.sum support for string type columns > - > > Key: SPARK-43295 > URL: https://issues.apache.org/jira/browse/SPARK-43295 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > From pandas 2.0.0, DataFrameGroupBy.sum also works for string type columns: > {code:java} > >>> psdf > A B C D > 0 1 3.1 a True > 1 2 4.1 b False > 2 1 4.1 b False > 3 2 3.1 a True > >>> psdf.groupby("A").sum().sort_index() > B D > A > 1 7.2 1 > 2 7.2 1 > >>> psdf.to_pandas().groupby("A").sum().sort_index() > B C D > A > 1 7.2 ab 1 > 2 7.2 ba 1 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43295) Make DataFrameGroupBy.sum support for string type columns
[ https://issues.apache.org/jira/browse/SPARK-43295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43295: --- Labels: pull-request-available (was: ) > Make DataFrameGroupBy.sum support for string type columns > - > > Key: SPARK-43295 > URL: https://issues.apache.org/jira/browse/SPARK-43295 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > From pandas 2.0.0, DataFrameGroupBy.sum also works for string type columns: > {code:java} > >>> psdf > A B C D > 0 1 3.1 a True > 1 2 4.1 b False > 2 1 4.1 b False > 3 2 3.1 a True > >>> psdf.groupby("A").sum().sort_index() > B D > A > 1 7.2 1 > 2 7.2 1 > >>> psdf.to_pandas().groupby("A").sum().sort_index() > B C D > A > 1 7.2 ab 1 > 2 7.2 ba 1 {code}
[jira] [Resolved] (SPARK-43295) Make DataFrameGroupBy.sum support for string type columns
[ https://issues.apache.org/jira/browse/SPARK-43295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43295. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42798 [https://github.com/apache/spark/pull/42798] > Make DataFrameGroupBy.sum support for string type columns > - > > Key: SPARK-43295 > URL: https://issues.apache.org/jira/browse/SPARK-43295 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > From pandas 2.0.0, DataFrameGroupBy.sum also works for string type columns: > {code:java} > >>> psdf > A B C D > 0 1 3.1 a True > 1 2 4.1 b False > 2 1 4.1 b False > 3 2 3.1 a True > >>> psdf.groupby("A").sum().sort_index() > B D > A > 1 7.2 1 > 2 7.2 1 > >>> psdf.to_pandas().groupby("A").sum().sort_index() > B C D > A > 1 7.2 ab 1 > 2 7.2 ba 1 {code}
[jira] [Updated] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
[ https://issues.apache.org/jira/browse/SPARK-45113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45113: --- Labels: pull-request-available (was: ) > Refine docstrings of `collect_list/collect_set` > --- > > Key: SPARK-45113 > URL: https://issues.apache.org/jira/browse/SPARK-45113 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
Yang Jie created SPARK-45113: Summary: Refine docstrings of `collect_list/collect_set` Key: SPARK-45113 URL: https://issues.apache.org/jira/browse/SPARK-45113 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Updated] (SPARK-43291) Generate proper warning on different behavior with numeric_only
[ https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43291: --- Labels: pull-request-available (was: ) > Generate proper warning on different behavior with numeric_only > --- > > Key: SPARK-43291 > URL: https://issues.apache.org/jira/browse/SPARK-43291 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Should enable test below: > {code:java} > pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")], > columns=["a", "b"]) > psdf = ps.from_pandas(pdf) > self.assert_eq(pdf.cov(), psdf.cov()) {code}
[jira] [Updated] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled
[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44940: --- Labels: correctness pull-request-available (was: correctness) > Improve performance of JSON parsing when > "spark.sql.json.enablePartialResults" is enabled > - > > Key: SPARK-44940 > URL: https://issues.apache.org/jira/browse/SPARK-44940 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.2, 3.5.1 > > > Follow-up on https://issues.apache.org/jira/browse/SPARK-40646. > I found that JSON parsing is significantly slower due to exception creation > in control flow. Also, some fields are not parsed correctly and the exception > is thrown in certain cases: > {code:java} > Caused by: java.lang.ClassCastException: > org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to > org.apache.spark.sql.catalyst.InternalRow > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51) > at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51) > at > org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590) > ... 39 more {code}
[jira] [Updated] (SPARK-43252) Assign a name to the error class _LEGACY_ERROR_TEMP_2016
[ https://issues.apache.org/jira/browse/SPARK-43252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43252: --- Labels: pull-request-available starter (was: starter) > Assign a name to the error class _LEGACY_ERROR_TEMP_2016 > > > Key: SPARK-43252 > URL: https://issues.apache.org/jira/browse/SPARK-43252 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2016* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (look at the example in error-classes.json). > Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the relevant error fields and does not depend on the error message text, so tech editors can change the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see *SparkException.internalError()*. > Improve the error message format in error-classes.json if the current one is not clear. Propose to users how to avoid and fix this kind of error. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43251: --- Labels: pull-request-available starter (was: starter) > Assign a name to the error class _LEGACY_ERROR_TEMP_2015 > > > Key: SPARK-43251 > URL: https://issues.apache.org/jira/browse/SPARK-43251 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in *core/src/main/resources/error/error-classes.json*. The name should be short but complete (look at the example in error-classes.json). > Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using *checkError()*. That function checks only the relevant error fields and does not depend on the error message text, so tech editors can change the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see *SparkException.internalError()*. > Improve the error message format in error-classes.json if the current one is not clear. Propose to users how to avoid and fix this kind of error. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-44942) Use Jira notification options to sync with Github
[ https://issues.apache.org/jira/browse/SPARK-44942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44942: --- Labels: pull-request-available (was: ) > Use Jira notification options to sync with Github > - > > Key: SPARK-44942 > URL: https://issues.apache.org/jira/browse/SPARK-44942 > Project: Spark > Issue Type: Github Integration > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > h3. Jira notification options > You can use the file to enable Jira notifications that will fire when a GitHub issue or pull request has a ticket in its title, such as {{"[TICKET-1234] Improve foo bar"}}. > You can set one or more of these options: > * {{comment}}: Add the PR/issue event as a comment in the referenced Jira ticket. > * {{worklog}}: Add the event as a worklog entry instead of a comment in the Jira ticket you reference. > * {{label}}: Add a 'pull-request-available' label to referenced tickets. > * {{link}}: When you create a GitHub PR/issue, embed a link to the PR or issue in the Jira ticket you reference.
[jira] [Updated] (SPARK-45068) Make math function output column name displayed in upper-cased style
[ https://issues.apache.org/jira/browse/SPARK-45068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45068: --- Labels: pull-request-available (was: ) > Make math function output column name displayed in upper-cased style > > > Key: SPARK-45068 > URL: https://issues.apache.org/jira/browse/SPARK-45068 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Resolved] (SPARK-45044) Refine docstring of `groupBy/rollup/cube`
[ https://issues.apache.org/jira/browse/SPARK-45044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45044. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42834 [https://github.com/apache/spark/pull/42834] > Refine docstring of `groupBy/rollup/cube` > - > > Key: SPARK-45044 > URL: https://issues.apache.org/jira/browse/SPARK-45044 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-45044) Refine docstring of `groupBy/rollup/cube`
[ https://issues.apache.org/jira/browse/SPARK-45044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45044: - Assignee: BingKun Pan > Refine docstring of `groupBy/rollup/cube` > - > > Key: SPARK-45044 > URL: https://issues.apache.org/jira/browse/SPARK-45044 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-45044) Refine docstring of `groupBy/rollup/cube`
[ https://issues.apache.org/jira/browse/SPARK-45044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45044: --- Labels: pull-request-available (was: ) > Refine docstring of `groupBy/rollup/cube` > - > > Key: SPARK-45044 > URL: https://issues.apache.org/jira/browse/SPARK-45044 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-40688) Support data masking built-in function 'mask_first_n'
[ https://issues.apache.org/jira/browse/SPARK-40688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-40688: --- Labels: pull-request-available (was: ) > Support data masking built-in function 'mask_first_n' > -- > > Key: SPARK-40688 > URL: https://issues.apache.org/jira/browse/SPARK-40688 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > Labels: pull-request-available > > Support data masking built-in function *mask_first_n* > Return a masked version of str with the first n values masked. Upper case letters should be converted to "X", lower case letters should be converted to "x" and numbers should be converted to "n". For example, mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321.
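The masking rule described in this issue can be sketched in plain Python. This is only an illustration of the stated behaviour, not Spark's implementation: the function name mirrors the proposed built-in, and leaving non-alphanumeric characters unchanged is an assumption that the issue text does not spell out.

```python
def mask_first_n(s: str, n: int) -> str:
    """Mask the first n characters of s and leave the rest untouched."""

    def mask_char(ch: str) -> str:
        if ch.isupper():
            return "X"
        if ch.islower():
            return "x"
        if ch.isdigit():
            return "n"
        return ch  # assumption: any other character is kept as-is

    n = max(n, 0)  # treat a negative n as "mask nothing" (assumption)
    return "".join(mask_char(ch) for ch in s[:n]) + s[n:]


print(mask_first_n("1234-5678-8765-4321", 4))  # nnnn-5678-8765-4321
```

Slicing keeps the masked prefix and the untouched suffix separate, so a value of n larger than the string length simply masks the whole string.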
[jira] [Updated] (SPARK-43384) Make `df.show` print a nice string for MapType
[ https://issues.apache.org/jira/browse/SPARK-43384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43384: --- Labels: pull-request-available (was: ) > Make `df.show` print a nice string for MapType > -- > > Key: SPARK-43384 > URL: https://issues.apache.org/jira/browse/SPARK-43384 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Priority: Minor > Labels: pull-request-available > > Make `df.show` print a nice string for MapType.
[jira] [Resolved] (SPARK-45027) Hide internal functions/variables in `pyspark.sql.functions` from auto-completion
[ https://issues.apache.org/jira/browse/SPARK-45027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45027. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42745 [https://github.com/apache/spark/pull/42745] > Hide internal functions/variables in `pyspark.sql.functions` from > auto-completion > - > > Key: SPARK-45027 > URL: https://issues.apache.org/jira/browse/SPARK-45027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-45027) Hide internal functions/variables in `pyspark.sql.functions` from auto-completion
[ https://issues.apache.org/jira/browse/SPARK-45027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45027: --- Labels: pull-request-available (was: ) > Hide internal functions/variables in `pyspark.sql.functions` from > auto-completion > - > > Key: SPARK-45027 > URL: https://issues.apache.org/jira/browse/SPARK-45027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-45027) Hide internal functions/variables in `pyspark.sql.functions` from auto-completion
[ https://issues.apache.org/jira/browse/SPARK-45027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45027: - Assignee: Ruifeng Zheng > Hide internal functions/variables in `pyspark.sql.functions` from > auto-completion > - > > Key: SPARK-45027 > URL: https://issues.apache.org/jira/browse/SPARK-45027 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45109: - Assignee: Peter Toth > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Labels: pull-request-available > > The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the {{ln}} and {{log}} function implementations are different in Spark SQL, we should use the same implementation in Spark Connect too.
[jira] [Resolved] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45109. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42863 [https://github.com/apache/spark/pull/42863] > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 4.0.0 > > > The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the {{ln}} and {{log}} function implementations are different in Spark SQL, we should use the same implementation in Spark Connect too.
[jira] [Updated] (SPARK-42304) Assign name to _LEGACY_ERROR_TEMP_2189
[ https://issues.apache.org/jira/browse/SPARK-42304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42304: --- Labels: pull-request-available (was: ) > Assign name to _LEGACY_ERROR_TEMP_2189 > -- > > Key: SPARK-42304 > URL: https://issues.apache.org/jira/browse/SPARK-42304 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Valentin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Commented] (SPARK-44912) Spark 3.4 multi-column sum slows with many columns
[ https://issues.apache.org/jira/browse/SPARK-44912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763455#comment-17763455 ]

Bruce Robbins commented on SPARK-44912:
---
It looks like this was fixed with SPARK-45071. Your issue was reported earlier, but missed somehow.

> Spark 3.4 multi-column sum slows with many columns
> --
>
> Key: SPARK-44912
> URL: https://issues.apache.org/jira/browse/SPARK-44912
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Brady Bickel
> Priority: Major
>
> The code below is a minimal reproducible example of an issue I discovered with Pyspark 3.4.x. I want to sum the values of multiple columns and put the sum of those columns (per row) into a new column. This code works and returns in a reasonable amount of time in Pyspark 3.3.x, but is extremely slow in Pyspark 3.4.x when the number of columns grows. See below for execution timing summary as N varies.
> {code:java}
> import pyspark.sql.functions as F
> import random
> import string
> from functools import reduce
> from operator import add
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> # generate a dataframe N columns by M rows with random 8 digit column
> # names and random integers in [-5,10]
> N = 30
> M = 100
> columns = [''.join(random.choices(string.ascii_uppercase +
>                                   string.digits, k=8))
>            for _ in range(N)]
> data = [tuple([random.randint(-5,10) for _ in range(N)])
>         for _ in range(M)]
> df = spark.sparkContext.parallelize(data).toDF(columns)
> # 3 ways to add a sum column, all of them slow for high N in spark 3.4
> df = df.withColumn("col_sum1", sum(df[col] for col in columns))
> df = df.withColumn("col_sum2", reduce(add, [F.col(col) for col in columns]))
> df = df.withColumn("col_sum3", F.expr("+".join(columns)))
> {code}
> Timing results for Spark 3.3:
> ||N||Exe Time (s)||
> |5|0.514|
> |10|0.248|
> |15|0.327|
> |20|0.403|
> |25|0.279|
> |30|0.322|
> |50|0.430|
> Timing results for Spark 3.4:
> ||N||Exe Time (s)||
> |5|0.379|
> |10|0.318|
> |15|0.405|
> |20|1.32|
> |25|28.8|
> |30|448|
> |50|>1 (did not finish)|
[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44756: --- Labels: pull-request-available (was: ) > Executor hangs when RetryingBlockTransferor fails to initiate retry > --- > > Key: SPARK-44756 > URL: https://issues.apache.org/jira/browse/SPARK-44756 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.3.1 >Reporter: Harunobu Daikoku >Priority: Minor > Labels: pull-request-available > > We have been observing this issue several times in our production where some > executors are being stuck at BlockTransferService#fetchBlockSync(). > After some investigation, the issue seems to be caused by an unhandled edge > case in RetryingBlockTransferor. > 1. Shuffle transfer fails for whatever reason > {noformat} > java.io.IOException: Cannot allocate memory > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:51) > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) > at > org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > {noformat} > 2. 
The above exception caught by > [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], > and propagated to the exception handler > 3. Exception reaches > [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], > and it tries to initiate retry > {noformat} > 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying > fetch (1/3) for 1 outstanding blocks after 5000 ms > {noformat} > 4. Retry initiation fails (in our case, it fails to create a new thread) > 5. Exception caught by > [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309], > and not further processed > {noformat} > 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An > exception java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:719) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) > at > 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) > at > org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) > at > org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) > {noformat} > 6. After all,
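The failure mode in steps 1 to 5 above can be modelled with a small Python sketch. It is not Spark or Netty code: `FailingExecutor`, `on_fetch_failure` and `run` are hypothetical names, and the `guarded` branch only illustrates one possible way of releasing the waiting caller, not the fix that was actually adopted.

```python
from concurrent.futures import Future


class FailingExecutor:
    """Stand-in for a thread pool that cannot start a new thread."""

    def submit(self, fn, *args):
        raise RuntimeError("unable to create new native thread")


def on_fetch_failure(result: Future, executor, guarded: bool) -> None:
    """Hypothetical failure callback that tries to schedule a retry."""
    try:
        executor.submit(lambda: None)  # scheduling the retry fails here
    except Exception as exc:
        if guarded:
            # Report the failure so that the waiting caller is released.
            result.set_exception(exc)
        else:
            raise


def run(guarded: bool) -> Future:
    result: Future = Future()
    try:
        on_fetch_failure(result, FailingExecutor(), guarded)
    except Exception:
        # Models an outer handler that only logs the error and moves on.
        pass
    return result


unguarded = run(guarded=False)
guarded = run(guarded=True)
print(unguarded.done())  # False: a caller blocking on result() would never return
print(guarded.done())    # True: the caller sees the error instead of hanging
```

In the unguarded case nothing ever completes the future, which matches the symptom of a caller stuck in a synchronous fetch.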
[jira] [Updated] (SPARK-45112) Use UnresolvedFunction in dataset functions
[ https://issues.apache.org/jira/browse/SPARK-45112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45112: --- Labels: pull-request-available (was: ) > Use UnresolvedFunction in dataset functions > --- > > Key: SPARK-45112 > URL: https://issues.apache.org/jira/browse/SPARK-45112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Peter Toth >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-44306) Group FileStatus with few RPC calls within Yarn Client
[ https://issues.apache.org/jira/browse/SPARK-44306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44306: --- Labels: pull-request-available (was: ) > Group FileStatus with few RPC calls within Yarn Client > -- > > Key: SPARK-44306 > URL: https://issues.apache.org/jira/browse/SPARK-44306 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 0.9.2, 2.3.0, 3.5.0 >Reporter: SHU WANG >Priority: Major > Labels: pull-request-available
>
> It is inefficient to obtain *FileStatus* for each resource [one by one|https://github.com/apache/spark/blob/531ec8bddc8dd22ca39486dbdd31e62e989ddc15/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientDistributedCacheManager.scala#L71C1]. In our company's setup, we run Spark with Hadoop YARN and HDFS, and we noticed that the current behavior has two major drawbacks:
> # Since each *getFileStatus* call involves network delays, the overall delay can be *large* and adds *uncertainty* to the overall Spark job runtime. We quantified this overhead within our cluster: the p50 overhead is around 10 s, p80 is 1 min, and p100 is up to 15 min. When HDFS is overloaded, the delays become more severe.
> # Our cluster issues nearly 100 million *getFileStatus* calls to HDFS daily. For each user, most resources come from the same HDFS directory (see our [engineering blog post|https://engineering.linkedin.com/blog/2023/reducing-apache-spark-application-dependencies-upload-by-99-] for why we took this approach). We can therefore replace the nearly 100 million *getFileStatus* calls with about 0.1 million *listStatus* calls daily, which further reduces overhead on the HDFS side.
> All in all, a more efficient way to fetch the *FileStatus* for each resource is highly needed.
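The grouping idea in this description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the change proposed for Spark: `CountingFS`, `statuses_one_by_one`, and `statuses_grouped` are hypothetical names, and the in-memory class merely counts simulated RPCs in place of a real HDFS client.

```python
from collections import defaultdict
from posixpath import dirname


class CountingFS:
    """Hypothetical in-memory stand-in for an HDFS client; counts simulated RPCs."""

    def __init__(self, files):
        self.files = dict(files)  # path -> length in bytes
        self.rpc_calls = 0

    def get_file_status(self, path):
        # One simulated RPC per file.
        self.rpc_calls += 1
        return {"path": path, "length": self.files[path]}

    def list_status(self, directory):
        # One simulated RPC per directory, returning all direct children.
        self.rpc_calls += 1
        return [
            {"path": p, "length": n}
            for p, n in self.files.items()
            if dirname(p) == directory
        ]


def statuses_one_by_one(fs, paths):
    """Baseline: one get_file_status call per resource."""
    return {p: fs.get_file_status(p) for p in paths}


def statuses_grouped(fs, paths):
    """Group resources by parent directory and issue one list_status per directory."""
    by_dir = defaultdict(set)
    for p in paths:
        by_dir[dirname(p)].add(p)
    result = {}
    for directory, wanted in by_dir.items():
        for status in fs.list_status(directory):
            if status["path"] in wanted:
                result[status["path"]] = status
    return result


files = {
    "/user/alice/libs/a.jar": 10,
    "/user/alice/libs/b.jar": 20,
    "/user/alice/libs/c.jar": 30,
}
one_by_one_fs = CountingFS(files)
grouped_fs = CountingFS(files)
assert statuses_one_by_one(one_by_one_fs, files) == statuses_grouped(grouped_fs, files)
print(one_by_one_fs.rpc_calls, grouped_fs.rpc_calls)  # prints "3 1"
```

The saving depends on how many resources share a directory. When every resource lives in its own directory, the grouped variant issues the same number of calls as the baseline, and listing a very large directory to find a few files may transfer more data than it saves.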
[jira] [Created] (SPARK-45112) Use UnresolvedFunction in dataset functions
Peter Toth created SPARK-45112: -- Summary: Use UnresolvedFunction in dataset functions Key: SPARK-45112 URL: https://issues.apache.org/jira/browse/SPARK-45112 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Peter Toth
[jira] [Updated] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45111: --- Labels: pull-request-available (was: ) > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-45111) Upgrade maven to 3.9.4
Yang Jie created SPARK-45111: Summary: Upgrade maven to 3.9.4 Key: SPARK-45111 URL: https://issues.apache.org/jira/browse/SPARK-45111 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Assigned] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45109: -- Assignee: Apache Spark > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. > The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the > {{ln}} and {{log}} function implementations differ in Spark SQL, we > should use the same implementation in Spark Connect too.
[jira] [Assigned] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45109: -- Assignee: (was: Apache Spark) > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Priority: Major > Labels: pull-request-available > > The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. > The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the > {{ln}} and {{log}} function implementations differ in Spark SQL, we > should use the same implementation in Spark Connect too.
[jira] [Updated] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-45109: --- Description: The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the {{ln}} and {{log}} function implementations differ in Spark SQL, we should use the same implementation in Spark Connect too. > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Priority: Major > Labels: pull-request-available > > The current {{aes_decrypt}} reference to {{aes_encrypt}} is clearly a bug. > The {{ln}} reference to {{log}} is more of a cosmetic issue, but because the > {{ln}} and {{log}} function implementations differ in Spark SQL, we > should use the same implementation in Spark Connect too.
[jira] [Assigned] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45109: -- Assignee: (was: Apache Spark) > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45110: -- Assignee: Apache Spark > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available
[jira] [Assigned] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45110: -- Assignee: (was: Apache Spark) > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Assigned] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45109: -- Assignee: Apache Spark > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-45109) Fix eas_decrypt and ln in connect
[ https://issues.apache.org/jira/browse/SPARK-45109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45109: --- Labels: pull-request-available (was: ) > Fix eas_decrypt and ln in connect > - > > Key: SPARK-45109 > URL: https://issues.apache.org/jira/browse/SPARK-45109 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Peter Toth >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45110: --- Labels: pull-request-available (was: ) > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
BingKun Pan created SPARK-45110: --- Summary: Upgrade rocksdbjni to 8.5.3 Key: SPARK-45110 URL: https://issues.apache.org/jira/browse/SPARK-45110 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Created] (SPARK-45109) Fix eas_decrypt and ln in connect
Peter Toth created SPARK-45109: -- Summary: Fix eas_decrypt and ln in connect Key: SPARK-45109 URL: https://issues.apache.org/jira/browse/SPARK-45109 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0, 4.0.0 Reporter: Peter Toth