Venkata Sai Akhil Gudesa created SPARK-44657:
------------------------------------------------
             Summary: Incorrect limit handling and config parsing in Arrow collect
                 Key: SPARK-44657
                 URL: https://issues.apache.org/jira/browse/SPARK-44657
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 3.4.0, 3.4.1, 3.4.2, 3.5.0
            Reporter: Venkata Sai Akhil Gudesa

In the arrow writer [code|https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L154-L163], the batching condition does not enforce what the documentation promises regarding _"maxBatchSize and maxRecordsPerBatch, respect whatever smaller"_. Because the two limits are combined with the `||` operator, the writer actually respects whichever limit is *larger* (i.e. less restrictive): a batch is only closed once both limits have been exceeded, rather than as soon as the first one is hit.

Further, when the `CONNECT_GRPC_ARROW_MAX_BATCH_SIZE` conf is read, the value is not converted from MiB to bytes ([example|https://github.com/apache/spark/blob/3e5203c64c06cc8a8560dfa0fb6f52e74589b583/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala#L103]), so the byte limit actually applied is off by a factor of 1024 * 1024.
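To illustrate the first point, here is a simplified, standalone model of the writer loop (not the actual Spark source; the variable names mirror ArrowConverters, but the row sizes and limits are made up). With the current `||` condition the loop keeps writing until *both* limits are exceeded; combining the two guards with `&&` instead would stop at whichever limit is hit first, i.e. the smaller one:

{code:scala}
object BatchLimitSketch {
  // Returns (estimatedBatchSize, rowCountInLastBatch) for one batch.
  def batchRows(
      rowSizes: Iterator[Long],     // estimated byte size of each incoming row
      maxEstimatedBatchSize: Long,  // <= 0 means unlimited, as in ArrowConverters
      maxRecordsPerBatch: Long,     // <= 0 means unlimited
      useAnd: Boolean): (Long, Long) = {
    var estimatedBatchSize = 0L
    var rowCountInLastBatch = 0L
    def underSizeLimit = maxEstimatedBatchSize <= 0 || estimatedBatchSize < maxEstimatedBatchSize
    def underCountLimit = maxRecordsPerBatch <= 0 || rowCountInLastBatch < maxRecordsPerBatch
    while (rowSizes.hasNext &&
        (if (useAnd) underSizeLimit && underCountLimit // candidate fix: stop at the first limit hit
         else underSizeLimit || underCountLimit)) {    // current bug: stop only when both are hit
      estimatedBatchSize += rowSizes.next()
      rowCountInLastBatch += 1
    }
    (estimatedBatchSize, rowCountInLastBatch)
  }

  def main(args: Array[String]): Unit = {
    // 1000 rows of 100 bytes each; size limit 500 bytes, record limit 100 rows.
    println(batchRows(Iterator.fill(1000)(100L), 500L, 100L, useAnd = false)) // (10000,100): 20x over the size limit
    println(batchRows(Iterator.fill(1000)(100L), 500L, 100L, useAnd = true))  // (500,5): smaller limit respected
  }
}
{code}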
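For the second point, a plain-Scala sketch of the unit mismatch (the conf value here is hypothetical; the factor of 1024 * 1024 is what a MiB-to-bytes conversion, e.g. Spark's own `ByteUnit.MiB.toBytes`, would apply):

{code:scala}
object BatchSizeConfSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical value read for CONNECT_GRPC_ARROW_MAX_BATCH_SIZE: the conf is
    // declared in MiB, so the raw value is a MiB count, not a byte count.
    val confValueMiB = 4L
    val usedAsIs = confValueMiB                    // bug: treated as a 4-byte limit
    val converted = confValueMiB * 1024L * 1024L   // intended: 4 MiB = 4194304 bytes
    println(s"as-is: $usedAsIs bytes, converted: $converted bytes")
  }
}
{code}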