[GitHub] [spark] HyukjinKwon commented on a change in pull request #33362: [SPARK-36153][SQL] Update transform doc to current code

GitBox Thu, 15 Jul 2021 03:10:14 -0700


HyukjinKwon commented on a change in pull request #33362:
URL: https://github.com/apache/spark/pull/33362#discussion_r670326046




##########
File path: docs/sql-ref-syntax-qry-select-transform.md
##########
@@ -57,16 +65,38 @@ SELECT TRANSFORM ( expression [ , ... ] )
 
     Specifies a command or a path to script to process data.
 
-### SerDe behavior
+### ROW FORMAT DELIMITED BEHAVIOR
+
+When spark use `ROW FORMAT DELIMITED` format, Spark will use `\u0001` as default filed delimit,
+use `\n` as default line delimit and use `"\N"` as `NULL` value in order to differentiate `NULL` values
+from empty strings. These delimit can be overridden by `FIELDS TERMINATED BY`, `LINES TERMINATED BY` and
+`NULL TERMINATED AS`. Since we use `to_json` and `from_json` to handle complex data type, so
+`COLLECTION ITEMS TERMINATED BY` and `MAP KEYS TERMINATED BY` won't work in current code.
+Spark will cast all columns to `STRING` and combined by tabs before feeding to the user script.
+For complex type such as `ARRAY\MAP\STRUCT`, spark use `to_json` cast it to input json string

Review comment:
       ```suggestion
   For complex type such as `ARRAY`\`MAP`\`STRUCT`, spark use `to_json` cast it to input json string
   ```
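To make the `ROW FORMAT DELIMITED` defaults in the diff concrete, here is a minimal Python sketch of what a single input line to the user script might look like. It assumes the defaults stated in the diff (`\u0001` between fields, `\n` after each row, the two characters `\N` for `NULL`). `json.dumps` stands in for Spark's `to_json`, whose exact formatting may differ. The helper names `to_field`, `serialize_row`, and `parse_line` are hypothetical and are not Spark APIs.

```python
import json
from typing import Any, List, Optional

# Defaults as described in the diff under review (assumed, not read from Spark).
FIELD_DELIM = "\u0001"
LINE_DELIM = "\n"
NULL_MARKER = "\\N"  # the two characters: backslash followed by N


def to_field(value: Any) -> str:
    """Render one column value as the string a user script would receive."""
    if value is None:
        return NULL_MARKER  # NULL is sent as \N, distinct from ""
    if isinstance(value, (list, dict)):
        # Complex types (ARRAY/MAP/STRUCT) are sent as JSON text; json.dumps
        # is a stand-in for Spark's to_json and may not match it byte for byte.
        return json.dumps(value, separators=(",", ":"))
    if isinstance(value, bool):
        return "true" if value else "false"  # lowercase, like a STRING cast
    return str(value)


def serialize_row(row: List[Any], field_delim: str = FIELD_DELIM,
                  line_delim: str = LINE_DELIM) -> str:
    """Join the stringified columns into one delimited input line."""
    return field_delim.join(to_field(v) for v in row) + line_delim


def parse_line(line: str, field_delim: str = FIELD_DELIM) -> List[Optional[str]]:
    """What a user script might do: split the line and map \\N back to None."""
    fields = line.rstrip(LINE_DELIM).split(field_delim)
    return [None if f == NULL_MARKER else f for f in fields]
```

For example, `serialize_row([1, None, "", [1, 2], {"a": "x"}])` yields the fields `1`, `\N`, an empty string, `[1,2]` and `{"a":"x"}`, joined by `\u0001` and followed by `\n`. Because `NULL` travels as `\N`, `parse_line` can still tell the `NULL` column apart from the empty-string column.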




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


