HyukjinKwon commented on a change in pull request #33362:
URL: https://github.com/apache/spark/pull/33362#discussion_r670326847



##########
File path: docs/sql-ref-syntax-qry-select-transform.md
##########
@@ -57,16 +65,38 @@ SELECT TRANSFORM ( expression [ , ... ] )
 
     Specifies a command or a path to script to process data.
 
-### SerDe behavior
+### ROW FORMAT DELIMITED BEHAVIOR
+
+When spark use `ROW FORMAT DELIMITED` format, Spark will use `\u0001` as default filed delimit,
+use `\n` as default line delimit and use `"\N"` as `NULL` value in order to differentiate `NULL` values
+from empty strings. These delimit can be overridden by `FIELDS TERMINATED BY`, `LINES TERMINATED BY` and
+`NULL TERMINATED AS`. Since we use `to_json` and `from_json` to handle complex data type, so
+`COLLECTION ITEMS TERMINATED BY` and `MAP KEYS TERMINATED BY` won't work in current code.
+Spark will cast all columns to `STRING` and combined by tabs before feeding to the user script.
+For complex type such as `ARRAY\MAP\STRUCT`, spark use `to_json` cast it to input json string
+and use `from_json` to convert result output to `ARRAY/MAP/STRUCT` data. The standard output of
+the user script will be treated as tab-separated `STRING` columns, any cell containing only `"\N"`
+will be re-interpreted as a `NULL` value, and then the resulting STRING column will be cast to the
+data type specified in `col_type`. If the actual number of output columns is less than the number
+of specified output columns, insufficient output columns will be supplemented with `NULL`.
+If the actual number of output columns is more than the number of specified output columns,
+the output columns will only select the corresponding columns and the remaining part will be discarded.

Review comment:
       I think we should show some examples with inputs and outputs. It's difficult to follow just from reading the text.
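
As a rough illustration of the kind of input/output example meant here, the output-side rules in the paragraph above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not Spark's implementation: the function name `parse_script_output` and its parameters are hypothetical, the field delimiter is left as a parameter because the text mentions both `\u0001` and tabs, and the final cast to `col_type` is not modeled.

```python
from typing import List, Optional

NULL_TOKEN = r"\N"  # a backslash followed by N, the NULL marker named in the text


def parse_script_output(
    line: str,
    num_cols: int,
    field_delim: str = "\u0001",  # assumption: the default named in the text
    null_token: str = NULL_TOKEN,
) -> List[Optional[str]]:
    """Model how one line of script output maps to `num_cols` STRING cells."""
    raw = line.rstrip("\n").split(field_delim)
    # Extra columns beyond the declared output schema are discarded.
    raw = raw[:num_cols]
    # A cell containing only the NULL token becomes NULL (None here).
    cells: List[Optional[str]] = [None if c == null_token else c for c in raw]
    # Missing columns are supplemented with NULL.
    cells.extend([None] * (num_cols - len(cells)))
    return cells


print(parse_script_output("a\u0001\\N\u0001c\n", 3))     # ['a', None, 'c']
print(parse_script_output("a\u0001b", 3))                # ['a', 'b', None]
print(parse_script_output("a\u0001b\u0001c\u0001d", 2))  # ['a', 'b']
print(parse_script_output("\u0001x", 2))                 # ['', 'x']
```

The last call shows why a dedicated `NULL` token is used: an empty cell stays an empty string and is not confused with `NULL`.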



