[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450834404

Yes. I think it's more important to make `from_json` behave consistently whether it returns a struct, array, or map than to make `from_json` and the JSON data source consistent with each other.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450780863

Then how about `from_json` always returning null for a corrupted record if the mode is `PERMISSIVE`?
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450539611

If it's only for troubleshooting, I guess users can do `if(IsNull(FromJson(col)), col, FromJson(col))`. My major concern is that `PERMISSIVE` mode doesn't work well in `from_json`, because we can return a map/array. Maybe another choice is to forbid `PERMISSIVE` mode when `from_json` returns an array/map.
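The troubleshooting fallback mentioned in this comment, `if(IsNull(FromJson(col)), col, FromJson(col))`, keeps the raw input whenever parsing yields null. A minimal sketch of that idea in plain Python follows. It is not Spark code: the standard `json` module stands in for `from_json`, and the helper names `parse_or_none` and `parsed_or_raw` are hypothetical.

```python
import json
from typing import Any, Optional


def parse_or_none(text: Optional[str]) -> Any:
    """Stand-in for a permissive parser: return None instead of raising on bad input."""
    if text is None:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def parsed_or_raw(text: Optional[str]) -> Any:
    """Return the parsed value, or the original string when parsing produced None."""
    parsed = parse_or_none(text)
    return text if parsed is None else parsed


# A valid record, a truncated record, and an empty input.
for raw in ['{"a": 1}', '{"a": ', '']:
    print(repr(raw), '->', repr(parsed_or_raw(raw)))
```

In this sketch a JSON `null` literal is indistinguishable from a failed parse, so it also falls back to the raw string.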
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-449795306

@HyukjinKwon what do you think?
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-449696981

It also seems reasonable to accept @HyukjinKwon's proposal: just revert https://github.com/apache/spark/commit/38628dd1b8298d2686e5d00de17c461c70db99a8
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-448463474

> in the PERMISSIVE mode return a Row with nulls for all fields if specified root type is StructType (bad record string is placed to the corrupt column if it is specified), and null for ArrayType and MapType.

I think this is the most arguable part. The current behavior doesn't look reasonable. I can think of 2 options:
1. Always return null if the token is empty, regardless of whether the result is a row, array, or map.
2. Never return null if the token is empty. For struct type, return a row with all null fields. For array/map, return an empty array/map.

@HyukjinKwon @MaxGekk any preference?
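The two options in this comment can be compared side by side with a toy model. The sketch below is plain Python, not Spark code. The function `empty_token_result`, the `root_type` strings, and the dict/list stand-ins for a Row, array, and map are all hypothetical and chosen only for illustration.

```python
from typing import Any, List, Optional


def empty_token_result(root_type: str, option: int,
                       field_names: Optional[List[str]] = None) -> Any:
    """Toy model of what a parser could return for an input with no JSON token.

    root_type: "struct", "array", or "map" (stand-ins for StructType, ArrayType, MapType).
    option: 1 = always return null; 2 = never return null.
    """
    if root_type not in ("struct", "array", "map"):
        raise ValueError(f"unsupported root type: {root_type}")
    if option == 1:
        # Option 1: null regardless of the root type.
        return None
    if option == 2:
        if root_type == "struct":
            # A row whose fields are all null, modeled here as a dict.
            return {name: None for name in (field_names or [])}
        if root_type == "array":
            return []
        return {}
    raise ValueError(f"unknown option: {option}")


# Print both options for each root type.
for root in ("struct", "array", "map"):
    print(root,
          empty_token_result(root, 1, ["a", "b"]),
          empty_token_result(root, 2, ["a", "b"]))
```

Under option 1 the three root types are indistinguishable in the output. Under option 2 each keeps its own shape, but none of them signals that the input was empty.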
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447708971

Ah, this is a good point. I think PERMISSIVE mode doesn't make sense for array/map, as we can't have a special column to put the original token in. Now we have several things to consider together to decide the behavior:
1. whether it's `from_json` or the JSON data source
2. whether the result is a row or an array/map (when it's `from_json`)
3. the parse mode
4. whether the token is valid or not

@MaxGekk can you describe the behavior you proposed?
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447689895

These 2 cases can't be consistent. `from_json` can't skip input. If it returns null, it's still not consistent with the JSON data source, which returns `Nil`. It's arguable whether the JSON data source should follow `from_json` and treat an empty token as invalid input, but I think it's safer not to introduce a behavior change when we are not sure one way is better than the other. For `from_json`, though, I do think treating an empty token as invalid is better than returning null.
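One way to picture why the two cases cannot line up: an expression such as `from_json` yields exactly one value per input row, while a data source parser can yield zero rows for an input. The Python sketch below models only that difference in shape. The names `expression_style` and `datasource_style` are hypothetical, the standard `json` module stands in for the real parser, and returning `None` for an empty input is an assumption made for illustration rather than the behavior favored in the comment.

```python
import json
from typing import Any, Iterable, List


def expression_style(inputs: Iterable[str]) -> List[Any]:
    """One output per input: an empty input can become None but cannot be dropped."""
    out = []
    for text in inputs:
        out.append(json.loads(text) if text.strip() else None)
    return out


def datasource_style(inputs: Iterable[str]) -> List[Any]:
    """Zero or one output per input: an empty input contributes no row at all."""
    out = []
    for text in inputs:
        if text.strip():
            out.append(json.loads(text))
    return out


lines = ['{"a": 1}', '', '{"a": 2}']
print(expression_style(lines))  # same length as the input
print(datasource_style(lines))  # the empty line is skipped
```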
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447616420

IIUC https://github.com/apache/spark/pull/22938 tried to change the behavior of `from_json` intentionally, but changed the behavior of json data source unexpectedly. This PR is to fix this problem.
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447615391

Revert it and reopen a PR with the original commit and this fix? I'm fine with that, but I'm not sure it would be a clean revert. IIRC there are multiple JSON-related PRs merged recently.
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447610955

+1 for fixing this behavior change, thanks!