[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2019-01-02 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450834404
 
 
   Yes. I think it's more important to make `from_json` behave consistently 
when returning struct/array/map than to make `from_json` and the JSON data 
source consistent with each other.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2019-01-01 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450780863
 
 
   Then how about having `from_json` always return null for a corrupted record 
if the mode is `PERMISSIVE`?





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-29 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-450539611
 
 
   If it's only for troubleshooting, I guess users can do 
`if(IsNull(FromJson(col)), col, FromJson(col))`.
   
   My major concern is that `PERMISSIVE` mode doesn't work well in `from_json`, 
because we can return map/array. Maybe another choice is to forbid `PERMISSIVE` 
mode when `from_json` returns array/map.
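
To make the troubleshooting fallback concrete, here is a minimal sketch in plain Python rather than Spark. `from_json_model` and `parsed_or_raw` are hypothetical names, and the model ignores Spark's type system; it only shows the idea of keeping the raw input whenever parsing yields null.

```python
import json
from typing import Any, Optional


def from_json_model(text: str) -> Optional[Any]:
    """Hypothetical stand-in for from_json: returns None when parsing fails."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None


def parsed_or_raw(text: str) -> Any:
    """Model of if(IsNull(FromJson(col)), col, FromJson(col)).

    Falls back to the raw input whenever the parsed value is None.
    """
    parsed = from_json_model(text)
    return text if parsed is None else parsed


rows = ['{"a": 1}', '{"a": ', '']
results = [parsed_or_raw(r) for r in rows]
```

The malformed and empty inputs come back unchanged, so a user can still see what failed to parse.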





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-24 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-449795306
 
 
   @HyukjinKwon what do you think?





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-23 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-449696981
 
 
   It also seems reasonable to accept @HyukjinKwon's proposal: just revert 
https://github.com/apache/spark/commit/38628dd1b8298d2686e5d00de17c461c70db99a8





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-18 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-448463474
 
 
   > in the PERMISSIVE mode return a Row with nulls for all fields if specified 
root type is StructType (bad record string is placed to the corrupt column if 
it is specified), and null for ArrayType and MapType.
   
   I think this is the most arguable part. The current behavior doesn't look 
reasonable. I can think of 2 options:
   1. Always return null if the token is empty, no matter whether it's a row, 
array, or map.
   2. Never return null if the token is empty. For struct type, return a row 
with all null fields. For array/map, return an empty array/map.
   
   @HyukjinKwon @MaxGekk any preference?
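
The behaviors under discussion can be sketched as plain Python functions. This is only a model, not Spark code: the function names are hypothetical, `None` stands in for SQL null, and a dict of field names stands in for a Row. `empty_token_quoted` models the quoted behavior above, and the other two model options 1 and 2.

```python
from typing import Any, Optional, Sequence


def empty_token_quoted(root_type: str, fields: Sequence[str] = ()) -> Optional[Any]:
    """Quoted behavior: row of nulls for a struct, null for array/map."""
    if root_type == "struct":
        return {name: None for name in fields}
    return None


def empty_token_option1(root_type: str, fields: Sequence[str] = ()) -> Optional[Any]:
    """Option 1: always null, whatever the root type is."""
    return None


def empty_token_option2(root_type: str, fields: Sequence[str] = ()) -> Any:
    """Option 2: never null; an all-null row or an empty array/map."""
    if root_type == "struct":
        return {name: None for name in fields}
    if root_type == "array":
        return []
    if root_type == "map":
        return {}
    raise ValueError(f"unsupported root type: {root_type}")


struct_row = empty_token_option2("struct", ["a", "b"])
```

Both options give every root type the same kind of answer; the quoted behavior is the only one of the three that depends on the root type for whether the result is null.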





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-16 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447708971
 
 
   Ah, this is a good point. I think PERMISSIVE mode doesn't make sense for 
array/map, as we can't have a special column to hold the original token.
   
   Now we have several things to consider together to decide the behavior:
   1. whether it's `from_json` or the JSON data source
   2. whether the result is a row or an array/map (when it's `from_json`)
   3. the parse mode
   4. whether the token is valid or not
   
   @MaxGekk can you describe the behavior you proposed?
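
To see how many cases a proposal has to cover, the four dimensions above can be enumerated in plain Python. The value lists are assumptions for illustration: the three parse modes are the ones Spark documents for JSON reading, and whether each of them applies to `from_json` is not settled in this thread. `behavior_cases` is a hypothetical helper.

```python
from itertools import product

ENTRY_POINTS = ["from_json", "json data source"]
RESULT_TYPES = ["struct", "array", "map"]
PARSE_MODES = ["PERMISSIVE", "DROPMALFORMED", "FAILFAST"]  # assumed list
TOKEN_STATES = ["valid", "invalid"]


def behavior_cases():
    """List every (entry point, result type, parse mode, token state) case."""
    cases = []
    for entry, result, mode, token in product(
        ENTRY_POINTS, RESULT_TYPES, PARSE_MODES, TOKEN_STATES
    ):
        # Item 2 only applies to from_json; the data source produces rows.
        if entry == "json data source" and result != "struct":
            continue
        cases.append((entry, result, mode, token))
    return cases


matrix = behavior_cases()
```

A described behavior is complete only if it says what happens in each entry of `matrix`.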





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-16 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447689895
 
 
   These 2 cases can't be consistent. `from_json` can't skip input. If it 
returns null, it's still not consistent with the JSON data source, which 
returns `Nil`.
   
   It's arguable whether the JSON data source should follow `from_json` and 
treat an empty token as invalid input. But I think it's safer not to introduce 
a behavior change if we are not sure one way is better than the other.
   
   For `from_json`, though, I do think treating an empty token as invalid is 
better than returning null.
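
Why the two can't match is easier to see in a small plain-Python model. This is a sketch, not Spark code: `read_json_lines` and `from_json_column` are hypothetical names, and treating a blank or whitespace-only string as "no token" is an assumption. A reader can emit zero records for an input, while a per-row function has to produce exactly one value per input.

```python
import json
from typing import Any, Iterable, Iterator, List, Optional


def is_empty_token(line: str) -> bool:
    """Assumed definition of an input without tokens: blank or whitespace."""
    return line.strip() == ""


def read_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Data-source style: an empty input yields no record at all."""
    for line in lines:
        if is_empty_token(line):
            continue  # analogous to returning Nil
        yield json.loads(line)


def from_json_column(values: Iterable[str]) -> List[Optional[Any]]:
    """Per-row function style: one output value for every input value."""
    return [None if is_empty_token(v) else json.loads(v) for v in values]


lines = ['{"a": 1}', '', '{"a": 2}']
records = list(read_json_lines(lines))
column = from_json_column(lines)
```

The reader drops the empty line and returns two records, while the column function keeps three positions and has to put something in the middle one.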





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-15 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447616420
 
 
   IIUC https://github.com/apache/spark/pull/22938 tried to change the behavior 
of `from_json` intentionally, but changed the behavior of json data source 
unexpectedly.
   
   This PR is to fix this problem.





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-15 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447615391
 
 
   Revert it and reopen a PR with the original commit and this fix? I'm fine 
with it, but I'm not sure if it's a clean revert. IIRC there are multiple 
JSON-related PRs merged recently.





[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource

2018-12-15 Thread GitBox
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without 
tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325#issuecomment-447610955
 
 
   +1 for fixing this behavior change, thanks!

