[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677486#comment-16677486 ]

Hyukjin Kwon commented on SPARK-17967:
--

It's a rough idea, but I was also thinking of allowing binary option values and passing a CSV settings object ({{CsvWriterSettings}}) directly from Scala and Java. The current Univocity parser has so many options that it is hard to judge which ones should be exposed and which should not (https://github.com/apache/spark/pull/22590).

> Support for list or other types as an option for datasources
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Hyukjin Kwon
> Priority: Major
>
> This was discussed in SPARK-17878.
> For other datasources, string/long/boolean/double values seem to be enough as
> options, but they are not enough for a datasource such as CSV. As this is an
> interface for other external datasources, I guess it would affect several of
> them out there.
> I took a look at first, but it seems it would be difficult to support this
> (it needs a lot of changes).
> One suggestion is to support this as a JSON array.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677478#comment-16677478 ]

Hyukjin Kwon commented on SPARK-17967:
--

For CSV itself, yeah, there are workarounds, and I agree that for CSV this would just be nice to have. However, other cases, for instance specifying a binary format (https://github.com/apache/spark/pull/21192), ideally need this. It is also needed to specify multiple delimiters or multiple date formats (there are already some JIRAs open for these).
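To illustrate the multiple-date-formats case, here is a minimal sketch that assumes a list-valued option has already been decoded into a {{Seq[String]}}. The option name {{dateFormats}} and the {{MultiDateFormat}} helper are hypothetical, not an existing Spark option or API. Each pattern is tried in order with {{java.time}}, and the first successful parse wins.

{code:scala}
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}

object MultiDateFormat {
  // Hypothetical: the decoded value of a list-valued "dateFormats" option.
  val dateFormats: Seq[String] = Seq("yyyy-MM-dd", "dd/MM/yyyy")

  // Parse with a single pattern, turning a parse failure into None.
  private def tryParse(s: String, pattern: String): Option[LocalDate] =
    try Some(LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern)))
    catch { case _: DateTimeParseException => None }

  // Try each pattern in order and return the first successful parse, if any.
  def parseDate(s: String, patterns: Seq[String]): Option[LocalDate] =
    patterns.iterator.map(p => tryParse(s, p)).collectFirst { case Some(d) => d }
}
{code}

Because patterns are tried in order, the order of the list matters whenever more than one pattern could match the same input.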
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677418#comment-16677418 ]

Reynold Xin commented on SPARK-17967:
--

BTW, how important is this? It seems that for CSV, people can just replace the null-marker values with nulls themselves using the programmatic API.
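The workaround described here amounts to post-processing: read the values as plain strings, then map whichever strings stand for "missing" to nulls. A minimal plain-Scala sketch of that mapping, with a hypothetical token set and {{Option}} standing in for null (the {{NullTokens}} object is illustrative only):

{code:scala}
object NullTokens {
  // Hypothetical set of strings that the data uses to mean "missing".
  val nullTokens: Set[String] = Set("null", "NA", "")

  // Replace any cell whose raw text is a null token with None; keep the rest.
  def normalize(row: Seq[String], tokens: Set[String]): Seq[Option[String]] =
    row.map(cell => if (tokens.contains(cell)) None else Some(cell))
}
{code}

In Spark itself, the same step would typically be written per column as {{when(col(c).isin(tokens.toSeq: _*), lit(null)).otherwise(col(c))}} using {{org.apache.spark.sql.functions}}.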
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674327#comment-16674327 ]

Hyukjin Kwon commented on SPARK-17967:
--

That works, but IMHO it is actually less pretty. We could add an `option(Array[String])` API and internally use the CSV format as well (the current approach is JSON); however, if we use CSV, we will face other problems, such as how to handle null.
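The null-handling concern can be made concrete. A minimal sketch, assuming a hypothetical {{option(key, Array[String])}} overload that serializes its value to a single string: a JSON array keeps a real null distinct from the string "null", whereas a naive comma-joined encoding does not. The {{OptionEncoding}} object and its methods are illustrative only, not Spark's implementation.

{code:scala}
object OptionEncoding {
  // Escape a string as a JSON string literal (quotes, backslashes, control chars).
  private def jsonString(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"').toString
  }

  // Hypothetical encoding for an option(key, Array[String]) overload:
  // a JSON array in which a Scala null becomes the JSON literal null.
  def encodeAsJsonArray(values: Array[String]): String =
    values.map(v => if (v == null) "null" else jsonString(v)).mkString("[", ",", "]")

  // Naive comma-joined encoding for comparison: mkString renders null as "null",
  // so a real null and the string "null" produce the same output.
  def encodeAsCsv(values: Array[String]): String = values.mkString(",")
}
{code}

For example, {{Array("a", null)}} and {{Array("a", "null")}} both become {{a,null}} with the naive encoding, but {{["a",null]}} and {{["a","null"]}} as JSON arrays.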
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674007#comment-16674007 ]

Maxim Gekk commented on SPARK-17967:
--

What about preserving the existing API as is and passing multiple values as a CSV string? For example:
{code:scala}
spark.read.format("csv")
  .option("nullValue", "2012, Tesla, null")
  ...
{code}
or
{code:sql}
CREATE TEMPORARY TABLE tableA USING csv OPTIONS (sep '|,-', ...)
{code}
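Under this proposal, each datasource would have to split the option value itself. A minimal sketch of such splitting, assuming a comma separator and trimming of surrounding spaces (both assumptions, as is the hypothetical {{splitListOption}} name):

{code:scala}
object CsvStringOption {
  // Hypothetical: split a comma-separated option value into its items.
  // The -1 limit keeps trailing empty items; trim drops spaces after commas.
  def splitListOption(raw: String): Seq[String] =
    raw.split(",", -1).toSeq.map(_.trim)
}
{code}

With this rule, {{"2012, Tesla, null"}} yields the three items {{2012}}, {{Tesla}} and {{null}}. It also shows the limits of the approach: an item that itself contains a comma (such as {{1,000}}) or deliberately starts with a space cannot be expressed without adding an escaping or quoting rule, which amounts to re-implementing CSV parsing for option values.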
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307187#comment-16307187 ]

Apache Spark commented on SPARK-17967:
--

User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/20125
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825694#comment-15825694 ]

Apache Spark commented on SPARK-17967:
--

User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/16611
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647746#comment-15647746 ]

Hyukjin Kwon commented on SPARK-17967:
--

Thanks [~rxin]. I made a patch for this locally, but I guess you might not want to get this into 2.1? If so, I will submit the PR after 2.1.
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627698#comment-15627698 ]

Reynold Xin commented on SPARK-17967:
--

+1 on JSON arrays.
[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources
[ https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580958#comment-15580958 ]

Hyukjin Kwon commented on SPARK-17967:
--

I am linking SPARK-17878 as a related issue, but this does not mean that this one blocks that JIRA.