[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2018-11-06 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677486#comment-16677486
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

It's a rough idea but I was also thinking allowing binary and sending some CSV 
setting object directly ({{CsvWriterSettings}}) from Scala and Java. Current 
Univocity parser allows too many options and it's kind of troublesome to judge 
which one should be added or not (https://github.com/apache/spark/pull/22590).

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2018-11-06 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677478#comment-16677478
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

For CSV itself, yea, there are workaround and I agree - for CSV, it should be 
just a good to do. However, other cases like, for instance, specifying binary 
format (https://github.com/apache/spark/pull/21192) ideally needs this. it is 
also needed to specify multiple delimiters or dates format (there are already 
some JIRAs open).

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2018-11-06 Thread Reynold Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677418#comment-16677418
 ] 

Reynold Xin commented on SPARK-17967:
-

BTW how important is this? Seems like for CSV people can just replace the null 
values with null themselves using the programmatic API.

 

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2018-11-04 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674327#comment-16674327
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

That works but IMHO less pretty actually. We can add `option(Array[String])` 
API and internally use CSV format as well (current approach is JSON); however, 
if we use CSV then, we will face another problems like how to handle null, etc. 

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2018-11-03 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674007#comment-16674007
 ] 

Maxim Gekk commented on SPARK-17967:


What about to preserve existing API as is, and pass multiple values as CSV 
string? For example:
{code:scala}
spark.read.format("csv")
  .option("nullValue", "2012, Tesla, null"))
  ...
{code}
or
{code:sql}
CREATE TEMPORARY TABLE tableA USING csv
OPTIONS (sep '|,-', ...)
{code}

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2017-12-31 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307187#comment-16307187
 ] 

Apache Spark commented on SPARK-17967:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20125

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2017-01-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825694#comment-15825694
 ] 

Apache Spark commented on SPARK-17967:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/16611

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2016-11-08 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647746#comment-15647746
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

Thanks [~rxin], I made a patch for this locally but I guess you might not want 
to get this into 2.1? If so, I will submit this PR later after 2.1.

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2016-11-08 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647747#comment-15647747
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

Thanks [~rxin], I made a patch for this locally but I guess you might not want 
to get this into 2.1? If so, I will submit this PR later after 2.1.

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2016-11-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627698#comment-15627698
 ] 

Reynold Xin commented on SPARK-17967:
-

+1 on json arrays.

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17967) Support for list or other types as an option for datasources

2016-10-16 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580958#comment-15580958
 ] 

Hyukjin Kwon commented on SPARK-17967:
--

I am leaving SPARK-17878 as a related one but it does not mean this one blocks 
that JIRA.

> Support for list or other types as an option for datasources
> 
>
> Key: SPARK-17967
> URL: https://issues.apache.org/jira/browse/SPARK-17967
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Hyukjin Kwon
>
> This was discussed in SPARK-17878
> For other datasources, it seems okay with string/long/boolean/double value as 
> an option but it seems it is not enough for the datasource such as CSV. As it 
> is an interface for other external datasources, I guess it'd affect several 
> ones out there.
> I took a look a first but it seems it'd be difficult to support this (need to 
> change a lot).
> One suggestion is support this as a JSON array.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org