[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2017-05-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014944#comment-16014944 ] Bryan Cutler commented on SPARK-14141: Take a look at SPARK-13534 which will make a Pandas

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2017-01-19 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830750#comment-15830750 ] Luke Miner commented on SPARK-14141: One option is to convert all the categorical variables into

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-10-16 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579949#comment-15579949 ] holdenk commented on SPARK-14141: Ah sorry for the delay, so doing the cache + count together is done

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-04-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224934#comment-15224934 ] Luke Miner commented on SPARK-14141: Do you think you could sketch out your method? I'd love to try

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-30 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218991#comment-15218991 ] holdenk commented on SPARK-14141: If the data fits in memory on the cluster, cache + count +

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-30 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218930#comment-15218930 ] Luke Miner commented on SPARK-14141: Anecdotally, at least, it seems like a pretty common workflow

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-30 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218909#comment-15218909 ] Davies Liu commented on SPARK-14141: toLocalIterator is better than collect, but will run partitions

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-28 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214718#comment-15214718 ] Luke Miner commented on SPARK-14141: Good to know. Would rdd.toLocalIterator() be a scalable way to

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-28 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213876#comment-15213876 ] Davies Liu commented on SPARK-14141: toPandas() is just a convenient way to convert a small

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-26 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213206#comment-15213206 ] holdenk commented on SPARK-14141: It's doable, but I'm not sure it belongs in Spark itself. Maybe

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-26 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213199#comment-15213199 ] Luke Miner commented on SPARK-14141: If that's the case, it sounds like it is doable. One way would

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212590#comment-15212590 ] holdenk commented on SPARK-14141: So with RDDs there is `toLocalIterator` which you could use to do this

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212587#comment-15212587 ] holdenk commented on SPARK-14141: The more I look at this, the more I think it's not a good fit for Spark.

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212563#comment-15212563 ] Luke Miner commented on SPARK-14141: Is there any way to do this process in chunks: read a chunk of

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212548#comment-15212548 ] holdenk commented on SPARK-14141: So following up, `from_records` doesn't take dtypes although we could

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212410#comment-15212410 ] holdenk commented on SPARK-14141: I can take a crack at this, seems pretty reasonable & small. > Let