[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-28 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214718#comment-15214718 ] Luke Miner commented on SPARK-14141: Good to know. Would rdd.toLocalIterator() be a scalable way to

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-30 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218930#comment-15218930 ] Luke Miner commented on SPARK-14141: Anecdotally, at least, it seems like a pretty common workflow

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-25 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212563#comment-15212563 ] Luke Miner commented on SPARK-14141: Is there any way to do this process in chunks: read a chunk of

[jira] [Created] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-24 Thread Luke Miner (JIRA)
Luke Miner created SPARK-14141: -- Summary: Let user specify datatypes of pandas dataframe in toPandas() Key: SPARK-14141 URL: https://issues.apache.org/jira/browse/SPARK-14141 Project: Spark

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-03-26 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213199#comment-15213199 ] Luke Miner commented on SPARK-14141: If that's the case, it sounds like it is doable. One way would

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-04-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224934#comment-15224934 ] Luke Miner commented on SPARK-14141: Do you think you could sketch out your method? I'd love to try

[jira] [Comment Edited] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2016-04-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218930#comment-15218930 ] Luke Miner edited comment on SPARK-14141 at 4/4/16 6:54 PM: Anecdotally, at

[jira] [Commented] (SPARK-19428) Ability to select first row of groupby

2017-02-03 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851784#comment-15851784 ] Luke Miner commented on SPARK-19428: Couple of things. Sometimes I just want a random row from each

[jira] [Comment Edited] (SPARK-19428) Ability to select first row of groupby

2017-02-03 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851784#comment-15851784 ] Luke Miner edited comment on SPARK-19428 at 2/3/17 5:33 PM: Couple of things.

[jira] [Comment Edited] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852946#comment-15852946 ] Luke Miner edited comment on SPARK-19428 at 2/5/17 3:45 AM: I did not know of

[jira] [Commented] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852946#comment-15852946 ] Luke Miner commented on SPARK-19428: I did not know of the existence of the {{first}} function for

[jira] [Commented] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852867#comment-15852867 ] Luke Miner commented on SPARK-19428: Unfortunately no, because that would just get me a row for a

[jira] [Comment Edited] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852884#comment-15852884 ] Luke Miner edited comment on SPARK-19428 at 2/4/17 6:15 PM: That would be

[jira] [Commented] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852884#comment-15852884 ] Luke Miner commented on SPARK-19428: That would be fantastic. Would it be possible to generalize it

[jira] [Commented] (SPARK-19428) Ability to select first row of groupby

2017-02-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852890#comment-15852890 ] Luke Miner commented on SPARK-19428: How could you do it that way? Normally the cutoff varies by

[jira] [Created] (SPARK-19428) Ability to select first row of groupby

2017-02-01 Thread Luke Miner (JIRA)
Luke Miner created SPARK-19428: -- Summary: Ability to select first row of groupby Key: SPARK-19428 URL: https://issues.apache.org/jira/browse/SPARK-19428 Project: Spark Issue Type: Brainstorming

[jira] [Commented] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2017-01-19 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830750#comment-15830750 ] Luke Miner commented on SPARK-14141: One option is to convert all the categorical variables into

[jira] [Comment Edited] (SPARK-14141) Let user specify datatypes of pandas dataframe in toPandas()

2017-01-19 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830750#comment-15830750 ] Luke Miner edited comment on SPARK-14141 at 1/19/17 10:52 PM: -- One option is

[jira] [Updated] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-08 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18343: --- Description: I have a driver program where I read data in from Cassandra using spark, perform

[jira] [Commented] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-08 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648900#comment-15648900 ] Luke Miner commented on SPARK-18343: I ran jstack on an executor and on the driver and have attached

[jira] [Commented] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-09 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651942#comment-15651942 ] Luke Miner commented on SPARK-18343: Any suggestions on how one might hunt down that library? I've

[jira] [Resolved] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-09 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner resolved SPARK-18343. Resolution: Not A Bug This was due to some clash in versions between the libraries I was using.

[jira] [Issue Comment Deleted] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-09 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18343: --- Comment: was deleted (was: Any suggestions on how one might hunt down that library? I've included my

[jira] [Commented] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-09 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652487#comment-15652487 ] Luke Miner commented on SPARK-18343: Updating some of those libraries to their latest versions fixed

[jira] [Created] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-07 Thread Luke Miner (JIRA)
Luke Miner created SPARK-18343: -- Summary: FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write Key: SPARK-18343 URL: https://issues.apache.org/jira/browse/SPARK-18343 Project: Spark

[jira] [Created] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-10 Thread Luke Miner (JIRA)
Luke Miner created SPARK-18402: -- Summary: spark: SAXParseException while writing from json to parquet on s3 Key: SPARK-18402 URL: https://issues.apache.org/jira/browse/SPARK-18402 Project: Spark

[jira] [Updated] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-10 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18402: --- Description: I'm trying to read in some json, infer a schema, and write it out again as parquet to

[jira] [Updated] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-10 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18402: --- Environment: spark 2.0.1 hadoop 2.7.1 hadoop aws 2.7.1 ubuntu 14.04.5 on aws mesos 1.0.1 was:

[jira] [Updated] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-10 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18402: --- Environment: spark 2.0.1 hadoop 2.7.1 hadoop aws 2.7.1 ubuntu 14.04.5 on aws mesos 1.0.1 Java

[jira] [Created] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-11-04 Thread Luke Miner (JIRA)
Luke Miner created SPARK-18281: -- Summary: toLocalIterator yields time out error on pyspark2 Key: SPARK-18281 URL: https://issues.apache.org/jira/browse/SPARK-18281 Project: Spark Issue Type:

[jira] [Updated] (SPARK-18281) toLocalIterator yields time out error on pyspark2

2016-11-04 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18281: --- Description: I run the example straight out of the api docs for toLocalIterator and it gives a time

[jira] [Commented] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-11 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657801#comment-15657801 ] Luke Miner commented on SPARK-18402: Great! Will respond there. > spark: SAXParseException while

[jira] [Commented] (SPARK-18402) spark: SAXParseException while writing from json to parquet on s3

2016-11-11 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657663#comment-15657663 ] Luke Miner commented on SPARK-18402: Ah. I'll see if I can follow up with them. Would you like me to