[jira] [Commented] (SPARK-19116) LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file

2017-08-04 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115176#comment-16115176 ] Shea Parkes commented on SPARK-19116: Apologies for not responding earlier. I'm struggling to

[jira] [Commented] (SPARK-20683) Make table uncache chaining optional

2017-05-09 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003864#comment-16003864 ] Shea Parkes commented on SPARK-20683: For anyone that found this issue and just wants to revert to

[jira] [Created] (SPARK-20683) Make table uncache chaining optional

2017-05-09 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-20683:
Summary: Make table uncache chaining optional
Key: SPARK-20683
URL: https://issues.apache.org/jira/browse/SPARK-20683
Project: Spark
Issue Type: Bug

[jira] [Comment Edited] (SPARK-12261) pyspark crash for large dataset

2017-03-16 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928179#comment-15928179 ] Shea Parkes edited comment on SPARK-12261 at 3/16/17 2:38 PM: I simply

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2017-03-16 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928179#comment-15928179 ] Shea Parkes commented on SPARK-12261: I simply added the following to the end: for _ in iterator:

[jira] [Commented] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2017-02-14 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866393#comment-15866393 ] Shea Parkes commented on SPARK-18541: Thank you very much! > Add

[jira] [Created] (SPARK-19116) LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file

2017-01-07 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-19116:
Summary: LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file
Key: SPARK-19116
URL: https://issues.apache.org/jira/browse/SPARK-19116
Project: Spark

[jira] [Commented] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-28 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703755#comment-15703755 ] Shea Parkes commented on SPARK-18541: Yea, I originally did {{aliasWithMetadata}} because I could

[jira] [Created] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-22 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-18541:
Summary: Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API
Key: SPARK-18541
URL: https://issues.apache.org/jira/browse/SPARK-18541

[jira] [Commented] (SPARK-2141) Add sc.getPersistentRDDs() to PySpark

2016-11-14 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665137#comment-15665137 ] Shea Parkes commented on SPARK-2141: This would have been nice to have today. We wanted to clean up

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-11-02 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629182#comment-15629182 ] Shea Parkes commented on SPARK-12261: I'm still maintaining the two-line bandaid to

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2016-10-19 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590680#comment-15590680 ] Shea Parkes commented on SPARK-17998: That definitely answers it. I would say the default of 128MB

[jira] [Closed] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2016-10-19 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shea Parkes closed SPARK-17998. Resolution: Information Provided > Reading Parquet files coalesces parts into too few in-memory

[jira] [Created] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2016-10-18 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-17998:
Summary: Reading Parquet files coalesces parts into too few in-memory partitions
Key: SPARK-17998
URL: https://issues.apache.org/jira/browse/SPARK-17998
Project: Spark

[jira] [Updated] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shea Parkes updated SPARK-17218: Description: Caching a DataFrame with >200 columns causes the contents to be ~nulled. This is

[jira] [Created] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-17218:
Summary: Caching a DataFrame with >200 columns ~nulls the contents
Key: SPARK-17218
URL: https://issues.apache.org/jira/browse/SPARK-17218
Project: Spark

[jira] [Updated] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shea Parkes updated SPARK-17218: Environment: Microsoft Windows 10 Python v3.5.x Standalone Spark Cluster was: Microsoft Windows

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-07-26 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394508#comment-15394508 ] Shea Parkes commented on SPARK-12261: I still can't get this bug to reproduce reliably locally, but

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-07-25 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391731#comment-15391731 ] Shea Parkes commented on SPARK-12261: Also, I added extensive logging to {{worker.py}} in my

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-07-25 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391728#comment-15391728 ] Shea Parkes commented on SPARK-12261: Alright, I've been spending time off and on for a week on

[jira] [Commented] (SPARK-12261) pyspark crash for large dataset

2016-07-16 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380846#comment-15380846 ] Shea Parkes commented on SPARK-12261: I believe I'm hitting the same bug. I'm also running on

[jira] [Commented] (SPARK-13842) Consider __iter__ and __getitem__ methods for pyspark.sql.types.StructType

2016-04-07 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231491#comment-15231491 ] Shea Parkes commented on SPARK-13842: Pull request is available

[jira] [Commented] (SPARK-13842) Consider __iter__ and __getitem__ methods for pyspark.sql.types.StructType

2016-04-07 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231201#comment-15231201 ] Shea Parkes commented on SPARK-13842: I'm willing to give it a first pass. Need to go dig up what

[jira] [Created] (SPARK-13842) Consider __iter__ and __getitem__ methods for pyspark.sql.types.StructType

2016-03-12 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-13842:
Summary: Consider __iter__ and __getitem__ methods for pyspark.sql.types.StructType
Key: SPARK-13842
URL: https://issues.apache.org/jira/browse/SPARK-13842
Project:

[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941952#comment-14941952 ] Shea Parkes commented on SPARK-10847: I appreciate your assistance! I think your proposal is an

[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-10-02 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941953#comment-14941953 ] Shea Parkes commented on SPARK-10847: My apologies, I just read your patch and see you made it work

[jira] [Commented] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-09-28 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933924#comment-14933924 ] Shea Parkes commented on SPARK-10847: This issue caused me to learn enough about Scala only to learn

[jira] [Created] (SPARK-10847) Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2015-09-27 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-10847:
Summary: Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
Key: SPARK-10847
URL: https://issues.apache.org/jira/browse/SPARK-10847
Project: