[jira] [Commented] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2023-06-21 Thread Michael Allman (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17735876#comment-17735876 ] Michael Allman commented on SPARK-39375: To be clear, Spark Connect will be an alternative or

[jira] [Created] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-07-21 Thread Michael Allman (Jira)
Michael Allman created SPARK-39833: -- Summary: Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true Key: SPARK-39833 URL:

[jira] [Created] (SPARK-25894) Include a count of the number of physical columns read for a columnar data source in the metadata of FileSourceScanExec

2018-10-31 Thread Michael Allman (JIRA)
Michael Allman created SPARK-25894: -- Summary: Include a count of the number of physical columns read for a columnar data source in the metadata of FileSourceScanExec Key: SPARK-25894 URL:

[jira] [Comment Edited] (SPARK-25561) HiveClient.getPartitionsByFilter throws an exception if Hive retries directSql

2018-09-28 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632137#comment-16632137 ] Michael Allman edited comment on SPARK-25561 at 9/28/18 5:08 PM: - cc

[jira] [Commented] (SPARK-25561) HiveClient.getPartitionsByFilter throws an exception if Hive retries directSql

2018-09-28 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632137#comment-16632137 ] Michael Allman commented on SPARK-25561: cc [~cloud_fan] [~ekhliang] Hi [~karthik.manamcheri].

[jira] [Updated] (SPARK-25407) Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging

2018-09-13 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-25407: --- Description: Spark supports merging schemata across table partitions in which one partition

[jira] [Commented] (SPARK-25407) Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging

2018-09-11 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610841#comment-16610841 ] Michael Allman commented on SPARK-25407: I have a code-complete patch for this bug, but I want

[jira] [Created] (SPARK-25407) Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging

2018-09-11 Thread Michael Allman (JIRA)
Michael Allman created SPARK-25407: -- Summary: Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging Key: SPARK-25407 URL:

[jira] [Updated] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-25406: --- Priority: Major (was: Critical) > Incorrect usage of withSQLConf method in Parquet schema

[jira] [Created] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Michael Allman (JIRA)
Michael Allman created SPARK-25406: -- Summary: Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests Key: SPARK-25406 URL:

[jira] [Commented] (SPARK-20843) Cannot gracefully kill drivers which take longer than 10 seconds to die

2017-05-26 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026841#comment-16026841 ] Michael Allman commented on SPARK-20843: bq. Will a per-cluster config be enough for your usage?

[jira] [Commented] (SPARK-20843) Cannot gracefully kill drivers which take longer than 10 seconds to die

2017-05-26 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026663#comment-16026663 ] Michael Allman commented on SPARK-20843: bq. I will say that I don't think its really safe to

[jira] [Commented] (SPARK-20888) Document HiveCaseSensitiveInferenceMode.INFER_AND_SAVE in Spark SQL 2.1 to 2.2 migration notes

2017-05-25 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025059#comment-16025059 ] Michael Allman commented on SPARK-20888: I will work on a PR for this and try to get it in ASAP.

[jira] [Created] (SPARK-20888) Document HiveCaseSensitiveInferenceMode.INFER_AND_SAVE in Spark SQL 2.1 to 2.2 migration notes

2017-05-25 Thread Michael Allman (JIRA)
Michael Allman created SPARK-20888: -- Summary: Document HiveCaseSensitiveInferenceMode.INFER_AND_SAVE in Spark SQL 2.1 to 2.2 migration notes Key: SPARK-20888 URL:

[jira] [Commented] (SPARK-20843) Cannot gracefully kill drivers which take longer than 10 seconds to die

2017-05-25 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025036#comment-16025036 ] Michael Allman commented on SPARK-20843: [~rxin] I'd like to bump this to "Critical". This is

[jira] [Created] (SPARK-20843) Cannot gracefully kill drivers which take longer than 10 seconds to die

2017-05-22 Thread Michael Allman (JIRA)
Michael Allman created SPARK-20843: -- Summary: Cannot gracefully kill drivers which take longer than 10 seconds to die Key: SPARK-20843 URL: https://issues.apache.org/jira/browse/SPARK-20843 Project:

[jira] [Updated] (SPARK-20331) Broaden support for Hive partition pruning predicate pushdown

2017-04-13 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-20331: --- Description: Spark 2.1 introduced scalable support for Hive tables with huge numbers of

[jira] [Created] (SPARK-20331) Broaden support for Hive partition pruning predicate pushdown

2017-04-13 Thread Michael Allman (JIRA)
Michael Allman created SPARK-20331: -- Summary: Broaden support for Hive partition pruning predicate pushdown Key: SPARK-20331 URL: https://issues.apache.org/jira/browse/SPARK-20331 Project: Spark

[jira] [Commented] (SPARK-5484) Pregel should checkpoint periodically to avoid StackOverflowError

2017-01-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828768#comment-15828768 ] Michael Allman commented on SPARK-5484: --- Hi Guys, @ding has rebased his PR, and it LGTM. Can a

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2017-01-13 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822248#comment-15822248 ] Michael Allman commented on SPARK-17993: [~emre.colak] FYI

[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2017-01-13 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822111#comment-15822111 ] Michael Allman commented on SPARK-4502: --- Hi Guys, I'm going to submit a PR for this shortly. We've

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2017-01-10 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816771#comment-15816771 ] Michael Allman commented on SPARK-17993: Cool. I'll work on a simple PR to silence those warnings

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2017-01-10 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816641#comment-15816641 ] Michael Allman commented on SPARK-17993: Also, if that doesn't work, add the following line as

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2017-01-10 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816629#comment-15816629 ] Michael Allman commented on SPARK-17993: Try adding the following line to

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2017-01-10 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815965#comment-15815965 ] Michael Allman commented on SPARK-17993: Hi Emre, Thanks for reporting this. To clarify, what do

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2017-01-01 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15791884#comment-15791884 ] Michael Allman commented on SPARK-17204: I'm 99% sure I've fixed this. I'll submit a PR in the

[jira] [Commented] (SPARK-18853) Project (UnaryNode) is way too aggressive in estimating statistics

2016-12-14 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749397#comment-15749397 ] Michael Allman commented on SPARK-18853: [~rxin] [~hvanhovell] Should we move the overridden

[jira] [Commented] (SPARK-18853) Project (UnaryNode) is way too aggressive in estimating statistics

2016-12-14 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748968#comment-15748968 ] Michael Allman commented on SPARK-18853: Yes, nested arrays. > Project (UnaryNode) is way too

[jira] [Commented] (SPARK-18853) Project (UnaryNode) is way too aggressive in estimating statistics

2016-12-14 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748962#comment-15748962 ] Michael Allman commented on SPARK-18853: I'll just add another issue with overestimating the

[jira] [Commented] (SPARK-18853) Project (UnaryNode) is way too aggressive in estimating statistics

2016-12-14 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748961#comment-15748961 ] Michael Allman commented on SPARK-18853: Should we link this to

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-13 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746382#comment-15746382 ] Michael Allman commented on SPARK-18676: Yeah, I was wondering how that would work with the

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-12 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743076#comment-15743076 ] Michael Allman commented on SPARK-18676: I'm sorry I have not had time to provide more

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-08 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734560#comment-15734560 ] Michael Allman commented on SPARK-18676: Ah okay. That might be a strategy to explore. > Spark

[jira] [Commented] (SPARK-18681) Throw Filtering is supported only on partition keys of type string exception

2016-12-06 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726464#comment-15726464 ] Michael Allman commented on SPARK-18681: [~rxin] I think this should be a blocker for 2.1. This

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-06 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726437#comment-15726437 ] Michael Allman commented on SPARK-18676: > maybe we could switch to ShuffleJoin when it realize

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-06 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726431#comment-15726431 ] Michael Allman commented on SPARK-18676: I'm spending some more time this week to understand

[jira] [Updated] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-06 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-18676: --- Description: Commit [c481bdf|https://github.com/apache/spark/commit/c481bdf] significantly

[jira] [Updated] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-01 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-18676: --- Description: Commit [c481bdf|https://github.com/apache/spark/commit/c481bdf] significantly

[jira] [Commented] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-01 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712823#comment-15712823 ] Michael Allman commented on SPARK-18676: cc [~davies] as author of

[jira] [Created] (SPARK-18676) Spark 2.x query plan data size estimation can crash join queries versus 1.x

2016-12-01 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18676: -- Summary: Spark 2.x query plan data size estimation can crash join queries versus 1.x Key: SPARK-18676 URL: https://issues.apache.org/jira/browse/SPARK-18676

[jira] [Issue Comment Deleted] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Comment: was deleted (was: I'm able to reproduce the problem with this configuration, and

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706130#comment-15706130 ] Michael Allman commented on SPARK-17204: I'm able to reproduce the problem with this

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706131#comment-15706131 ] Michael Allman commented on SPARK-17204: I'm able to reproduce the problem with this

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the {{OFF_HEAP}} storage level extensively with great success. We've

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706056#comment-15706056 ] Michael Allman commented on SPARK-17204: Is your question directed at me? The RDD storage blocks

[jira] [Comment Edited] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706031#comment-15706031 ] Michael Allman edited comment on SPARK-17204 at 11/29/16 6:02 PM: -- FYI

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706031#comment-15706031 ] Michael Allman commented on SPARK-17204: FYI I've noticed this remains an issue in the Spark

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-11-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the {{OFF_HEAP}} storage level extensively with great success. We've

[jira] [Updated] (SPARK-18572) Use the hive client method "getPartitionNames" to answer "SHOW PARTITIONS" queries on partitioned Hive tables

2016-11-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-18572: --- Description: Currently Spark answers the {{SHOW PARTITIONS}} query by fetching all of the

[jira] [Created] (SPARK-18572) Use the hive client method "getPartitionNames" to answer "SHOW PARTITIONS" queries on partitioned Hive tables

2016-11-23 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18572: -- Summary: Use the hive client method "getPartitionNames" to answer "SHOW PARTITIONS" queries on partitioned Hive tables Key: SPARK-18572 URL:

[jira] [Commented] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-22 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687386#comment-15687386 ] Michael Allman commented on SPARK-18507: We have three partition columns in this table of type

[jira] [Comment Edited] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678215#comment-15678215 ] Michael Allman edited comment on SPARK-18507 at 11/19/16 12:30 AM: --- CC

[jira] [Commented] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678215#comment-15678215 ] Michael Allman commented on SPARK-18507: CC [~ekhliang] > Major performance regression in SHOW

[jira] [Created] (SPARK-18507) Major performance regression in SHOW PARTITIONS on partitioned Hive tables

2016-11-18 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18507: -- Summary: Major performance regression in SHOW PARTITIONS on partitioned Hive tables Key: SPARK-18507 URL: https://issues.apache.org/jira/browse/SPARK-18507

[jira] [Commented] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2016-11-10 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655355#comment-15655355 ] Michael Allman commented on SPARK-17993: This patch will be part of Spark 2.1, but it looks like

[jira] [Updated] (SPARK-17993) Spark prints an avalanche of warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2016-11-07 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17993: --- Summary: Spark prints an avalanche of warning messages from Parquet when reading parquet

[jira] [Commented] (SPARK-17993) Spark spews a slew of harmless but annoying warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2016-11-07 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645495#comment-15645495 ] Michael Allman commented on SPARK-17993: Thank you for your input, Keith. I agree this is a major

[jira] [Updated] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-11-02 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-13127: --- Affects Version/s: (was: 1.6.0) 2.0.0 2.0.1

[jira] [Updated] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-11-02 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-13127: --- Priority: Major (was: Minor) > Upgrade Parquet to 1.9 (Fixes parquet sorting) >

[jira] [Created] (SPARK-18228) Enhance visibility of Spark wiki

2016-11-02 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18228: -- Summary: Enhance visibility of Spark wiki Key: SPARK-18228 URL: https://issues.apache.org/jira/browse/SPARK-18228 Project: Spark Issue Type:

[jira] [Created] (SPARK-18202) Spark throws a mysterious system error when a Hive command has at least 100,000 results

2016-11-01 Thread Michael Allman (JIRA)
Michael Allman created SPARK-18202: -- Summary: Spark throws a mysterious system error when a Hive command has at least 100,000 results Key: SPARK-18202 URL: https://issues.apache.org/jira/browse/SPARK-18202

[jira] [Commented] (SPARK-17990) ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names

2016-10-29 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15618686#comment-15618686 ] Michael Allman commented on SPARK-17990: Has a decision been made on how we want to handle this?

[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-10-28 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15617004#comment-15617004 ] Michael Allman commented on SPARK-17344: We (at VideoAmp) would love to use structured streaming

[jira] [Commented] (SPARK-17990) ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names

2016-10-21 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595569#comment-15595569 ] Michael Allman commented on SPARK-17990: The main problem as I see it is one of user experience.

[jira] [Commented] (SPARK-17993) Spark spews a slew of harmless but annoying warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586897#comment-15586897 ] Michael Allman commented on SPARK-17993: cc [~ekhliang] I think I have a fix for this. I'm going

[jira] [Commented] (SPARK-17992) HiveClient.getPartitionsByFilter throws an exception for some unsupported filters when hive.metastore.try.direct.sql=false

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586892#comment-15586892 ] Michael Allman commented on SPARK-17992: cc [~ekhliang] [~cloud_fan] >

[jira] [Commented] (SPARK-17990) ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586888#comment-15586888 ] Michael Allman commented on SPARK-17990: CC [~ekhliang] [~cloud_fan] > ALTER TABLE ... ADD

[jira] [Created] (SPARK-17993) Spark spews a slew of harmless but annoying warning messages from Parquet when reading parquet files written by older versions of Parquet-mr

2016-10-18 Thread Michael Allman (JIRA)
Michael Allman created SPARK-17993: -- Summary: Spark spews a slew of harmless but annoying warning messages from Parquet when reading parquet files written by older versions of Parquet-mr Key: SPARK-17993 URL:

[jira] [Created] (SPARK-17992) HiveClient.getPartitionsByFilter throws an exception for some unsupported filters when hive.metastore.try.direct.sql=false

2016-10-18 Thread Michael Allman (JIRA)
Michael Allman created SPARK-17992: -- Summary: HiveClient.getPartitionsByFilter throws an exception for some unsupported filters when hive.metastore.try.direct.sql=false Key: SPARK-17992 URL:

[jira] [Created] (SPARK-17990) ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names

2016-10-18 Thread Michael Allman (JIRA)
Michael Allman created SPARK-17990: -- Summary: ALTER TABLE ... ADD PARTITION does not play nice with mixed-case partition column names Key: SPARK-17990 URL: https://issues.apache.org/jira/browse/SPARK-17990

[jira] [Comment Edited] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586133#comment-15586133 ] Michael Allman edited comment on SPARK-17983 at 10/18/16 5:49 PM: --

[jira] [Commented] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586133#comment-15586133 ] Michael Allman commented on SPARK-17983: Hmmm... not sure what you mean. You talking about

[jira] [Commented] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586116#comment-15586116 ] Michael Allman commented on SPARK-17983: Speaking strictly from the POV of parquet predicate

[jira] [Comment Edited] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586018#comment-15586018 ] Michael Allman edited comment on SPARK-17983 at 10/18/16 5:23 PM: -- cc

[jira] [Commented] (SPARK-17983) Can't filter over mixed case parquet columns of converted Hive tables

2016-10-18 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586018#comment-15586018 ] Michael Allman commented on SPARK-17983: cc [~rxin] I had a feeling there might be some fallout

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the OFF_HEAP storage level extensively with great success. We've tried

[jira] [Comment Edited] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436218#comment-15436218 ] Michael Allman edited comment on SPARK-17204 at 8/25/16 3:36 AM: - I would

[jira] [Comment Edited] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436218#comment-15436218 ] Michael Allman edited comment on SPARK-17204 at 8/25/16 3:37 AM: - I would

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436218#comment-15436218 ] Michael Allman commented on SPARK-17204: I would think that, but `sc.range(0, 0)` throws the

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the OFF_HEAP storage level extensively with great success. We've tried

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the OFF_HEAP storage level extensively with great success. We've tried

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436202#comment-15436202 ] Michael Allman commented on SPARK-17204: Hi [~jerryshao]. I wonder if you're testing in local

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Comment Edited] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435767#comment-15435767 ] Michael Allman edited comment on SPARK-17231 at 8/24/16 11:29 PM: -- I've

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Attachment: master 2.jpg logging_perf_improvements 2.jpg > Avoid building

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Commented] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435767#comment-15435767 ] Michael Allman commented on SPARK-17231: Note that in the attached screenshots, all stats are the

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Attachment: logging_perf_improvements.jpg master.jpg > Avoid building debug

[jira] [Updated] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17231: --- Description: While debugging the performance of a large GraphX connected components

[jira] [Created] (SPARK-17231) Avoid building debug or trace log messages unless the respective log level is enabled

2016-08-24 Thread Michael Allman (JIRA)
Michael Allman created SPARK-17231: -- Summary: Avoid building debug or trace log messages unless the respective log level is enabled Key: SPARK-17231 URL: https://issues.apache.org/jira/browse/SPARK-17231

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory data corruption

2016-08-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Summary: Spark 2.0 off heap RDD persistence with replication factor 2 leads to in-memory

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption

2016-08-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the OFF_HEAP storage level extensively. We've tried off-heap storage

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption

2016-08-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433878#comment-15433878 ] Michael Allman commented on SPARK-17204: [~rxin] I rebuilt from master as of commit

[jira] [Commented] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption

2016-08-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433408#comment-15433408 ] Michael Allman commented on SPARK-17204: [~rxin] I'll give it a try. Thanks for the heads up. I

[jira] [Created] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption

2016-08-23 Thread Michael Allman (JIRA)
Michael Allman created SPARK-17204: -- Summary: Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption Key: SPARK-17204 URL: https://issues.apache.org/jira/browse/SPARK-17204

[jira] [Updated] (SPARK-17204) Spark 2.0 off heap RDD persistence with replication factor 2 leads to data corruption

2016-08-23 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-17204: --- Description: We use the OFF_HEAP storage level extensively. We've tried off-heap storage

[jira] [Updated] (SPARK-16980) Load only catalog table partition metadata required to answer a query

2016-08-09 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-16980: --- Description: Currently, when a user reads from a partitioned Hive table whose metadata are
