Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r102338189
--- Diff: python/pyspark/streaming/kinesis.py ---
@@ -37,7 +37,8 @@ class KinesisUtils(object):
def createStream(ssc, kinesisAppName, streamName
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r102338119
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -78,8 +70,9 @@ case class
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Anyone I can ping to help get this merged? The PR is going on a
month old at this point and I know that lack of STS support is an issue that
several interested parties would like to see get
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Pinging participants from #16797 once more to get any feedback on the new
proposal: @gatorsmile, @viirya, @ericl, @mallman and @cloud-fan
---
If your project is set up for it, you can reply
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Just for clarification, can this PR be merged as-is with a separate
Jira/PR for adding a builder interface or is the builder interface a
prerequisite for merging this?
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
@viirya I've updated the PR to include the initial catalog table checks
you've suggested in the
[```setupCaseSensitiveTable()```](https://github.com/apache/spark/pull/16944/files#diff
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
retest this please
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101908155
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101908105
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Pinging @viirya and @ericl to take a look at the updates per their feedback
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Fair enough. Let me know if there's anything I can do to help get
this merged. I can also take a look at adding a builder class for Kinesis
streams as a separate PR before the code freeze
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz, @zsxwing - Any update here? Worried that this PR is starting to
languish.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
I've updated the PR based on feedback received. Changes from previous
commit:
- Fixed a couple indent issues
- Clarify some HiveSchemaInferenceSuite comments and general cleanup
- Add
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101625724
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,21 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101606197
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -186,8 +212,7 @@ private[hive] class HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101605728
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101605711
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -163,6 +163,10 @@ case class BucketSpec(
* @param
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101560890
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101562475
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461535
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461357
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461155
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101460842
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101460565
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Looks like I missed a Catalyst test. Updating the PR.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
@mallman If I did close it then it was by mistake. The "Reopen and comment"
button was disabled with a message about the PR being closed by a force push
when I hovered over it. Afraid
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Re-pinging participants from #16797: @gatorsmile, @viirya, @ericl, @mallman
and @cloud-fan. Sorry for the noise.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16944
[SPARK-19611][SQL] Introduce configurable table schema inference
*Update: Accidentally broke #16942 via a force push. Opening a replacement
PR.*
Replaces #16797. See the discussion
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Accidentally did a force-push to my branch for this issue. Looks like I'll
have to open a new PR...
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/16942
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Tests appear to be failing due to the following error:
```
[info] Exception encountered when attempting to run a suite with class
name
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366583
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366441
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366307
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Pinging participants from #16797: @gatorsmile, @viirya, @ericl, @mallman
and @cloud-fan
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16942
[SPARK-19611][SQL] Introduce configurable table schema inference
Replaces #16797. See the discussion in this PR for more
details/justification for this change.
## Summary of changes
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Thanks for all the feedback on this PR, folks. I'm going to close this
PR/JIRA and open new ones for enabling configurable schema inference as a
fallback. I'll ping each of you who has been active
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/16797
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Any thoughts on moving the dependency version bump to a new commit
and backporting to 2.1.1 with the previous versions?
@zsxwing Any chance you could take a look at this sometime
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Would it be possible to backport to 2.1.1 if I reverted to the old
version of the KCL and made the dependency upgrade as a separate PR? We'd still
be adding ```aws-java-sdk-sts
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Thanks!
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
@mallman The Parquet schema merging methods take me back to #5214 :)
I haven't been following changes here very closely but I would guess use of
this method was replaced by the previously
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @brkyvz and @srowen once more for a final look and to get Jenkins
to retest the latest update (not sure if this still requires Jenkins admin
rights).
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
@cloud-fan:
> Spark does support mixed-case-schema tables, and it has always been. It's
because we write table schema to metastore case-preserving, via table
properties.
Spark pr
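The comment above notes that Spark preserves a mixed-case schema by writing it into the (case-lowering) metastore's table properties. As a hedged illustration of that general pattern, not Spark's actual code, the sketch below stashes the case-preserving column list as JSON in a properties map and falls back to the metastore's lower-cased names when the property is absent; the `spark.sql.sources.schema` key and helper names are assumptions for this example.

```python
import json

# Hypothetical sketch: the metastore lower-cases column names, so the
# case-preserving schema is stored as a JSON string in table properties
# and read back in preference to the metastore's own column list.
def save_table(properties, schema):
    # schema: list of mixed-case column names, e.g. ["userId", "eventTime"]
    properties["spark.sql.sources.schema"] = json.dumps(schema)

def load_schema(properties, metastore_columns):
    # Prefer the case-preserving copy; fall back to the lower-cased
    # metastore column names when the property was never written.
    raw = properties.get("spark.sql.sources.schema")
    return json.loads(raw) if raw is not None else metastore_columns

props = {}
save_table(props, ["userId", "eventTime"])
print(load_schema(props, ["userid", "eventtime"]))  # ['userId', 'eventTime']
```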
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> For better user experience, we should automatically infer the schema and
write it back to metastore, if there is no case-sensitive table schema in
metastore. This has the cost of detection the n
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Looks like Jenkins is failing to build any recent PR due to the following
error:
```[error] Could not find hadoop2.3 in the list. Valid options are
['hadoop2.6', 'hadoop2.7']```
I
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending the PR again to fix new dependency conflict in spark/pom.xml.
Thanks again for taking the time to review this, @brkyvz and @srowen. Please
let me know if you feel any additional changes
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> is it a completely compatibility issue? Seems like the only problem is,
when we write out mixed-case-schema parquet files directly, and create an
external table pointing to these files with Sp
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending PR per review feedback. Issue around using optional stsExternalId
argument in ```KinesisUtils.createStream()``` remains open.
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99909144
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99908239
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -34,11 +35,56 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99908125
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -449,22 +935,48 @@ private class
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99907831
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -449,22 +935,48 @@ private class
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99906733
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905835
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -34,11 +35,56 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905600
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -23,7 +23,8 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905664
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905577
--- Diff:
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
---
@@ -62,9 +62,20 @@ class KinesisReceiverSuite
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> Can we write such schema (conflicting columns after lower-casing) into
metastore?
I think the scenario here would be that the metastore contains a single
lower-case column name that co
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> BTW, what behavior do we expect if a parquet file has two columns whose
lower-cased names are identical?
I can take a look at how Spark handled this prior to 2.1, although I'm not
s
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> how about we add a new SQL command to refresh the table schema in
metastore by inferring schema with data files? This is a compatibility issue
and we should have provided a way for users to migr
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
PR has been amended to reflect feedback. Thanks for taking a look, @brkyvz.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> Should we roll these behaviors into one flag? e.g.
```spark.sql.hive.mixedCaseSchemaSupport```
That sounds reasonable to me. The only thing I wonder about is if there's
any use case wh
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99718950
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -35,10 +36,65 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99718545
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisExampleUtils.scala
---
@@ -0,0 +1,22 @@
+/*
+ * Licensed
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
I'll double check, but I don't think
```spark.sql.hive.manageFilesourcePartitions=false``` would solve this issue
since we're still deriving the file relation's dataSchema parameter from the
schema
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending this PR to upgrade the KCL/AWS SDK dependencies to more-current
versions (1.7.3 and 1.11.76, respectively). The
```RegionUtils.getRegionByEndpoint()``` API was removed from the SDK, so I've
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Bringing back schema inference is certainly a much cleaner option, although
I imagine doing this in the old manner would negate the performance
improvements brought by #14690 for any non-Spark 2.1
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99458106
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -268,13 +292,23 @@ private[parquet
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99456138
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -268,13 +292,23 @@ private[parquet
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99455967
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -249,10 +249,18 @@ object SQLConf {
val
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Looks like SparkR unit tests have been failing for all or most PRs after
[this
commit.](https://github.com/apache/spark/commit/48aafeda7db879491ed36fff89d59ca7ec3136fa)
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Relevant part of [Jenkins
output](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72326/console)
for SparkR tests:
```
Error: processing vignette 'sparkr-vignettes.Rmd
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Pinging @ericl, @cloud-fan and @davies, committers who have all reviewed or
submitted changes related to this.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16797
[SPARK-19455][SQL] Add option for case-insensitive Parquet field resolution
## What changes were proposed in this pull request?
**Summary**
- Add
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99217534
--- Diff: pom.xml ---
@@ -146,6 +146,8 @@
hadoop2
0.7.1
1.6.2
+
+1.10.61
--- End diff --
I believe
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @brkyvz as well, who also appears to have reviewed kinesis-asl
changes in the past
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
There shouldn't be any change to behavior or compatibility when using the
existing implementations of ```KinesisUtils.createStream()```. Only drawback I
can think of is this is making
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @zsxwing and @srowen, additional committers who have previously
reviewed kinesis-asl changes.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @tdas on this-- looks like you're the committer who has contributed
the most to kinesis-asl.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Also, on another note, the ```SerializableKCLAuthProvider``` class that
**SparkQA** is identifying as a new public class is actually package private
and replaced another package private class
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
The JIRA I opened for this issue contains further details and background.
Linking to it here for good measure:
* https://issues.apache.org/jira/browse/SPARK-19405
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Missed the code in python/streaming that this touches. Will update PR.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16744
[SPARK-19405][STREAMING] Support for cross-account Kinesis reads via STS
- Add dependency on aws-java-sdk-sts
- Replace SerializableAWSCredentials with new SerializableKCLAuthProvider
class
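The PR summary above replaces SerializableAWSCredentials with a new provider class. The usual design behind such a class is to keep only plain serializable fields (role ARN, external ID) and construct the non-serializable SDK client lazily after deserialization on the executor. A hedged Python sketch of that pattern, with a made-up class name and a stubbed-out credentials call in place of a real STS request:

```python
import pickle

# Hypothetical sketch of a "serializable auth provider": only plain
# fields are pickled; the real (non-serializable) client is rebuilt
# lazily on first use after deserialization.
class SerializableSTSProvider:
    def __init__(self, role_arn, external_id=None):
        self.role_arn = role_arn
        self.external_id = external_id
        self._client = None  # created lazily; never pickled

    def __getstate__(self):
        return {"role_arn": self.role_arn, "external_id": self.external_id}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._client = None

    def credentials(self):
        if self._client is None:
            # A real implementation would call STS assume_role(...) here.
            self._client = {"assumed": self.role_arn}
        return self._client

p = SerializableSTSProvider("arn:aws:iam::123456789012:role/reader")
p2 = pickle.loads(pickle.dumps(p))
print(p2.credentials())  # {'assumed': 'arn:aws:iam::123456789012:role/reader'}
```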
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178853877
From Jenkins output:
>Fetching upstream changes from https://github.com/apache/spark.git
> git --version # timeout=10
> git fetch --tags -
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/11012#discussion_r51645315
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -304,10 +309,9 @@ private[spark] class MemoryStore(blockManager:
BlockManager
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/11012#discussion_r51643796
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -304,10 +309,9 @@ private[spark] class MemoryStore(blockManager:
BlockManager
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178940544
Looks like a bunch of Spark SQL/Hive tests are failing due to this error:
>Caused by: sbt.ForkMain$ForkError: org.apache.spark.SparkException: Job
aborted
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178929153
Latest change is looking good on my end. No unroll memory is being leaked.
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178766741
Updated PR with new implementation that uses a counter variable instead of
requiring the whole method to be atomic.
---
If your project is set up for it, you can reply
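The comment above describes replacing whole-method atomicity with a counter variable. As a hedged sketch of that idea (not Spark's MemoryStore code; class and method names are invented), the tracker below updates a per-task reservation counter under a short-lived lock, so concurrent tasks cannot release memory they never reserved:

```python
import threading

# Hypothetical sketch: track reserved unroll memory per task in a
# counter guarded by a briefly-held lock, instead of making the entire
# unroll method atomic.
class UnrollMemoryTracker:
    def __init__(self, total):
        self._lock = threading.Lock()
        self._free = total
        self._reserved = {}  # task_id -> bytes reserved

    def reserve(self, task_id, amount):
        with self._lock:
            if amount > self._free:
                return False
            self._free -= amount
            self._reserved[task_id] = self._reserved.get(task_id, 0) + amount
            return True

    def release(self, task_id, amount):
        with self._lock:
            # Clamp so a task never releases more than it reserved.
            amount = min(amount, self._reserved.get(task_id, 0))
            self._reserved[task_id] -= amount
            self._free += amount

tracker = UnrollMemoryTracker(total=100)
assert tracker.reserve("task-1", 60)
assert not tracker.reserve("task-2", 60)  # insufficient free memory
tracker.release("task-1", 60)
assert tracker.reserve("task-2", 60)
```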
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178314141
Pinging @andrewor14 , the original implementor of unrollSafely(), for any
potential feedback.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/11012
[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()
https://issues.apache.org/jira/browse/SPARK-13122
A race condition can occur in MemoryStore's unrollSafely() method if two
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27315420
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27311560
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27332712
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27332969
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/5214
[SPARK-6538][SQL] Add missing nullable Metastore fields when merging a
Parquet schema
Opening to replace #5188.
When Spark SQL infers a schema for a DataFrame, it will take the union of
all
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/5188
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/5188#issuecomment-86625383
Thanks for the input, @marmbrus and @liancheng. I'll resolve the conflicts
and open a new PR against master.
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/5214#issuecomment-86699105
Taking a look at why these tests failed.