Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r102338189
--- Diff: python/pyspark/streaming/kinesis.py ---
@@ -37,7 +37,8 @@ class KinesisUtils(object):
def createStream(ssc, kinesisAppName, streamName
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r102338119
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -78,8 +70,9 @@ case class
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Anyone I can ping to help get this merged? The PR is going on a
month old at this point and I know that lack of STS support is an issue that
several interested parties would like to see get
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Pinging participants from #16797 once more to get any feedback on the new
proposal: @gatorsmile, @viirya, @ericl, @mallman and @cloud-fan
---
If your project is set up for it, you can reply
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Just for clarification, can this PR be merged as-is with a separate
Jira/PR for adding a builder interface or is the builder interface a
prerequisite for merging this?
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
@viirya I've updated the PR to include the initial catalog table checks
you've suggested in the
[```setupCaseSensitiveTable()```](https://github.com/apache/spark/pull/16944/files#diff
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
retest this please
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101908155
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101908105
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Pinging @viirya and @ericl to take a look at the updates per their feedback
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Fair enough. Let me know if there's anything I can do to help get
this merged. I can also take a look at adding a builder class for Kinesis
streams as a separate PR before the code freeze
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz, @zsxwing - Any update here? Worried that this PR is starting to
languish.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
I've updated the PR based on feedback received. Changes from previous
commit:
- Fixed a couple indent issues
- Clarify some HiveSchemaInferenceSuite comments and general cleanup
- Add
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101625724
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,21 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101606197
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -186,8 +212,7 @@ private[hive] class HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101605728
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101605711
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -163,6 +163,10 @@ case class BucketSpec(
* @param
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101560890
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101562475
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461535
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461357
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101461155
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101460842
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -161,23 +161,49 @@ private[hive] class
HiveMetastoreCatalog
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r101460565
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Looks like I missed a Catalyst test. Updating the PR.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
@mallman If I did close it then it was by mistake. The "Reopen and comment"
button was disabled with a message about the PR being closed by a force push
when I hovered over it. Afraid
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16944
Re-pinging participants from #16797: @gatorsmile, @viirya, @ericl, @mallman
and @cloud-fan. Sorry for the noise.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16944
[SPARK-19611][SQL] Introduce configurable table schema inference
*Update: Accidentally broke #16942 via a force push. Opening a replacement
PR.*
Replaces #16797. See the discussion
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Accidentally did a force-push to my branch for this issue. Looks like I'll
have to open a new PR...
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/16942
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Tests appear to be failing due to the following error:
```
[info] Exception encountered when attempting to run a suite with class
name
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366583
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366441
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -296,6 +296,17 @@ object SQLConf {
.longConf
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16942#discussion_r101366307
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala
---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16942
Pinging participants from #16797: @gatorsmile, @viirya, @ericl, @mallman
and @cloud-fan
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16942
[SPARK-19611][SQL] Introduce configurable table schema inference
Replaces #16797. See the discussion in this PR for more
details/justification for this change.
## Summary of changes
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Thanks for all the feedback on this PR, folks. I'm going to close this
PR/JIRA and open new ones for enabling configurable schema inference as a
fallback. I'll ping each of you who has been active
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/16797
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Any thoughts on moving the dependency version bump to a new commit
and backporting to 2.1.1 with the previous versions?
@zsxwing Any chance you could take a look at this sometime
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Would it be possible to backport to 2.1.1 if I reverted to the old
version of the KCL and made the dependency upgrade as a separate PR? We'd still
be adding ```aws-java-sdk-sts
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
@brkyvz Thanks!
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
@mallman The Parquet schema merging methods take me back to #5214 :)
I haven't been following changes here very closely but I would guess use of
this method was replaced by the previously
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @brkyvz and @srowen once more for a final look and to get Jenkins
to retest the latest update (not sure if this still requires Jenkins admin
rights).
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
@cloud-fan:
> Spark does support mixed-case-schema tables, and it has always been. It's
because we write table schema to metastore case-preserving, via table
properties.
Spark pr
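The comment above notes that Spark preserves a mixed-case schema by writing it into the (case-lowering) metastore's table properties. As a hedged illustration of that general pattern, not Spark's actual code, the sketch below stashes the case-preserving column list as JSON in a properties map and falls back to the metastore's lower-cased names when the property is absent; the `spark.sql.sources.schema` key and helper names are assumptions for this example.

```python
import json

# Hypothetical sketch: the metastore lower-cases column names, so the
# case-preserving schema is stored as a JSON string in table properties
# and read back in preference to the metastore's own column list.
def save_table(properties, schema):
    # schema: list of mixed-case column names, e.g. ["userId", "eventTime"]
    properties["spark.sql.sources.schema"] = json.dumps(schema)

def load_schema(properties, metastore_columns):
    # Prefer the case-preserving copy; fall back to the lower-cased
    # metastore column names when the property was never written.
    raw = properties.get("spark.sql.sources.schema")
    return json.loads(raw) if raw is not None else metastore_columns

props = {}
save_table(props, ["userId", "eventTime"])
print(load_schema(props, ["userid", "eventtime"]))  # ['userId', 'eventTime']
```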
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> For better user experience, we should automatically infer the schema and
write it back to metastore, if there is no case-sensitive table schema in
metastore. This has the cost of detection the n
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Looks like Jenkins is failing to build any recent PR due to the following
error:
```[error] Could not find hadoop2.3 in the list. Valid options are
['hadoop2.6', 'hadoop2.7']```
I
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending the PR again to fix new dependency conflict in spark/pom.xml.
Thanks again for taking the time to review this, @brkyvz and @srowen. Please
let me know if you feel any additional changes
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> is it a completely compatibility issue? Seems like the only problem is,
when we write out mixed-case-schema parquet files directly, and create an
external table pointing to these files with Sp
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending PR per review feedback. Issue around using optional stsExternalId
argument in ```KinesisUtils.createStream()``` remains open.
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99909144
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99908239
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -34,11 +35,56 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99908125
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -449,22 +935,48 @@ private class
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99907831
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -449,22 +935,48 @@ private class
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99906733
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905835
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -34,11 +35,56 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905600
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -23,7 +23,8 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905664
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala
---
@@ -123,9 +123,143 @@ object KinesisUtils
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99905577
--- Diff:
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisReceiverSuite.scala
---
@@ -62,9 +62,20 @@ class KinesisReceiverSuite
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> Can we write such schema (conflicting columns after lower-casing) into
metastore?
I think the scenario here would be that the metastore contains a single
lower-case column name that co
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> BTW, what behavior do we expect if a parquet file has two columns whose
lower-cased names are identical?
I can take a look at how Spark handled this prior to 2.1, although I'm not
s
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> how about we add a new SQL command to refresh the table schema in
metastore by inferring schema with data files? This is a compatibility issue
and we should have provided a way for users to migr
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
PR has been amended to reflect feedback. Thanks for taking a look, @brkyvz.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
> Should we roll these behaviors into one flag? e.g.
```spark.sql.hive.mixedCaseSchemaSupport```
That sounds reasonable to me. The only thing I wonder about is if there's
any use case wh
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99718950
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
---
@@ -35,10 +36,65 @@ import
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99718545
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisExampleUtils.scala
---
@@ -0,0 +1,22 @@
+/*
+ * Licensed
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
I'll double check, but I don't think
```spark.sql.hive.manageFilesourcePartitions=false``` would solve this issue
since we're still deriving the file relation's dataSchema parameter from the
schema
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Amending this PR to upgrade the KCL/AWS SDK dependencies to more-current
versions (1.7.3 and 1.11.76, respectively). The
```RegionUtils.getRegionByEndpoint()``` API was removed from the SDK, so I've
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Bringing back schema inference is certainly a much cleaner option, although
I imagine doing this in the old manner would negate the performance
improvements brought by #14690 for any non-Spark 2.1
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99458106
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -268,13 +292,23 @@ private[parquet
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99456138
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -268,13 +292,23 @@ private[parquet
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16797#discussion_r99455967
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -249,10 +249,18 @@ object SQLConf {
val
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Looks like SparkR unit tests have been failing for all or most PRs after
[this
commit.](https://github.com/apache/spark/commit/48aafeda7db879491ed36fff89d59ca7ec3136fa)
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Relevant part of [Jenkins
output](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72326/console)
for SparkR tests:
```
Error: processing vignette 'sparkr-vignettes.Rmd
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16797
Pinging @ericl, @cloud-fan and @davies, committers who have all reviewed or
submitted changes related to this.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16797
[SPARK-19455][SQL] Add option for case-insensitive Parquet field resolution
## What changes were proposed in this pull request?
**Summary**
- Add
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16744#discussion_r99217534
--- Diff: pom.xml ---
@@ -146,6 +146,8 @@
hadoop2
0.7.1
1.6.2
+
+1.10.61
--- End diff --
I believe
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @brkyvz as well, who also appears to have reviewed kinesis-asl
changes in the past
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
There shouldn't be any change to behavior or compatibility when using the
existing implementations of ```KinesisUtils.createStream()```. Only drawback I
can think of is this is making
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @zsxwing and @srowen, additional committers who have previously
reviewed kinesis-asl changes.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Pinging @tdas on this-- looks like you're the committer who has contributed
the most to kinesis-asl.
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Also, on another note, the ```SerializableKCLAuthProvider``` class that
**SparkQA** is identifying as a new public class is actually package private
and replaced another package private class
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
The JIRA I opened for this issue contains further details and background.
Linking to it here for good measure:
* https://issues.apache.org/jira/browse/SPARK-19405
Github user budde commented on the issue:
https://github.com/apache/spark/pull/16744
Missed the code in python/streaming that this touches. Will update PR.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/16744
[SPARK-19405][STREAMING] Support for cross-account Kinesis reads via STS
- Add dependency on aws-java-sdk-sts
- Replace SerializableAWSCredentials with new SerializableKCLAuthProvider
class
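The PR summary above replaces SerializableAWSCredentials with a new provider class. The usual design behind such a class is to keep only plain serializable fields (role ARN, external ID) and construct the non-serializable SDK client lazily after deserialization on the executor. A hedged Python sketch of that pattern, with a made-up class name and a stubbed-out credentials call in place of a real STS request:

```python
import pickle

# Hypothetical sketch of a "serializable auth provider": only plain
# fields are pickled; the real (non-serializable) client is rebuilt
# lazily on first use after deserialization.
class SerializableSTSProvider:
    def __init__(self, role_arn, external_id=None):
        self.role_arn = role_arn
        self.external_id = external_id
        self._client = None  # created lazily; never pickled

    def __getstate__(self):
        return {"role_arn": self.role_arn, "external_id": self.external_id}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._client = None

    def credentials(self):
        if self._client is None:
            # A real implementation would call STS assume_role(...) here.
            self._client = {"assumed": self.role_arn}
        return self._client

p = SerializableSTSProvider("arn:aws:iam::123456789012:role/reader")
p2 = pickle.loads(pickle.dumps(p))
print(p2.credentials())  # {'assumed': 'arn:aws:iam::123456789012:role/reader'}
```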
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178853877
From Jenkins output:
>Fetching upstream changes from https://github.com/apache/spark.git
> git --version # timeout=10
> git fetch --tags -
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/11012#discussion_r51645315
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -304,10 +309,9 @@ private[spark] class MemoryStore(blockManager:
BlockManager
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/11012#discussion_r51643796
--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -304,10 +309,9 @@ private[spark] class MemoryStore(blockManager:
BlockManager
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178940544
Looks like a bunch of Spark SQL/Hive tests are failing due to this error:
>Caused by: sbt.ForkMain$ForkError: org.apache.spark.SparkException: Job
aborted
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178929153
Latest change is looking good on my end. No unroll memory is being leaked.
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178766741
Updated PR with new implementation that uses a counter variable instead of
requiring the whole method to be atomic.
---
If your project is set up for it, you can reply
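The comment above describes replacing whole-method atomicity with a counter variable. As a hedged sketch of that idea (not Spark's MemoryStore code; class and method names are invented), the tracker below updates a per-task reservation counter under a short-lived lock, so concurrent tasks cannot release memory they never reserved:

```python
import threading

# Hypothetical sketch: track reserved unroll memory per task in a
# counter guarded by a briefly-held lock, instead of making the entire
# unroll method atomic.
class UnrollMemoryTracker:
    def __init__(self, total):
        self._lock = threading.Lock()
        self._free = total
        self._reserved = {}  # task_id -> bytes reserved

    def reserve(self, task_id, amount):
        with self._lock:
            if amount > self._free:
                return False
            self._free -= amount
            self._reserved[task_id] = self._reserved.get(task_id, 0) + amount
            return True

    def release(self, task_id, amount):
        with self._lock:
            # Clamp so a task never releases more than it reserved.
            amount = min(amount, self._reserved.get(task_id, 0))
            self._reserved[task_id] -= amount
            self._free += amount

tracker = UnrollMemoryTracker(total=100)
assert tracker.reserve("task-1", 60)
assert not tracker.reserve("task-2", 60)  # insufficient free memory
tracker.release("task-1", 60)
assert tracker.reserve("task-2", 60)
```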
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/11012#issuecomment-178314141
Pinging @andrewor14 , the original implementor of unrollSafely(), for any
potential feedback.
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/11012
[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()
https://issues.apache.org/jira/browse/SPARK-13122
A race condition can occur in MemoryStore's unrollSafely() method if two
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27315420
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27311560
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27332712
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/5214#discussion_r27332969
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -775,6 +777,32 @@ private[sql] object ParquetRelation2 extends Logging
GitHub user budde opened a pull request:
https://github.com/apache/spark/pull/5214
[SPARK-6538][SQL] Add missing nullable Metastore fields when merging a
Parquet schema
Opening to replace #5188.
When Spark SQL infers a schema for a DataFrame, it will take the union of
all
Github user budde closed the pull request at:
https://github.com/apache/spark/pull/5188
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/5188#issuecomment-86625383
Thanks for the input, @marmbrus and @liancheng. I'll resolve the conflicts
and open a new PR against master.
Github user budde commented on the pull request:
https://github.com/apache/spark/pull/5214#issuecomment-86699105
Taking a look at why these tests failed.