GitHub user brkyvz opened a pull request:
https://github.com/apache/spark/pull/15942
[SPARK-18407] Inferred partition columns cause assertion error in
StructuredStreaming
## What changes were proposed in this pull request?
It turns out we are overly eager to hand users partition columns when they
read data, even when they didn't specify those columns in their schema.
This causes an assertion error in streaming jobs, because the `Attribute`s of a
given trigger don't match the `Attribute`s returned by the DataSource: the
DataSource always returns the additional inferred partition columns.
This is odd behavior for batch as well, IMHO, because someone asked for a
specific schema and we returned them something else. Apparently this behavior
has existed since Spark 1.6; I didn't try older versions. Anyway, I tried
fixing this by no longer enforcing a strict size check and instead picking out
the columns that we want from the batch DataSource.
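
The difference between a strict size check and picking columns by name can be
sketched in plain Python. This is a simplified model under stated assumptions,
not the code in this patch: `Attribute`, `strict_match`, and `select_requested`
are hypothetical names, and the columns `value`, `year`, and `month` are made up
for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a column reference: a name plus a unique id."""
    name: str
    expr_id: int


def strict_match(expected: List[Attribute],
                 returned: List[Attribute]) -> List[Tuple[Attribute, Attribute]]:
    """Pair columns positionally and fail if the counts differ.

    Models the strict size check: any extra column from the source is an error.
    """
    if len(expected) != len(returned):
        raise AssertionError(
            f"expected {len(expected)} columns, source returned {len(returned)}")
    return list(zip(returned, expected))


def select_requested(expected: List[Attribute],
                     returned: List[Attribute]) -> List[Tuple[Attribute, Attribute]]:
    """Pair each requested column with the source column of the same name.

    Extra columns returned by the source (for example, inferred partition
    columns) are ignored; a requested column that is missing is still an error.
    """
    by_name = {attr.name: attr for attr in returned}
    missing = [attr.name for attr in expected if attr.name not in by_name]
    if missing:
        raise ValueError(f"source did not return requested columns: {missing}")
    return [(by_name[attr.name], attr) for attr in expected]


if __name__ == "__main__":
    # The user's schema asks only for `value`; the source also returns the
    # partition columns it inferred from the directory layout.
    expected = [Attribute("value", 1)]
    returned = [Attribute("value", 10), Attribute("year", 11), Attribute("month", 12)]

    try:
        strict_match(expected, returned)
    except AssertionError as err:
        print("strict check failed:", err)

    print("selected by name:", select_requested(expected, returned))
```

Matching by name keeps the failure for genuinely missing columns while
tolerating the extra ones, which is the trade-off described above.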
## How was this patch tested?
Regression test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/brkyvz/spark filesource-part-bug
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15942.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15942
----
commit ca0bf68e7269bc74da923a2f228bdf43b1bc868c
Author: Burak Yavuz <[email protected]>
Date: 2016-11-18T19:55:42Z
save
try fix
fix
commit 6578cc34cd9f6938a98361047bee61d1ab4e08fb
Author: Burak Yavuz <[email protected]>
Date: 2016-11-19T01:43:13Z
fixed
commit ed2c3f92d45d5075a475d83c79e45672b3aad794
Author: Burak Yavuz <[email protected]>
Date: 2016-11-19T03:06:25Z
better debug message
commit 8465aca7dfce72f4141e4bec241bc833a2e4a83c
Author: Burak Yavuz <[email protected]>
Date: 2016-11-20T03:23:54Z
ready for review
commit c2c2cd5890a38ac3848d724fbe10b24a7cd44ad6
Author: Burak Yavuz <[email protected]>
Date: 2016-11-20T03:25:22Z
make test a bit more complex
commit 879c6e1449074badeb6da73fb10fdd6efcb5838c
Author: Burak Yavuz <[email protected]>
Date: 2016-11-20T03:29:55Z
make test a bit more complex
----