Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
closing now, will revisit for Spark 3.0
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional com
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
holding off is fine; however, I am less sure about the configuration if
that's not something you guys feel strongly.
---
-
T
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
@HyukjinKwon , I was also thinking about holding off on this until 3.0.0
and then make a clean switch. What do you think about that @holdenk ?
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
Probably that'd work but also it'd be trickier to add / remove that
configuration. Another similar option maybe just close this for now and target
this for 3.0.0 since we already started to talk
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/20280
I'm worried that people might have two rows with different meanings but the
same type and their application will start producing garbage #s. I think a lot
of people go from RDDs of Rows to DFs in Py
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Thanks @HyukjinKwon and @felixcheung , I'm a bit worried too that this
might break someones code, but it doesn't affect `createDataFrame` from `Row`s,
it's only when the Row is serialized like g
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
@BryanCutler, mind if I ask to clarify what happens for end-to-end cases in
the PR description (like before & after with explaining the reasons)? the
change looks small but possibly a breaking c
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
BTW, I believe it's not so easy to pass a configuration from a very quick
look because the exception usually would be thrown in a Python worker process.
---
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
If the renaming scenario works in most of cases as expected, I think it'd
be worthwhile to have a configuration; however, the previous behaviour looks
actually odd because it's going to work onl
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/20280
I'm kinda worry the example you give above is actually fairly common -
construct with kwargs, and then (re-)name the columns.
perhaps worthwhile to consider a config switch?
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
Right. Will triple check for sure but I am with you for now. Yup, something
in the migration guide makes much more sense to me too.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
oops, I missed this. will take a look shortly.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additio
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
@HyukjinKwon @holdenk and @MrBago have any thoughts on moving forward with
this change?
---
-
To unsubscribe, e-mail: review
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Also, this will cause a breaking change if `Row`s are defined with kwargs
and schema changes field names, like this:
```
data = [Row(key=i, value=str(i)) for i in range(100)]
rdd
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
> I think we should raise an error if __from_dict__ is set and the user
tries to index using a position or a slice.
I'd also like to follow up with another PR to address some of the usab
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89530/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #89530 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)**
for PR 20280 at commit
[`10bf2d0`](https://github.com/apache/spark/commit/1
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Let me restate what I think the intended behavior of Row is:
If a `Row` is made from kwargs, then the order of the fields can not be
relied upon and whenever accessing data, it must be d
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Looking at this again, I'm back to thinking this is the right fix. Based
on #14469, if the `Row` objects were made with named arguments, then the intent
is for elements to be looked up by field
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2452/
Tes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #89530 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)**
for PR 20280 at commit
[`10bf2d0`](https://github.com/apache/spark/commit/10
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/20280
Awesome, looking forward to the update.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional command
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Hey @holdenk , yeah I've been meaning to circle back to this and get some
kind of resolution. I'll try to take another look later this week.
---
--
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/20280
Hey @BryanCutler is this still on your radar?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional c
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
Thanks @HyukjinKwon and @MrBago for reviewing. After thinking about this
some more, I don't think this is the right solution. Like @HyukjinKwon pointed
out, the supplied schema names should al
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
^ Yup, let's leave the performance issue out. I think we might have to
raise an error too but it's kind of a radical change.
As a note, sorted fields are documented:
https://gi
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20280
BTW the performance issue is orthogonal to the serialization issue raised
in this jira/PR. Maybe we should avoid scope creep in this thread.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20280
Ahh. @zero323 too if you are available because I think we had a talk about
this somewhere long time ago. Yea, I am aware of the issue itself. Will take a
look soon.
---
--
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20280
I think we should raise an error if `__from_dict__` is set and the user
tries to index using a position or a slice.
Indexing by field name takes the same code path for Rows that are and are
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86205/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86205 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)**
for PR 20280 at commit
[`a7d3396`](https://github.com/apache/spark/commit/a
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86205 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)**
for PR 20280 at commit
[`a7d3396`](https://github.com/apache/spark/commit/a7
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86204 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)**
for PR 20280 at commit
[`2192e49`](https://github.com/apache/spark/commit/2
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86204/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86204 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)**
for PR 20280 at commit
[`2192e49`](https://github.com/apache/spark/commit/21
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86193 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)**
for PR 20280 at commit
[`315b8de`](https://github.com/apache/spark/commit/3
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86193/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20280
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional comma
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
@MrBago @HyukjinKwon I think the above behavior of the `Row` class is a
little screwy, but at least this fixes it to be more consistent. I'm not sure
if there is a way to rectify the two differ
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/20280
After looking into this, it seems like the behavior of the `Row` class is
as follows:
If a `Row` is made from kwargs, then the order of the fields can not be
relied upon and whenever ac
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20280
**[Test build #86193 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)**
for PR 20280 at commit
[`315b8de`](https://github.com/apache/spark/commit/31
45 matches
Mail list logo