Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
Thanks all for the discussion and reviews!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Merged to master.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92223/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92223 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92223/testReport)**
for PR 21427 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92223 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92223/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/418/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Seems fine and I am okay with it.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92077/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92077 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92077/testReport)**
for PR 21427 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92077 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92077/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4208/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/314/
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92066/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92066 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92066/testReport)**
for PR 21427 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92066 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92066/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4202/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/308/
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92048/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92048 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92048/testReport)**
for PR 21427 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I updated to use a conf that reverts to column assignment by position,
regardless of the type of column labels in the Pandas DataFrame, and added
a test for this. I also put this in a
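
The two assignment modes described in this comment can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the PR: `assign_columns` and its `assign_by_name` flag are hypothetical names, with the flag standing in for the conf mentioned above, and lists stand in for DataFrame columns.

```python
from typing import Dict, Hashable, List, Sequence


def assign_columns(
    labels: Sequence[Hashable],
    columns: Sequence[List],
    schema_names: Sequence[str],
    assign_by_name: bool = True,
) -> Dict[str, List]:
    """Map returned columns onto schema fields.

    `assign_by_name` is a hypothetical stand-in for the conf discussed
    above; False forces positional assignment for any kind of label.
    """
    if len(columns) != len(schema_names):
        raise ValueError("column count does not match the schema")

    string_labels = all(isinstance(label, str) for label in labels)
    if assign_by_name and string_labels:
        # Look each schema field up by its label; a missing label raises KeyError.
        by_label = dict(zip(labels, columns))
        return {name: by_label[name] for name in schema_names}

    # Positional: the i-th returned column becomes the i-th schema field.
    return dict(zip(schema_names, columns))


# Returned columns are labelled in a different order than the schema.
labels = ["v", "id"]
columns = [[1.0, 2.0], [10, 20]]
schema = ["id", "v"]

by_name = assign_columns(labels, columns, schema)
by_position = assign_columns(labels, columns, schema, assign_by_name=False)
```

With string labels the two modes disagree whenever the returned order differs from the schema order, which is the case the fallback flag is meant to cover. Integer labels always take the positional branch in this sketch.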
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4187/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/293/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #92048 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92048/testReport)**
for PR 21427 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Okay .. but let's make sure this case was special ...
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I agree with you @HyukjinKwon but we have had lots of discussion here and
are still split, so let's just add the config. There is still the off chance
that someone upgrading from 2.3.0
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
2.3.1 wouldn't have this behaviour change and we marked this as
experimental. So, on the other hand, it probably will give more time to expose
that this is discouraged in production and there
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21427
> But we marked this as experimental ...
That's also special for this case, we marked it as experimental in 2.3.1.
Not a lot of behavior changes are similar to this one. To
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
> we can't just change the behavior. We think the old behavior doesn't make
sense and users should change their code, but users may not think in this way.
I think this basically means we
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Okay, but I get it can be smooth to go ahead. I am okay.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
But we marked this as experimental. If we treat old API and new
experimental API in the same way, I wonder why we have them. One thing I am
less clear about is what kind of scenario we are worried
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21427
Pandas UDF is already in 2 releases (2.3.0 and 2.3.1), we can't just change
the behavior. We think the old behavior doesn't make sense and users should
change their code, but users may not think
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
How about we just start to have configurations classified for each version
if it sounds better to have configurations for each behaviour change? For
example, we could have postfix like spark_23
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21427
@icexelloss @HyukjinKwon It is always simple to deprecate the confs in the
release of Spark 3.0. Let us make it configurable in the next release which is
Spark 2.4.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
It's at least not as trivial as on the Scala side. I am okay, but please
make sure which cases we will allow with this configuration.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21427
If adding the config is trivial, let's add it. We can pick the new behavior
by default.
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I would prefer if we could do this without a config because while the
current behavior can work if the user knows what they are doing, it can also
fail very easily and not obviously. So to me
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21427
I think the config switch is for maintaining backward compatibility in case
someone is hit by this, so I think it's a good idea.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Yea, I generally agree with that and I feel the same way. I think I had
a talk about this with @gatorsmile and @cloud-fan multiple times. Here is my
thought: we should not make a configuration
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
@HyukjinKwon I agree with you that 99% of people will almost certainly not use the
config. I think @gatorsmile's concern is that in the rare case that some
people are actually depending on the existing
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
I'm okay if that's the only way to get through here, but I must say I
wonder who's going to intentionally switch this off though. This now sounds
more like a bug or a design issue to be fixed
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
I ran into @ueshin and @gatorsmile at the Summit. It seems the preferable
way to move forward is to have a configuration to fall back to the existing
behavior and change the default behavior
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I made https://issues.apache.org/jira/browse/SPARK-2 to improve the
documentation of the current behavior, until we can resolve this.
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
@gatorsmile @ueshin Thanks for joining the discussion!
I wonder if you agree that at least for case (1) here
https://github.com/apache/spark/pull/21427#issuecomment-392070950, we should
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
Here are some examples that currently work, but would no longer work under
the proposed fix. These are all cases where columns are named with strings, but
the names do not match the schema (let
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
Thanks @ueshin and @gatorsmile for taking a look, I agree the proposed fix
changes some behavior, but I think that behavior is either error-prone or
doesn't make much sense. Let me put up some
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Thing is, we already do a similar thing via createDataFrame with list and
dictionary. I believe @icexelloss borrowed this idea from there:
```
>>> spark.createDataFrame([["a", 1],
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21427
Please let users decide whether they are resolved by names or by position.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
For configuration, I wasn't sure if we should send the whole configuration
map into worker.py side, if we should fix the command writing way, and also was
thinking of the current timezone way
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21427
I guess sending configurations is not that difficult.
We can write configs (as `Map[String, String]` for further configurations
in the future?) before `PythonUDFRunner.writeUDFs(dataOut, funcs,
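
One way to picture the suggestion above is a length-prefixed string map written ahead of the UDF payload and read back on the worker side. The sketch below simulates both ends in Python; the wire layout (big-endian 4-byte counts and lengths, UTF-8 bytes) and the names `write_confs` and `read_confs` are assumptions made for illustration, not Spark's actual protocol, and the conf key is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Dict


def write_utf(text: str, out: BinaryIO) -> None:
    """Write a string as a 4-byte big-endian length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    out.write(struct.pack("!i", len(data)))
    out.write(data)


def read_utf(inp: BinaryIO) -> str:
    """Read a string written by write_utf."""
    (length,) = struct.unpack("!i", inp.read(4))
    return inp.read(length).decode("utf-8")


def write_confs(confs: Dict[str, str], out: BinaryIO) -> None:
    """Sender side: entry count first, then each key and value."""
    out.write(struct.pack("!i", len(confs)))
    for key, value in confs.items():
        write_utf(key, out)
        write_utf(value, out)


def read_confs(inp: BinaryIO) -> Dict[str, str]:
    """Receiver side: rebuild the map before reading any UDF payload."""
    (count,) = struct.unpack("!i", inp.read(4))
    confs = {}
    for _ in range(count):
        key = read_utf(inp)
        confs[key] = read_utf(inp)
    return confs


stream = io.BytesIO()
write_confs({"example.assignByName": "false"}, stream)  # hypothetical key
stream.seek(0)
round_tripped = read_confs(stream)
```

Sending a generic string map rather than a single flag leaves room for further settings later, which is the point raised in the comment.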
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Yup, my impression was that there could be a corner case too but I wasn't
sure how much the corner case makes sense, and haven't checked it closely yet.
I believe elaborating the case might be
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/21427
I'm sorry for the late review, but I think the current fix is still
a behavior change.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91275/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91275 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3684/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91275 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)**
for PR 21427 at commit
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
@BryanCutler could you also add a note to the grouped_map documentation that
explains the behavior?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91167/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91167 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91167/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3595/
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
> what will happen if the given schema names are numbers? could we
recognise it?
The schema names will be integers as strings, so they are treated the same as
any string. This is only if the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91167 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91167/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91166/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91166 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91166/testReport)**
for PR 21427 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21427
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3594/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21427
**[Test build #91166 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91166/testReport)**
for PR 21427 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
SGTM if it works.
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
I think so:
```
>>> type(pd.DataFrame({'1': [1], '2': [2]}).columns)
>>> type(pd.DataFrame([[1, 2.0, "hello"]]).columns)
```
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
what will happen if the given schema names are numbers? could we recognise
it?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If we can fix it without breaking existing behavior that would be awesome.
On Fri, May 25, 2018 at 9:59 AM Bryan Cutler
wrote:
> I've been thinking about
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I've been thinking about this and came to the same conclusion as
@icexelloss here
https://github.com/apache/spark/pull/21427#issuecomment-392070950 that we could
really support both names and
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I opened https://issues.apache.org/jira/browse/SPARK-24392 to continue the
discussion about changing this to experimental. IMO it was a bit of an
oversight to not do so initially, but agree
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
BTW, we should really mark this as experimental and allow a few
behaviour changes. I guess that's what we meant by Experimental:
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Yea, that's what I meant. Having a configuration should be desirable but I
doubt if we should extend that way further. A one-time thing should be fine too.
I roughly guess that's going to be a min
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/21427
@rxin @gatorsmile thanks for joining the discussion!
On the configuration side, we have already some mechanism to do so for the
"timezone" config:
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
I need to take a look too but sounds possible. WDYT @BryanCutler? BTW, the
fix there should be the most appropriate place to fix since that's actually
where the problem was started.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
On the config part, I haven't looked at the code but can't we just reorder
the columns on the JVM side? Why do we need to reorder them on the Python
side?
On Fri, May 25, 2018 at
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
I believe it was just a mistake to correct - we forgot to mark it as
experimental. It's pretty unstable and many JIRAs are being opened. @BryanCutler
mind if I ask to go ahead if you find some
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
I agree it should have started as experimental. It is pretty weird to mark
something experimental after the fact though.
On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
BTW, what do you think about adding a blocker to set this feature as
experimental @rxin? I think it's a pretty new feature and it should be reasonable
to call it experimental.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21427
Because it needs to change our pickle protocol to access the
configuration, if I remember this correctly. cc @ueshin too.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
Why is it difficult?
On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon
wrote:
> but as I said it's difficult to have a configuration there. Shall we just