[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22140
  
We are very conservative when backporting the PR to the released version. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-10 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
> Thanks for your understanding. Normally, we are very conservative to 
introduce any potential behavior change to the released version.

Yes, I know. It seemed to me at the time as failing fast rather than later 
and improving the error message, but best to be safe. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22140
  
@BryanCutler @HyukjinKwon Thanks for your understanding. Normally, we are 
very conservative to introduce any potential behavior change to the released 
version. 

I just reverted it from branch 2.3. Thanks! 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-10 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
> Can we just simply take this out from branch-2.3?

Thanks @HyukjinKwon , that is fine with me. What do you think @gatorsmile ?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22140
  
Yea, actually I wouldn't at least backport this to branch-2.3 since the 
release is very close. Looks a bug to me as well.

One nitpicking is the case with RDD operation:

```python
>>> from pyspark.sql import Row
>>> row_class = Row("c1", "c2")
>>> row = row_class(1, 2, 3)
>>> spark.sparkContext.parallelize([row]).map(lambda r: r.c1).collect()
[1]
```

This is really unlikely and I even wonder if it makes any sense, but still 
there might be a case although the creation of the namedtuple like row itself 
should be disallowed, as fixed here.

Can we just simply take this out from branch-2.3?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-08 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
@gatorsmile it seemed like a straightforward bug to me. Rows with extra 
values lead to incorrect output and exceptions when used in `DataFrames`, so it 
did not seem like there was any possible this would break existing code. For 
example

```
In [1]: MyRow = Row('a','b')

In [2]: print(MyRow(1,2,3))
Row(a=1, b=2)

In [3]: spark.createDataFrame([MyRow(1,2,3)])
Out[3]: DataFrame[a: bigint, b: bigint]

In [4]: spark.createDataFrame([MyRow(1,2,3)]).show()
18/09/08 21:55:48 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.IllegalStateException: Input row doesn't have expected number of 
values required by the schema. 2 fields are required while 3 values are 
provided.

In [5]: spark.createDataFrame([MyRow(1,2,3)], schema="x: int, y: 
int").show()

ValueError: Length of object (3) does not match with length of fields (2)
```
Maybe I was too hasty with backporting and this needed some discussion. Do 
you know of a use case that this change would break?





---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-08 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22140
  
```
@xuanyuanking Could you please update the document?
```
#22369 Thanks for reminding, I'll pay attention in future work.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22140
  
@BryanCutler What is the reason to backport this PR? This sounds a behavior 
change. 

@xuanyuanking Could you please update the document?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22140
  
Thanks @BryanCutler @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
merged to master, branch 2.4 and 2.3. Thanks @xuanyuanking !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
> yea, to me it looks less sense actually but seems at least working for 
now:

good point, I guess it only fails when you supply a schema.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95756/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22140
  
**[Test build #95756 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95756/testReport)**
 for PR 22140 at commit 
[`eb3f506`](https://github.com/apache/spark/commit/eb3f506817e6cb99230853ffd5c50e3299527d4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22140
  
**[Test build #95756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95756/testReport)**
 for PR 22140 at commit 
[`eb3f506`](https://github.com/apache/spark/commit/eb3f506817e6cb99230853ffd5c50e3299527d4b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2901/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22140
  
@BryanCutler, for 
https://github.com/apache/spark/pull/22140#issuecomment-414802978, yea, to me 
it looks less sense actually but seems at least working for now:

```python
from pyspark.sql import Row
rowClass = Row("c1", "c2")
spark.createDataFrame([rowClass(1)]).show()
```

```
+---+
| c1|
+---+
|  1|
+---+
```

I think we should consider disallowing it in 3.0.0 given the test above.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-09-05 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22140
  
gental ping @HyukjinKwon @BryanCutler 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-22 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22140
  
AFAIC, the fix should forbid illegal extra value passing. If less values 
than fields it should get a `AttributeError` while accessing as the currently 
implement, not ban it here? What do you think :) @HyukjinKwon @BryanCutler 
Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-21 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/22140
  
Does it make any sense to have less values than fields? Maybe we should 
check that they are equal, wdyt @HyukjinKwon ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22140
  
cc @BryanCutler as well since we discussed an issue about this code path 
before.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22140
  
cc @HyukjinKwon


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94920/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22140
  
**[Test build #94920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94920/testReport)**
 for PR 22140 at commit 
[`b8c6522`](https://github.com/apache/spark/commit/b8c6522bccde51584e9878144924fd7b92f8785f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2296/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22140
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row

2018-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22140
  
**[Test build #94920 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94920/testReport)**
 for PR 22140 at commit 
[`b8c6522`](https://github.com/apache/spark/commit/b8c6522bccde51584e9878144924fd7b92f8785f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org