[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22332
  
Thanks guys.

On Thu, Sep 6, 2018 at 2:12 AM Hyukjin Kwon 
wrote:

> Thanks, @wmellouli .
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22332
  
Thanks, @wmellouli.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread wmellouli
Github user wmellouli commented on the issue:

https://github.com/apache/spark/pull/22332
  
PR closed: we can use select to add new columns in a user-defined position.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22332
  
If that's easily worked around, let's not add this one. There are too many 
APIs open now and we should rather try to reduce them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread wmellouli
Github user wmellouli commented on the issue:

https://github.com/apache/spark/pull/22332
  
@HyukjinKwon even instead of using the actual method `withColumn(colName: 
String, col: Column)` we can just add a column and select. The idea from this 
PR is to add more power/flexibility to withColumn method to cover more use 
cases, without affecting performance or backward compatibility.
IMO using withColumn is more natural and hides adding/replacing + select 
logic, in one operation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22332
  
> What you suggested does not manage replacing existing column content.

I think we can still just add a column and select ... no?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-05 Thread wmellouli
Github user wmellouli commented on the issue:

https://github.com/apache/spark/pull/22332
  
@HyukjinKwon Thank you for your review. To answer to your question about 
using `select`, take a look at my explaination 
[here](https://github.com/apache/spark/pull/22332#issuecomment-418526562) to 
@jaceklaskowski (he asked about the same question 
[here](https://github.com/apache/spark/pull/22332#issuecomment-418485371)). 
In addition I took into consideration @mgaido91 suggestion 
[here](https://github.com/apache/spark/pull/22332#issuecomment-418445688). So 
what do you think about this new version ?

Someone can run tests please ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-05 Thread wmellouli
Github user wmellouli commented on the issue:

https://github.com/apache/spark/pull/22332
  
@jaceklaskowski I refactored with what you suggested in your review. Let me 
know what you think.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22332
  
Can't we simply `select` after the the column is added? I wouldn't add this 
as well - it can look confusing to be honest IMO.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-04 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/22332
  
I also can't find a strong reason to append a new API in `Dataset`... btw, 
to add a new API there, you'd be better to discuss in jira before making a pr, 
I think. cc: @rxin @cloud-fan @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-04 Thread wmellouli
Github user wmellouli commented on the issue:

https://github.com/apache/spark/pull/22332
  
@mgaido91 Thank you for your suggestion, I updated the PR name, description 
and sources with a new version using a parameter `atPosition` instead of a flag 
`atTheEnd`. Let me know what you think about this new implementation.

@jaceklaskowski Thank your for your review. But here I'm discussing about 
the `withColumn` method  that allows **adding and/or replacing** existing 
columns with new column content. What you suggested does not manage replacing 
existing column content. The idea is to make the `withColumn` method more 
flexible with keeping the backward compatibility. Actually the `withColumn` 
method is useful only for one use case: add (maybe replace) column at the end. 
I changed the implementation with @mgaido91 suggestion to make `withColumn` 
useful for more use cases. Let me know what you think about this new 
implementation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org