shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1496311839
Thanks a lot @cloud-fan for the guidance and support in getting this issue
fixed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1495354346
Gentle ping @cloud-fan
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1492817996
@cloud-fan I have made the change. All tests have passed. Can you please
review?
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491954910
> If you really worry about regression, we can add a legacy config to fall
back to the old code. I don't agree to make code changes that only fix the
problem in one particular code
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491795763
> according to the [code in
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1486232865
@cloud-fan Can you please check my last comments?
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482819187
FWIW, both use cases were working fine in Spark 2.3.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482792058
> I think case 1 works by accident. It's not an intentional design. I don't
think it's a bug that case 2 doesn't work.
As I said in my previous comment:
Not sure about the
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482368131
> > It works because the resolved column has just one match
>
> But there are two id columns. Does Spark already do deduplication
somewhere?
Not sure about the
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480595916
> @shrprasa do you know how the case 1 works?
Yes. It works because the resolved column has just one match:
attributes: Vector(id#17)
But for the second case, the match
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480574606
> df3.select("id").show()
@cloud-fan The example you shared will behave the same even after this
fix. It will still give the ambiguous error.
The use case which the fix is trying
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480532873
> I second @srowen's view. cc @cloud-fan
Thanks @yaooqinn for replying. Can you please explain why you think it's not
the right fix?
The fix only proposes to remove
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480217186
Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please
review this PR or direct it to someone who can review this PR.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1476233946
Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please
review this PR or direct it to someone who can review this PR.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1466394445
> That's a "no" from me, per the logic above
Thanks @srowen, but it seems I am not able to explain the change to you. So it's
better to get a review from someone who is qualified to
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1465517104
Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please
review this PR or direct it to someone who is aware of this code.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1463230011
> Hm, I just don't see the logic in that. It isn't how SQL works either, as
far as I understand. Here's maybe another example, imagine a DataFrame defined
by `SELECT 3 as id, 3 as ID`.
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461303721
It's very much relevant, as this is the only case which requires the fix. If
they do not come from the same source, the plan will reflect that and it will
throw the ambiguous error even
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461288036
> Hm, how is it not ambiguous? When case insensitive, 'id' could mean one of
two different columns
It's not ambiguous because when we are selecting using list of column
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1461262849
> I don't get it, it is due to case sensitivity; that's why it becomes
ambiguous and that's what you see. The issue is that the error isn't super
helpful because it shows the
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459668923
> You first defined a case-sensitive data set, then queried in a
case-insensitive way, I guess the error is expected.
In the physical plan, both id and ID columns are projected to
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459652942
> Can you try `set spark.sql.caseSensitive=true`?
Yes, I have tried it. With caseSensitive set to true, it will work, as then
id and ID will be treated as separate columns.
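A minimal sketch, in plain Python rather than Spark, of how the
`spark.sql.caseSensitive` setting discussed above changes which column names
match a lookup for `id`. The `resolve` helper and the two-column list are
hypothetical and only model the name-matching step; this is not Spark's
resolver and says nothing about how the PR itself changes resolution.

```python
def resolve(name, columns, case_sensitive):
    """Return every column whose name matches `name` (hypothetical helper)."""
    if case_sensitive:
        # Exact comparison: "id" and "ID" are different names.
        return [c for c in columns if c == name]
    # Case-insensitive comparison: "id" and "ID" are the same name.
    return [c for c in columns if c.lower() == name.lower()]


# Assumed example schema with two columns that differ only by case.
columns = ["id", "ID"]

# Case-sensitive lookup finds a single candidate.
sensitive_matches = resolve("id", columns, case_sensitive=True)

# Case-insensitive lookup finds two candidates, which is where an
# ambiguous-reference error can come from.
insensitive_matches = resolve("id", columns, case_sensitive=False)

print(sensitive_matches)
print(insensitive_matches)
```

In this model the case-sensitive lookup yields one candidate and the
case-insensitive lookup yields two, which matches the behaviour described in
the comment above.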
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459535564
Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please
review this PR?
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1458159825
> I'm not sure about the change, not sure I'm qualified to review it. I
think at best the error message should change; I am not clear that the result
is 'wrong'
Thanks for
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1457585690
Gentle Ping @srowen @dongjoon-hyun
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1454348175
@srowen @dongjoon-hyun Can you please review this PR?