Re: Spark join produce duplicate rows in resultset

2023-10-27 Thread Meena Rajani
Thanks all:

Patrick selected rev.* and I.* cleared the confusion. The Item actually
brought 4 rows hence the final result set had 4 rows.

Regards,
Meena

On Sun, Oct 22, 2023 at 10:13 AM Bjørn Jørgensen 
wrote:

> alos remove the space in rev. scode
>
> søn. 22. okt. 2023 kl. 19:08 skrev Sadha Chilukoori <
> sage.quoti...@gmail.com>:
>
>> Hi Meena,
>>
>> I'm asking to clarify, are the *on *& *and* keywords optional in the
>> join conditions?
>>
>> Please try this snippet, and see if it helps
>>
>> select rev.* from rev
>> inner join customer c
>> on rev.custumer_id =c.id
>> inner join product p
>> on rev.sys = p.sys
>> and rev.prin = p.prin
>> and rev.scode= p.bcode
>>
>> left join item I
>> on rev.sys = I.sys
>> and rev.custumer_id = I.custumer_id
>> and rev. scode = I.scode;
>>
>> Thanks,
>> Sadha
>>
>> On Sat, Oct 21, 2023 at 3:21 PM Meena Rajani 
>> wrote:
>>
>>> Hello all:
>>>
>>> I am using spark sql to join two tables. To my surprise I am
>>> getting redundant rows. What could be the cause.
>>>
>>>
>>> select rev.* from rev
>>> inner join customer c
>>> on rev.custumer_id =c.id
>>> inner join product p
>>> rev.sys = p.sys
>>> rev.prin = p.prin
>>> rev.scode= p.bcode
>>>
>>> left join item I
>>> on rev.sys = i.sys
>>> rev.custumer_id = I.custumer_id
>>> rev. scode = I.scode
>>>
>>> where rev.custumer_id = '123456789'
>>>
>>> The first part of the code brings one row
>>>
>>> select rev.* from rev
>>> inner join customer c
>>> on rev.custumer_id =c.id
>>> inner join product p
>>> rev.sys = p.sys
>>> rev.prin = p.prin
>>> rev.scode= p.bcode
>>>
>>>
>>> The  item has two rows which have common attributes  and the* final
>>> join should result in 2 rows. But I am seeing 4 rows instead.*
>>>
>>> left join item I
>>> on rev.sys = i.sys
>>> rev.custumer_id = I.custumer_id
>>> rev. scode = I.scode
>>>
>>>
>>>
>>> Regards,
>>> Meena
>>>
>>>
>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>


Re: [Structured Streaming] Joins after aggregation don't work in streaming

2023-10-27 Thread Andrzej Zera
Hi, thank you very much for an update!

Thanks,
Andrzej

On 2023/10/27 01:50:35 Jungtaek Lim wrote:

> Hi, we are aware of your ticket and plan to look into it. We can't say
> about ETA but just wanted to let you know that we are going to look into
> it. Thanks for reporting!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Fri, Oct 27, 2023 at 5:22 AM Andrzej Zera 
> wrote:
>
>> Hey All,
>>
>> I'm trying to reproduce the following streaming operation: "Time window
>> aggregation in separate streams followed by stream-stream join". According
>> to documentation, this should be possible in Spark 3.5.0 but I had no
>> success despite different tries.
>>
>> Here is a documentation snippet I'm trying to reproduce:
>> https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995
>>
>> I created an issue with more details but no one responded yet:
>> https://issues.apache.org/jira/browse/SPARK-45637
>>
>> Thank you!
>> Andrzej
>>
>