Hi Meena,

It's not impossible, but it's unlikely that there's a bug in Spark SQL randomly duplicating rows. The most likely explanation is that there are more records in the item table matching your sys/custumer_id/scode criteria than you expect. A join returns one output row for every matching pair, so each extra match in item adds another copy of the rev row.
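To illustrate the effect, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Spark (the join semantics in question are standard SQL). The table contents, the amount and item_name columns, and the row values are all made up for illustration; only the table names and join keys come from your query. It shows one rev row fanning out to four output rows, plus a group-by check that lists join keys appearing more than once in item.

```python
import sqlite3

# Hypothetical miniature tables; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table rev  (sys text, custumer_id text, scode text, amount integer);
    create table item (sys text, custumer_id text, scode text, item_name text);

    insert into rev values ('A', '123456789', 'S1', 100);

    -- Four item rows share the same join key, although only two were expected.
    insert into item values
        ('A', '123456789', 'S1', 'item-1'),
        ('A', '123456789', 'S1', 'item-2'),
        ('A', '123456789', 'S1', 'item-3'),
        ('A', '123456789', 'S1', 'item-4');
""")

# The left join emits one copy of the rev row per matching item row.
joined = conn.execute("""
    select rev.*
    from rev
    left join item I
      on rev.sys = I.sys
     and rev.custumer_id = I.custumer_id
     and rev.scode = I.scode
    where rev.custumer_id = '123456789'
""").fetchall()

# Diagnostic: which join keys occur more than once in item?
key_counts = conn.execute("""
    select sys, custumer_id, scode, count(*) as n
    from item
    group by sys, custumer_id, scode
    having count(*) > 1
""").fetchall()

print(len(joined))   # 4
print(key_counts)    # [('A', '123456789', 'S1', 4)]
```

The same group-by/having query should work unchanged in Spark SQL against the real item table, and it is usually the quickest way to confirm how many rows each key really has.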
In your original query, try changing select rev.* to select I.*. This will show you the records from item that the join produces. If the first part of the code only returns one record, I expect you will see 4 distinct records returned here.

Thanks,
Patrick

On Sun, Oct 22, 2023 at 1:29 AM Meena Rajani <meenakraj...@gmail.com> wrote:

> Hello all:
>
> I am using Spark SQL to join two tables. To my surprise I am
> getting redundant rows. What could be the cause?
>
> select rev.* from rev
> inner join customer c
>   on rev.custumer_id = c.id
> inner join product p
>   on rev.sys = p.sys
>   and rev.prin = p.prin
>   and rev.scode = p.bcode
> left join item I
>   on rev.sys = I.sys
>   and rev.custumer_id = I.custumer_id
>   and rev.scode = I.scode
> where rev.custumer_id = '123456789'
>
> The first part of the code brings one row:
>
> select rev.* from rev
> inner join customer c
>   on rev.custumer_id = c.id
> inner join product p
>   on rev.sys = p.sys
>   and rev.prin = p.prin
>   and rev.scode = p.bcode
>
> The item table has two rows which have common attributes, and the
> *final join should result in 2 rows. But I am seeing 4 rows instead.*
>
> left join item I
>   on rev.sys = I.sys
>   and rev.custumer_id = I.custumer_id
>   and rev.scode = I.scode
>
> Regards,
> Meena