GitHub user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/9012#issuecomment-149370054
  
    Hi @felixcheung and @sun-rui, I've addressed your suggestions. Thanks!
    
    Also, I've noticed a general problem with the Cartesian product.
    Let's say my df is:
      age  name_a
    1  NA Michael
    2  30    Andy
    3  19  Justin
    
    df2 is:
    
        name_b test
    1 Michael  yes
    2    Andy   no
    3  Justin  yes
    4     Bob  yes
    
    When I do join(df, df2), I expect to see the Cartesian product. That is, the
output should have nrow(df)*nrow(df2) rows; see also
https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
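The following sketch makes the expected row count concrete. It is written in plain Python, not SparkR, and it is purely illustrative. It represents the two example tables above as lists of tuples, with None standing in for R's NA. The cartesian_join helper is hypothetical. It pairs every row of df with every row of df2 using itertools.product, so the result has 3 * 4 = 12 rows.

```python
from itertools import product

# The two example tables from this comment, as lists of row tuples.
# Hypothetical representation for illustration; None stands in for R's NA.
df = [(None, "Michael"), (30, "Andy"), (19, "Justin")]            # age, name_a
df2 = [("Michael", "yes"), ("Andy", "no"), ("Justin", "yes"), ("Bob", "yes")]  # name_b, test


def cartesian_join(left, right):
    """Hypothetical helper: a keyless cross join pairing every left row with every right row."""
    return [l_row + r_row for l_row, r_row in product(left, right)]


result = cartesian_join(df, df2)

# A Cartesian product must have nrow(df) * nrow(df2) rows.
assert len(result) == len(df) * len(df2)
print(len(result))  # 12
```

For comparison, the base R merge() documentation linked above states that when by is NULL (or of length 0), the result is the Cartesian product of x and y, with nrow(x)*nrow(y) rows.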
    
    But what I see is:
    
      age    name test
    1  NA Michael  yes
    2  NA    Andy   no
    3  NA  Justin  yes
    4  NA     Bob  yes
    5  30 Michael  yes
    6  30    Andy   no
    
    This is fewer rows than expected (nrow(df)*nrow(df2) = 3*4 = 12).
    
    Could you please verify this?
    
    Thanks,
    Narine
    
    
    


