Taewoo Kim created ASTERIXDB-1743:
-------------------------------------

             Summary: Hash Join on 9-node is slower than conducting the same join on 1-node.
                 Key: ASTERIXDB-1743
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1743
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim

For the same amount of data, a simple hash join like the following AQL takes 2 hours and 30 minutes to finish on a 9-node cluster, while it takes 1 hour and 20 minutes on a single node. The data file is a 5.5 GB JSON file. The observed difference is that spilling happens on the 1-node setup and does not happen on the 9-node setup.

{code}
create type AmazonReviewType as open {
    id: uuid
}

create dataset AmazonReview9Mline(AmazonReviewType) primary key id auto generated;

omit load ...

count(
    for $o in dataset AmazonReview9Mline
    for $i in dataset AmazonReview9Mline
    where $o.reviewerID = $i.reviewerID and $o.id < $i.id
    return {"oid":$o.reviewerID, "iid":$i.reviewerID}
);
{code}

The following is a sample record.

{code}
{
    "reviewerID": "A2SUAM1J3GNN3B",
    "asin": "0000013714",
    "reviewerName": "J. McDonald",
    "helpful": [2, 3],
    "reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
    "overall": 5.0,
    "summary": "Heavenly Highway Hymns",
    "unixReviewTime": 1252800000,
    "reviewTime": "09 13, 2009"
}
{code}
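
For readers unfamiliar with AQL, the following is a minimal Python sketch of what the query above counts: pairs of reviews that share a reviewerID, with each unordered pair counted once because of the `$o.id < $i.id` condition. It is only an illustration of the query's semantics, not of AsterixDB's hash join implementation. The function names and the sample data are hypothetical, and small integers stand in for the auto-generated uuid keys (the sketch assumes ids are unique, as a primary key would be).

```python
from collections import Counter


def count_self_join_pairs(reviews):
    """Count pairs (o, i) with the same reviewerID and o.id < i.id.

    Assumes ids are unique, so a group of n reviews by one reviewer
    contributes n * (n - 1) / 2 pairs.
    """
    group_sizes = Counter(r["reviewerID"] for r in reviews)
    return sum(n * (n - 1) // 2 for n in group_sizes.values())


def count_self_join_pairs_naive(reviews):
    """Direct nested-loop translation of the AQL query, for cross-checking."""
    return sum(
        1
        for o in reviews
        for i in reviews
        if o["reviewerID"] == i["reviewerID"] and o["id"] < i["id"]
    )


# Hypothetical sample: integer ids stand in for uuids.
sample = [
    {"id": 1, "reviewerID": "A"},
    {"id": 2, "reviewerID": "A"},
    {"id": 3, "reviewerID": "A"},
    {"id": 4, "reviewerID": "B"},
    {"id": 5, "reviewerID": "C"},
    {"id": 6, "reviewerID": "C"},
]

print(count_self_join_pairs(sample))
print(count_self_join_pairs_naive(sample))
```

The grouped version makes visible that the number of joined pairs grows quadratically with the number of reviews per reviewerID, so the size of the join result depends on how the reviews are distributed across reviewers as well as on the input size.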