Taewoo Kim created ASTERIXDB-1743:
-------------------------------------

             Summary: Hash Join on 9-node is slower than conducting the same join on 1-node.
                 Key: ASTERIXDB-1743
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1743
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


For the same amount of data, a simple hash join like the following AQL takes 
2 hours and 30 minutes to finish on 9 nodes, but only 1 hour and 20 minutes on 
1 node. The data file is a 5.5 GB JSON file. The observed difference is that 
the join spills to disk on 1 node, while no spilling happens on 9 nodes.

{code}
create type AmazonReviewType as open {
        id: uuid
}

create dataset AmazonReview9Mline(AmazonReviewType) primary key id autogenerated;

omit load ...

count(
for $o in dataset AmazonReview9Mline
for $i in dataset AmazonReview9Mline
where $o.reviewerID = $i.reviewerID and $o.id < $i.id
return {"oid":$o.reviewerID, "iid":$i.reviewerID}
);
{code}
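
To make the query's semantics concrete, here is a minimal Python sketch of what it counts. This is not AsterixDB or Hyracks code: the function name `count_self_join_pairs` and the tiny `sample` list are made up for illustration, integer ids stand in for the uuid keys, and every record is assumed to carry a `reviewerID` field. It builds an in-memory hash table on `reviewerID`, probes it with the same dataset, and keeps only pairs where the outer id is smaller than the inner id.

```python
from collections import defaultdict


def count_self_join_pairs(records):
    """Count pairs (o, i) with o.reviewerID == i.reviewerID and o.id < i.id."""
    # Build phase: hash each record's id into a bucket keyed by reviewerID.
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["reviewerID"]].append(rec["id"])

    # Probe phase: each record looks up its own bucket and keeps only
    # the matches whose id is strictly larger (the "$o.id < $i.id" filter).
    total = 0
    for rec in records:
        for other_id in buckets[rec["reviewerID"]]:
            if rec["id"] < other_id:
                total += 1
    return total


# Hypothetical sample: reviewer "A" has 3 reviews, "B" has 2, "C" has 1.
sample = [
    {"id": 1, "reviewerID": "A"},
    {"id": 2, "reviewerID": "A"},
    {"id": 3, "reviewerID": "A"},
    {"id": 4, "reviewerID": "B"},
    {"id": 5, "reviewerID": "B"},
    {"id": 6, "reviewerID": "C"},
]

print(count_self_join_pairs(sample))
```

With unique ids, a reviewer who wrote n reviews contributes n(n-1)/2 pairs, so the join output grows quadratically with the number of reviews per reviewer.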

The following is a sample record.

{code}
{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
