Some additional info to add to Mike's comment:
I discussed this "skew-resistant parallel join" with Mike yesterday and
briefly checked some papers. The common strategy appears to be to split the
block of tuples sharing the same join key on one side (e.g., R) across
several nodes and to broadcast the matching tuples from the other side to
those nodes, so that every joining pair still meets on exactly one node.
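The split-one-side / broadcast-the-other idea can be sketched as follows. This is an illustrative toy in Python, not AsterixDB code; the node count, key names, and helper functions are all invented for the example:

```python
# Toy sketch of skew-resistant partitioning for an equijoin of R and S.
# Heavy keys from R are split round-robin across all nodes, and the matching
# S tuples are broadcast to those nodes; light keys use plain hash
# partitioning. All names here are illustrative, not AsterixDB internals.

NUM_NODES = 3
HEAVY_KEYS = {"popular"}  # keys known (e.g., via sampling) to be skewed

def partition_r(tuples):
    """Assign each R tuple (key, payload) to exactly one node."""
    out = {n: [] for n in range(NUM_NODES)}
    rr = 0
    for key, payload in tuples:
        if key in HEAVY_KEYS:
            out[rr % NUM_NODES].append((key, payload))  # split the heavy key
            rr += 1
        else:
            out[hash(key) % NUM_NODES].append((key, payload))
    return out

def partition_s(tuples):
    """Broadcast S tuples with a heavy key; hash-partition the rest."""
    out = {n: [] for n in range(NUM_NODES)}
    for key, payload in tuples:
        if key in HEAVY_KEYS:
            for n in range(NUM_NODES):  # broadcast to every node
                out[n].append((key, payload))
        else:
            out[hash(key) % NUM_NODES].append((key, payload))
    return out

def local_join(r_part, s_part):
    """Hash join on each node; the union of node results is the global join."""
    results = []
    for n in range(NUM_NODES):
        table = {}
        for key, payload in r_part[n]:
            table.setdefault(key, []).append(payload)
        for key, s_payload in s_part[n]:
            for r_payload in table.get(key, []):
                results.append((key, r_payload, s_payload))
    return results

r = [("popular", f"r{i}") for i in range(6)] + [("rare", "r_x")]
s = [("popular", "s0"), ("rare", "s_y")]
joined = local_join(partition_r(r), partition_s(s))
# Each R tuple with the heavy key meets its S match on exactly one node,
# so no result is lost or duplicated even though the heavy key was split.
```

The key property is that splitting one side is only safe because the other side is replicated to every node that received a fragment of the heavy key.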
To all, just to clarify: this is a self-join (equijoin) query on a
non-key attribute over real data (Amazon reviews; the join key is the
reviewer id), which has a non-uniform value distribution in the number of
entries per join key value. In this case we really (someday...) need a
more skew-resistant join strategy.
Interesting. Thanks for filing the issue as well!
Cheers,
Till
On 7 Dec 2016, at 15:08, Taewoo Kim wrote:
In short, the reason why one specific node among the 9 nodes didn't stop
its hash-join job was skew (out of 9M records, 40,000 records contained
the same join key), as Abdullah suggested. Thanks all for the
information. Our system works as expected in this case! Along the
@Abdullah: Thanks. I missed your e-mail and just checked that. Will try.
Best,
Taewoo
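For illustration, here is a toy sketch of why plain hash partitioning concentrates all duplicates of a join key on a single node, leaving that node running long after the others finish. Only the 9-node and 40,000-duplicate figures come from the thread; the key names and other counts are invented:

```python
# With hash partitioning, every tuple sharing a join key hashes to the same
# node. A key with 40,000 duplicates therefore lands entirely on one node,
# which in a self-join must produce on the order of 40,000^2 result pairs
# locally while the other nodes sit idle. Counts below are illustrative.

NUM_NODES = 9

def node_of(key):
    return hash(key) % NUM_NODES

# 1,000 light keys with one record each, plus one heavy key with 40,000.
keys = [f"reviewer_{i}" for i in range(1000)] + ["heavy_reviewer"] * 40000

load = {n: 0 for n in range(NUM_NODES)}
for k in keys:
    load[node_of(k)] += 1

hot = node_of("heavy_reviewer")
# The node owning the heavy key holds at least 40,000 tuples; every other
# node holds only its share of the 1,000 light keys.
```

This matches the observed behavior: 8 nodes finish quickly, and the one that drew the heavy key keeps joining for hours.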
On Fri, Dec 2, 2016 at 10:32 AM, abdullah alamoudi
wrote:
> Taewoo,
> You can use the diagnostics endpoint (/admin/diagnostics) to look at all
> the stack traces from a single interface.
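A minimal sketch of fetching such a diagnostics endpoint over HTTP. The host and port are placeholders, and `fetch_diagnostics` is a hypothetical helper written for this example, not part of any shipped tooling:

```python
# Hypothetical helper for pulling a cluster controller's diagnostics
# (e.g., the /admin/diagnostics endpoint mentioned above) in one request.
# Replace host/port with your CC's actual HTTP address.
from urllib.request import urlopen

def diagnostics_url(host, port):
    # Build the endpoint URL for the given CC address.
    return f"http://{host}:{port}/admin/diagnostics"

def fetch_diagnostics(host, port, timeout=10):
    # Returns the raw response body (stack traces, etc.) as text.
    with urlopen(diagnostics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Example (requires a running cluster controller):
# print(fetch_diagnostics("cc-host", 19002))
```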
Additional note: @Till: Yes, it happened again for the same hash-join
query. As the following CC.log excerpt shows, one node alone kept
executing for two hours.
Dec 01, 2016 10:41:56 PM
org.apache.hyracks.control.cc.scheduler.ActivityClusterPlanner
planActivityCluster
INFO: Plan
@Ian: I have a separate CC on one node that doesn't have an NC. YourKit
might be a good way to find the reason. Thanks.
@Till: I think so. I am sending the same query now to see what happens this
time.
Best,
Taewoo
On Thu, Dec 1, 2016 at 10:41 PM, Till Westmann wrote:
> Hi