[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5707/1//COMMIT_MSG
Commit Message:

PS1, Line 7: IMPALA-4733
> Wrong jira.
Done


PS1, Line 17: simpe
> nit: typo
Done


PS1, Line 22: Testing
> The JIRA said that the code had to be modified to run the test. Is there a 
It's a race so reproducing was easier by adding a sleep() because our small 
test tables load very quickly making this bug hard to hit and measure in a 
reasonable time.

The manual test also involves inspecting the output of jstack due to a lack of 
other metrics.

The behavior correct behavior is tricky to test, let me think about it some 
more. Open to suggestions.


http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

Line 280: tableLoadingSet_.remove(tblName);
> Understood, thanks for the explanation. Agree there is still a race here.
I originally didn't want to address this race to minimize the changes, but it 
seems ok. Fixed it as suggested.


PS1, Line 282: tblName
> Looks like a thrift object. Just wanted to make sure its printed correctly 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

Line 280: tableLoadingSet_.remove(tblName);
> Right, but there's no synchronisation across tableLoadingSet_ and loadingTa
Understood, thanks for the explanation. Agree there is still a race here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

Also, to be clear, I don't think the race is a show-stopper, since this is a 
clear improvement and the code has other issues that prevent us achieving 
perfection, but I think we should at least document that it's possible.

-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5707/1//COMMIT_MSG
Commit Message:

PS1, Line 22: Testing
The JIRA said that the code had to be modified to run the test. Is there a way 
to add a test that we can put in the custom_cluter tests or e2e tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

Line 280: tableLoadingSet_.remove(tblName);
> tableLoadingSet_ seems to be a synchronized (thread-safe) one?
Right, but there's no synchronisation across tableLoadingSet_ and 
loadingTable_. 

So if two threads are in here at the same time and get the same table, there's 
no guarantee that they don't both end up waiting. E.g. the following sequence 
of events is possible for two threads T1 and T2

T1: remove table1 from tableLoadingSet_
T2: remove table1 from tableLoadingSet_
T1: check loadingTables_ -> not present
T2: check loadingTables_ -> not present
T1: call getOrLoadTable() for table1
T2: call getOrLoadTable() for table1
... time elapses
load starts and table1 is added to loadingTables_


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

Line 280: tableLoadingSet_.remove(tblName);
> There's still a window for a race here if multiple threads pull the same en
tableLoadingSet_ seems to be a synchronized (thread-safe) one?

 private final Set tableLoadingSet_ =
  Collections.synchronizedSet(new HashSet());


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5707/1//COMMIT_MSG
Commit Message:

PS1, Line 7: IMPALA-4733
Wrong jira.


PS1, Line 17: simpe
nit: typo


http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

PS1, Line 282: tblName
Looks like a thrift object. Just wanted to make sure its printed correctly :)


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5707/1/fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
File fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java:

Line 280: tableLoadingSet_.remove(tblName);
There's still a window for a race here if multiple threads pull the same entry 
off the queue at the same time. I think we can avoid it if we change the data 
structures slightly.

How about we change loadingTables_ to a map from TTableName to an atomic 
integer, and only only remove tables from the map once the load finishes?

That way, the first loading thread to flip the atomic integer from 0 to 1 wins 
the race and does the load. If another thread doesn't find the entry it knows 
that a load completed after it was last added. If it finds the integer set to 
1, it knows that a load is in progress.

Not ideal but it makes the race impossible and keeps this mechanism separate 
from loadingTables_, which has its own lifecycle.


-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4733: Avoid using several loading threads on one table.

2017-01-13 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5707

Change subject: IMPALA-4733: Avoid using several loading threads on one table.
..

IMPALA-4733: Avoid using several loading threads on one table.

When there are multiple concurrent requests to the catalogd to
prioritize loading the same table, then several catalog loading
threads may end up waiting for that single table to be loaded,
effectively reducing the number of catalog loading threads. In
extreme examples, this might degrade to serial loading of tables.

The existing design and data structures in the catalog service
make this bug somewhat tricky to fix completely, so this patch adds
a simpe fix that addresses the most common and severe manifestations
of this bug. It prevents wasting several threads on long-running table
loads by checking the map of in-progress table loads before waiting on
the future of that table.

Testing: I could easily reproduce the bug locally with the steps
described in the JIRA. After this patch, I could not observe threads
being wasted anymore.

Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
---
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
1 file changed, 7 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/5707/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idba5f1808e0b9cbbcf46245834d8ad38d01231cb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm