Alex Behm has posted comments on this change.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent 
paths.
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5743/1//COMMIT_MSG
Commit Message:

PS1, Line 22: The treatment of such
            : partitions is very inefficient.
> I could be wrong about it, but I don't think this is an accurate descriptio
You are exactly right. Modified comment.


http://gerrit.cloudera.org:8080/#/c/5743/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 263
> Oh never mind, i didn't read the change description carefully enough.
Glad to hear we arrived at the same fix :)


Line 266:         // unpartitioned table, or the path of a partition with a 
custom location.
> how does the 'custom location' part follow?
You are right, it does not always follow. Rephrased.


PS1, Line 698: // TODO: We can still do some advanced optimization by grouping 
all the partition
             :     // directories under the same ancestor path up the tree.
             :     List<Path> dirsToLoad = Lists.newArrayList(tblLocation);
> We already have the ability to compress partition paths by removing the com
1. This TODO refers to partitions with custom locations specifically. Those 
will never go through L274-280, but always through L267-269.

2. I'm not even convinced this optimization is easy/practical. At least it's 
complicated enough to investigate in a different JIRA/patch. For example, 
consider something like this:

HDFS paths:
/warehouse/staging/2017/a
/warehouse/staging/2017/b
/warehouse/staging/2017/c
/warehouse/staging/2016/a
/warehouse/staging/2016/b
/warehouse/staging/2016/c

Now let's way we have two partitions with these custom locations:
/warehouse/staging/2017/a
/warehouse/staging/2016/a

Their common ancestor is:
/warehouse/staging/

We will load all files under the common ancestor from the NN. Might be good or 
might be a disaster.

It doesn't sound trivial to make this optimization robust.


Line 760:       partsByPath.put(partDirPath, Lists.newArrayList(partition));
> do we know that on this path it will be a qualified path, or do we not care
Good point. As long as the qualification is consistent, we're ok. I'm inclined 
to leave it as is because qualification
complicates the code due to additional exception handling in various places 
(qualification can throw IOException). What do you think?


http://gerrit.cloudera.org:8080/#/c/5743/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 429:    * be qualified or unqualified paths for this function to behave as 
expected.
> add a precondition that enforces this?
I'm a little worried about that for the following reasons:

1. If there is still a bug lurking somewhere this may lead to failed table 
loads (Preconditions check will throw), instead of table loads possibly being 
slow.

2. The Preconditions check in itself is pretty expensive (see the new code).

3. If something with the path qualification is going wrong, then the modified 
log message in HdfsTable.loadMetadataAndDiskIds() will give us a good 
indication of that.

So instead of a Preconditions check I added a LOG message. How does that sound?


Line 433:     while (!p.isRoot() && p.depth() != parent.depth()) p = 
p.getParent();
> Now that we are here, should we consider fixing this method to make it more
I'd prefer to leave that as follow-on work because it requires additional 
investigation. Based on Mostafa's
profiling, we don't have evidence to indicate this function is a bottleneck, so 
why fix it if it's not broken?


-- 
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes

Reply via email to