[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-12-08 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has abandoned this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> How can we separate them though? The new API returns a UUID.
oops, i didn't see that. in that case, shipping those around would cause an 
immediate size regression. so we need to translate them into ordinals.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> in this patch, let's focus on using the new api, rather than shrinking the 
How can we separate them though? The new API returns a UUID.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-07 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> Agree, that's even better. We need to group the UUIDs by host and then assi
in this patch, let's focus on using the new api, rather than shrinking the disk 
ids down to their minimum size. that's probably best handled in a follow-on.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 437:   BlockLocation[] locations = fs.getFileBlockLocations(file, 0, 
file.getLen());
> still using the old API?
I don't think this change is complete.
This call stack need to be moved to the new API as well.

com.cloudera.impala.catalog.TableLoader.load(Db,String)
com.cloudera.impala.catalog.HdfsTable.load(boolean,IMetaStoreClient,Table)
com.cloudera.impala.catalog.HdfsTable.load(boolean,IMetaStoreClient,Table,boolean,boolean,Set)
com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List,Table)
com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor,Partition,Map)
com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path,boolean,HdfsFileFormat,Map)
com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem,FileStatus,HdfsPartition$FileDescriptor,HdfsFileFormat,Map)
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus,long,long)
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(Path,long,long)
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystem,Path)
org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(Path)
org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(Path)
org.apache.hadoop.hdfs.DFSClient.getBlockLocations(String,long,long)
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(String,long,long)
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(ClientProtocol,String,long,long)


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> why would we ship the uuid around, or a 16-byte int for that matter? why no
Agree, that's even better. We need to group the UUIDs by host and then assign 
the disk ids.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> Agree and that's basically what I think I recommended in my comments here. 
why would we ship the uuid around, or a 16-byte int for that matter? why not 
stick to 1-byte ints right after we get the disk uuids and convert them?


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> there is no need to do what you just described.
Agree and that's basically what I think I recommended in my comments here. We 
can shop the UUID around as a TUniqueId though (and not as a string).


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> the UUIDs are generated randomly and we hash those, so it's possible we get
there is no need to do what you just described.

what we should do:
- map uuids to per-node ordinals. in your example, that machine would see disk 
ordinals 0-7
- use the disk ordinal to directly address the corresponding disk queue

with this approach, only servers with >=127 disks would see degradation.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> it's very unusual for a server to have more than 128 disks (i don't think w
the UUIDs are generated randomly and we hash those, so it's possible we get a 
series of disk IDs which when modded all land on the same disk queue

for example, we could get hashes 8, 16, 24, 32, etc. on a machine with 8 disks

Of course, in aggregate these collisions are unlikely, but like you said a 
single machine has few disks, so I think this effect can easily happen on a 
single machine.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> In the BE we mod the disk id to determine which disk queue to put reads on,
it's very unusual for a server to have more than 128 disks (i don't think we've 
ever seen one), so from that perspective i'm not worried about using fewer bits 
to represent the disk "ordinal".

keep in mind that we really don't care about distinguishing disks globally. 
using a designated "out of range" value (such as 127) if a node should really 
exceed 127 disks, and then placing scan ranges with that value on random disk 
queues should degrade performance gracefully in cases where the number of disks 
is just a bit above that threshold.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> could you go into detail about the performance problems?
In the BE we mod the disk id to determine which disk queue to put reads on, so 
I'm worried about pathological cases with "collisions" where many ranges are 
assigned to only one disk thread. That will be quite hard to debug.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 379:   private static int getDiskId(String storageId) {
> As much as I'd like to use this simpler alternative, I think it could lead 
could you go into detail about the performance problems?

the goal is to switch to an *8-bit* node-relative disk id in order to reduce 
the amount of storage space required for recording this info.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(1 comment)

An additional high-level comment

http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 526:   if (SUPPORTS_STORAGE_ID) {
> This seems fine as a first cut, but we should consider taking this a step f
On Hadoop 3 the old API is not available at all, so the old API calls won't 
even compile.

Can we factor out the logic into two different implementation classes with a 
common interface for the two different APIs? That will be necessary anyway and 
I think will make the logic easier to understand.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

(11 comments)

Some high-level comments before digging in deeper.

http://gerrit.cloudera.org:8080/#/c/4914/1//COMMIT_MSG
Commit Message:

Line 9: This change enables Impala to use BlockLocation#getStorageIds,
nit: we usually use () after names to denote they are e function/method

i.e. getStorageIds -> getStorageIds()

getFileBlockLocations -> getFileBlockLocations()


Line 12: getFileBlockLocations call which will be  deprecated in Hadoop-3.
extra space after "be"


http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:   clazz = null;
remove


Line 290: if ( clazz != null ) {
style: extra spaces, we do

if (clazz != null) {
...
}


Line 295: m = null;
remove


Line 311: LOG.info("SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? 
"true" : "false"));
no need for this ternary logic, just:

LOG.info("SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID);


Line 352:   private static int getSequentialDiskId(String storageId)
While this solution is minimally invasive, I'm not a big fan of the extra 
locking and memory consumption of this map.

An overall better design is to convert the UUID to a 128-bit int and use that 
everywhere. Also see my comments below.

Let me know what you think.


Line 353:   {
move to previous line


Line 379:   private static int getDiskId(String storageId) {
As much as I'd like to use this simpler alternative, I think it could lead to 
serious performance problems that could be very hard to debug. I don't think we 
can use this.

Perhaps we should consider switching to a 128-bit integer to represent the UUID 
everywhere (including all thrift structures and the BE). The BE only needs a 
number that it can mod against the number of local data dirs (see 
DiskIoMgr::AssignQueue()).

Using a 128-bit int would also work with the old volume id API.

We have an existing TUniqueId which may be suitable.


Line 437:   BlockLocation[] locations = fs.getFileBlockLocations(file, 0, 
file.getLen());
still using the old API?

My understanding is that switching to a new API for these calls here will make 
the extra loadDiskIds() unnecessary.


Line 526:   if (SUPPORTS_STORAGE_ID) {
This seems fine as a first cut, but we should consider taking this a step 
further in a follow-on change.

This is still grabbing the storage ids partition-by-partition, but ideally we 
could get all the blocks and their locations for an entire table. We'd need to 
handle partitions with non-standard locations specially.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..


Patch Set 1:

First pass on the review to make sure it's going in the right direction. I also 
wonder about the best way to test it in the current dev environment; the 
minicluster seems to be too constrained in instances and (simulated) spindles.
The old code path is still preserved in case the new HDFS method is not present 
on the platform.

-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

2016-11-04 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4914

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
..

IMPALA-4172: Switch to BlockLocation methods for disk IDs

This change enables Impala to use BlockLocation#getStorageIds,
the new call in Hadoop 3 for getting HDFS data block location
info from the NameNode. This call supercedes the old
getFileBlockLocations call which will be  deprecated in Hadoop-3.

The presence of BlockLocation#getStorageIds is determined dynamically.
This is necessary because the implementation was backported to
various Hadoop-2 releases after it appeared in Hadoop-3, so simple
version checking is not suitable to decide whether the call can be
used. In cases where both getFileBlockLocations and
BlockLocation#getStorageIds are supported, the code prefers the
latter for performance reasons: getStorageIds does not have to
query the DataNodes for this information.

BlockLocation#getStorageIds returns disk IDs as UUID-based strings,
which would be too expensive to ship around and store. This patch
maps these strings to small integers, preserving compatibility with the
existing representation of diskIDs.

Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 155 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/4914/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal