[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/559/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Dimitris Tsirogiannis (Code Review)
Hello Marcel Kornacker, Impala Public Jenkins, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#10).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
14 files changed, 572 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 9: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/554/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 9: Code-Review+2

Forgot to push some change.

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Dimitris Tsirogiannis (Code Review)
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#9).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 560 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/553/

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 8: Code-Review+2

Rebase and fix minor test. Keep Marcel's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-09 Thread Dimitris Tsirogiannis (Code Review)
Hello Marcel Kornacker, Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#8).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 563 insertions(+), 323 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46: // the storage ID of a particular disk is unique across all the 
nodes in the cluster.
> ?
Oops, sorry missed that one. Done


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112: ListMap hostIndex, Reference 
unknownDiskIds)
> numUnknown...?
Done


Line 275:* File Block metadata
> provide more information (such as the fact that it's used in conjunction wi
Done


Line 283:  * Constructs the metadata of a file block from its block 
location metadata
> "Constructs an FbFileBlock..."?
Done


Line 291: ListMap hostIndex, Reference 
unknownDiskIds)
> numUnknown...?
Done


Line 305: boolean isReplicaCached = 
cachedHosts.contains(loc.getHosts()[i]);
> doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), l
Not in Guava, ArrayUtils is a class in Apache Commons.


Line 327:  * using 'fbb' and returns the offset in the underlying buffer 
where the encoded file
> "in the underlying...starts": fine to leave out, that's implied by fb seman
Agreed but maybe it's good to explicitly mention it for future readers that may 
not be familiar with FB semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:   Reference unknownDiskIds = new Reference();
> numUnknown...?
Unfortunately not, generics don't work with primitive types. Miss C++ :)?


Line 741:* Helper method to load the partition file metadata from scratch. 
This method is
> is this from a rebase?
Yes, this is from the performance improvements of REFRESH.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46: private ConcurrentHashMap storageUuidToDiskId =
> add trailing _ to member vars
?


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 161:  * Synthesizes the block metadata of a file represented by 
'fileStatus' that resides
> Done
i meant: how does it "synthesize" it? e.g., by creating artificial blocks on 
some boundary.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112: ListMap hostIndex, FileDescStats stats) 
throws IOException {
numUnknown...?


Line 275: }
provide more information (such as the fact that it's used in conjunction with 
an FbFileBlock...)


Line 283:   String[] ip_port = location.split(":");
"Constructs an FbFileBlock..."?


Line 291: 
numUnknown...?


Line 305: // Use ~REPLICA_HOST_IDX_MASK to extract the cache info (stored 
in MSB).
doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), 
loc.gethosts()[i])?


Line 327: 
"in the underlying...starts": fine to leave out, that's implied by fb semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287: // Find the partition that this file belongs (if any).
numUnknown...?

does Reference also work?


Line 741:   HashMap partsByPath) {
is this from a rebase?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Hello Bharath Vissapragada, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6406

to look at the new patch set (#6).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
13 files changed, 556 insertions(+), 320 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 5:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 31:   LZ4
> zlib?
Done


Line 38:   offset: long (id: 0);
> why not specify a default, given that most will be 0?
Done


Line 51:   disk_ids: [ushort] (id: 3);
> is_cached?
This is embedded in MSB of replica_host_idxs.


http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
> mention where to find the fb definition
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 78:   diskId = (short) (storageIdGenerator.get(host) + 1);
> at least assert that you don't get an overflow here
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFlatBuffer() {
> should we generally settle on toFb for functions like these? brevity is a v
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 115:   int[] blockOffsets = createBlockMetadata(fbb, blockLocations, 
hostIndex, stats);
> i couldn't tell from the name that this creates an array of fbs. call it so
Done


Line 126: public static FileDescriptor 
createWithSynthesizedBlockMetadata(FileStatus fileStatus,
> feel free to generally abbreviate metadata as md, to make this very long na
Done


PS5, Line 141: blockOffsets
> would also be good to name this so it's immediately clear what it is (ie, e
Done


Line 161:  * Synthesizes the block metadata of a file represented by 
'fileStatus' that resides
> you don't say what that block metadata contains.
Done


Line 182: List replicas = Lists.newArrayList(
> same comment about intermediate objects as below
Done


Line 208: List replicas = 
Lists.newArrayListWithExpectedSize(
> hm, it would be nice not to have to construct these intermediate objects. p
Done


Line 224: public THdfsCompression getFileCompression() {
> does this need to return a thrift struct?
Done


Line 231: public FileBlock getFileBlock(int idx) {
> why not just return the fbfileblock and save yourself the object constructi
Done


Line 329:  * Loads the metadata of a file block from its block location 
metadata 'location' and
> what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb
Done


Line 346: replicaIdx = replica.isCached() ?
> please describe this encoding in the .fb file.
Done


Line 360:   FbFileBlock.addOffset(fbb, location.getOffset());
> why not just pass in the offset/length directly and avoid constructing a Bl
Done


Line 402: public boolean hasCachedReplica() { return hasCachedReplica_; }
> instead of adding member vars, why not make all functions be static and tak
Done


Line 449:   public static class FileDescStats {
> since this only keeps track of a single number, consider using a Long inste
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 134:   // if the table is stored in the catalog server.
> what does that mean? i thought all tables are stored in the catalog server.
The same class is used both in CatalogServiceCatalog (catalod) and 
ImpaladCatalog (impalad). However, when the table is stored in the catalog we 
don't need to populate data structure used for static partition pruning. I 
agree "stored in the catalog" is not the best description.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> is this part of your changes?
Yes, see previous comment in HdfsTable.java.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel 

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-04-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-04-05 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
> Nice
Leave a todo to put this in a KRPC sidecar :) No need to pay serialization for 
the string itself!


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-04-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6406/3/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 36:   offset: long = 0 (id: 0);
Why a default value? Seems potentially dangerous.

Not for this change, but I'm thinking we don't even need to store the offset. 
If we know the block size, than we can derive the offset (assuming the list of 
file blocks is ordered by offset). Might be worth adding a TODO or recording 
that idea somewhere.


Line 40:   length: long = -1 (id: 1);
Seems redundant, why keep it?


Line 65:   compression: FbCompression (id: 3);
Will FlatBuffers add padding to align members? Ideally, we'd optimize for space 
and not access speed.


http://gerrit.cloudera.org:8080/#/c/6406/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 218:   1: required binary file_desc_data
Nice


Line 296:   10: optional list file_name_prefixes
Is this required for the move to flat buffers? I think we should consider an 
HdfsPathSet abstraction that assigns path ids and internally compresses the 
underlying strings. There's a lot of manual lookups and stitching in the 
current code. I don't feel too strongly about whether we should do that now, or 
clean up the code later.


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFbCompression() {
toFlatBuffer()?


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 126: locations = fileSystem.getFileBlockLocations(fileStatus, 0, 
fileStatus.getLen());
I think it would be better for now to keep all the code that fetches 
information from external systems in one place (HdfsTable). Splitting up the 
loading and delegating to several classes may make sense, but that probably 
requires significant surgery, and the current fetching code is very much 
centralized (we iterate over all files of in a table).

The loading code in this patch is more confusing to me than before. The meaning 
of some verbs like load/create is less clear.

If you agree with that direction, we may not need a FileDescriptor class at 
all, and can only rely on the FB to hold the data. It may still make sense to 
have a FileDescBuilder which you can use to construct a FbFileDesc.


Line 227: loc.getNames().length);
Another case where we might be calling out to the NN.


Line 321: private static int REPLICA_HOST_CACHE_MASK = 0x8000;
Less code and more readable to have one var with (1 << 15). The places that 
need to & the mask can bit-wise invert, i.e. (x & ~MASK).


http://gerrit.cloudera.org:8080/#/c/6406/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 189:   // List of file name prefixes.
Sorted list of file name prefixes?


Line 309: Pair compressedFileName =
How predictable is the compression behavior? Do we iterate over the files in 
lexicographical order for both HDFS and S3?
I'm worried about the case where an "invalidate metadata" suddenly leads to a 
higher memory requirement even if no files/partitions have changed.


Line 340:* the suffix is equal to 'fileName'.
Should mention that this function expects the fileNames to be sorted.


Line 346: String commonPrefix = Strings.commonPrefix(prevFileName, 
fileName);
I'm wondering if it's better to first check the last common prefix instead of 
the prefix between the current and prev file name. In the current version it 
seems like the list of prefixes could contain strings which are a common prefix 
of another one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-24 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
> Why?
Never mind, it looked to me like debug printing. Please ignore.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS2, Line 124:   locations = locatedFileStatus.getBlockLocations();
 :   } else {
 :
> If fileStatus is not an instance of LocatedFileStatus we don't have the blo
Oops thanks for clarifying.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 648 insertions(+), 304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
> Looks like debug printing. Remove? (below too)
Why?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS2, Line 67: storageIdtoInt
> (Unrelated to your change). Looks obsolete.
Done


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 119:   FlatBufferBuilder fbb = new FlatBufferBuilder(1);
> Preconditions.checkState(isBlockMdSupported(fs))?
Done


PS2, Line 124: } else {
 : locations = fileSystem.getFileBlockLocations(fileStatus, 
0, fileStatus.getLen());
 :   }
> Do we really need this? Per my understanding, we only fetch LocatedFileStat
If fileStatus is not an instance of LocatedFileStatus we don't have the block 
locations. E.g. during refresh.


Line 141: ListMap hostIndex) {
> Preconditions.checkState(!isBlockMdSupported(fs))
We don't have the filesystem here.


PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx);
> - replicaHostIdxs is [ushort] as per fbs definition in which case idx shoul
In Java there is no unsigned short. Hence, ushort is stored as in int in 
flatbuffer java-generated code. I like the two masks as it makes the intention 
more explicit.


Line 463:   public static class FileDescStats {
> For supportability purposes, I'm wondering if we should track the remote_ne
That's a good point. I plan on augmenting this in a separate patch along with 
other useful stats.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 344: =
> nit: spacing
Done


Line 365:   private static String findCommonPrefix(String a, String b) {
> Use Strings.commonPrefix() from guava?
Good point. Didn't know this existed.


Line 1933:   Collections.sort(orderedFds, new Comparator() {
> Can we just make FileDescriptor implement Comparator and ad
How are we going to call decompressFileName that references a structure (file 
name prefixes) in HdfsTable if we do that?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> What is the motivation to track this? A quick look at the places it is used
To avoid initializing data structures that aren't used in the catalog. Tests 
also need to access this because of the different way we have for loading 
metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-23 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
Looks like debug printing. Remove? (below too)


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS2, Line 67: storageIdtoInt
(Unrelated to your change). Looks obsolete.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 119:   FlatBufferBuilder fbb = new FlatBufferBuilder(1);
Preconditions.checkState(isBlockMdSupported(fs))?


PS2, Line 124: } else {
 : locations = fileSystem.getFileBlockLocations(fileStatus, 
0, fileStatus.getLen());
 :   }
Do we really need this? Per my understanding, we only fetch 
LocatedFileStatus'es going forward as we don't want to do another RPC to 
getFileBlockLocations(). In another words, is there some caller that requires 
this?


Line 141: ListMap hostIndex) {
Preconditions.checkState(!isBlockMdSupported(fs))


PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx);
- replicaHostIdxs is [ushort] as per fbs definition in which case idx should be 
a short? That REPLICA_HOST_CACHE_MASK could be made made short and avoid 
typecasting in L361. 

- Also that way we could use only REPLICA_HOST_CACHE_MASK and use 
~REPLICA_HOST_CACHE_MASK for unmasking instead of a separate 
REPLICA_HOST_IDX_MASK. Feel free to ignore this if you think that affects 
readability.


Line 463:   public static class FileDescStats {
For supportability purposes, I'm wondering if we should track the 
remote_network_addresses and present a rolled up version of stats so that we 
can know which DNs to look at when we see -ve disk IDs.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 344: =
nit: spacing


Line 365:   private static String findCommonPrefix(String a, String b) {
Use Strings.commonPrefix() from guava?


Line 1933:   Collections.sort(orderedFds, new Comparator() {
Can we just make FileDescriptor implement Comparator and add a 
compare() method there. I think that'd be cleaner.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
What is the motivation to track this? A quick look at the places it is used, it 
seems like they are only called when it actually true (for example 
resetPartitions()). Are there callers when its false too? Like tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 654 insertions(+), 301 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
> Any benchmark numbers? Just curious. Also I think it is good to add them he
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
> Trailing white spaces at multiple places in this file.
Done


Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
> Can this be moved outside the loop like JAVA_FE_ARGS
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
> extraneous white spaces.
Done


PS1, Line 54: file_name_prefix_idx
> Didn't understand what this prefix means till I read HdfsTable.fileNamePref
Done


Line 70: root_type FbFileDesc;
> What is the significance of this? I'm reading about FlatBuffers for the fir
Yeah, it is mostly used if you read from JSON. Removed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83: storageIdGenerator.put(host, new Short(diskId));
> Drive-by comment: using 'new' here is pessimising memory consumption. If yo
Good point thanks. Done


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS1, Line 318: REPLICA_HOST_IDX_MASK
> Looks like it used for UNMASKing, rename may be?
It's a bit mask, so I'd rather keep the name as such :)


Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
> Document these?
Done


PS1, Line 357: getHostIdx
> how about just making hostIdx_ a short rather than typecasting everytime?
Done


PS1, Line 360: REPLICA_HOST_CACHE_MASK
> Shouldn't this be short rather than int? May be it works fine this way, jus
For convenience. Java short is signed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315: Preconditions.checkNotNull(fd);
> I don't think this FileDescriptor.create() can ever return null? If yes, wo
Done


Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
> I like the idea but I'm wondering if this kind of optimization is a little 
That's not entirely true. Multiple filenames from a single write do have a 
common prefix.


Line 420:   prefixCompressFileName(currentFileName, prevFileName);
> Also I think it may not be optimal to construct the prefixes this way by co
In principal you're right, this is not the best way to compress strings. 
However, due to the way these file names are generated (random) this is not 
meant to be a generic file name compressor. Here we take advantage of the 
pattern that file names of a single write exhibit to avoid storing the common 
prefix multiple times.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet SUPPORTED_TABLE_TYPES = 
EnumSet.of(
> May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by
Not sure how this ended up here. Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
Any benchmark numbers? Just curious. Also I think it is good to add them here.


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
Trailing white spaces at multiple places in this file.


Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
Can this be moved outside the loop like JAVA_FE_ARGS


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
extraneous white spaces.


PS1, Line 54: file_name_prefix_idx
Didn't understand what this prefix means till I read 
HdfsTable.fileNamePrefixes_. I think it should be pointed out here.


Line 70: root_type FbFileDesc;
What is the significance of this? I'm reading about FlatBuffers for the first 
time, so pardon my ignorance. From what I understand, this isn't required in 
strongly typed systems. Please correct me if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
Document these?


PS1, Line 318: REPLICA_HOST_IDX_MASK
Looks like it used for UNMASKing, rename may be?


PS1, Line 357: getHostIdx
how about just making hostIdx_ a short rather than typecasting everytime?


PS1, Line 360: REPLICA_HOST_CACHE_MASK
Shouldn't this be short rather than int? May be it works fine this way, just 
want to be sure.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315: Preconditions.checkNotNull(fd);
I don't think this FileDescriptor.create() can ever return null? If yes, worth 
removing Preconditions check here and elsewhere.


Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
I like the idea but I'm wondering if this kind of optimization is a little 
premature. Especially given the filenames generated by Impala writes are 
typically random UUID style strings and would likely not qualify this 
optimization. Thoughts? Please correct me if I got this wrong.


Line 420:   prefixCompressFileName(currentFileName, prevFileName);
Also I think it may not be optimal to construct the prefixes this way by 
comparing the previous name with the current name. May be we should huffman 
style encoding that minimizes the no. of bits across the entire set of file 
names?


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet SUPPORTED_TABLE_TYPES = 
EnumSet.of(
May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by 
making it public?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83: storageIdGenerator.put(host, new Short(diskId));
Drive-by comment: using 'new' here is pessimising memory consumption. If you 
use Short.valueof() or just let Java automatically box it, it will use a 
built-in cache for values -128 to 127, e.g. see 
http://docs.oracle.com/javase/7/docs/api/java/lang/Short.html#valueOf(short)


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6406

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 653 insertions(+), 298 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis