[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 8: Code-Review+2 Verified+1

Carry +2
https://jenkins.impala.io/job/parallel-all-tests/1284/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


IMPALA-5412: Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using
filename as a key. Multiple partitions pointing to the same
filesystem location resulted that these map entries were
occasionally overwritted by the other partition poing to
the same.

As a solution the map key was enhanced to contain a pair of
partition ID and file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Reviewed-on: http://gerrit.cloudera.org:8080/7625
Reviewed-by: Tim Armstrong 
Tested-by: Tim Armstrong 
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/util/container-util.h
M tests/metadata/test_partition_metadata.py
8 files changed, 110 insertions(+), 47 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7: Verified+1

https://jenkins.impala.io/job/parallel-all-tests/1284/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7:

Running tests against the Impala-lzo branch with matching change: 
https://jenkins.impala.io/job/parallel-all-tests/1281/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1040/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1040/

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 7: Code-Review+2

Rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7625/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 365:   typedef std::unordered_map FileDescMap;
> extra space after PartitionFileKey
Done


http://gerrit.cloudera.org:8080/#/c/7625/5/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

Line 52:   def test_multiple_partitions_same_location(self, vector, 
unique_database):
> I think this test could use some cleanup, but I'm ok to accept this patch i
I see your point but I don't have the time right now to do this refactoring. I 
think we should get this in and improve the tests as a separate step (if we 
think it's important).


Line 103: # check if using num_nodes=1 has the same behaviour
> # force all scan ranges to be on the same node
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Hello Matthew Jacobs,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7625

to look at the new patch set (#6).

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..

IMPALA-5412: Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using
filename as a key. Multiple partitions pointing to the same
filesystem location resulted that these map entries were
occasionally overwritted by the other partition poing to
the same.

As a solution the map key was enhanced to contain a pair of
partition ID and file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/util/container-util.h
M tests/metadata/test_partition_metadata.py
8 files changed, 110 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7625/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 365:   typedef std::unordered_map FileDescMap;
extra space after PartitionFileKey


http://gerrit.cloudera.org:8080/#/c/7625/5/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

Line 52:   def test_multiple_partitions_same_location(self, vector, 
unique_database):
I think this test could use some cleanup, but I'm ok to accept this patch if 
you feel the cleanup is too cumbersome.

* instead of limiting the file formats and relying on 
allow_unsupported_formats, we could use existing alltypes data (create table 
like, then create partitions pointing to known locations with data)
* split up the read and write tests; it's good to have coverage of the write 
path, but none of the JIRAs mentioned here were bugs in the write path


Line 103: # check if using num_nodes=1 has the same behaviour
# force all scan ranges to be on the same node


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..


Patch Set 5: Code-Review+1

Looks good

-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..


Patch Set 3:

(3 comments)

I ended up combining the tests in test_partition_metadata and adding parquet 
and seq. Reran the tests to confirm that it still failed before the fix.

http://gerrit.cloudera.org:8080/#/c/7625/3/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS3, Line 589:   c
> nit: 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/7625/3/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS3, Line 86: Imapala
> There's still an extra 'a' :)
Done


PS3, Line 125:  
> Trailing whitespace.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412: Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-5412: Fix scan result with partitions on same file
..

IMPALA-5412: Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using
filename as a key. Multiple partitions pointing to the same
filesystem location resulted that these map entries were
occasionally overwritted by the other partition poing to
the same.

As a solution the map key was enhanced to contain a pair of
partition ID and file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/util/container-util.h
M tests/metadata/test_partition_metadata.py
8 files changed, 109 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..

IMPALA-5412 Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using
filename as a key. Multiple partitions pointing to the same
filesystem location resulted that these map entries were
occasionally overwritted by the other partition poing to
the same.

As a solution the map key was enhanced to contain a pair of
partition ID and file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/util/container-util.h
M tests/metadata/test_partition_metadata.py
8 files changed, 109 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..


Patch Set 3:

(1 comment)

> (2 comments)
 > 
 > Gabor told me that he won't be able to work on this for around 10
 > days. From my point of view this is pretty close, so I think we
 > should consider filing a follow-on JIRA for the extra test coverage
 > and getting this in soonish to give it time to bake.

Works for me.

http://gerrit.cloudera.org:8080/#/c/7625/3/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

PS3, Line 589:   c
nit: 4 spaces


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..


Patch Set 3:

(2 comments)

Gabor told me that he won't be able to work on this for around 10 days. From my 
point of view this is pretty close, so I think we should consider filing a 
follow-on JIRA for the extra test coverage and getting this in soonish to give 
it time to bake.

http://gerrit.cloudera.org:8080/#/c/7625/3/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

PS3, Line 86: Imapala
There's still an extra 'a' :)


PS3, Line 125:  
Trailing whitespace.


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-10 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change.

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..


Patch Set 1:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/base-sequence-scanner.cc
File be/src/exec/base-sequence-scanner.cc:

Line 90:   context->partition_descriptor()->id(),stream_->filename()));
> nit:space after ,
Done


PS1, Line 163: ,s
> space
Done


PS1, Line 309: 
> 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 613: context_->partition_descriptor()->id(), filename());
> 4 spaces
Done


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

PS1, Line 266: ,
> space
Done


PS1, Line 266: string
> Do we need to call the string constructor explicitly? Seems a bit weird.
Done


PS1, Line 266: string
> remove string constructor here
Done


http://gerrit.cloudera.org:8080/#/c/7625/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 197:   /// Allocate a new scan range object, stored in the runtime state's 
object pool. For
> This comment needs updating since partition_id is now required.
Done


PS1, Line 237:   /// Returns nullptr if the search doesn't find the descriptor.
> not true; it has a dcheck
Done


PS1, Line 360: pair_hash
> misleading name because the hash is computed on the value.
apparently, unordered_map doesn't accept pair as key by default, only if a 
custom hash is provided.


Line 360:   struct pair_hash {
> We have some utilities like this already in util/container-util.h, let's mo
Done


Line 363: return std::hash{}(p.second);
> Let's hash both values in the pair and combine them with boost::hash_combin
Done


PS1, Line 369: pair
> Can you typedef this and using that here and in the .cc file ?
Done


PS1, Line 381: pair
> same
Done


http://gerrit.cloudera.org:8080/#/c/7625/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

Line 71: data = self.execute_scalar("select sum(i), sum(j) from %s" % 
FQ_TBL_NAME)
> Can we run this query with num_nodes=1 to verify that the same bug doesn't 
Done


PS1, Line 81: ad
> as?
Done


PS1, Line 81: imapala
> Impala
Done


PS1, Line 109: output
> 'output' isn't clear (it doesn't match up with the queries). say this is wh
Done


Line 110: # (note, that shortcoming on avro writer would result the 
inserted value as NULL,
> How about we only query the second column to avoid this complication?
Done


PS1, Line 122: output:
 : # [NULL, 1] 3 times
 : # [NULL, 2] 3 times
> again, this is confusing
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-10 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded a new patch set (#3).

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..

IMPALA-5412 Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using
filename as a key. Multiple partitions pointing to the same
filesystem location resulted that these map entries were
occasionally overwritted by the other partition poing to
the same.

As a solution the map key was enhanced to contain a pair of
partition ID and file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/util/container-util.h
M tests/metadata/test_partition_metadata.py
8 files changed, 120 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5412 Fix scan result with partitions on same file

2017-08-10 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded a new patch set (#2).

Change subject: IMPALA-5412 Fix scan result with partitions on same file
..

IMPALA-5412 Fix scan result with partitions on same file

The maps storing file descriptors and file metadata were using filename as a 
key.
Multiple partitions pointing to the same filesystem location resulted that these
map entries were occasionally overwritted by the other partition poing to the 
same.

As a solution the map key was enhanced to contain a pair of partition ID and 
file name.

Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/scanner-context.cc
M tests/metadata/test_partition_metadata.py
7 files changed, 109 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/7625/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7625
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie74b305377248045c0d87b911943e1cabb7223e9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong