[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 02 Nov 2017 00:40:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoids
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).
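
To illustrate why a millisecond time stamp is not a safe directory name,
here is a minimal Python sketch. It is not Hadoop's code: the helper names
and the submission times are hypothetical, and only the "<millis>_tmp"
shape is taken from the error above.

```python
def timestamp_dir_name(now_ms):
    """Build a scratch directory name of the form <millis since epoch>_tmp."""
    return "%d_tmp" % now_ms


def count_collisions(timestamps_ms):
    """Count how many callers get a name that another caller already holds."""
    names = [timestamp_dir_name(t) for t in timestamps_ms]
    return len(names) - len(set(names))


if __name__ == "__main__":
    # Hypothetical submission times: the first two queries share a millisecond.
    submissions = [1508958341829, 1508958341829, 1508958341830]
    print(count_collisions(submissions))
```

In this sketch the first two queries map to the same name, so whichever
one cleans up first can remove a directory the other still expects.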

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.
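
The general shape of the workaround can be sketched in Python as below.
This is an illustration under assumptions, not the code in load-data.py:
the function names are hypothetical, and the use of the Hadoop property
mapreduce.cluster.local.dir is an assumption about which setting gets
pointed at the new directory. Only the impala-data-load- prefix comes
from the description above.

```python
import os
import shutil
import tempfile


def make_unique_local_dir(parent=None):
    """Create a fresh directory named impala-data-load-<random suffix>.

    tempfile.mkdtemp picks the suffix and creates the directory itself,
    so two concurrent callers do not end up sharing a path.
    """
    return tempfile.mkdtemp(prefix="impala-data-load-", dir=parent)


def local_dir_setting(tmpdir):
    """Format a key=value setting pointing at tmpdir.

    The property name is an assumption made for this sketch.
    """
    return "mapreduce.cluster.local.dir=" + tmpdir


if __name__ == "__main__":
    first = make_unique_local_dir()
    second = make_unique_local_dir()
    try:
        print(local_dir_setting(first))
        print(local_dir_setting(second))
    finally:
        # Remove the scratch directories once the (hypothetical) load is done.
        shutil.rmtree(first)
        shutil.rmtree(second)
```

Each caller gets its own directory, so the shared time-stamped path is no
longer the only thing keeping concurrent queries apart.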

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Reviewed-on: http://gerrit.cloudera.org:8080/8405
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 59 insertions(+), 6 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-11-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1424/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:54:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 3:

Lars asked us to hold off on merging code changes, so I won't merge this yet.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Oct 2017 18:45:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 3: Code-Review+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Oct 2017 18:45:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 3: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Oct 2017 18:45:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py@114
PS2, Line 114: # When HiveServer2 is configured to use "local" mode (i.e., MR jobs are run
> Let's mention the HADOOP JIRA in case it's ever fixed and we can remove the
Good idea. Done.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Oct 2017 03:48:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Hello Joe McDonnell, Tim Armstrong, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8405

to look at the new patch set (#3).

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoids
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 59 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8405 )

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py
File bin/load-data.py:

http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py@114
PS2, Line 114: # When HiveServer2 is configured to use "local" mode (i.e., MR jobs are run
Let's mention the HADOOP JIRA in case it's ever fixed and we can remove the 
workaround.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Oct 2017 00:14:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Hello Joe McDonnell, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8405

to look at the new patch set (#2).

Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoids
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 58 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/2
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

2017-10-27 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8405 )


Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
..

IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).

This is a revert of a revert, re-enabling parallel data load.  It avoids
the race condition by explicitly configuring the temporary directory in
question in load-data.py.

When the parallel data load change went in, we discovered
a race with a signature of:

  java.io.FileNotFoundException: File
  /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist

The number in this path is milliseconds since the epoch, and the race
occurs when two queries submitted to HiveServer2, running with the local
runner, hit the same millisecond time stamp.  The upstream bug is
https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the
symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which
is now marked as a dupe).

I've tested this by running data load 5 times on the same machines
where it failed before. I also ran data load manually and inspected
the system to make sure that the temporary directories are getting
created as expected in /tmp/impala-data-load-*.

Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
---
M bin/load-data.py
M testdata/bin/create-load-data.sh
M testdata/bin/run-hive-server.sh
M testdata/bin/run-step.sh
4 files changed, 61 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10
Gerrit-Change-Number: 8405
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger