[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Reviewed-on: http://gerrit.cloudera.org:8080/7300
Reviewed-by: Mostafa Mokhtar 
Tested-by: Impala Public Jenkins
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Mostafa Mokhtar: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 4: Verified+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/140/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-10 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7300/3/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 52:   on the right-hand side of the join is broadcast. This behavior
> I believe the word right-hand side is reserved to the position of the table
I knew 'RHS' was going to get me in trouble. :-)


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-10 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#4).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/4
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-06 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 48:   Impala uses the broadcast technique that transmits the entire contents
> If both tables are missing stats the table listed first in the query will b
Done


Line 61:   the setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE lets you
> This is the description for the SHUFFLE join, we should use similar wording
Done


http://gerrit.cloudera.org:8080/#/c/7300/2/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:   This option determines the join distribution that Impala uses when any of the tables
> Alex's comment around not using "Join strategy" hasn't been addressed. 
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-06 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 136 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/3
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-06 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 48:   Impala uses the broadcast technique that transmits the entire contents
> What is the answer if both tables are missing stats? Does Impala make a ded
If both tables are missing stats the table listed first in the query will be 
the probe side while the second table will be broadcast.
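
The rule described in this comment, combined with the option under review, can be sketched as a toy model. This is only an illustration of the statements made in this thread, not Impala's planner code; the function name `plan_join_without_stats` and its return format are hypothetical.

```python
def plan_join_without_stats(first_table, second_table, default_mode="BROADCAST"):
    """Toy model of a two-table join where BOTH tables lack statistics.

    Hypothetical helper, not part of Impala. It encodes only what this
    thread states: with the default BROADCAST mode the second-listed table
    is broadcast (the first-listed table is the probe side); with SHUFFLE
    the join is partitioned instead, so no table is broadcast.

    Returns a (distribution, broadcast_table) tuple.
    """
    mode = default_mode.upper()
    if mode == "BROADCAST":
        return ("broadcast", second_table)
    if mode == "SHUFFLE":
        return ("partitioned", None)
    raise ValueError(f"unknown join distribution mode: {default_mode!r}")
```

For example, `plan_join_without_stats("t1", "t2")` models the default behavior, and passing `"SHUFFLE"` as the third argument models a session where the option has been changed.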


Line 61:   from each table to each executor node.
> I'd prefer to prepare and fine-tune a brief explanation so I could reuse th
This is the description for the SHUFFLE join, we should use similar wording

[SHUFFLE] - Makes that join operation use the "partitioned" technique, which 
divides up corresponding rows from both tables using a hashing algorithm, 
sending subsets of the rows to other nodes for processing. (The keyword SHUFFLE 
is used to indicate a "partitioned join", because that type of join is not 
related to "partitioned tables".) Since the alternative "broadcast" join 
mechanism is the default when table and index statistics are unavailable, you 
might use this hint for queries where broadcast joins are unsuitable; 
typically, partitioned joins are more efficient for joins between large tables 
of similar size.
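
To make the difference between the two techniques concrete, here is a minimal in-memory sketch, assuming rows are `(join_key, value)` tuples and "nodes" are just list partitions. All function names are hypothetical and nothing here reflects Impala's actual implementation; it only shows that hashing both tables on the join key and broadcasting one whole table yield the same join result.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    # Deterministic hash so equal keys from both tables land on the same node.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def local_join(left_rows, right_rows):
    # Plain hash join on the first field of each row.
    build = defaultdict(list)
    for key, val in right_rows:
        build[key].append(val)
    return [(key, lval, rval)
            for key, lval in left_rows
            for rval in build.get(key, [])]


def partitioned_join(left, right, num_nodes):
    # "Partitioned" (SHUFFLE): both tables are split by a hash of the join
    # key, and each node joins only the subsets it received.
    left_parts, right_parts = defaultdict(list), defaultdict(list)
    for row in left:
        left_parts[node_for(row[0], num_nodes)].append(row)
    for row in right:
        right_parts[node_for(row[0], num_nodes)].append(row)
    out = []
    for node in range(num_nodes):
        out.extend(local_join(left_parts[node], right_parts[node]))
    return out


def broadcast_join(left, right, num_nodes):
    # "Broadcast": the left table stays spread across nodes (round-robin
    # here), and every node receives a full copy of the right table.
    out = []
    for node in range(num_nodes):
        out.extend(local_join(left[node::num_nodes], right))
    return out


orders = [(1, "o1"), (2, "o2"), (2, "o3"), (4, "o4")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
```

Calling `sorted(partitioned_join(orders, customers, 3))` and `sorted(broadcast_join(orders, customers, 3))` gives the same three matched rows; the two functions differ only in how much of each table every node has to receive and hold.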


http://gerrit.cloudera.org:8080/#/c/7300/2/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:   This option determines the join strategy that Impala uses when any of the tables
Alex's comment around not using "Join strategy" hasn't been addressed. 

Can you please use "join distribution" instead?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-05 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/2
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-07-05 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:   
> Why mention IMPALA-5583 also?
In the past I've referred to both the "code implementation" JIRA and the 
"document the new feature" JIRA in this kind of context, just for ease of 
future maintenance and tracing if something is wrong or missing on the doc 
side. I guess that's less important when the doc one is a subtask of the code 
one. I'll take it out.


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:   This option determines the join strategy that Impala uses when any of the tables
> We deliberately did not use "join strategy" in the option name because stra
Can you elaborate a little on the meaning of "join distribution mode" then? 
That's not terminology we've used elsewhere in the docs.


Line 47:   Hive ANALYZE TABLE statement.
> Sure you want to keep the ANALYZE TABLE part? In most situations we cannot 
Done


Line 48:   By default, when a table involved in the join query does not have statistics,
> Accuracy could be improved. What if both tables do not have stats? Clarify 
What is the answer if both tables are missing stats? Does Impala make a 
deduction about which is smaller and that one gets broadcast while the other 
doesn't?


Line 58:   might be missing statistics due to the overhead involved in calculating them,
> I wouldn't suppose a particular reason for not having stats.
Done


Line 61:   of a table involved in a join query and only transmits a portion of the table
> Not very accurate, both tables are transferred across the network. Not sure
I'd prefer to prepare and fine-tune a brief explanation so I could reuse that 
wording in places where such terminology is mentioned to a reader who might 
not have seen it before. Anyone who needs detailed background info can follow 
the "related info" links at the end of the page.


Line 67:   recommended when setting up and deploying new clusters. This setting is
> We should mention why we recommend this. SHUFFLE is generally a safer optio
Done


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7300/1/docs/impala.ditamap
File docs/impala.ditamap:

Line 179:   
Why mention IMPALA-5583 also?


http://gerrit.cloudera.org:8080/#/c/7300/1/docs/topics/impala_default_join_distribution_mode.xml
File docs/topics/impala_default_join_distribution_mode.xml:

Line 40:   This option determines the join strategy that Impala uses when any of the tables
We deliberately did not use "join strategy" in the option name because strategy 
is too generic.


Line 47:   Hive ANALYZE TABLE statement.
Sure you want to keep the ANALYZE TABLE part? In most situations we cannot 
effectively use what Hive produces.


Line 48:   By default, when a table involved in the join query does not have statistics,
Accuracy could be improved. What if both tables do not have stats? Clarify that 
one table is going to be broadcast. It might even be worth explicitly listing 
what happens if one table has stats and the other doesn't (the one without 
stats will be broadcast).


Line 58:   might be missing statistics due to the overhead involved in calculating them,
I wouldn't suppose a particular reason for not having stats.


Line 61:   of a table involved in a join query and only transmits a portion of the table
Not very accurate, both tables are transferred across the network. Not sure if 
we need to explain the differences between broadcast+shuffle here, maybe 
provide a link to their explanation/definition?


Line 67:   recommended when setting up and deploying new clusters. This setting is
We should mention why we recommend this. SHUFFLE is generally a safer option 
because the join build will be less prone to spilling and/or OOM.
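
The memory argument here can be illustrated with back-of-the-envelope arithmetic. The sketch below is a rough model, assuming a perfectly even hash distribution for the partitioned case (skewed join keys would break that assumption); the helper name `build_rows_per_node` is hypothetical and is not an Impala API.

```python
import math


def build_rows_per_node(build_rows, num_nodes, distribution):
    """Rough count of build-side rows each node must hold in memory.

    Hypothetical helper for illustration only. Assumes an even hash
    distribution when the join is partitioned.
    """
    if distribution == "broadcast":
        # Every node receives the whole build-side table.
        return build_rows
    if distribution == "partitioned":
        # Each node receives roughly 1/num_nodes of the build-side table.
        return math.ceil(build_rows / num_nodes)
    raise ValueError(f"unknown distribution: {distribution!r}")
```

With a build side of 1,000,000 rows on 10 nodes, the broadcast case keeps all 1,000,000 rows on every node, while the partitioned case keeps about 100,000 per node. That gap is why a wrong guess about table size hurts more under broadcast.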


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..


Patch Set 1:

This gerrit is a "stake in the ground" with basic details about the query 
option. If there are other places where the option could be mentioned (under 
joins, compute stats, etc.), I'll handle those in a separate gerrit.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5583: [DOCS] Document default join distribution mode query option

2017-06-26 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7300

Change subject: IMPALA-5583: [DOCS] Document default_join_distribution_mode 
query option
..

IMPALA-5583: [DOCS] Document default_join_distribution_mode query option

New page for the query option.

Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
---
M docs/impala.ditamap
M docs/impala_keydefs.ditamap
A docs/topics/impala_default_join_distribution_mode.xml
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/7300/1
-- 

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell