[GitHub] drill pull request #1075: DRILL-6030: Managed sort should minimize number of...

2018-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1075


---


[GitHub] drill pull request #1075: DRILL-6030: Managed sort should minimize number of...

2017-12-20 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1075#discussion_r158150320
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
 if (limit > 0) {
   mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
 } else {
-  mergeLimit = Integer.MAX_VALUE;
+  mergeLimit = DEFAULT_MERGE_LIMIT;
--- End diff --

There may be a misunderstanding of how config options work. We define the 
defaults in Drill's own source code: `drill-module.conf` in each module. (Here 
it is in `java-exec`.)

To change the default option, we change the value in `drill-module.conf`. 
In the highly unlikely case that a user has overridden this value in 
`drill-override.conf`, their value will be used. But, the option is not 
documented in `drill-override-example.conf` so it is very, very unlikely that 
anyone created an override. (The property is meant to be internal, for use in 
tests.)

So, rather than introducing yet another variable, we might as well use the 
existing config property. This has the added advantage that, if experience 
suggests that we need a smaller or larger limit for some scenarios, we can make 
the adjustment in the field via the config system.


---


[GitHub] drill pull request #1075: DRILL-6030: Managed sort should minimize number of...

2017-12-19 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1075#discussion_r157885846
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
 if (limit > 0) {
   mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
 } else {
-  mergeLimit = Integer.MAX_VALUE;
+  mergeLimit = DEFAULT_MERGE_LIMIT;
--- End diff --

IMO, it is better to change the default to avoid upgrade problems. In an 
upgrade scenario,  users may simply overwrite `drill-override.conf` from their 
prior installations and forget to set the merge limit. Is there a reason not to 
change the default merge limit?


---


[GitHub] drill pull request #1075: DRILL-6030: Managed sort should minimize number of...

2017-12-19 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1075#discussion_r157830252
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -84,7 +85,7 @@ public SortConfig(DrillConfig config) {
 if (limit > 0) {
   mergeLimit = Math.max(limit, MIN_MERGE_LIMIT);
 } else {
-  mergeLimit = Integer.MAX_VALUE;
+  mergeLimit = DEFAULT_MERGE_LIMIT;
--- End diff --

The merge limit is already a config option. (I'd forgotten about that.) The 
comment on the config option says "Limit on the number of spilled batches that 
can be merged in a single pass." So, let's just set that default (in 
`drill-override-conf`) to your new value of 128 and leave the code here 
unchanged.


---


[GitHub] drill pull request #1075: DRILL-6030: Managed sort should minimize number of...

2017-12-18 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/drill/pull/1075

DRILL-6030: Managed sort should minimize number of batches in a k-way merge

@paul-rogers Please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-6030

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1075


commit f40d962cf35574c0406937de1ee86ef853234b3e
Author: Vlad Rozov 
Date:   2017-12-17T17:25:55Z

DRILL-6030: Managed sort should minimize number of batches in a k-way merge




---