[GitHub] drill pull request #753: DRILL-5260: Extend "Cluster Fixture" test framework
GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/753 DRILL-5260: Extend "Cluster Fixture" test framework - Config option to suppress printing of CSV and other output. (Allows printing for single tests, not printing when running from Maven.) - Parsing of query profiles to extract plan and run time information. - Fix bug in log fixture when enabling logging for a package. - Improved ZK support. - Set up the new CTTAS default temporary workspace for tests. - Clean up persistent storage files on disk to avoid CTTAS startup failures. - Provides a set of examples for how to use the cluster fixture. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5260 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #753 commit 40080d663545c6f0a5a413cf18866611a2bc5b3f Author: Paul Rogers Date: 2017-02-18T01:39:20Z DRILL-5260: Extend "Cluster Fixture" test framework - Config option to suppress printing of CSV and other output. (Allows printing for single tests, not printing when running from Maven.) - Parsing of query profiles to extract plan and run time information. - Fix bug in log fixture when enabling logging for a package. - Improved ZK support. - Set up the new CTTAS default temporary workspace for tests. - Clean up persistent storage files on disk to avoid CTTAS startup failures. - Provides a set of examples for how to use the cluster fixture. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #594: DRILL-4842: SELECT * on JSON data results in NumberFormatE...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/594 The bug here is fundamental to the way Drill works with JSON. We already had an extensive discussion around this area in another PR. The problem is that JSON supports a null type which is independent of all other types. In JSON, a null is not a "null int" or a "null string" -- it is just null. Drill must infer a type for a field. This leads to all kinds of grief when a file contains a run of nulls before the real value: {code} { id: 1, b: null } ... { id: 8, b: "gee, I'm a string!" } {code} Drill must do something with the leading values. "b" is a null... what? Int? String? We've had many bugs in this area. The bugs are not just code bugs, they represent a basic incompatibility between Drill and JSON. This fix is yet another attempt to work around the limitation, but cannot overcome the basic incompatibility. What we are doing, it seems, is building a list of fields that have seen only null values, deferring action on those fields until later. That works fine if "later" occurs in the same record batch. It is not clear what happens if we get to the end of the batch (as in the example above), but have never seen the type of the field: what type of vector do we create? There are several solutions. One is to have a "null" type in Drill. When we see the initial run of nulls, we simply create a field of the "null" type. We have type conversion rules that say that a "null" vector can be coerced into any other type when we ultimately see the type. (And, if we don't see a type in one batch, we can pass the null vector along upstream for later reconciliation.) This is a big change; too big for a bug fix. Another solution, used here, is to keep track of "null only" fields, to defer the decision for later. That has a performance impact. 
A third solution is to go ahead and create a vector of any type, keep setting its values to null (as if we had already seen the field type), but be ready to discard that vector and convert it to the proper type once we see that type. In this way, we treat null fields just as any other up to the point where we realize we have a type conflict. Only then do we check the "null only" map and decide we can quietly convert the vector type to the proper type. These are the initial thoughts. I'll add more nuanced comments as I review the code in more detail.
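The "keep track of null-only fields and defer the decision" approach can be sketched in plain Java. This is a hypothetical illustration, not Drill's actual reader or vector API: `NullDeferralSketch`, `observe`, and `resolve` are invented names, and the end-of-batch fallback to VARCHAR stands in for whatever arbitrary default a real reader would be forced to pick.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "defer the decision" approach: fields that have
// produced only nulls stay UNKNOWN until a concrete value reveals their type.
// If the batch ends first, the reader still has to force an arbitrary guess --
// which is exactly the Drill/JSON incompatibility discussed above.
public class NullDeferralSketch {
  public enum FieldType { UNKNOWN, INT, VARCHAR }

  private final Map<String, FieldType> types = new HashMap<>();

  // Record an observed value; a null leaves the field's type undecided.
  public void observe(String field, Object value) {
    if (value == null) {
      types.putIfAbsent(field, FieldType.UNKNOWN);  // seen, type still open
    } else if (types.getOrDefault(field, FieldType.UNKNOWN) == FieldType.UNKNOWN) {
      types.put(field, value instanceof Integer ? FieldType.INT : FieldType.VARCHAR);
    }
  }

  // At end of batch, a still-unknown field must be forced to a default
  // (VARCHAR here, chosen arbitrarily for the sketch).
  public FieldType resolve(String field) {
    FieldType t = types.getOrDefault(field, FieldType.UNKNOWN);
    return t == FieldType.UNKNOWN ? FieldType.VARCHAR : t;
  }
}
```

With the example file above, `b` is UNKNOWN through the run of nulls, becomes VARCHAR when `"gee, I'm a string!"` arrives, but would have been a forced guess had the batch ended at record 7.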
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101880155 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -60,8 +63,7 @@ @RolesAllowed(DrillUserPrincipal.AUTHENTICATED_ROLE) public class ProfileResources { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProfileResources.class); - - public final static int MAX_PROFILES = 100; + private static Integer MaxProfiles = null; --- End diff -- This can be an int. The config system requires that you set some value for the config property, unless you go through some hokey-pokey to check if the property exists.
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101880333 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -220,7 +225,21 @@ public QProfiles getProfilesJSON() { final List finishedQueries = Lists.newArrayList(); - final Iterator> range = completed.getRange(0, MAX_PROFILES); + //Defining #Profiles to load + int maxProfilesToLoad; + if (MaxProfiles == null) { +MaxProfiles = work.getContext().getConfig().getInt(ExecConstants.HTTP_MAX_PROFILES); + } + + String maxProfilesParams = uriInfo.getQueryParameters().getFirst(MAX_QPROFILES_PARAM); + if (maxProfilesParams != null && !maxProfilesParams.isEmpty()) { +maxProfilesToLoad = Integer.valueOf(maxProfilesParams); + } else { +maxProfilesToLoad = MaxProfiles; + } --- End diff -- No need for an else.
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101880188 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +192,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPROFILES_PARAM = "max"; --- End diff -- Maybe limit? As in SQL?
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101880246 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -220,7 +225,21 @@ public QProfiles getProfilesJSON() { final List finishedQueries = Lists.newArrayList(); - final Iterator> range = completed.getRange(0, MAX_PROFILES); + //Defining #Profiles to load + int maxProfilesToLoad; + if (MaxProfiles == null) { +MaxProfiles = work.getContext().getConfig().getInt(ExecConstants.HTTP_MAX_PROFILES); --- End diff -- Just {code} int maxProfilesToLoad = work.getContext()...; {code}
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101880292 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -220,7 +225,21 @@ public QProfiles getProfilesJSON() { final List finishedQueries = Lists.newArrayList(); - final Iterator> range = completed.getRange(0, MAX_PROFILES); + //Defining #Profiles to load + int maxProfilesToLoad; + if (MaxProfiles == null) { +MaxProfiles = work.getContext().getConfig().getInt(ExecConstants.HTTP_MAX_PROFILES); + } + + String maxProfilesParams = uriInfo.getQueryParameters().getFirst(MAX_QPROFILES_PARAM); --- End diff -- You can use annotations for the property. That's the best way in this particular context.
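The fallback logic under review (explicit `max` query parameter wins, otherwise use the configured default) can be isolated into a small helper. This is a hedged sketch, not the code in ProfileResources: `maxProfilesToLoad` here is a free-standing method, and in JAX-RS the same effect is typically achieved declaratively with `@QueryParam("max")` plus `@DefaultValue`, which is presumably what the annotation suggestion refers to.

```java
// Sketch of parameter-with-configured-default resolution. The method name and
// class are hypothetical; only the precedence rule (URL param over config
// default) mirrors the diff under review.
public class MaxProfilesParam {
  public static int maxProfilesToLoad(String maxParam, int configuredDefault) {
    if (maxParam != null && !maxParam.isEmpty()) {
      return Integer.parseInt(maxParam);  // explicit ?max=N in the URL wins
    }
    return configuredDefault;             // else fall back to drill-override.conf value
  }
}
```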
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101879098 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +193,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPROFILES_PARAM = "max"; --- End diff -- Lucky. There is a WorkManager instance available. We're able to load the default from `drill-override.conf` and also have the user pass it via the URL.
[jira] [Resolved] (DRILL-5005) Potential issues with external sort info in query profile
[ https://issues.apache.org/jira/browse/DRILL-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5005. Resolution: Fixed Fix Version/s: 1.10.0 The label in the tree is left at "Sort" because, in modern Drill, all sorts use the external sort. The older non-spillable sort is no longer used. Memory usage accounting for the sort was extensively revised, which should address the memory concern in this issue. > Potential issues with external sort info in query profile > - > > Key: DRILL-5005 > URL: https://issues.apache.org/jira/browse/DRILL-5005 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.10.0 > > > Run a query that will include an external sort. > Look at the visualized plan. The external sort is shown in the tree as just > "Sort". Suggestion: say "External Sort". > In the operator profiles section, memory use for EXTERNAL_SORT is listed as > 26MB. Yet the file being sorted is 388 MB. Even allowing for projection of a > subset of columns, 26 MB seems awfully small to hold the data set. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101872793 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +193,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPROFILES_PARAM = "max"; --- End diff -- This is to define the param in the URL. This would be too verbose to type into the URL IMO.
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101872723 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -61,7 +64,7 @@ public class ProfileResources { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProfileResources.class); - public final static int MAX_PROFILES = 100; + public final static int MAX_PROFILES = Integer.valueOf(System.getProperty(ExecConstants.HTTP_MAX_PROFILES, "100")); --- End diff -- I was hoping to. Wasn't sure how to get access to the DrillConfig object. Let me look at this.
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101868584 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +193,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPROFILES_PARAM = "max"; --- End diff -- "max" seems too generic for Context, say "drill.profiles.load.max"?
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user sudheeshkatkam commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101868848 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -61,7 +64,7 @@ public class ProfileResources { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ProfileResources.class); - public final static int MAX_PROFILES = 100; + public final static int MAX_PROFILES = Integer.valueOf(System.getProperty(ExecConstants.HTTP_MAX_PROFILES, "100")); --- End diff -- Make this part of `drill-override.conf`? (Using an Inject, see [DrillRestServer](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRestServer.java#L88))
[GitHub] drill issue #742: DRILL-5242: The UI breaks when rendering profiles having u...
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/742 +1
[GitHub] drill pull request #752: DRILL-5258: Access mock data definition from SQL
GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/752 DRILL-5258: Access mock data definition from SQL Extends the mock data source to allow using the full power of the mock data source from an SQL query by referencing the JSON definition file. See JIRA and package-info for details. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5258 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/752.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #752 commit eb9860d4365f60da3b3c22fc0f96a9acfd31ed5c Author: Paul Rogers Date: 2017-02-14T18:02:13Z DRILL-5258: Access mock data definition from SQL Extends the mock data source to allow using the full power of the mock data source from an SQL query by referencing the JSON definition file. See JIRA and package-info for details.
[jira] [Resolved] (DRILL-4272) When sort runs out of memory and query fails, resources are seemingly not freed
[ https://issues.apache.org/jira/browse/DRILL-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-4272. Resolution: Fixed > When sort runs out of memory and query fails, resources are seemingly not > freed > --- > > Key: DRILL-4272 > URL: https://issues.apache.org/jira/browse/DRILL-4272 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Relational Operators >Affects Versions: 1.5.0 >Reporter: Victoria Markman >Assignee: Paul Rogers >Priority: Critical > > Executed query11.sql from resources/Advanced/tpcds/tpcds_sf1/original/parquet > Query runs out of memory: > {code} > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate sv2 for 32768 records, and not enough batchGroups to spill. > batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 19961472 > allocator limit 2000 > Fragment 19:0 > [Error Id: 87aa32b8-17eb-488e-90cb-5f5b9aec on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > And leaves fragments running, holding resources: > {code} > 2016-01-14 22:46:32,435 [Drillbit-ShutdownHook#0] INFO > o.apache.drill.exec.server.Drillbit - Received shutdown request. > 2016-01-14 22:46:32,546 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer > active. Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:19:0. 
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:19:0: State change requested > CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:19:0: Ignoring unexpected state > transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer > active. Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:17:0. > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:17:0: State change requested > CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:17:0: Ignoring unexpected state > transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:33,563 [BitServer-1] INFO > o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.134:59069 > <--> atsqa4-136.qa.lab/10.10.88.136:31011. > 2016-01-14 22:46:33,563 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:34802 <--> > atsqa4-136.qa.lab/10.10.88.136:31012. > 2016-01-14 22:46:33,590 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:36937 <--> > atsqa4-135.qa.lab/10.10.88.135:31012. > 2016-01-14 22:46:33,595 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:53860 <--> > atsqa4-133.qa.lab/10.10.88.133:31012. > 2016-01-14 22:46:38,467 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:48276 <--> > atsqa4-134.qa.lab/10.10.88.134:31012. 
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO > o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup > io.netty.channel.nio.NioEventLoopGroup@6fb32dfb in 1003 ms > 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO > o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup > io.netty.channel.nio.NioEventLoopGroup@5c93dd80 in 1003 ms > 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO > o.a.drill.exec.service.ServiceEngine - closed userServer in 1004 ms > 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO > o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms > 2016-01-14 22:46:39,483 [Drillbit-ShutdownHook#0] WARN > o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 2 > running fragments. > 2016-01-14 22:46:41,489 [Drillbit-ShutdownHook#0] ERROR > o.a.d.exec.server.BootStrapContext - Pool did not terminate > 2016-01-14 22:46:41,498 [Drillbit-ShutdownHook#0] WARN > o.apache.drill.exec.server.Drillbit - Failure on close() > java.lang.RuntimeException: Exception while closing > at > org.apache.drill.common.DrillAutoCloseables.closeNoChecked(DrillAutoCloseables.java:46) > ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:127) > ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT] > at >
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user kkhatua commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101864433 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +193,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPARAM = "max"; + --- End diff -- Done
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/751#discussion_r101862127 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -190,10 +193,13 @@ public QProfiles(List runningQueries, List finishedQue public List getErrors() { return errors; } } + //max Param to cap listing of profiles + private static final String MAX_QPARAM = "max"; + --- End diff -- please choose a better name than MAX_QPARAM, something like MAX_QPROFILES_PARAM?
[jira] [Resolved] (DRILL-5025) ExternalSortBatch provides weak control over spill file size
[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5025. Resolution: Fixed > ExternalSortBatch provides weak control over spill file size > > > Key: DRILL-5025 > URL: https://issues.apache.org/jira/browse/DRILL-5025 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The ExternalSortBatch (ESB) operator sorts records while spilling to disk to > control memory use. The size of the spill file is not easy to control. It is > a function of the accumulated batches size (half of the accumulated total), > which is determined by either the memory budget or the > {{drill.exec.sort.external.group.size}} parameter. (But, even with the > parameter, the actual file size is still half the accumulated batches.) > The proposed solution is to provide an explicit parameter that sets the > maximum spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB > needs to spill more than this amount of data, ESB should split the spill into > multiple files. > The spill.size should be in bytes (or MB). (A size in records makes the file > size data-dependent, which would not be helpful.) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (DRILL-5055) External Sort does not delete spill file if error occurs during close
[ https://issues.apache.org/jira/browse/DRILL-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5055. Resolution: Fixed > External Sort does not delete spill file if error occurs during close > - > > Key: DRILL-5055 > URL: https://issues.apache.org/jira/browse/DRILL-5055 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > DRILL-3898 recently fixed a case in which disk space was exhausted during a > spill event for the external sort. In this case, the call to close failed > because close attempted to flush remaining buffered data, but that also > failed due to out of space. > While the fix works, the fix causes the partially-completed spill file to be > left on disk. Consider this code in {{BatchGroup.close( )}} > {code} > if (outputStream != null) { > outputStream.close(); > } > ... > if (fs != null && fs.exists(path)) { > fs.delete(path, false); > } > {code} > Notice that, if the output stream close fails, the spill file is not deleted. > The fix is to put the delete in a finally block so that it is always deleted. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (DRILL-5019) ExternalSortBatch spills all batches to disk even if even one spills
[ https://issues.apache.org/jira/browse/DRILL-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5019. Resolution: Fixed > ExternalSortBatch spills all batches to disk even if even one spills > > > Key: DRILL-5019 > URL: https://issues.apache.org/jira/browse/DRILL-5019 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The ExternalSortBatch (ESB) operator sorts batches while spilling to disk to > stay within a defined memory budget. > Assume the memory budget is 10 GB. Assume that the actual volume of data to > be sorted is 10.1 GB. The ESB spills the extra 0.1 GB to disk. (Actually > spills more than that, say 5 GB.) > At the completion of the run, ESB has read all incoming batches. It must now > merge those batches. It does so by spilling **all** batches to disk, then > doing a disk-based merge. > This means that exceeding the memory limit by even a small amount is the same > as having a very low memory limit: all batches must spill. > This solution is simple, it works, and has some amount of logic. > But, it would be better to have a slightly more advanced solution that spills > only the smallest possible set of batches to disk, then does a hybrid > in-memory, on-disk merge, saving the unnecessary write/read cycle. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] drill pull request #751: DRILL-5259: Allow listing a user-defined number of ...
GitHub user kkhatua opened a pull request: https://github.com/apache/drill/pull/751 DRILL-5259: Allow listing a user-defined number of profiles Allow changing default number of finished queries in web UI, when starting up Drillbits -Ddrill.exec.http.max_profiles=100 Alternatively, the page can be loaded dynamically for the same. e.g. appending the parameter **max=\<n\>** `https://<hostname>:8047/profiles?max=100` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kkhatua/drill DRILL-5259 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/751.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #751 commit a47e86791a2cbbdb803d1b0f299572332198b1cf Author: Kunal Khatua Date: 2017-02-17T22:16:40Z DRILL-5259: Allow listing a user-defined number of profiles Allow changing default number of finished queries in web UI, when starting up Drillbits. Alternatively, the page can be loaded dynamically for the same. e.g. https://<hostname>:8047/profiles?max=100
[jira] [Resolved] (DRILL-5027) ExternalSortBatch is inefficient: rewrites data unnecessarily
[ https://issues.apache.org/jira/browse/DRILL-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5027. Resolution: Fixed > ExternalSortBatch is inefficient: rewrites data unnecessarily > - > > Key: DRILL-5027 > URL: https://issues.apache.org/jira/browse/DRILL-5027 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The {{ExternalSortBatch}} (ESB) operator sorts data while spilling to disk as > needed to operate within a memory budget. > The sort happens in two phases: > 1. Gather the incoming batches from the upstream operator, sort them, and > spill to disk as needed. > 2. Merge the "runs" spilled in step 1. > In most cases, the second step should run within the memory available for the > first step (which is why severity is only Minor). Large queries need multiple > sort "phases" in which previously spilled runs are read back into memory, > merged, and again spilled. It is here that ESB has an issue. This process > correctly limit the amount of memory used, but at the cost or rewriting the > same data over and over. > Consider current Drill behavior: > {code} > a b c d (re-spill) > abcd e f g h (re-spill) > abcefgh i j k > {code} > That is batches, a, b, c and d are re-spilled to create the combined abcd, > and so on. The same data is rewritten over and over. > Note that spilled batches take no (direct) memory in Drill, and require only > a small on-heap memento. So, maintaining data on disk s "free". So, better > would be to re-spill only newer data: > {code} > a b c d (re-spill) > abcd | e f g h (re-spill) > abcd efgh | i j k > {code} > Where the bar indicates a moving point at which we've already merged and do > not need to do so again. If each letter is one unit of disk I/O, the original > method uses 35 units while the revised method uses 27 units. 
> At some point the process may have to repeat by merging the second-generation > spill files and so on. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
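The 35-vs-27 comparison above can be checked with a small cost model. This is a hedged sketch, not Drill code: the class, its method names, and the accounting convention (one unit per batch written, plus one read and one write for every batch touched during a re-spill; the final streaming merge is excluded because it costs the same under both schemes) are all assumptions made for illustration.

```java
public class SpillCostSketch {

    /** I/O units when each re-spill rewrites *all* data spilled so far
     *  (the behavior described in the ticket). */
    static int respillAll(int batches, int groupSize) {
        int cost = 0;    // one unit = one batch-worth of data read or written
        int merged = 0;  // size of the combined run built so far
        for (int done = 0; done < batches; ) {
            int s = Math.min(groupSize, batches - done);
            done += s;
            cost += s;                    // spill each incoming batch once
            if (done < batches) {         // re-spill: read old run + new batches,
                cost += 2 * (merged + s); // then write the combined run back out
                merged += s;
            }
        }
        return cost;  // final streaming merge excluded: equal in both schemes
    }

    /** I/O units when each re-spill touches only the newly spilled batches. */
    static int respillNewOnly(int batches, int groupSize) {
        int cost = 0;
        for (int done = 0; done < batches; ) {
            int s = Math.min(groupSize, batches - done);
            done += s;
            cost += s;
            if (done < batches) {
                cost += 2 * s;  // read and rewrite only the new group
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // 11 batches (a through k), memory holds 4 at a time, as in the example.
        System.out.println(respillAll(11, 4));      // 35
        System.out.println(respillNewOnly(11, 4));  // 27
    }
}
```

Run with the example's 11 batches merged in groups of four, this reproduces the 35-unit versus 27-unit figures from the ticket.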
[jira] [Resolved] (DRILL-5022) ExternalSortBatch sets two different limits for "copier" memory
[ https://issues.apache.org/jira/browse/DRILL-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5022. Resolution: Fixed > ExternalSortBatch sets two different limits for "copier" memory > --- > > Key: DRILL-5022 > URL: https://issues.apache.org/jira/browse/DRILL-5022 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The {{ExternalSortBatch}} (ESB) operator sorts rows and supports spilling to > disk to operate within a set memory budget. > A key step in disk-based sorting is to merge "runs" of previously-sorted > records. ESB does this with a class created from the > {{PriorityQueueCopierTemplate}}, called the "copier" in the code. > The sort runs are represented by record batches, each with an indirection > vector (AKA {{SelectionVector}}) that points to the records in sort order. > The copier restructures the incoming runs: copying from the original batches > (from positions given by the indirection vector) into new output vectors in > sorted order. To do this work, the copier must allocate new vectors to hold > the merged data. These vectors consume memory, and must fit into the overall > memory budget assigned to the ESB. > As it turns out, the ESB code has two conflicting ways of setting the limit. > One is hard-coded: > {code} > private static final int COPIER_BATCH_MEM_LIMIT = 256 * 1024; > {code} > The other comes from config parameters: > {code} > public static final long INITIAL_ALLOCATION = 10_000_000; > public static final long MAX_ALLOCATION = 20_000_000; > copierAllocator = oAllocator.newChildAllocator(oAllocator.getName() + > ":copier", > PriorityQueueCopier.INITIAL_ALLOCATION, > PriorityQueueCopier.MAX_ALLOCATION); > {code} > Strangely, the config parameters are used to set aside memory for the copier > to use. 
But, the {{COPIER_BATCH_MEM_LIMIT}} is used to determine how large > a merged batch to actually create. > The result is that we set aside 10 MB of memory, but use only 256K of it, > wasting 9 MB. > This ticket asks to: > * Determine the proper merged batch size. > * Use that limit to set the memory allocation for the copier. > Elsewhere in Drill batch sizes tend to be on the order of 32K records. In the > ESB, the low {{COPIER_BATCH_MEM_LIMIT}} tends to favor smaller batches: a > test case has a row width of 114 bytes, and produces batches of just 2299 > records. So, likely the proper choice is the larger 10 MB memory allocator > limit.
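The 2299-record figure follows directly from the two limits quoted above. A minimal sketch, assuming the 114-byte row width from the ticket's test case; the class and helper method are illustrative, not Drill's actual sizing code:

```java
public class CopierBatchSizing {
    // The two conflicting limits quoted in the ticket.
    static final int COPIER_BATCH_MEM_LIMIT = 256 * 1024;  // hard-coded: 262,144 bytes
    static final long INITIAL_ALLOCATION = 10_000_000;     // config-driven reservation

    /** Records that fit in one merged batch for a given row width. */
    static long recordsPerBatch(long memLimit, int rowWidthBytes) {
        return memLimit / rowWidthBytes;
    }

    public static void main(String[] args) {
        int rowWidth = 114;  // row width from the test case in the ticket
        // The hard-coded limit yields the small batches observed:
        System.out.println(recordsPerBatch(COPIER_BATCH_MEM_LIMIT, rowWidth));  // 2299
        // Sizing to the full 10 MB reservation would allow far larger batches:
        System.out.println(recordsPerBatch(INITIAL_ALLOCATION, rowWidth));      // 87719
    }
}
```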
[jira] [Resolved] (DRILL-5008) Refactor, document and simplify ExternalSortBatch
[ https://issues.apache.org/jira/browse/DRILL-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5008. Resolution: Fixed > Refactor, document and simplify ExternalSortBatch > - > > Key: DRILL-5008 > URL: https://issues.apache.org/jira/browse/DRILL-5008 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > ExternalSortBatch provides a spillable sort operator for Drill. The code > works fine, but can be hard to follow and understand. Make the following > changes to improve ease-of-use for developers: > 1. Refactor the large methods into bite-sized chunks to aid understanding. > 2. Provide additional explanation of the theory and operation of this > operator.
[jira] [Resolved] (DRILL-5066) External sort attempts to retry sv2 memory alloc, even if can never succeed
[ https://issues.apache.org/jira/browse/DRILL-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Rogers resolved DRILL-5066. Resolution: Fixed > External sort attempts to retry sv2 memory alloc, even if can never succeed > --- > > Key: DRILL-5066 > URL: https://issues.apache.org/jira/browse/DRILL-5066 > Project: Apache Drill > Issue Type: Sub-task >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > > The external sort contains rather complex code to allocate an sv2 in the > method {{newSV2()}}. The code tries to allocate an sv2. If the allocation > fails, the code attempts to spill (which is fine) and try again. If things > still fail, the code waits 1 sec. and tries again. It will continue to wait > up to a minute, doubling the wait time each cycle. > Presumably, this is so that some other part of Drill will release memory. > But, because of the way the allocator currently works, the allocation is > limited by the limit set on the external sort's own allocator. This limit > won't change by waiting. > The loop only makes sense if the memory allocation failed because the > external sort's allocator is not above its limit, but the parent can't > provide memory. > In practice, this scenario should not occur once the external sort is > resource managed, so the retry code can simply be removed.
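The retry pattern the ticket describes can be sketched as follows. This is a hedged illustration, not the actual {{newSV2()}} code: the class name, the functional-interface parameters, and the injectable sleeper are inventions for testability; only the schedule (wait 1 sec, double each cycle, stop after about a minute) comes from the ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

public class AllocRetrySketch {

    /** Doubling-back-off retry, as the ticket describes the sv2 allocation:
     *  wait 1 second, double each cycle, give up after roughly a minute. */
    static boolean allocateWithRetry(LongPredicate tryAlloc, long bytes, LongConsumer sleeper) {
        for (long waitMs = 1_000; waitMs <= 60_000; waitMs *= 2) {
            if (tryAlloc.test(bytes)) {
                return true;
            }
            sleeper.accept(waitMs);   // the real code would Thread.sleep(waitMs)
        }
        return tryAlloc.test(bytes);  // one final attempt after the last wait
    }

    public static void main(String[] args) {
        // If the failure comes from this allocator's own limit, the outcome is
        // effectively constant: waiting changes nothing and every retry fails.
        List<Long> waits = new ArrayList<>();
        boolean ok = allocateWithRetry(bytes -> false, 4_096, waits::add);
        System.out.println(ok);     // false
        System.out.println(waits);  // [1000, 2000, 4000, 8000, 16000, 32000]
    }
}
```

The six waits total 63 seconds, matching the "up to a minute" in the description; since the predicate cannot change while the sort's own limit is the binding constraint, the loop only burns time, which is the ticket's argument for removing it.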
[GitHub] drill pull request #750: DRILL-5273: CompliantTextReader excessive memory us...
GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/750 DRILL-5273: CompliantTextReader excessive memory use DRILL-5273 CompliantTextReader exhausts 4 GB memory when reading 5000 small files. Please see the JIRA for details of the problem and the fix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5273 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/750.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #750 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #749: DRILL-5266: Parquet returns low-density batches
GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/749 DRILL-5266: Parquet returns low-density batches Fixes one glaring problem related to bit/byte confusion. Includes a few clean-up items found along the way. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5266 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/749.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #749
Re: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish
Congratulations, Abhishek! On Fri, Feb 17, 2017 at 10:30 AM, Abhishek Girish wrote: > Thank you all :) > > On Fri, Feb 17, 2017 at 11:57 PM Sudheesh Katkam wrote: >> Congratulations, Abhishek. >> >> > On Feb 15, 2017, at 1:22 PM, Aditya wrote: >> > >> > Congratulations, Abhishek! >> > >> > On Tue, Feb 14, 2017 at 10:39 PM, Khurram Faraaz >> wrote: >> > >> >> Congrats Abhishek! >> >> >> >> >> >> From: Parth Chandra >> >> Sent: Wednesday, February 15, 2017 9:48:18 AM >> >> To: dev@drill.apache.org >> >> Subject: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish >> >> >> >> The Project Management Committee (PMC) for Apache Drill has invited >> >> Abhishek Girish to become a committer and we are pleased to announce >> that >> >> he has accepted. >> >> >> >> Welcome Abhishek and thanks for your great contributions! >> >> >> >> Parth >> >> >> >> -- > -Abhishek
Re: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli
Congratulations, Rahul! On Fri, Feb 17, 2017 at 10:26 AM, Sudheesh Katkam wrote: > Congratulations, Rahul! > >> On Feb 15, 2017, at 1:22 PM, Aditya wrote: >> >> Congratulations, Rahul! >> >> On Wed, Feb 15, 2017 at 1:04 PM, rahul challapalli < >> challapallira...@gmail.com> wrote: >> >>> Thank you Ramana and Khurram >>> >>> On Feb 15, 2017 11:19 AM, "Khurram Faraaz" wrote: >>> Congrats Rahul! From: Parth Chandra Sent: Wednesday, February 15, 2017 9:48:15 AM To: dev@drill.apache.org Subject: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli The Project Management Committee (PMC) for Apache Drill has invited Rahul Challapalli to become a committer and we are pleased to announce that he has accepted. Welcome Rahul and thanks for your great contributions! Parth >>> >
[jira] [Resolved] (DRILL-4953) Please delete the JIRA DRILL-4687 as it violates confidentiality
[ https://issues.apache.org/jira/browse/DRILL-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Khatua resolved DRILL-4953. - Resolution: Not A Bug > Please delete the JIRA DRILL-4687 as it violates confidentiality > > > Key: DRILL-4953 > URL: https://issues.apache.org/jira/browse/DRILL-4953 > Project: Apache Drill > Issue Type: Task > Components: Functions - Drill >Reporter: Vikas Taank >Priority: Blocker > > Please delete the JIRA DRILL-4687 as it violates confidentiality > https://issues.apache.org/jira/browse/DRILL-4687
Re: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish
Thank you all :) On Fri, Feb 17, 2017 at 11:57 PM Sudheesh Katkam wrote: > Congratulations, Abhishek. > > > On Feb 15, 2017, at 1:22 PM, Aditya wrote: > > > > Congratulations, Abhishek! > > > > On Tue, Feb 14, 2017 at 10:39 PM, Khurram Faraaz > wrote: > > > >> Congrats Abhishek! > >> > >> > >> From: Parth Chandra > >> Sent: Wednesday, February 15, 2017 9:48:18 AM > >> To: dev@drill.apache.org > >> Subject: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish > >> > >> The Project Management Committee (PMC) for Apache Drill has invited > >> Abhishek Girish to become a committer and we are pleased to announce > that > >> he has accepted. > >> > >> Welcome Abhishek and thanks for your great contributions! > >> > >> Parth > >> > > -- -Abhishek
Re: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish
Congratulations, Abhishek. > On Feb 15, 2017, at 1:22 PM, Aditya wrote: > > Congratulations, Abhishek! > > On Tue, Feb 14, 2017 at 10:39 PM, Khurram Faraaz wrote: > >> Congrats Abhishek! >> >> >> From: Parth Chandra >> Sent: Wednesday, February 15, 2017 9:48:18 AM >> To: dev@drill.apache.org >> Subject: [ANNOUNCE] New Apache Drill Committer - Abhishek Girish >> >> The Project Management Committee (PMC) for Apache Drill has invited >> Abhishek Girish to become a committer and we are pleased to announce that >> he has accepted. >> >> Welcome Abhishek and thanks for your great contributions! >> >> Parth >>
Re: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli
Congratulations, Rahul! > On Feb 15, 2017, at 1:22 PM, Aditya wrote: > > Congratulations, Rahul! > > On Wed, Feb 15, 2017 at 1:04 PM, rahul challapalli < > challapallira...@gmail.com> wrote: > >> Thank you Ramana and Khurram >> >> On Feb 15, 2017 11:19 AM, "Khurram Faraaz" wrote: >> >>> Congrats Rahul! >>> >>> >>> From: Parth Chandra >>> Sent: Wednesday, February 15, 2017 9:48:15 AM >>> To: dev@drill.apache.org >>> Subject: [ANNOUNCE] New Apache Drill Committer - Rahul Challapalli >>> >>> The Project Management Committee (PMC) for Apache Drill has invited Rahul >>> Challapalli to become a committer and we are pleased to announce that he >>> has accepted. >>> >>> Welcome Rahul and thanks for your great contributions! >>> >>> Parth >>> >>
[GitHub] drill issue #578: DRILL-4280: Kerberos Authentication
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/578 +1 Thanks for the work guys, especially the extensive review comments.