[jira] [Resolved] (DRILL-5088) Error when reading DBRef column

2017-02-24 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam resolved DRILL-5088.

   Resolution: Fixed
Fix Version/s: 1.10.0

Fixed in 
[b892b99|https://github.com/apache/drill/commit/b892b997dfa0259550942f076b0afd89b27c9fdf]

> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
> Fix For: 1.10.0
>
>
> In a Mongo database with DBRefs, when a DBRef is inserted in the first 
> document of a collection, the Drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> A simple example to reproduce. In the Mongo instance:
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In Drill:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first document doesn't contain a DBRef, Drill queries correctly:
> In a Mongo instance:
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In Drill:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> +--------------------------------------+---------------------------------------------------------------+
> | _id                                  | account                                                       |
> +--------------------------------------+---------------------------------------------------------------+
> | {"$oid":"582081d96b69060001fd8939"}  | {"$id":{}}                                                    |
> | {"$oid":"582081d96b69060001fd8938"}  | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}}  |
> +--------------------------------------+---------------------------------------------------------------+
> 2 rows selected (0,563 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5088) Error when reading DBRef column

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884108#comment-15884108
 ] 

ASF GitHub Bot commented on DRILL-5088:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/702


> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi





[jira] [Commented] (DRILL-4280) Kerberos Authentication

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884112#comment-15884112
 ] 

ASF GitHub Bot commented on DRILL-4280:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/578


> Kerberos Authentication
> ---
>
> Key: DRILL-4280
> URL: https://issues.apache.org/jira/browse/DRILL-4280
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
>
> Drill should support Kerberos based authentication from clients. This means 
> that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
> support inbound Kerberos. For Web this would most likely be SPNEGO while for 
> ODBC and JDBC this will be more generic Kerberos.
> Since Hive and much of Hadoop supports Kerberos there is a potential for a 
> lot of reuse of ideas if not implementation.
> Note that this is related to but not the same as 
> https://issues.apache.org/jira/browse/DRILL-3584 





[jira] [Commented] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884102#comment-15884102
 ] 

ASF GitHub Bot commented on DRILL-5260:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/753


> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use the new framework.
> * Revise TestWindowFrame to use the new framework.
> * Revise TestMergeJoinWithSchemaChanges to use the new framework.





[jira] [Commented] (DRILL-5255) Unit tests fail due to CTTAS temporary name space checks

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884107#comment-15884107
 ] 

ASF GitHub Bot commented on DRILL-5255:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/759


> Unit tests fail due to CTTAS temporary name space checks
> 
>
> Key: DRILL-5255
> URL: https://issues.apache.org/jira/browse/DRILL-5255
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Drill can operate in embedded mode. In this mode, no storage plugin 
> definitions other than the defaults may be present. In particular, when using 
> the Drill test framework, only those storage plugins defined in the Drill 
> code are available.
> Yet, Drill checks for the existence of the dfs.tmp plugin definition (as 
> named by the {{drill.exec.default_temporary_workspace}} parameter). Because 
> this plugin is not defined, an exception occurs:
> {code}
> org.apache.drill.common.exceptions.UserException: PARSE ERROR: Unable to 
> create or drop tables/views. Schema [dfs.tmp] is immutable.
> [Error Id: 792d4e5d-3f31-4f38-8bb4-d108f1a808f6 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToMutableDrillSchema(SchemaUtilites.java:184)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.getTemporaryWorkspace(SchemaUtilites.java:201)
>   at 
> org.apache.drill.exec.server.Drillbit.validateTemporaryWorkspace(Drillbit.java:264)
>   at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:135)
>   at 
> org.apache.drill.test.ClusterFixture.startDrillbits(ClusterFixture.java:207)
>   ...
> {code}
> Expected that either a configuration would exist that would use the default 
> /tmp/drill location, or that the check for {{dfs.tmp}} would be deferred 
> until it is actually required (such as when executing a CTTAS statement).
> It seemed that the test framework must be altered to work around this problem 
> by defining the necessary workspace. Unfortunately, the Drillbit must start 
> before we can define the workspace needed for the Drillbit to start. So, this 
> workaround is not possible.
> Further, users of the embedded Drillbit may not know to do this configuration.





[jira] [Commented] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884105#comment-15884105
 ] 

ASF GitHub Bot commented on DRILL-5273:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/750


> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> --
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> A test case was created that consists of 5000 text files, each containing a 
> single record: the file number (1 to 5000), at most 4 characters per 
> record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch at around record 3700 on a 
> Mac with 4 GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears 
> to occur:
> * Iterate over the record readers for each file.
> * For each, call its setup method.
> The setup code is:
> {code}
>   public void setup(OperatorContext context, OutputMutator outputMutator) 
> throws ExecutionSetupException {
> oContext = context;
> readBuffer = context.getManagedBuffer(READ_BUFFER);
> whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the 
> buffers.
> The sizes are:
> {code}
>   private static final int READ_BUFFER = 1024*1024;
>   private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to 
> {{ScanBatch.next()}}:
> {code}
> Ctor: 0  -- Initial memory in constructor
> Init setup: 1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1 MB per file, with 5000 files we would leak 5 GB of memory, which 
> would explain the OOM when given only 4 GB.
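As a sanity check on the arithmetic above, the following standalone sketch (not Drill source; the buffer constants are copied from the snippet quoted in the description, the file count from the test case) computes the per-reader footprint and the total for 5000 files:

```java
// Standalone sketch: reproduces the arithmetic behind the reported leak.
public class LeakEstimate {
    static final int READ_BUFFER = 1024 * 1024;       // 1,048,576 bytes
    static final int WHITE_SPACE_BUFFER = 64 * 1024;  // 65,536 bytes

    public static void main(String[] args) {
        // Footprint leaked per record reader (never released).
        long perReader = READ_BUFFER + WHITE_SPACE_BUFFER; // 1,114,112
        // Total across the 5000-file test case.
        long totalBytes = perReader * 5000L;
        System.out.println("per reader: " + perReader + " bytes");
        System.out.printf("5000 files: %d bytes (~%.2f GiB)%n",
                totalBytes, totalBytes / (1024.0 * 1024 * 1024));
    }
}
```

The total comes to roughly 5.2 GiB, which exceeds the 4 GB of direct memory and is consistent with the observed OOM.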





[jira] [Commented] (DRILL-5190) Display planning and queued time for a query in its profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884109#comment-15884109
 ] 

ASF GitHub Bot commented on DRILL-5190:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/738


> Display planning and queued time for a query in its profile page
> 
>
> Key: DRILL-5190
> URL: https://issues.apache.org/jira/browse/DRILL-5190
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, the Web UI does not display the time spent planning a query in 
> its profile page. The estimate needs to be made by seeing how late the 
> earliest major fragment started.
> As an additional enhancement, we can also track the time a query spends 
> waiting in the queue, as well as the actual execution time.





[jira] [Commented] (DRILL-5257) Provide option to save query profiles sync, async or not at all

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884106#comment-15884106
 ] 

ASF GitHub Bot commented on DRILL-5257:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/747


> Provide option to save query profiles sync, async or not at all
> ---
>
> Key: DRILL-5257
> URL: https://issues.apache.org/jira/browse/DRILL-5257
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> DRILL-5123 improved perceived query performance by writing the query profile 
> after sending the final response to the client. This is the desired behavior 
> in most situations. However, some tests want to verify certain results by 
> reading the query profile from disk. Doing so works best when the query 
> profile is written before returning the final query results.
> This ticket requests that the timing of query profile writing be 
> configurable:
> * Sync: write the profile before the final client response.
> * Async: write the profile after the final client response. (Default)
> * None: don't write the query profile at all.
> A config option (boot time? run time?) should control the option. A boot-time 
> option is fine for testing.
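A minimal sketch of the requested three-way option, assuming a mode parsed from a boot-time config string (the mode names follow the ticket; the class, method, and parsing are illustrative, not Drill's actual option handling):

```java
import java.util.Locale;

// Illustrative sketch of a three-way profile-save mode.
public class ProfileSaveMode {
    enum Mode { SYNC, ASYNC, NONE }

    static Mode parse(String value) {
        switch (value.toLowerCase(Locale.ROOT)) {
            case "sync":  return Mode.SYNC;   // write before the final response
            case "async": return Mode.ASYNC;  // write after the final response (default)
            case "none":  return Mode.NONE;   // skip writing the profile
            default:
                throw new IllegalArgumentException("unknown mode: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Async")); // ASYNC
    }
}
```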





[jira] [Commented] (DRILL-5275) Sort spill serialization is slow due to repeated buffer allocations

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884101#comment-15884101
 ] 

ASF GitHub Bot commented on DRILL-5275:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/754


> Sort spill serialization is slow due to repeated buffer allocations
> ---
>
> Key: DRILL-5275
> URL: https://issues.apache.org/jira/browse/DRILL-5275
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Drill provides a sort operator that spills to disk. The spill and read 
> operations use the serialization code in the 
> {{VectorAccessibleSerializable}}. This code, in turn, uses the 
> {{DrillBuf.getBytes()}} method to write to an output stream. (Yes, the "get" 
> method writes, and the "write" method reads...)
> The DrillBuf method turns around and calls the UDLE method that does:
> {code}
> byte[] tmp = new byte[length];
> PlatformDependent.copyMemory(addr(index), tmp, 0, length);
> out.write(tmp);
> {code}
> That is, for each write the code allocates a heap buffer. Since Drill buffers 
> can be quite large (4, 8, 16 MB or larger), the above rapidly fills the heap 
> and causes GC.
> The result is slow performance. On a Mac, with an SSD that can do 700 MB/s 
> of I/O, we get only about 40 MB/s, very likely because of excessive CPU 
> cost and GC.
> The solution is to allocate a single read or write buffer, then use that same 
> buffer over and over when reading or writing. This must be done in 
> {{VectorAccessibleSerializable}} as it is a per-thread class that has 
> visibility to all the buffers to be written.
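A minimal sketch of the proposed fix, assuming the goal is one reusable heap buffer per serializer instead of a fresh heap array per write (class and method names are illustrative, not Drill's actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Copies a (possibly direct) buffer to a stream in fixed-size chunks
// through a single heap array allocated once and reused for every write.
public class ReusableWriteBuffer {
    private static final int CHUNK = 32 * 1024;
    private final byte[] tmp = new byte[CHUNK];  // allocated once, reused

    void write(ByteBuffer src, OutputStream out) throws IOException {
        while (src.hasRemaining()) {
            int n = Math.min(CHUNK, src.remaining());
            src.get(tmp, 0, n);    // copy one chunk out of the source buffer
            out.write(tmp, 0, n);  // no per-call heap allocation
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer src = ByteBuffer.allocateDirect(100_000);
        for (int i = 0; i < 100_000; i++) src.put((byte) i);
        src.flip();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ReusableWriteBuffer().write(src, out);
        System.out.println(out.size()); // 100000
    }
}
```

Chunking bounds heap churn regardless of vector size, which is the point of the ticket: allocation cost no longer scales with the 4-16 MB buffers being spilled.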





[jira] [Commented] (DRILL-5259) Allow listing a user-defined number of profiles

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884103#comment-15884103
 ] 

ASF GitHub Bot commented on DRILL-5259:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/751


> Allow listing a user-defined number of profiles 
> 
>
> Key: DRILL-5259
> URL: https://issues.apache.org/jira/browse/DRILL-5259
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Trivial
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, the web UI only lists the last 100 profiles. 
> This count is currently hard-coded. The proposed change is to create an 
> option in drill-override.conf to provide a flexible default value, and also 
> an option within the UI (via an optional parameter in the path).





[jira] [Commented] (DRILL-5274) Exception thrown in Drillbit shutdown in UDF cleanup code

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884104#comment-15884104
 ] 

ASF GitHub Bot commented on DRILL-5274:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/760


> Exception thrown in Drillbit shutdown in UDF cleanup code
> -
>
> Key: DRILL-5274
> URL: https://issues.apache.org/jira/browse/DRILL-5274
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
>
> I ran a very simple query: a single-line text file in an embedded Drillbit. 
> The UDF directory was placed in /tmp. During the run, the directory was 
> deleted. On Drillbit shutdown, the following occurred:
> {code}
> 25328 DEBUG [main] [org.apache.drill.exec.server.Drillbit] - Shutdown begun.
> 26344 INFO [pool-1-thread-2] [org.apache.drill.exec.rpc.data.DataServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7d1c0d85 in 1007 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.rpc.user.UserServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7cdb3b56 in 1008 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.service.ServiceEngine] - 
> closed userServer in 1009 ms
> 26345 INFO [pool-1-thread-2] [org.apache.drill.exec.service.ServiceEngine] - 
> closed dataPool in 1009 ms
> 26356 WARN [main] [org.apache.drill.exec.server.Drillbit] - Failure on close()
> java.lang.IllegalArgumentException: /tmp/drill/udf/udf/local does not exist
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) 
> ~[commons-io-2.4.jar:2.4]
>   at 
> org.apache.drill.exec.expr.fn.FunctionImplementationRegistry.close(FunctionImplementationRegistry.java:469)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:209) 
> ~[classes/:na]
>   at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:152) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
> ~[classes/:na]
>   at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:171) 
> ~[classes/:na]
> ...
> {code}
> The following patch makes the problem go away, but I'm not sure if the above 
> is an indication of deeper problems.
> {code}
> public class FunctionImplementationRegistry implements FunctionLookupContext, 
> AutoCloseable {
>   ...
>   public void close() {
> if (deleteTmpDir) {
>   ...
> } else {
>   try {
> File dir = new File(localUdfDir.toUri().getPath());
> if (dir.exists()) {
>   FileUtils.cleanDirectory(dir);
> }
>   ...
> }
>   }
> {code}
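For comparison, here is a JDK-only sketch of the same guard (the patch above uses commons-io's FileUtils; this version is illustrative, not the actual fix): clean a directory's contents only when it still exists, so the shutdown path does not throw if the temp dir was removed externally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class SafeCleanup {
    // Deletes the directory's contents (keeping the directory itself),
    // and silently does nothing if the directory is already gone.
    static void cleanDirectoryIfExists(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return; // already gone: nothing to clean, no exception
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()) // children before parents
                .filter(p -> !p.equals(dir))       // keep the root itself
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        cleanDirectoryIfExists(Paths.get("/tmp/does-not-exist-xyz")); // no-op
        System.out.println("ok");
    }
}
```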





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884111#comment-15884111
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/756


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: ready-to-commit
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min, max, and avg 
> times for setup, processing, and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - the percentage of their active time that the minor fragments 
> within each major fragment were busy.
> Operator Profile
>   %Busy - the percentage of their active time that the fragments within 
> each operator were busy.
>   Records - Total number of records propagated out by that operator.
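As an illustration of the proposed metric, assuming %Busy is computed as busy time over active time (the exact formula is not spelled out in the ticket, so this is an assumption):

```java
// Illustrative arithmetic only: %Busy as the share of active (wall) time
// that fragments spent processing.
public class BusyPercent {
    static double busyPercent(long busyMillis, long activeMillis) {
        return 100.0 * busyMillis / activeMillis;
    }

    public static void main(String[] args) {
        // A fragment active for 1000 ms whose work took 750 ms is 75% busy.
        System.out.println(busyPercent(750, 1000) + "%"); // 75.0%
    }
}
```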





[jira] [Commented] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884110#comment-15884110
 ] 

ASF GitHub Bot commented on DRILL-5196:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/741


> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>
> Could not run a single MongoDB unit test through the IDE or command line. 
> The reason is that when running a single test case, the MongoDB instance 
> did not get started, thus a 'table not found' error for 
> 'mongo.employee.empinfo' would be raised.





[jira] [Created] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-24 Thread Senthilkumar (JIRA)
Senthilkumar created DRILL-5298:
---

 Summary: CTAS with 0 records from a SELECT query should create the 
table with metadata
 Key: DRILL-5298
 URL: https://issues.apache.org/jira/browse/DRILL-5298
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata, Query Planning & Optimization
Affects Versions: 1.9.0
 Environment: MapR 5.2
Reporter: Senthilkumar
 Fix For: 1.9.0


Hello team,

I create a table in Drill using CTAS:

CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0

It runs successfully.

But the table is not created, as 0 records are returned from the SELECT 
query.

CTAS should still go ahead and create the table with the column metadata.

When BI tools fire up multi-pass queries, with CTAS in the first query, the 
subsequent queries fail because of a missing table.

In databases like SQL Server and Postgres, CTAS will create the table even 
if the SELECT doesn't return any rows.







[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883991#comment-15883991
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103066042
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, 
PhysicalOperator popConfig) {
 } else {
   fileManager = new HadoopFileManager(spillFs);
 }
-FragmentHandle handle = context.getHandle();
-spillDirName = String.format("%s_major%s_minor%s_op%s", 
QueryIdHelper.getQueryId(handle.getQueryId()),
-handle.getMajorFragmentId(), handle.getMinorFragmentId(), 
popConfig.getOperatorId());
+spillDirName = String.format(
--- End diff --

Meanwhile I changed this code to print something like:


/tmp/drill/spill/27509954-4b86-b20a-f16a-c46644f74519_HashAgg_1-3_minor0/spill1

So the operator name "HashAgg_1-3" looks like what is presented in the web 
interface.
 


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883987#comment-15883987
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103069029
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / 
estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / 
estimatedRowWidth / 2);
 spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
 
+// Compute the actual spill batch size which may be larger or smaller
+// than the preferred size depending on the row width. Double the 
estimated
+// memory needs to allow for power-of-two rounding.
+
+targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
 // Determine the number of records per batch per merge step. The goal 
is to
 // merge batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each merge batch, whichever is less.
 
-targetMergeBatchSize = preferredMergeBatchSize;
-mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / 
estimatedRowWidth);
+mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / 
estimatedRowWidth / 2);
 mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
 
 // Determine the minimum memory needed for spilling. Spilling is done 
just
 // before accepting a batch, so we must spill if we don't have room 
for a
 // (worst case) input batch. To spill, we need room for the output 
batch created
 // by merging the batches already in memory. Double this to allow for 
power-of-two
 // memory allocations.
 
-spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
 
 // The merge memory pool assumes we can spill all input batches. To 
make
 // progress, we must have at least two merge batches (same size as an 
output
 // batch) and one output batch. Again, double to allow for power-of-two
 // allocation and add one for a margin of error.
 
-int minMergeBatches = 2 * 3 + 1;
-long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+long minMergeMemory = Math.round((2 * targetSpillBatchSize + 
targetMergeBatchSize) * 1.05);
 
 // If we are in a low-memory condition, then we might not have room 
for the
 // default output batch size. In that case, pick a smaller size.
 
-long minMemory = Math.max(spillPoint, minMergeMemory);
-if (minMemory > memoryLimit) {
-
-  // Figure out the minimum output batch size based on memory, but 
can't be
-  // any smaller than the defined minimum.
-
-  targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / 
minMergeBatches);
+if (minMergeMemory > memoryLimit) {
 
-  // Regardless of anything else, the batch must hold at least one
-  // complete row.
+  // Figure out the minimum output batch size based on memory,
+  // must hold at least one complete row.
 
-  targetMergeBatchSize = Math.max(estimatedRowWidth, 
targetMergeBatchSize);
-  spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
-  minMergeMemory = minMergeBatches * targetMergeBatchSize;
+  long mergeAllowance = Math.round((memoryLimit - 2 * 
targetSpillBatchSize) * 0.95);
+  targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 
2);
+  mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth 
/ 2);
--- End diff --

If estimatedRowWidth is huge, then mergeBatchRowCount may be zero ! 
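A standalone illustration of this edge case (the values are made up and the methods are a sketch, not Drill source): integer division by a huge estimated row width drives the count to zero, so the computation needs a Math.max(1, ...) clamp like the spill-batch row count above.

```java
public class RowCountClamp {
    // Mirrors the reviewed expression: count may round down to zero.
    static int unclamped(long targetMergeBatchSize, long estimatedRowWidth) {
        return (int) (targetMergeBatchSize / estimatedRowWidth / 2);
    }

    // Clamped variant: always at least one row per merge batch.
    static int clamped(long targetMergeBatchSize, long estimatedRowWidth) {
        return (int) Math.max(1, targetMergeBatchSize / estimatedRowWidth / 2);
    }

    public static void main(String[] args) {
        long batchSize = 16L << 20;  // 16 MB merge batch target
        long hugeRow = 64L << 20;    // a 64 MB estimated row width
        System.out.println(unclamped(batchSize, hugeRow)); // 0 -- the bug
        System.out.println(clamped(batchSize, hugeRow));   // 1
    }
}
```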


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0

[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883988#comment-15883988
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103067184
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -219,7 +220,18 @@
 
   private BatchSchema schema;
 
+  /**
+   * Incoming batches buffered in memory prior to spilling
+   * or an in-memory merge.
+   */
+
   private LinkedList bufferedBatches = 
Lists.newLinkedList();
+
+  /**
+   * Spilled runs consisting of a large number of spilled
+   * in-memory batches.
--- End diff --

"spilled in-memory" is an oxymoron :-) 





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883995#comment-15883995
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103067805
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
 // Set too large and the ratio between memory and input data sizes becomes
 // small. Set too small and disk seek times dominate performance.
 
-spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+// In low memory, use no more than 1/4 of memory for each spill batch.
+// Ensures we can merge.
+
+preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
--- End diff --

Why restrict the spill batch size so low? This would create more runs and 
increase the risk of needing those intermediate merges. Otherwise, during a 
merge only a single batch at a time is read from each run, not the whole run 
(I believe -- if we spill all the remaining batches at the end ...)
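For context on the 1/4 cap under discussion, here is the arithmetic it appears to encode, as a hedged sketch; the method name and sample sizes are illustrative assumptions, not Drill's actual configuration. A merge needs headroom for at least two in-flight spill batches plus an output batch, which a quarter-of-memory cap preserves:

```java
// Illustrative sketch of the memory/4 cap from the diff. Only the Math.min
// expression comes from the diff; everything else is for demonstration.
public class SpillBatchCap {

    static long cappedSpillBatchSize(long preferredSpillBatchSize, long memoryLimit) {
        // In low memory, use no more than 1/4 of memory for each spill batch.
        return Math.min(preferredSpillBatchSize, memoryLimit / 4);
    }

    public static void main(String[] args) {
        // 40 MB budget: a preferred 16 MB spill batch is capped at 10 MB,
        // leaving room for two input batches plus an output batch in a merge.
        assert cappedSpillBatchSize(16L << 20, 40L << 20) == 10L << 20;
        // Ample memory: the preferred size is used unchanged.
        assert cappedSpillBatchSize(16L << 20, 400L << 20) == 16L << 20;
    }
}
```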





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883989#comment-15883989
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103069698
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
* This method spills only half the accumulated batches
* minimizing unnecessary disk writes. The exact count must lie between
* the minimum and maximum spill counts.
-*/
+   */
 
   private void spillFromMemory() {
 
 // Determine the number of batches to spill to create a spill file
 // of the desired size. The actual file size might be a bit larger
 // or smaller than the target, which is expected.
 
-long estSize = 0;
 int spillCount = 0;
+long spillSize = 0;
 for (InputBatch batch : bufferedBatches) {
-  estSize += batch.getDataSize();
-  if (estSize > spillFileSize) {
-break; }
+  long batchSize = batch.getDataSize();
+  spillSize += batchSize;
   spillCount++;
+  if (spillSize + batchSize / 2 > spillFileSize) {
+break; }
 }
 
-// Should not happen, but just to be sure...
+// Must always spill at least 2, even if this creates an over-size
+// spill file.
 
-if (spillCount == 0) {
-  return; }
+spillCount = Math.max(spillCount, 2);
 
 // Do the actual spill.
 
-logger.trace("Starting spill from memory. Memory = {}, Buffered batch count = {}, Spill batch count = {}",
- allocator.getAllocatedMemory(), bufferedBatches.size(), spillCount);
 mergeAndSpill(bufferedBatches, spillCount);
   }
 
  private void mergeAndSpill(LinkedList source, int count) {
-if (count == 0) {
-  return; }
 spilledRuns.add(doMergeAndSpill(source, count));
   }
 
   private BatchGroup.SpilledRun doMergeAndSpill(LinkedList batchGroups, int spillCount) {
 List batchesToSpill = Lists.newArrayList();
 spillCount = Math.min(batchGroups.size(), spillCount);
 assert spillCount > 0 : "Spill count to mergeAndSpill must not be zero";
-long spillSize = 0;
 for (int i = 0; i < spillCount; i++) {
-  @SuppressWarnings("resource")
-  BatchGroup batch = batchGroups.pollFirst();
-  assert batch != null : "Encountered a null batch during merge and spill operation";
-  batchesToSpill.add(batch);
-  spillSize += batch.getDataSize();
+  batchesToSpill.add(batchGroups.pollFirst());
--- End diff --

If there was only one batch group but we bumped spillCount up to 2 
(line 1332 above), wouldn't pollFirst() return null the second time?
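The null-poll scenario can be shown in isolation. Note the diff's doMergeAndSpill does clamp with Math.min before the loop; the sketch below (hypothetical stand-in names, not the Drill code) illustrates why that clamp is what prevents the null:

```java
import java.util.LinkedList;

// Standalone illustration of the reviewer's concern: with one buffered group
// and spillCount bumped to 2, an unclamped loop would poll a null on the
// second iteration. The Math.min clamp makes that impossible.
public class SpillPoll {

    static <T> LinkedList<T> drain(LinkedList<T> groups, int spillCount) {
        LinkedList<T> out = new LinkedList<>();
        // Clamp to the actual number of buffered groups, as the diff does.
        spillCount = Math.min(groups.size(), spillCount);
        for (int i = 0; i < spillCount; i++) {
            out.add(groups.pollFirst()); // never null after the clamp
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedList<String> groups = new LinkedList<>();
        groups.add("run-1");                           // only one group buffered
        LinkedList<String> spilled = drain(groups, 2); // caller asked for 2
        assert spilled.size() == 1;                    // clamp limited us to 1
        assert !spilled.contains(null);                // no null was polled
    }
}
```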





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883993#comment-15883993
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103067062
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -238,6 +238,25 @@ public boolean copyFromSafe(int fromIndex, int thisIndex, ${minor.class}Vector f
 return true;
   }
 
+  @Override
+  public int getAllocatedByteCount() {
+return offsetVector.getAllocatedByteCount() + super.getAllocatedByteCount();
--- End diff --

Why don't the other getAllocatedByteCount() methods add the "super"? Is 
it because their supers do not allocate?





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883996#comment-15883996
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068553
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
 long origInputBatchSize = estimatedInputBatchSize;
 estimatedInputBatchSize = Math.max(estimatedInputBatchSize, actualBatchSize);
 
+// The row width may end up as zero if all fields are nulls or some
+// other unusual situation. In this case, assume a width of 10 just
+// to avoid lots of special case code.
+
+if (estimatedRowWidth == 0) {
+  estimatedRowWidth = 10;
--- End diff --

Where is estimatedRowWidth being set? Could there be an extreme 
situation (e.g. too many columns) such that we do write much more than 10, and 
thus all the following computations are off?
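The fallback being questioned can be sketched on its own. The helper below is a hypothetical stand-in (the real estimate comes from the batch sizer, not a simple division); only the constant 10 and the zero-width condition come from the diff:

```java
// Sketch of the fallback in the diff: when every field is null the computed
// row width can come out 0, so a nominal width of 10 is substituted to keep
// later divisions (row counts, batch sizes) from dividing by zero.
public class RowWidthEstimate {

    static final int NOMINAL_ROW_WIDTH = 10; // fallback value from the diff

    // Hypothetical helper, not Drill's RecordBatchSizer logic.
    static long estimateRowWidth(long batchDataSize, int rowCount) {
        long width = (rowCount == 0) ? 0 : batchDataSize / rowCount;
        return (width == 0) ? NOMINAL_ROW_WIDTH : width;
    }

    public static void main(String[] args) {
        assert estimateRowWidth(0, 1000) == 10;   // all-null batch -> fallback
        assert estimateRowWidth(4000, 1000) == 4; // normal batch -> real width
    }
}
```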





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883990#comment-15883990
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068736
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
--- End diff --

This factor of 2 seems extremely cautious; we may pay a little in 
performance every time, just for that.





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883994#comment-15883994
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103068095
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java ---
@@ -765,12 +838,12 @@ private void processBatch() {
   spillFromMemory();
 }
 
-// Sanity check. We should now be above the spill point.
+// Sanity check. We should now be below the buffer memory maximum.
 
 long startMem = allocator.getAllocatedMemory();
-if (memoryLimit - startMem < spillPoint) {
-  logger.error( "ERROR: Failed to spill below the spill point. Spill point = {}, free memory = {}",
-spillPoint, memoryLimit - startMem);
+if (startMem > bufferMemoryPool) {
+  logger.error( "ERROR: Failed to spill above buffer limit. Buffer pool = {}, memory = {}",
+  bufferMemoryPool, startMem);
--- End diff --

Do we need to return or throw an exception here?




[jira] [Created] (DRILL-5297) Print the plan text when plan pattern check fails in unit tests

2017-02-24 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5297:
--

 Summary: Print the plan text when plan pattern check fails in unit 
tests 
 Key: DRILL-5297
 URL: https://issues.apache.org/jira/browse/DRILL-5297
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


If a unit test does not generate the expected plan, we print only the 
expected pattern, like this:

Did not find expected pattern in plan: Scan.*FindLimit0Visitor"

We should also print the actual plan here for debugging purposes.
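A minimal sketch of the requested change; the helper name and signature are illustrative assumptions, not Drill's test-framework API. The point is simply that the failure message should carry the actual plan text alongside the unmatched pattern:

```java
import java.util.regex.Pattern;

// Hypothetical plan-pattern check that includes the actual plan in the
// failure message, so a pattern mismatch is debuggable from the test log.
public class PlanCheck {

    static void checkPattern(String plan, String expectedPattern) {
        if (!Pattern.compile(expectedPattern).matcher(plan).find()) {
            throw new AssertionError(
                "Did not find expected pattern in plan: " + expectedPattern
                + "\nActual plan:\n" + plan); // print the plan, not just the pattern
        }
    }

    public static void main(String[] args) {
        // Matching plan: no exception.
        checkPattern("Scan(table=[mongo]) -> FindLimit0Visitor", "Scan.*FindLimit0Visitor");
        // Non-matching plan: the error now carries the plan text.
        boolean threw = false;
        try {
            checkPattern("Project -> Sort", "Scan.*FindLimit0Visitor");
        } catch (AssertionError e) {
            threw = e.getMessage().contains("Actual plan");
        }
        assert threw;
    }
}
```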






[jira] [Commented] (DRILL-5257) Provide option to save query profiles sync, async or not at all

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883940#comment-15883940
 ] 

ASF GitHub Bot commented on DRILL-5257:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/747
  
+1


> Provide option to save query profiles sync, async or not at all
> ---
>
> Key: DRILL-5257
> URL: https://issues.apache.org/jira/browse/DRILL-5257
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> DRILL-5123 improved perceived query performance by writing the query profile 
> after sending a final response to the client. This is the desired behaviors 
> in most situations. However, some tests want to verify certain results by 
> reading the query profile from disk. Doing so works best when the query 
> profile is written before returning the final query results.
> This ticket requests that the timing if the query profile writing be 
> configurable.
> * Sync: write profile before final client response.
> * Async: write profile after final client response. (Default)
> * None: don't write query profile at all
> A config option (boot time? run time?) should control the option. A boot-time 
> option is fine for testing.





[jira] [Commented] (DRILL-5088) Error when reading DBRef column

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883939#comment-15883939
 ] 

ASF GitHub Bot commented on DRILL-5088:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/702
  
+1


> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
>
> In a MongoDB database with DBRef, when a DBRef is present in the first line 
> of a Mongo collection, the Drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> Simple example to reproduce:
> In mongo instance
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first line doesn't contain a DBRef, Drill queries correctly:
> In a mongo instance :
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> +--+---+
> | _id                                  | account                                                       |
> +--+---+
> | {"$oid":"582081d96b69060001fd8939"}  | {"$id":{}}                                                    |
> | {"$oid":"582081d96b69060001fd8938"}  | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}}  |
> +--+---+
> 2 rows selected (0,563 seconds)
> {code}





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883936#comment-15883936
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/756
  
+1


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: ready-to-commit
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Updated] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam updated DRILL-5114:
---
Labels:   (was: ready-to-commit)

> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several tests to 
> display some test output. Test output is sent to stdout, rather than a log 
> file. Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml





[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883926#comment-15883926
 ] 

ASF GitHub Bot commented on DRILL-5114:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/762
  
Some of us rely on Lilith (SOCKET ref) to view "debug" logs while running 
unit tests; any reason to change to the "error" level?




[jira] [Commented] (DRILL-5255) Unit tests fail due to CTTAS temporary name space checks

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883923#comment-15883923
 ] 

ASF GitHub Bot commented on DRILL-5255:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/759
  
+1


> Unit tests fail due to CTTAS temporary name space checks
> 
>
> Key: DRILL-5255
> URL: https://issues.apache.org/jira/browse/DRILL-5255
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Drill can operate in embedded mode. In this mode, no storage plugin 
> definitions other than the defaults may be present. In particular, when using 
> the Drill test framework, only those storage plugins defined in the Drill 
> code are available.
> Yet, Drill checks for the existence of the dfs.tmp plugin definition (as 
> named by the {{drill.exec.default_temporary_workspace}} parameter. Because 
> this plugin is not defined, an exception occurs:
> {code}
> org.apache.drill.common.exceptions.UserException: PARSE ERROR: Unable to 
> create or drop tables/views. Schema [dfs.tmp] is immutable.
> [Error Id: 792d4e5d-3f31-4f38-8bb4-d108f1a808f6 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToMutableDrillSchema(SchemaUtilites.java:184)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.getTemporaryWorkspace(SchemaUtilites.java:201)
>   at 
> org.apache.drill.exec.server.Drillbit.validateTemporaryWorkspace(Drillbit.java:264)
>   at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:135)
>   at 
> org.apache.drill.test.ClusterFixture.startDrillbits(ClusterFixture.java:207)
>   ...
> {code}
> Expected that either a configuration would exist that would use the default 
> /tmp/drill location, or that the check for {{drill.tmp}} would be deferred 
> until it is actually required (such as when executing a CTTAS statement.)
> It seemed that the test framework must be altered to work around this problem 
> by defining the necessary workspace. Unfortunately, the Drillbit must start 
> before we can define the workspace needed for the Drillbit to start. So, this 
> workaround is not possible.
> Further, users of the embedded Drillbit may not know to do this configuration.





[jira] [Commented] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883921#comment-15883921
 ] 

ASF GitHub Bot commented on DRILL-5196:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/741
  
+1


> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>
> Could not run a single MongoDB unit test through an IDE or the command line. 
> The reason is that when running a single test case, the MongoDB instance does 
> not get started; thus a 'table not found' error for 'mongo.employee.empinfo' 
> is raised.





[jira] [Commented] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883917#comment-15883917
 ] 

ASF GitHub Bot commented on DRILL-5260:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/753
  
+1


> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use the new framework.
> * Revise TestWindowFrame to to use the new framework.
> * Revise TestMergeJoinWithSchemaChanges to use the new framework.





[jira] [Commented] (DRILL-5274) Exception thrown in Drillbit shutdown in UDF cleanup code

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883915#comment-15883915
 ] 

ASF GitHub Bot commented on DRILL-5274:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/760
  
+1


> Exception thrown in Drillbit shutdown in UDF cleanup code
> -
>
> Key: DRILL-5274
> URL: https://issues.apache.org/jira/browse/DRILL-5274
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
>
> I ran a very simple query (over a single-line text file) in an embedded 
> Drillbit. The UDF directory was placed in /tmp. During the run, the directory 
> was deleted. On Drillbit shutdown, the following occurred:
> {code}
> 25328 DEBUG [main] [org.apache.drill.exec.server.Drillbit] - Shutdown begun.
> 26344 INFO [pool-1-thread-2] [org.apache.drill.exec.rpc.data.DataServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7d1c0d85 in 1007 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.rpc.user.UserServer] - 
> closed eventLoopGroup io.netty.channel.nio.NioEventLoopGroup@7cdb3b56 in 1008 
> ms
> 26345 INFO [pool-1-thread-1] [org.apache.drill.exec.service.ServiceEngine] - 
> closed userServer in 1009 ms
> 26345 INFO [pool-1-thread-2] [org.apache.drill.exec.service.ServiceEngine] - 
> closed dataPool in 1009 ms
> 26356 WARN [main] [org.apache.drill.exec.server.Drillbit] - Failure on close()
> java.lang.IllegalArgumentException: /tmp/drill/udf/udf/local does not exist
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637) 
> ~[commons-io-2.4.jar:2.4]
>   at 
> org.apache.drill.exec.expr.fn.FunctionImplementationRegistry.close(FunctionImplementationRegistry.java:469)
>  ~[classes/:na]
>   at 
> org.apache.drill.exec.server.DrillbitContext.close(DrillbitContext.java:209) 
> ~[classes/:na]
>   at org.apache.drill.exec.work.WorkManager.close(WorkManager.java:152) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
> ~[classes/:na]
>   at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
> ~[classes/:na]
>   at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:171) 
> ~[classes/:na]
> ...
> {code}
> The following patch makes the problem go away, but I'm not sure if the above 
> is an indication of deeper problems.
> {code}
> public class FunctionImplementationRegistry implements FunctionLookupContext, 
> AutoCloseable {
>   ...
>   public void close() {
> if (deleteTmpDir) {
>   ...
> } else {
>   try {
> File dir = new File(localUdfDir.toUri().getPath());
> if (dir.exists()) {
>   FileUtils.cleanDirectory(dir);
> }
>   ...
> }
>   }
> {code}
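The guard in the patch above (check that the directory exists before cleaning it) can be sketched with only the JDK, for readers without commons-io at hand. Class and method names below are hypothetical, not Drill's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class UdfDirCleaner {
    /** Empties the directory if it exists; silently succeeds if it is already gone. */
    static void cleanIfExists(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return; // directory was deleted externally, as in the bug report
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())  // delete children before parents
                .filter(p -> !p.equals(dir))        // keep the directory itself
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("udf-test");
        Files.createFile(tmp.resolve("some.jar"));
        cleanIfExists(tmp);                           // empties an existing dir
        System.out.println(Files.list(tmp).count());  // prints 0
        Files.delete(tmp);
        cleanIfExists(tmp);                           // no-op on a missing dir: no exception
        System.out.println("ok");
    }
}
```

The second call demonstrates the point of the patch: cleaning a directory that has vanished is treated as already done rather than as an error.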



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5275) Sort spill serialization is slow due to repeated buffer allocations

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883914#comment-15883914
 ] 

ASF GitHub Bot commented on DRILL-5275:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/754
  
+1


> Sort spill serialization is slow due to repeated buffer allocations
> ---
>
> Key: DRILL-5275
> URL: https://issues.apache.org/jira/browse/DRILL-5275
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Drill provides a sort operator that spills to disk. The spill and read 
> operations use the serialization code in the 
> {{VectorAccessibleSerializable}}. This code, in turn, uses the 
> {{DrillBuf.getBytes()}} method to write to an output stream. (Yes, the "get" 
> method writes, and the "write" method reads...)
> The DrillBuf method turns around and calls the UDLE method that does:
> {code}
> byte[] tmp = new byte[length];
> PlatformDependent.copyMemory(addr(index), tmp, 0, length);
> out.write(tmp);
> {code}
> That is, for each write the code allocates a heap buffer. Since Drill buffers 
> can be quite large (4, 8, 16 MB or larger), the above rapidly fills the heap 
> and causes GC.
> The result is slow performance. On a Mac, with an SSD that can do 700 MB/s of 
> I/O, we get only about 40 MB/s. Very likely because of excessive CPU cost and 
> GC.
> The solution is to allocate a single read or write buffer, then use that same 
> buffer over and over when reading or writing. This must be done in 
> {{VectorAccessibleSerializable}} as it is a per-thread class that has 
> visibility to all the buffers to be written.
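The proposed fix — allocate one buffer and reuse it across writes — can be sketched as follows. This is a simplified stand-in (hypothetical class, on-heap source array instead of direct memory); the real change belongs in {{VectorAccessibleSerializable}} and copies from a {{DrillBuf}}:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class ChunkedWriter {
    private static final int CHUNK = 32 * 1024;  // reused buffer size
    private final byte[] tmp = new byte[CHUNK];  // allocated once, reused for every write

    /** Copies src to out in CHUNK-sized pieces instead of one length-sized allocation. */
    void write(byte[] src, int offset, int length, OutputStream out) throws IOException {
        while (length > 0) {
            int n = Math.min(length, CHUNK);
            System.arraycopy(src, offset, tmp, 0, n); // stands in for PlatformDependent.copyMemory
            out.write(tmp, 0, n);
            offset += n;
            length -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] vector = new byte[4 * 1024 * 1024];   // a "4 MB DrillBuf"
        for (int i = 0; i < vector.length; i++) { vector[i] = (byte) i; }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ChunkedWriter().write(vector, 0, vector.length, out);
        System.out.println(out.size() == vector.length
                && Arrays.equals(out.toByteArray(), vector)); // prints true
    }
}
```

With this shape, serializing a 4 MB buffer allocates 32 KB once instead of a fresh 4 MB heap array per write, which is what fills the heap and triggers GC in the unpatched path.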





[jira] [Commented] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883916#comment-15883916
 ] 

ASF GitHub Bot commented on DRILL-5273:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/750
  
+1


> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> --
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> A test case was created that consists of 5000 text files, each with a single 
> line with the file number: 1 to 5001. Each file has a single record, and at 
> most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch at around record 3700 on a 
> Mac with 4 GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears to 
> occur:
> * Iterate over the record readers for each file.
> * For each, call setup
> The setup code is:
> {code}
>   public void setup(OperatorContext context, OutputMutator outputMutator) 
> throws ExecutionSetupException {
> oContext = context;
> readBuffer = context.getManagedBuffer(READ_BUFFER);
> whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the 
> buffers.
> The sizes are:
> {code}
>   private static final int READ_BUFFER = 1024*1024;
>   private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to 
> {{ScanBatch.next()}}:
> {code}
> Ctor: 0  -- Initial memory in constructor
> Init setup: 1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1 MB per file, with 5000 files we would leak 5 GB of memory, which 
> would explain the OOM when given only 4 GB.
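The arithmetic in the report checks out; the per-reader total and the aggregate leak can be verified directly:

```java
import java.util.Locale;

public class LeakMath {
    public static void main(String[] args) {
        int readBuffer = 1024 * 1024;       // READ_BUFFER
        int whiteSpaceBuffer = 64 * 1024;   // WHITE_SPACE_BUFFER
        int perReader = readBuffer + whiteSpaceBuffer;
        System.out.println(perReader);      // 1114112, matching the per-next() memory deltas

        long files = 5000;
        long totalBytes = perReader * files;
        System.out.println(totalBytes);     // 5570560000 bytes across all readers
        System.out.printf(Locale.ROOT, "%.2f GB%n", totalBytes / (1024.0 * 1024 * 1024));
    }
}
```

So 5000 un-released reader buffers come to about 5.19 GB — comfortably past the 4 GB direct-memory limit, explaining the OOM.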





[jira] [Updated] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5195:
---
Labels: ready-to-commit  (was: )

> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: ready-to-commit
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883855#comment-15883855
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103063526
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -49,58 +51,135 @@ public String getId() {
 return String.format("fragment-%s", major.getMajorFragmentId());
   }
 
-  public static final String[] FRAGMENT_OVERVIEW_COLUMNS = {"Major 
Fragment", "Minor Fragments Reporting",
-"First Start", "Last Start", "First End", "Last End", "Min Runtime", 
"Avg Runtime", "Max Runtime", "Last Update",
-"Last Progress", "Max Peak Memory"};
+  public static final String[] ACTIVE_FRAGMENT_OVERVIEW_COLUMNS = {"Major 
Fragment", "Minor Fragments Reporting",
+"First Start", "Last Start", "First End", "Last End", "Min Runtime", 
"Avg Runtime", "Max Runtime", "% Busy",
+"Last Update", "Last Progress", "Max Peak Memory"};
+
+  public static final String[] ACTIVE_FRAGMENT_OVERVIEW_COLUMNS_TOOLTIP = 
{null, "# Minor Fragments Spawned",
+null, null, null, null, "Shortest duration of a fragment", "Avg 
duration of a fragment", "Longest duration of a fragment", "%time Fragments 
were Busy",
+"Last time a running fragment's status was updated", "Last time we 
heard from a running fragment", null};
 
   // Not including Major Fragment ID and Minor Fragments Reporting
-  public static final int NUM_NULLABLE_OVERVIEW_COLUMNS = 
FRAGMENT_OVERVIEW_COLUMNS.length - 2;
+  public static final int NUM_NULLABLE_ACTIVE_OVERVIEW_COLUMNS = 
ACTIVE_FRAGMENT_OVERVIEW_COLUMNS.length - 2;
 
   public void addSummary(TableBuilder tb) {
 // Use only minor fragments that have complete profiles
 // Complete iff the fragment profile has at least one operator 
profile, and start and end times.
    final List<MinorFragmentProfile> complete = new ArrayList<>(
   Collections2.filter(major.getMinorFragmentProfileList(), 
Filters.hasOperatorsAndTimes));
 
-tb.appendCell(new OperatorPathBuilder().setMajor(major).build(), null);
-tb.appendCell(complete.size() + " / " + 
major.getMinorFragmentProfileCount(), null);
+tb.appendCell(new OperatorPathBuilder().setMajor(major).build(), null, 
null);
+tb.appendCell(complete.size() + " / " + 
major.getMinorFragmentProfileCount(), null, null);
 
 // If there are no stats to aggregate, create an empty row
 if (complete.size() < 1) {
-  tb.appendRepeated("", null, NUM_NULLABLE_OVERVIEW_COLUMNS);
+  tb.appendRepeated("", null, NUM_NULLABLE_ACTIVE_OVERVIEW_COLUMNS, 
null);
   return;
 }
 
 final MinorFragmentProfile firstStart = Collections.min(complete, 
Comparators.startTime);
 final MinorFragmentProfile lastStart = Collections.max(complete, 
Comparators.startTime);
-tb.appendMillis(firstStart.getStartTime() - start, null);
-tb.appendMillis(lastStart.getStartTime() - start, null);
+tb.appendMillis(firstStart.getStartTime() - start, null, null);
+tb.appendMillis(lastStart.getStartTime() - start, null, null);
 
 final MinorFragmentProfile firstEnd = Collections.min(complete, 
Comparators.endTime);
 final MinorFragmentProfile lastEnd = Collections.max(complete, 
Comparators.endTime);
-tb.appendMillis(firstEnd.getEndTime() - start, null);
-tb.appendMillis(lastEnd.getEndTime() - start, null);
+tb.appendMillis(firstEnd.getEndTime() - start, null, null);
+tb.appendMillis(lastEnd.getEndTime() - start, null, null);
 
-long total = 0;
+long totalDuration = 0L;
+double totalProcessInMillis = 0.0d;
+double totalWaitInMillis = 0.0d;
 for (final MinorFragmentProfile p : complete) {
-  total += p.getEndTime() - p.getStartTime();
+  totalDuration += p.getEndTime() - p.getStartTime();
+  //Capture Busy & Wait Time
+  List<OperatorProfile> opProfileList = p.getOperatorProfileList();
+  for (OperatorProfile operatorProfile : opProfileList) {
+totalProcessInMillis += operatorProfile.getProcessNanos()/1E6;
+totalWaitInMillis += operatorProfile.getWaitNanos()/1E6;
--- End diff --

Actually, accumulate in nanos, but round the total nanos to get ms. I 
probably missed the aggregation detail.
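The suggested aggregation, sketched with made-up per-operator numbers: keep the running total in nanoseconds and convert to milliseconds once at the end, so per-term rounding error is never accumulated:

```java
public class BusyTime {
    public static void main(String[] args) {
        long[] processNanos = { 1_500_000L, 2_400_000L, 100_000L }; // hypothetical per-operator values
        long totalNanos = 0;
        for (long n : processNanos) {
            totalNanos += n;  // accumulate in nanos; no per-operator division
        }
        long millis = Math.round(totalNanos / 1_000_000.0); // round the total once
        System.out.println(millis); // prints 4
    }
}
```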


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web 

[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883853#comment-15883853
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/756
  
@paul-rogers Incorporated all changes. 


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883852#comment-15883852
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103063332
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -35,6 +38,10 @@
 public class FragmentWrapper {
   private final MajorFragmentProfile major;
   private final long start;
+  private final Locale currentLocale = Locale.getDefault();
+  private final String pattern = "dd-MMM- HH:mm:ss";
+  private final SimpleDateFormat simpleDateFormat = new SimpleDateFormat(
--- End diff --

Dropping this as it's just a tooltip


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883847#comment-15883847
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/752
  
Thanks for the change. LGTM. +1


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns such as 
> repeat count, data generator and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.
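The table-name dispatch rule described above is a one-liner; a minimal sketch (hypothetical class and method names):

```java
public class MockTableDispatch {
    /** Per the rule above: a table name ending in ".json" names an extended spec file. */
    static boolean isExtendedSpec(String tableName) {
        return tableName.endsWith(".json");
    }

    public static void main(String[] args) {
        System.out.println(isExtendedSpec("example/mock-options.json")); // true -> parse JSON spec
        System.out.println(isExtendedSpec("employees_10K"));             // false -> decode table/column names
    }
}
```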





[jira] [Updated] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5258:
-
Labels: ready-to-commit  (was: )

> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns such as 
> repeat count, data generator and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.





[jira] [Updated] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread Sorabh Hamirwasia (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-5260:
-
Labels: ready-to-commit  (was: )

> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use the new framework.
> * Revise TestWindowFrame to use the new framework.
> * Revise TestMergeJoinWithSchemaChanges to use the new framework.





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883840#comment-15883840
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103062066
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in 
setup";
+static final String AvgProcessTime = "Average processing time for a 
fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to 
process";
+static final String MinWaitTime = "Shortest time a fragment spent in 
waiting for data";
+static final String AvgWaitTime = "Average wait time for a fragment";
+static final String MaxWaitTime = "Longest Time a fragment spent in 
waiting data";
+static final String PercentFragmentTime = "Percentage of the total 
fragment time that was spent on the operator";
+static final String PercentQueryTime = "Percentage of the total query 
time that was spent on the operator";
+static final String Rows = "Rows emitted by the operator";
--- End diff --

Went through some profiles in more detail. You were right about the row 
counts. HashAgg and HashJoins mention the rows processed. The following PROJECT 
operators indicate what the actual rowcounts were. 
I'll just revert to what you suggested. 


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Closed] (DRILL-5242) The UI breaks when trying to render profiles having unknown metrics

2017-02-24 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5242.
---

Closed https://github.com/apache/drill/pull/742 after commit to Apache Master

> The UI breaks when trying to render profiles having unknown metrics
> ---
>
> Key: DRILL-5242
> URL: https://issues.apache.org/jira/browse/DRILL-5242
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> When profiles are generated using a fork of Drill that has introduced new 
> metrics, a server running the parent branch will fail to render the operator 
> metrics correctly. 
> The workaround should be to simply skip unknown metrics. 
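The proposed workaround — skip metric ids the server does not recognize instead of failing the whole page — can be sketched as a rendering loop over (id, value) pairs. The metric names and the fork's extra id 7 below are hypothetical:

```java
import java.util.Map;

public class MetricsRenderer {
    public static void main(String[] args) {
        // Known metric ids -> display names on this server (hypothetical table).
        Map<Integer, String> known = Map.of(0, "BYTES_SENT", 1, "BATCHES_SENT");
        // A profile from a fork that added metric id 7, unknown to this server.
        int[][] profileMetrics = { {0, 1024}, {1, 3}, {7, 99} };

        StringBuilder sb = new StringBuilder();
        for (int[] m : profileMetrics) {
            String name = known.get(m[0]);
            if (name == null) {
                continue; // skip unknown metrics rather than breaking the UI
            }
            sb.append(name).append('=').append(m[1]).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```

Only the recognized metrics are rendered; the unknown id is silently dropped.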





[jira] [Closed] (DRILL-5230) Translation of millisecond duration into hours is incorrect

2017-02-24 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5230.
---
Assignee: Kunal Khatua

Closed https://github.com/apache/drill/pull/739 after commit to Apache Master

> Translation of millisecond duration into hours is incorrect
> ---
>
> Key: DRILL-5230
> URL: https://issues.apache.org/jira/browse/DRILL-5230
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>  Labels: easyfix, ready-to-commit
> Fix For: 1.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The method 
> {code:JAVA}org.apache.drill.exec.server.rest.profile.TableBuilder.appendMillis(long,
>  String){code}
> has a bug where the human readable translation of a 1+ hr duration in 
> milliseconds is reported incorrectly. 
> This has to do with the {code:JAVA}SimpleDateFormat.format() {code} method 
> incorrectly translating it. 
> For e.g.
> {code:JAVA}
> long x = 4545342L; //1 hour 15 min 45.342 sec
> appendMillis(x, null);
> {code}
> This formats the value as {noformat}17h15m{noformat} instead of 
> {noformat}1h15m{noformat}
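The root cause is that {{SimpleDateFormat}} treats the millisecond value as an epoch instant in the JVM's default time zone, so "HH" yields hour-of-day rather than elapsed hours. Computing the components arithmetically is time-zone independent. A sketch (the exact format pattern in Drill's {{appendMillis}} is assumed, not quoted from the source):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class DurationFormat {
    /** Time-zone-independent duration formatting. */
    static String format(long millis) {
        long h = TimeUnit.MILLISECONDS.toHours(millis);
        long m = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
        return h + "h" + m + "m";
    }

    public static void main(String[] args) {
        long x = 4545342L; // 1 hour 15 min 45.342 sec
        System.out.println(format(x)); // 1h15m

        // The buggy approach: formatting the duration as if it were an instant.
        SimpleDateFormat sdf = new SimpleDateFormat("HH'h'mm'm'");
        sdf.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles")); // UTC-8 in Jan 1970
        System.out.println(sdf.format(new Date(x))); // 17h15m -- hour-of-day, not elapsed hours
    }
}
```

In a UTC-8 zone, epoch + 1h15m lands at 17:15 local time the previous day, reproducing the "17h15m" reported above.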





[jira] [Commented] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883819#comment-15883819
 ] 

ASF GitHub Bot commented on DRILL-5260:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/753#discussion_r103060090
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ProfileParser.java ---
@@ -138,9 +414,208 @@ public long getMetric(int id) {
 }
   }
 
-  public Map getOpInfo( ) {
+  /**
+   * Information about an operator definition: the plan-time information
+   * that appears in the plan portion of the profile. Also holds the
+   * "actuals" from the minor fragment portion of the profile.
+   * Allows integrating the "planned" vs. "actual" performance of the
+   * query.
+   */
+
+  public static class OpDefInfo {
+public String opName;
+public boolean isInferred;
+public int majorId;
+public int stepId;
+public String args;
+public List columns;
+public int globalLevel;
+public int localLevel;
+public int id;
+public int branchId;
+public boolean isBranchRoot;
+public double estMemoryCost;
+public double estNetCost;
+public double estIOCost;
+public double estCpuCost;
+public double estRowCost;
+public double estRows;
+public String name;
+public long actualMemory;
+public int actualBatches;
+public long actualRows;
+public OpDefInfo inferredParent;
+public List opExecs = new ArrayList<>( );
+public List children = new ArrayList<>( );
+
+// 00-00Screen : rowType = RecordType(VARCHAR(10) Year, 
VARCHAR(65536) Month, VARCHAR(100) Devices, VARCHAR(100) Tier, VARCHAR(100) 
LOB, CHAR(10) Gateway, BIGINT Day, BIGINT Hour, INTEGER Week, VARCHAR(100) 
Week_end_date, BIGINT Usage_Cnt): \
+// rowcount = 100.0, cumulative cost = {7.42124276972414E9 rows, 
7.663067406383167E10 cpu, 0.0 io, 2.24645048816E10 network, 2.692766612982188E8 
memory}, id = 129302
+//
+// 00-01  Project(Year=[$0], Month=[$1], Devices=[$2], Tier=[$3], 
LOB=[$4], Gateway=[$5], Day=[$6], Hour=[$7], Week=[$8], Week_end_date=[$9], 
Usage_Cnt=[$10]) :
+// rowType = RecordType(VARCHAR(10) Year, VARCHAR(65536) Month, 
VARCHAR(100) Devices, VARCHAR(100) Tier, VARCHAR(100) LOB, CHAR(10) Gateway, 
BIGINT Day, BIGINT Hour, INTEGER Week, VARCHAR(100) Week_end_date, BIGINT 
Usage_Cnt): rowcount = 100.0, cumulative cost = {7.42124275972414E9 rows, 
7.663067405383167E10 cpu, 0.0 io, 2.24645048816E10 network, 2.692766612982188E8 
memory}, id = 129301
+
+public OpDefInfo(String plan) {
+  Pattern p = Pattern.compile( 
"^(\\d+)-(\\d+)(\\s+)(\\w+)(?:\\((.*)\\))?\\s*:\\s*(.*)$" );
+  Matcher m = p.matcher(plan);
+  if (!m.matches()) {
+throw new IllegalStateException( "Could not parse plan: " + plan );
+  }
+  majorId = Integer.parseInt(m.group(1));
+  stepId = Integer.parseInt(m.group(2));
+  name = m.group(4);
+  args = m.group(5);
+  String tail = m.group(6);
+  String indent = m.group(3);
+  globalLevel = (indent.length() - 4) / 2;
+
+  p = Pattern.compile("rowType = RecordType\\((.*)\\): (rowcount .*)");
+  m = p.matcher(tail);
--- End diff --

As it turns out, different scan operators use a different syntax for this 
info. This syntax works for the scan operators used in tests thus far. We'll 
need to extend this for others.

Actually, we'll want to combine this functionality with the full-blown 
parser used earlier. This was a quick & dirty one that predated the fancier one. 
And, as noted, the real solution is to include the info in the profile in JSON 
to avoid the need to fiddle with parsing the text format.


> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use 

[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883810#comment-15883810
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057839
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/BooleanGen.java 
---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.mock;
+
+import java.util.Random;
+
+import org.apache.drill.exec.vector.BitVector;
+import org.apache.drill.exec.vector.ValueVector;
+
+public class BooleanGen implements FieldGen {
+
+  Random rand = new Random( );
+
+  @Override
+  public void setup(ColumnDef colDef) { }
+
+  public int value( ) {
--- End diff --

Fixed.
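For reference, a self-contained sketch of a boolean generator in this style (the FieldGen, ColumnDef, and BitVector types come from the diff above; this standalone version omits the vector-writing plumbing and just returns the generated value as an int, which is how a BitVector stores booleans):

```java
import java.util.Random;

/**
 * Standalone sketch of a boolean column generator in the style of the
 * FieldGen implementation above. The real BooleanGen fills a Drill
 * BitVector; this version simply returns 0 or 1.
 */
class BooleanGenSketch {
  private final Random rand = new Random();

  /** Returns 0 or 1 with equal probability. */
  int value() {
    return rand.nextBoolean() ? 1 : 0;
  }

  public static void main(String[] args) {
    BooleanGenSketch gen = new BooleanGenSketch();
    for (int i = 0; i < 5; i++) {
      System.out.println(gen.value());
    }
  }
}
```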


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883809#comment-15883809
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057873
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockGroupScanPOP.java
 ---
@@ -75,20 +76,50 @@
*/
 
   private boolean extended;
+  private ScanStats scanStats = ScanStats.TRIVIAL_TABLE;
 
   @JsonCreator
   public MockGroupScanPOP(@JsonProperty("url") String url,
-  @JsonProperty("extended") Boolean extended,
   @JsonProperty("entries") List readEntries) {
 super((String) null);
 this.readEntries = readEntries;
 this.url = url;
-this.extended = extended == null ? false : extended;
+
+// Compute decent row-count stats for this mock data source so that
+// the planner is "fooled" into thinking that this operator will do
+// disk I/O.
+
+int rowCount = 0;
+int rowWidth = 0;
+for (MockScanEntry entry : readEntries) {
+  rowCount += entry.getRecords();
+  int width = 0;
+  if (entry.getTypes() == null) {
+width = 50;
+  } else {
+for (MockColumn col : entry.getTypes()) {
+  int colWidth = 0;
+  if (col.getWidthValue() == 0) {
+colWidth = TypeHelper.getSize(col.getMajorType());
+  } else {
+colWidth = col.getWidthValue();
+  }
+  colWidth *= col.getRepeatCount();
+  width += colWidth;
+}
+  }
+  rowWidth = Math.max(rowWidth, width);
--- End diff --

Revised names and added comments to make clear what's going on.
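The row-count/row-width estimate in the diff above can be restated as a standalone sketch (column widths are supplied directly here instead of coming from Drill's TypeHelper; the 50-byte default for entries without a type list matches the patch):

```java
import java.util.Arrays;
import java.util.List;

/**
 * Standalone sketch of the scan-stats estimate computed in
 * MockGroupScanPOP above: total row count, and the width of the
 * widest entry's row.
 */
class ScanStatsSketch {
  /** One mock scan entry: a record count plus per-column byte widths. */
  static class Entry {
    final int records;
    final int[] colWidths;   // null means "no types given"

    Entry(int records, int[] colWidths) {
      this.records = records;
      this.colWidths = colWidths;
    }
  }

  /** Returns { total row count, widest row width } over all entries. */
  static int[] estimate(List<Entry> entries) {
    int rowCount = 0;
    int rowWidth = 0;
    for (Entry e : entries) {
      rowCount += e.records;
      int width = 50;               // default when no column types are given
      if (e.colWidths != null) {
        width = 0;
        for (int w : e.colWidths) {
          width += w;               // sum of column widths for this entry
        }
      }
      rowWidth = Math.max(rowWidth, width);  // the widest entry dominates
    }
    return new int[] { rowCount, rowWidth };
  }
}
```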


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883811#comment-15883811
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103058784
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java
 ---
@@ -124,7 +125,7 @@ public void interpreterDateTest() throws Exception {
 final BitControl.PlanFragment planFragment = 
BitControl.PlanFragment.getDefaultInstance();
 final QueryContextInformation queryContextInfo = 
planFragment.getContext();
final int timeZoneIndex = 
queryContextInfo.getTimeZone();
-final org.joda.time.DateTimeZone timeZone = 
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+final DateTimeZone timeZone =
DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
--- End diff --

Original code, but fixed.


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883812#comment-15883812
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103058623
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java 
---
@@ -60,14 +62,26 @@
  * The mode is one of the supported Drill
  * {@link DataMode} names: usually OPTIONAL or 
REQUIRED.
  * 
+ * 
+ * Recent extensions include:
+ * 
+ * repeat in either the "entry" or "record" elements allow
--- End diff --

Yes. Added the property to MockScanEntry. Need to add it to the 
implementation as well, which is planned, but not yet complete.


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883813#comment-15883813
 ] 

ASF GitHub Bot commented on DRILL-5258:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/752#discussion_r103057899
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngine.java
 ---
@@ -89,14 +85,30 @@ public boolean supportsRead() {
 return true;
   }
 
-//  public static class ImplicitTable extends DynamicDrillTable {
-//
-//public ImplicitTable(StoragePlugin plugin, String storageEngineName,
-//Object selection) {
-//  super(plugin, storageEngineName, selection);
-//}
-//
-//  }
+  /**
+   * Resolves table names within the mock data source. Tables can be of 
two forms:
+   * 
+   * _
+   * 
+   * Where the "name" can be anything, "n" is the number of rows, and 
"unit" is
+   * the units for the row count: none, K (thousand) or M (million).
+   * 
+   * The above form generates a table directly with no other information 
needed.
+   * Column names must be provided, and must be of the form:
+   * 
+   * _
+   * 
+   * Where the name can be anything, the type must be i (integer), d 
(double)
+   * or s (string, AKA VarChar). The length is needed only for string 
fields.
--- End diff --

Fixed.
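As an illustration of the naming convention the Javadoc above describes, a hypothetical parser for the row-count suffix (the authoritative grammar is documented in Drill's mock data source classes; the regex and helper below are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative parser for mock table names of the form name_nUnit,
 * e.g. "employees_10K" encodes 10 thousand rows. Column-name parsing
 * (name_type, e.g. "name_s20") would follow the same pattern and is
 * omitted here.
 */
class MockNameSketch {
  private static final Pattern TABLE = Pattern.compile("(\\w+)_(\\d+)([KM]?)");

  /** Returns the encoded row count, e.g. "emp_10K" yields 10000. */
  static long rowCount(String tableName) {
    Matcher m = TABLE.matcher(tableName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a mock table name: " + tableName);
    }
    long n = Long.parseLong(m.group(2));
    switch (m.group(3)) {
      case "K": return n * 1_000;       // thousands
      case "M": return n * 1_000_000;   // millions
      default:  return n;               // no unit suffix
    }
  }
}
```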


> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL does not encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach does not allow 
> generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule is that if 
> the table name ends in ".json" then it is a specification, else the 
> information is encoded in table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883804#comment-15883804
 ] 

Paul Rogers commented on DRILL-5208:


[~knguyen], when you get a 1.10 RC, please try your scenario again on your two 
machines (assuming that their Java setup has not changed.)

> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my nodes the following command returned 2 entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first one returned which may not be the same 
> on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.

2017-02-24 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883799#comment-15883799
 ] 

Kunal Khatua commented on DRILL-4990:
-

[~agirish] Can you check why the unit tests fail on the Windows VM?

> Use new HDFS API access instead of listStatus to check if users have 
> permissions to access workspace.
> -
>
> Key: DRILL-4990
> URL: https://issues.apache.org/jira/browse/DRILL-4990
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.9.0
>
>
> For every query, we build the schema tree 
> (runSQL->getPlan->getNewDefaultSchema->getRootSchema). All workspaces in all 
> storage plugins are checked and are added to the schema tree if they are 
> accessible by the user who initiated the query.  For file system plugin, 
> listStatus API is used to check whether the workspace is accessible 
> (WorkspaceSchemaFactory.accessible) by the user. The idea seems to be that if the 
> user does not have access to file(s) in the workspace, listStatus will 
> generate an exception and we return false. But listStatus (which lists all 
> the entries of a directory) is an expensive operation when there are a large 
> number of files in the directory. A new API was added in Hadoop 2.6 called 
> access (HDFS-6570) which provides the ability to check if the user has 
> permissions on a file/directory.  Use this new API instead of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-02-24 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan updated DRILL-5034:
--
Labels: ready-to-commit  (was: )

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically 
> converts the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in 
> UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC even though the 
> user timezone is in PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883735#comment-15883735
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/756
  
@paul-rogers 
Incorporated most changes... except:
https://github.com/apache/drill/pull/756#discussion_r103000211
https://github.com/apache/drill/pull/756#discussion_r103005037


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min, max, and avg times for 
> setup, processing, and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.
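The proposed %Busy metric could be computed roughly as follows (a hypothetical sketch; the field names and aggregation are illustrative, not Drill's actual profile schema):

```java
/**
 * Hypothetical computation of the %Busy metric proposed above: the
 * share of the fragments' active time that was spent busy, aggregated
 * over all minor fragments of a major fragment (or all fragments of
 * an operator).
 */
class BusyPercentSketch {
  /** busyNanos[i] and activeNanos[i] describe minor fragment i. */
  static double percentBusy(long[] busyNanos, long[] activeNanos) {
    long busy = 0;
    long active = 0;
    for (int i = 0; i < busyNanos.length; i++) {
      busy += busyNanos[i];      // time spent doing work
      active += activeNanos[i];  // total runtime of the fragment
    }
    return active == 0 ? 0.0 : 100.0 * busy / active;
  }
}
```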



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883727#comment-15883727
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103053243
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in 
setup";
+static final String AvgProcessTime = "Average processing time for a 
fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to 
process";
+static final String MinWaitTime = "Shortest time a fragment spent in 
waiting for data";
+static final String AvgWaitTime = "Average wait time for a fragment";
+static final String MaxWaitTime = "Longest Time a fragment spent in 
waiting data";
+static final String PercentFragmentTime = "Percentage of the total 
fragment time that was spent on the operator";
+static final String PercentQueryTime = "Percentage of the total query 
time that was spent on the operator";
+static final String Rows = "Rows emitted by the operator";
--- End diff --

"Rows emitted by scans" is incomplete. Operators like Filter, HashAgg, etc. 
will also change the number of outgoing rows with respect to the number of 
incoming rows. I think we should leave this as is: 'Rows emitted by the operator'.


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min, max, and avg times for 
> setup, processing, and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883713#comment-15883713
 ] 

ASF GitHub Bot commented on DRILL-5260:
---

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/753#discussion_r103052249
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/ProfileParser.java ---
@@ -138,9 +414,208 @@ public long getMetric(int id) {
 }
   }
 
-  public Map getOpInfo( ) {
+  /**
+   * Information about an operator definition: the plan-time information
+   * that appears in the plan portion of the profile. Also holds the
+   * "actuals" from the minor fragment portion of the profile.
+   * Allows integrating the "planned" vs. "actual" performance of the
+   * query.
+   */
+
+  public static class OpDefInfo {
+public String opName;
+public boolean isInferred;
+public int majorId;
+public int stepId;
+public String args;
+public List columns;
+public int globalLevel;
+public int localLevel;
+public int id;
+public int branchId;
+public boolean isBranchRoot;
+public double estMemoryCost;
+public double estNetCost;
+public double estIOCost;
+public double estCpuCost;
+public double estRowCost;
+public double estRows;
+public String name;
+public long actualMemory;
+public int actualBatches;
+public long actualRows;
+public OpDefInfo inferredParent;
+public List opExecs = new ArrayList<>( );
+public List children = new ArrayList<>( );
+
+// 00-00Screen : rowType = RecordType(VARCHAR(10) Year, 
VARCHAR(65536) Month, VARCHAR(100) Devices, VARCHAR(100) Tier, VARCHAR(100) 
LOB, CHAR(10) Gateway, BIGINT Day, BIGINT Hour, INTEGER Week, VARCHAR(100) 
Week_end_date, BIGINT Usage_Cnt): \
+// rowcount = 100.0, cumulative cost = {7.42124276972414E9 rows, 
7.663067406383167E10 cpu, 0.0 io, 2.24645048816E10 network, 2.692766612982188E8 
memory}, id = 129302
+//
+// 00-01  Project(Year=[$0], Month=[$1], Devices=[$2], Tier=[$3], 
LOB=[$4], Gateway=[$5], Day=[$6], Hour=[$7], Week=[$8], Week_end_date=[$9], 
Usage_Cnt=[$10]) :
+// rowType = RecordType(VARCHAR(10) Year, VARCHAR(65536) Month, 
VARCHAR(100) Devices, VARCHAR(100) Tier, VARCHAR(100) LOB, CHAR(10) Gateway, 
BIGINT Day, BIGINT Hour, INTEGER Week, VARCHAR(100) Week_end_date, BIGINT 
Usage_Cnt): rowcount = 100.0, cumulative cost = {7.42124275972414E9 rows, 
7.663067405383167E10 cpu, 0.0 io, 2.24645048816E10 network, 2.692766612982188E8 
memory}, id = 129301
+
+public OpDefInfo(String plan) {
+  Pattern p = Pattern.compile( 
"^(\\d+)-(\\d+)(\\s+)(\\w+)(?:\\((.*)\\))?\\s*:\\s*(.*)$" );
+  Matcher m = p.matcher(plan);
+  if (!m.matches()) {
+throw new IllegalStateException( "Could not parse plan: " + plan );
+  }
+  majorId = Integer.parseInt(m.group(1));
+  stepId = Integer.parseInt(m.group(2));
+  name = m.group(4);
+  args = m.group(5);
+  String tail = m.group(6);
+  String indent = m.group(3);
+  globalLevel = (indent.length() - 4) / 2;
+
+  p = Pattern.compile("rowType = RecordType\\((.*)\\): (rowcount .*)");
+  m = p.matcher(tail);
--- End diff --

Discussed offline.
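The plan-line regex from the OpDefInfo constructor above can be exercised on an invented sample line to show what each group captures:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Demonstrates the plan-line pattern from OpDefInfo: major id, step id,
 * indent, operator name, optional argument list, and the rest of the
 * line. The sample plan line is invented for illustration.
 */
class PlanLineRegexDemo {
  static final Pattern PLAN_LINE =
      Pattern.compile("^(\\d+)-(\\d+)(\\s+)(\\w+)(?:\\((.*)\\))?\\s*:\\s*(.*)$");

  public static void main(String[] args) {
    String line = "00-01      Project(Year=[$0], Month=[$1]) : rowcount = 100.0";
    Matcher m = PLAN_LINE.matcher(line);
    if (m.matches()) {
      System.out.println("major=" + m.group(1) + " step=" + m.group(2));
      System.out.println("op=" + m.group(4) + " args=" + m.group(5));
      // Indent width encodes nesting depth: (indent - 4) / 2, as in the patch.
      System.out.println("level=" + (m.group(3).length() - 4) / 2);
    }
  }
}
```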


> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use the new framework.
> * Revise TestWindowFrame to to use the new framework.
> * Revise TestMergeJoinWithSchemaChanges to use the new framework.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5260) Refinements to new "Cluster Fixture" test framework

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883714#comment-15883714
 ] 

ASF GitHub Bot commented on DRILL-5260:
---

Github user sohami commented on the issue:

https://github.com/apache/drill/pull/753
  
Apart from fixing the regex in ProfileParser.java, the changes look good to me.
+1


> Refinements to new "Cluster Fixture" test framework
> ---
>
> Key: DRILL-5260
> URL: https://issues.apache.org/jira/browse/DRILL-5260
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.10
>
>
> Roll-up of a number of enhancements to the cluster fixture framework.
> * Config option to suppress printing of CSV and other output. (Allows 
> printing for single tests, not printing when running from Maven.)
> * Parsing of query profiles to extract plan and run time information.
> * Fix bug in log fixture when enabling logging for a package.
> * Improved ZK support.
> * Set up the new CTTAS default temporary workspace for tests.
> * Revise TestDrillbitResiliance to use the new framework.
> * Revise TestWindowFrame to to use the new framework.
> * Revise TestMergeJoinWithSchemaChanges to use the new framework.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-24 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5208:

Reviewer: Arina Ielchiieva

Assigned Reviewer to [~arina]

> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my nodes the following command returned 2 entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first one returned which may not be the same 
> on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883690#comment-15883690
 ] 

Paul Rogers commented on DRILL-5208:


The solution uses a more structured approach to finding the Java command. If a 
JAVA_HOME is provided, look in bin before using find. If there is no JAVA_HOME, but 
the java command is on the PATH, then use that command (after resolving symbolic 
links). Only if these fail do we resort to using "find".

This should ensure we use the JDK java when both a JDK and JRE exist in the 
same JAVA_HOME.
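
In shell terms, the lookup order described above might be sketched like this (a hypothetical illustration of the approach, not the actual drill-config.sh patch; the function name find_java and the fallback search root are assumptions):

```shell
# Sketch of a deterministic Java lookup: prefer $JAVA_HOME/bin/java,
# then the java on PATH (symlinks resolved), and only as a last
# resort fall back to "find", whose result order is unspecified.
find_java() {
  if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    # JAVA_HOME is set and has a bin/java: use it directly.
    echo "$JAVA_HOME/bin/java"
  elif command -v java >/dev/null 2>&1; then
    # readlink -f canonicalizes symlink chains
    # (e.g. /usr/bin/java -> /etc/alternatives/java -> JDK bin).
    readlink -f "$(command -v java)"
  else
    # Last resort: first match under JAVA_HOME (or a guessed root),
    # which is the original nondeterministic behavior.
    find -L "${JAVA_HOME:-/usr/lib/jvm}" -name java -type f 2>/dev/null | head -n 1
  fi
}
```

With something like this, drill-config.sh could simply set JAVA=$(find_java) and fail fast with a clear error if the result is empty.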

> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my nodes, the following command returned two entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first one returned, which may not be the same 
> on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java





[jira] [Commented] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883681#comment-15883681
 ] 

ASF GitHub Bot commented on DRILL-5208:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/763

DRILL-5208: Finding path to java executable should be deterministic

See DRILL-5208 for background. Instead of using “find” to locate the
java command, we use any information available, resorting to find
only if the “usual suspects” fail. The result is that we use the JDK
java when available, instead of randomly choosing the JDK or JRE java.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #763


commit 19c412e9900bf5db80d4d95f9317d8f308669f4b
Author: Paul Rogers 
Date:   2017-02-24T22:53:23Z

DRILL-5208: Finding path to java executable should be deterministic

See DRILL-5208 for background. Instead of using “find” to locate the
java command, we use any information available, resorting to find
only if the “usual suspects” fail. The result is that we use the JDK
java when available, instead of randomly choosing the JDK or JRE java.




> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my nodes, the following command returned two entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first one returned, which may not be the same 
> on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java





[jira] [Updated] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5114:
---
Labels: ready-to-commit  (was: )

> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> Drill uses Logback as its logger. The logger is used in several places to 
> display some test output. Test output is sent to stdout, rather than a log file. 
> Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883503#comment-15883503
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103032276
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
--- End diff --

+1


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.





[jira] [Updated] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5196:
---
Labels: ready-to-commit  (was: )

> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>
> Could not run a single MongoDB's unit test through IDE or command line. The 
> reason is when running a single test case, the MongoDB instance did not get 
> started thus a 'table not found' error for 'mongo.employee.empinfo' would be 
> raised.





[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883498#comment-15883498
 ] 

ASF GitHub Bot commented on DRILL-5114:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/762

DRILL-5114: Rationalize use of Logback logging in unit tests

Renamed logback.xml file used for testing to logback-test.xml as per
the Logback documentation. Made logging less detailed in the test
version to reduce verbose output.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5114b

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #762


commit dc98f5e5c5a1397f3f47ea9bfca36257db07b1f3
Author: Paul Rogers 
Date:   2017-02-24T20:57:25Z

DRILL-5114: Rationalize use of Logback logging in unit tests

Renamed logback.xml file used for testing to logback-test.xml as per
the Logback documentation. Made logging less detailed in the test
version to reduce verbose output.




> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several places to 
> display some test output. Test output is sent to stdout, rather than a log file. 
> Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml





[jira] [Comment Edited] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883489#comment-15883489
 ] 

Paul Rogers edited comment on DRILL-5114 at 2/24/17 8:56 PM:
-

Found a simpler solution: simply rename the {{logback.xml}} file in 
{{test/resources}} to {{logback-test.xml}}, then make the logging less 
intensive. This prevents verbose logging, but lets specific tests enable more 
detailed logging when needed using the new {{LogFixture}}.

In debug mode, Logback first looks for {{logback-test.xml}} on the class path. 
If found, Logback stops there. If not found, it then looks for all instances of 
{{logback.xml}}. Since we have two (one for test, another for production), 
Logback complains and dumps a bunch of messages into the log. The version of 
{{logback.xml}} that is used is undefined.

By renaming the test file, Logback will prefer it in tests: we get no unwanted 
log output, and the config file chosen is deterministic.
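
For illustration, a minimal {{logback-test.xml}} of the kind described might look like the following (a sketch under the assumptions above, not the committed file; the pattern string is illustrative):

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <!-- Any concise pattern works; this one is illustrative only -->
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- Default to error so bulk test runs stay quiet; individual tests
       can raise levels programmatically when needed -->
  <root level="error">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```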


was (Author: paul-rogers):
Found a simpler solution: simply rename the {{logback.xml}} file in 
{{test/resources}} to {{logback-test.xml}}, then make the logging less 
intensive. This prevents verbose logging, but lets specific tests enable more 
detailed logging when needed using the new {{LogFixture}}.

> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several places to 
> display some test output. Test output is sent to stdout, rather than a log file. 
> Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883488#comment-15883488
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103030867
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor 
Fragments Spawned";
+static final String FirstStart = "Earliest start of a fragment since 
query submission";
+static final String LastStart = "Latest start of a fragment since 
query submission";
+static final String FirstEnd = "Earliest completion time a fragment";
+static final String LastEnd = "Latest completion time of a fragment";
+static final String MinRuntime = "Shortest fragment runtime";
+static final String AvgRuntime = "Average fragment runtime";
+static final String MaxRuntime = "Longest fragment runtime";
+static final String PercentBusy = "Percent time Fragments were busy 
doing work";
--- End diff --

+1


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> 

[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883489#comment-15883489
 ] 

Paul Rogers commented on DRILL-5114:


Found a simpler solution: simply rename the {{logback.xml}} file in 
{{test/resources}} to {{logback-test.xml}}, then make the logging less 
intensive. This prevents verbose logging, but lets specific tests enable more 
detailed logging when needed using the new {{LogFixture}}.

> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several places to 
> display some test output. Test output is sent to stdout, rather than a log file. 
> Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml





[jira] [Updated] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5284:
---
Reviewer: Boaz Ben-Zvi

> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.





[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883485#comment-15883485
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/761

DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read
* Ensure at least two batches are merged in low-memory conditions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-5284

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/761.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #761


commit 5558e9439805d595cfc9625591a276385454625f
Author: Paul Rogers 
Date:   2017-02-24T18:31:25Z

DRILL-5284: Roll-up of final fixes for managed sort

See subtasks for details.

* Provide detailed, accurate estimate of size consumed by a record batch
* Managed external sort spills too often with Parquet data
* Managed External Sort fails with OOM
* External sort refers to the deprecated HDFS fs.default.name param
* Config param drill.exec.sort.external.batch.size is not used
* NPE in managed external sort while spilling to disk
* External Sort BatchGroup leaks memory if an OOM occurs during read

commit 0028f26fef5d9b462700a28b689d47241ee3a1ce
Author: Paul Rogers 
Date:   2017-02-24T20:45:18Z

Fix for DRILL-5294

Under certain low-memory conditions, we need to force the sort to merge
two batches to make progress, even though this is a bit more than
comfortably fits into memory.




> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.





[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883478#comment-15883478
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103028895
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor Fragments Spawned";
+static final String FirstStart = "Earliest start of a fragment since query submission";
+static final String LastStart = "Latest start of a fragment since query submission";
+static final String FirstEnd = "Earliest completion time of a fragment";
--- End diff --

+1 Applied changes to all.


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.
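The %Busy stat described above can be sketched as a simple ratio: the time fragments spent busy over their total active (busy plus wait) time, aggregated across all minor fragments. This is an illustrative back-of-envelope calculation, not Drill's actual implementation; all names below are made up.

```java
// Hypothetical sketch of the %Busy metric: the share of active
// (busy + wait) time that fragments spent busy, aggregated over all
// minor fragments of a major fragment. Names are illustrative only.
public class PercentBusyDemo {
    // busyMillis[i] and waitMillis[i] belong to minor fragment i
    static double percentBusy(long[] busyMillis, long[] waitMillis) {
        long busy = 0, active = 0;
        for (int i = 0; i < busyMillis.length; i++) {
            busy += busyMillis[i];
            active += busyMillis[i] + waitMillis[i];
        }
        return active == 0 ? 0.0 : 100.0 * busy / active;
    }

    public static void main(String[] args) {
        long[] busy = {750, 250};
        long[] wait = {250, 750};
        // 1000 ms busy out of 2000 ms active across both fragments
        System.out.println(percentBusy(busy, wait)); // prints 50.0
    }
}
```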



--
This message was sent by Atlassian JIRA

[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883475#comment-15883475
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103028666
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), null);
+  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId());
  builder.appendRepeated(m.getState().toString(), null, NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor Fragments Spawned";
--- End diff --

+1




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883473#comment-15883473
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103028479
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
--- End diff --

I'll change it to the Java convention. It was getting jarring to see 
everything in upper case when defining the columns; I should follow the 
convention. Thanks!




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883474#comment-15883474
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103028587
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), null);
+  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId());
  builder.appendRepeated(m.getState().toString(), null, NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
--- End diff --

+1 




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883471#comment-15883471
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103028294
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -35,6 +38,10 @@
 public class FragmentWrapper {
   private final MajorFragmentProfile major;
   private final long start;
+  private final Locale currentLocale = Locale.getDefault();
+  private final String pattern = "dd-MMM- HH:mm:ss";
+  private final SimpleDateFormat simpleDateFormat = new SimpleDateFormat(
--- End diff --

Oh. I thought the locale did that. But, as long as these timestamps are 
aligned with the timestamps on the server (where the logs reside), this should 
be fine, right? Otherwise, this is useful for only _running_ queries and we can 
skip it altogether. What do you suggest, @paul-rogers ?
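A note on the question above: a Locale only affects text rendering (month names, etc.); SimpleDateFormat uses the JVM's default time zone unless one is pinned explicitly, which is what matters for lining timestamps up with server logs. A minimal sketch of that distinction — the pattern here is an assumption, not necessarily the one used in FragmentWrapper:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: the Locale controls text such as "Jan", while the
// time zone must be set separately so rendered timestamps match the
// server whose logs you are correlating against.
public class TimestampFormatDemo {
    static String format(long epochMillis, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss", Locale.ENGLISH);
        fmt.setTimeZone(tz); // without this, the JVM default zone is used
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Epoch 0 rendered in UTC vs. a fixed-offset zone
        System.out.println(format(0L, TimeZone.getTimeZone("UTC")));   // 01-Jan-1970 00:00:00
        System.out.println(format(0L, TimeZone.getTimeZone("GMT+5"))); // 01-Jan-1970 05:00:00
    }
}
```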




[jira] [Updated] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5294:
---
Issue Type: Sub-task  (was: Bug)
Parent: DRILL-5284

> Managed External Sort throws an OOM during the merge and spill phase
> 
>
> Key: DRILL-5294
> URL: https://issues.apache.org/jira/browse/DRILL-5294
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
> Attachments: 2751ce6d-67e6-ae08-3b68-e33b29f9d2a3.sys.drill, 
> drillbit.log, drillbit_scenario2.log, drillbit_scenario3.log, 
> scenario2_profile.sys.drill, scenario3_profile.sys.drill
>
>
> commit # : 38f816a45924654efd085bf7f1da7d97a4a51e38
> The below query fails with managed sort while it succeeds on the old sort
> {code}
> select * from (select columns[433] col433, columns[0], 
> columns[1],columns[2],columns[3],columns[4],columns[5],columns[6],columns[7],columns[8],columns[9],columns[10],columns[11]
>  from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
> columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50])
>  d where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to 
> disk
> Fragment 1:11
> [Error Id: 0aa20284-cfcc-450f-89b3-645c280f33a4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Env : 
> {code}
> No of Drillbits : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> Attached the logs and profile. Data is too large for a jira





[jira] [Comment Edited] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883418#comment-15883418
 ] 

Paul Rogers edited comment on DRILL-5294 at 2/24/17 8:35 PM:
-

Original test case works fine with latest code. Tested with the long query 
using a single slice (all that can be done on the Mac) and 2 GB sort memory.

{code}
Results: 0 records, 1 batches, 208,341 ms
{code}

Tested with an adaptation of the second query using the 18 GB "250wide.tbl" 
file:

{code}
select * from (select * from `dfs.data`.`250wide.tbl` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1 where 
d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 356,243 ms
{code}

The second use case completes, but is slow because it does a binary merge: 
merging two batches, then spilling and repeating until only two runs remain:

{code}
select * from (select * from `dfs.data`.`250wide-small.tbl` order by 
columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf'
Results: 0 records, 1 batches, 26,753 ms
{code}

The third case also succeeds:

{code}
select * from (select * from `dfs.data`.`250wide_files` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1 where 
d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 9,987 ms
{code}

One minor fix was found, will be pushed to the Sort-Rollup branch and included 
in the DRILL-5284 PR.
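The cost of the binary merge described above comes down to pass count: each merge pass reads and rewrites all spilled data, and the number of runs shrinks by the merge fan-in each pass, so roughly ceil(log_W(N)) passes are needed for N runs at fan-in W. A back-of-envelope sketch (not Drill code):

```java
// Illustrative only: why a 2-way (binary) merge is slow relative to a
// wider merge. Each pass reduces the run count by the fan-in W, so the
// total number of full read/write passes is about ceil(log_W(N)).
public class MergePassDemo {
    static int passes(int runs, int fanIn) {
        int p = 0;
        while (runs > 1) {
            runs = (runs + fanIn - 1) / fanIn; // one merge pass
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        // 64 spilled runs: a binary merge needs 6 passes, a 16-way merge just 2
        System.out.println(passes(64, 2));  // prints 6
        System.out.println(passes(64, 16)); // prints 2
    }
}
```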


was (Author: paul-rogers):
Original test case works fine with latest code. Tested with the long query 
using a single slice (all that can be done on the Mac) and 2 GB sort memory.

{code}
Results: 0 records, 1 batches, 208,341 ms
{code}

Tested with an adaptation of the second query using the 18 GB "250wide.tbl" 
file:

{code}
select * from (select * from `dfs.data`.`250wide.tbl` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1 where 
d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 356,243 ms
{code}

The second use case completes, but is slow because it does a binary merge: 
merging two batches, then spilling and repeating until only two runs remain:

{code}
select * from (select * from `dfs.data`.`250wide-small.tbl` order by 
columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf'
Results: 0 records, 1 batches, 26,753 ms
{code}





[jira] [Commented] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

2017-02-24 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883418#comment-15883418
 ] 

Paul Rogers commented on DRILL-5294:


Original test case works fine with latest code. Tested with the long query 
using a single slice (all that can be done on the Mac) and 2 GB sort memory.

{code}
Results: 0 records, 1 batches, 208,341 ms
{code}

Tested with an adaptation of the second query using the 18 GB "250wide.tbl" 
file:

{code}
select * from (select * from `dfs.data`.`250wide.tbl` d
  where cast(d.columns[1] as int) > 0 order by columns[0]) d1 where 
d1.columns[0] = 'kjhf'
Results: 0 records, 1 batches, 356,243 ms
{code}

The second use case completes, but is slow because it does a binary merge: 
merging two batches, then spilling and repeating until only two runs remain:

{code}
select * from (select * from `dfs.data`.`250wide-small.tbl` order by 
columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf'
Results: 0 records, 1 batches, 26,753 ms
{code}





[jira] [Commented] (DRILL-5295) Unable to query INFORMATION_SCHEMA.`TABLES` if MySql storage plugin enabled

2017-02-24 Thread Akira Matsuo (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883406#comment-15883406
 ] 

Akira Matsuo commented on DRILL-5295:
-

It seems like some of the code in InfoSchemaRecordGenerator that visits 
tables handles null table types while other parts don't. The storage plugin 
configuration where this error occurs is defined as:
{
  "type": "jdbc",
  "driver": "com.mysql.jdbc.Driver",
... rest of config
}
I'm not sure this is the correct way to solve the issue, but as a workaround 
I've patched our Drill to ignore null table types. I'm also not sure why I had 
to check for nulls in both places, since visitTableWithType is private and 
should only be called if shouldVisitTable == true. For reference, the patch 
is included in this comment below:

<<< PATCH FOLLOWS >>>
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java
index aee3dc1..a9404f5 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaRecordGenerator.java
@@ -156,6 +156,10 @@ public abstract class InfoSchemaRecordGenerator {
 if (filter == null) {
   return true;
 }
+// Don't think we should be visiting tables with null tableType.
+if (tableType == null) {
+  return false;
+}

 final Map recordValues =
 ImmutableMap.of(
@@ -311,9 +315,15 @@ public abstract class InfoSchemaRecordGenerator {
 }

 private void visitTableWithType(String schemaName, String tableName, TableType type) {
+  // fail gracefully instead of NPE on precondition, not sure why this is necessary if
+  // shouldVisitTable() has already been modified to handle null tableTypes
+  if (type == null) {
+    return;
+  }
+  // don't think precondition is necessary anymore, will leave it in for now
   Preconditions
-  .checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
-  tableName);
+.checkNotNull(type, "Error. Type information for table %s.%s provided is null.", schemaName,
+tableName);
   records.add(new Records.Table(IS_CATALOG_NAME, schemaName, tableName, type.toString()));
   return;
 } 
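The patch above trades a hard failure for a silent skip. A standalone sketch of that trade-off, using java.util.Objects in place of Guava's Preconditions (same fail-fast semantics); all names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustration of the two behaviors in the patch above: fail fast with
// an NPE when a table's type is null, or skip that table silently.
public class NullTableTypeDemo {
    static List<String> visited = new ArrayList<>();

    // Patched behavior: silently ignore tables whose type is unknown
    static void visitLenient(String table, String type) {
        if (type == null) {
            return; // skip instead of throwing
        }
        visited.add(table + ":" + type);
    }

    // Original behavior: throw NullPointerException with a diagnostic message
    static void visitStrict(String table, String type) {
        Objects.requireNonNull(type, "Type information for table " + table + " is null.");
        visited.add(table + ":" + type);
    }

    public static void main(String[] args) {
        visitLenient("CHARACTER_SETS", null); // no-op
        visitLenient("employee", "TABLE");
        System.out.println(visited); // prints [employee:TABLE]
        try {
            visitStrict("CHARACTER_SETS", null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```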



> Unable to query INFORMATION_SCHEMA.`TABLES` if MySql storage plugin enabled
> ---
>
> Key: DRILL-5295
> URL: https://issues.apache.org/jira/browse/DRILL-5295
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: Drill 1.9. Error can be reproduced running Drill locally 
> on Windows and on Linux within zookeeper. I can reproduce it with 2 mysql 
> servers.
>Reporter: Martina Ponca
>
> Impact: Unable to connect from Qlik Sense to Drill because of MySql Storage 
> Plugin enabled.
> Steps to repro:
> 1. Create a new storage plugin to MySql Community Edition 5.5.43. Enable it.
> 2. Run query: "select * from INFORMATION_SCHEMA.`TABLES`"
> 3. Error: 
> {code}
> Error: SYSTEM ERROR: NullPointerException: Error. Type information for table 
> bistoremysql.information_schema.CHARACTER_SETS provided is null.
> Fragment 0:0
> [Error Id: 2717cfe1-413d-4330-ab3f-720ae92ebc50 on 
> mycomputer.domain.lan:31010]
>   (java.lang.NullPointerException) Error. Type information for table 
> bistoremysql.information_schema.CHARACTER_SETS provided is null.
> com.google.common.base.Preconditions.checkNotNull():250
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator$Tables.visitTableWithType():314
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator$Tables.visitTables():308
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():215
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():208
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():208
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():195
> 
> org.apache.drill.exec.store.ischema.InfoSchemaTableType.getRecordReader():58
> org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch():36
> org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch():30
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():148
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():128
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> 

[jira] [Commented] (DRILL-5295) Unable to query INFORMATION_SCHEMA.`TABLES` if MySql storage plugin enabled

2017-02-24 Thread Akira Matsuo (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883387#comment-15883387
 ] 

Akira Matsuo commented on DRILL-5295:
-

Caused by table type filtering.

> Unable to query INFORMATION_SCHEMA.`TABLES` if MySql storage plugin enabled
> ---
>
> Key: DRILL-5295
> URL: https://issues.apache.org/jira/browse/DRILL-5295
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: Drill 1.9. Error can be reproduced running Drill locally 
> on Windows and on Linux within zookeeper. I can reproduce it with 2 mysql 
> servers.
>Reporter: Martina Ponca
>
> Impact: Unable to connect from Qlik Sense to Drill because of MySql Storage 
> Plugin enabled.
> Steps to repro:
> 1. Create a new storage plugin to MySql Community Edition 5.5.43. Enable it.
> 2. Run query: "select * from INFORMATION_SCHEMA.`TABLES`"
> 3. Error: 
> {code}
> Error: SYSTEM ERROR: NullPointerException: Error. Type information for table 
> bistoremysql.information_schema.CHARACTER_SETS provided is null.
> Fragment 0:0
> [Error Id: 2717cfe1-413d-4330-ab3f-720ae92ebc50 on 
> mycomputer.domain.lan:31010]
>   (java.lang.NullPointerException) Error. Type information for table 
> bistoremysql.information_schema.CHARACTER_SETS provided is null.
> com.google.common.base.Preconditions.checkNotNull():250
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator$Tables.visitTableWithType():314
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator$Tables.visitTables():308
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():215
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():208
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():208
> 
> org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema():195
> 
> org.apache.drill.exec.store.ischema.InfoSchemaTableType.getRecordReader():58
> org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch():36
> org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch():30
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():148
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():128
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():171
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():79
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():206
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
> {code}
> The full query Qlik Sense runs:
> {code:sql}
> select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from 
> INFORMATION_SCHEMA.`TABLES` WHERE TABLE_CATALOG LIKE 'DRILL' ESCAPE '\' AND 
> TABLE_SCHEMA <> 'sys' AND TABLE_SCHEMA <> 'INFORMATION_SCHEMA'ORDER BY 
> TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
> {code}
> If I disable the mysql storage plugin, I can run the query and connect from 
> Qlik (not a workaround). 
> This issue cannot be reproduced using Drill 1.5. 





[jira] [Updated] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-4963:
---
Labels: ready-to-commit  (was: )

> Issues when overloading Drill native functions with dynamic UDFs
> 
>
> Key: DRILL-4963
> URL: https://issues.apache.org/jira/browse/DRILL-4963
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.9.0
>Reporter: Roman
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: Future
>
> Attachments: subquery_udf-1.0.jar, subquery_udf-1.0-sources.jar, 
> test_overloading-1.0.jar, test_overloading-1.0-sources.jar
>
>
> I created jar file which overloads 3 DRILL native functions 
> (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and 
> ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as dynamic UDF.
> If I try to use my functions I will get errors:
> {code:xml}
> SELECT CURRENT_DATE('test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
> SQL Query null
> {code:xml}
> SELECT ABS('test','test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
> SQL Query null
> {code:xml}
> SELECT LOG('test') FROM (VALUES(1));
> {code}
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator LOG('test').  Errors: 
> Error in expression at index -1.  Error: Missing function implementation: 
> castTINYINT(VARCHAR-REQUIRED).  Full expression: UNKNOWN EXPRESSION.
> But if I rerun all these queries after the "DrillRuntimeException", they run 
> correctly. It seems that Drill has not updated the function signatures before 
> that error. Also, if I add the jar as a usual (static) UDF (copy the jar to 
> /drill_home/jars/3rdparty and restart the drillbits), all queries run 
> correctly without errors.
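As background for the registration step above, Drill's dynamic UDF feature (introduced around 1.9) is driven by two SQL commands. A minimal sketch follows; the jar name is taken from the ticket's attachments, and the staging-directory option name is recalled from the Drill docs and may differ by version:

```sql
-- Copy the UDF jar and its sources jar to the DFS staging directory
-- (drill.exec.udf.directory.staging in the Drill config), then register:
CREATE FUNCTION USING JAR 'test_overloading-1.0.jar';

-- To unregister the dynamic UDF again:
DROP FUNCTION USING JAR 'test_overloading-1.0.jar';
```

Registration is cluster-wide: each drillbit picks up the new function signatures lazily, which is why a stale signature cache (as described in this ticket) surfaces only on the first query.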



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5196:
---
Reviewer: Paul Rogers

> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> A single MongoDB unit test could not be run through the IDE or the command 
> line. The reason is that when running a single test case, the MongoDB 
> instance does not get started; thus a 'table not found' error for 
> 'mongo.employee.empinfo' is raised.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883284#comment-15883284
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103004338
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in 
setup";
+static final String AvgProcessTime = "Average processing time for a 
fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to 
process";
--- End diff --

"Time" --> "time"

Better, "Longest process time of any fragment"


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.
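The %Busy aggregation described above can be sketched as a standalone calculation. This is a hypothetical helper class, not part of Drill's code; it assumes busy time means setup + process time and active time means the fragment's runtime, per the ticket's description:

```java
// Sketch of the %Busy metric: the percentage of total active (runtime)
// milliseconds that the minor fragments spent doing work.
public class PercentBusy {

    // busyMillis[i] and runtimeMillis[i] describe minor fragment i
    static double percentBusy(long[] busyMillis, long[] runtimeMillis) {
        long totalBusy = 0;
        long totalRuntime = 0;
        for (int i = 0; i < busyMillis.length; i++) {
            totalBusy += busyMillis[i];
            totalRuntime += runtimeMillis[i];
        }
        // Guard against a division by zero when no fragments reported yet
        return totalRuntime == 0 ? 0.0 : 100.0 * totalBusy / totalRuntime;
    }

    public static void main(String[] args) {
        // Three minor fragments, each active for 100 ms,
        // busy for 50 ms, 80 ms, and 20 ms respectively
        System.out.println(percentBusy(new long[]{50, 80, 20},
                                       new long[]{100, 100, 100}));  // 50.0
    }
}
```

The same aggregation applies per major fragment (over its minor fragments) or per operator (over its fragment instances), only the grouping of the input pairs changes.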



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883287#comment-15883287
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103000803
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor 
Fragments Spawned";
--- End diff --

This is descriptive text, so use sentence case: "Number of minor fragments 
spawned". Maybe "spawned" --> "started".


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883294#comment-15883294
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103003378
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor 
Fragments Spawned";
+static final String FirstStart = "Earliest start of a fragment since 
query submission";
+static final String LastStart = "Latest start of a fragment since 
query submission";
+static final String FirstEnd = "Earliest completion time a fragment";
+static final String LastEnd = "Latest completion time of a fragment";
+static final String MinRuntime = "Shortest fragment runtime";
+static final String AvgRuntime = "Average fragment runtime";
+static final String MaxRuntime = "Longest fragment runtime";
+static final String PercentBusy = "Percent time Fragments were busy 
doing work";
--- End diff --

"Percentage of run time that fragments". Note "Fragments" --> "fragments".


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It 

[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883289#comment-15883289
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103005037
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in 
setup";
+static final String AvgProcessTime = "Average processing time for a 
fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to 
process";
+static final String MinWaitTime = "Shortest time a fragment spent in 
waiting for data";
+static final String AvgWaitTime = "Average wait time for a fragment";
+static final String MaxWaitTime = "Longest Time a fragment spent in 
waiting data";
+static final String PercentFragmentTime = "Percentage of the total 
fragment time that was spent on the operator";
+static final String PercentQueryTime = "Percentage of the total query 
time that was spent on the operator";
+static final String Rows = "Rows emitted by the operator";
--- End diff --

"Rows emitted by scans, or consumed by other operators."


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883288#comment-15883288
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103000873
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
--- End diff --

Maybe: "Major fragment ID as shown in the visualization"


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883286#comment-15883286
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103003252
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java
 ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), 
null);
+  builder.appendCell(new 
OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + 
m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, 
NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
+static final String MinorFragmentsReporting = "Minor Fragments 
Reporting";
+static final String FirstStart = "First Start";
+static final String LastStart = "Last Start";
+static final String FirstEnd = "First End";
+static final String LastEnd = "Last End";
+static final String MinRuntime = "Min Runtime";
+static final String AvgRuntime = "Avg Runtime";
+static final String MaxRuntime = "Max Runtime";
+static final String PercentBusy = "% Busy";
+static final String LastUpdate = "Last Update";
+static final String LastProgress = "Last Progress";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String MajorFragment = "Major Fragment ID";
+static final String MinorFragmentsReporting = "Number of Minor 
Fragments Spawned";
+static final String FirstStart = "Earliest start of a fragment since 
query submission";
+static final String LastStart = "Latest start of a fragment since 
query submission";
+static final String FirstEnd = "Earliest completion time a fragment";
--- End diff --

"time a" --> "time of a".

Or, better, "Time at which the first fragment completed."


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated 

[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883291#comment-15883291
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103004451
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in 
setup";
+static final String AvgProcessTime = "Average processing time for a 
fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to 
process";
+static final String MinWaitTime = "Shortest time a fragment spent in 
waiting for data";
--- End diff --

Waits can be for any reason, so "Shortest time a fragment spent waiting"


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883290#comment-15883290
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103004076
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java
 ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up 
fragments";
--- End diff --

Sentence case: "Time" --> "time"


> Publish Operator and MajorFragment Stats in Profile page
> 
>
> Key: DRILL-5195
> URL: https://issues.apache.org/jira/browse/DRILL-5195
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Attachments: dbit_complete.png, dbit_inflight.png, dbit_opOverview.png
>
>
> Currently, we show runtimes for major fragments, and min,max,avg times for 
> setup, processing and waiting for various operators.
> It would be worthwhile to have additional stats for the following:
> MajorFragment
>   %Busy - % of the active time for all the minor fragments within each major 
> fragment that they were busy. 
> Operator Profile
>   %Busy - % of the active time for all the fragments within each operator 
> that they were busy. 
>   Records - Total number of records propagated out by that operator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883292#comment-15883292
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103000425
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), null);
+  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String MajorFragment = "Major Fragment";
--- End diff --

Java conventions are to use UPPER_CASE for constants.
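The convention the reviewer points at can be sketched in isolation (a minimal illustration, not Drill's actual code; the holder class and field names here are made up):

```java
// Minimal sketch of the UPPER_CASE constant convention the review suggests.
// Class and field names are hypothetical, not taken from Drill.
class OverviewColumns {
  // Discouraged: static final String MajorFragment = "Major Fragment";
  // Conventional: UPPER_CASE with underscores for static final constants.
  static final String MAJOR_FRAGMENT = "Major Fragment";
  static final String AVG_SETUP_TIME = "Avg Setup Time";

  private OverviewColumns() {}  // constants holder; not meant to be instantiated
}
```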




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883282#comment-15883282
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103000585
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/FragmentWrapper.java ---
@@ -136,26 +240,60 @@ public String getContent() {
 biggestBatches = Math.max(biggestBatches, batches);
   }
 
-  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build(), null);
-  builder.appendCell(minor.getEndpoint().getAddress(), null);
-  builder.appendMillis(minor.getStartTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - start, null);
-  builder.appendMillis(minor.getEndTime() - minor.getStartTime(), null);
+  builder.appendCell(new OperatorPathBuilder().setMajor(major).setMinor(minor).build());
+  builder.appendCell(minor.getEndpoint().getAddress());
+  builder.appendMillis(minor.getStartTime() - start);
+  builder.appendMillis(minor.getEndTime() - start);
+  builder.appendMillis(minor.getEndTime() - minor.getStartTime());
 
-  builder.appendFormattedInteger(biggestIncomingRecords, null);
-  builder.appendFormattedInteger(biggestBatches, null);
+  builder.appendFormattedInteger(biggestIncomingRecords);
+  builder.appendFormattedInteger(biggestBatches);
 
-  builder.appendTime(minor.getLastUpdate(), null);
-  builder.appendTime(minor.getLastProgress(), null);
+  builder.appendTime(minor.getLastUpdate());
+  builder.appendTime(minor.getLastProgress());
 
-  builder.appendBytes(minor.getMaxMemoryUsed(), null);
-  builder.appendCell(minor.getState().name(), null);
+  builder.appendBytes(minor.getMaxMemoryUsed());
+  builder.appendCell(minor.getState().name());
 }
 
 for (final MinorFragmentProfile m : incomplete) {
-  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId(), null);
+  builder.appendCell(major.getMajorFragmentId() + "-" + m.getMinorFragmentId());
   builder.appendRepeated(m.getState().toString(), null, NUM_NULLABLE_FRAGMENTS_COLUMNS);
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
--- End diff --

Thanks for adding the explanations!




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883283#comment-15883283
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103004131
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in setup";
--- End diff --

"Time" --> "time"




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883285#comment-15883285
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103004559
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
+static final String Type = "Type";
+static final String AvgSetupTime = "Avg Setup Time";
+static final String MaxSetupTime = "Max Setup Time";
+static final String AvgProcessTime = "Avg Process Time";
+static final String MaxProcessTime = "Max Process Time";
+static final String MinWaitTime = "Min Wait Time";
+static final String AvgWaitTime = "Avg Wait Time";
+static final String MaxWaitTime = "Max Wait Time";
+static final String PercentFragmentTime = "% Fragment Time";
+static final String PercentQueryTime = "% Query Time";
+static final String Rows = "Rows";
+static final String AvgPeakMemory = "Avg Peak Memory";
+static final String MaxPeakMemory = "Max Peak Memory";
+  }
+
+  private class OverviewTblTooltip {
+static final String OperatorID = "Operator ID";
+static final String Type = "Operator Type";
+static final String AvgSetupTime = "Average Time in setting up fragments";
+static final String MaxSetupTime = "Longest Time a fragment took in setup";
+static final String AvgProcessTime = "Average processing time for a fragment";
+static final String MaxProcessTime = "Longest Time a fragment took to process";
+static final String MinWaitTime = "Shortest time a fragment spent in waiting for data";
+static final String AvgWaitTime = "Average wait time for a fragment";
+static final String MaxWaitTime = "Longest Time a fragment spent in waiting data";
--- End diff --

"Time" --> "time"

"waiting data" --> "waiting for data" or just "waiting"




[jira] [Commented] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883293#comment-15883293
 ] 

ASF GitHub Bot commented on DRILL-5195:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/756#discussion_r103003954
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/OperatorWrapper.java ---
@@ -179,12 +207,47 @@ public String getMetricsTable() {
   }
   for (final Number value : values) {
 if (value != null) {
-  builder.appendFormattedNumber(value, null);
+  builder.appendFormattedNumber(value);
 } else {
-  builder.appendCell("", null);
+  builder.appendCell("");
 }
   }
 }
 return builder.build();
   }
+
+  private class OverviewTblTxt {
+static final String OperatorID = "Operator ID";
--- End diff --

These are constants. So, Java convention is upper case: OPERATOR_ID.




[jira] [Commented] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883279#comment-15883279
 ] 

ASF GitHub Bot commented on DRILL-5287:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/758#discussion_r103005063
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -1010,7 +1010,9 @@ public void addToEventQueue(final QueryState newState, final Exception exception
 
   private void recordNewState(final QueryState newState) {
 state = newState;
-queryManager.updateEphemeralState(newState);
+if (queryContext.getOptions().getOption(ExecConstants.ZK_QUERY_STATE_UPDATE)) {
+  queryManager.updateEphemeralState(newState);
+}
--- End diff --

For long-running queries, it may not make much difference; it adds around 
50-60 msec of latency to a single query. With high concurrency, however, the 
impact of contention from the ZooKeeper updates is significant. As mentioned 
in the JIRA, at concurrency=100 the average response time for simple queries 
is 8 sec vs. 0.2 sec with these updates disabled. It does not affect the query 
profile, which is still updated and written at the end of the query as usual. 
The option affects only running queries: with updates disabled, the Web UI 
will not show running queries and their state.
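The gating pattern in the diff above can be illustrated in isolation. This is a hedged sketch with made-up class and method names, not Drill's actual Foreman/QueryManager; a counter stands in for the ZooKeeper round-trip:

```java
// Hypothetical stand-in for the pattern in the patch: always record the
// state transition locally, but skip the expensive ZooKeeper write when
// the option is off. All names here are illustrative.
class QueryStateRecorder {
  private final boolean zkUpdatesEnabled;  // mirrors ExecConstants.ZK_QUERY_STATE_UPDATE
  private String state = "PENDING";
  private int zkWrites = 0;                // stands in for the ~50-60 ms ZK round-trip

  QueryStateRecorder(boolean zkUpdatesEnabled) {
    this.zkUpdatesEnabled = zkUpdatesEnabled;
  }

  void recordNewState(String newState) {
    state = newState;                      // local state is always updated
    if (zkUpdatesEnabled) {
      zkWrites++;                          // the ephemeral znode update would happen here
    }
  }

  String state() { return state; }
  int zkWrites() { return zkWrites; }
}
```

With updates disabled the query still reaches its final state; only the intermediate visibility of running queries is lost, matching the trade-off described above.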


> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.10
>
>
> We put transient profiles in ZooKeeper and update the state as the query 
> progresses through its states. This has been observed to add ~45 msec of 
> latency per update in the query execution path, and it gets worse as the 
> number of concurrent queries grows. At concurrency=100, the average query 
> response time even for short queries is 8 sec vs. 0.2 sec with these updates 
> disabled. For short-lived queries in a high-throughput scenario, updating 
> state changes in ZooKeeper is of no value, so we need an option to disable 
> these updates for short-running operational queries.





[jira] [Commented] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883274#comment-15883274
 ] 

ASF GitHub Bot commented on DRILL-5196:
---

Github user gparai commented on the issue:

https://github.com/apache/drill/pull/741
  
+1 LGTM


> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> Could not run a single MongoDB unit test through the IDE or the command line. 
> The reason is that when running a single test case, the MongoDB instance does 
> not get started, so a 'table not found' error for 'mongo.employee.empinfo' is 
> raised.





[jira] [Assigned] (DRILL-4897) NumberFormatException in Drill SQL while casting to BIGINT when its actually a number

2017-02-24 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua reassigned DRILL-4897:
---

Assignee: Khurram Faraaz

[~khfaraaz] Can you take a look at this and try a repro?

> NumberFormatException in Drill SQL while casting to BIGINT when its actually 
> a number
> -
>
> Key: DRILL-4897
> URL: https://issues.apache.org/jira/browse/DRILL-4897
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Srihari Karanth
>Assignee: Khurram Faraaz
>Priority: Blocker
>
> In the following SQL, Drill fails when trying to convert a number that is 
> stored as varchar:
>select cast (case IsNumeric(Delta_Radio_Delay)  
> when 0 then 0 else Delta_Radio_Delay end as BIGINT) 
> from datasource.`./sometable` 
> where Delta_Radio_Delay='4294967294';
> BIGINT should be able to hold a very large number. I don't understand how it 
> throws the error below:
> 0: jdbc:drill:> select cast (case IsNumeric(Delta_Radio_Delay)  
> when 0 then 0 else Delta_Radio_Delay end as BIGINT) 
> from datasource.`./sometable` 
> where Delta_Radio_Delay='4294967294';
> Error: SYSTEM ERROR: NumberFormatException: 4294967294
> Fragment 1:29
> [Error Id: a63bb113-271f-4d8b-8194-2c9728543200 on cluster-3:31010] 
> (state=,code=0)
> How can I modify the SQL to fix this?
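The reported value itself is a clue: 4294967294 fits comfortably in a 64-bit BIGINT but exceeds Integer.MAX_VALUE (2147483647), so the error is consistent with the cast path parsing the varchar as a 32-bit int somewhere (plausibly because the `0` literal in the CASE branch types the expression as INT; that is an assumption, not confirmed by this thread). A minimal Java illustration of the boundary:

```java
// Demonstrates why 4294967294 parses as a long but not as an int.
public class BigintBoundaryDemo {
  public static void main(String[] args) {
    String value = "4294967294";  // the literal from the bug report

    // Parsing as a 64-bit long succeeds; the value fits BIGINT.
    long asLong = Long.parseLong(value);
    System.out.println("as long: " + asLong);

    // Parsing as a 32-bit int throws, because 4294967294 > Integer.MAX_VALUE.
    try {
      Integer.parseInt(value);
    } catch (NumberFormatException e) {
      System.out.println("NumberFormatException: " + e.getMessage());
    }
  }
}
```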





[jira] [Commented] (DRILL-4355) Query can not be cancelled when it is in the planning phase

2017-02-24 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883268#comment-15883268
 ] 

Kunal Khatua commented on DRILL-4355:
-

[~amansinha100] Is it possible to have a fix for this *without* resolving 
CALCITE-872? There are cases where the planning time can exceed a minute, so 
it would be worthwhile to provide some mechanism to cancel/abort.

> Query can not be cancelled when it is in the planning phase
> ---
>
> Key: DRILL-4355
> URL: https://issues.apache.org/jira/browse/DRILL-4355
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Victoria Markman
>Priority: Critical
>
> It's a known issue, but I could not find a bug ... Please close it as a 
> duplicate if there is one.





[jira] [Updated] (DRILL-5255) Unit tests fail due to CTTAS temporary name space checks

2017-02-24 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5255:
---
Labels: ready-to-commit  (was: )

> Unit tests fail due to CTTAS temporary name space checks
> 
>
> Key: DRILL-5255
> URL: https://issues.apache.org/jira/browse/DRILL-5255
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Arina Ielchiieva
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Drill can operate in embedded mode. In this mode, no storage plugin 
> definitions other than the defaults may be present. In particular, when using 
> the Drill test framework, only those storage plugins defined in the Drill 
> code are available.
> Yet, Drill checks for the existence of the dfs.tmp plugin definition (as 
> named by the {{drill.exec.default_temporary_workspace}} parameter). Because 
> this plugin is not defined, an exception occurs:
> {code}
> org.apache.drill.common.exceptions.UserException: PARSE ERROR: Unable to 
> create or drop tables/views. Schema [dfs.tmp] is immutable.
> [Error Id: 792d4e5d-3f31-4f38-8bb4-d108f1a808f6 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToMutableDrillSchema(SchemaUtilites.java:184)
>   at 
> org.apache.drill.exec.planner.sql.SchemaUtilites.getTemporaryWorkspace(SchemaUtilites.java:201)
>   at 
> org.apache.drill.exec.server.Drillbit.validateTemporaryWorkspace(Drillbit.java:264)
>   at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:135)
>   at 
> org.apache.drill.test.ClusterFixture.startDrillbits(ClusterFixture.java:207)
>   ...
> {code}
> The expectation was that either a configuration would exist that uses the 
> default /tmp/drill location, or that the check for {{dfs.tmp}} would be 
> deferred until it is actually required (such as when executing a CTTAS 
> statement).
> It seemed that the test framework must be altered to work around this problem 
> by defining the necessary workspace. Unfortunately, the Drillbit must start 
> before we can define the workspace needed for the Drillbit to start. So, this 
> workaround is not possible.
> Further, users of the embedded Drillbit may not know to do this configuration.
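For reference, the kind of writable tmp workspace the startup check expects is normally declared in the dfs storage-plugin JSON roughly as follows. This is an illustrative fragment under the assumption of a local filesystem plugin; the exact plugin definition in a given deployment may differ:

```json
{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "tmp": {
      "location": "/tmp/drill",
      "writable": true
    }
  }
}
```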


