[jira] [Created] (DRILL-4695) Startup failure should be logged in log file.

2016-05-25 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4695:
--

 Summary: Startup failure should be logged in log file.
 Key: DRILL-4695
 URL: https://issues.apache.org/jira/browse/DRILL-4695
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


When a drillbit fails to start, the thrown exception does not get logged in 
drillbit.log. In the log we can only see "Shutdown begun", as shown below.

2016-05-25 13:58:26,132 [main] DEBUG o.apache.drill.exec.server.Drillbit - 
Shutdown begun.
2016-05-25 13:58:28,150 [pool-5-thread-2] INFO  
o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
io.netty.channel.epoll.EpollEventLoopGroup@2164289f in 1014 ms
2016-05-25 13:58:28,150 [pool-5-thread-1] INFO  
o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
io.netty.channel.epoll.EpollEventLoopGroup@2164289f in 1014 ms
2016-05-25 13:58:28,150 [pool-5-thread-2] INFO  
o.a.drill.exec.service.ServiceEngine - closed dataPool in 1015 ms
2016-05-25 13:58:28,150 [pool-5-thread-1] INFO  
o.a.drill.exec.service.ServiceEngine - closed userServer in 1015 ms
2016-05-25 13:58:28,177 [main] WARN  o.apache.drill.exec.server.Drillbit - 
Failure on close()
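
A minimal sketch of one fix direction (hypothetical code, not the actual 
patch; the startup entry point and logger are assumed names): catch the 
startup failure and log it before rethrowing, so drillbit.log records the 
root cause ahead of "Shutdown begun".

{code}
// Hypothetical sketch: log startup failures before the shutdown sequence runs.
public static void main(final String[] args) throws Exception {
  try {
    startDrillbit(args);                          // assumed startup entry point
  } catch (final Exception e) {
    logger.error("Drillbit startup failed", e);   // the line missing today
    throw e;                                      // still fail fast after logging
  }
}
{code}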




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4724) convert_from(binary_string(expression),'INT_BE') results in Exception

2016-06-15 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332880#comment-15332880
 ] 

Chunhui Shi commented on DRILL-4724:


Hi Khurram, it should not be a double backslash '\\' but a single backslash 
instead. With '\\xNN' the parser sees a literal backslash byte (0x5C) before 
each escape, which is why the error below reports a buffer of 8 bytes 
('\x5C\x99\x5C\x8C\x5C/\x5Cw') where 4 were expected.

My test result is this:
0: jdbc:drill:zk=local> select
convert_from(binary_string('\x99\x8c\x2f\x77'),'INT_BE') from (values(1));
+--------------+
|    EXPR$0    |
+--------------+
| -1718866057  |
+--------------+


On Wed, Jun 15, 2016 at 3:09 AM, Khurram Faraaz (JIRA) 
wrote:



> convert_from(binary_string(expression),'INT_BE') results in Exception
> -
>
> Key: DRILL-4724
> URL: https://issues.apache.org/jira/browse/DRILL-4724
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>
> The below query, which uses the binary_string function, results in an exception.
> Drill git commit ID: 6286c0a4
> {noformat}
> 2016-06-15 09:20:43,623 [289ee213-8ada-808f-e59d-5a6b67c53732:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 289ee213-8ada-808f-e59d-5a6b67c53732: 
> values(convert_from(binary_string('\\x99\\x8c\\x2f\\x77'),'INT_BE'))
> 2016-06-15 09:20:43,666 [289ee213-8ada-808f-e59d-5a6b67c53732:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: IllegalArgumentException: 
> Wrong length 8(8-0) in the buffer '\x5C\x99\x5C\x8C\x5C/\x5Cw', expected 4.
> [Error Id: bb6968cd-44c2-4c48-bb12-865f8709167e on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalArgumentException: Wrong length 8(8-0) in the buffer 
> '\x5C\x99\x5C\x8C\x5C/\x5Cw', expected 4.
> [Error Id: bb6968cd-44c2-4c48-bb12-865f8709167e on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
> [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
> [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: Internal error: Error while 
> applying rule ReduceExpressionsRule_Project, args 
> [rel#1460:LogicalProject.NONE.ANY([]).[](input=rel#1459:Subset#0.NONE.ANY([]).[0],EXPR$0=CONVERT_FROMINT_BE(BINARY_STRING('\\x99\\x8c\\x2f\\x77')))]
> ... 4 common frames omitted
> Caused by: java.lang.AssertionError: Internal error: Error while applying 
> rule ReduceExpressionsRule_Project, args 
> [rel#1460:LogicalProject.NONE.ANY([]).[](input=rel#1459:Subset#0.NONE.ANY([]).[0],EXPR$0=CONVERT_FROMINT_BE(BINARY_STRING('\\x99\\x8c\\x2f\\x77')))]
> at org.apache.calcite.util.Util.newInternal(Util.java:792) 
> ~[calcite-core-1.4.0-drill-r11.jar:1.4.0-drill-r11]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
>  ~[calcite-core-1.4.0-drill-r11.jar:1.4.0-drill-r11]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  ~[calcite-core-1.4.0-drill-r11.jar:1.4.0-drill-r11]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> ~[calcite-core-1.4.0-drill-r11.jar:1.4.0-drill-r11]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:400)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:339)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:237)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:286)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:168)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> 

[jira] [Assigned] (DRILL-4478) binary_string cannot correctly convert buffers that do not start at offset 0

2016-03-23 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4478:
--

Assignee: Chunhui Shi

> binary_string cannot correctly convert buffers that do not start at offset 0
> 
>
> Key: DRILL-4478
> URL: https://issues.apache.org/jira/browse/DRILL-4478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> When binary_string is called multiple times, only the first call is converted 
> correctly, because its drillbuf starts at offset 0. For the second and 
> subsequent calls the drillbuf does not start at offset 0, so 
> DrillStringUtils.parseBinaryString cannot do the work correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4478) binary_string cannot correctly convert buffers that do not start at offset 0

2016-03-04 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4478:
--

 Summary: binary_string cannot correctly convert buffers that do not 
start at offset 0
 Key: DRILL-4478
 URL: https://issues.apache.org/jira/browse/DRILL-4478
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Codegen
Reporter: Chunhui Shi


When binary_string is called multiple times, only the first call is converted 
correctly, because its drillbuf starts at offset 0. For the second and 
subsequent calls the drillbuf does not start at offset 0, so 
DrillStringUtils.parseBinaryString cannot do the work correctly.
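
A minimal sketch of the offset bug (the signature and body below are 
illustrative assumptions, not the actual DrillStringUtils code): a parser 
that scans from index 0 instead of the buffer's start offset only works for 
the first call, whose drillbuf happens to begin at 0.

{code}
// Hypothetical parser: decodes '\xNN' escapes from buf[start, end).
// The buggy variant iterates from 0 instead of start, so any drillbuf
// sliced at a non-zero offset is decoded from the wrong bytes.
static int parseBinaryString(byte[] buf, int start, int end, byte[] out) {
  int outIdx = 0;
  for (int i = start; i < end; i++) {      // correct: honor the start offset
    byte b = buf[i];
    if (b == '\\' && i + 3 < end && buf[i + 1] == 'x') {
      int hi = Character.digit(buf[i + 2], 16);
      int lo = Character.digit(buf[i + 3], 16);
      out[outIdx++] = (byte) ((hi << 4) | lo);
      i += 3;                              // consume the whole \xNN escape
    } else {
      out[outIdx++] = b;
    }
  }
  return outIdx;                           // number of decoded bytes
}
{code}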




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4618) random number generator function broken

2016-05-17 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4618:
--

Assignee: Chunhui Shi

> random number generator function broken
> 
>
> Key: DRILL-4618
> URL: https://issues.apache.org/jira/browse/DRILL-4618
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> Filing this JIRA based on the bug description from Ted's email and the 
> discussion on the dev mailing list, for the record:
> I am trying to generate some random numbers. I have a large base file (foo);
> this is what I get:
> 0: jdbc:drill:>  select floor(1000*random()) as x, floor(1000*random()) as
> y, floor(1000*rand()) as z from (select * from maprfs.tdunning.foo) a limit
> 20;
> ++++
> |   x|   y|   z|
> ++++
> | 556.0  | 556.0  | 618.0  |
> | 564.0  | 564.0  | 618.0  |
> | 129.0  | 129.0  | 618.0  |
> | 48.0   | 48.0   | 618.0  |
> | 696.0  | 696.0  | 618.0  |
> | 642.0  | 642.0  | 618.0  |
> | 535.0  | 535.0  | 618.0  |
> | 440.0  | 440.0  | 618.0  |
> | 894.0  | 894.0  | 618.0  |
> | 24.0   | 24.0   | 618.0  |
> | 508.0  | 508.0  | 618.0  |
> | 28.0   | 28.0   | 618.0  |
> | 816.0  | 816.0  | 618.0  |
> | 717.0  | 717.0  | 618.0  |
> | 334.0  | 334.0  | 618.0  |
> | 978.0  | 978.0  | 618.0  |
> | 646.0  | 646.0  | 618.0  |
> | 787.0  | 787.0  | 618.0  |
> | 260.0  | 260.0  | 618.0  |
> | 711.0  | 711.0  | 618.0  |
> ++++
> On this page, https://drill.apache.org/docs/math-and-trig/, the rand
> function is described and random() is not. But it appears that rand()
> delivers a constant instead (although a different constant each time the
> query is run), and it appears that random() delivers the same value when
> used multiple times within each returned row.
> This seems very, very wrong.
> The fault does not seem to be related to my querying a table:
> 0: jdbc:drill:> select rand(), random(), random() from (values (1),(2),(3))
> x;
> +-+---+---+
> |   EXPR$0|EXPR$1 |EXPR$2 |
> +-+---+---+
> | 0.1347749257216052  | 0.36724556209765014   | 0.36724556209765014   |
> | 0.1347749257216052  | 0.006087161689924625  | 0.006087161689924625  |
> | 0.1347749257216052  | 0.09417099142512142   | 0.09417099142512142   |
> +-+---+---+
> For reference, postgres doesn't have rand() and does the right thing with
> random().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4237) Skew in hash distribution

2016-04-20 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi resolved DRILL-4237.

   Resolution: Fixed
 Reviewer: Aman Sinha
Fix Version/s: 1.7.0

> Skew in hash distribution
> -
>
> Key: DRILL-4237
> URL: https://issues.apache.org/jira/browse/DRILL-4237
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> Apparently, the fix in DRILL-4119 did not fully resolve the data skew issue.  
> It worked fine on the smaller sample of the data set but on another sample of 
> the same data set, it still produces skewed values - see below the hash 
> values which are all odd numbers. 
> {noformat}
> 0: jdbc:drill:zk=local> select columns[0], hash32(columns[0]) from `test.csv` 
> limit 10;
> +---+--+
> |  EXPR$0   |EXPR$1|
> +---+--+
> | f71aaddec3316ae18d43cb1467e88a41  | 1506011089   |
> | 3f3a13bb45618542b5ac9d9536704d3a  | 1105719049   |
> | 6935afd0c693c67bba482cedb7a2919b  | -18137557|
> | ca2a938d6d7e57bda40501578f98c2a8  | -1372666789  |
> | fab7f08402c8836563b0a5c94dbf0aec  | -1930778239  |
> | 9eb4620dcb68a84d17209da279236431  | -970026001   |
> | 16eed4a4e801b98550b4ff504242961e  | 356133757|
> | a46f7935fea578ce61d8dd45bfbc2b3d  | -94010449|
> | 7fdf5344536080c15deb2b5a2975a2b7  | -141361507   |
> | b82560a06e2e51b461c9fe134a8211bd  | -375376717   |
> +---+--+
> {noformat}
> This indicates an underlying issue with the XXHash64 Java implementation, 
> which is Drill's port of the C version.  One of the key differences, as 
> pointed out by [~jnadeau], was the use of unsigned int64 in the C version 
> compared to the Java version, which uses (signed) long.  I created an XXHash 
> version using com.google.common.primitives.UnsignedLong.  However, 
> UnsignedLong does not have the bit-wise operations that XXHash needs, such 
> as rotateLeft(), XOR, etc.  One could write wrappers for these, but at this 
> point the question is: should we think of an alternative hash function? 
> The alternative approach could be the murmur hash for numeric data types that 
> we were using earlier and the Mahout version of hash function for string 
> types 
> (https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/HashHelper.java#L28).
>   As a test, I reverted to this function and was getting good hash 
> distribution for the test data. 
> I could not find any performance comparisons of our perf tests (TPC-H or DS) 
> with the original and newer (XXHash) hash functions.  If performance is 
> comparable, should we revert to the original function?
> As an aside, I would like to remove the hash64 versions of the functions 
> since they are not used anywhere. 
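
One note on the signed-vs-unsigned point, with a minimal Java sketch (not 
Drill's XXHash code): for multiply, add, XOR, and rotate, Java's signed long 
already produces the same 64-bit pattern as C's uint64_t, because 
two's-complement arithmetic wraps identically; only division, modulo, 
comparison, and right-shift need unsigned-aware replacements.

{code}
public class UnsignedLongDemo {
  // Real XXHash64 prime; its top bit is set, so as a signed long it is negative.
  static final long PRIME64_1 = 0x9E3779B185EBCA87L;

  public static void main(String[] args) {
    long x = 0xDEADBEEFCAFEBABEL;
    long mul = x * PRIME64_1;             // wraps mod 2^64, same bits as C uint64_t
    long rot = Long.rotateLeft(mul, 31);  // rotation is sign-agnostic
    long shr = rot >>> 29;                // logical shift emulates C's unsigned >>
    System.out.printf("%016x%n", rot ^ shr);

    // Unsigned comparison without wrappers: flip the sign bit of both operands.
    long a = -1L, b = 1L;                 // -1L is 2^64-1 when viewed as unsigned
    System.out.println((a ^ Long.MIN_VALUE) > (b ^ Long.MIN_VALUE));  // true
  }
}
{code}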



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4478) binary_string cannot correctly convert buffers that do not start at offset 0

2016-04-20 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi resolved DRILL-4478.

   Resolution: Fixed
 Reviewer: Aman Sinha
Fix Version/s: 1.7.0

> binary_string cannot correctly convert buffers that do not start at offset 0
> 
>
> Key: DRILL-4478
> URL: https://issues.apache.org/jira/browse/DRILL-4478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> When binary_string is called multiple times, only the first call is converted 
> correctly, because its drillbuf starts at offset 0. For the second and 
> subsequent calls the drillbuf does not start at offset 0, so 
> DrillStringUtils.parseBinaryString cannot do the work correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4620) Drill query on HBase table gets base64-encoded results while HBase shell shows table content correctly

2016-04-20 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4620:
--

 Summary: Drill query on HBase table gets base64-encoded results while 
HBase shell shows table content correctly 
 Key: DRILL-4620
 URL: https://issues.apache.org/jira/browse/DRILL-4620
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Created a table using the HBase shell, following the steps in 
https://www.mapr.com/blog/secondary-indexing-mapr-db-using-elasticsearch. 
However, querying the generated table in Drill shows base64-encoded results 
instead of the correct plaintext, as shown below:

[root@atsqa4-128 ~]# hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.98.12-mapr-1602, rcf7a299d9b0a24150d4a13cbce7fc9eac9b2404d, Tue Mar  
1 19:32:45 UTC 2016

Not all HBase shell commands are applicable to MapR tables.
Consult MapR documentation for the list of supported commands.

hbase(main):001:0> scan '/user/person'
ROW  COLUMN+CELL
 1   column=details:address, timestamp=1461110148447, value=350 Holger Way
 1   column=details:fname, timestamp=1461110112541, value=Tom
 1   column=details:lname, timestamp=1461110121828, value=John
 2   column=details:address, timestamp=1461110227143, value=340 Holger Way
 2   column=details:fname, timestamp=1461110171622, value=David
 2   column=details:lname, timestamp=1461110189721, value=Robert
 3   column=details:address, timestamp=1461110282174, value=310 Holger Way
 3   column=details:fname, timestamp=1461110248477, value=Samuel
 3   column=details:lname, timestamp=1461110268460, value=Trump
 4   column=details:address, timestamp=1461110355548, value=100 Zanker Ave
 4   column=details:fname, timestamp=1461110307194, value=Christina
 4   column=details:lname, timestamp=1461110332695, value=Rogers
4 row(s) in 0.1380 seconds

hbase(main):002:0> exit
[root@atsqa4-128 ~]# /opt/mapr/drill/drill-1.7.0/bin/sqlline -u 
"jdbc:drill:zk=10.10.88.125:5181"
apache drill 1.7.0-SNAPSHOT 
"what ever the mind of man can conceive and believe, drill can query"
0: 

[jira] [Closed] (DRILL-4620) Drill query on HBase table gets base64-encoded results while HBase shell shows table content correctly

2016-04-20 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi closed DRILL-4620.
--
Resolution: Not A Bug

> Drill query on HBase table gets base64-encoded results while HBase shell 
> shows table content correctly 
> --
>
> Key: DRILL-4620
> URL: https://issues.apache.org/jira/browse/DRILL-4620
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>
> Created a table using the HBase shell, following the steps in 
> https://www.mapr.com/blog/secondary-indexing-mapr-db-using-elasticsearch. 
> However, querying the generated table in Drill shows base64-encoded results 
> instead of the correct plaintext, as shown below:
> [root@atsqa4-128 ~]# hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 0.98.12-mapr-1602, rcf7a299d9b0a24150d4a13cbce7fc9eac9b2404d, Tue Mar 
>  1 19:32:45 UTC 2016
> Not all HBase shell commands are applicable to MapR tables.
> Consult MapR documentation for the list of supported commands.
> hbase(main):001:0> scan '/user/person'
> ROW  COLUMN+CELL
>  1   column=details:address, timestamp=1461110148447, value=350 Holger Way
>  1   column=details:fname, timestamp=1461110112541, value=Tom
>  1   column=details:lname, timestamp=1461110121828, value=John
>  2   column=details:address, timestamp=1461110227143, value=340 Holger Way
>  2   column=details:fname, timestamp=1461110171622, value=David
>  2   column=details:lname, timestamp=1461110189721, value=Robert
>  3   column=details:address, timestamp=1461110282174, value=310 Holger Way
>  3   column=details:fname, timestamp=1461110248477, value=Samuel
>  3   column=details:lname, timestamp=1461110268460, value=Trump
>  4   column=details:address, timestamp=1461110355548, value=100 Zanker Ave
>  4   column=details:fname, timestamp=1461110307194, value=Christina
>  4   column=details:lname, timestamp=1461110332695, value=Rogers

[jira] [Created] (DRILL-4618) random number generator function broken

2016-04-18 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4618:
--

 Summary: random number generator function broken
 Key: DRILL-4618
 URL: https://issues.apache.org/jira/browse/DRILL-4618
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Filing this JIRA based on the bug description from Ted's email and the 
discussion on the dev mailing list, for the record:

I am trying to generate some random numbers. I have a large base file (foo);
this is what I get:

0: jdbc:drill:>  select floor(1000*random()) as x, floor(1000*random()) as
y, floor(1000*rand()) as z from (select * from maprfs.tdunning.foo) a limit
20;
++++
|   x|   y|   z|
++++
| 556.0  | 556.0  | 618.0  |
| 564.0  | 564.0  | 618.0  |
| 129.0  | 129.0  | 618.0  |
| 48.0   | 48.0   | 618.0  |
| 696.0  | 696.0  | 618.0  |
| 642.0  | 642.0  | 618.0  |
| 535.0  | 535.0  | 618.0  |
| 440.0  | 440.0  | 618.0  |
| 894.0  | 894.0  | 618.0  |
| 24.0   | 24.0   | 618.0  |
| 508.0  | 508.0  | 618.0  |
| 28.0   | 28.0   | 618.0  |
| 816.0  | 816.0  | 618.0  |
| 717.0  | 717.0  | 618.0  |
| 334.0  | 334.0  | 618.0  |
| 978.0  | 978.0  | 618.0  |
| 646.0  | 646.0  | 618.0  |
| 787.0  | 787.0  | 618.0  |
| 260.0  | 260.0  | 618.0  |
| 711.0  | 711.0  | 618.0  |
++++

On this page, https://drill.apache.org/docs/math-and-trig/, the rand
function is described and random() is not. But it appears that rand()
delivers a constant instead (although a different constant each time the
query is run), and it appears that random() delivers the same value when
used multiple times within each returned row.

This seems very, very wrong.

The fault does not seem to be related to my querying a table:

0: jdbc:drill:> select rand(), random(), random() from (values (1),(2),(3))
x;
+-+---+---+
|   EXPR$0|EXPR$1 |EXPR$2 |
+-+---+---+
| 0.1347749257216052  | 0.36724556209765014   | 0.36724556209765014   |
| 0.1347749257216052  | 0.006087161689924625  | 0.006087161689924625  |
| 0.1347749257216052  | 0.09417099142512142   | 0.09417099142512142   |
+-+---+---+

For reference, postgres doesn't have rand() and does the right thing with
random().
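
The symptoms match a planner that constant-folds a nondeterministic function: 
z is folded once at plan time, and the identical x and y expressions appear to 
be merged into a single per-row evaluation. A minimal Java illustration of 
that failure mode (an illustration only, not Drill's planner code):

{code}
import java.util.Random;

public class ConstantFoldingDemo {
  public static void main(String[] args) {
    Random rng = new Random();

    // Correct behavior: evaluate the function once per row, per expression.
    for (int row = 0; row < 3; row++) {
      System.out.printf("%f  %f%n", rng.nextDouble(), rng.nextDouble());
    }

    // The reported rand() behavior: a planner that wrongly treats the
    // function as deterministic folds it to a single plan-time constant,
    // so every row shows the same value (the constant z column above).
    double folded = rng.nextDouble();
    for (int row = 0; row < 3; row++) {
      System.out.printf("%f%n", folded);
    }
  }
}
{code}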



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4143) REFRESH TABLE METADATA - Permission Issues with metadata files

2016-05-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi closed DRILL-4143.
--
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.7.0

> REFRESH TABLE METADATA - Permission Issues with metadata files
> --
>
> Key: DRILL-4143
> URL: https://issues.apache.org/jira/browse/DRILL-4143
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0, 1.4.0
>Reporter: John Omernik
>Assignee: Chunhui Shi
>  Labels: Metadata, Parquet, Permissions
> Fix For: 1.7.0
>
>
> Summary of Refresh Metadata Issues confirmed by two different users on Drill 
> User Mailing list. (Title: REFRESH TABLE METADATA - Access Denied)
> This issue pertains to table METADATA and revolves around user 
> authentication. 
> Basically, when the drill bits are running as one user, and the data is owned 
> by another user, there can be access denied issues on subsequent queries 
> after issuing a REFRESH TABLE METADATA command. 
> To troubleshoot what is actually happening, I turned on MapR auditing (this 
> is a handy feature) and found that when I run a query that gives me access 
> denied (my query is select count(1) from testtable), per MapR, the user I am 
> logged in as (dataowner) is trying to do a create operation on the 
> .drill.parquet_metadata file, and it's failing with status 17. Per Keys at 
> MapR, "status 17 means errno 17 which means EEXIST. Looks like Drill is 
> trying to create a file that already exists." This seems to indicate that 
> Drill is perhaps trying to create the .drill.parquet_metadata on each select 
> as the dataowner user, but the permissions (as seen below) don't allow it. 
> Here are the steps to reproduce:
> Enable Authentication. 
> Run all drillbits in the cluster as "drillbituser", and have the files 
> owned by "dataowner". Note the root of the table permissions are drwxrwxr-x, 
> but as Drill loads each partition it loads them as drwxr-xr-x (all with 
> dataowner:dataowner ownership). That may be a factor too: the default 
> permissions when creating a table?  Another note: in my setup, drillbituser 
> is in the group for dataowner, so they should always have read access. 
> # Authenticated as dataowner (this should have full permissions to all the 
> data)
> Enter username for jdbc:drill:zk=zknode1:5181: dataowner
> Enter password for jdbc:drill:zk=zknode1:5181: **
> 0: jdbc:drill:zk=zknode1> use dfs.dev;
> +---+--+
> |  ok   |   summary|
> +---+--+
> | true  | Default schema changed to [dfs.dev]  |
> +---+--+
> 1 row selected (0.307 seconds)
> # The query works fine with no table metadata
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> +---+
> |  EXPR$0   |
> +---+
> | 24565203  |
> +---+
> 1 row selected (3.392 seconds)
> # Refresh of metadata works with no errors
> 0: jdbc:drill:zk=zknode1> refresh table metadata `testtable`;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | Successfully updated metadata for table testtable.  |
> +---+---+
> 1 row selected (5.767 seconds)
>  
> # Trying to run the same query, it returns an access-denied error. 
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> Error: SYSTEM ERROR: IOException: 2127.7646.2950962 
> /data/dev/testtable/2015-11-12/.drill.parquet_metadata (Permission denied)
>  
>  
> [Error Id: 7bfce2e7-f78d-4fba-b047-f4c85b471de4 on node1:31010] 
> (state=,code=0)
>  
>  
> # Note how all the files are owned by drillbituser. Per discussion on the 
> list, this is normal. 
>  
> $ find ./ -type f -name ".drill.parquet_metadata" -exec ls -ls {} \;
> 726 -rwxr-xr-x 1 drillbituser drillbituser 742837 Nov 30 14:27 
> ./2015-11-12/.drill.parquet_metadata
> 583 -rwxr-xr-x 1 drillbituser drillbituser 596146 Nov 30 14:27 
> ./2015-11-29/.drill.parquet_metadata
> 756 -rwxr-xr-x 1 drillbituser drillbituser 773811 Nov 30 14:27 
> ./2015-11-11/.drill.parquet_metadata
> 763 -rwxr-xr-x 1 drillbituser drillbituser 780829 Nov 30 14:27 
> ./2015-11-04/.drill.parquet_metadata
> 632 -rwxr-xr-x 1 drillbituser drillbituser 646851 Nov 30 14:27 
> ./2015-11-08/.drill.parquet_metadata
> 845 -rwxr-xr-x 1 drillbituser drillbituser 864421 Nov 30 14:27 
> ./2015-11-05/.drill.parquet_metadata
> 771 -rwxr-xr-x 1 drillbituser drillbituser 788823 Nov 30 14:27 
> 

[jira] [Created] (DRILL-4777) Fuse generated code to reduce code size and gain performance improvement

2016-07-12 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4777:
--

 Summary: Fuse generated code to reduce code size and gain 
performance improvement
 Key: DRILL-4777
 URL: https://issues.apache.org/jira/browse/DRILL-4777
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Drill generates code for operators, compiles the classes, and loads them on 
the fly during a query. However, for large queries the generated code can 
reach hundreds of KB or even more. We have seen multiple issues reported when 
the generated code is too big, either due to Java's size limit on a single 
method or due to degraded performance while compiling or executing. Also, 
when I looked at JIT optimization logs, there were many complaints about 'hot 
method too big'.

Some measures can be considered to reduce the code size, such as:
1) For now Drill embeds function calls' code directly into the generated 
code; this turns a one-line function call into 5-10 lines of code in the 
generated Java classes. If we instead inject these functions as private 
methods of the classes and call them directly in the main function body, this 
could reduce code size while the cost of the function call is erased by JIT 
inline optimization (see the sketch below).

2) Drill generates one variable for each column; if the number of columns 
grows to dozens or a hundred, the code becomes redundant. We could consider 
using an array to store the value vectors and looping over it, so the code 
size is reduced even more.
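
A minimal sketch of proposal 1) (illustrative only; the holder class stands 
in for Drill's generated holders): the eval body that is pasted inline today 
becomes a private method that call sites invoke, and that the JIT can inline.

{code}
public class FusedCodeSketch {
  static final class NullableBigIntHolder { int isSet; long value; }

  // Emitted once per generated class; call sites shrink to one line.
  private int compareNullsHigh(NullableBigIntHolder left, NullableBigIntHolder right) {
    if (left.isSet == 0) {
      return (right.isSet == 0) ? 0 : 1;  // nulls compare high
    }
    if (right.isSet == 0) {
      return -1;
    }
    return left.value < right.value ? -1 : (left.value == right.value ? 0 : 1);
  }

  public static void main(String[] args) {
    NullableBigIntHolder l = new NullableBigIntHolder();
    l.isSet = 1;
    l.value = 3;
    NullableBigIntHolder r = new NullableBigIntHolder();  // isSet == 0: null
    System.out.println(new FusedCodeSketch().compareNullsHigh(l, r));  // -1
  }
}
{code}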







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4778) revisit of using two Java compilers for code generation

2016-07-12 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4778:
--

 Summary: revisit of using two Java compilers for code generation
 Key: DRILL-4778
 URL: https://issues.apache.org/jira/browse/DRILL-4778
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Currently Drill uses two Java compilers: Janino when the source code size is 
< 256 KB, and the JDK compiler when it is > 256 KB.

However, I recently noticed that for large source code (~550 KB), while the 
JDK compiler seems 30% faster at compiling, the generated bytecode is 20% 
larger and the execution is much slower (10 times slower in some cases):

2016-07-07 21:17:32,647 [28813914-36f3-8bf5-5d4b-7de2daba5341:frag:0:0] DEBUG 
o.a.d.exec.compile.JDKClassCompiler - Compiling (source size=654.7 KiB):
2016-07-07 21:17:48,806 [28813914-36f3-8bf5-5d4b-7de2daba5341:frag:0:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Done compiling (bytecode size=300.6 KiB, 
time:16165 millis).
2016-07-07 21:23:52,389 [2881379e-5ebf-966d-c099-f813f9f99ab4:frag:0:0] DEBUG 
o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=654.7 KiB):
2016-07-07 21:24:14,584 [2881379e-5ebf-966d-c099-f813f9f99ab4:frag:0:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Done compiling (bytecode size=241.8 KiB, 
time:2 millis).

It seems we should stick to one compiler: Janino only. We should also measure 
and root-cause the performance difference between these two compilers, 
especially for large queries. 
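
For context, a minimal sketch of the size-based selection described above 
(hypothetical names, not Drill's actual selector class):

{code}
// Hypothetical selector: pick a compiler by generated-source size, per the
// 256 KB threshold described above. Sticking to Janino would remove this
// branch entirely.
public class CompilerSelectorSketch {
  interface ClassCompiler { byte[] compile(String className, String source); }

  private static final int JANINO_MAX_SIZE = 256 * 1024;
  private final ClassCompiler janino;
  private final ClassCompiler jdk;

  CompilerSelectorSketch(ClassCompiler janino, ClassCompiler jdk) {
    this.janino = janino;
    this.jdk = jdk;
  }

  byte[] compile(String className, String source) {
    return source.length() < JANINO_MAX_SIZE
        ? janino.compile(className, source)
        : jdk.compile(className, source);
  }
}
{code}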



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4777) Fuse generated code to reduce code size and gain performance improvement

2016-07-13 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374543#comment-15374543
 ] 

Chunhui Shi commented on DRILL-4777:


Example code pieces for function compare_to_nulls_high and hash32 embedded in 
generated source HashTableGenNN.java:

3021:   // start of eval portion of compare_to_nulls_high 
function. //
3022:   IntHolder out17 = new IntHolder();
3023:   {
3024:   final IntHolder out = new IntHolder();
3025:   NullableBigIntHolder left = out12;
3026:   NullableBigIntHolder right = out16;
3027:
3028:   
GCompareBigIntVsBigInt$GCompareNullableBigIntVsNullableBigIntNullHigh_eval: {
3029:   outside:
3030:   {
3031:   if (left.isSet == 0) {
3032:   if (right.isSet == 0) {
3033:   out.value = 0;
3034:   break outside;
3035:   } else
3036:   {
3037:   out.value = 1;
3038:   break outside;
3039:   }
3040:   } else
3041:   if (right.isSet == 0) {
3042:   out.value = -1;
3043:   break outside;
3044:   }
3045:   out.value = left.value < right.value ? -1 : (left.value == 
right.value ? 0 : 1);
3046:   }
3047:   }
3048:
3049:   out17 = out;
3050:   }



462:IntHolder out127 = new IntHolder();
463:{
464:final IntHolder out = new IntHolder();
465:NullableVarCharHolder in = out91;
466:IntHolder seed = out126;
467: 
468:Hash32FunctionsWithSeed$NullableVarCharHash_eval: {
469:if (in.isSet == 0) {
470:out.value = seed.value;
471:} else
472:{
473:out.value = 
org.apache.drill.exec.expr.fn.impl.XXHash.hash32(in.start, in.end, in.buffer, 
seed.value);
474:}
475:}
476: 
477:out127 = out;
478:}
479:// end of eval portion of hash32 function. //

> Fuse generated code to reduce code size and gain performance improvement
> 
>
> Key: DRILL-4777
> URL: https://issues.apache.org/jira/browse/DRILL-4777
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>
> Drill generates code for operators, compiles the classes, and loads them on 
> the fly during a query. However, for large queries the generated code can 
> reach hundreds of KB or even more. We have seen multiple issues reported 
> when the generated code is too big, either due to Java's size limit on a 
> single method or due to degraded performance while compiling or executing. 
> Also, when I looked at JIT optimization logs, there were many complaints 
> about 'hot method too big'.
> Some measures can be considered to reduce the code size, such as:
> 1) For now Drill embeds function calls' code directly into the generated 
> code; this turns a one-line function call into 5-10 lines of code in the 
> generated Java classes. If we instead inject these functions as private 
> methods of the classes and call them directly in the main function body, 
> this could reduce code size while the cost of the function call is erased 
> by JIT inline optimization.
> 2) Drill generates one variable for each column; if the number of columns 
> grows to dozens or a hundred, the code becomes redundant. We could consider 
> using an array to store the value vectors and looping over it, so the code 
> size is reduced even more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4092) Support for INTERSECT

2016-07-21 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4092:
--

Assignee: Chunhui Shi

> Support for INTERSECT 
> --
>
> Key: DRILL-4092
> URL: https://issues.apache.org/jira/browse/DRILL-4092
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Victoria Markman
>Assignee: Chunhui Shi
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-15 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4783:
--

 Summary: Flatten on CONVERT_FROM fails with ClassCastException if 
resultset is empty
 Key: DRILL-4783
 URL: https://issues.apache.org/jira/browse/DRILL-4783
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Priority: Critical


Flatten fails to work on top of convert_from when the resultset is empty. 

For an HBase table like this:

0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
from hbase.`/tmp/flattentest` t;
+----------------------------------------------------------------------------------+
| EXPR$0                                                                           |
+----------------------------------------------------------------------------------+
| {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain View"}]}    |
| {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]}              |
| {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San Paul"}]}  |
+----------------------------------------------------------------------------------+

Flatten works when row_key is in (1,2,3)
0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
where row_key=1) t1;
+---+
|  EXPR$0   |
+---+
| {"city":"SunnyVale"}  |
| {"city":"Palo Alto"}  |
| {"city":"Mountain View"}  |
+---+

But Flatten throws an exception if the resultset is empty

0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
where row_key=4) t1;
Error: SYSTEM ERROR: ClassCastException: Cannot cast 
org.apache.drill.exec.vector.NullableIntVector to 
org.apache.drill.exec.vector.complex.RepeatedValueVector

Fragment 0:0

[Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
(state=,code=0)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-15 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4783:
--

Assignee: Chunhui Shi

> Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
> ---
>
> Key: DRILL-4783
> URL: https://issues.apache.org/jira/browse/DRILL-4783
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Flatten fails to work on top of convert_from when the resultset is empty. 
> For an HBase table like this:
> 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
> from hbase.`/tmp/flattentest` t;
> +----------------------------------------------------------------------------------+
> | EXPR$0                                                                           |
> +----------------------------------------------------------------------------------+
> | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain View"}]}    |
> | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]}              |
> | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San Paul"}]}  |
> +----------------------------------------------------------------------------------+
> Flatten works when row_key is in (1,2,3)
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=1) t1;
> +---+
> |  EXPR$0   |
> +---+
> | {"city":"SunnyVale"}  |
> | {"city":"Palo Alto"}  |
> | {"city":"Mountain View"}  |
> +---+
> But Flatten throws an exception if the resultset is empty
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=4) t1;
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> Fragment 0:0
> [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-15 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380422#comment-15380422
 ] 

Chunhui Shi commented on DRILL-4783:


The Flatten operator should be able to initialize its ValueVector 
appropriately even when the underlying operator (in this case a Project 
retrieving t1.`json`.`list`) cannot initialize the ValueVector correctly, 
since without data the underlying operator has no idea what t1.`json`.`list` 
is.

I propose to fix this by modifying FlattenRecordBatch.java: when a 
ClassCastException is thrown because the ValueVector (a NullableIntVector in 
this example) cannot be cast to RepeatedValueVector, create a new 
RepeatedMapVector instead. A sketch follows.
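
A rough sketch of that proposal (hypothetical; `incoming`, `field`, 
`oContext`, and `callBack` stand in for members of FlattenRecordBatch, and 
the constructor arguments are assumptions, not the actual patch):

{code}
// Hypothetical fallback: when the child could not infer a repeated type
// (empty result set), substitute an empty RepeatedMapVector rather than
// letting the cast fail the query.
RepeatedValueVector asRepeated;
if (incoming instanceof RepeatedValueVector) {
  asRepeated = (RepeatedValueVector) incoming;
} else {
  // e.g. the NullableIntVector produced for t1.`json`.`list` with no data
  asRepeated = new RepeatedMapVector(field, oContext.getAllocator(), callBack);
}
{code}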

> Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
> ---
>
> Key: DRILL-4783
> URL: https://issues.apache.org/jira/browse/DRILL-4783
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Flatten fails to work on top of convert_from when the resultset is empty. 
> For an HBase table like this:
> 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
> from hbase.`/tmp/flattentest` t;
> +----------------------------------------------------------------------------------+
> | EXPR$0                                                                           |
> +----------------------------------------------------------------------------------+
> | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain View"}]}    |
> | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]}              |
> | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San Paul"}]}  |
> +----------------------------------------------------------------------------------+
> Flatten works when row_key is in (1,2,3)
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=1) t1;
> +---+
> |  EXPR$0   |
> +---+
> | {"city":"SunnyVale"}  |
> | {"city":"Palo Alto"}  |
> | {"city":"Mountain View"}  |
> +---+
> But Flatten throws an exception if the resultset is empty
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=4) t1;
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> Fragment 0:0
> [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5247) Text form of EXPLAIN statement does not have same information as profile

2017-02-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858883#comment-15858883
 ] 

Chunhui Shi commented on DRILL-5247:


"explain plan including all attributes for " will get the cost details 
returned. Is that not enough, if you need cost information about abandoned 
plans, usually I will go to calcite logs.


> Text form of EXPLAIN statement does not have same information as profile
> 
>
> Key: DRILL-5247
> URL: https://issues.apache.org/jira/browse/DRILL-5247
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Create a simple query. Run it and view the "physical plan" in the Web UI or 
> the profile JSON. That plan contains a rich set of information about operator 
> costs and so on.
> Now, with the same query, execute an EXPLAIN statement. The resulting plan 
> looks like the one in the profile, but lacks the cost detail. (See below.)
> Since the cost detail comes from the planner, and is essential to 
> understanding why a plan was chosen, the information should appear in the 
> EXPLAIN output. (After all, the output is supposed to EXPLAIN the plan...)
> Example of EXPLAIN output:
> {code}
> 00-00Screen
> 00-01  Project(id_i=[$0], name_s20=[$1])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[=($0, 10)])
> 00-04Scan(groupscan=[MockGroupScanPOP [url=null, 
> readEntries=[MockScanEntry [records=1, columns=[MockColumn 
> [minorType=INT, name=id_i, mode=REQUIRED], MockColumn [minorType=VARCHAR, 
> name=name_s20, mode=REQUIRED]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (DRILL-5247) Text form of EXPLAIN statement does not have same information as profile

2017-02-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858883#comment-15858883
 ] 

Chunhui Shi edited comment on DRILL-5247 at 2/9/17 2:19 AM:


"explain plan including all attributes for " will get the cost details 
returned. Is that not enough? if you need cost information about abandoned 
plans, usually I will go to calcite logs.



was (Author: cshi):
"explain plan including all attributes for " will get the cost details 
returned. Is that not enough, if you need cost information about abandoned 
plans, usually I will go to calcite logs.


> Text form of EXPLAIN statement does not have same information as profile
> 
>
> Key: DRILL-5247
> URL: https://issues.apache.org/jira/browse/DRILL-5247
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Create a simple query. Run it and view the "physical plan" in the Web UI or 
> the profile JSON. That plan contains a rich set of information about operator 
> costs and so on.
> Now, with the same query, execute an EXPLAIN statement. The resulting plan 
> looks like the one in the profile, but lacks the cost detail. (See below.)
> Since the cost detail comes from the planner, and is essential to 
> understanding why a plan was chosen, the information should appear in the 
> EXPLAIN output. (After all, the output is supposed to EXPLAIN the plan...)
> Example of EXPLAIN output:
> {code}
> 00-00Screen
> 00-01  Project(id_i=[$0], name_s20=[$1])
> 00-02SelectionVectorRemover
> 00-03  Filter(condition=[=($0, 10)])
> 00-04Scan(groupscan=[MockGroupScanPOP [url=null, 
> readEntries=[MockScanEntry [records=1, columns=[MockColumn 
> [minorType=INT, name=id_i, mode=REQUIRED], MockColumn [minorType=VARCHAR, 
> name=name_s20, mode=REQUIRED]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-02-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5196:
---
Reviewer: Paul Rogers

> Could not run a single MongoDB unit test case through command line or IDE
> -
>
> Key: DRILL-5196
> URL: https://issues.apache.org/jira/browse/DRILL-5196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> Could not run a single MongoDB unit test through the IDE or the command 
> line. The reason is that when running a single test case, the MongoDB 
> instance does not get started, so a 'table not found' error for 
> 'mongo.employee.empinfo' is raised.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5114:
---
Labels: ready-to-commit  (was: )

> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> Drill uses Logback as its logger. The logger is used in several tests to 
> display some test output. Test output is sent to stdout, rather than a log 
> file. Since Drill also uses Logback, that same configuration sends much 
> Drill logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in 
> the src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for 
> bulk test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5297) Print the plan text when plan pattern check fails in unit tests

2017-02-24 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5297:
--

 Summary: Print the plan text when plan pattern check fails in unit 
tests 
 Key: DRILL-5297
 URL: https://issues.apache.org/jira/browse/DRILL-5297
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


If a unit test does not generate the expected plan, we currently print only 
the expected pattern, like this:

Did not find expected pattern in plan: Scan.*FindLimit0Visitor

We should also print the plan here for debugging purposes.
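
A minimal sketch of the improvement (hypothetical helper code; 
expectedPattern and plan are assumed to be in scope):

{code}
// Hypothetical test helper: include the full plan text in the failure
// message, not just the pattern that was expected.
if (!java.util.regex.Pattern.compile(expectedPattern).matcher(plan).find()) {
  throw new AssertionError(String.format(
      "Did not find expected pattern in plan: %s%nPlan:%n%s",
      expectedPattern, plan));
}
{code}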




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5296) Add option to allow "explain plan for" to hide the json of the plan

2017-02-23 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5296:
--

 Summary: Add option to allow "explain plan for" to hide the json 
of the plan
 Key: DRILL-5296
 URL: https://issues.apache.org/jira/browse/DRILL-5296
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Most of the time, we just want to see the plan tree for a query in sqlline 
when we run "explain plan for". The JSON part is for replaying the query, but 
it is not so useful in most cases, and even for a simple query it can run to 
hundreds of lines. So consider adding an option "with/without json" (like 
"with/without implementation") to control whether to show the plan's JSON.

The default should be "without json", though this is arguable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5286) When the rel and the target candidate set are the same, the planner should not need to convert the relNode, since the conversion must already have been done

2017-02-21 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5286:
--

 Summary: When the rel and the target candidate set are the same, the 
planner should not need to convert the relNode, since the conversion must 
already have been done
 Key: DRILL-5286
 URL: https://issues.apache.org/jira/browse/DRILL-5286
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files

2017-02-22 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5273:
---
Labels: ready-to-commit  (was: )

> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> --
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> A test case was created that consists of 5000 text files, each with a single 
> line with the file number: 1 to 5001. Each file has a single record, and at 
> most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch on around record 3700 on a 
> Mac with 4GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears 
> to occur:
> * Iterate over the record readers for each file.
> * For each, call setup
> The setup code is:
> {code}
>   public void setup(OperatorContext context, OutputMutator outputMutator) 
> throws ExecutionSetupException {
> oContext = context;
> readBuffer = context.getManagedBuffer(READ_BUFFER);
> whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the 
> buffers.
> The sizes are:
> {code}
>   private static final int READ_BUFFER = 1024*1024;
>   private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to 
> {{ScanBatch.next()}}
> {code}
> Ctor: 0  -- Initial memory in constructor
> Init setup: 1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1 MB per file, with 5000 files we would leak 5 GB of memory, which 
> would explain the OOM when given only 4 GB.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-4868) Hive functions should update writerIndex accordingly when returning binary type

2016-08-30 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4868:
--

 Summary: Hive functions should update writerIndex accordingly when 
returning binary type
 Key: DRILL-4868
 URL: https://issues.apache.org/jira/browse/DRILL-4868
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


unhex is a Hive function. The binary buffer it returns cannot be consumed by 
convert_from, as shown below.

0: jdbc:drill:zk=10.10.88.128:5181> select 
convert_from(unhex('0a5f710b'),'int_be') from (values(1));
Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex(0) + length(4) 
exceeds writerIndex(0): DrillBuf[31], udle: [25 0..1024]
Fragment 0:0
[Error Id: 5e72ce4a-6164-4260-8317-ca2bb6325013 on atsqa4-128.qa.lab:31010] 
(state=,code=0)
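
The error message itself points at the buffer bookkeeping: "readerIndex(0) + 
length(4) exceeds writerIndex(0)" means the buffer holds the bytes but its 
writerIndex was never advanced, so consumers see zero readable bytes. A 
minimal illustration of that mechanic on a Netty-style buffer (an 
illustration only, not the Hive function bridge code):

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class WriterIndexDemo {
  public static void main(String[] args) {
    ByteBuf buf = Unpooled.buffer(1024);
    byte[] payload = {0x0a, 0x5f, 0x71, 0x0b};  // unhex('0a5f710b')
    buf.setBytes(0, payload);              // raw write: does NOT move writerIndex
    System.out.println(buf.writerIndex()); // prints 0 -> no readable bytes
    buf.writerIndex(payload.length);       // what the Hive bridge must also do
    System.out.println(buf.readInt());     // prints 174027019 == 0x0a5f710b (big-endian)
  }
}
{code}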



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4862) use of convert_from(binary_string(key),'UTF8') in filter produces wrong results

2016-10-06 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4862:
--

Assignee: Chunhui Shi

> use of convert_from(binary_string(key),'UTF8') in filter produces wrong 
> results
> -
>
> Key: DRILL-4862
> URL: https://issues.apache.org/jira/browse/DRILL-4862
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>
> These results do not look right, i.e. when the predicate has 
> convert_from(binary_string(key),'UTF8')
> Apache drill 1.8.0-SNAPSHOT git commit ID: 57dc9f43
> {noformat}
> [root@centos-0x drill4478]# cat f1.json
> {"key":"\\x30\\x31\\x32\\x33"}
> {"key":"\\x34\\x35\\x36\\x37"}
> {"key":"\\x38\\x39\\x30\\x31"}
> {"key":"\\x30\\x30\\x30\\x30"}
> {"key":"\\x31\\x31\\x31\\x31"}
> {"key":"\\x35\\x35\\x35\\x35"}
> {"key":"\\x38\\x38\\x38\\x38"}
> {"key":"\\x39\\x39\\x39\\x39"}
> {"key":"\\x41\\x42\\x43\\x44"}
> {"key":"\\x45\\x46\\x47\\x48"}
> {"key":"\\x49\\x41\\x44\\x46"}
> {"key":"\\x4a\\x4b\\x4c\\x4d"}
> {"key":"\\x57\\x58\\x59\\x5a"}
> {"key":"\\x4e\\x4f\\x50\\x51"}
> {"key":"\\x46\\x46\\x46\\x46"}
> {noformat}
> results without the predicate - these are correct results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json`;
> +-+
> | EXPR$0  |
> +-+
> | 0123|
> | 4567|
> | 8901|
> | |
> | |
> | |
> | |
> | |
> | ABCD|
> | EFGH|
> | IADF|
> | JKLM|
> | WXYZ|
> | NOPQ|
> | |
> +-+
> 15 rows selected (0.256 seconds)
> {noformat}
> results with a predicate - these results don't look correct
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select convert_from(binary_string(key),'UTF8') 
> from `f1.json` where convert_from(binary_string(key),'UTF8') is not null;
> +--+
> |  EXPR$0  |
> +--+
> | 0123123  |
> | 4567567  |
> | 8901901  |
> | 000  |
> | 111  |
> | 555  |
> | 888  |
> | 999  |
> | ABCDBCD  |
> | EFGHFGH  |
> | IADFADF  |
> | JKLMKLM  |
> | WXYZXYZ  |
> | NOPQOPQ  |
> | FFF  |
> +--+
> 15 rows selected (0.279 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4982) Hive Queries degrade when queries switch between different formats

2016-10-30 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4982:
--

 Summary: Hive Queries degrade when queries switch between 
different formats
 Key: DRILL-4982
 URL: https://issues.apache.org/jira/browse/DRILL-4982
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi
Priority: Critical


We have seen degraded performance by doing these steps:
1) Generate the repro data with the Python script repro.py below:

import string
import random

for i in range(3000):
    x1 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    x2 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    x3 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    x4 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    x5 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    x6 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(random.randrange(19, 27)))
    print "{0}".format(x1), "{0}".format(x2), "{0}".format(x3), "{0}".format(x4), "{0}".format(x5), "{0}".format(x6)


python repro.py > repro.csv

2) Put the generated file in a dfs directory, e.g. '/tmp/hiveworkspace/plain'. 
At the Hive prompt, use the following SQL command to create an external table:
CREATE EXTERNAL TABLE `hiveworkspace`.`plain` (`id1` string, `id2` string, 
`id3` string, `id4` string, `id5` string, `id6` string) ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS TEXTFILE LOCATION 
'/tmp/hiveworkspace/plain'

3) create Hive's table of ORC|PARQUET format:
CREATE TABLE `hiveworkspace`.`plainorc` STORED AS ORC AS SELECT 
id1,id2,id3,id4,id5,id6 from `hiveworkspace`.`plain`;
CREATE TABLE `hiveworkspace`.`plainparquet` STORED AS PARQUET AS SELECT 
id1,id2,id3,id4,id5,id6 from `hiveworkspace`.`plain`;

4) Switch queries between these two tables; the query time on the same table 
then lengthens significantly. On my setup, for ORC, it went from 15 sec to 26 
sec. Queries on tables of the other formats, after injecting a query against 
another format, all slow down significantly.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4951) Running single HBase Unit Test results in error: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V

2016-10-18 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4951:
--

 Summary: Running single HBase Unit Test results in error: 
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V
 Key: DRILL-4951
 URL: https://issues.apache.org/jira/browse/DRILL-4951
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


Under contrib/storage-hbase, running this command:
mvn test -Dtest=org.apache.drill.hbase.TestHBaseQueries#testWithEmptyTable

Got an error complaining that Stopwatch does not have the expected constructor.
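
The usual cause of this error (an assumption here, not verified against this
build) is two Guava versions on the classpath: HBase 1.1.3 bytecode still
invokes the Stopwatch constructor, which newer Guava makes package-private. A
minimal sketch of the conflict, with the failing call shown only in a comment
so the snippet compiles against Guava 15+:

{code}
import com.google.common.base.Stopwatch;

public class StopwatchConflictSketch {
  public static void main(String[] args) {
    // HBase 1.1.3 bytecode contains the equivalent of:
    //   Stopwatch sw = new Stopwatch();   // invokespecial Stopwatch.<init>()V
    // That constructor was public in older Guava but is package-private in
    // Guava 17+, so running against newer Guava raises IllegalAccessError.
    // Newer Guava expects the factory methods instead:
    Stopwatch sw = Stopwatch.createStarted();
    System.out.println("elapsed: " + sw);
  }
}
{code}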

Running org.apache.drill.hbase.TestHBaseQueries
10:13:58.402 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
10:14:01.458 [main] WARN  o.a.h.metrics2.impl.MetricsConfig - Cannot locate 
configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
10:14:02.020 [main] WARN  o.a.hadoop.hbase.http.HttpRequestLog - Jetty request 
log can only be enabled using Log4j
10:14:02.584 [localhost:37323.activeMasterManager] WARN  
org.apache.hadoop.hbase.ZNodeClearer - Environment variable HBASE_ZNODE_FILE 
not set; znodes will not be cleared on crash by start scripts (Longer MTTR!)
10:14:03.130 [JvmPauseMonitor] ERROR o.a.z.server.NIOServerCnxnFactory - Thread 
Thread[JvmPauseMonitor,5,main] died
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor
at 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor.run(JvmPauseMonitor.java:154)
 ~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_101]
10:14:03.157 [JvmPauseMonitor] ERROR o.a.z.server.NIOServerCnxnFactory - Thread 
Thread[JvmPauseMonitor,5,main] died
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor
at 
org.apache.hadoop.hbase.util.JvmPauseMonitor$Monitor.run(JvmPauseMonitor.java:154)
 ~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_101]
10:14:03.670 [localhost:37323.activeMasterManager] WARN  
o.a.h.h.p.s.wal.WALProcedureStore - Log directory not found: File 
file:/home/shi/dev/chunhui-shi/drill/contrib/storage-hbase/target/test-data/cea28708-595f-4585-ba37-9ba2a85ff0b1/MasterProcWALs
 does not exist
10:14:03.907 [RS:0;localhost:43220] WARN  o.a.h.h.regionserver.HRegionServer - 
reportForDuty failed; sleeping and then retrying.
10:14:04.931 [RS:0;localhost:43220] WARN  org.apache.hadoop.hbase.ZNodeClearer 
- Environment variable HBASE_ZNODE_FILE not set; znodes will not be cleared on 
crash by start scripts (Longer MTTR!)
10:14:04.981 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Failed to become active master
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.waitMetaRegionLocation(MetaTableLocator.java:217)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaServerConnection(MetaTableLocator.java:363)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.verifyMetaRegionLocation(MetaTableLocator.java:283)
 ~[hbase-client-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:907) 
~[hbase-server-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:743)
 ~[hbase-server-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:183) 
~[hbase-server-1.1.3.jar:1.1.3]
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1652) 
~[hbase-server-1.1.3.jar:1.1.3]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
10:14:04.982 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Master server abort: loaded coprocessors 
are: []
10:14:04.985 [localhost:37323.activeMasterManager] ERROR 
o.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.lang.IllegalAccessError: tried to access method 
com.google.common.base.Stopwatch.<init>()V from class 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
 ~[hbase-client-1.1.3.jar:1.1.3]
at 
org.apache.hadoop.hbase.zookeeper.MetaTableLocator.waitMetaRegionLocation(MetaTableLocator.java:217)
 

[jira] [Commented] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2016-11-22 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687464#comment-15687464
 ] 

Chunhui Shi commented on DRILL-5032:


+1. Could you please attach the serialized physical plan before and after the 
fix to this JIRA for reference?

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Chunhui Shi
>
> Following query on hive parquet table failed with OOM Java heap space:
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
> vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 1 ms
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 3 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:136) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:166) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) 
> ~[protobuf-java-2.5.0.jar:na]
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) 
> 

[jira] [Commented] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2016-11-18 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678313#comment-15678313
 ] 

Chunhui Shi commented on DRILL-5032:


Have you tried to reproduce the problem and verified that the OOM is resolved 
after your fix? If so, could you provide the repro steps as well? I believe 
this fix improves the efficiency of memory usage, but we need proof that it 
actually solves this JIRA. 

Another question: if some partitions have different column sets, what would 
happen with your fix? 

Meanwhile, I would like to know your estimate of the heap memory usage on a 
~750-partition table if we did not have this fix. Thanks.

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Chunhui Shi
>
> Following query on hive parquet table failed with OOM Java heap space:
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
> vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 1 ms
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 3 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:136) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:166) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> 

[jira] [Commented] (DRILL-4842) SELECT * on JSON data results in NumberFormatException

2016-11-17 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674364#comment-15674364
 ] 

Chunhui Shi commented on DRILL-4842:


I don't think this fix is aimed at improving performance. The performance 
numbers could differ between LinkedHashSet and HashSet, but we know for sure 
that adding/getting an item costs more with a LinkedHashSet, so unless it is 
really required, we should not add the extra cost. 

By the way, have you tested this bug with fewer nulls, e.g. 1000? The problem 
seems to be that the schema could not be built from the first batch seen; 
could this issue also be seen with other formats/data sources that have null 
values? 
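
For reference, a small sketch of the cost difference mentioned above (JDK 
classes only, nothing Drill-specific):

{code}
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetCostSketch {
  public static void main(String[] args) {
    // LinkedHashSet threads a doubly-linked list through its entries to
    // preserve insertion order, so every add() also does pointer maintenance
    // that a plain HashSet never pays for. Both are O(1) per operation, but
    // the constant factor differs.
    Set<String> plain = new HashSet<>();
    Set<String> linked = new LinkedHashSet<>();
    for (int i = 0; i < 4096; i++) {
      plain.add("col" + i);
      linked.add("col" + i);   // same result set, extra linking work per add
    }
    System.out.println(plain.size() + " " + linked.size());
  }
}
{code}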


> SELECT * on JSON data results in NumberFormatException
> --
>
> Key: DRILL-4842
> URL: https://issues.apache.org/jira/browse/DRILL-4842
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
> Attachments: tooManyNulls.json
>
>
> Note that doing SELECT c1 returns correct results; the failure is seen when 
> we do SELECT star. json.all_text_mode was set to true.
> JSON file tooManyNulls.json has one key c1 with 4096 nulls as its value and 
> the 4097th key c1 has the value "Hello World"
> git commit ID : aaf220ff
> MapR Drill 1.8.0 RPM
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter session set 
> `store.json.all_text_mode`=true;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | store.json.all_text_mode updated.  |
> +---++
> 1 row selected (0.27 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT c1 FROM `tooManyNulls.json` WHERE c1 IN 
> ('Hello World');
> +--+
> |  c1  |
> +--+
> | Hello World  |
> +--+
> 1 row selected (0.243 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select * FROM `tooManyNulls.json` WHERE c1 IN 
> ('Hello World');
> Error: SYSTEM ERROR: NumberFormatException: Hello World
> Fragment 0:0
> [Error Id: 9cafb3f9-3d5c-478a-b55c-900602b8765e on centos-01.qa.lab:31010]
>  (java.lang.NumberFormatException) Hello World
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI():95
> 
> org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.varTypesToInt():120
> org.apache.drill.exec.test.generated.FiltererGen1169.doSetup():45
> org.apache.drill.exec.test.generated.FiltererGen1169.setup():54
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():195
> 
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():107
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 

[jira] [Resolved] (DRILL-4695) Startup failure should be logged in log file.

2016-10-13 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi resolved DRILL-4695.

Resolution: Fixed

Fixed in the pull request shown above.

> Startup failure should be logged in log file.
> -
>
> Key: DRILL-4695
> URL: https://issues.apache.org/jira/browse/DRILL-4695
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> When drillbit failed to start, the thrown exception did not get logged in 
> drillbit.log. In the log we can only see "Shutdown begun" as shown below.
> 2016-05-25 13:58:26,132 [main] DEBUG o.apache.drill.exec.server.Drillbit - 
> Shutdown begun.
> 2016-05-25 13:58:28,150 [pool-5-thread-2] INFO  
> o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
> io.netty.channel.epoll.EpollEventLoopGroup@2164289f in 1014 ms
> 2016-05-25 13:58:28,150 [pool-5-thread-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
> io.netty.channel.epoll.EpollEventLoopGroup@2164289f in 1014 ms
> 2016-05-25 13:58:28,150 [pool-5-thread-2] INFO  
> o.a.drill.exec.service.ServiceEngine - closed dataPool in 1015 ms
> 2016-05-25 13:58:28,150 [pool-5-thread-1] INFO  
> o.a.drill.exec.service.ServiceEngine - closed userServer in 1015 ms
> 2016-05-25 13:58:28,177 [main] WARN  o.apache.drill.exec.server.Drillbit - 
> Failure on close()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2016-10-13 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1557#comment-1557
 ] 

Chunhui Shi commented on DRILL-4946:


[~jacq...@dremio.com],  [~cwestin] 

> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Querying a JSON file during testing got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484)
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>   at 
> 

[jira] [Created] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2016-10-13 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-4946:
--

 Summary: org.objectweb.asm.tree.analysis.AnalyzerException printed 
to console in embedded mode
 Key: DRILL-4946
 URL: https://issues.apache.org/jira/browse/DRILL-4946
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi
Priority: Critical


Querying a JSON file during testing got an AnalyzerException printed. 
The problem is that the scalar_replacement mode defaults to 'try', and 
org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 

[shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
{"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": "4a31", 
"kp2": "38"}
{"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
{"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": "4e4f", 
"kp2": "51"}
{"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": "6e6f", 
"kp2": "31"}


0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
intkey from dfs.`/tmp/conv.json`;
org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
Expected an object reference, but found .
at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
at 
org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at 
org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at 
org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at 
org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at 
org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
at 
org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
at 
org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
at 
org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
at 
org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
at 
org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
at 
org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at 
org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
at 
org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
at 
org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484)
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
at 

[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2016-10-13 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573253#comment-15573253
 ] 

Chunhui Shi commented on DRILL-4946:


We can redirect System.err, or we can set the default value of 
scalar_replacement to 'off'. TPCH results did not show much difference between 
'try' and 'off'. What kind of query could gain observable performance from 
scalar_replacement?
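
A minimal sketch of the first option (the wrapper name is hypothetical; only 
the System.setErr mechanism is the point):

{code}
import java.io.OutputStream;
import java.io.PrintStream;

public class QuietStderrSketch {
  // Runs a task with System.err muted, then restores it unconditionally.
  static void runQuietly(Runnable task) {
    PrintStream saved = System.err;
    System.setErr(new PrintStream(new OutputStream() {
      @Override public void write(int b) { /* discard */ }
    }));
    try {
      task.run();   // e.g. the CheckMethodAdapter-based verification
    } finally {
      System.setErr(saved);   // always restore the real stderr
    }
  }

  public static void main(String[] args) {
    runQuietly(new Runnable() {
      public void run() { System.err.println("swallowed"); }
    });
    System.err.println("visible again");
  }
}
{code}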

> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Querying a JSON file during testing got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484)
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>   at 
> 

[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2016-10-16 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1558#comment-1558
 ] 

Chunhui Shi commented on DRILL-4946:


[~jacq...@dremio.com]  Thanks. We may want to investigate this scalar 
replacement more later. What worries me is that if in some cases it generates 
an 'invalid' bytecode sequence that we can catch with an exception, in other 
cases it may generate 'incorrect' bytecode without any exception being raised.


> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Querying a JSON file during testing got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484)
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>   at 
> 

[jira] [Assigned] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2016-12-14 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5089:
--

Assignee: Chunhui Shi

> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of the actual 
> plugins involved in a query. 
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> down - the overall query time increases drastically, most likely due to the 
> attempt to register schemas from a faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at some 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond slows down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 
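
A sketch of the lazy-registration idea (the names are hypothetical, not 
Drill's actual classes): keep a plugin lookup keyed by name and only touch the 
plugins a query actually references.

{code}
import java.util.HashMap;
import java.util.Map;

public class LazySchemaSketch {
  interface SchemaFactory { void registerSchemas(); }

  private final Map<String, SchemaFactory> enabledPlugins = new HashMap<>();

  // Instead of iterating over every enabled plugin while building the schema
  // tree, register only the schemas named in the query.
  void registerForQuery(Iterable<String> pluginsUsedInQuery) {
    for (String name : pluginsUsedInQuery) {
      SchemaFactory factory = enabledPlugins.get(name);
      if (factory != null) {
        factory.registerSchemas();  // a faulty, unrelated plugin is never touched
      }
    }
  }
}
{code}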



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4558) When a query returns diacritics in a string, the string is cut

2016-12-13 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4558:
--

Assignee: Chunhui Shi

> When a query returns diacritics in a string, the string is cut
> --
>
> Key: DRILL-4558
> URL: https://issues.apache.org/jira/browse/DRILL-4558
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
> Environment: Apache Drill 1.6
> MongoDB 3.2.1
>Reporter: Vincent Uribe
>Assignee: Chunhui Shi
>
> With the given document in a collection "Test" from a database testDb :
> {
> "_id" : ObjectId("56e7f1bd0944228aab06d0e2"),
> "ID_ATTRIBUT" : "3",
> "VAL_ATTRIBUT" : "Végétaux",
> "UPDATED" : ISODate("2016-01-09T23:00:00.000Z")
> }
> When querying select * from mongoStorage.testDb.Test I get 
> _id: [B@affb65
> ID_ATTRIBUT: 3
> VAL_ATTRIBUT: *Végéta*
> UPDATED: 2016-01-09T23:00:00.000Z
> As you can see, the two 'é' cut the string "végétaux" by 2 characters, giving 
> végéta.
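
The symptom is consistent with a byte-length versus character-length mix-up
when decoding UTF-8 (an assumption about the cause, not a confirmed
diagnosis): each 'é' takes two bytes, so an 8-character string spans 10 bytes,
and using one count where the other is expected drops exactly two characters.
A small sketch:

{code}
import java.nio.charset.StandardCharsets;

public class DiacriticsLengthSketch {
  public static void main(String[] args) {
    String s = "Végétaux";
    System.out.println(s.length());                                 // 8 chars
    System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 10 bytes
    // Reading two characters short of the full string yields the bug's output:
    System.out.println(s.substring(0, s.length() - 2));             // Végéta
  }
}
{code}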



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1928) Refactor filter pushdown code

2016-12-13 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746991#comment-15746991
 ] 

Chunhui Shi commented on DRILL-1928:


[~vkorukanti] was assigned this JIRA on 1/10/2015, but the assignee field is 
now empty. Are you still working on a fix for this one? 
[~jacq...@dremio.com] 

> Refactor filter pushdown code
> -
>
> Key: DRILL-1928
> URL: https://issues.apache.org/jira/browse/DRILL-1928
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Venki Korukanti
> Fix For: Future
>
>
> Currently in many places (InfoSchema, HBase, MongoDB and FS/Hive partition 
> pruning) we have logic to push the filter into scan.
> 1. We can have common code for visiting the expression tree and determining 
> which part of the tree can be pushed to scan and which part can't be pushed 
> to scan. 
> 2. In all places, if we can't convert the complete filter, the partially 
> converted filter is pushed into Scan but the complete filter is copied and 
> evaluated in the Filter operator. This causes the partially pushed filter to 
> be evaluated in two places (Scan and Filter). 
> This JIRA proposes the following API:
> {code}
> /**
>  * @param filter Filter expression tree
>  * @param fields List of column names to consider when extracting the 
> filter. For example, partition pruning is interested only in expressions 
> that involve partition columns. In that case partition pruning can pass its 
> list of partition columns as the supported column list.
>  * @param functions List of supported functions to consider when extracting 
> the filter. For example, partition pruning is interested only in "=", ">" 
> etc. It can pass these functions as the supported function list.
>  *
>  * @return Result contains two trees. One tree that can be pushed to Scan and 
> other tree that can't be pushed into Scan. Either of them can be null.
>  */
> FilterPruningResult extract(FilterExpression, FieldList, FunctionList)
> {code}
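
As a usage sketch of the proposed API (all types here are hypothetical 
stand-ins so the call shape compiles; the real FilterExpression and 
FilterPruningResult would live in the refactored module):

{code}
import java.util.Arrays;
import java.util.List;

public class FilterExtractSketch {
  interface FilterExpression {}

  static class FilterPruningResult {
    final FilterExpression pushable;  // evaluated in Scan, may be null
    final FilterExpression residual;  // evaluated in Filter, may be null
    FilterPruningResult(FilterExpression pushable, FilterExpression residual) {
      this.pushable = pushable;
      this.residual = residual;
    }
  }

  // The proposed entry point, restated with concrete parameter types.
  static FilterPruningResult extract(FilterExpression filter,
                                     List<String> fields,
                                     List<String> functions) {
    // A real implementation would walk the expression tree and split it here;
    // this stub pushes nothing and keeps the whole filter as residual.
    return new FilterPruningResult(null, filter);
  }

  public static void main(String[] args) {
    FilterExpression filter = new FilterExpression() {};
    // Partition pruning would pass its partition columns and comparison ops:
    FilterPruningResult r = extract(filter,
        Arrays.asList("dir0", "dir1"),
        Arrays.asList("=", ">"));
    System.out.println("pushable=" + r.pushable + ", residual=" + r.residual);
  }
}
{code}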



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-1928) Refactor filter pushdown code

2016-12-14 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-1928:
--

Assignee: Chunhui Shi

> Refactor filter pushdown code
> -
>
> Key: DRILL-1928
> URL: https://issues.apache.org/jira/browse/DRILL-1928
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Venki Korukanti
>Assignee: Chunhui Shi
> Fix For: Future
>
>
> Currently in many places (InfoSchema, HBase, MongoDB and FS/Hive partition 
> pruning) we have logic to push the filter into scan.
> 1. We can have common code for visiting the expression tree and determining 
> which part of the tree can be pushed to scan and which part can't be pushed 
> to scan. 
> 2. In all places, if we can't convert the complete filter, partially 
> converted filter is pushed into Scan but the complete filter is copied and 
> evaluated in Filter operator. This causes the partially pushed filter to be 
> evaluated in two places (Scan and Filter). 
> This JIRA proposes following API:
> {code}
> /**
>  * @param filter Filter expression tree
>  * @param fields List of columns names to consider when extracting the 
> filter. For example partition pruning is interested in only on expressions 
> that involve partition columns. In that case partition pruning can pass list 
> of partition columns as supported column list.
>  * @param functions List of supported functions to consider in extracting the 
> filter. For example partition pruning is interested in only "=", ">" etc. It 
> can pass these functions as the supported function list.
>  *
>  * @return Result contains two trees. One tree that can be pushed to Scan and 
> other tree that can't be pushed into Scan. Either of them can be null.
>  */
> FilterPruningResult extract(FilterExpression, FieldList, FunctionList)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE

2017-01-13 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5196:
--

 Summary: Could not run a single MongoDB unit test case through 
command line or IDE
 Key: DRILL-5196
 URL: https://issues.apache.org/jira/browse/DRILL-5196
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


Could not run a single MongoDB unit test through the IDE or the command line. 
The reason is that when running a single test case, the MongoDB instance does 
not get started, so a 'table not found' error for 'mongo.employee.empinfo' is 
raised.
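
A sketch of one way to make single-test runs self-sufficient (the class and 
method names are hypothetical, not the actual fix): have the test base class 
start the embedded MongoDB itself when no suite runner has already done so.

{code}
import org.junit.BeforeClass;

public class MongoTestBaseSketch {
  private static boolean mongoStarted = false;

  @BeforeClass
  public static void ensureMongoRunning() throws Exception {
    // When the whole suite runs, the suite's setup starts MongoDB once; when a
    // single test class runs directly, nothing has started it yet, so do it
    // here. startEmbeddedMongo() is a placeholder for the real bootstrap.
    if (!mongoStarted) {
      startEmbeddedMongo();
      mongoStarted = true;
    }
  }

  private static void startEmbeddedMongo() throws Exception {
    // placeholder: launch embedded MongoDB and load the test datasets
  }
}
{code}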




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2016-12-01 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5032:
---
Labels:   (was: ready-to-commit)

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Attachments: plan, plan with fix
>
>
> Following query on hive parquet table failed with OOM Java heap space:
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
> vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 1 ms
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 3 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:136) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:166) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) 
> ~[protobuf-java-2.5.0.jar:na]
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> 

[jira] [Created] (DRILL-5094) Assure Comparator to be transitive

2016-12-01 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5094:
--

 Summary: Assure Comparator to be transitive
 Key: DRILL-5094
 URL: https://issues.apache.org/jira/browse/DRILL-5094
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi
Priority: Critical


In AssignmentCreator.java, one Comparator could break the transitive property 
required of a Comparator implementation, so the result is not correct.

E.g. for:
long IntPlusOne = 0x80000000L;
[0] = 2 * IntPlusOne + 5, [1] = 2 * IntPlusOne + 8, [2] = 4 * IntPlusOne + 4,

the compare results will be:
compare([0],[1]) = -3,
compare([1],[2]) = 4,
compare([0],[2]) = 1 
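
A minimal sketch of the failure (assuming the comparator truncated a long 
subtraction to int, which reproduces the numbers above exactly) next to a 
transitive fix:

{code}
import java.util.Comparator;

public class TransitiveCompareSketch {
  // Broken pattern (assumed): casting a long difference to int can flip signs.
  static final Comparator<Long> BROKEN = new Comparator<Long>() {
    public int compare(Long a, Long b) { return (int) (a - b); }
  };

  // Transitive fix: explicit comparisons, no subtraction, no truncation.
  static final Comparator<Long> FIXED = new Comparator<Long>() {
    public int compare(Long a, Long b) { return a > b ? 1 : (a < b ? -1 : 0); }
  };

  public static void main(String[] args) {
    long intPlusOne = 0x80000000L;   // Integer.MAX_VALUE + 1
    long v0 = 2 * intPlusOne + 5;
    long v1 = 2 * intPlusOne + 8;
    long v2 = 4 * intPlusOne + 4;
    System.out.println(BROKEN.compare(v0, v1));  // -3
    System.out.println(BROKEN.compare(v1, v2));  //  4 (sign flipped by the cast)
    System.out.println(BROKEN.compare(v0, v2));  //  1 (sign flipped by the cast)
    // FIXED yields -1, -1, -1 for the same pairs, satisfying transitivity.
  }
}
{code}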





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5094) Assure Comparator to be transitive

2016-12-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15716927#comment-15716927
 ] 

Chunhui Shi commented on DRILL-5094:


Yes. We don't need to optimize so hard here. The reason it is this way is that 
I actually started by reading Long.compare and related code (Comparator, 
TimSort) in the JDK. Then I considered that directly implementing the compare 
method here would save one function call, and I could do the comparison and 
evaluation in the order I prefer (>, <, ==)... so here is the fix.

I agree that, considering the complete picture, the gain is not much in 
percentage terms. But if we keep the fix this way, we also lose nothing. Do 
you agree? :-) Thanks for the review - it gave me the chance to write down the 
thoughts that swiftly came to mind when I coded this fix!





> Assure Comparator to be transitive
> --
>
> Key: DRILL-5094
> URL: https://issues.apache.org/jira/browse/DRILL-5094
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
>
> In AssignmentCreator.java, one Comparator could break the transitive property 
> required of a Comparator implementation, so the result is not correct.
> E.g. for:
> long IntPlusOne = 0x80000000L;
> [0] = 2 * IntPlusOne + 5, [1] = 2 * IntPlusOne + 8, [2] = 4 * IntPlusOne + 4,
> the compare results will be like:
> compare([0],[1]) = -3,
> compare([1],[2]) = 4,
> compare([0],[2]) = 1 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5106) Refactor SkipRecordsInspector to exclude check for predefined file formats

2016-12-05 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15722892#comment-15722892
 ] 

Chunhui Shi commented on DRILL-5106:


One more improvement can be made: bufAdd keeps pushing and popping even when 
footerCount is 0, which is unnecessary cost. 

The same applies when headerCount is zero: we don't have to call doSkipHeader 
every time. Even when headerCount is not zero, ideally we should not need to 
call doSkipHeader for every record. If we could skip enough records up front 
and then move to another stage/function, we would not need to check whether to 
skip again and again.
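
A sketch of the shape suggested above (a hypothetical reader, not Drill's 
SkipRecordsInspector): pay the header skip once up front, then run a loop that 
never re-checks.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SkipOnceSketch {
  // Skip headerCount lines exactly once, then read records with no per-record
  // "should I skip?" branch.
  static int countRecords(BufferedReader reader, int headerCount) throws IOException {
    for (int i = 0; i < headerCount; i++) {
      reader.readLine();                 // header skipping happens only here
    }
    int records = 0;
    while (reader.readLine() != null) {  // steady-state loop: no skip checks
      records++;
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    BufferedReader r = new BufferedReader(new StringReader("h1\nh2\na\nb\nc\n"));
    System.out.println(countRecords(r, 2));   // 3
  }
}
{code}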


> Refactor SkipRecordsInspector to exclude check for predefined file formats
> --
>
> Key: DRILL-5106
> URL: https://issues.apache.org/jira/browse/DRILL-5106
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive
>Affects Versions: 1.9.0
>Reporter: Arina Ielchiieva
>Priority: Minor
>
> After the changes introduced in DRILL-4982, SkipRecordInspector is used only 
> for predefined formats (using hasHeaderFooter: false / true). But 
> SkipRecordInspector has its own check for the formats where the skip strategy 
> can be applied. The acceptable file formats are stored in a private final 
> Set<String> fileFormats initialized in the constructor, which currently 
> contains only one format - TextInputFormat. This check is now redundant and 
> may lead to ignoring hasHeaderFooter set to true for any format other than 
> Text.
> To do:
> 1. remove private final Set<String> fileFormats
> 2. remove the if block from SkipRecordsInspector.retrievePositiveIntProperty:
> {code}
>  if 
> (!fileFormats.contains(tableProperties.get(hive_metastoreConstants.FILE_INPUT_FORMAT)))
>  {
> return propertyIntValue;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5094) Assure Comparator to be transitive

2016-12-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715873#comment-15715873
 ] 

Chunhui Shi commented on DRILL-5094:


The long value is a number of bytes of data, so it cannot be less than 0. 
There is no overflow possibility.

> Assure Comparator to be transitive
> --
>
> Key: DRILL-5094
> URL: https://issues.apache.org/jira/browse/DRILL-5094
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
>
> In AssignmentCreator.java, one Comparator could break the transitive property 
> required of a Comparator implementation, so the result is not correct.
> E.g. for:
> long IntPlusOne = 0x80000000L;
> [0] = 2 * IntPlusOne + 5, [1] = 2 * IntPlusOne + 8, [2] = 4 * IntPlusOne + 4,
> the compare results will be like:
> compare([0],[1]) = -3,
> compare([1],[2]) = 4,
> compare([0],[2]) = 1 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5094) Assure Comparator to be transitive

2016-12-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15716103#comment-15716103
 ] 

Chunhui Shi commented on DRILL-5094:


Subtraction is 1 cycle; invokeStatic is 74 cycles or more (please refer to 
http://www.jopdesign.com/doc/timing.pdf). And JIT optimization happens only 
after a method has been called thousands of times to warm up, while this code 
runs in the planning stage, which means AssignmentCreator should be called 
only once per query. Considering also the gain of moving the '>' and '<' 
paths to the front, my rough idea was that this seemed better. What do you 
think? 



> Assure Comparator to be transitive
> --
>
> Key: DRILL-5094
> URL: https://issues.apache.org/jira/browse/DRILL-5094
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
>
> In AssignmentCreator.java, one Comparator implementation can break the 
> transitivity property required by the Comparator contract, so the result 
> is not correct.
> E.g. for:
> long IntPlusOne = 0x80000000L;
> [0] = 2 * IntPlusOne + 5, [1] = 2 * IntPlusOne + 8, [2] = 4 * IntPlusOne + 4,
> the compare results will be:
> compare([0],[1]) = -3,
> compare([1],[2]) = 4,
> compare([0],[2]) = 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5094) Assure Comparator to be transitive

2016-12-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15716029#comment-15716029
 ] 

Chunhui Shi commented on DRILL-5094:


My implementation was actually written by referring to Long.compareTo. The 
reason I preferred not to use compareTo() was that I wanted to avoid one 
function call and to move the more likely code paths ('>' and '<') ahead of 
'='. 

I think it is a reasonable assumption that we don't expect a dataset to be 
larger than half of Long.MAX_VALUE bytes (2 EB), so -1 will still be fine. :-)
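
A sketch of that shape (mirroring Long.compareTo inline, with the '>' and '<' 
paths checked first; illustrative, not the exact Drill code):

{code}
import java.util.Comparator;

class TransitiveComparatorSketch {
  // The sign depends only on the true ordering, never on the magnitude of a
  // difference, so compare(a,b), compare(b,c) and compare(a,c) can never
  // contradict each other.
  static final Comparator<Long> SAFE =
      (a, b) -> (a > b) ? 1 : ((a < b) ? -1 : 0);
}
{code}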

> Assure Comparator to be transitive
> --
>
> Key: DRILL-5094
> URL: https://issues.apache.org/jira/browse/DRILL-5094
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>  Labels: ready-to-commit
>
> In AssignmentCreator.java, one Comparator implementation can break the 
> transitivity property required by the Comparator contract, so the result 
> is not correct.
> E.g. for:
> long IntPlusOne = 0x80000000L;
> [0] = 2 * IntPlusOne + 5, [1] = 2 * IntPlusOne + 8, [2] = 4 * IntPlusOne + 4,
> the compare results will be:
> compare([0],[1]) = -3,
> compare([1],[2]) = 4,
> compare([0],[2]) = 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-5177) Query Planning takes infinite time in case drill is connected to Mongo Sharded environment

2017-01-05 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802274#comment-15802274
 ] 

Chunhui Shi edited comment on DRILL-5177 at 1/5/17 7:33 PM:


Not sure if this is related (https://issues.apache.org/jira/browse/DRILL-4882); 
you may want to give it a try. There is not much difference in the planning 
stage for MongoDB, except for the filter push-down optimization rule. So 
either Drill has difficulty getting schema information from MongoDB (due to 
DRILL-4882) or from other storage plugins (DRILL-5089), or the filter 
push-down itself is slow. It will be easier for us to look at this issue if 
we have more information, e.g. a debug-level drillbit.log, the profile of 
this query, or the repro steps, including configurations such as the mongoDB 
configuration and the mongo storage plugin configs in Drill.


was (Author: cshi):
Not sure if this is related (https://issues.apache.org/jira/browse/DRILL-4882); 
you may want to give it a try. There is not much difference in the planning 
stage for MongoDB, except for the filter push-down optimization rule. So 
either Drill has difficulty getting schema information from MongoDB (due to 
DRILL-4882) or from other storage plugins (DRILL-4882), or the filter 
push-down itself is slow. It will be easier for us to look at this issue if 
we have more information, e.g. a debug-level drillbit.log, the profile of 
this query, or the repro steps, including configurations such as the mongoDB 
configuration and the mongo storage plugin configs in Drill.

> Query Planning takes infinite time in case drill is connected to  Mongo 
> Sharded environment
> ---
>
> Key: DRILL-5177
> URL: https://issues.apache.org/jira/browse/DRILL-5177
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MongoDB
>Affects Versions: 1.8.0
> Environment: 1) There were 4 drillbits (with 3 zookeeper) and 4 
> mongod’s in the cluster. However the drillbits and mongod’s were not located 
> on the same physical server.
> 2) The shard key was evenly distributed among 4 shards (mongod)
>Reporter: Mridul Chopra
>
> When Drill is connected to a sharded Mongo environment (mongoS), the query 
> execution time is very high compared to the query execution time on mongod 
> (even though the volume of data on mongoS and mongoD is almost the same). 
> The root cause can be linked to the query planning time.
> On MongoS
> Collection Size : - 200 GB, Record Count  : 230,083,160
> A simple select query with a filter on indexed column was executed, but then 
> the query was under execution for more than 50 minutes. The query state was 
> "STARTING" until 40 minutes. Upon further analysis, it was revealed that 
> query planning took very long. 
> Below are the details where this issue was localised -
> Class Name : DefaultSqlHandler.java
> Method Name : protected RelNode transform(PlannerType plannerType, 
> PlannerPhase phase, RelNode input, RelTraitSet targetTraits,
>   boolean log) 
> Line No : 384 :output = program.run(planner, input, toTraits);
> The output from the above line is returned by VolcanoPlanner class 
> (package: org.apache.calcite.plan.volcano) which takes huge time for query 
> planning. This is only in case of MongoS environment.
> When the same select query was executed on MongoD environment
> CollectionSize: 306 GB  Record Count : 49,924,351
> Query execution was completed within 2 minutes and above line returned the 
> output within seconds.
> Even though the data volume was higher (300 GB) on mongoD than on MongoS 
> (200 GB), query planning was much faster on mongoD. There seems to be some 
> issue with query planning for the MongoS environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5177) Query Planning takes infinite time in case drill is connected to Mongo Sharded environment

2017-01-05 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802274#comment-15802274
 ] 

Chunhui Shi commented on DRILL-5177:


Not sure if this is related (https://issues.apache.org/jira/browse/DRILL-4882); 
you may want to give it a try. There is not much difference in the planning 
stage for MongoDB, except for the filter push-down optimization rule. So 
either Drill has difficulty getting schema information from MongoDB (due to 
DRILL-4882) or from other storage plugins (DRILL-4882), or the filter 
push-down itself is slow. It will be easier for us to look at this issue if 
we have more information, e.g. a debug-level drillbit.log, the profile of 
this query, or the repro steps, including configurations such as the mongoDB 
configuration and the mongo storage plugin configs in Drill.

> Query Planning takes infinite time in case drill is connected to  Mongo 
> Sharded environment
> ---
>
> Key: DRILL-5177
> URL: https://issues.apache.org/jira/browse/DRILL-5177
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MongoDB
>Affects Versions: 1.8.0
> Environment: 1) There were 4 drillbits (with 3 zookeeper) and 4 
> mongod’s in the cluster. However the drillbits and mongod’s were not located 
> on the same physical server.
> 2) The shard key was evenly distributed among 4 shards (mongod)
>Reporter: Mridul Chopra
>
> When Drill is connected to a sharded Mongo environment (mongoS), the query 
> execution time is very high compared to the query execution time on mongod 
> (even though the volume of data on mongoS and mongoD is almost the same). 
> The root cause can be linked to the query planning time.
> On MongoS
> Collection Size : - 200 GB, Record Count  : 230,083,160
> A simple select query with a filter on indexed column was executed, but then 
> the query was under execution for more than 50 minutes. The query state was 
> "STARTING" until 40 minutes. Upon further analysis, it was revealed that 
> query planning took very long. 
> Below are the details where this issue was localised -
> Class Name : DefaultSqlHandler.java
> Method Name : protected RelNode transform(PlannerType plannerType, 
> PlannerPhase phase, RelNode input, RelTraitSet targetTraits,
>   boolean log) 
> Line No : 384 :output = program.run(planner, input, toTraits);
> The output from the above line is returned by VolcanoPlanner class 
> (package: org.apache.calcite.plan.volcano) which takes huge time for query 
> planning. This is only in case of MongoS environment.
> When the same select query was executed on MongoD environment
> CollectionSize: 306 GB  Record Count : 49,924,351
> Query execution was completed within 2 minutes and above line returned the 
> output within seconds.
> Even though the data volume was higher (300 GB) on mongoD than on MongoS 
> (200 GB), query planning was much faster on mongoD. There seems to be some 
> issue with query planning for the MongoS environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5105) Query time increases exponentially with increasing nested levels

2017-01-05 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5105:
---
Labels: ready-to-commit  (was: )

> Query time increases exponentially with increasing nested levels
> 
>
> Key: DRILL-5105
> URL: https://issues.apache.org/jira/browse/DRILL-5105
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.9.0
> Environment: 3 Node Cluster with default memory and configurations. 
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>
> The time taken to query any JSON dataset depends on the number of nested 
> levels within the dataset. Also, increasing the complexity of the dataset 
> further impacts the execution time. 
> Tabulated below are cached query execution times for a simple select * query 
> over two simple forms of JSON datasets: 
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1  | 0.22    | 0.27    |
> | 2  | 0.23    | 0.25    |
> | 4  | 0.24    | 0.22    |
> | 8  | 0.22    | 0.23    |
> | 16 | 0.34    | 0.48    |
> | 24 | 25.76   | 72.51   |
> | 26 | 103.48  | 289.6   |
> | 28 | 336.12  | 1151.94 |
> | 30 | 1342.22 | 4586.79 |
> | 32 | 5360.2  | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging 
> to dataset 1 & 10 belonging to dataset 2. Each has 1 record, but the number 
> of nested levels within them varies as mentioned in the "No. Levels" column. 
> It appears that the query time almost doubles with the addition of a nested 
> level (note that in the table above, it translates to almost 4x across 
> levels starting at 24). 
> The below two are representative datasets, showcasing simple JSON 
> structures with nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "level2": {
>       "field1": "b",
>       ...
>     }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "field2": {
>       "nfield1": true,
>       "nfield2": 1.1
>     },
>     "level2": {
>       "field1": "b",
>       "field2": {
>         "nfield1": false,
>         "nfield2": 2.2
>       },
>       ...
>     }
>   }
> }
> {code}
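
To reproduce timings like these, here is a small hypothetical generator (not 
part of Drill; the output path is only an example) that writes a 
dataset-1-style file with a configurable nesting depth:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NestedJsonGen {
  public static void main(String[] args) throws IOException {
    int levels = 24;  // vary this to sweep the "No. Levels" column
    StringBuilder sb = new StringBuilder("{");
    for (int i = 1; i <= levels; i++) {
      // each level contributes one scalar field and one nested object
      sb.append("\"level").append(i).append("\": {\"field1\": \"a\"");
      if (i < levels) {
        sb.append(", ");
      }
    }
    for (int i = 0; i <= levels; i++) {
      sb.append("}");  // close every level plus the outermost object
    }
    Files.write(Paths.get("/tmp/nested_" + levels + ".json"),
        sb.toString().getBytes());
  }
}
{code}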



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4996) Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6

2017-01-04 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-4996:
---
Labels: ready-to-commit  (was: )

> Parquet Date auto-correction is not working in auto-partitioned parquet files 
> generated by drill-1.6
> 
>
> Key: DRILL-4996
> URL: https://issues.apache.org/jira/browse/DRILL-4996
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: ready-to-commit
> Attachments: item.tgz
>
>
> git.commit.id.abbrev=4ee1d4c
> Below are the steps I followed to generate the data :
> {code}
> 1. Generate a parquet file with date column using hive1.2
> 2. Use drill 1.6 to create auto-partitioned parquet files partitioned on the 
> date column
> {code}
> Now the below query returns wrong results :
> {code}
> select i_rec_start_date, i_size from 
> dfs.`/drill/testdata/parquet_date/auto_partition/item_multipart_autorefresh`  
> group by i_rec_start_date, i_size;
> +---+--+
> | i_rec_start_date  |i_size|
> +---+--+
> | null  | large|
> | 366-11-08| extra large  |
> | 366-11-08| medium   |
> | null  | medium   |
> | 366-11-08| petite   |
> | 364-11-07| medium   |
> | null  | petite   |
> | 365-11-07| medium   |
> | 368-11-07| economy  |
> | 365-11-07| large|
> | 365-11-07| small|
> | 366-11-08| small|
> | 365-11-07| extra large  |
> | 364-11-07| N/A  |
> | 366-11-08| economy  |
> | 366-11-08| large|
> | 364-11-07| small|
> | null  | small|
> | 364-11-07| large|
> | 364-11-07| extra large  |
> | 368-11-07| N/A  |
> | 368-11-07| extra large  |
> | 368-11-07| large|
> | 365-11-07| petite   |
> | null  | N/A  |
> | 365-11-07| economy  |
> | 364-11-07| economy  |
> | 364-11-07| petite   |
> | 365-11-07| N/A  |
> | 368-11-07| medium   |
> | null  | extra large  |
> | 368-11-07| small|
> | 368-11-07| petite   |
> | 366-11-08| N/A  |
> +---+--+
> 34 rows selected (0.691 seconds)
> {code}
> However, when I tried generating the auto-partitioned parquet files using 
> Drill 1.2, the above query returned the right results.
> I have attached the required data sets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-5088) Error when reading DBRef column

2016-12-21 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5088:
--

Assignee: Chunhui Shi

> Error when reading DBRef column
> ---
>
> Key: DRILL-5088
> URL: https://issues.apache.org/jira/browse/DRILL-5088
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
> Environment: drill 1.9.0
> mongo 3.2
>Reporter: Guillaume Champion
>Assignee: Chunhui Shi
>
> In a mongo database with DBRefs, when a DBRef is inserted in the first 
> record of a mongo collection, the drill query fails:
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> {code}
> Simple example to reproduce:
> In mongo instance
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> Error: SYSTEM ERROR: CodecConfigurationException: Can't find a codec for 
> class com.mongodb.DBRef.
> [Error Id: 2944d766-e483-4453-a706-3d481397b186 on Analytics-Biznet:31010] 
> (state=,code=0)
> {code}
> If the first record doesn't contain a DBRef, drill queries correctly:
> In a mongo instance :
> {code}
> db.contact2.drop();
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8939") });
> db.contact2.insert({ "_id" : ObjectId("582081d96b69060001fd8938"), "account" 
> : DBRef("contact", ObjectId("999cbf116b69060001fd8611")) });
> {code}
> In drill :
> {code}
> 0: jdbc:drill:zk=local> select * from mongo.mydb.contact2;
> +--------------------------------------+---------------------------------------------------------------+
> |                 _id                  |                            account                            |
> +--------------------------------------+---------------------------------------------------------------+
> | {"$oid":"582081d96b69060001fd8939"}  | {"$id":{}}                                                    |
> | {"$oid":"582081d96b69060001fd8938"}  | {"$ref":"contact","$id":{"$oid":"999cbf116b69060001fd8611"}}  |
> +--------------------------------------+---------------------------------------------------------------+
> 2 rows selected (0,563 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-5105) Query time increases exponentially with increasing nested levels

2016-12-24 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5105:
--

Assignee: Chunhui Shi

> Query time increases exponentially with increasing nested levels
> 
>
> Key: DRILL-5105
> URL: https://issues.apache.org/jira/browse/DRILL-5105
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.9.0
> Environment: 3 Node Cluster with default memory and configurations. 
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>
> The time taken to query any JSON dataset depends on the number of nested 
> levels within the dataset. Also, increasing the complexity of the dataset 
> further impacts the execution time. 
> Tabulated below are cached query execution times for a simple select * query 
> over two simple forms of JSON datasets: 
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1  | 0.22    | 0.27    |
> | 2  | 0.23    | 0.25    |
> | 4  | 0.24    | 0.22    |
> | 8  | 0.22    | 0.23    |
> | 16 | 0.34    | 0.48    |
> | 24 | 25.76   | 72.51   |
> | 26 | 103.48  | 289.6   |
> | 28 | 336.12  | 1151.94 |
> | 30 | 1342.22 | 4586.79 |
> | 32 | 5360.2  | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging 
> to dataset 1 & 10 belonging to dataset 2. Each has 1 record, but the number 
> of nested levels within them varies as mentioned in the "No. Levels" column. 
> It appears that the query time almost doubles with the addition of a nested 
> level (note that in the table above, it translates to almost 4x across 
> levels starting at 24). 
> The below two are representative datasets, showcasing simple JSON 
> structures with nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "level2": {
>       "field1": "b",
>       ...
>     }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "field2": {
>       "nfield1": true,
>       "nfield2": 1.1
>     },
>     "level2": {
>       "field1": "b",
>       "field2": {
>         "nfield1": false,
>         "nfield2": 2.2
>       },
>       ...
>     }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768644#comment-15768644
 ] 

Chunhui Shi commented on DRILL-5151:


The fix is on calcite side.

> ConventionTraitDef.plannerConversionMap is not thread safe
> --
>
> Key: DRILL-5151
> URL: https://issues.apache.org/jira/browse/DRILL-5151
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> We are using the static instance ConventionTraitDef.INSTANCE globally, and 
> plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class 
> is not thread-safe. The data in the map can become corrupted and cause an 
> infinite loop or other data errors.
>   
>   private final WeakHashMap
>   plannerConversionMap =
>   new WeakHashMap();
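
As a general illustration of the hazard and a minimal guard (the actual fix 
was made on the Calcite side; this is not the Calcite patch):

{code}
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// WeakHashMap is not thread-safe: concurrent writes can corrupt its internal
// entry links, and a reader traversing a corrupted bucket can loop forever.
// Serializing all access through a synchronized wrapper avoids that.
class PlannerConversionCache<K, V> {
  private final Map<K, V> map =
      Collections.synchronizedMap(new WeakHashMap<K, V>());

  V get(K planner) {
    return map.get(planner);
  }

  void put(K planner, V conversionData) {
    map.put(planner, conversionData);
  }
}
{code}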



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5151:
---
Priority: Major  (was: Critical)

> ConventionTraitDef.plannerConversionMap is not thread safe
> --
>
> Key: DRILL-5151
> URL: https://issues.apache.org/jira/browse/DRILL-5151
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> We are using the static instance ConventionTraitDef.INSTANCE globally, and 
> plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class 
> is not thread-safe. The data in the map can become corrupted and cause an 
> infinite loop or other data errors.
>   
>   private final WeakHashMap
>   plannerConversionMap =
>   new WeakHashMap();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2016-12-21 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5151:
--

 Summary: ConventionTraitDef.plannerConversionMap is not thread safe
 Key: DRILL-5151
 URL: https://issues.apache.org/jira/browse/DRILL-5151
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Chunhui Shi
Assignee: Chunhui Shi
Priority: Critical


We are using the static instance ConventionTraitDef.INSTANCE globally, and 
plannerConversionMap (a WeakHashMap) defined in the ConventionTraitDef class 
is not thread-safe. The data in the map can become corrupted and cause an 
infinite loop or other data errors.
  
  private final WeakHashMap
  plannerConversionMap =
  new WeakHashMap();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4868) Hive functions should update writerIndex accordingly when return binary type

2016-12-19 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762525#comment-15762525
 ] 

Chunhui Shi commented on DRILL-4868:


If the data is null, isSet is set to 0 to prevent an upper-layer caller (e.g. 
another function) from operating on null.

> Hive functions should update writerIndex accordingly when return binary type
> 
>
> Key: DRILL-4868
> URL: https://issues.apache.org/jira/browse/DRILL-4868
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> unhex is a Hive function. the returned binary buffer could not be consumed by 
> convert_from as shown below.
> 0: jdbc:drill:zk=10.10.88.128:5181> select 
> convert_from(unhex('0a5f710b'),'int_be') from (values(1));
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex(0) + length(4) 
> exceeds writerIndex(0): DrillBuf[31], udle: [25 0..1024]
> Fragment 0:0
> [Error Id: 5e72ce4a-6164-4260-8317-ca2bb6325013 on atsqa4-128.qa.lab:31010] 
> (state=,code=0)
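
A minimal sketch of the underlying buffer contract, using plain Netty ByteBuf 
semantics (DrillBuf is a ByteBuf; the byte values are only an example):

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class WriterIndexDemo {
  public static void main(String[] args) {
    byte[] result = {0x0a, 0x5f, 0x71, 0x0b};
    ByteBuf buf = Unpooled.buffer(result.length);

    // setBytes() copies data but does NOT advance writerIndex, so a consumer
    // checking readableBytes() sees an "empty" buffer and fails as above.
    buf.setBytes(0, result);
    System.out.println(buf.readableBytes());  // 0

    // The producing function must advance writerIndex after filling the data.
    buf.writerIndex(result.length);
    System.out.println(buf.readableBytes());  // 4, now consumable
    buf.release();
  }
}
{code}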



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5383) Several impersonation unit tests fail in unit test

2017-03-24 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5383:
--

 Summary: Several impersonation unit tests fail in unit test
 Key: DRILL-5383
 URL: https://issues.apache.org/jira/browse/DRILL-5383
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Sudheesh Katkam
Priority: Critical


Run several round unit tests and got these errors:
Failed tests: 
  TestInboundImpersonationPrivileges.twoTargetGroups:135->run:62 proxyName: 
user3_2 targetName: user4_2 expected: but was:
  TestInboundImpersonationPrivileges.oneTargetGroup:118->run:62 proxyName: 
user5_1 targetName: user4_2 expected: but was:
  TestInboundImpersonationPrivileges.twoTargetUsers:126->run:62 proxyName: 
user5_2 targetName: user0_2 expected: but was:

Tests in error: 
  
TestDrillbitResilience.memoryLeaksWhenCancelled:890->assertCancelledWithoutException:532
 » 
  TestInboundImpersonation.selectChainedView:136 »  
org.apache.drill.common.exce...
  
TestImpersonationQueries.org.apache.drill.exec.impersonation.TestImpersonationQueries
 » UserRemote

Notice that if I run the unit tests in my setup, which has a different 
settings.xml pointing maven to our internal repository, I often (maybe 1 out 
of 2 runs) get a different error at 
TestOptionsAuthEnabled#updateSysOptAsUserInAdminGroup.

Since the errors are quite consistent when the unit tests are built on 
different nodes, I guess that when we introduced more jars (kerby?) for unit 
tests we did not do enough exclusion, so the conflicts differ between builds.

We should be able to find out why these tests failed by remote debugging into 
them.

If we cannot address this issue in one or two days, this JIRA should be used 
to disable these tests for now. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5165) wrong results - LIMIT ALL and OFFSET clause in same query

2017-03-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902619#comment-15902619
 ] 

Chunhui Shi commented on DRILL-5165:


Jinfeng, I updated the unit test in the pull request with a query that can 
reproduce the issue. The previous query in the last commit was missing a key 
part: the query needs to project the key field of the parquet file.

> wrong results - LIMIT ALL and OFFSET clause in same query
> -
>
> Key: DRILL-5165
> URL: https://issues.apache.org/jira/browse/DRILL-5165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>Priority: Critical
>
> This issue was reported by a user on Drill's user list.
> Drill 1.10.0 commit ID : bbcf4b76
> I tried a similar query on apache Drill 1.10.0 and Drill returns wrong 
> results when compared to Postgres, for a query that uses LIMIT ALL and OFFSET 
> clause in the same query. We need to file a JIRA to track this issue.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1 limit 
> all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by col_int 
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows
> whereas for same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct expected result.
> Query plan for above query that returns wrong results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from typeall_l 
> limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/typeall_l]], selectionRoot=maprfs:/tmp/typeall_l, 
> numFiles=1, usedMetadataFile=false, columns=[`col_int`]]])
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5353) Merge "Project on Project" generated in physical plan stage

2017-03-13 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5353:
--

 Summary: Merge "Project on Project" generated in physical plan 
stage
 Key: DRILL-5353
 URL: https://issues.apache.org/jira/browse/DRILL-5353
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


There is a possibility that in the physical planning stage we will get a 
project-on-project plan, but the ProjectMergeRule (DrillMergeProjectRule) is 
applied only during logical planning. We need to apply the rule in the 
physical planning stage as well.

And even after the planning stage, the JoinPrelRenameVisitor could also 
inject an extra Project, which can be merged with the Project underneath it 
(if there is one).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5095) Selecting Join Columns returning null values in the second column

2017-03-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901955#comment-15901955
 ] 

Chunhui Shi commented on DRILL-5095:


The provided query is not valid; FULL JOIN works fine in my test. Closing 
this JIRA for now. If it needs to be reopened, please reopen with accurate 
repro steps.

> Selecting Join Columns returning null values in the second column
> -
>
> Key: DRILL-5095
> URL: https://issues.apache.org/jira/browse/DRILL-5095
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Ravan
>
> select a.empno,a.ename,a.deptno,b.deptno deptno deptno_1,b.loc from emp a 
> FULL JOIN dept b ON a.deptno = b.deptno
> Returning O/P:- 
> empno ename   deptno  deptno_1  loc
> 7369.0SMITH   20.0nullDALLAS
> 7499.0ALLEN   30.0nullCHICAGO
> Estimated O/P:-
> empno ename   deptno  deptno_1  loc
> 7369.0SMITH   20.020.0DALLAS
> 7499.0ALLEN   30.030.0CHICAGO



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5095) Selecting Join Columns returning null values in the second column

2017-03-08 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi closed DRILL-5095.
--
Resolution: Works for Me

> Selecting Join Columns returning null values in the second column
> -
>
> Key: DRILL-5095
> URL: https://issues.apache.org/jira/browse/DRILL-5095
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Ravan
>
> select a.empno,a.ename,a.deptno,b.deptno deptno deptno_1,b.loc from emp a 
> FULL JOIN dept b ON a.deptno = b.deptno
> Returning O/P:- 
> empno ename   deptno  deptno_1  loc
> 7369.0SMITH   20.0nullDALLAS
> 7499.0ALLEN   30.0nullCHICAGO
> Estimated O/P:-
> empno ename   deptno  deptno_1  loc
> 7369.0SMITH   20.020.0DALLAS
> 7499.0ALLEN   30.030.0CHICAGO



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5328) Trim down physical plan size - replace StoragePluginConfig with storage name

2017-03-07 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5328:
--

 Summary: Trim down physical plan size - replace 
StoragePluginConfig with storage name
 Key: DRILL-5328
 URL: https://issues.apache.org/jira/browse/DRILL-5328
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Chunhui Shi


For a physical plan, we currently pass the StoragePluginConfig as part of the 
plan, and the destination uses the config to fetch the storage plugin from 
the StoragePluginRegistry. However, we could instead fetch a storage plugin 
by its name, which is identical across all Drillbits. 

In the example of a simple 150-line physical plan shown below, the storage 
plugin config takes 60 lines. In a typical large system, a FileSystem 
StoragePluginConfig could be >500 lines, so this improvement should save the 
cost of passing a large physical plan among nodes.
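
A rough sketch of the idea (class and method names below are assumptions for 
illustration, not the exact Drill API):

{code}
import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// The plan carries only "storageName"; each Drillbit resolves that name in
// its own registry, which is identical cluster-wide, so the multi-hundred-
// line plugin config never has to travel with the plan.
public class NamedScanSketch {
  private final Object plugin;  // stand-in for a StoragePlugin

  @JsonCreator
  public NamedScanSketch(@JsonProperty("storageName") String storageName,
                         @JacksonInject PluginRegistrySketch registry) {
    this.plugin = registry.lookup(storageName);
  }

  /** Stand-in for the StoragePluginRegistry lookup. */
  public interface PluginRegistrySketch {
    Object lookup(String name);
  }
}
{code}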

0: jdbc:drill:zk=10.10.88.126:5181> explain plan for select * from 
dfs.tmp.employee1 where last_name='Blumberg';
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(*=[$0])
00-02Project(T1¦¦*=[$0])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($1, 'Blumberg')])
00-05  Project(T1¦¦*=[$0], last_name=[$1])
00-06Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=/tmp/employee1/0_0_0.parquet]], 
selectionRoot=/tmp/employee1, numFiles=1, usedMetadataFile=true, 
cacheFileRoot=/tmp/employee1, columns=[`*`]]])
 | {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "parquet-scan",
"@id" : 6,
"userName" : "root",
"entries" : [ {
  "path" : "/tmp/employee1/0_0_0.parquet"
} ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "maprfs:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : false,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : "/tmp",
  "writable" : true,
  "defaultInputFormat" : null
},
"shi" : {
  "location" : "/user/shi",
  "writable" : true,
  "defaultInputFormat" : null
},
"dir700" : {
  "location" : "/user/shi/dir700",
  "writable" : true,
  "defaultInputFormat" : null
},
"dir775" : {
  "location" : "/user/shi/dir775",
  "writable" : true,
  "defaultInputFormat" : null
},
"xyz" : {
  "location" : "/user/xyz",
  "writable" : true,
  "defaultInputFormat" : null
}
  },
  "formats" : {
"psv" : {
  "type" : "text",
  "extensions" : [ "tbl" ],
  "delimiter" : "|"
},
"csv" : {
  "type" : "text",
  "extensions" : [ "csv" ],
  "delimiter" : ","
},
"tsv" : {
  "type" : "text",
  "extensions" : [ "tsv" ],
  "delimiter" : "\t"
},
"parquet" : {
  "type" : "parquet"
},
"json" : {
  "type" : "json",
  "extensions" : [ "json" ]
},
"maprdb" : {
  "type" : "maprdb"
}
  }
},
"format" : {
  "type" : "parquet"
},
"columns" : [ "`*`" ],
"selectionRoot" : "/tmp/employee1",
"filter" : "true",
"fileSet" : [ "/tmp/employee1/0_0_0.parquet" ],
"files" : [ "/tmp/employee1/0_0_0.parquet" ],
"cost" : 1155.0
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T1¦¦*`",
  "expr" : "`*`"
}, {
  "ref" : "`last_name`",
  "expr" : "`last_name`"
} ],
"child" : 6,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 1155.0
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`last_name`, 'Blumberg') ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 173.25
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 173.25
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T1¦¦*`",
  "expr" : "`T1¦¦*`"
} ],
"child" : 3,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 173.25
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`*`",
  "expr" : "`T1¦¦*`"
} ],
"child" : 2,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 173.25
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,

[jira] [Assigned] (DRILL-5165) wrong results - LIMIT ALL and OFFSET clause in same query

2017-03-07 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5165:
--

Assignee: Chunhui Shi

> wrong results - LIMIT ALL and OFFSET clause in same query
> -
>
> Key: DRILL-5165
> URL: https://issues.apache.org/jira/browse/DRILL-5165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>Priority: Critical
>
> This issue was reported by a user on Drill's user list.
> Drill 1.10.0 commit ID : bbcf4b76
> I tried a similar query on apache Drill 1.10.0 and Drill returns wrong 
> results when compared to Postgres, for a query that uses LIMIT ALL and OFFSET 
> clause in the same query. We need to file a JIRA to track this issue.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1 limit 
> all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by col_int 
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows
> whereas for same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct expected result.
> Query plan for above query that returns wrong results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from typeall_l 
> limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/typeall_l]], selectionRoot=maprfs:/tmp/typeall_l, 
> numFiles=1, usedMetadataFile=false, columns=[`col_int`]]])
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5165) wrong results - LIMIT ALL and OFFSET clause in same query

2017-07-21 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096655#comment-16096655
 ] 

Chunhui Shi commented on DRILL-5165:


This unit test failure does not look relevant to this change. We have seen 
the unit test 'TestMergeJoinWithSchemaChanges' fail intermittently many 
times. Please refer to https://issues.apache.org/jira/browse/DRILL-5612.

> wrong results - LIMIT ALL and OFFSET clause in same query
> -
>
> Key: DRILL-5165
> URL: https://issues.apache.org/jira/browse/DRILL-5165
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Khurram Faraaz
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.12.0
>
>
> This issue was reported by a user on Drill's user list.
> Drill 1.10.0 commit ID : bbcf4b76
> I tried a similar query on apache Drill 1.10.0 and Drill returns wrong 
> results when compared to Postgres, for a query that uses LIMIT ALL and OFFSET 
> clause in the same query. We need to file a JIRA to track this issue.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by 1 limit 
> all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.211 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col_int from typeall_l order by col_int 
> limit all offset 10;
> +--+
> | col_int  |
> +--+
> +--+
> No rows selected (0.24 seconds)
> {noformat}
> Query => select col_int from typeall_l limit all offset 10;
> Drill 1.10.0 returns 85 rows
> whereas for same query,
> postgres=# select col_int from typeall_l limit all offset 10;
> Postgres 9.3 returns 95 rows, which is the correct expected result.
> Query plan for above query that returns wrong results
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select col_int from typeall_l 
> limit all offset 10;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col_int=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(offset=[10])
> 00-04Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=maprfs:///tmp/typeall_l]], selectionRoot=maprfs:/tmp/typeall_l, 
> numFiles=1, usedMetadataFile=false, columns=[`col_int`]]])
> {noformat} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4211:
--

Assignee: (was: Chunhui Shi)

> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> categorydescription0, VARCHAR(50) 

[jira] [Assigned] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables

2017-08-08 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-4211:
--

Assignee: Chunhui Shi

> Inconsistent results from a joined sql statement to postgres tables
> ---
>
> Key: DRILL-4211
> URL: https://issues.apache.org/jira/browse/DRILL-4211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
> Environment: Postgres db storage
>Reporter: Robert Hamilton-Smith
>Assignee: Chunhui Shi
>  Labels: newbie
>
> When making a SQL statement that incorporates a join to a table and then a 
> self-join to that table to get a parent value, Drill brings back 
> inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food|
> |id1|restaurants|food|
> |id2|Coffee Shops|food|
> |id2|Coffee Shops|food|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01  Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03  Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> categorydescription0, 

[jira] [Commented] (DRILL-5696) change default compiler strategy

2017-08-01 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109482#comment-16109482
 ] 

Chunhui Shi commented on DRILL-5696:


Kunal, can you help look at the compilation time of large queries with 
JDK 7/8 and Janino's performance issue? We also need to measure the 
efficiency of the generated code before making a decision.

> change default compiler strategy
> 
>
> Key: DRILL-5696
> URL: https://issues.apache.org/jira/browse/DRILL-5696
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: weijie.tong
>Assignee: Kunal Khatua
>
> In our production, when we have more than 20 aggregate expressions, the 
> compile time is high using the default Janino compiler, but after changing 
> to the JDK compiler we get a lower compile time than with Janino. Our 
> production JDK version is 1.8. So the default should be JDK if the user's 
> JDK version is above 1.7. We should add another check condition to the 
> ClassCompilerSelector.
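
A sketch of the suggested check (enum and method names are illustrative, not 
the actual ClassCompilerSelector code):

{code}
enum CompilerPolicy { JANINO, JDK }

class CompilerChoiceSketch {
  // Prefer the JDK compiler whenever the runtime is newer than Java 1.7.
  // "java.specification.version" is "1.7"/"1.8" on older JDKs and "9",
  // "11", ... on newer ones, so a numeric comparison covers both schemes.
  static CompilerPolicy defaultPolicy() {
    double spec = Double.parseDouble(
        System.getProperty("java.specification.version"));
    return spec > 1.7 ? CompilerPolicy.JDK : CompilerPolicy.JANINO;
  }
}
{code}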



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (DRILL-5696) change default compiler strategy

2017-08-01 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-5696:
--

Assignee: Kunal Khatua

> change default compiler strategy
> 
>
> Key: DRILL-5696
> URL: https://issues.apache.org/jira/browse/DRILL-5696
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: weijie.tong
>Assignee: Kunal Khatua
>
> In our production, when we have more than 20 aggregate expressions, the 
> compile time is high using the default Janino compiler, but after changing 
> to the JDK compiler we get a lower compile time than with Janino. Our 
> production JDK version is 1.8. So the default should be JDK if the user's 
> JDK version is above 1.7. We should add another check condition to the 
> ClassCompilerSelector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5746) Pcap PR manually edited Protobuf files, values lost on next build

2017-08-30 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148089#comment-16148089
 ] 

Chunhui Shi commented on DRILL-5746:


I fixed this in our project, since we also need to arrange the ids of the 
operators; we can sync offline.

> Pcap PR manually edited Protobuf files, values lost on next build
> -
>
> Key: DRILL-5746
> URL: https://issues.apache.org/jira/browse/DRILL-5746
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Drill recently accepted the pcap format plugin. As part of that work, the 
> author added a new operator type, {{PCAP_SUB_SCAN_VALUE}}.
> But, apparently this was done by editing the generated Protobuf files to add 
> the values, rather than modifying the protobuf definitions and rebuilding the 
> generated files. The result is, on the next build of the Protobuf sources, 
> the following compile error appears:
> {code}
> [ERROR] 
> /Users/paulrogers/git/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatPlugin.java:[80,41]
>  error: cannot find symbol
> [ERROR] symbol:   variable PCAP_SUB_SCAN_VALUE
> [ERROR] location: class CoreOperatorType
> {code}
> The solution is to properly edit the Protobuf definitions with the required 
> symbol.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5383) Several impersonation unit tests fail in unit test

2017-10-11 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200687#comment-16200687
 ] 

Chunhui Shi commented on DRILL-5383:


The test TestInboundImpersonation.selectChainedView is ignored in my 
environment. It failed frequently when run as part of the whole unit test 
suite, but passed when I ran it individually. We should keep the test ignored 
until we figure out how to make it work reliably.



> Several impersonation unit tests fail in unit test
> --
>
> Key: DRILL-5383
> URL: https://issues.apache.org/jira/browse/DRILL-5383
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Arina Ielchiieva
>Priority: Critical
>
> Run several round unit tests and got these errors:
> Failed tests: 
>   TestInboundImpersonationPrivileges.twoTargetGroups:135->run:62 proxyName: 
> user3_2 targetName: user4_2 expected: but was:
>   TestInboundImpersonationPrivileges.oneTargetGroup:118->run:62 proxyName: 
> user5_1 targetName: user4_2 expected: but was:
>   TestInboundImpersonationPrivileges.twoTargetUsers:126->run:62 proxyName: 
> user5_2 targetName: user0_2 expected: but was:
> Tests in error: 
>   
> TestDrillbitResilience.memoryLeaksWhenCancelled:890->assertCancelledWithoutException:532
>  » 
>   TestInboundImpersonation.selectChainedView:136 »  
> org.apache.drill.common.exce...
>   
> TestImpersonationQueries.org.apache.drill.exec.impersonation.TestImpersonationQueries
>  » UserRemote
> Notice that if I run the unit tests in my setup, which has a different 
> settings.xml pointing maven to our internal repository, I often (maybe 1 
> out of 2 runs) get a different error at 
> TestOptionsAuthEnabled#updateSysOptAsUserInAdminGroup.
> Since the errors are quite consistent when the unit tests are built on 
> different nodes, I guess that when we introduced more jars (kerby?) for 
> unit tests we did not do enough exclusion, so the conflicts differ between 
> builds.
> We should be able to find out why these tests failed by remote debugging 
> into them.
> If we cannot address this issue in one or two days, this JIRA should be 
> used to disable these tests for now. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5969) unit test should continue to tests of storage plugins even there are some failures in exec

2017-11-15 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5969:
--

 Summary: unit test should continue to tests of storage plugins 
even there are some failures in exec
 Key: DRILL-5969
 URL: https://issues.apache.org/jira/browse/DRILL-5969
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


We are seeing some random issues in unit tests, such as 
https://issues.apache.org/jira/browse/DRILL-5925. While we should fix these 
issues, we may want different ways to handle such situations:

1. We may want to continue running the unit tests, especially those in 
storage plugins, regardless of non-fundamental random failures in the exec 
module.
2. We may want to re-run failed tests individually. If a test failed the 
first time but passed when re-run individually, we could mark it as a 
'random failure' and decide to continue the whole set of unit tests.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5925) Unit test TestValueVector.testFixedVectorReallocation TestValueVector.testVariableVectorReallocation always fail

2017-11-03 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-5925:
--

 Summary: Unit test TestValueVector.testFixedVectorReallocation 
TestValueVector.testVariableVectorReallocation always fail
 Key: DRILL-5925
 URL: https://issues.apache.org/jira/browse/DRILL-5925
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Tests in error: 
  TestValueVector.testFixedVectorReallocation »  Unexpected exception, 
expected<...
  TestValueVector.testVariableVectorReallocation »  Unexpected exception, 
expect...

Tests run: 2401, Failures: 0, Errors: 2, Skipped: 142

We are seeing these failures quite often. We should disable these two tests or 
modify the expected exception to be OutOfMemoryException.
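
A minimal sketch of the second option (JUnit 4; assuming the failure surfaces 
as Drill's org.apache.drill.exec.exception.OutOfMemoryException, with the real 
reallocation body elided):

{code}
import org.apache.drill.exec.exception.OutOfMemoryException;
import org.junit.Test;

public class TestValueVectorReallocation {
  @Test(expected = OutOfMemoryException.class)
  public void testFixedVectorReallocation() {
    // The real test reallocates a fixed-width vector past the allocator
    // limit; simulated here so the sketch stays self-contained.
    throw new OutOfMemoryException("simulated reallocation failure");
  }
}
{code}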



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-07 Thread Chunhui Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505623#comment-16505623
 ] 

Chunhui Shi commented on DRILL-6212:


The fix we have in MapR Drill is on DrillProjectMergeRule; it could be applied 
to ProjectMergeRule in CALCITE-2223.

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5365) FileNotFoundException when reading a parquet file

2018-06-07 Thread Chunhui Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505625#comment-16505625
 ] 

Chunhui Shi commented on DRILL-5365:


Users do not need to close FileSystem. These FileSystem objects are cached.
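
For context, a minimal sketch of the Hadoop FileSystem caching behavior this 
refers to (assuming the cache is not disabled via fs.<scheme>.impl.disable.cache):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // FileSystem.get() consults an internal cache keyed by scheme, authority
    // and user; repeated calls with the same key return the same instance.
    FileSystem fs1 = FileSystem.get(conf);
    FileSystem fs2 = FileSystem.get(conf);
    System.out.println(fs1 == fs2);  // true: both point to the cached object
    // Closing a cached instance would break every other holder of the same
    // reference, which is why callers should not close it themselves.
  }
}
{code}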

> FileNotFoundException when reading a parquet file
> -
>
> Key: DRILL-5365
> URL: https://issues.apache.org/jira/browse/DRILL-5365
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.10.0
>Reporter: Chun Chang
>Assignee: Timothy Farkas
>Priority: Major
> Fix For: 1.14.0
>
>
> The parquet file is generated through the following CTAS.
> To reproduce the issue: 1) two or more nodes cluster; 2) enable 
> impersonation; 3) set "fs.default.name": "file:///" in hive storage plugin; 
> 4) restart drillbits; 5) as a regular user, on node A, drop the table/file; 
> 6) ctas from a large enough hive table as source to recreate the table/file; 
> 7) query the table from node A should work; 8) query from node B as same user 
> should reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-06-07 Thread Chunhui Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504329#comment-16504329
 ] 

Chunhui Shi commented on DRILL-6212:


Just got back from trips. I can cross-port the fix I have in MapR Drill to here 
in these two days. [~vvysotskyi], you could ping me directly via 
shi.chunhui at aliyun.com. 

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6212) A simple join is recursing too deep in planning and eventually throwing stack overflow.

2018-05-03 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462118#comment-16462118
 ] 

Chunhui Shi commented on DRILL-6212:


The fix is not a PR to Apache Drill yet. I will wrap up a PR in these two days. 
[~priteshm]

> A simple join is recursing too deep in planning and eventually throwing stack 
> overflow.
> ---
>
> Key: DRILL-6212
> URL: https://issues.apache.org/jira/browse/DRILL-6212
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.14.0
>
>
> Create two views using following statements.
> {code}
> create view v1 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> create view v2 as select cast(greeting as int) f from 
> dfs.`/home/mapr/data/json/temp.json`;
> {code}
> Executing the following join query produces a stack overflow during the 
> planning phase.
> {code}
> select t1.f from dfs.tmp.v1 as t inner join dfs.tmp.v2 as t1 on cast(t.f as 
> int) = cast(t1.f as int) and cast(t.f as int) = 10 and cast(t1.f as int) = 10;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6077) To take advantage of pre-aggregate results when generating plans for aggregation

2018-01-08 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6077:
--

 Summary: To take advantage of pre-aggregate results when 
generating plans for aggregation
 Key: DRILL-6077
 URL: https://issues.apache.org/jira/browse/DRILL-6077
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi


Users could generate aggregation results (count, avg, min, max) for segments of 
data and store them either in summary tables or in metadata stores. The planner 
should be able to leverage these results, either by querying the pre-aggregated 
results directly from the summary tables or by combining pre-aggregated results 
for old data with results computed from new data.
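
As a plain illustration of the combining case (not Drill code; the numbers are 
made up): decomposable aggregates such as count and avg can be merged from 
per-segment partial results without rescanning the old data:

{code}
public class CombineAggregates {
  public static void main(String[] args) {
    // Pre-aggregated old data, e.g. from a summary table: sum and count.
    long oldSum = 10_000, oldCount = 400;
    // Fresh aggregation computed over the new data only.
    long newSum = 2_500, newCount = 100;
    // avg decomposes into (sum, count), so the old data need not be rescanned.
    long count = oldCount + newCount;
    double avg = (double) (oldSum + newSum) / count;
    System.out.println("count=" + count + ", avg=" + avg);  // count=500, avg=25.0
  }
}
{code}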



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6092) Support latest MapR release in format-maprdb storage plugin

2018-01-16 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6092:
--

 Summary: Support latest MapR release in format-maprdb storage 
plugin
 Key: DRILL-6092
 URL: https://issues.apache.org/jira/browse/DRILL-6092
 Project: Apache Drill
  Issue Type: Bug
 Environment: Latest MapRDB release is 6.0. Apache Drill still builds the 
format-maprdb plugin against 5.2 MapRDB libraries. We should update to the 
latest MapR; simply bumping up the version in pom.xml does not work.

Ideally we should allow users of Apache Drill to decide which version of the 
MapR platform to pick, and Drill should work with the latest major release 
(6.0 or 6.x) AND the last major release (5.2.1, 5.2, or 5.x).

The same applies to other storage plugins: we should provide an easy way to 
configure which version of the underlying storage to connect to when building 
Drill.
Reporter: Chunhui Shi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6092) Support latest MapR release in format-maprdb storage plugin

2018-01-16 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327967#comment-16327967
 ] 

Chunhui Shi commented on DRILL-6092:


Latest MapRDB release is 6.0. Apache Drill still builds the format-maprdb 
plugin against 5.2 MapRDB libraries. We should update to the latest MapR; 
simply bumping up the version in pom.xml does not work.

Ideally we should allow users of Apache Drill to decide which version of the 
MapR platform to pick, and Drill should work with the latest major release 
(6.0 or 6.x) AND the last major release (5.2.1, 5.2, or 5.x).

The same applies to other storage plugins: we should provide an easy way to 
configure which version of the underlying storage to connect to when building 
Drill.

> Support latest MapR release in format-maprdb storage plugin
> ---
>
> Key: DRILL-6092
> URL: https://issues.apache.org/jira/browse/DRILL-6092
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6092) Support latest MapR release in format-maprdb storage plugin

2018-01-16 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-6092:
---
Environment: (was: Latest MapRDB release is 6.0. Apache Drill still builds the 
format-maprdb plugin against 5.2 MapRDB libraries. We should update to the 
latest MapR; simply bumping up the version in pom.xml does not work.

Ideally we should allow users of Apache Drill to decide which version of the 
MapR platform to pick, and Drill should work with the latest major release 
(6.0 or 6.x) AND the last major release (5.2.1, 5.2, or 5.x).

The same applies to other storage plugins: we should provide an easy way to 
configure which version of the underlying storage to connect to when building 
Drill.)

> Support latest MapR release in format-maprdb storage plugin
> ---
>
> Key: DRILL-6092
> URL: https://issues.apache.org/jira/browse/DRILL-6092
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6078) Query with INTERVAL in predicate does not return any rows

2018-01-18 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-6078:
---
Labels:   (was: ready-to-commit)

> Query with INTERVAL in predicate does not return any rows
> -
>
> Key: DRILL-6078
> URL: https://issues.apache.org/jira/browse/DRILL-6078
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Robert Hou
>Assignee: Chunhui Shi
>Priority: Major
> Fix For: 1.13.0
>
>
> This query does not return any rows when accessing MapR DB tables.
> SELECT
>   C.C_CUSTKEY,
>   C.C_NAME,
>   SUM(L.L_EXTENDEDPRICE * (1 - L.L_DISCOUNT)) AS REVENUE,
>   C.C_ACCTBAL,
>   N.N_NAME,
>   C.C_ADDRESS,
>   C.C_PHONE,
>   C.C_COMMENT
> FROM
>   customer C,
>   orders O,
>   lineitem L,
>   nation N
> WHERE
>   C.C_CUSTKEY = O.O_CUSTKEY
>   AND L.L_ORDERKEY = O.O_ORDERKEY
>   AND O.O_ORDERDate >= DATE '1994-03-01'
>   AND O.O_ORDERDate < DATE '1994-03-01' + INTERVAL '3' MONTH
>   AND L.L_RETURNFLAG = 'R'
>   AND C.C_NATIONKEY = N.N_NATIONKEY
> GROUP BY
>   C.C_CUSTKEY,
>   C.C_NAME,
>   C.C_ACCTBAL,
>   C.C_PHONE,
>   N.N_NAME,
>   C.C_ADDRESS,
>   C.C_COMMENT
> ORDER BY
>   REVENUE DESC
> LIMIT 20
> This query works against JSON tables.  It should return 20 rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6078) Query with INTERVAL in predicate does not return any rows

2018-01-19 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332582#comment-16332582
 ] 

Chunhui Shi commented on DRILL-6078:


The fix relies on some changes on the MapRDB server side which are not there 
yet. Will retry a PR after the MapRDB work is done.

> Query with INTERVAL in predicate does not return any rows
> -
>
> Key: DRILL-6078
> URL: https://issues.apache.org/jira/browse/DRILL-6078
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Robert Hou
>Assignee: Chunhui Shi
>Priority: Major
> Fix For: 1.13.0
>
>
> This query does not return any rows when accessing MapR DB tables.
> SELECT
>   C.C_CUSTKEY,
>   C.C_NAME,
>   SUM(L.L_EXTENDEDPRICE * (1 - L.L_DISCOUNT)) AS REVENUE,
>   C.C_ACCTBAL,
>   N.N_NAME,
>   C.C_ADDRESS,
>   C.C_PHONE,
>   C.C_COMMENT
> FROM
>   customer C,
>   orders O,
>   lineitem L,
>   nation N
> WHERE
>   C.C_CUSTKEY = O.O_CUSTKEY
>   AND L.L_ORDERKEY = O.O_ORDERKEY
>   AND O.O_ORDERDate >= DATE '1994-03-01'
>   AND O.O_ORDERDate < DATE '1994-03-01' + INTERVAL '3' MONTH
>   AND L.L_RETURNFLAG = 'R'
>   AND C.C_NATIONKEY = N.N_NATIONKEY
> GROUP BY
>   C.C_CUSTKEY,
>   C.C_NAME,
>   C.C_ACCTBAL,
>   C.C_PHONE,
>   N.N_NAME,
>   C.C_ADDRESS,
>   C.C_COMMENT
> ORDER BY
>   REVENUE DESC
> LIMIT 20
> This query works against JSON tables.  It should return 20 rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6055) Session Multiplexing

2018-02-05 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi reassigned DRILL-6055:
--

Assignee: Padma Penumarthy

> Session Multiplexing
> 
>
> Key: DRILL-6055
> URL: https://issues.apache.org/jira/browse/DRILL-6055
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Chunhui Shi
>Assignee: Padma Penumarthy
>Priority: Major
>
> We could allow one connection to carry multiple user sessions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6145) Implement Hive MapR-DB JSON handler.

2018-02-08 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357161#comment-16357161
 ] 

Chunhui Shi commented on DRILL-6145:


Should this JIRA be created in the Hive project?

> Implement Hive MapR-DB JSON handler. 
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Critical
>  Labels: doc-impacting
> Fix For: 1.13.0
>
>
> Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's 
> external tables it is necessary to add "hive-maprdb-json-handler".
> Use case:
>  # Create a table MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> movie_id string, title string, studio string) 
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> TBLPROPERTIES("maprdb.table.name" = "/tmp/table/json","maprdb.column.id" = 
> "movie_id");
> {code}
>  
>  #  Use hive schema to query this table:
> {code}
> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6103) lsb_release: command not found

2018-02-05 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352777#comment-16352777
 ] 

Chunhui Shi commented on DRILL-6103:


According to this page: [https://www.computerhope.com/unix/lsb_release.htm]

*lsb_release* is part of a software package called the LSB core, which is not 
necessarily [installed|https://www.computerhope.com/jargon/i/install.htm] on 
your system by default.

I think I installed my VM with the minimal CentOS configuration.

> lsb_release: command not found
> --
>
> Key: DRILL-6103
> URL: https://issues.apache.org/jira/browse/DRILL-6103
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Priority: Major
> Fix For: 1.13.0
>
>
> Got this error when running drillbit.sh:
>  
> $ bin/drillbit.sh restart
> bin/drill-config.sh: line 317: lsb_release: command not found



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2017-12-28 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306030#comment-16306030
 ] 

Chunhui Shi commented on DRILL-5286:


It was Paul reviewing the PR: https://github.com/apache/drill/pull/797

> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6055) Session Multiplexing

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6055:
--

 Summary: Session Multiplexing
 Key: DRILL-6055
 URL: https://issues.apache.org/jira/browse/DRILL-6055
 Project: Apache Drill
  Issue Type: Task
Reporter: Chunhui Shi


We could allow one connection to carry multiple user sessions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6054) Issues in FindPartitionConditions

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6054:
--

 Summary: Issues in FindPartitionConditions
 Key: DRILL-6054
 URL: https://issues.apache.org/jira/browse/DRILL-6054
 Project: Apache Drill
  Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi


When the condition takes these forms, partition pruning is not done correctly: 
b = 3 OR (dir0 = 1 and a = 2)
not (dir0 = 1 AND b = 2)

Note that in both cases the dir0 predicate does not bound the result by itself: 
rows in other partitions can still satisfy b = 3, and rows with dir0 = 1 can 
still satisfy not (b = 2), so no partition filter on dir0 alone can be 
extracted soundly (see the illustration below).
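
A minimal plain-Java illustration of the OR case (this is not the 
FindPartitionConditions API, just the boolean logic):

{code}
public class PartitionPruneCheck {
  // Full predicate: b = 3 OR (dir0 = 1 AND a = 2)
  static boolean predicate(int dir0, int a, int b) {
    return b == 3 || (dir0 == 1 && a == 2);
  }

  public static void main(String[] args) {
    // A row in partition dir0 = 7 still matches via b = 3, so pruning down
    // to the dir0 = 1 partition would wrongly drop this row.
    System.out.println(predicate(7, 0, 3));  // true
  }
}
{code}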



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (DRILL-5151) ConventionTraitDef.plannerConversionMap is not thread safe

2017-12-22 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi closed DRILL-5151.
--
Resolution: Fixed

> ConventionTraitDef.plannerConversionMap is not thread safe
> --
>
> Key: DRILL-5151
> URL: https://issues.apache.org/jira/browse/DRILL-5151
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>
> We are using the static instance ConventionTraitDef.INSTANCE globally, and 
> plannerConversionMap (class WeakHashMap) defined in the ConventionTraitDef 
> class is not threadsafe. The data in the map could get corrupted and cause an 
> infinite loop or other data errors.
>   
>   private final WeakHashMap<RelOptPlanner, ConversionData> plannerConversionMap =
>       new WeakHashMap<RelOptPlanner, ConversionData>();
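
One possible fix direction, as a minimal sketch (an assumption, not the actual 
patch; ConversionData here stands in for Calcite's private nested type): wrap 
the map so access is synchronized:

{code}
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.calcite.plan.RelOptPlanner;

public class SafeConversionMap {
  // Stand-in for Calcite's private per-planner bookkeeping type.
  static final class ConversionData {}

  // Collections.synchronizedMap serializes access, so concurrent planners
  // can no longer corrupt the WeakHashMap's internal structure.
  private final Map<RelOptPlanner, ConversionData> plannerConversionMap =
      Collections.synchronizedMap(new WeakHashMap<RelOptPlanner, ConversionData>());
}
{code}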



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2017-12-22 Thread Chunhui Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunhui Shi updated DRILL-5286:
---
Labels: ready-to-commit  (was: )

> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6056) Mock datasize could overflow to negative

2017-12-22 Thread Chunhui Shi (JIRA)
Chunhui Shi created DRILL-6056:
--

 Summary: Mock datasize could overflow to negative
 Key: DRILL-6056
 URL: https://issues.apache.org/jira/browse/DRILL-6056
 Project: Apache Drill
  Issue Type: Task
Reporter: Chunhui Shi


In some cases, the mock data size (rowCount * rowWidth) could be too large and 
overflow to a negative int, especially when we test spilling or memory OOB 
exceptions.
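
A minimal illustration of the overflow and the usual fix, widening to long 
before multiplying (the values are made up):

{code}
public class MockSizeOverflow {
  public static void main(String[] args) {
    int rowCount = 1_000_000;
    int rowWidth = 3_000;
    int badSize = rowCount * rowWidth;           // 3,000,000,000 overflows int
    long goodSize = (long) rowCount * rowWidth;  // widen before multiplying
    System.out.println(badSize);    // -1294967296
    System.out.println(goodSize);   // 3000000000
  }
}
{code}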



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2018-01-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308726#comment-16308726
 ] 

Chunhui Shi edited comment on DRILL-5286 at 1/2/18 9:29 PM:


There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the counts of this log 
occurrence compare as follows:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.




was (Author: cshi):
Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the counts of this log 
occurrence compare as follows:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.



> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2018-01-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308726#comment-16308726
 ] 

Chunhui Shi edited comment on DRILL-5286 at 1/2/18 9:44 PM:


There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the comparison of the 
counts of this log occurrence is listed below:

|| Query || With this optimization || Original - without this change ||
| tpch/5.sql | 183 | 2385 |
| tpch/10.sql | 68 | 365 |
| tpch/17.sql | 118 | 605 |
| tpch/20.sql | 381 | 3839 |

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.




was (Author: cshi):
There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the counts of this log 
occurrence compare as follows:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.



> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2018-01-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308726#comment-16308726
 ] 

Chunhui Shi commented on DRILL-5286:


Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the counts of this log 
occurrence compare as follows:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.



> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2018-01-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308726#comment-16308726
 ] 

Chunhui Shi edited comment on DRILL-5286 at 1/2/18 9:30 PM:


There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the comparison of the 
counts of this log occurrence is listed below:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.




was (Author: cshi):
There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the counts of this log 
occurrence compare as follows:

              With this optimization    Original - without this change
tpch/5.sql    183                       2385
tpch/10.sql   68                        365
tpch/17.sql   118                       605
tpch/20.sql   381                       3839

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.



> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (DRILL-5286) When rel and target candidate set is the same, planner should not need to do convert for the relNode since it must have been done

2018-01-02 Thread Chunhui Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308726#comment-16308726
 ] 

Chunhui Shi edited comment on DRILL-5286 at 1/2/18 9:44 PM:


There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the comparison of the 
counts of this log occurrence is listed below:

|| Query || With this optimization || Original - without this change ||
| tpch/5.sql | 183 | 2385 |
| tpch/10.sql | 68 | 365 |
| tpch/17.sql | 118 | 605 |
| tpch/20.sql | 381 | 3839 |

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.




was (Author: cshi):
There is no change to result plans. This optimization is aimed at reducing 
redundant convertChild calls on the same nodes for the same traits. With this 
change, the planning performance is improved.

Adding a trace log to the original code as shown below, we can count the 
occurrences of "convertChild to convert NODE" in the log (trace level) to 
measure the improvement, since each occurrence represents a call to 
convertChild on the next line.

{code}
RelNode newRel = RelOptRule.convert(candidateSet,
    rel.getTraitSet().plus(Prel.DRILL_PHYSICAL));
logger.trace("{}.convertChild to convert NODE {} ,AND {}",
    this.getClass().getSimpleName(), n, newRel);
RelNode out = convertChild(n, newRel);
{code}

And I randomly picked several TPCH queries as examples; the comparison of the 
counts of this log occurrence is listed below:

|| Query || With this optimization || Original - without this change ||
| tpch/5.sql | 183 | 2385 |
| tpch/10.sql | 68 | 365 |
| tpch/17.sql | 118 | 605 |
| tpch/20.sql | 381 | 3839 |

While planning time is not a major part of a full performance run (TPCH 
queries on SF100), we did run the performance suite once specifically for this 
change, and it showed an overall improvement of ~1-2%, so the planning 
improvement itself is significant.



> When rel and target candidate set is the same, planner should not need to do 
> convert for the relNode since it must have been done
> -
>
> Key: DRILL-5286
> URL: https://issues.apache.org/jira/browse/DRILL-5286
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

