[jira] [Created] (DRILL-7284) reusing the hashCodes computed at exchange nodes

2019-06-06 Thread weijie.tong (JIRA)
weijie.tong created DRILL-7284:
--

 Summary: reusing the hashCodes computed at exchange nodes
 Key: DRILL-7284
 URL: https://issues.apache.org/jira/browse/DRILL-7284
 Project: Apache Drill
  Issue Type: New Feature
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.17.0


For HashJoin or HashAggregate, we shuffle the input data according to the
hash codes of the join conditions or group-by keys at the exchange nodes. This
hash computation is then redone at the HashJoin or HashAggregate nodes. We
could send the hash codes computed at the exchange nodes to the upper nodes,
so the HashJoin or HashAggregate nodes would not need to do the hash
computation again.
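A minimal sketch of the idea (hypothetical classes, not Drill's actual operator API): the exchange computes each row's hash exactly once, ships it alongside the batch, and the downstream hash operator buckets by the precomputed value instead of rehashing.

```java
// Hypothetical sketch: an exchange computes partition hashes once and
// forwards them with the batch, so the downstream hash operator reuses them.
public class HashReuseSketch {

    /** A batch of key values plus the hash codes computed at the exchange. */
    public static final class BatchWithHashes {
        public final int[] keys;
        public final int[] hashCodes; // computed once at the exchange node

        public BatchWithHashes(int[] keys, int[] hashCodes) {
            this.keys = keys;
            this.hashCodes = hashCodes;
        }
    }

    /** Exchange side: compute the hash of each key exactly once. */
    public static BatchWithHashes exchange(int[] keys) {
        int[] hashes = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            hashes[i] = Integer.hashCode(keys[i]); // stand-in for Drill's hash function
        }
        return new BatchWithHashes(keys, hashes);
    }

    /** HashAggregate side: bucket rows by the precomputed hash, no rehashing. */
    public static int bucketOf(BatchWithHashes batch, int row, int numBuckets) {
        return Math.floorMod(batch.hashCodes[row], numBuckets);
    }

    public static void main(String[] args) {
        BatchWithHashes b = exchange(new int[] {7, 42, 7});
        // Equal keys land in the same bucket without recomputing the hash.
        System.out.println(bucketOf(b, 0, 16) == bucketOf(b, 2, 16));
    }
}
```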





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7206) Tuning hash join code using primitive int

2019-04-24 Thread weijie.tong (JIRA)
weijie.tong created DRILL-7206:
--

 Summary: Tuning hash join code using primitive int
 Key: DRILL-7206
 URL: https://issues.apache.org/jira/browse/DRILL-7206
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.17.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7087) Integrate Arrow's Gandiva into Drill

2019-03-11 Thread weijie.tong (JIRA)
weijie.tong created DRILL-7087:
--

 Summary: Integrate Arrow's Gandiva into Drill
 Key: DRILL-7087
 URL: https://issues.apache.org/jira/browse/DRILL-7087
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen, Execution - Relational Operators
Reporter: weijie.tong


This is preparatory work for integrating Arrow into Drill by invoking Arrow's
Gandiva feature. Comparing Arrow's and Drill's in-memory column
representations, the null representation currently differs: Drill uses 1 byte
while Arrow uses 1 bit to indicate a null row, and all Arrow columns are now
nullable. Apart from these basic differences, they share the same memory
representation for the different data types.

The integration strategy is to invoke Arrow's JniWrapper native methods
directly, passing the ValueVector's memory address.

I have done an implementation in our own Drill version by integrating Gandiva
into Drill's project operator. The performance shows nearly a 100% gain in
expression computation.

So if there's no objection, I will submit a related PR to contribute this
feature. This issue also waits on Arrow's related issue [ARROW-4819].
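To illustrate the null-representation difference (a hedged sketch, not Drill or Arrow code): repacking a Drill-style one-byte-per-value null indicator into an Arrow-style bit-packed validity buffer looks roughly like this.

```java
// Sketch: repack a byte-per-value validity array (Drill style, 1 = set)
// into a bit-per-value validity bitmap (Arrow style, LSB-first).
public class ValidityRepack {

    public static byte[] bytesToBits(byte[] byteValidity) {
        byte[] bits = new byte[(byteValidity.length + 7) / 8];
        for (int i = 0; i < byteValidity.length; i++) {
            if (byteValidity[i] != 0) {
                bits[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return bits;
    }

    public static boolean isSet(byte[] bitValidity, int index) {
        return (bitValidity[index / 8] & (1 << (index % 8))) != 0;
    }

    public static void main(String[] args) {
        byte[] byteValidity = {1, 0, 1, 1, 0, 0, 0, 1, 1}; // 9 values
        byte[] bits = bytesToBits(byteValidity);           // 2 bytes instead of 9
        for (int i = 0; i < byteValidity.length; i++) {
            assert isSet(bits, i) == (byteValidity[i] != 0);
        }
        System.out.println(bits.length); // 2
    }
}
```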





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7002) RuntimeFilter produces wrong results while setting exec.hashjoin.num_partitions=1

2019-01-28 Thread weijie.tong (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weijie.tong resolved DRILL-7002.

Resolution: Fixed

> RuntimeFilter produces wrong results while setting 
> exec.hashjoin.num_partitions=1
> 
>
> Key: DRILL-7002
> URL: https://issues.apache.org/jira/browse/DRILL-7002
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.16.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: reviewable
> Fix For: 1.16.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7002) RuntimeFilter produces wrong results while setting exec.hashjoin.num_partitions=1

2019-01-25 Thread weijie.tong (JIRA)
weijie.tong created DRILL-7002:
--

 Summary: RuntimeFilter produces wrong results while setting 
exec.hashjoin.num_partitions=1
 Key: DRILL-7002
 URL: https://issues.apache.org/jira/browse/DRILL-7002
 Project: Apache Drill
  Issue Type: Bug
 Environment: RuntimeFilter produces wrong results when setting 
exec.hashjoin.num_partitions=1. With that setting, the HashJoin node does not 
execute the bloom filter generation logic, so it produces a zero-filled bloom 
filter, which causes the RuntimeFilter to filter out the wrong rows.
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.16.0
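The failure mode described above can be seen with a toy bloom filter (a sketch, not Drill's BloomFilter class): a zero-filled filter answers "definitely not present" for every key, so applying it as a runtime filter drops every row.

```java
// Toy bloom filter sketch showing why a zero-filled filter is destructive:
// membership requires all probed bits to be 1, so an all-zero filter
// rejects every key, and a RuntimeFilter built on it drops all rows.
public class ZeroBloomDemo {

    public static final int BITS = 1024;

    public static int[] probes(long key) {
        int h1 = Long.hashCode(key);
        int h2 = Long.hashCode(key * 0x9E3779B97F4A7C15L);
        return new int[] { Math.floorMod(h1, BITS), Math.floorMod(h2, BITS) };
    }

    public static void add(boolean[] filter, long key) {
        for (int p : probes(key)) filter[p] = true;
    }

    public static boolean mightContain(boolean[] filter, long key) {
        for (int p : probes(key)) if (!filter[p]) return false;
        return true;
    }

    public static void main(String[] args) {
        boolean[] built = new boolean[BITS];
        add(built, 42L);
        boolean[] zeroFilled = new boolean[BITS]; // the buggy case: never populated
        System.out.println(mightContain(built, 42L));      // true
        System.out.println(mightContain(zeroFilled, 42L)); // false: row wrongly dropped
    }
}
```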






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6947) RuntimeFilter memory leak due to BF ByteBuf ownership transferring

2019-01-05 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6947:
--

 Summary: RuntimeFilter memory leak due to BF ByteBuf ownership 
transferring 
 Key: DRILL-6947
 URL: https://issues.apache.org/jira/browse/DRILL-6947
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.16.0


RuntimeFilter's BF ByteBuf ownership should be transferred correctly in both 
the broadcast and random-hash cases. Currently, because this transfer is not 
handled properly, it causes a memory leak.

In the broadcast case, the HashJoin operator's allocator allocates the BF, and 
the allocated BF's ownership should be transferred to its receiver: the 
FragmentContextImpl or the final RuntimeFilter operator. Otherwise, the 
OperatorContextImpl's close method will complain about a memory leak when 
closing the corresponding allocator.
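A simplified sketch of the ownership rule (hypothetical classes, not Drill's allocator API): a buffer must be handed off to the receiver's allocator before the sender's allocator closes, otherwise close() reports a leak.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of allocator-tracked buffer ownership: each buffer is owned by one
// allocator; closing an allocator that still owns buffers signals a leak,
// which is what the close() complaint in the description corresponds to.
public class OwnershipDemo {

    public static final class Allocator {
        public final String name;
        public final Set<Object> owned = new HashSet<>();
        public Allocator(String name) { this.name = name; }

        public Object allocate() {
            Object buf = new Object();
            owned.add(buf);
            return buf;
        }

        /** Move ownership of buf to the receiving allocator. */
        public void transferTo(Allocator receiver, Object buf) {
            if (owned.remove(buf)) {
                receiver.owned.add(buf);
            }
        }

        /** Returns the number of leaked buffers still owned at close time. */
        public int close() {
            int leaked = owned.size();
            owned.clear();
            return leaked;
        }
    }

    public static void main(String[] args) {
        Allocator hashJoin = new Allocator("HashJoin");
        Allocator fragmentContext = new Allocator("FragmentContextImpl");

        Object bloomFilterBuf = hashJoin.allocate();
        hashJoin.transferTo(fragmentContext, bloomFilterBuf); // the missing step
        System.out.println(hashJoin.close()); // 0: no leak after transfer
    }
}
```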



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6826) Transfer RF ByteBuf owner causing the query hang

2018-11-01 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6826:
--

 Summary: Transfer RF ByteBuf owner causing the query hang
 Key: DRILL-6826
 URL: https://issues.apache.org/jira/browse/DRILL-6826
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong


In the JPPD feature, when we transfer the received RF's ByteBuf ownership in 
the WorkerBee's receiveRuntimeFilter method, the sent-out aggregated RF 
receives no response: no Ack.OK, not even exception information. This 
eventually causes the query to hang in ForemanResult's close method, which 
blocks in runtimeFilterRouter's waitForComplete method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6825) Applying different hash function according to data types and data size

2018-11-01 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6825:
--

 Summary: Applying different hash function according to data types 
and data size
 Key: DRILL-6825
 URL: https://issues.apache.org/jira/browse/DRILL-6825
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Reporter: weijie.tong
 Fix For: 1.16.0


Different hash functions perform differently depending on the data type and 
data size. We should choose the right one to apply rather than always using 
MurmurHash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6792) Find the right probe side fragment to any storage plugin

2018-10-12 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6792:
--

 Summary: Find the right probe side fragment to any storage plugin
 Key: DRILL-6792
 URL: https://issues.apache.org/jira/browse/DRILL-6792
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.15.0


The current JPPD implementation finds the probe-side wrapper by the 
GroupScan's digest. But there is no guarantee that the GroupScan's digest will 
not change, since it is attached to the RuntimeFilterDef by different storage 
plugins' implementation logic. So here we assign a unique identifier to the 
RuntimeFilter operator and find the right probe-side fragment wrapper by that 
runtime filter identifier in the RuntimeFilterRouter class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6784) Add RuntimeFilter metrics to the web console

2018-10-08 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6784:
--

 Summary: Add RuntimeFilter metrics to the web console
 Key: DRILL-6784
 URL: https://issues.apache.org/jira/browse/DRILL-6784
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Monitoring, Web Server
Affects Versions: 1.15.0
Reporter: weijie.tong
Assignee: weijie.tong


This issue is to add some RuntimeFilter metrics to the web console so the 
RuntimeFilter's behavior can be observed directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6745) Introduce the xxHash algorithm as another hash64 option

2018-09-18 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6745:
--

 Summary: Introduce the xxHash algorithm as another hash64 option
 Key: DRILL-6745
 URL: https://issues.apache.org/jira/browse/DRILL-6745
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.15.0


Supply another hash64 algorithm, xxHash, as a replacement for MurmurHash. 
According to the [xxHash|http://cyan4973.github.io/xxHash/] report, it is 
faster than MurmurHash, and projects like Spark and Presto have adopted it.
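For reference, a self-contained xxHash32 sketch (the issue proposes the 64-bit variant; this 32-bit form just illustrates the algorithm's shape: four stripe accumulators for long inputs plus an avalanche finalizer). Verified against the published reference vector for empty input.

```java
// Self-contained xxHash32 (32-bit variant, for illustration; the issue
// proposes adding the 64-bit variant as a hash64 option alongside Murmur).
public class XxHash32 {

    private static final int P1 = 0x9E3779B1, P2 = 0x85EBCA77, P3 = 0xC2B2AE3D;
    private static final int P4 = 0x27D4EB2F, P5 = 0x165667B1;

    public static int hash(byte[] data, int seed) {
        int i = 0, len = data.length, h;
        if (len >= 16) {
            // Long inputs: four accumulators over 16-byte stripes.
            int v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
            for (; i <= len - 16; i += 16) {
                v1 = Integer.rotateLeft(v1 + readIntLE(data, i) * P2, 13) * P1;
                v2 = Integer.rotateLeft(v2 + readIntLE(data, i + 4) * P2, 13) * P1;
                v3 = Integer.rotateLeft(v3 + readIntLE(data, i + 8) * P2, 13) * P1;
                v4 = Integer.rotateLeft(v4 + readIntLE(data, i + 12) * P2, 13) * P1;
            }
            h = Integer.rotateLeft(v1, 1) + Integer.rotateLeft(v2, 7)
              + Integer.rotateLeft(v3, 12) + Integer.rotateLeft(v4, 18);
        } else {
            h = seed + P5;
        }
        h += len;
        for (; i <= len - 4; i += 4) {          // remaining 4-byte words
            h = Integer.rotateLeft(h + readIntLE(data, i) * P3, 17) * P4;
        }
        for (; i < len; i++) {                  // remaining bytes
            h = Integer.rotateLeft(h + (data[i] & 0xFF) * P5, 11) * P1;
        }
        h ^= h >>> 15; h *= P2;                 // avalanche finalizer
        h ^= h >>> 13; h *= P3;
        h ^= h >>> 16;
        return h;
    }

    private static int readIntLE(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
             | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        // Known reference vector: XXH32 of the empty input with seed 0.
        System.out.printf("%08x%n", hash(new byte[0], 0)); // 02cc5d05
    }
}
```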



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6736) Make successive MinorFragments running in the same JVM communicate without the network

2018-09-08 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6736:
--

 Summary: Make successive MinorFragments running in the same JVM 
communicate without the network
 Key: DRILL-6736
 URL: https://issues.apache.org/jira/browse/DRILL-6736
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.15.0
Reporter: weijie.tong
Assignee: weijie.tong


In the current implementation, two successive MinorFragments running in the 
same JVM transfer RecordBatches through the DataTunnel, that is, over a 
socket. This is not the best for performance; we should treat this special 
case and let them communicate through a direct method invocation.
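The idea can be sketched as a local bypass (hypothetical interfaces, not Drill's DataTunnel API): when sender and receiver live in the same JVM, hand the batch over via a direct call and an in-process queue instead of serializing it onto a socket.

```java
// Sketch: pick a direct in-process handoff when two fragments share a JVM,
// falling back to the (socket-backed) tunnel otherwise. Interfaces are
// hypothetical stand-ins for Drill's DataTunnel / RecordBatch types.
import java.util.ArrayDeque;
import java.util.Queue;

public class LocalExchangeSketch {

    public interface Tunnel { void send(String recordBatch); }

    /** Socket-backed path (stubbed): would serialize and write to the network. */
    public static final class NetworkTunnel implements Tunnel {
        public void send(String recordBatch) { /* serialize + socket write */ }
    }

    /** Same-JVM path: enqueue the batch for the receiver, no serialization. */
    public static final class LocalTunnel implements Tunnel {
        public final Queue<String> receiverQueue = new ArrayDeque<>();
        public void send(String recordBatch) { receiverQueue.add(recordBatch); }
    }

    public static Tunnel tunnelFor(String senderEndpoint, String receiverEndpoint) {
        return senderEndpoint.equals(receiverEndpoint)
            ? new LocalTunnel() : new NetworkTunnel();
    }

    public static void main(String[] args) {
        Tunnel t = tunnelFor("drillbit-1:31012", "drillbit-1:31012");
        System.out.println(t instanceof LocalTunnel); // true: same JVM, direct handoff
        t.send("batch-0");
    }
}
```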



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6731) JPPD:Move aggregating the BF from the Foreman to the RuntimeFilter

2018-09-06 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6731:
--

 Summary: JPPD:Move aggregating the BF from the Foreman to the 
RuntimeFilter
 Key: DRILL-6731
 URL: https://issues.apache.org/jira/browse/DRILL-6731
 Project: Apache Drill
  Issue Type: Improvement
  Components:  Server
Affects Versions: 1.15.0
Reporter: weijie.tong
Assignee: weijie.tong


This PR moves the BloomFilter aggregating work from the Foreman to the 
RuntimeFilter. Through this change, the RuntimeFilter can apply the incoming 
BF as soon as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6573) Enhance JPPD with NDV

2018-07-02 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6573:
--

 Summary: Enhance JPPD with NDV
 Key: DRILL-6573
 URL: https://issues.apache.org/jira/browse/DRILL-6573
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.14.0
Reporter: weijie.tong
Assignee: weijie.tong


Use the NDV from the metadata system to judge whether the BloomFilter should 
be enabled at a candidate HashJoin node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6572) Add memory calculation of JPPD BloomFilter

2018-07-02 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6572:
--

 Summary: Add memory calculation of JPPD BloomFilter
 Key: DRILL-6572
 URL: https://issues.apache.org/jira/browse/DRILL-6572
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Reporter: weijie.tong
Assignee: weijie.tong
 Fix For: 1.14.0


This is an enhancement of DRILL-6385 to include the BloomFilter's memory in 
the HashJoin's memory calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-05-07 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6385:
--

 Summary: Support JPPD (Join Predicate Push Down)
 Key: DRILL-6385
 URL: https://issues.apache.org/jira/browse/DRILL-6385
 Project: Apache Drill
  Issue Type: New Feature
  Components:  Server, Execution - Flow
Reporter: weijie.tong
Assignee: weijie.tong


This feature adds support for JPPD (Join Predicate Push Down). It will benefit 
HashJoin and Broadcast HashJoin performance by reducing the number of rows 
sent across the network and the memory consumed. This feature is already 
supported by Impala, which calls it runtime filtering 
([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
 The first PR will push down a bloom filter from the HashJoin node to 
Parquet's scan node. The proposed basic procedure is as follows:
 # The HashJoin build side accumulates the equi-join condition rows to 
construct a bloom filter, then sends the bloom filter to the foreman node.
 # The foreman node passively accepts the bloom filters from all the fragments 
that have the HashJoin operator, then aggregates them into a global bloom 
filter.
 # The foreman node broadcasts the global bloom filter to all the probe-side 
scan nodes, which may already have sent out partial data to the hash join 
nodes (currently the hash join node prefetches one batch from both sides).
 # The scan node accepts the global bloom filter from the foreman node and 
uses it to filter the remaining rows.
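The aggregation at the foreman amounts to merging per-fragment filters into one global filter; for same-sized bit arrays this is a bitwise OR (a sketch, not Drill's BloomFilter implementation):

```java
// Sketch of the foreman's aggregation step: partial bloom filters from each
// HashJoin fragment are OR-ed bitwise into one global filter, which then
// behaves as if every build-side key had been inserted into it directly.
public class BloomAggregateSketch {

    public static long[] aggregate(long[][] partialFilters) {
        long[] global = new long[partialFilters[0].length];
        for (long[] partial : partialFilters) {
            for (int i = 0; i < global.length; i++) {
                global[i] |= partial[i]; // union of the set bits
            }
        }
        return global;
    }

    public static void set(long[] filter, int bit) {
        filter[bit >>> 6] |= 1L << (bit & 63);
    }

    public static boolean get(long[] filter, int bit) {
        return (filter[bit >>> 6] & 1L << (bit & 63)) != 0;
    }

    public static void main(String[] args) {
        long[] f1 = new long[4], f2 = new long[4];
        set(f1, 3);   // key hashed on fragment 1
        set(f2, 200); // key hashed on fragment 2
        long[] global = aggregate(new long[][] {f1, f2});
        System.out.println(get(global, 3) && get(global, 200)); // true
    }
}
```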

 

To implement the above execution flow, the main new notions are described below:

      1. RuntimeFilter

It’s a filter container which may contain a BloomFilter or a MinMaxFilter.

      2. RuntimeFilterReporter

It wraps the logic to send the hash join’s bloom filter to the foreman. The 
serialized bloom filter is sent out through the data tunnel. This object is 
instantiated by the FragmentExecutor and passed to the FragmentContext, so the 
HashJoin operator can obtain it through the FragmentContext.

     3. RuntimeFilterRequestHandler

It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
the actual BloomFilter from the network message. It then passes this filter to 
the WorkerBee’s new registerRuntimeFilter interface.

Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
global bloom filter with the WorkerBee via the registerRuntimeFilter method 
and then propagates it to the FragmentContext, from which the probe-side scan 
node can fetch the aggregated bloom filter.

      4. RuntimeFilterManager

The foreman instantiates a RuntimeFilterManager. It indirectly gets every 
RuntimeFilter via the WorkerBee. Once all the BloomFilters have been accepted 
and aggregated, it broadcasts the aggregated bloom filter to all the probe-side 
scan nodes through the data tunnel by a BroadcastRuntimeFilterRequest RPC.

     5. RuntimeFilterEnableOption 

 A global option will be added to decide whether to enable this new feature.

 

Suggestions and advice are welcome. The related PR will be presented as soon 
as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6059) Apply needed StoragePlugins's RuleSet to the planner

2017-12-22 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6059:
--

 Summary: Apply needed StoragePlugins's RuleSet to the planner
 Key: DRILL-6059
 URL: https://issues.apache.org/jira/browse/DRILL-6059
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong
Assignee: weijie.tong
Priority: Minor
 Fix For: 1.13.0


Now, once we configure Drill with more than one StoragePlugin, it applies all 
the plugins' rules to a user's query even when the query does not involve the 
corresponding storage plugin. The reason is the method below in QueryContext:
{code:java}
  public StoragePluginRegistry getStorage() {
return drillbitContext.getStorage();
  }
{code}
 
From QueryContext's name, the method should return the storage plugin registry 
involved in the query, not all the configured storage plugins.

So we need to identify the involved storage plugins at the parse stage and set 
them on the QueryContext. This will also benefit the work toward schema-level 
security control. Maybe a new method named getInvolvedStorage would be more 
accurate.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6058) Define per connection level OptionManager

2017-12-22 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6058:
--

 Summary: Define per connection level OptionManager
 Key: DRILL-6058
 URL: https://issues.apache.org/jira/browse/DRILL-6058
 Project: Apache Drill
  Issue Type: Improvement
Reporter: weijie.tong
Assignee: weijie.tong
Priority: Minor
 Fix For: 1.13.0


We want to control the running frequency of some queries, which need to be 
identified by some options.
One requirement case: we want to limit the download query rate while allowing 
normal queries to run. So we need to define a connection-level OptionManager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-6057) batch kill running queries from web UI

2017-12-22 Thread weijie.tong (JIRA)
weijie.tong created DRILL-6057:
--

 Summary: batch kill running queries from web UI
 Key: DRILL-6057
 URL: https://issues.apache.org/jira/browse/DRILL-6057
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.13.0
 Environment: In some scenarios, Drill suffers highly concurrent queries and 
becomes overloaded. As an administrator, you might want to quickly batch-kill 
all the running queries to keep the system from being overloaded and let it 
recover. Though we provide a web tool to kill running queries one by one, that 
is not efficient in this case.

So this issue will provide an administrator with a web button to kill all the 
running queries.

Reporter: weijie.tong
Assignee: weijie.tong
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5975) Resource utilization

2017-11-18 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5975:
--

 Summary: Resource utilization
 Key: DRILL-5975
 URL: https://issues.apache.org/jira/browse/DRILL-5975
 Project: Apache Drill
  Issue Type: New Feature
Affects Versions: 2.0.0
Reporter: weijie.tong
Assignee: weijie.tong


h1. Motivation

Currently the resource utilization ratio of a Drill cluster is not good: most 
of the cluster's resources are wasted, and we cannot afford many concurrent 
queries. Once the system accepts more queries, even at a modest CPU load, a 
query that was originally very quick becomes slower and slower.

The reason is that Drill does not supply a scheduler. It just assumes all the 
nodes have enough computation resources. Once a query comes in, it schedules 
the related fragments to random nodes without caring about each node's load, 
so some nodes suffer extra CPU context switches to satisfy the incoming query. 
The deeper cause is that the runtime minor fragments construct a runtime tree 
whose nodes are spread across different drillbits. The runtime tree is a 
memory pipeline: all of its nodes stay alive for the whole lifecycle of a 
query, sending data to upper nodes successively, even though some nodes could 
run quickly and quit immediately. What's more, the runtime tree is constructed 
before actual execution, so the scheduling target in Drill becomes the whole 
set of runtime tree nodes.

h1. Design
It would be hard to schedule the runtime tree nodes as a whole, so I try to 
solve this by breaking up the cascade of runtime nodes. The graph below 
describes the initial design.
!https://raw.githubusercontent.com/wiki/weijietong/drill/images/design.png!

Every Drillbit instance will have a RecordBatchManager which accepts all the 
RecordBatches written by the senders of the local MinorFragments. The 
RecordBatchManager holds the RecordBatches first in memory, then in disk 
storage. Once the first RecordBatch from a MinorFragment sender of a query 
arrives, it notifies the FragmentScheduler. The FragmentScheduler is 
instantiated by the Foreman and holds the whole PlanFragment execution graph. 
It allocates a new FragmentExecutor to consume the generated RecordBatches. 
The allocated FragmentExecutor then notifies the corresponding FragmentManager 
that it is ready to receive data. The FragmentManager then sends the 
RecordBatches one by one to the corresponding FragmentExecutor's receiver, as 
the current Sender does, throttling the data stream.

h1. Plan

This will be a large PR, so I plan to divide it into smaller ones:
a. implement the RecordBatchManager.
b. implement a simple random FragmentScheduler and the whole event flow.
c. implement a refined FragmentScheduler, which may reference the Sparrow 
project.






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5913) DrillReduceAggregatesRule

2017-10-28 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5913:
--

 Summary: DrillReduceAggregatesRule 
 Key: DRILL-5913
 URL: https://issues.apache.org/jira/browse/DRILL-5913
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.11.0, 1.9.0
Reporter: weijie.tong


sample query:
{code:java}
select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as 
int)) as col2 from cp.`employee.json`
{code}

error info:

{code:java}

org.apache.drill.exec.rpc.RpcException: 
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
AssertionError: Type mismatch:
rel rowtype:
RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
equivRel rowtype:
RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
[Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
during fragment initialization: Internal error: Error while applying rule 
DrillReduceAggregatesRule, args 
[rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
org.apache.drill.exec.work.foreman.Foreman.run():294
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745
  Caused By (java.lang.AssertionError) Internal error: Error while applying 
rule DrillReduceAggregatesRule, args 
[rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
org.apache.calcite.util.Util.newInternal():792
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
{code}


The reason is that stddev_samp(cast(employee_id as int)) is reduced to 
sum($0), sum($1), count($0), while sum(cast(employee_id as int)) is reduced to 
sum0($0) by DrillReduceAggregatesRule's first match. The rule's second match 
then reduces stddev_samp's sum($0) to sum0($0) as well. But this sum0($0)'s 
data type differs from the first sum0($0): one is integer, the other bigint. 
Calcite's addAggCall method treats them as the same by ignoring their data 
types, which leads to the bigint sum0($0) being replaced by the integer 
sum0($0).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5749) Foreman and Netty threads deadlock

2017-08-29 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5749:
--

 Summary: Foreman and Netty threads deadlock 
 Key: DRILL-5749
 URL: https://issues.apache.org/jira/browse/DRILL-5749
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - RPC
Affects Versions: 1.11.0, 1.10.0
Reporter: weijie.tong
Priority: Critical


When the cluster was under highly concurrent queries and the reused control 
connection hit an exception, the foreman and netty threads each tried to 
acquire the other's lock, and a deadlock occurred. The netty thread holds the 
map (RequestIdMap) lock and tries to acquire the ReconnectingConnection lock 
to send a command, while the foreman thread holds the ReconnectingConnection 
lock and tries to acquire the RequestIdMap lock. So the deadlock happens.
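The standard remedy for this pattern is a consistent lock-acquisition order. A minimal sketch (not Drill's actual fix) where both code paths take the connection lock before the map lock, so the circular wait cannot form:

```java
// Sketch of the lock-ordering discipline that prevents this deadlock:
// both paths acquire connectionLock before mapLock, so no thread can
// hold mapLock while waiting for connectionLock.
public class LockOrderSketch {

    static final Object connectionLock = new Object(); // ReconnectingConnection
    static final Object mapLock = new Object();        // RequestIdMap

    public static int sendCommand(int rpcId) {
        synchronized (connectionLock) {        // 1st: connection
            synchronized (mapLock) {           // 2nd: request-id map
                return rpcId;                  // register listener + send
            }
        }
    }

    public static int handleResponse(int rpcId) {
        // Same order here; the buggy path took mapLock first and then tried
        // to re-enter the connection to send, producing the circular wait.
        synchronized (connectionLock) {
            synchronized (mapLock) {
                return rpcId;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) sendCommand(i); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) handleResponse(i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("no deadlock");
    }
}
```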

Below is the jstack dump:

Found one Java-level deadlock:
=
"265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
  waiting to lock monitor 0x7f935b721f48 (object 0x000656affc40, a 
org.apache.drill.exec.rpc.control.ControlConnectionManager),
  which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"
"265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
  waiting to lock monitor 0x7f90de3b9648 (object 0x0006b524d7e8, a 
com.carrotsearch.hppc.IntObjectHashMap),
  which is held by "BitServer-2"
"BitServer-2":
  waiting to lock monitor 0x7f935b721f48 (object 0x000656affc40, a 
org.apache.drill.exec.rpc.control.ControlConnectionManager),
  which is held by "265aa82f-d8c1-5df0-9946-003a4990db7e:foreman"

Java stack information for the threads listed above:
===
"265aa5cb-e5e2-39ed-9c2f-7658b905372e:foreman":
at 
org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
- waiting to lock <0x000656affc40> (a 
org.apache.drill.exec.rpc.control.ControlConnectionManager)
at 
org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
at 
org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
at 
org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
at 
org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:849)
"265aa82f-d8c1-5df0-9946-003a4990db7e:foreman":
at 
org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:87)
- waiting to lock <0x0006b524d7e8> (a 
com.carrotsearch.hppc.IntObjectHashMap)
at 
org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:153)
at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:115)
at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:89)
at 
org.apache.drill.exec.rpc.control.ControlConnection.send(ControlConnection.java:65)
at 
org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:160)
at 
org.apache.drill.exec.rpc.control.ControlTunnel$SendFragment.doRpcCall(ControlTunnel.java:150)
at 
org.apache.drill.exec.rpc.ListeningCommand.connectionAvailable(ListeningCommand.java:38)
at 
org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:75)
- locked <0x000656affc40> (a 
org.apache.drill.exec.rpc.control.ControlConnectionManager)
at 
org.apache.drill.exec.rpc.control.ControlTunnel.sendFragments(ControlTunnel.java:66)
at 
org.apache.drill.exec.work.foreman.Foreman.sendRemoteFragments(Foreman.java:1210)
at 
org.apache.drill.exec.work.foreman.Foreman.setupNonRootFragments(Foreman.java:1141)
at 
org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:454)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1045)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:849)
"BitServer-2":
at 
org.apache.drill.exec.rpc.ReconnectingConnection.runCommand(ReconnectingConnection.java:72)
- waiting to lock <0x000656affc40> (a 
org.apache.drill.exec.rpc.control.ControlConnectionManager)
at 
org.apache.drill.exec.rpc.control.ControlTunnel.cancelFragment(ControlTunnel.java:71)
at 

[jira] [Created] (DRILL-5734) exclude commons-codec dependency conflict

2017-08-20 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5734:
--

 Summary: exclude commons-codec dependency conflict
 Key: DRILL-5734
 URL: https://issues.apache.org/jira/browse/DRILL-5734
 Project: Apache Drill
  Issue Type: Bug
  Components:  Server
Affects Versions: 1.11.0
 Environment: there are version conflicts between commons-codec 1.4 and 1.10, 
included by contrib/format-maprdb/pom.xml and 
contrib/storage-hive/hive-exec-shade/pom.xml
Reporter: weijie.tong






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5717) date time test cases are not locale independent

2017-08-11 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5717:
--

 Summary: date time test cases are not locale independent
 Key: DRILL-5717
 URL: https://issues.apache.org/jira/browse/DRILL-5717
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.11.0, 1.9.0
Reporter: weijie.tong


Some date time test cases, such as JodaDateValidatorTest, are not locale 
independent. This causes the test phase to fail for users in other locales. We 
should make these test cases independent of the local environment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5696) change default compiler strategy

2017-07-31 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5696:
--

 Summary: change default compiler strategy
 Key: DRILL-5696
 URL: https://issues.apache.org/jira/browse/DRILL-5696
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Codegen
Affects Versions: 1.11.0, 1.10.0, 1.9.0
Reporter: weijie.tong


In our production environment, when we have more than 20 aggregate 
expressions, the compile time is high with the default Janino compiler, but 
when we switch to the JDK compiler we get a much lower compile time than with 
Janino. Our production JDK version is 1.8.

So the default should be the JDK compiler if the user's JDK version is above 
1.7. We should add another check condition to the ClassCompilerSelector.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5691) multiple count distinct query planning error at physical phase

2017-07-26 Thread weijie.tong (JIRA)
weijie.tong created DRILL-5691:
--

 Summary: multiple count distinct query planning error at physical 
phase 
 Key: DRILL-5691
 URL: https://issues.apache.org/jira/browse/DRILL-5691
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.10.0, 1.9.0
Reporter: weijie.tong


I materialized the count-distinct query results in a cache and added a plugin 
rule to translate (Aggregate, Aggregate, Project, Scan) or 
(Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING 
phase. Then, once a user issues count-distinct queries, they are translated 
into queries against the cache to get the result.

eg1: " select count(*), sum(a), count(distinct b) from t where dt=xx " 
eg2: " select count(*), sum(a), count(distinct b), count(distinct c) from t 
where dt=xxx "

eg1 is right and returns the result I expected, but eg2 goes wrong at the 
physical phase. The error info is here: 
https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269. 







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)