[jira] [Updated] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4266:

Attachment: 
memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131

Attached are the results of the same test executed with Drill built from this branch:

{code}
#Generated by Git-Commit-Id-Plugin
#Sun Jan 17 03:49:25 UTC 2016
git.commit.id.abbrev=daa89a6
git.commit.user.email=jacq...@apache.org
git.commit.message.full=DRILL-4131\: Move RPC allocators under Drill's root 
allocator & accounting\n\n- Allow settings to be set to ensure RPC reservation 
and maximums (currently unset by default). Defaults set in drill-module.conf\n- 
Add new metrics to report RPC layer memory consumption.\n- Check for memory 
leaks from RPC layer at shutdown.\n
git.commit.id=daa89a68aa416f8718ad005bc57baf4ac58a9c66
git.commit.message.short=DRILL-4131\: Move RPC allocators under Drill's root 
allocator & accounting
git.commit.user.name=Jacques Nadeau
git.build.user.name=vmarkman
git.commit.id.describe=0.9.0-560-gdaa89a6
git.build.user.email=vmark...@maprtech.com
git.branch=daa89a68aa416f8718ad005bc57baf4ac58a9c66
git.commit.time=16.01.2016 @ 00\:04\:59 UTC
git.build.time=17.01.2016 @ 03\:49\:25 UTC
git.remote.origin.url=g...@github.com\:jacques-n/drill.git
{code}


> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: drill.log.2016-01-12-16, memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105411#comment-15105411
 ] 

Victoria Markman commented on DRILL-4266:
-

Important detail: no memory leak was detected on shutdown. 

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: drill.log.2016-01-12-16, memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105414#comment-15105414
 ] 

Jacques Nadeau commented on DRILL-4266:
---

In these attachments, I don't see the information on what the Web UI is 
reporting in terms of memory allocation. Were you able to capture that and I 
just missed it? The UI will tell us how much memory is owned by the various RPC 
layers as well as other key metrics.
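For anyone reproducing this: a minimal way to capture those numbers alongside the test loop, assuming the drillbit's Web UI runs on the default port 8047 and exposes its Dropwizard metrics as JSON at /status/metrics (the hostname, sampling interval, and output file below are illustrative placeholders):

{code}
// Polls the drillbit metrics endpoint once a minute and appends each JSON
// snapshot, timestamped, to a log file for later comparison across iterations.
import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MetricsPoller {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8047/status/metrics");
    try (FileWriter out = new FileWriter("webui_metrics.log", true)) {
      while (true) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream()))) {
          StringBuilder body = new StringBuilder();
          String line;
          while ((line = in.readLine()) != null) {
            body.append(line);
          }
          out.write(System.currentTimeMillis() + " " + body + "\n");
          out.flush();
        } finally {
          conn.disconnect();
        }
        Thread.sleep(60_000);  // one sample per minute
      }
    }
  }
}
{code}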

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: drill.log.2016-01-12-16, memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Closed] (DRILL-4192) Dir0 and Dir1 from drill-1.4 are messed up

2016-01-18 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4192.


> Dir0 and Dir1 from drill-1.4 are messed up
> --
>
> Key: DRILL-4192
> URL: https://issues.apache.org/jira/browse/DRILL-4192
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.4.0
>Reporter: Krystal
>Assignee: Aman Sinha
>Priority: Blocker
>
> I have the following directories:
> /drill/testdata/temp1/abc/dt=2014-12-30/lineitem.parquet
> /drill/testdata/temp1/abc/dt=2014-12-31/lineitem.parquet
> The following queries returned incorrect data.
> select dir0,dir1 from dfs.`/drill/testdata/temp1` limit 2;
> +----------------+-------+
> |      dir0      | dir1  |
> +----------------+-------+
> | dt=2014-12-30  | null  |
> | dt=2014-12-30  | null  |
> +----------------+-------+
> select dir0 from dfs.`/drill/testdata/temp1` limit 2;
> +----------------+
> |      dir0      |
> +----------------+
> | dt=2014-12-31  |
> | dt=2014-12-31  |
> +----------------+





[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105583#comment-15105583
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user hnfgns commented on the pull request:

https://github.com/apache/drill/pull/325#issuecomment-172604790
  
Cool. Let's sync up on tomorrow's weekly hangout. Let me know if that does 
not work for you.


> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Deneche A. Hakim
>
> We rely on E/PStore interfaces to persist data. Even though E/PStore stands
> for Ephemeral and Persistent stores respectively, the current design for
> EStore does not extend the interface/functionality of PStore at all, which
> hints that the abstraction for EStore is redundant. This issue proposes a new
> unified Store interface, replacing the old E/PStore, that exposes an
> additional method reporting the persistence level, as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings less redundancy and more centralized code, and makes
> the code easier to reason about and maintain.





[jira] [Commented] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105584#comment-15105584
 ] 

ASF GitHub Bot commented on DRILL-4131:
---

Github user adeneche commented on the pull request:

https://github.com/apache/drill/pull/327#issuecomment-172604924
  
apart from a small code change, LGTM. 
Let's run a couple of tests (performance and concurrency) before merging 
this PR.

+1


> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
>






[jira] [Commented] (DRILL-3481) Query with Window Function fails with "SYSTEM ERROR: RpcException: Data not accepted downstream."

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105610#comment-15105610
 ] 

Victoria Markman commented on DRILL-3481:
-

[~agirish] Abhishek, do you happen to remember if you ran this query on a 
cluster (and what kind of configuration) or on a single node?

> Query with Window Function fails with "SYSTEM ERROR: RpcException: Data not 
> accepted downstream."
> -
>
> Key: DRILL-3481
> URL: https://issues.apache.org/jira/browse/DRILL-3481
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Abhishek Girish
>  Labels: window_function
> Fix For: Future
>
> Attachments: JSON Profile.txt, Physical Plan.txt
>
>
> I'm seeing an error when executing a simple WF query on the latest master. 
> Dataset: TPC-DS SF1000 - Parquet
> Git.Commit.ID: b6577fe (Jul 8 15)
> Options:
> {code}
> alter session set `planner.memory.max_query_memory_per_node` = 21474836480;
> {code}
> Query:
> {code:sql}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk ORDER BY 
> ss.ss_store_sk)  FROM store_sales ss WHERE ss.ss_store_sk is not NULL LIMIT 
> 20;
> {code}
> Error:
> {code}
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> RpcException: Data not accepted downstream.
> Fragment 2:1
> [Error Id: 49edceeb-8f10-474d-9cf2-adb4baa13bf4 on ucs-node7.perf.lab:31010]
> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
> at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
> at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
> at sqlline.SqlLine.print(SqlLine.java:1583)
> at sqlline.Commands.execute(Commands.java:852)
> at sqlline.Commands.sql(Commands.java:751)
> at sqlline.SqlLine.dispatch(SqlLine.java:738)
> at sqlline.SqlLine.begin(SqlLine.java:612)
> at sqlline.SqlLine.start(SqlLine.java:366)
> at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> Log:
> {code}
> 2015-07-08 12:17:44,764 ucs-node7.perf.lab [BitClient-2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:2:1: State change requested RUNNING --> 
> FAILED
> 2015-07-08 12:17:44,765 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,765 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,765 ucs-node7.perf.lab [BitClient-2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:2:1: State change requested FAILED --> 
> FAILED
> 2015-07-08 12:17:44,768 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,768 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,769 ucs-node7.perf.lab [BitClient-2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:2:1: State change requested FAILED --> 
> FAILED
> 2015-07-08 12:17:44,771 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,771 ucs-node7.perf.lab [BitClient-2] ERROR 
> o.a.drill.exec.ops.StatusHandler - Data not accepted downstream. Stopping 
> future sends.
> 2015-07-08 12:17:44,772 ucs-node7.perf.lab [BitClient-2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:2:1: State change requested FAILED --> 
> FAILED
> 2015-07-08 12:17:44,790 ucs-node7.perf.lab 
> [2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:frag:2:1] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:2:1: State change requested FAILED --> 
> FINISHED
> 2015-07-08 12:17:44,814 ucs-node7.perf.lab 
> [2a628c9e-3ff4-cc1b-77d2-bab6d00a2d1d:frag:2:1] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: RpcException: Data not 
> accepted downstream.
> Fragment 2:1
> [Error Id: 49edceeb-8f10-474d-9cf2-adb4baa13bf4 on ucs-node7.perf.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: RpcException: 
> Data not accepted downstream.
> Fragment 2:1
> [Error Id: 49edceeb-8f10-474d-9cf2-adb4baa13bf4 on ucs-node7.perf.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523)
>  ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
>  

[jira] [Commented] (DRILL-4270) Create a separate WindowFramer that supports the FRAME clause

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105630#comment-15105630
 ] 

ASF GitHub Bot commented on DRILL-4270:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/322#issuecomment-172619347
  
Renaming sounds fine to avoid misinterpretation.  +1. 


> Create a separate WindowFramer that supports the FRAME clause
> -
>
> Key: DRILL-4270
> URL: https://issues.apache.org/jira/browse/DRILL-4270
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Aman Sinha
> Fix For: Future
>
>
> Currently most of the window function logic is handled by DefaultFrameTemplate.
> Create a separate CustomFrameTemplate that handles the FRAME clause; this
> should make the code in both classes more focused and will make it easier for
> us to add support for the FRAME clause.
> Aggregations, FIRST_VALUE and LAST_VALUE will be handled by
> CustomFrameTemplate, and all remaining window functions (Ranking, ROW_NUMBER,
> LEAD and LAG) will be handled by DefaultFrameTemplate.





[jira] [Created] (DRILL-4280) Kerberos Authentication

2016-01-18 Thread Keys Botzum (JIRA)
Keys Botzum created DRILL-4280:
--

 Summary: Kerberos Authentication
 Key: DRILL-4280
 URL: https://issues.apache.org/jira/browse/DRILL-4280
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Keys Botzum


Drill should support Kerberos-based authentication from clients. This means 
that both the ODBC and JDBC drivers, as well as the web/REST interfaces, should 
support inbound Kerberos. For the Web this would most likely be SPNEGO, while 
for ODBC and JDBC this would be more generic Kerberos.

Since Hive and much of Hadoop support Kerberos, there is potential for a lot 
of reuse of ideas, if not implementation.

Note that this is related to but not the same as 
https://issues.apache.org/jira/browse/DRILL-3584 





[jira] [Created] (DRILL-4281) Drill should support inbound impersonation

2016-01-18 Thread Keys Botzum (JIRA)
Keys Botzum created DRILL-4281:
--

 Summary: Drill should support inbound impersonation
 Key: DRILL-4281
 URL: https://issues.apache.org/jira/browse/DRILL-4281
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Keys Botzum


Today Drill supports impersonation *to* external sources. For example, I can 
authenticate to Drill as myself and then Drill will access HDFS using 
impersonation.

In many scenarios we also need impersonation to Drill. For example, I might use 
some front-end tool (such as Tableau) and authenticate to it as myself. That 
tool (server version) then needs to access Drill to perform queries, and I want 
those queries to run as myself, not as the Tableau user. While in theory the 
intermediate tool could store the userid & password for every user of Drill, 
this isn't a scalable or very secure solution.

Note that HS2 today does support inbound impersonation as described here:  
https://issues.apache.org/jira/browse/HIVE-5155 

The above is not the best approach, as it is tied to the connection object, 
which is very coarse-grained and potentially expensive. It would be better if 
there were a call on the ODBC/JDBC driver to switch the identity on an existing 
connection. Most modern SQL databases (Oracle, DB2) support such a function.
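As a sketch of what that could look like from the JDBC side (purely hypothetical API: neither standard JDBC nor Drill's driver exposes such a call today, and the setEffectiveUser name below is invented for illustration):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InboundImpersonationSketch {
  public static void main(String[] args) throws Exception {
    // The middle tier (e.g. a Tableau server) authenticates once with its
    // own service credentials.
    Connection conn = DriverManager.getConnection(
        "jdbc:drill:zk=localhost:2181", "tableau_svc", "secret");

    // Hypothetical call: switch the effective end-user identity on the
    // existing connection before running that user's query.
    // conn.unwrap(DrillConnection.class).setEffectiveUser("alice");

    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM sys.version")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    } finally {
      conn.close();
    }
  }
}
{code}

The point of a per-connection identity switch, as opposed to the HS2-style per-connection proxy user, is that one pooled connection can serve many end users without being torn down and re-established.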





[jira] [Updated] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4266:

Attachment: WebUI_500_iterations.txt

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Updated] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4266:

Attachment: memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105469#comment-15105469
 ] 

Victoria Markman commented on DRILL-4266:
-

Of course I forgot about WebUI ... Attaching 500 iterations of the same: 
WebUI_500_iterations.txt, 
memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105666#comment-15105666
 ] 

ASF GitHub Bot commented on DRILL-4131:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/327#discussion_r50033210
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/service/ServiceEngine.java 
---
@@ -56,22 +60,83 @@
   private final DrillConfig config;
   boolean useIP = false;
   private final boolean allowPortHunting;
+  private final BufferAllocator userAllocator;
+  private final BufferAllocator controlAllocator;
+  private final BufferAllocator dataAllocator;
+
 
   public ServiceEngine(ControlMessageHandler controlMessageHandler, 
UserWorker userWorker, BootStrapContext context,
   WorkEventBus workBus, WorkerBee bee, boolean allowPortHunting) 
throws DrillbitStartupException {
+userAllocator = newAllocator(context, "rpc:user", 
"drill.exec.rpc.user.server.memory.reservation",
+"drill.exec.rpc.user.server.memory.maximum");
+controlAllocator = newAllocator(context, "rpc:bit-control",
+"drill.exec.rpc.bit.server.memory.control.reservation", 
"drill.exec.rpc.bit.server.memory.control.maximum");
+dataAllocator = newAllocator(context, "rpc:bit-control",
--- End diff --

nice catch
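(For context, an inference from the diff rather than something stated in the thread: the flagged line appears to reuse the "rpc:bit-control" name and control-channel settings for the data allocator, so the fix would presumably be along these lines, with the exact config keys assumed from the naming pattern of the surrounding keys:)

{code}
// Presumed correction: give the data-channel allocator its own name and its
// own reservation/maximum settings instead of the copy-pasted bit-control
// ones. The "...data.reservation"/"...data.maximum" keys are assumptions.
dataAllocator = newAllocator(context, "rpc:bit-data",
    "drill.exec.rpc.bit.server.memory.data.reservation",
    "drill.exec.rpc.bit.server.memory.data.maximum");
{code}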


> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
>






[jira] [Commented] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105669#comment-15105669
 ] 

ASF GitHub Bot commented on DRILL-4131:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/327#discussion_r50033277
  
--- Diff: 
exec/memory/base/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java ---
@@ -59,7 +59,7 @@ public PooledByteBufAllocatorL(MetricRegistry registry) {
 
   public UnsafeDirectLittleEndian allocate(int size) {
 try {
-  return allocator.directBuffer(size, size);
+  return allocator.directBuffer(size, Integer.MAX_VALUE);
--- End diff --

The RPC layer uses ensureWritable to expand buffers. Buffers are expandable up 
to maxCapacity. We were coding maxCapacity == capacity, which meant that buffers 
weren't expandable and the RPC layer would error out. This fixes that issue.
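A standalone illustration of the capacity vs. maxCapacity distinction this change relies on (plain Netty API, not Drill's wrapper; the buffer sizes are arbitrary):

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class EnsureWritableDemo {
  public static void main(String[] args) {
    PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

    // maxCapacity == capacity: the buffer can never grow, so once it is
    // full an ensureWritable() call can only fail.
    ByteBuf fixed = alloc.directBuffer(16, 16);
    try {
      fixed.writerIndex(16);
      fixed.ensureWritable(1);   // throws IndexOutOfBoundsException
    } catch (IndexOutOfBoundsException e) {
      System.out.println("fixed buffer cannot expand: " + e);
    } finally {
      fixed.release();
    }

    // maxCapacity == Integer.MAX_VALUE: ensureWritable() may expand the
    // buffer, which is what the RPC layer expects.
    ByteBuf growable = alloc.directBuffer(16, Integer.MAX_VALUE);
    growable.writerIndex(16);
    growable.ensureWritable(1);  // succeeds; capacity grows
    System.out.println("growable capacity is now " + growable.capacity());
    growable.release();
  }
}
{code}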


> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
>






[jira] [Commented] (DRILL-4131) Update RPC layer to use child allocators of the RootAllocator rather than using the PooledByteBufAllocatorL directly

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105672#comment-15105672
 ] 

ASF GitHub Bot commented on DRILL-4131:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/327#issuecomment-172623959
  
One quick note: I am still receiving failures on a small set of unit tests 
because of metric registration conflicts, so I'll be fixing those. (This happens 
only in tests where a single JVM contains multiple drillbits, since the registry 
is currently static.) Also, I'm concerned that some portion of allocation is not 
getting tracked, since the tests that Vicky just posted on DRILL-4266 show 
a zero peak allocation for the RPC data layer (which doesn't make any sense 
unless all the queries were single-fragmented).
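A minimal sketch of that conflict pattern, assuming Dropwizard Metrics (the metrics library Drill uses); the metric name here is illustrative, not Drill's actual one:

{code}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class StaticRegistryConflict {
  // One JVM-wide registry, standing in for a static registry field.
  static final MetricRegistry METRICS = new MetricRegistry();

  static void startDrillbit(String name) {
    // Each "drillbit" registers a gauge under the same fixed name.
    METRICS.register("drill.allocator.root.used", (Gauge<Long>) () -> 0L);
    System.out.println(name + " registered its gauge");
  }

  public static void main(String[] args) {
    startDrillbit("drillbit-1");
    // Second drillbit in the same JVM: MetricRegistry.register throws
    // IllegalArgumentException ("A metric named ... already exists").
    startDrillbit("drillbit-2");
  }
}
{code}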


> Update RPC layer to use child allocators of the RootAllocator rather than 
> using the PooledByteBufAllocatorL directly
> 
>
> Key: DRILL-4131
> URL: https://issues.apache.org/jira/browse/DRILL-4131
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
>






[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105674#comment-15105674
 ] 

Jacques Nadeau commented on DRILL-4266:
---

The output provided shows minimal use of the RPC layer. Are any of the tests 
you ran going to run in more than one fragment?

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2016-01-18 Thread amit hadke (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105733#comment-15105733
 ] 

amit hadke commented on DRILL-4196:
---

[~vicky] Did you see these errors for queries 40 and 52 during testing of the 
fix for DRILL-4190?

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: query40.tar, query52.tar
>
>
> With hash join disabled, query52.sql and query40.sql returned incorrect results with 1.4.0:
> {noformat}
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> |     version     |                 commit_id                 |                            commit_message                           |        commit_time         | build_email  |         build_time         |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 Add a precondition for size of merge join record batch.  | 08.12.2015 @ 01:25:34 UTC  | Unknown      | 08.12.2015 @ 03:36:25 UTC  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190.
> See attached queries (.sql), expected results (.e_tsv) and actual output (.out).





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105741#comment-15105741
 ] 

Victoria Markman commented on DRILL-4266:
-

No, none of the queries run in more than one fragment.

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration: memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box.
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into the Functional/test directory
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with
> window functions. Now that the new allocator is in place, we reran this test
> and we see similar behavior, and the allocator does not seem to think that we
> have a memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-2517) Apply Partition pruning before reading files during planning

2016-01-18 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105993#comment-15105993
 ] 

Jinfeng Ni commented on DRILL-2517:
---

Pull request: https://github.com/apache/drill/pull/328/files 

The PR contains the changes from both Adam and Mehant. I added some code changes 
on top of theirs.

I did some preliminary performance comparison on my Mac laptop. The dataset has 
115k parquet files in total, organized into 25 directories (1990, 1991, ...), 
each with four subdirectories (Q1, Q2, Q3, Q4). 

For the following query : 
{code}
explain plan for select * from t1 where dir0= 1990 and dir1 = 'Q1';
{code}

The master branch shows 19.4 seconds; the DRILL-2517 patch shows 8.8 seconds. 
Both cases are measured on the second run, with a warm cache. 
{code}
1 row selected (19.434 seconds)

1 row selected (8.845 seconds)
{code} 

The log shows that the time for reading parquet metadata from footers is 
significantly reduced (from 7388ms to 102ms), due to the pruning effect. 

On master branch: 
{code}
Fetch parquet metadata: Executed 115544 out of 115544 using 16 threads. Time: 
7388ms total, 1.019393ms avg, 745ms max.
{code}

With patch:
{code}
Fetch parquet metadata: Executed  out of  using 16 threads. Time: 102ms 
total, 1.053320ms avg, 8ms max.
{code}


> Apply Partition pruning before reading files during planning
> 
>
> Key: DRILL-2517
> URL: https://issues.apache.org/jira/browse/DRILL-2517
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Adam Gilmore
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Partition pruning still tries to read Parquet files during the planning stage 
> even though they don't match the partition filter.
> For example, if there were an invalid Parquet file in a directory that should 
> not be queried:
> {code}
> 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = 
> 1;
> Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet 
> is not a Parquet file (too small)
> {code}
> The reason is that the partition pruning happens after the Parquet plugin 
> tries to read the footer of each file.
> Ideally, partition pruning would happen first before the format plugin gets 
> involved.





[jira] [Closed] (DRILL-4234) Drill running out of memory for a simple CTAS query

2016-01-18 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4234.


Verified and added a testcase

> Drill running out of memory for a simple CTAS query
> ---
>
> Key: DRILL-4234
> URL: https://issues.apache.org/jira/browse/DRILL-4234
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Writer
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.5.0
>
>
> git.commit.id.abbrev=6dea429
> Memory Settings on the cluster
> {code}
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> export DRILL_JAVA_OPTS="-Xms1G -Xmx$DRILL_MAX_HEAP 
> -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=512M 
> -XX:ReservedCodeCacheSize=1G -ea"
> {code}
> The below query runs out of memory
> {code}
> create table `tpch_single_partition/lineitem` partition by (l_moddate) as 
> select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> {code}
> Below is the information from the logs
> {code}
> 2015-12-30 17:44:00,164 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  
> o.a.d.e.w.f.FragmentStatusReporter - 
> 297be821-460b-689f-5b40-8897a6013ffb:0:0: State to report: RUNNING
> 2015-12-30 17:44:00,868 [297be821-460b-689f-5b40-8897a6013ffb:frag:0:0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> [Error Id: 1843305a-cc43-42c5-9b96-e1a887972999 ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266)
>  [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: 
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate sv2, 
> and not enough batchGroups to spill
>   at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:356)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132)
>  

[jira] [Commented] (DRILL-4279) The plan is either confusing or could lead to execution problem, when no columns is required from SCAN

2016-01-18 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106026#comment-15106026
 ] 

Jinfeng Ni commented on DRILL-4279:
---

Turns out that the execution error listed above happens due to the fact that 
the input file has a schema change: when the plan is executed in distributed 
mode, Drill loses the SKIP_ALL mode and has to read every column from the JSON 
file, which hits the schema change. If the plan is executed in single-fragment 
mode, the schema change is not exposed. 

This just confirms that the SKIP_ALL mode is indeed not serialized, and it can 
cause performance overhead, since the SCAN operator has to read every column 
from the data source.
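As a generic illustration of the mechanism (an ordinary Jackson round-trip, not Drill's actual plan classes), a field that is excluded from serialization silently reverts to its default once the plan fragment is shipped to another node:

{code}
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SkipModeRoundTrip {
  public static class ScanColumns {
    public String[] columns = {"*"};
    @JsonIgnore                  // stand-in for "the mode is not serialized"
    public String mode = "ALL";  // default: read every column
  }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();

    ScanColumns planned = new ScanColumns();
    planned.mode = "SKIP_ALL";   // the planner's intent for COUNT(*)-style queries

    String json = mapper.writeValueAsString(planned);  // {"columns":["*"]}
    ScanColumns remote = mapper.readValue(json, ScanColumns.class);

    // remote.mode is back to "ALL": the remote fragment reads every column,
    // which is how the schema change gets exposed only in distributed mode.
    System.out.println(json + " -> mode=" + remote.mode);
  }
}
{code}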



> The plan is either confusing or could lead to execution problem, when no 
> columns is required from SCAN
> --
>
> Key: DRILL-4279
> URL: https://issues.apache.org/jira/browse/DRILL-4279
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> When a query does not require any specific column to be returned from SCAN,
> for instance,
> {code}
> Q1:  select count(*) from T1;
> Q2:  select 1 + 100 from T1;
> Q3:  select  1.0 + random() from T1; 
> {code}
> Drill's planner uses a ColumnList with the * column, plus a SKIP_ALL mode.
> However, the mode is not serialized / deserialized. This leads to two
> problems.
> 1). The EXPLAIN plan is confusing, since there is no way to tell a
> "SELECT *" query from this SKIP_ALL mode.
> For instance,
> {code}
> explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> 00-03  Project($f0=[0])
> 00-04Scan(groupscan=[EasyGroupScan 
> [selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`], 
> files= ... 
> {code} 
> 2) If the query is executed distributed / in parallel, the missing
> serialization of the mode means some fragments fetch all the columns
> while others skip all the columns. That will cause an execution
> error.
> For instance, by changing slice_target to force the query to be executed in
> multiple fragments, it will hit an execution error.
> {code}
> select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: 
> Error parsing JSON - You tried to start when you are using a ValueWriter of 
> type NullableBitWriterImpl.
> {code}
> Directory "t1" just contains two yelp JSON files.
> Ideally, I think when no columns are required from SCAN, the explain plan
> should show an empty column list. The SKIP_ALL mode together with the star
> (*) column seems confusing and error-prone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4278) Memory leak when using LIMIT

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106110#comment-15106110
 ] 

Jacques Nadeau commented on DRILL-4278:
---

While we do hold memory here, I don't believe this causes the explosive issue 
you were seeing. I believe the issue is actually that we cache a large number of 
Hadoop Configuration objects (each of which is fairly sizable). In my example 
tests, ~80% of the heap was being used by those objects. It wouldn't show up by 
default in the Eclipse Memory Analyzer, since I believe that tool hides JVM 
classes (which are what dominate this issue).

> Memory leak when using LIMIT
> 
>
> Key: DRILL-4278
> URL: https://issues.apache.org/jira/browse/DRILL-4278
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.4.0, 1.5.0
> Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | version  |                 commit_id                 |                   commit_message                    |        commit_time         |        build_email         |         build_time         |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | 1.4.0    | 32b871b24c7b69f59a1d2e70f444eed6e599e825  | [maven-release-plugin] prepare release drill-1.4.0  | 08.12.2015 @ 00:24:59 PST  | venki.koruka...@gmail.com  | 08.12.2015 @ 01:14:39 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> |            name             | kind  |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 2 rows selected (0.16 seconds)
>Reporter: jean-claude
>
> copy the parquet files in the samples directory so that you have 12 or so
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> create a file with a few thousand lines like these
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` 
> limit 500;
> start drill
> $ /apache-drill-1.4.0/bin/drill-embedded
> reduce the slice target size to force drill to use multiple fragments/threads
> jdbc:drill:zk=local> system set planner.slice_target=10;
> now run the list of queries from the file you created above
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> The Java heap usage keeps going up until the old generation is at 100%, and
> eventually you get an OutOfMemoryException in Drill.
> $ jstat -gccause 86850 5s
>   S0     S1     E       O       M      CCS      YGC    YGCT      FGC    FGCT       GCT      LGCC                   GCC
>   0.00   0.00 100.00  100.00  98.56  96.71   2279   26.682   240  458.139  484.821  GCLocker Initiated GC  Ergonomics
>   0.00   0.00 100.00   99.99  98.56  96.71   2279   26.682   242  461.347  488.028  Allocation Failure     Ergonomics
>   0.00   0.00 100.00   99.99  98.56  96.71   2279   26.682   245  466.630  493.311  Allocation Failure     Ergonomics
>   0.00   0.00 100.00   99.99  98.56  96.71   2279   26.682   247  470.020  496.702  Allocation Failure     Ergonomics
> If you do the same test but do not use the LIMIT then the memory usage does 
> not go up.
> If you add a where clause so that no results are returned, then the memory 
> usage does not go up.
> Something with the RPC layer?
> Also it seems sensitive to the number of fragments/threads. If you limit it
> to one fragment/thread, the memory usage goes up much more slowly.
> I have used parquet files and CSV files. In either case the behaviour is the 
> same.





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105979#comment-15105979
 ] 

Jacques Nadeau commented on DRILL-4266:
---

Based on these metrics, the leak isn't in the RPC layer. Let me add some more 
metrics and we'll get a better snapshot of the memory allocation caching layer.

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and 
> remained on that level for 14875 iteration of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of 
> memory require 1.8GB of memory after 5 hours of execution ?
> Attached:
> * Memory used after each test iteration : memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box. 
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into Functional/test directory 
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with 
> window functions. Now that the new allocator is in place, we reran this test and 
> we see similar things, and the allocator does not seem to think that we have 
> a memory leak. Hence the speculation that the memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect 
> on memory allocation (speculating again ...)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106033#comment-15106033
 ] 

Victoria Markman commented on DRILL-4266:
-

If you look at the memory output, it looks very strange: it's not growing 
constantly. It grows and drops, then grows a bit again, and drops. I thought 
that it had stabilized somewhere around 1.8GB, but no, in one of the runs it 
went to 2GB. This looks more like fragmentation of some sort. Adding more 
metrics will help a lot. It would be nice if we could graph that stuff as well.
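
A minimal sketch of one way to get something graphable, assuming the Drillbit web UI is reachable at localhost:8047 and that its /status/metrics page serves the metrics as JSON; the sampler just timestamps and logs the raw payload, leaving parsing and plotting to other tools:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Polls the Drillbit metrics page every 10 seconds and prints one
// timestamped line per sample, suitable for graphing later.
public class SampleDrillMetrics {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8047/status/metrics");
    while (true) {
      StringBuilder json = new StringBuilder();
      try (BufferedReader in =
          new BufferedReader(new InputStreamReader(url.openStream()))) {
        String line;
        while ((line = in.readLine()) != null) {
          json.append(line);
        }
      }
      System.out.println(System.currentTimeMillis() + " " + json);
      Thread.sleep(10_000);
    }
  }
}
{code}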

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I have executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and 
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of 
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration : memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box. 
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like '%CHANGED%';
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |               name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type  | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +-----------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into Functional/test directory 
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with 
> window functions. Now that the new allocator is in place, we reran this test and 
> we see similar things, and the allocator does not seem to think that we have 
> a memory leak. Hence the speculation that the memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect 
> on memory allocation (speculating again ...)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4278) Memory leak when using LIMIT

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106102#comment-15106102
 ] 

Jacques Nadeau commented on DRILL-4278:
---

I believe the patch in this pull request will fix the problem. Let me know if 
you have a chance to test on your side:

https://github.com/apache/drill/pull/331

> Memory leak when using LIMIT
> 
>
> Key: DRILL-4278
> URL: https://issues.apache.org/jira/browse/DRILL-4278
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.4.0, 1.5.0
> Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | version  | commit_id                                 | commit_message                                      | commit_time                | build_email                | build_time                 |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | 1.4.0    | 32b871b24c7b69f59a1d2e70f444eed6e599e825  | [maven-release-plugin] prepare release drill-1.4.0  | 08.12.2015 @ 00:24:59 PST  | venki.koruka...@gmail.com  | 08.12.2015 @ 01:14:39 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | name                        | kind  | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 2 rows selected (0.16 seconds)
>Reporter: jean-claude
>
> Copy the parquet files in the samples directory so that you have 12 or so:
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> Create a file with a few thousand lines like this:
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` limit 500;
> Start drill:
> $ /apache-drill-1.4.0/bin/drill-embedded
> Reduce the slice target size to force drill to use multiple fragments/threads:
> jdbc:drill:zk=local> alter system set `planner.slice_target` = 10;
> Now run the list of queries from the file you created above:
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> The Java heap usage keeps going up until the old generation is at 100%, and 
> eventually you get an OutOfMemoryException in Drill:
> $ jstat -gccause 86850 5s
>   S0     S1     E      O       M      CCS     YGC    YGCT     FGC   FGCT      GCT      LGCC                   GCC
>   0.00   0.00   100.00 100.00  98.56  96.71   2279   26.682   240   458.139   484.821  GCLocker Initiated GC  Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   242   461.347   488.028  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   245   466.630   493.311  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   247   470.020   496.702  Allocation Failure     Ergonomics
> If you do the same test but do not use the LIMIT, then the memory usage does 
> not go up.
> If you add a where clause so that no results are returned, then the memory 
> usage does not go up.
> Something in the RPC layer?
> Also, it seems sensitive to the number of fragments/threads. If you limit it 
> to one fragment/thread, the memory usage goes up much slower.
> I have used parquet files and CSV files. In either case the behaviour is the 
> same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105982#comment-15105982
 ] 

ASF GitHub Bot commented on DRILL-4256:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/329

DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.

Creating new instances of HiveConf() is very costly; we should avoid 
creating new ones as much as possible.
Also, get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #329


commit 3769dada12dafc7cd9209551e96184c968d19f73
Author: vkorukanti 
Date:   2016-01-11T23:01:02Z

DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever 
needed.

Creating new instances of HiveConf() is very costly; we should avoid 
creating new ones as much as possible.
Also, get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin 
wherever we need it.
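
The shape of the fix, roughly: build the expensive configuration object once per plugin instance and hand out the cached copy, instead of constructing a fresh one during every planning pass. A sketch of that pattern only; CachedConfPlugin and ExpensiveConf are illustrative stand-ins, not Drill's actual classes:

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative pattern: the costly config object is built once in the
// constructor and every caller shares the cached instance.
public class CachedConfPlugin {
  private final ExpensiveConf conf;

  public CachedConfPlugin(Map<String, String> overrides) {
    ExpensiveConf c = new ExpensiveConf(); // the costly step, done once
    overrides.forEach(c::set);             // overrides applied up front
    this.conf = c;
  }

  public ExpensiveConf getConf() {
    return conf; // no new ExpensiveConf per query plan
  }

  // Stand-in for a config class whose construction is expensive.
  static class ExpensiveConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
  }
}
{code}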




> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading hive tables backed by hbase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and the 
> filter in the query ensures only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4278) Memory leak when using LIMIT

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106101#comment-15106101
 ] 

ASF GitHub Bot commented on DRILL-4278:
---

GitHub user jacques-n opened a pull request:

https://github.com/apache/drill/pull/331

DRILL-4278: Fix issue where WorkspaceConfig was not returning consistent hashCode()s for equal objects.

WorkspaceConfig was generating different hashCode values for equal objects, 
which caused us to cache separate instances of the plugin.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jacques-n/drill DRILL-4278

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/331.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #331


commit 6faed2f380385390c270c6699c653956b8364826
Author: Jacques Nadeau 
Date:   2016-01-19T01:42:33Z

DRILL-4278: Fix issue where WorkspaceConfig was not returning consistent 
hashCode()s for equal objects.
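
For context, the contract being restored: objects that compare equal must return the same hashCode, or any hash-keyed cache will treat them as distinct keys and hold duplicate entries. A minimal sketch of the consistent pattern (the fields here are illustrative, not WorkspaceConfig's actual shape):

{code}
import java.util.Objects;

// equals() and hashCode() are derived from the same fields, so two equal
// configs always produce the same hash and hit the same cache entry.
public final class WorkspaceConfigSketch {
  private final String location;
  private final boolean writable;

  public WorkspaceConfigSketch(String location, boolean writable) {
    this.location = location;
    this.writable = writable;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof WorkspaceConfigSketch)) return false;
    WorkspaceConfigSketch that = (WorkspaceConfigSketch) o;
    return writable == that.writable && Objects.equals(location, that.location);
  }

  @Override
  public int hashCode() {
    return Objects.hash(location, writable); // same fields as equals()
  }
}
{code}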




> Memory leak when using LIMIT
> 
>
> Key: DRILL-4278
> URL: https://issues.apache.org/jira/browse/DRILL-4278
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.4.0, 1.5.0
> Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | version  | commit_id                                 | commit_message                                      | commit_time                | build_email                | build_time                 |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | 1.4.0    | 32b871b24c7b69f59a1d2e70f444eed6e599e825  | [maven-release-plugin] prepare release drill-1.4.0  | 08.12.2015 @ 00:24:59 PST  | venki.koruka...@gmail.com  | 08.12.2015 @ 01:14:39 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | name                        | kind  | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 2 rows selected (0.16 seconds)
>Reporter: jean-claude
>
> Copy the parquet files in the samples directory so that you have 12 or so:
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> Create a file with a few thousand lines like this:
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` limit 500;
> Start drill:
> $ /apache-drill-1.4.0/bin/drill-embedded
> Reduce the slice target size to force drill to use multiple fragments/threads:
> jdbc:drill:zk=local> alter system set `planner.slice_target` = 10;
> Now run the list of queries from the file you created above:
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> The Java heap usage keeps going up until the old generation is at 100%, and 
> eventually you get an OutOfMemoryException in Drill:
> $ jstat -gccause 86850 5s
>   S0     S1     E      O       M      CCS     YGC    YGCT     FGC   FGCT      GCT      LGCC                   GCC
>   0.00   0.00   100.00 100.00  98.56  96.71   2279   26.682   240   458.139   484.821  GCLocker Initiated GC  Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   242   461.347   488.028  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   245   466.630   493.311  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   247   470.020   496.702  Allocation Failure     Ergonomics
> If you do the same test but do not use the LIMIT, then the memory usage does 
> not go up.

[jira] [Assigned] (DRILL-2517) Apply Partition pruning before reading files during planning

2016-01-18 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-2517:
-

Assignee: Jinfeng Ni

> Apply Partition pruning before reading files during planning
> 
>
> Key: DRILL-2517
> URL: https://issues.apache.org/jira/browse/DRILL-2517
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Adam Gilmore
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Partition pruning still tries to read Parquet files during the planning stage 
> even though they don't match the partition filter.
> For example, if there were an invalid Parquet file in a directory that should 
> not be queried:
> {code}
> 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = 1;
> Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet 
> is not a Parquet file (too small)
> {code}
> The reason is that the partition pruning happens after the Parquet plugin 
> tries to read the footer of each file.
> Ideally, partition pruning would happen first before the format plugin gets 
> involved.
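
To make the desired ordering concrete, a minimal sketch that selects files by directory name before opening any of them, so an unreadable file in a pruned directory is never touched (the path and the dir0 = 1 filter follow the example above; this is not Drill's planner code):

{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Prune by the first-level directory name (dir0) before reading any file
// contents; only the surviving files would have their footers read.
public class PruneFirstSketch {
  public static void main(String[] args) throws IOException {
    Path root = Paths.get("/tmp/purchases");
    List<Path> selected = new ArrayList<>();
    try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root)) {
      for (Path dir : dirs) {
        if (!dir.getFileName().toString().equals("1")) {
          continue; // pruned: files in this directory are never opened
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
          for (Path file : files) {
            selected.add(file);
          }
        }
      }
    }
    System.out.println("files whose footers would be read: " + selected);
  }
}
{code}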



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106280#comment-15106280
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/325#issuecomment-172747355
  
Let's do a separate hangout. This seems too deep/specific to cover in a
large group.
On Jan 18, 2016 9:51 AM, "Hanifi Gunes"  wrote:

> Cool. Let's sync up on tomorrow's weekly hangout. Let me know if that does
> not work for you.
>



> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Deneche A. Hakim
>
> We rely on the E/PStore interfaces to persist data. Even though E/PStore stands 
> for Ephemeral and Persistent stores respectively, the current design for 
> EStore does not extend the interface/functionality of PStore at all, which 
> hints that the EStore abstraction is redundant. This issue proposes a new unified 
> Store interface, replacing the old E/PStore, that exposes an additional method 
> reporting the persistence level as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings less redundancy, more centralized code, and ease of 
> reasoning and maintenance.
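
A minimal sketch of what one in-memory implementation of the proposed interface could look like (illustrative only; put() and the concrete class are assumptions beyond the snippet above):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum StoreMode { EPHEMERAL, PERSISTENT }

// The proposed unified interface, as sketched in the description above.
interface Store<V> {
  StoreMode getMode();
  V get(String key);
}

// One illustrative in-memory implementation: the same class serves both
// persistence levels, distinguished only by the mode it reports.
class InMemoryStore<V> implements Store<V> {
  private final Map<String, V> data = new ConcurrentHashMap<>();
  private final StoreMode mode;

  InMemoryStore(StoreMode mode) { this.mode = mode; }

  public StoreMode getMode() { return mode; }
  public V get(String key) { return data.get(key); }
  public void put(String key, V value) { data.put(key, value); }
}
{code}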



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4278) Memory leak when using LIMIT

2016-01-18 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106216#comment-15106216
 ] 

jean-claude commented on DRILL-4278:


Thanks Jacques, however I'm afraid this does not eliminate the issue.

I have done the pull:
$ git pull https://github.com/jacques-n/drill DRILL-4278
Then:
$ mvn clean install -DskipTests

I then re-ran my tests using the tarball built in distribution/target/.

The problem still remains.

Note I don't see a memory leak if I remove the LIMIT from my query or if I 
make the LIMIT larger than the entire data set. The fix you made I'm sure is 
valid, however I don't think it would be related to the LIMIT clause, correct?

I've tried many variants: different file formats, different numbers of files, in 
hdfs and not. The only thing that seems to affect this leak is the use of 
the LIMIT clause.

Another observation that might be of interest: the leak is more pronounced when 
the records (rows) are of a substantial size. See my JSON example above, where I 
have a rather large string. If the row is rather small, then it leaks very slowly.
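
A hedged sketch of a generator for that kind of data set, so the per-record size is easy to vary (the file name, row count, and payload size are made up for illustration):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Writes JSON records with one large string field; varying payloadChars
// changes the row size to compare how quickly memory grows under LIMIT.
public class GenerateLargeRows {
  public static void main(String[] args) throws IOException {
    int rows = 10_000;
    int payloadChars = 4_096;
    char[] filler = new char[payloadChars];
    Arrays.fill(filler, 'x');
    String payload = new String(filler);

    List<String> lines = new ArrayList<>();
    for (int i = 0; i < rows; i++) {
      lines.add("{\"id\": " + i + ", \"payload\": \"" + payload + "\"}");
    }
    Files.write(Paths.get("/tmp/large_rows.json"), lines);
  }
}
{code}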




> Memory leak when using LIMIT
> 
>
> Key: DRILL-4278
> URL: https://issues.apache.org/jira/browse/DRILL-4278
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.4.0, 1.5.0
> Environment: OS X
> 0: jdbc:drill:zk=local> select * from sys.version;
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | version  | commit_id                                 | commit_message                                      | commit_time                | build_email                | build_time                 |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> | 1.4.0    | 32b871b24c7b69f59a1d2e70f444eed6e599e825  | [maven-release-plugin] prepare release drill-1.4.0  | 08.12.2015 @ 00:24:59 PST  | venki.koruka...@gmail.com  | 08.12.2015 @ 01:14:39 PST  |
> +----------+-------------------------------------------+-----------------------------------------------------+----------------------------+----------------------------+----------------------------+
> 0: jdbc:drill:zk=local> select * from sys.options where status <> 'DEFAULT';
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | name                        | kind  | type    | status   | num_val  | string_val  | bool_val  | float_val  |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> | planner.slice_target        | LONG  | SYSTEM  | CHANGED  | 10       | null        | null      | null       |
> | planner.width.max_per_node  | LONG  | SYSTEM  | CHANGED  | 5        | null        | null      | null       |
> +-----------------------------+-------+---------+----------+----------+-------------+-----------+------------+
> 2 rows selected (0.16 seconds)
>Reporter: jean-claude
>
> Copy the parquet files in the samples directory so that you have 12 or so:
> $ ls -lha /apache-drill-1.4.0/sample-data/nationsMF/
> nationsMF1.parquet
> nationsMF2.parquet
> nationsMF3.parquet
> Create a file with a few thousand lines like this:
> select * from dfs.`/Users/jccote/apache-drill-1.4.0/sample-data/nationsMF` limit 500;
> Start drill:
> $ /apache-drill-1.4.0/bin/drill-embedded
> Reduce the slice target size to force drill to use multiple fragments/threads:
> jdbc:drill:zk=local> alter system set `planner.slice_target` = 10;
> Now run the list of queries from the file you created above:
> jdbc:drill:zk=local> !run /Users/jccote/test-memory-leak-using-limit.sql
> The Java heap usage keeps going up until the old generation is at 100%, and 
> eventually you get an OutOfMemoryException in Drill:
> $ jstat -gccause 86850 5s
>   S0     S1     E      O       M      CCS     YGC    YGCT     FGC   FGCT      GCT      LGCC                   GCC
>   0.00   0.00   100.00 100.00  98.56  96.71   2279   26.682   240   458.139   484.821  GCLocker Initiated GC  Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   242   461.347   488.028  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   245   466.630   493.311  Allocation Failure     Ergonomics
>   0.00   0.00   100.00  99.99  98.56  96.71   2279   26.682   247   470.020   496.702  Allocation Failure     Ergonomics
> If you do the same test but do not use the LIMIT, then the memory usage does 
> not go up.
> If you add a where clause so that no results are returned, then the memory 
> usage does not go up.

[jira] [Updated] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2016-01-18 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-4196:

Attachment: 1.5.0-amit-branch_tpcds_sf1.txt

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: 1.5.0-amit-branch_tpcds_sf1.txt, query40.tar, query52.tar
>
>
> With hash join disabled, query52.sql and query40.sql returned incorrect 
> results with 1.4.0:
> {noformat}
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                       | commit_time                | build_email  | build_time                 |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 Add a precondition for size of merge join record batch.   | 08.12.2015 @ 01:25:34 UTC  | Unknown      | 08.12.2015 @ 03:36:25 UTC  |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190
> See attached queries (.sql), expected result (.e_tsv) and actual output (.out)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106289#comment-15106289
 ] 

Victoria Markman commented on DRILL-4196:
-

Attached are the results: the wrong results are still there. Pretty much the same 
queries that were reported by [~cch...@maprtech.com] in DRILL-4190.

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: 1.5.0-amit-branch_tpcds_sf1.txt, query40.tar, query52.tar
>
>
> With hash join disabled, query52.sql and query40.sql returned incorrect 
> results with 1.4.0:
> {noformat}
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                       | commit_time                | build_email  | build_time                 |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 Add a precondition for size of merge join record batch.   | 08.12.2015 @ 01:25:34 UTC  | Unknown      | 08.12.2015 @ 03:36:25 UTC  |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190
> See attached queries (.sql), expected result (.e_tsv) and actual output (.out)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4196) some TPCDS queries return wrong result when hash join is disabled

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105758#comment-15105758
 ] 

Victoria Markman commented on DRILL-4196:
-

[~aah] I did see verification failures, though I'm not sure about queries 40 
and 52. Let me dig that out or rerun it. Stay tuned.

> some TPCDS queries return wrong result when hash join is disabled
> -
>
> Key: DRILL-4196
> URL: https://issues.apache.org/jira/browse/DRILL-4196
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>Assignee: amit hadke
> Attachments: query40.tar, query52.tar
>
>
> With hash join disabled, query52.sql and query40.sql returned incorrect 
> results with 1.4.0:
> {noformat}
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                       | commit_time                | build_email  | build_time                 |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | 1.4.0-SNAPSHOT  | b9068117177c3b47025f52c00f67938e0c3e4732  | DRILL-4165 Add a precondition for size of merge join record batch.   | 08.12.2015 @ 01:25:34 UTC  | Unknown      | 08.12.2015 @ 03:36:25 UTC  |
> +-----------------+-------------------------------------------+----------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (2.13 seconds)
> {noformat}
> Setup and options are the same as in DRILL-4190
> See attached queries (.sql), expected result (.e_tsv) and actual output (.out)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)