[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150037#comment-15150037
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/376#issuecomment-185074140
  
Simplified the state management in FileSelection.  @jacques-n, is this close 
enough to what you intended?  Also, I want to note that this JIRA is motivated 
by a performance enhancement, so I haven't added a new unit test.  


> Do lazy reading of parquet metadata cache file
> --
>
> Key: DRILL-4287
> URL: https://issues.apache.org/jira/browse/DRILL-4287
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Jinfeng Ni
>
> Currently, the parquet metadata cache file is read eagerly during creation of 
> the DrillTable (as part of ParquetFormatMatcher.isReadable()).  This is not 
> desirable from a performance standpoint, since there are scenarios where we 
> want to do some up-front optimizations - e.g. directory-based partition 
> pruning (see DRILL-2517) or a potential limit 0 optimization - and in such 
> situations it is better to read the metadata cache file lazily.   
> This is a placeholder to perform such delayed reading, since it is needed for 
> the aforementioned optimizations.  
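The eager-versus-lazy distinction above can be sketched with a memoizing holder. `LazyMetadata` is a hypothetical illustration, not Drill's actual API; the real change lives in FileSelection / ParquetFormatMatcher:

```java
import java.util.function.Supplier;

// Hypothetical sketch: defer an expensive read until a consumer actually needs it.
public class LazyMetadata {
    private Object cached;               // parsed metadata, null until first use
    private final Supplier<Object> reader;

    public LazyMetadata(Supplier<Object> reader) {
        this.reader = reader;            // e.g. () -> readCacheFile(path), a hypothetical reader
    }

    // Planning steps such as directory-based pruning never call this,
    // so the cache file is never touched on their code path.
    public synchronized Object get() {
        if (cached == null) {
            cached = reader.get();       // the read happens here, on first demand
        }
        return cached;
    }
}
```

With this shape, a plan that is fully resolved by pruning alone never pays the cost of parsing the cache file.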



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2016-02-16 Thread liyun Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150025#comment-15150025
 ] 

liyun Liu edited comment on DRILL-4039 at 2/17/16 7:24 AM:
---

I am using Drill 1.4.0.

Launch the Drill shell with bin/drill-embedded. Issuing a SQL statement that 
contains Chinese characters produces the following error:

{quote}
0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` WHERE last_name = '世界' 
LIMIT 3;
Feb 17, 2016 11:17:38 AM org.apache.calcite.runtime.CalciteException 
SEVERE: org.apache.calcite.runtime.CalciteException: Failed to encode '世界' in 
character set 'ISO-8859-1'
Error: SYSTEM ERROR: CalciteException: Failed to encode '世界' in character set 
'ISO-8859-1'


[Error Id: 33cfc8ba-acde-4122-9020-cf61abbf2b42 on jinglin:31010] 
(state=,code=0)
{quote}
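The root cause is reproducible outside Drill: ISO-8859-1 has no code points for CJK characters, so any encoder targeting it must reject '世界' (Calcite reportedly defaults its SQL character set to ISO-8859-1 unless configured otherwise). A minimal, self-contained check, not Drill code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Returns true when every character of s is representable in the charset.
    public static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // '世界' has no mapping in ISO-8859-1, hence the CalciteException above.
        System.out.println(canEncode("世界", StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode("世界", StandardCharsets.UTF_8));      // true
    }
}
```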

The log/sqlline.log file contains the following stack trace:

{quote}
[Error Id: 33cfc8ba-acde-4122-9020-cf61abbf2b42 on jinglin:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
 ~[drill-common-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
[drill-common-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
 [drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
[drill-java-exec-1.4.0.jar:1.4.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
[drill-java-exec-1.4.0.jar:1.4.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_72]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: while converting 
`employee.json`.`last_name` = '世界'
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: while converting 
`employee.json`.`last_name` = '世界'
at org.apache.calcite.util.Util.newInternal(Util.java:792) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.ReflectiveConvertletTable$1.convertCall(ReflectiveConvertletTable.java:96)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:59)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4165)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3598)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:130) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4057)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertWhere(SqlToRelConverter.java:920)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:606)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:583)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2790)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:537)
 ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at org.apache.calcite.prepare.PlannerImpl.convert(PlannerImpl.java:214) 
~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:471)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:201)
 ~[drill-java-exec-1.4.0.jar:1.4.0]
at 

[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-16 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149901#comment-15149901
 ] 

Jinfeng Ni commented on DRILL-4392:
---

This issue seems to be a regression introduced by DRILL-4382 [1]. I moved to 
one commit earlier and did not see the problem.

Also, the problem occurs even when the SELECT clause of the CTAS does not 
contain a * column. That is, the additional internal field does not seem to be 
added by the * column logic alone.

{code}
create table nation_ctas partition by (n_regionkey) as select n_nationkey, 
n_regionkey, n_name from cp.`tpch/nation.parquet`;
select * from dfs.tmp.nation_ctas;

+--------------+--------------+-----------+-----------------------------------------+
| n_nationkey  | n_regionkey  | n_name    | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
+--------------+--------------+-----------+-----------------------------------------+
| 5            | 0            | ETHIOPIA  | true                                    |
| 15           | 0            | MOROCCO   | false                                   |
+--------------+--------------+-----------+-----------------------------------------+
{code}


[1] 
https://github.com/apache/drill/commit/9a3a5c4ff670a50a49f61f97dd838da59a12f976

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                     | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> A Parquet table created by Drill's CTAS statement contains an internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field does not 
> affect non-star queries, but causes incorrect results for star queries.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                       | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                 | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?      | false                                   |
> | 14           | KENYA          | 0            |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t   | false                                   |
> | 0            | ALGERIA        | 0            |  haggle. carefully final deposits detect slyly agai                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                   | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual 

[jira] [Updated] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4392:
--
Assignee: Steven Phillips

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                     | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> A Parquet table created by Drill's CTAS statement contains an internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field does not 
> affect non-star queries, but causes incorrect results for star queries.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                       | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                 | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?      | false                                   |
> | 14           | KENYA          | 0            |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t   | false                                   |
> | 0            | ALGERIA        | 0            |  haggle. carefully final deposits detect slyly agai                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                   | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be  | true                                    |
> {code}
> This basically breaks all the Parquet files created by Drill's CTAS with 
> partition support.
> It will also fail one of the pre-commit functional tests [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q





[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149867#comment-15149867
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53121603
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -47,6 +49,14 @@
   public List files;
   public final String selectionRoot;
 
+  private enum StatusType {
+    CHECKED_DIRS,    // whether we have already checked for directories
+    HAS_DIRS,        // whether directories were found in the selection
+    EXPANDED_DIRS    // whether this selection has been expanded to files
+  }
+
+  private final BitSet dirStatus;
--- End diff --

Sorry I wasn't clearer. 







[jira] [Created] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-16 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-4392:
-

 Summary: CTAS with partition writes an internal field into 
generated parquet files
 Key: DRILL-4392
 URL: https://issues.apache.org/jira/browse/DRILL-4392
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Blocker


On today's master branch:

{code}
select * from sys.version;
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
| version         | commit_id                                 | commit_message                                                     | commit_time                | build_email      | build_time                 |
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
| 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
+-----------------+-------------------------------------------+--------------------------------------------------------------------+----------------------------+------------------+----------------------------+
{code}

A Parquet table created by Drill's CTAS statement contains an internal field 
"P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field does not 
affect non-star queries, but causes incorrect results for star queries.

{code}
use dfs.tmp;

create table nation_ctas partition by (n_regionkey) as select * from 
cp.`tpch/nation.parquet`;

select * from dfs.tmp.nation_ctas limit 6;
+--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
| n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                       | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
+--------------+----------------+--------------+-------------------------------------------------------------------------------------------------+-----------------------------------------+
| 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                 | true                                    |
| 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?      | false                                   |
| 14           | KENYA          | 0            |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t   | false                                   |
| 0            | ALGERIA        | 0            |  haggle. carefully final deposits detect slyly agai                                             | false                                   |
| 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                   | false                                   |
| 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be  | true                                    |
{code}

This basically breaks all the Parquet files created by Drill's CTAS with 
partition support.

It will also fail one of the pre-commit functional tests [1].

[1] 
https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q






[jira] [Commented] (DRILL-4264) Dots in identifier are not escaped correctly

2016-02-16 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149859#comment-15149859
 ] 

Zelaine Fong commented on DRILL-4264:
-

Although the error is slightly different, DRILL-3922 looks like it might be 
related.

> Dots in identifier are not escaped correctly
> 
>
> Key: DRILL-4264
> URL: https://issues.apache.org/jira/browse/DRILL-4264
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Alex
>
> If you have some json data like this...
> {code:javascript}
> {
>   "0.0.1":{
> "version":"0.0.1",
> "date_created":"2014-03-15"
>   },
>   "0.1.2":{
> "version":"0.1.2",
> "date_created":"2014-05-21"
>   }
> }
> {code}
> ... there is no way to select any of the rows, since their identifiers contain 
> dots; when trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference 
> "0.0.1"; a field reference identifier must not have the form of a qualified 
> name
> This must be fixed, since many JSON data files contain dots in some of their 
> keys (e.g., version numbers).
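For illustration only (this is not Drill's actual parser), the ambiguity behind the error can be reproduced with a naive dot-split: an unescaped identifier like "0.0.1" is indistinguishable from a three-part qualified name.

```java
import java.util.Arrays;
import java.util.List;

public class FieldRef {
    // Hypothetical sketch of the ambiguity: without proper escaping, a dotted
    // identifier parses exactly like a qualified name a.b.c.
    public static List<String> asQualifiedName(String identifier) {
        return Arrays.asList(identifier.split("\\."));
    }

    public static void main(String[] args) {
        // The JSON key "0.0.1" becomes a three-part path, not one field name.
        System.out.println(asQualifiedName("0.0.1")); // [0, 0, 1]
    }
}
```

The fix has to preserve the backtick-escaped identifier as a single segment all the way down to the field-reference layer.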





[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149840#comment-15149840
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53120105
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -47,6 +49,14 @@
   public List files;
   public final String selectionRoot;
 
+  private enum StatusType {
+    CHECKED_DIRS,    // whether we have already checked for directories
+    HAS_DIRS,        // whether directories were found in the selection
+    EXPANDED_DIRS    // whether this selection has been expanded to files
+  }
+
+  private final BitSet dirStatus;
--- End diff --

Yeah, there was probably some disconnect, since your previous comment was a 
little succinct :).  I was trying to keep track of multiple states, but I agree 
it can be simplified.  I may not need to use enum constructors. I will 
post an updated patch once I get a clean test run.







[jira] [Updated] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server

2016-02-16 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-4354:
--
Fix Version/s: (was: 1.5.0)
   1.6.0

> Remove sessions in anonymous (user auth disabled) mode in WebUI server
> --
>
> Key: DRILL-4354
> URL: https://issues.apache.org/jira/browse/DRILL-4354
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.6.0
>
>
> Currently we open anonymous sessions when user auth is disabled. These 
> sessions are cleaned up when they expire (controlled by the boot config 
> option {{drill.exec.http.session_max_idle_secs}}). This may lead to 
> unnecessary resource accumulation. This JIRA is to remove anonymous sessions 
> and only have sessions when user authentication is enabled.





[jira] [Reopened] (DRILL-4354) Remove sessions in anonymous (user auth disabled) mode in WebUI server

2016-02-16 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reopened DRILL-4354:
---

> Remove sessions in anonymous (user auth disabled) mode in WebUI server
> --
>
> Key: DRILL-4354
> URL: https://issues.apache.org/jira/browse/DRILL-4354
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
>
> Currently we open anonymous sessions when user auth is disabled. These 
> sessions are cleaned up when they expire (controlled by the boot config 
> option {{drill.exec.http.session_max_idle_secs}}). This may lead to 
> unnecessary resource accumulation. This JIRA is to remove anonymous sessions 
> and only have sessions when user authentication is enabled.





[jira] [Commented] (DRILL-4390) Drill webserver cannot get static assets if another jar contains a rest/static directory

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149789#comment-15149789
 ] 

ASF GitHub Bot commented on DRILL-4390:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/378#issuecomment-185001567
  
LGTM +1


> Drill webserver cannot get static assets if another jar contains a 
> rest/static directory
> 
>
> Key: DRILL-4390
> URL: https://issues.apache.org/jira/browse/DRILL-4390
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
>
> WebServer is configured to serve the static/ URL from a resource containing 
> "rest/static", but if a jar other than the drill-java-exec jar contains the 
> same directory, all queries to static/ will most likely result in 404s.
> I propose to be more precise about which resource to use, and to find the one 
> containing the Drill favicon under rest/static.





[jira] [Created] (DRILL-4391) browsing metadata via SQLSquirrel shows Postgres indexes, primary and foreign keys as tables

2016-02-16 Thread N Campbell (JIRA)
N Campbell created DRILL-4391:
-

 Summary: browsing metadata via SQLSquirrel shows Postgres indexes, 
primary and foreign keys as tables
 Key: DRILL-4391
 URL: https://issues.apache.org/jira/browse/DRILL-4391
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.4.0
Reporter: N Campbell


Apache Drill has a storage plugin defined to access a Postgres database. 
A schema in the database has several tables, each of which has indexes, primary 
keys, foreign keys, or a combination of them. 
When SQLSquirrel presents metadata from the Drill JDBC driver, the list of 
tables includes entries that correspond to the indexes, primary keys, or 
foreign keys in the schema. The implication is that non-standard JDBC metadata 
methods are being used to obtain this information.





[jira] [Commented] (DRILL-4390) Drill webserver cannot get static assets if another jar contains a rest/static directory

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149608#comment-15149608
 ] 

ASF GitHub Bot commented on DRILL-4390:
---

GitHub user laurentgo opened a pull request:

https://github.com/apache/drill/pull/378

DRILL-4390: Uses Resource where Drill favicon is located for static assets

Drill Webserver uses the first jar containing a rest/static directory to
find its static assets. In case of another jar containing this directory, it
might cause the webserver to return 404 errors.

This configures the server to use the resource containing the Drill favicon
as the place to look for all static resources.
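A rough sketch of the approach the patch describes: scan the classpath entries that expose rest/static and keep the one that also holds the favicon. The class name and exact favicon path below are hypothetical, not the actual patch:

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.Enumeration;

public class StaticRootFinder {
    // Among all classpath entries exposing "rest/static", pick the one that
    // also contains the Drill favicon, so another jar's rest/static directory
    // cannot shadow Drill's own assets.
    public static URL findStaticRoot(ClassLoader cl) throws IOException {
        Enumeration<URL> roots = cl.getResources("rest/static");
        for (URL root : Collections.list(roots)) {
            URL probe = new URL(root.toExternalForm() + "/favicon.ico");
            try (java.io.InputStream in = probe.openStream()) {
                return root;                 // favicon found: this is Drill's jar
            } catch (IOException ignored) {
                // not Drill's copy of rest/static; keep looking
            }
        }
        return null;                         // no entry carries the favicon
    }
}
```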

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laurentgo/drill laurent/DRILL-4390

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #378


commit a97c5506cfa800b37258962a8d049fd1b72852e4
Author: Laurent Goujon 
Date:   2016-02-17T00:47:17Z

DRILL-4390: Uses Resource where Drill favicon is located for static assets

Drill Webserver uses the first jar containing a rest/static directory to
find its static assets. In case of another jar containing this directory, it
might cause the webserver to return 404 errors.

This configures the server to use the resource containing the Drill favicon
as the place to look for all static resources.









[jira] [Created] (DRILL-4390) Drill webserver cannot get static assets if another jar contains a rest/static directory

2016-02-16 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-4390:
-

 Summary: Drill webserver cannot get static assets if another jar 
contains a rest/static directory
 Key: DRILL-4390
 URL: https://issues.apache.org/jira/browse/DRILL-4390
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Reporter: Laurent Goujon
Assignee: Laurent Goujon
Priority: Minor


WebServer is configured to serve the static/ URL from a resource containing 
"rest/static", but if a jar other than the drill-java-exec jar contains the 
same directory, all queries to static/ will most likely result in 404s.

I propose to be more precise about which resource to use, and to find the one 
containing the Drill favicon under rest/static.





[jira] [Commented] (DRILL-4372) Let Operators and Functions expose the types

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149577#comment-15149577
 ] 

ASF GitHub Bot commented on DRILL-4372:
---

Github user hsuanyi closed the pull request at:

https://github.com/apache/drill/pull/370


> Let Operators and Functions expose the types 
> -
>
> Key: DRILL-4372
> URL: https://issues.apache.org/jira/browse/DRILL-4372
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Sean Hsuan-Yi Chu
>
> Currently, for most operators / functions, Drill always claims the return 
> type to be nullable-ANY. 
> However, in many cases (such as Hive tables, views, etc.), the types of the 
> input columns are known. So, along with resolving to the correct operators / 
> functions, we can infer the output types at planning time. 
> Having this mechanism can help speed up many applications, especially those 
> where schemas alone are sufficient (e.g., LIMIT 0).
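The proposed inference can be sketched as follows. The type model below (`SqlType`, `RelType`, `inferPlus`) is hypothetical and far simpler than Calcite's, but it shows the planning-time rule: known input types yield a concrete output type, and the output is nullable iff any input is; unknown inputs fall back to nullable ANY.

```java
public class ReturnTypeInference {
    public enum SqlType { ANY, INTEGER, VARCHAR }

    public static final class RelType {
        public final SqlType type;
        public final boolean nullable;
        public RelType(SqlType type, boolean nullable) {
            this.type = type;
            this.nullable = nullable;
        }
    }

    // Hypothetical sketch: an arithmetic operator exposes a concrete result
    // type at planning time when both operand types are known; otherwise it
    // degrades to nullable ANY, which is what the JIRA says Drill reports today.
    public static RelType inferPlus(RelType a, RelType b) {
        if (a.type == SqlType.ANY || b.type == SqlType.ANY) {
            return new RelType(SqlType.ANY, true);
        }
        return new RelType(a.type, a.nullable || b.nullable);
    }
}
```

With this in place, a LIMIT 0 query over typed inputs can report its schema without executing anything.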





[jira] [Commented] (DRILL-4372) Let Operators and Functions expose the types

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149576#comment-15149576
 ] 

ASF GitHub Bot commented on DRILL-4372:
---

Github user hsuanyi commented on the pull request:

https://github.com/apache/drill/pull/370#issuecomment-184940494
  
Will send out a rebased and updated code shortly







[jira] [Updated] (DRILL-2517) Apply Partition pruning before reading files during planning

2016-02-16 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-2517:
-
Reviewer: Rahul Challapalli

> Apply Partition pruning before reading files during planning
> 
>
> Key: DRILL-2517
> URL: https://issues.apache.org/jira/browse/DRILL-2517
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Adam Gilmore
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Partition pruning still tries to read Parquet files during the planning stage 
> even though they don't match the partition filter.
> For example, if there were an invalid Parquet file in a directory that should 
> not be queried:
> {code}
> 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = 
> 1;
> Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet 
> is not a Parquet file (too small)
> {code}
> The reason is that the partition pruning happens after the Parquet plugin 
> tries to read the footer of each file.
> Ideally, partition pruning would happen first before the format plugin gets 
> involved.
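The fix amounts to reordering planning steps: filter candidate files by their directory path before any Parquet footer is opened, so corrupt files in pruned directories are never touched. A hypothetical sketch, not Drill's actual planner code:

```java
import java.util.ArrayList;
import java.util.List;

public class DirPruning {
    // Hypothetical sketch: apply a dir0 = <value> filter purely on path names,
    // so footers of files in non-matching directories (including corrupt ones)
    // are never opened.
    public static List<String> pruneByDir0(List<String> paths, String root, String dir0) {
        List<String> kept = new ArrayList<>();
        String prefix = root + "/" + dir0 + "/";
        for (String p : paths) {
            if (p.startsWith(prefix)) {
                kept.add(p);    // only these files proceed to footer reads
            }
        }
        return kept;
    }
}
```

In the example from the description, `/tmp/purchases/4/0_0_0.parquet` would be eliminated by the path check alone, and its invalid footer would never be read.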





[jira] [Assigned] (DRILL-4308) Aggregate operations on dir columns can be more efficient for certain use cases

2016-02-16 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4308:
---

Assignee: Jinfeng Ni

[~jni] - I believe the changes you're currently working on as part of 
DRILL-4387 will address this.  Right?

> Aggregate operations on dir columns can be more efficient for certain use 
> cases
> --
>
> Key: DRILL-4308
> URL: https://issues.apache.org/jira/browse/DRILL-4308
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Jinfeng Ni
>
> For queries that perform plain aggregates or DISTINCT operations on the 
> directory partition columns (dir0, dir1 etc.) and there are no other columns 
> referenced in the query, the performance could be substantially improved by 
> not having to scan the entire dataset.   
> Consider the following types of queries:
> {noformat}
> select  min(dir0) from largetable;
> select  distinct dir0 from largetable;
> {noformat}
> The number of distinct values of dir columns is typically quite small and 
> there's no reason to scan the large table.  This has also come up as user 

> feedback from some Drill users.  Of course, if there's any other column 
> referenced in the query (WHERE, ORDER-BY etc.) then we cannot apply this 
> optimization.  
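The optimization described here can be illustrated with a small sketch (my own illustration, not Drill's code; it assumes a single partition level where `dir0` is the first path component under the table root). The distinct values, and hence `min`/`max`, are recoverable from the file listing alone:

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class DirColumns {
  // Answer "SELECT DISTINCT dir0" (and min/max over dir0) from file paths
  // only, without scanning any row data.
  static SortedSet<String> distinctDir0(List<String> filePaths, String root) {
    SortedSet<String> dirs = new TreeSet<>();
    for (String p : filePaths) {
      String rel = p.substring(root.length() + 1);   // strip "<root>/"
      int slash = rel.indexOf('/');
      if (slash > 0) {
        dirs.add(rel.substring(0, slash));           // first path component
      }
    }
    return dirs;
  }
}
```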



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149358#comment-15149358
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53083386
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
@@ -47,6 +49,14 @@
   public List<String> files;
   public final String selectionRoot;
 
+  private enum StatusType {
+    CHECKED_DIRS,    // whether we have already checked for directories
+    HAS_DIRS,        // whether directories were found in the selection
+    EXPANDED_DIRS    // whether this selection has been expanded to files
+  }
+
+  private final BitSet dirStatus;
--- End diff --

You took this differently than I meant it. My proposal was that 
FileSelection has various states:

NOT_CHECKED_DIRS => (HAS_DIRS | NO_DIRS) => EXPANDED

Doesn't this lifecycle describe the state of FileSelection? This way you 
don't have the multi-state-management problem you currently have below with 
this kind of construct: 

fileSel.setExpanded(true);
fileSel.setCheckedForDirectories(true);
fileSel.setHasDirectories(false);  // already expanded

For each of the enumerations, we can return the right booleans that you 
need through enumerator constructors. 
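A minimal sketch of the enum-with-constructor-booleans idea described above (state and field names are illustrative, not Drill's actual implementation). Each lifecycle state carries the booleans callers need, so a single field replaces the three separate setters:

```java
class SelectionLifecycle {
  enum DirStatus {
    // (checkedForDirs, hasDirs, expanded)
    NOT_CHECKED_DIRS(false, false, false),
    NO_DIRS(true, false, false),
    HAS_DIRS(true, true, false),
    EXPANDED(true, false, true);   // dirs already expanded into files

    final boolean checkedForDirs;
    final boolean hasDirs;
    final boolean expanded;

    DirStatus(boolean checkedForDirs, boolean hasDirs, boolean expanded) {
      this.checkedForDirs = checkedForDirs;
      this.hasDirs = hasDirs;
      this.expanded = expanded;
    }
  }
}
```

Setting `fileSel.setExpanded(true); setCheckedForDirectories(true); setHasDirectories(false);` collapses into a single assignment of `DirStatus.EXPANDED`.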


> Do lazy reading of parquet metadata cache file
> --
>
> Key: DRILL-4287
> URL: https://issues.apache.org/jira/browse/DRILL-4287
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.4.0
>Reporter: Aman Sinha
>Assignee: Jinfeng Ni
>
> Currently, the parquet metadata cache file is read eagerly during creation of 
> the DrillTable (as part of ParquetFormatMatcher.isReadable()).  This is not 
> desirable from a performance standpoint since there are scenarios where we want 
> to do some up-front optimizations - e.g. directory-based partition pruning 
> (see DRILL-2517) or potential limit 0 optimization etc. - and in such 
> situations it is better to do lazy reading of the metadata cache file.   
> This is a placeholder to perform such delayed reading since it is needed for 
> the aforementioned optimizations.  
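The lazy-read pattern the issue asks for can be sketched as a memoized loader (a hedged illustration, not Drill's actual code): defer the expensive metadata read until a consumer first needs it, and cache the result so repeated callers do not re-read the file.

```java
import java.util.function.Supplier;

// Defers loader.get() (e.g. reading the metadata cache file) until the
// first call to get(), then memoizes the result.
class LazyMetadata<T> {
  private final Supplier<T> loader;
  private T value;
  private boolean loaded;

  LazyMetadata(Supplier<T> loader) { this.loader = loader; }

  synchronized T get() {
    if (!loaded) {
      value = loader.get();   // expensive read happens here, once
      loaded = true;
    }
    return value;
  }
}
```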



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149345#comment-15149345
 ] 

ASF GitHub Bot commented on DRILL-4383:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/375#issuecomment-184883511
  
that lgtm => +1 :)


> Allow passing custom configuration options to a file system through the 
> storage plugin config
> -
>
> Key: DRILL-4383
> URL: https://issues.apache.org/jira/browse/DRILL-4383
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.6.0
>
>
> A similar feature already exists in the Hive and Hbase plugins, it simply 
> provides a key/value map for passing custom configuration options to the 
> underlying storage system.
> This would be useful for the filesystem plugin to configure S3 without 
> needing to create a core-site.xml file or restart Drill.
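For illustration, the described key/value map could look like the fragment below in a filesystem storage plugin definition (a hedged sketch: the `config` field name mirrors the existing Hive/HBase plugin pattern, and the S3A property keys are standard Hadoop properties, not values taken from this issue):

```json
{
  "type": "file",
  "enabled": true,
  "connection": "s3a://my-bucket/",
  "config": {
    "fs.s3a.access.key": "<access-key>",
    "fs.s3a.secret.key": "<secret-key>"
  }
}
```

The same properties would otherwise have to go into core-site.xml, requiring a Drill restart.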



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149344#comment-15149344
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/374#issuecomment-184883435
  
I'm +1 on my patch assuming the manual testing that Sudheesh is asking 
about.


> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Deneche A. Hakim
>
> We rely on E/PStore interfaces to persist data. Even though E/PStore stands 
> for Ephemeral and Persistent stores respectively, the current design for 
> EStore does not extend the interface/functionality of PStore at all, which 
> hints abstraction for EStore is redundant. This issue proposes a new unified 
> Store interface replacing the old E/PStore that exposes an additional method 
> that report persistence level as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings in less redundancy, more centralized code, ease to 
> reason and maintain.
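A runnable sketch of the proposed unified interface (the implementation class name and in-memory backing are illustrative, not from the patch): one `Store` abstraction serves both persistence levels, and callers that care branch on `getMode()` instead of depending on a separate EStore type.

```java
import java.util.HashMap;
import java.util.Map;

class UnifiedStore {
  enum StoreMode { EPHEMERAL, PERSISTENT }

  // The single interface replacing the old E/PStore pair.
  interface Store<V> {
    StoreMode getMode();
    V get(String key);
    void put(String key, V value);
  }

  // An in-memory store reporting EPHEMERAL; a ZK- or file-backed one would
  // implement the same interface and return PERSISTENT.
  static class MapStore<V> implements Store<V> {
    private final Map<String, V> map = new HashMap<>();
    public StoreMode getMode() { return StoreMode.EPHEMERAL; }
    public V get(String key) { return map.get(key); }
    public void put(String key, V value) { map.put(key, value); }
  }
}
```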



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4382) Remove dependency on drill-logical from vector submodule

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149304#comment-15149304
 ] 

ASF GitHub Bot commented on DRILL-4382:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/373


> Remove dependency on drill-logical from vector submodule
> 
>
> Key: DRILL-4382
> URL: https://issues.apache.org/jira/browse/DRILL-4382
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Hanifi Gunes
>
> This is in preparation for transitioning the code to the Apache Arrow project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149203#comment-15149203
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53068818
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -529,6 +549,36 @@ public long getRowCount() {
 
   }
 
+
+  // Create and return a new file selection based on reading the metadata cache file.
+  // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+  private FileSelection
+  initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
+    FileStatus metaRootDir = selection.getFirstPath(fs);
+    Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+
+    // get (and set internal field) the metadata for the directory by reading the metadata file
+    this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
+    List<String> fileNames = Lists.newArrayList();
+    for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+      fileNames.add(file.getPath());
+    }
+    // when creating the file selection, set the selection root in the form /a/b instead of
+    // file:/a/b.  The reason is that the file names above have been created in the form
+    // /a/b/c.parquet and the format of the selection root must match that of the file names
+    // otherwise downstream operations such as partition pruning can break.
+    final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+    this.selectionRoot = metaRootPath.toString();
+
+    // Use the FileSelection constructor directly here instead of the FileSelection.create() method
+    // because create() changes the root to include the scheme and authority; In future, if create()
+    // is the preferred way to instantiate a file selection, we may need to do something different...
+    FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --

I see. Agreed that we should prioritize DRILL-4381 to address this 
inconsistency. 

Overall, the revised patch looks good to me. 

+1
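The scheme/authority stripping discussed in the diff can be illustrated without Hadoop (this uses plain `java.net.URI` rather than Hadoop's `Path.getPathWithoutSchemeAndAuthority`, purely for illustration): dropping `file:` or `hdfs://host:port` makes the selection root match file names recorded as plain `/a/b/c.parquet` paths.

```java
import java.net.URI;

class SchemeStrip {
  // "file:/a/b" and "hdfs://host:8020/a/b" both become "/a/b",
  // matching file names stored without scheme or authority.
  static String withoutSchemeAndAuthority(String p) {
    String path = URI.create(p).getPath();
    return path != null ? path : p;
  }
}
```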
 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149180#comment-15149180
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/376#issuecomment-184848577
  
Updated PR after addressing review comment from @jacques-n





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149111#comment-15149111
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/374#issuecomment-184833965
  
Since there are no unit tests for web UI, can you do some manual testing to 
ensure there are no regressions?

Is there a simple way to test all persistent and transient stores? I see 
tests for a few implementations only.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149101#comment-15149101
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/374#discussion_r53059667
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/store/provider/CachingPersistentStoreProvider.java ---
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.sys.store.provider;
+
+import java.util.List;
+import java.util.concurrent.ConcurrentMap;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.exec.exception.StoreException;
+import org.apache.drill.exec.store.sys.PersistentStore;
+import org.apache.drill.exec.store.sys.PersistentStoreConfig;
+import org.apache.drill.exec.store.sys.PersistentStoreProvider;
+
+public class CachingPersistentStoreProvider extends BasePersistentStoreProvider {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(CachingPersistentStoreProvider.class);
+
+  private final ConcurrentMap storeCache = Maps.newConcurrentMap();
+  private final PersistentStoreProvider provider;
+
+  public CachingPersistentStoreProvider(PersistentStoreProvider provider) {
+    this.provider = provider;
+  }
+
+  @SuppressWarnings("unchecked")
+  public <V> PersistentStore<V> getOrCreateStore(final PersistentStoreConfig<V> config) throws StoreException {
--- End diff --

override





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149067#comment-15149067
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53056521
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -529,6 +549,36 @@ public long getRowCount() {
 
   }
 
+
+  // Create and return a new file selection based on reading the metadata cache file.
+  // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+  private FileSelection
+  initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
+    FileStatus metaRootDir = selection.getFirstPath(fs);
+    Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+
+    // get (and set internal field) the metadata for the directory by reading the metadata file
+    this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
+    List<String> fileNames = Lists.newArrayList();
+    for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+      fileNames.add(file.getPath());
+    }
+    // when creating the file selection, set the selection root in the form /a/b instead of
+    // file:/a/b.  The reason is that the file names above have been created in the form
+    // /a/b/c.parquet and the format of the selection root must match that of the file names
+    // otherwise downstream operations such as partition pruning can break.
+    final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+    this.selectionRoot = metaRootPath.toString();
+
+    // Use the FileSelection constructor directly here instead of the FileSelection.create() method
+    // because create() changes the root to include the scheme and authority; In future, if create()
+    // is the preferred way to instantiate a file selection, we may need to do something different...
+    FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --

@jinfengni, as part of this PR I moved the expansion from 
ParquetFormatPlugin to ParquetGroupScan for the metadata cache but did not 
change the call to create FileSelection.  I agree that the inconsistency could 
be a problem.  As @adeneche points out, DRILL-4380 has the background for the 
change, but we really need to fix the FileSelection.create() interface, which 
means we should prioritize DRILL-4381.  





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4389) ResultSetMetadata for DatabaseMetadata.getSchemas returns TABLE_CAT and not TABLE_CATALOG

2016-02-16 Thread N Campbell (JIRA)
N Campbell created DRILL-4389:
-

 Summary: ResultSetMetadata for DatabaseMetadata.getSchemas returns 
TABLE_CAT and not TABLE_CATALOG 
 Key: DRILL-4389
 URL: https://issues.apache.org/jira/browse/DRILL-4389
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.4.0
Reporter: N Campbell
Priority: Minor


If you inspect the ResultSetMetadata returned by DatabaseMetadata.getSchemas it 
returns a column TABLE_CAT instead of TABLE_CATALOG

https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getSchemas()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148975#comment-15148975
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53047947
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
@@ -68,6 +79,7 @@ protected FileSelection(final FileSelection selection) {
     this.statuses = selection.statuses;
     this.files = selection.files;
     this.selectionRoot = selection.selectionRoot;
+    this.dirStatus = new BitSet(StatusType.values().length);

Yes, it should...I will change it.  Thanks for catching. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4388) A large # of DatabaseMetadata methods return can not access a member of class org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl with modifiers "public"

2016-02-16 Thread N Campbell (JIRA)
N Campbell created DRILL-4388:
-

 Summary: A large # of DatabaseMetadata methods return can not 
access a member of class org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl 
with modifiers "public"
 Key: DRILL-4388
 URL: https://issues.apache.org/jira/browse/DRILL-4388
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Affects Versions: 1.4.0
 Environment: Drill 1.4.1
Reporter: N Campbell


If you were to enumerate all of the "get" and "is" methods on the 
DatabaseMetaData interface, you would find that a large number of them cause

can not access a member of class 
org.apache.drill.jdbc.impl.DrillDatabaseMetaDataImpl with modifiers "public"
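The usual cause of this message (an assumption on my part, not stated in the report) is reflective invocation of a public method whose declaring class is not public. A standalone reproduction using a JDK class, `java.util.Arrays$ArrayList`, which is package-private but declares a public `size()`:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class ReflectiveAccessDemo {
  // Look up "size" on the given class and invoke it on target; report the
  // exception name if reflection denies access.
  static String tryInvoke(Object target, Class<?> lookupOn) {
    try {
      Method m = lookupOn.getMethod("size");
      return "ok: " + m.invoke(target);
    } catch (ReflectiveOperationException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b");
    // Method declared on the non-public Arrays$ArrayList: IllegalAccessException,
    // "can not access a member of class java.util.Arrays$ArrayList with modifiers \"public\""
    System.out.println(tryInvoke(l, l.getClass()));
    // Same method looked up on the public List interface: succeeds
    System.out.println(tryInvoke(l, List.class));
  }
}
```

If DrillDatabaseMetaDataImpl is similarly non-public, looking the methods up on `java.sql.DatabaseMetaData` (or calling `setAccessible(true)`) would avoid the error; whether that is the actual mechanism here is a guess.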





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file

2016-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148927#comment-15148927
 ] 

ASF GitHub Bot commented on DRILL-4287:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/376#discussion_r53043516
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -529,6 +549,36 @@ public long getRowCount() {
 
   }
 
+
+  // Create and return a new file selection based on reading the metadata cache file.
+  // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+  private FileSelection
+  initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
+    FileStatus metaRootDir = selection.getFirstPath(fs);
+    Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+
+    // get (and set internal field) the metadata for the directory by reading the metadata file
+    this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
+    List<String> fileNames = Lists.newArrayList();
+    for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+      fileNames.add(file.getPath());
+    }
+    // when creating the file selection, set the selection root in the form /a/b instead of
+    // file:/a/b.  The reason is that the file names above have been created in the form
+    // /a/b/c.parquet and the format of the selection root must match that of the file names
+    // otherwise downstream operations such as partition pruning can break.
+    final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+    this.selectionRoot = metaRootPath.toString();
+
+    // Use the FileSelection constructor directly here instead of the FileSelection.create() method
+    // because create() changes the root to include the scheme and authority; In future, if create()
+    // is the preferred way to instantiate a file selection, we may need to do something different...
+    FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --

Unfortunately, trying to fix this will introduce a performance regression, 
see [DRILL-4380](https://issues.apache.org/jira/browse/DRILL-4380) for more 
details.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni reassigned DRILL-4387:
-

Assignee: Jinfeng Ni

> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handle skipAll queries. However, it seems there are other 
> places in the codebase that do not handle skipAll queries efficiently. In 
> particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty 
> column list with star column. This essentially will force the execution side 
> (RecordReader) to fetch all the columns for data source. Such behavior will 
> lead to big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as a 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the 
> column list. In case table has dozens or hundreds of columns, this will make 
> SCAN operator much more expensive than necessary. 
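The distinction between widening an empty projection to star and keeping it empty can be shown as a toy sketch (my illustration of the behavior described, not Drill's GroupScan/ScanBatchCreator code):

```java
import java.util.Collections;
import java.util.List;

class Projection {
  // Old behavior: a null/empty column list becomes "*", forcing the reader
  // to fetch every column. A skipAll-aware scan keeps the list empty and
  // materializes only row counts.
  static List<String> normalize(List<String> projected, boolean skipAllAware) {
    if (projected == null || projected.isEmpty()) {
      return skipAllAware
          ? Collections.<String>emptyList()      // fetch no columns
          : Collections.singletonList("*");      // fetch all columns
    }
    return projected;
  }
}
```

For the `SELECT DISTINCT substring(dir1, 5)` example, only the partition column is needed, so a skipAll-aware scan never touches the table's regular columns.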



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4387:
--
Description: 
DRILL-4279 changes the planner side and the RecordReader in the execution side 
when they handle skipAll queries. However, it seems there are other places in 
the codebase that do not handle skipAll queries efficiently. In particular, in 
GroupScan or ScanBatchCreator, we will replace a NULL or empty column list with 
star column. This essentially will force the execution side (RecordReader) to 
fetch all the columns for data source. Such behavior will lead to big 
performance overhead for the SCAN operator.

To improve Drill's performance, we should change those places as well, as a 
follow-up work after DRILL-4279.

One simple example of this problem is:

{code}
   SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
{code}

The query does not require any regular column from the parquet file. However, 
ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the 
column list. In case table has dozens or hundreds of columns, this will make 
SCAN operator much more expensive than necessary. 



  was:
DRILL-4279 changes the planner side and the RecordReader in the execution side 
when they handles skipAll query. However, it seems there are other places in 
the codebase that do not handle skipAll query efficiently. In particular, in 
GroupScan or ScanBatchCreator, we will replace a NULL or empty column list with 
star column. This essentially will force the execution side (RecordReader) to 
fetch all the columns for data source. Such behavior will lead to big 
performance overhead for the SCAN operator.

To improve Drill's performance, we should change those places as well, as a 
follow-up work after DRILL-4279.

One simple example of this problem is:

{code}
   SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
{code}

The query does not require any regular column from the parquet file. However, 
ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the 
column list.




> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handle skipAll queries. However, there are other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads to 
> a significant performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator put the star column in the 
> column list. If the table has dozens or hundreds of columns, this makes the 
> SCAN operator much more expensive than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4387:
--
Summary: Improve execution side when it handles skipAll query  (was: Fix 
execution side when it handles skipAll query)

> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handle skipAll queries. However, there are other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads to 
> a significant performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator put the star column in the 
> column list.





[jira] [Updated] (DRILL-4387) Fix execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4387:
--
Description: 
DRILL-4279 changes the planner side and the RecordReader in the execution side 
when they handle skipAll queries. However, there are other places in the 
codebase that do not handle skipAll queries efficiently. In particular, in 
GroupScan or ScanBatchCreator, we replace a NULL or empty column list with the 
star column. This essentially forces the execution side (RecordReader) to 
fetch all the columns from the data source, which leads to a significant 
performance overhead for the SCAN operator.

To improve Drill's performance, we should change those places as well, as 
follow-up work after DRILL-4279.

One simple example of this problem is:

{code}
   SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
{code}

The query does not require any regular column from the parquet file. However, 
ParquetRowGroupScan and ParquetScanBatchCreator put the star column in the 
column list.



> Fix execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handle skipAll queries. However, there are other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads to 
> a significant performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator put the star column in the 
> column list.





[jira] [Updated] (DRILL-4387) Fix execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4387:
--
Fix Version/s: 1.6.0

> Fix execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handle skipAll queries. However, there are other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads to 
> a significant performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator put the star column in the 
> column list.





[jira] [Created] (DRILL-4387) Fix execution side when it handles skip

2016-02-16 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created DRILL-4387:
-

 Summary: Fix execution side when it handles skip
 Key: DRILL-4387
 URL: https://issues.apache.org/jira/browse/DRILL-4387
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni








[jira] [Updated] (DRILL-4387) Fix execution side when it handles skipAll query

2016-02-16 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4387:
--
Summary: Fix execution side when it handles skipAll query  (was: Fix 
execution side when it handles skip)

> Fix execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>



