[jira] [Updated] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2016-02-18 Thread liyun Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyun Liu updated DRILL-4039:
-
Attachment: DRILL-4039.patch.txt

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
> Attachments: DRILL-4039.patch.txt
>
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode  'НАСТРОЕние' in character 
> set 'ISO-8859-1'
>  cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
>     SELECT
>    T1.`F01INT`,
>    T1.`F02UCHAR_10`,
>    T1.`F03UVARCHAR_10`
>     FROM
>    DPRV64R6_TRDUNI01T T1
>     WHERE
>    (T1.`F03UVARCHAR_10` =  'НАСТРОЕние')
>     ORDER BY
>    T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?
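The failure mode can be reproduced outside Drill with plain JDK classes (this is an illustration only, not Drill or Calcite code): ISO-8859-1 has no code points for Cyrillic, so any literal-encoding step pinned to that charset must fail, while UTF-8 succeeds.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Minimal illustration (not Drill/Calcite code): ISO-8859-1 cannot
// represent Cyrillic characters, so encoding the literal fails there
// while UTF-8 handles it fine.
public class EncodingCheck {
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String literal = "НАСТРОЕние";
        System.out.println("ISO-8859-1: " + canEncode(literal, StandardCharsets.ISO_8859_1)); // false
        System.out.println("UTF-8:      " + canEncode(literal, StandardCharsets.UTF_8));      // true
    }
}
```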



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2016-02-18 Thread liyun Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151921#comment-15151921
 ] 

liyun Liu commented on DRILL-4039:
--

Jingguo, the patch I submitted will also fix the problem you mentioned about 
'show tables' and 'describe'.

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
> Attachments: DRILL-4039.patch.txt
>
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode  'НАСТРОЕние' in character 
> set 'ISO-8859-1'
>  cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
>     SELECT
>    T1.`F01INT`,
>    T1.`F02UCHAR_10`,
>    T1.`F03UVARCHAR_10`
>     FROM
>    DPRV64R6_TRDUNI01T T1
>     WHERE
>    (T1.`F03UVARCHAR_10` =  'НАСТРОЕние')
>     ORDER BY
>    T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?





[jira] [Commented] (DRILL-4257) Ensure shutting down a Drillbit also shuts down all StoragePlugins

2016-02-18 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151944#comment-15151944
 ] 

Khurram Faraaz commented on DRILL-4257:
---

Do we have a unit test to verify this?
Is there a way to verify from drillbit.log that all StoragePlugins are also 
shut down once a Drillbit is shut down?




> Ensure shutting down a Drillbit also shuts down all StoragePlugins
> --
>
> Key: DRILL-4257
> URL: https://issues.apache.org/jira/browse/DRILL-4257
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.5.0
>
>
> Right now, if a StoragePlugin implementation relies on the close method to 
> clean up resources, those resources won't be cleaned up when the Drillbit 
> class is shut down. This is because Drillbit doesn't actually close the 
> StoragePluginRegistry and its associated resources, which leaks resources 
> in tests.
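The intended fix can be sketched with plain Java types (the class names here are illustrative, not Drill's actual API): shutting down the top-level server should cascade close() through the registry to every plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (these are not Drill's actual classes): shutting
// down the server must cascade close() through the registry so every
// plugin can release its resources.
public class ShutdownSketch {
    interface StoragePlugin {
        void close();                       // releases per-plugin resources
    }

    static class StoragePluginRegistry {
        final List<StoragePlugin> plugins = new ArrayList<>();
        void close() {
            for (StoragePlugin p : plugins) {
                p.close();
            }
        }
    }

    static class Drillbit {
        final StoragePluginRegistry registry = new StoragePluginRegistry();
        void close() {
            registry.close();               // the step the issue says is missing
        }
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        Drillbit bit = new Drillbit();
        bit.registry.plugins.add(() -> closed[0] = true);
        bit.close();
        System.out.println("plugin closed: " + closed[0]);  // plugin closed: true
    }
}
```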





[jira] [Commented] (DRILL-4241) Add Experimental Kudu plugin

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152065#comment-15152065
 ] 

ASF GitHub Bot commented on DRILL-4241:
---

Github user tdunning commented on the pull request:

https://github.com/apache/drill/pull/314#issuecomment-185636190
  
How could this have been merged?  There is a huge double standard going on 
here.

This code has NO comments.  No tests.  No documentation.  No design. It 
isn't nearly good enough to pass the reviews that are required for others to 
contribute code.

How can it be merged without any kind of significant review?



> Add Experimental Kudu plugin
> 
>
> Key: DRILL-4241
> URL: https://issues.apache.org/jira/browse/DRILL-4241
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.5.0
>
>
> Merge the work done here into Drill master so others can utilize the plugin: 
> https://github.com/dremio/drill-storage-kudu





[jira] [Updated] (DRILL-3688) Drill should honor "skip.header.line.count" and "skip.footer.line.count" attributes of Hive table

2016-02-18 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-3688:

Description: 
Currently Drill does not honor the "skip.header.line.count" attribute of a Hive 
table. This can also cause downstream format conversion issues.

Reproduce:

1. Create a Hive table
{code}
create table h1db.testheader(col0 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
{code}
2. Prepare sample data:
{code}
# cat test.data
col0
2015-01-01
{code}
3. Load sample data into Hive
{code}
LOAD DATA LOCAL INPATH '/xxx/test.data' OVERWRITE INTO TABLE h1db.testheader;
{code}
4. Hive
{code}
hive> select * from h1db.testheader ;
OK
2015-01-01
Time taken: 0.254 seconds, Fetched: 1 row(s)
{code}
5. Drill
{code}
>  select * from hive.h1db.testheader ;
+-------------+
|    col0     |
+-------------+
| col0        |
| 2015-01-01  |
+-------------+
2 rows selected (0.257 seconds)

> select cast(col0 as date) from hive.h1db.testheader ;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
be in the range [1,12]

Fragment 0:0

[Error Id: 34353702-ca27-440b-a4f4-0c9f79fc8ccd on h1.poc.com:31010]

  (org.joda.time.IllegalFieldValueException) Value 0 for monthOfYear must be in 
the range [1,12]
org.joda.time.field.FieldUtils.verifyValueBounds():236
org.joda.time.chrono.BasicChronology.getDateMidnightMillis():613
org.joda.time.chrono.BasicChronology.getDateTimeMillis():159
org.joda.time.chrono.AssembledChronology.getDateTimeMillis():120
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.memGetDate():261
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getDate():218
org.apache.drill.exec.test.generated.ProjectorGen0.doEval():67
org.apache.drill.exec.test.generated.ProjectorGen0.projectRecords():62
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():172
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
org.apache.drill.exec.record.AbstractRecordBatch.next():147
org.apache.drill.exec.physical.impl.BaseRootExec.next():83
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():79
org.apache.drill.exec.physical.impl.BaseRootExec.next():73
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():261
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():255
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1566
org.apache.drill.exec.work.fragment.FragmentExecutor.run():255
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}

"skip.footer.line.count" should be taken into account as well.
If "skip.header.line.count" or "skip.footer.line.count" has an incorrect value 
in Hive, Drill should throw an appropriate exception.
Ex: Hive table property skip.header.line.count value 'someValue' is non-numeric
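The requested validation might look roughly like this (the method name and exception type are assumptions for illustration, not Drill's implementation):

```java
// Hedged sketch of the requested validation (method name and wording are
// illustrative, not Drill's implementation): parse the Hive table property
// and fail with a clear message on non-numeric values.
public class HiveHeaderFooterProps {
    static int parseLineCount(String propName, String value) {
        try {
            int n = Integer.parseInt(value);
            if (n < 0) {
                throw new NumberFormatException();  // negative counts are invalid too
            }
            return n;
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Hive table property " + propName
                + " value '" + value + "' is non-numeric");
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLineCount("skip.header.line.count", "1"));  // 1
        try {
            parseLineCount("skip.header.line.count", "someValue");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```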

[jira] [Commented] (DRILL-4241) Add Experimental Kudu plugin

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152367#comment-15152367
 ] 

ASF GitHub Bot commented on DRILL-4241:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/314#issuecomment-185738887
  
The Kudu plugin was contributed by six different developers, three of whom 
are Drill PMC members and two more of whom are PMC members of other Apache 
projects. The code remained available for five days for review and received two 
plus-ones and no negative feedback. It is modeled after the HBase plugin and 
works the same way. It is unfortunate that there aren't integration tests 
(there wasn't an easy way to provide them, such as a mini HBase cluster), but 
it was and is regularly tested manually. Due to the light testing, we are 
communicating it to users as experimental.

Suggesting it didn't go through review when that large a group of 
developers was involved is odd. Assuming the user API isn't in dispute 
(which no one here disputed, most likely because Kudu looks exactly like an 
Oracle table), providing experimental plugins increases the breadth of Drill's 
appeal and thus broadens and strengthens the community. 


> Add Experimental Kudu plugin
> 
>
> Key: DRILL-4241
> URL: https://issues.apache.org/jira/browse/DRILL-4241
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.5.0
>
>
> Merge the work done here into Drill master so others can utilize the plugin: 
> https://github.com/dremio/drill-storage-kudu





[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152497#comment-15152497
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53335185
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java 
---
@@ -35,6 +35,8 @@
 public interface GroupScan extends Scan, HasAffinity{
 
  public static final List<SchemaPath> ALL_COLUMNS = 
ImmutableList.of(SchemaPath.getSimplePath("*"));
+  public static final List<SchemaPath> EMPTY_COLUMNS = ImmutableList.of();
--- End diff --

This static constant does not seem to be referenced anywhere?


> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader on the execution 
> side to handle skipAll queries. However, there seem to be other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads 
> to significant performance overhead in the SCAN operator.
> To improve Drill's performance, we should change those places as well, as a 
> follow-up to DRILL-4279.
> One simple example of this problem is:
> {code}
>    SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in 
> the column list. If the table has dozens or hundreds of columns, this makes 
> the SCAN operator much more expensive than necessary. 
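The core of the issue can be sketched as follows (illustrative only; the real code deals in SchemaPath lists rather than strings):

```java
import java.util.Collections;
import java.util.List;

// Illustrative only (not Drill's GroupScan): replacing an empty
// projection with "*" forces the reader to materialize every column;
// a skipAll query should keep the projection empty instead.
public class ProjectionSketch {
    static final List<String> ALL_COLUMNS = Collections.singletonList("*");

    // Old behavior described in the issue: null/empty becomes star.
    static List<String> oldNormalize(List<String> requested) {
        return (requested == null || requested.isEmpty()) ? ALL_COLUMNS : requested;
    }

    // Proposed behavior: preserve the empty list so the reader can
    // produce one cheap placeholder column instead of reading everything.
    static List<String> newNormalize(List<String> requested) {
        return requested == null ? Collections.<String>emptyList() : requested;
    }

    public static void main(String[] args) {
        System.out.println(oldNormalize(Collections.<String>emptyList())); // [*]
        System.out.println(newNormalize(Collections.<String>emptyList())); // []
    }
}
```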





[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152573#comment-15152573
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53340898
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
   newColumns.add(column);
 }
   }
-  if (newColumns.isEmpty()) {
--- End diff --

So, to clarify, the reason you removed the check for newColumns.isEmpty() 
is that if the column list is empty, the underlying ParquetRecordReader will 
handle it correctly by producing 1 default column (probably a NullableInt 
column)?
Was this check for isEmpty() only present in the Parquet scan, or do other 
readers need modification too?

I think it would be good to add comments about how the NULL and empty 
column lists are handled by each data source. 


> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader on the execution 
> side to handle skipAll queries. However, there seem to be other places in 
> the codebase that do not handle skipAll queries efficiently. In particular, 
> in GroupScan or ScanBatchCreator, we replace a NULL or empty column list 
> with the star column. This essentially forces the execution side 
> (RecordReader) to fetch all the columns from the data source, which leads 
> to significant performance overhead in the SCAN operator.
> To improve Drill's performance, we should change those places as well, as a 
> follow-up to DRILL-4279.
> One simple example of this problem is:
> {code}
>    SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in 
> the column list. If the table has dozens or hundreds of columns, this makes 
> the SCAN operator much more expensive than necessary. 





[jira] [Closed] (DRILL-4256) Performance regression in hive planning

2016-02-18 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu closed DRILL-4256.
-

Verified by Rahul. 

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading Hive tables backed by HBase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and 
> the filter in the query ensures only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.





[jira] [Closed] (DRILL-4380) Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.

2016-02-18 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu closed DRILL-4380.
-

Verified. LGTM.

> Fix performance regression: in creation of FileSelection in 
> ParquetFormatPlugin to not set files if metadata cache is available.
> 
>
> Key: DRILL-4380
> URL: https://issues.apache.org/jira/browse/DRILL-4380
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
> Fix For: 1.5.0
>
>
> The regression has been caused by the changes in 
> 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over 
> empty folders consistently so that they report table not found rather than 
> failing.)
> In ParquetFormatPlugin, the original code created a FileSelection object in 
> the following code:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata, 
> selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to 
> FileSelection.init(). The call was inexpensive because the 
> FileSelection.files member was not set, so the code did not need to make an 
> expensive call to get the file statuses corresponding to the files in the 
> FileSelection.files member.
> In the new code, this is replaced by 
> {code}
>   final FileSelection newSelection = FileSelection.create(null, fileNames, 
> metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses 
> member. A subsequent call to FileSelection.getStatuses ( in 
> ParquetGroupScan() ) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the 
> FileSelection.statuses member should be set before the FileSelection.files 
> member is set. This assumption is no longer true.
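The inexpensive behavior the old code relied on amounts to lazy initialization; a simplified sketch (not the real FileSelection/FileStatus API) looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch (not the real FileSelection/FileStatus classes):
// fetch statuses lazily, and only when they were not supplied up front,
// so callers that already have metadata never pay for an expensive listing.
public class LazySelection {
    static int expensiveCalls = 0;    // instrumentation for this example

    private final List<String> files;
    private List<String> statuses;    // may be pre-populated by the caller

    LazySelection(List<String> statuses, List<String> files) {
        this.statuses = statuses;
        this.files = files;
    }

    List<String> getStatuses() {
        if (statuses == null) {       // expensive path taken only on demand
            expensiveCalls++;
            statuses = new ArrayList<>();
            for (String f : files) {
                statuses.add("status-of-" + f);
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        // Caller already has statuses (e.g. from a metadata cache): no listing.
        LazySelection cached = new LazySelection(Arrays.asList("s1"), Arrays.asList("f1"));
        cached.getStatuses();
        // Caller supplied only file names: one expensive listing, then cached.
        LazySelection plain = new LazySelection(null, Arrays.asList("f1"));
        plain.getStatuses();
        plain.getStatuses();
        System.out.println("expensive calls: " + expensiveCalls);  // expensive calls: 1
    }
}
```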





[jira] [Created] (DRILL-4413) Improve FrameSupportTemplate to do the setup only when necessary

2016-02-18 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4413:
---

 Summary: Improve FrameSupportTemplate to do the setup only when 
necessary
 Key: DRILL-4413
 URL: https://issues.apache.org/jira/browse/DRILL-4413
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Relational Operators
Affects Versions: 1.4.0
Reporter: Deneche A. Hakim


Current implementation of FrameSupportTemplate does some setup at the beginning 
of every partition. We shouldn't need to redo the setup until the batch changes.





[jira] [Commented] (DRILL-4413) Improve FrameSupportTemplate to do the setup only when necessary

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152640#comment-15152640
 ] 

ASF GitHub Bot commented on DRILL-4413:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/340#discussion_r53347747
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/FrameSupportTemplate.java
 ---
@@ -134,44 +142,67 @@ private void cleanPartition() {
* @throws DrillException if it can't write into the container
*/
   private int processPartition(final int currentRow) throws DrillException 
{
-logger.trace("process partition {}, currentRow: {}, outputCount: {}", 
partition, currentRow, outputCount);
+logger.trace("{} rows remaining to process, currentRow: {}, 
outputCount: {}", remainingRows, currentRow, outputCount);
 
 setupWriteFirstValue(internal, container);
 
-int row = currentRow;
+if (popConfig.isRows()) {
+  return processROWS(currentRow);
+} else {
+  return processRANGE(currentRow);
+}
+  }
+
+  private int processROWS(int row) throws DrillException {
+//TODO we only need to call these once per batch
--- End diff --

We do the setup at the beginning of every partition. In case we have 
multiple partitions in the same batch, the setup should only be done once. To 
make matters more complicated, if we are aggregating a single partition that 
spans multiple batches, we also need to do the setup for every batch.
The TODO is still valid. I created 
[DRILL-4413](https://issues.apache.org/jira/browse/DRILL-4413) to keep track of 
it.
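The improvement being tracked can be sketched as follows (illustrative, not the actual generated template): key the setup on batch identity rather than on partition boundaries.

```java
// Hedged sketch (not the actual FrameSupportTemplate): remember which
// batch the generated code was last set up against, and redo the setup
// only when the incoming batch changes, not on every partition.
public class BatchSetupSketch {
    static int setupCount = 0;         // instrumentation for this example

    private Object currentBatch;

    void processPartition(Object batch) {
        if (batch != currentBatch) {   // setup only when the batch changes
            setupCount++;
            currentBatch = batch;
        }
        // ... aggregate the partition's rows here ...
    }

    public static void main(String[] args) {
        BatchSetupSketch s = new BatchSetupSketch();
        Object batch1 = new Object(), batch2 = new Object();
        s.processPartition(batch1);    // first partition in batch1: setup
        s.processPartition(batch1);    // second partition, same batch: no setup
        s.processPartition(batch2);    // new batch: setup again
        System.out.println("setups: " + setupCount);  // setups: 2
    }
}
```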


> Improve FrameSupportTemplate to do the setup only when necessary
> 
>
> Key: DRILL-4413
> URL: https://issues.apache.org/jira/browse/DRILL-4413
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
> Fix For: Future
>
>
> Current implementation of FrameSupportTemplate does some setup at the 
> beginning of every partition. We shouldn't need to redo the setup until the 
> batch changes.





[jira] [Commented] (DRILL-4256) Performance regression in hive planning

2016-02-18 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152682#comment-15152682
 ] 

Rahul Challapalli commented on DRILL-4256:
--

[~dgu-atmapr] This is not closed, as we did not automate verification of the 
fix in the performance framework.

> Performance regression in hive planning
> ---
>
> Key: DRILL-4256
> URL: https://issues.apache.org/jira/browse/DRILL-4256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive, Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Rahul Challapalli
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
> Attachments: jstack.tgz
>
>
> Commit # : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> The fix for reading Hive tables backed by HBase caused a performance 
> regression. The data set used in the test below has ~3700 partitions, and 
> the filter in the query ensures only 1 partition gets selected.
> {code}
> Commit : 76f41e18207e3e3e987fef56ee7f1695dd6ddd7a
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~25 seconds
> {code}
> {code}
> Commit : 1ea3d6c3f144614caf460648c1c27c6d0f5b06b8
> Query : explain plan for select count(*) from lineitem_partitioned where 
> `year`=2015 and `month`=1 and `day` =1;
> Time : ~6.5 seconds
> {code}
> Since the data is large, I couldn't attach it here. Reach out to me if you 
> need additional information.





[jira] [Commented] (DRILL-4411) HashJoin should not only depend on number of records, but also on size

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152698#comment-15152698
 ] 

ASF GitHub Bot commented on DRILL-4411:
---

GitHub user minji-kim opened a pull request:

https://github.com/apache/drill/pull/381

DRILL-4411: hash join should limit batch based on size and number of records

Right now, hash joins can run out of memory if records are large, since each 
batch is limited only by a record count (4000). This patch implements a simple 
heuristic: if the output allocator grows beyond 10 MB before 4000 records have 
been written (say, at 2000), the batch limit for future batches is set to 2000.
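The heuristic can be sketched as follows (the constants and names are taken from the description above, not from the patch itself):

```java
// Illustrative sketch of the heuristic described above (constants and
// names are assumptions, not the actual patch): if memory fills up
// before the record cap is reached, shrink the cap for future batches.
public class BatchLimitSketch {
    static final int TARGET_RECORDS_PER_BATCH = 4000;
    static final long MAX_BATCH_BYTES = 10L * 1024 * 1024;  // 10 MB

    private int recordLimit = TARGET_RECORDS_PER_BATCH;

    // Called when a batch is flushed; adapts the limit for future batches.
    void onBatchComplete(int recordsWritten, long bytesAllocated) {
        if (bytesAllocated >= MAX_BATCH_BYTES && recordsWritten < recordLimit) {
            recordLimit = recordsWritten;  // e.g. 2000 in the example above
        }
    }

    int getRecordLimit() {
        return recordLimit;
    }

    public static void main(String[] args) {
        BatchLimitSketch s = new BatchLimitSketch();
        s.onBatchComplete(2000, 11L * 1024 * 1024);  // hit 10 MB at 2000 records
        System.out.println(s.getRecordLimit());      // 2000
    }
}
```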

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/minji-kim/drill DRILL-4411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/381.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #381


commit 2e3b1c75273e1b87679d79bdc4f3877b72603e3c
Author: Minji Kim 
Date:   2016-02-18T17:05:51Z

DRILL-4411: hash join should limit batch based on size as well as number of 
records




> HashJoin should not only depend on number of records, but also on size
> --
>
> Key: DRILL-4411
> URL: https://issues.apache.org/jira/browse/DRILL-4411
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH 
> (4000). But we should not depend only on the number of records; we should 
> also account for size (in case of extremely large records).





[jira] [Commented] (DRILL-4374) Drill rewrites Postgres query with ambiguous column references

2016-02-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152714#comment-15152714
 ] 

Jacques Nadeau commented on DRILL-4374:
---

Another potentially similar example from the mailing list:

{code}
SELECT cs.post_id,
   t.tag_id,
   cs.language_code,
   cs.likes_count,
   cs.comments_count,
   cs.clippings_count,
   cs.cr_recency_score
FROM redshift.public.card_scores AS cs
JOIN redshift.public.taggings AS t
  ON cs.post_id = t.post_id
INNER JOIN redshift.public.min_scale_scores AS mss
  ON mss.post_id=cs.post_id
WHERE cs.cr_recency_score IS NOT NULL
  AND t.status <> 'unpublished'

Then, error raised.

2016-02-18 05:44:26,862 [293aa5c5-4dcd-3cd8-7b40-4847289d71fa:frag:0:0] INFO  
o.a.d.e.store.jdbc.JdbcRecordReader - User Error Occurred
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: The JDBC 
storage plugin failed while trying setup the SQL query.

sql SELECT *
FROM (SELECT *
FROM "public"."card_scores"
INNER JOIN "public"."taggings" ON "card_scores"."post_id" = "taggings"."post_id"
WHERE "card_scores"."cr_recency_score" IS NOT NULL AND "taggings"."status" <> 
'unpublished') AS "t"
INNER JOIN "public"."min_scale_scores" ON "t"."post_id" = 
"min_scale_scores"."post_id"
plugin redshift

[Error Id: 0ffb54f1-95b9-4a8b-b985-f05e16a2aa6a ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.5.0.jar:1.5.0]
at 
org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:221)
 [drill-jdbc-storage-1.5.0.jar:1.5.0]

Caused by: org.postgresql.util.PSQLException: ERROR: column reference "post_id" 
is ambiguous
{code}

> Drill rewrites Postgres query with ambiguous column references
> --
>
> Key: DRILL-4374
> URL: https://issues.apache.org/jira/browse/DRILL-4374
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.4.0
>Reporter: Justin Bradford
>Assignee: Taras Supyk
>
> Drill drops table references when rewriting this query, resulting in 
> ambiguous column references.
> This query: 
> {code:sql}
> select s.uuid as site_uuid, psc.partner_id, 
>   sum(psc.net_revenue_dollars) as revenue 
> from app.public.partner_site_clicks psc 
> join app.public.sites s on psc.site_id = s.id 
> join app.public.partner_click_days pcd on pcd.id = psc.partner_click_day_id 
> where s.generate_revenue_report is true and pcd.`day` = '2016-02-07' 
> group by s.uuid, psc.partner_id; 
> {code} 
> Results in this error: 
> {quote} 
> DATA_READ ERROR: The JDBC storage plugin failed while trying setup the SQL 
> query. 
> {quote}
> Trying to run this re-written query:
> {code:sql}
> SELECT "site_uuid", "partner_id", SUM("net_revenue_dollars") AS "revenue" 
> FROM (
>   SELECT "uuid" AS "site_uuid", "partner_id", "net_revenue_dollars" 
>   FROM "public"."partner_site_clicks" 
>   INNER JOIN "public"."sites" ON "partner_site_clicks"."site_id" = 
> "sites"."id"
>   INNER JOIN "public"."partner_click_days" ON 
> "partner_site_clicks"."partner_click_day_id" = "partner_click_days"."id" 
>   WHERE "sites"."generate_revenue_report" IS TRUE AND 
> "partner_click_days"."day" = '2016-02-07'
> ) AS "t0" GROUP BY "site_uuid", "partner_id" 
> {code}
> That query fails due to an ambiguous "partner_id" reference as two of the 
> tables have that column.
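One common remedy, sketched here in plain Java (this illustrates the aliasing idea only and is not Calcite's or Drill's actual code), is to uniquify shared column names when flattening a join into a subquery so outer references stay unambiguous:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the missing step (not Calcite/Drill code): when
// flattening a join into a subquery, shared column names must be given
// unique aliases so outer references stay unambiguous.
public class AliasSketch {
    // Input rows are {table, column}; output maps unique alias -> qualified name.
    static Map<String, String> uniquify(String[][] tableColumns) {
        Map<String, String> aliasFor = new LinkedHashMap<>();
        for (String[] tc : tableColumns) {
            String col = tc[1];
            String alias = col;
            int i = 0;
            while (aliasFor.containsKey(alias)) {   // name taken: disambiguate
                alias = col + i++;
            }
            aliasFor.put(alias, tc[0] + "." + col);
        }
        return aliasFor;
    }

    public static void main(String[] args) {
        Map<String, String> m = uniquify(new String[][] {
            {"card_scores", "post_id"}, {"taggings", "post_id"}});
        System.out.println(m);  // {post_id=card_scores.post_id, post_id0=taggings.post_id}
    }
}
```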





[jira] [Closed] (DRILL-3624) Enhance Web UI to be able to select schema ("use")

2016-02-18 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-3624.
--

Verified as part of DRILL-3201

> Enhance Web UI to be able to select schema ("use")
> --
>
> Key: DRILL-3624
> URL: https://issues.apache.org/jira/browse/DRILL-3624
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Client - HTTP
>Affects Versions: 1.1.0
>Reporter: Uwe Geercken
>Priority: Minor
> Fix For: 1.5.0
>
>
> It would be advantageous to be able to select a schema ("use") in the Web UI, 
> so that the schema does not always have to be specified in each query.
> This could be realized e.g. through a drop-down where the user selects the 
> schema from the list of available schemas. The UI should keep this 
> selection until a different schema is selected.





[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152772#comment-15152772
 ] 

ASF GitHub Bot commented on DRILL-4410:
---

Github user minji-kim commented on the pull request:

https://github.com/apache/drill/pull/380#issuecomment-185846185
  
Added the checks in the test.


> ListVector causes OversizedAllocationException
> --
>
> Key: DRILL-4410
> URL: https://issues.apache.org/jira/browse/DRILL-4410
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. This 
> happens when the union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to 
> expand the buffer. Max allowed buffer size is reached.
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
> org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
> org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
> org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
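The top frames show UInt1Vector.reAlloc() refusing to grow during setSafe(). The failure pattern is a doubling reallocation with a fixed upper bound; below is a minimal sketch of that pattern with hypothetical class and constant names, not Drill's actual vector code.

```java
import java.util.Arrays;

// Minimal sketch of a doubling buffer with a hard size cap; hypothetical
// names and cap value, not Drill's actual UInt1Vector implementation.
class GrowableBuffer {
    static final int MAX_BUFFER_SIZE = 1 << 20; // assumed cap for illustration

    byte[] data = new byte[8];

    // Doubles capacity; fails once the doubled size would exceed the cap,
    // which is the pattern behind OversizedAllocationException.
    void reAlloc() {
        long newSize = (long) data.length * 2;
        if (newSize > MAX_BUFFER_SIZE) {
            throw new IllegalStateException(
                "Unable to expand the buffer. Max allowed buffer size is reached.");
        }
        data = Arrays.copyOf(data, (int) newSize);
    }

    // Grows until the index fits before writing, as a setSafe() method does.
    void setSafe(int index, byte value) {
        while (index >= data.length) {
            reAlloc();
        }
        data[index] = value;
    }
}
```

A copy loop that keeps appending into one vector without starting a fresh batch eventually drives such a grow loop past the cap, which matches the setSafe() -> reAlloc() frames above.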





[jira] [Reopened] (DRILL-4235) Hit IllegalStateException when exec.queue.enable=true

2016-02-18 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim reopened DRILL-4235:
-
  Assignee: Hanifi Gunes  (was: Deneche A. Hakim)

Temporarily reopening this to assign it to the developer who fixed it.

> Hit IllegalStateException when exec.queue.enable=true
> --
>
> Key: DRILL-4235
> URL: https://issues.apache.org/jira/browse/DRILL-4235
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.5.0
> Environment: git.commit.id=6dea429949a3d6a68aefbdb3d78de41e0955239b
>Reporter: Dechang Gu
>Assignee: Hanifi Gunes
>Priority: Critical
> Fix For: 1.5.0
>
>
> 0: jdbc:drill:schema=dfs.parquet> select * from sys.options;
> Error: SYSTEM ERROR: IllegalStateException: Failure trying to change states: 
> ENQUEUED --> RUNNING
> [Error Id: 6ac8167c-6fb7-4274-9e5c-bf62a195c06e on ucs-node5.perf.lab:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Exceptions caught during event processing
> org.apache.drill.exec.work.foreman.Foreman.run():261
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.RuntimeException) Exceptions caught during event 
> processing
> org.apache.drill.common.EventProcessor.sendEvent():93
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.IllegalStateException) Failure trying to change 
> states: ENQUEUED --> RUNNING
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():896
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():790
> org.apache.drill.common.EventProcessor.sendEvent():73
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
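The error comes from a state machine that only accepts an enumerated set of transitions and throws IllegalStateException on any pair it does not expect; with queueing enabled the query reached RUNNING from ENQUEUED, a pair the guard did not handle. A minimal sketch of that guard pattern follows; the state set and transition table are illustrative assumptions, not Foreman's actual code.

```java
import java.util.EnumMap;
import java.util.EnumSet;

// Sketch of a transition guard like Foreman's moveToState(); the states and
// allowed-transition table are illustrative, not Drill's actual ones.
class QueryStateMachine {
    enum State { PENDING, ENQUEUED, STARTING, RUNNING, COMPLETED, FAILED }

    static final EnumMap<State, EnumSet<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.PENDING, EnumSet.of(State.ENQUEUED, State.STARTING, State.FAILED));
        // Bug pattern: ENQUEUED deliberately lacks a RUNNING successor here,
        // so an enqueued query that starts running trips the guard below.
        ALLOWED.put(State.ENQUEUED, EnumSet.of(State.STARTING, State.FAILED));
        ALLOWED.put(State.STARTING, EnumSet.of(State.RUNNING, State.FAILED));
        ALLOWED.put(State.RUNNING, EnumSet.of(State.COMPLETED, State.FAILED));
    }

    State state = State.PENDING;

    // Rejects any transition not present in the table, producing the same
    // "Failure trying to change states" message seen in the report.
    void moveToState(State next) {
        EnumSet<State> ok = ALLOWED.get(state);
        if (ok == null || !ok.contains(next)) {
            throw new IllegalStateException(
                "Failure trying to change states: " + state + " --> " + next);
        }
        state = next;
    }
}
```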





[jira] [Closed] (DRILL-4235) Hit IllegalStateException when exec.queue.enable=true

2016-02-18 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim closed DRILL-4235.
---
Resolution: Fixed

> Hit IllegalStateException when exec.queue.enable=true
> --
>
> Key: DRILL-4235
> URL: https://issues.apache.org/jira/browse/DRILL-4235
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.5.0
> Environment: git.commit.id=6dea429949a3d6a68aefbdb3d78de41e0955239b
>Reporter: Dechang Gu
>Assignee: Hanifi Gunes
>Priority: Critical
> Fix For: 1.5.0
>
>
> 0: jdbc:drill:schema=dfs.parquet> select * from sys.options;
> Error: SYSTEM ERROR: IllegalStateException: Failure trying to change states: 
> ENQUEUED --> RUNNING
> [Error Id: 6ac8167c-6fb7-4274-9e5c-bf62a195c06e on ucs-node5.perf.lab:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Exceptions caught during event processing
> org.apache.drill.exec.work.foreman.Foreman.run():261
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.RuntimeException) Exceptions caught during event 
> processing
> org.apache.drill.common.EventProcessor.sendEvent():93
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.IllegalStateException) Failure trying to change 
> states: ENQUEUED --> RUNNING
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():896
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():790
> org.apache.drill.common.EventProcessor.sendEvent():73
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152811#comment-15152811
 ] 

ASF GitHub Bot commented on DRILL-4410:
---

Github user minji-kim commented on a diff in the pull request:

https://github.com/apache/drill/pull/380#discussion_r53359912
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java
 ---
@@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception {
 .go();
   }
 
+  @Test  // DRILL-4410
+  // ListVector allocation
+  public void test_array() throws Exception{
+
+long numRecords = 10;
+String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + 
"arrays1.json";
--- End diff --

ParquetRecordReaderTest also uses "/tmp", so I think this should also work.
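The portability concern in this thread is the hard-coded "/tmp" prefix for scratch files. A common alternative is to derive the path from the JVM's own temp directory (java.io.tmpdir); the helper below is an illustrative sketch, not the test's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: build the scratch JSON path from the JVM temp dir rather than
// hard-coding "/tmp"; illustrative, not the test's actual code.
class ScratchFiles {
    static Path scratchJson(String baseName) throws IOException {
        // createTempFile places the file under java.io.tmpdir and returns a
        // unique path, avoiding collisions between concurrent test runs.
        return Files.createTempFile(baseName, ".json");
    }
}
```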


> ListVector causes OversizedAllocationException
> --
>
> Key: DRILL-4410
> URL: https://issues.apache.org/jira/browse/DRILL-4410
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. This 
> happens when the union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to 
> expand the buffer. Max allowed buffer size is reached.
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
> org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
> org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
> org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152820#comment-15152820
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53360482
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/GroupScan.java 
---
@@ -35,6 +35,8 @@
 public interface GroupScan extends Scan, HasAffinity{
 
   public static final List ALL_COLUMNS = 
ImmutableList.of(SchemaPath.getSimplePath("*"));
+  public static final List EMPTY_COLUMNS = ImmutableList.of();
--- End diff --

Nice catch. It's no longer needed. (Originally, I intended to convert NULL 
to empty_columns, but now it's not necessary.) I'll remove that. Thx. 


> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader in the execution 
> side so that they handle skipAll queries. However, it seems there are other 
> places in the codebase that do not handle skipAll queries efficiently. In 
> particular, in GroupScan or ScanBatchCreator, we replace a NULL or empty 
> column list with the star column. This essentially forces the execution side 
> (the RecordReader) to fetch all the columns of the data source. Such behavior 
> leads to a big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`;
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in 
> the column list. If the table has dozens or hundreds of columns, this makes 
> the SCAN operator much more expensive than necessary. 





[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152837#comment-15152837
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53361870
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
   newColumns.add(column);
 }
   }
-  if (newColumns.isEmpty()) {
--- End diff --

I went through all the ScanBatchCreators in Drill's code base. It seems 
ParquetScanBatchCreator is the only one that converts an empty column list to 
ALL_COLUMNS. Looking at the history, it seems DRILL-1845 added the code, 
probably just to make skipAll queries work in parquet.  

With the patch for DRILL-4279, the parquet record reader is able to handle an 
empty column list. 

Besides ParquetScanBatchCreator, this patch also modifies HBaseGroupScan and 
EasyGroupScan, which originally interpreted an empty column list as 
ALL_COLUMNS. 

I'll add some comments to the code to clarify the different meanings of a NULL 
and an empty column list. 
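The NULL-versus-empty distinction being discussed can be sketched as follows. The method and constant names here are illustrative, not Drill's actual GroupScan API: a null column list means no projection was given and falls back to the star column, while an empty list means a skip-all scan that should read no columns.

```java
import java.util.Collections;
import java.util.List;

// Sketch of the NULL-vs-empty column list convention; illustrative names,
// not Drill's actual GroupScan/ScanBatchCreator API.
class ColumnSelection {
    static final List<String> ALL_COLUMNS = Collections.singletonList("*");

    static List<String> resolveColumns(List<String> requested) {
        if (requested == null) {
            return ALL_COLUMNS;  // no projection given: read every column
        }
        return requested;        // an empty list stays empty: skip-all scan
    }
}
```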



> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changes the planner side and the RecordReader in the execution 
> side when they handles skipAll query. However, it seems there are other 
> places in the codebase that do not handle skipAll query efficiently. In 
> particular, in GroupScan or ScanBatchCreator, we will replace a NULL or empty 
> column list with star column. This essentially will force the execution side 
> (RecordReader) to fetch all the columns for data source. Such behavior will 
> lead to big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as a 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put star column as the 
> column list. In case table has dozens or hundreds of columns, this will make 
> SCAN operator much more expensive than necessary. 





[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152841#comment-15152841
 ] 

ASF GitHub Bot commented on DRILL-4410:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/380#discussion_r53362337
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java
 ---
@@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception {
 .go();
   }
 
+  @Test  // DRILL-4410
+  // ListVector allocation
+  public void test_array() throws Exception{
+
+long numRecords = 10;
+String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + 
"arrays1.json";
--- End diff --

Seems ParquetRecordReaderTest is ignored?

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java#L84


> ListVector causes OversizedAllocationException
> --
>
> Key: DRILL-4410
> URL: https://issues.apache.org/jira/browse/DRILL-4410
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. This 
> happens when the union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to 
> expand the buffer. Max allowed buffer size is reached.
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
> org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
> org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
> org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-4161) Make Hive Metastore client caching user configurable.

2016-02-18 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152844#comment-15152844
 ] 

Rahul Challapalli commented on DRILL-4161:
--

Verified! (Not automated as the test framework does not support this use case 
without hacking it)

Created 2 hive plugins with the below details and verified that the metastore 
cache is properly updated based on the below settings. 
hive1 :
{code}
"hive.metastore.cache-ttl-seconds": "600",
 "hive.metastore.cache-expire-after": "access"
{code}

hive2:
{code}
"hive.metastore.cache-ttl-seconds": "30",
 "hive.metastore.cache-expire-after": "write"
{code}
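The two `hive.metastore.cache-expire-after` values set different expiry bases: "write" measures the TTL from the last load, while "access" restarts it on every read. Drill delegates this to Guava's cache (expireAfterWrite / expireAfterAccess); the class below is only a sketch of the semantics with an explicit clock parameter, using illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Models the "access" vs "write" expiry semantics of the plugin settings;
// Drill itself uses Guava's CacheBuilder, this is only an illustration.
class TtlCache<K, V> {
    final long ttlMillis;
    final boolean expireAfterAccess; // true = "access", false = "write"
    final Map<K, V> values = new HashMap<>();
    final Map<K, long[]> stamps = new HashMap<>(); // {lastWrite, lastAccess}

    TtlCache(long ttlMillis, boolean expireAfterAccess) {
        this.ttlMillis = ttlMillis;
        this.expireAfterAccess = expireAfterAccess;
    }

    void put(K key, V value, long now) {
        values.put(key, value);
        stamps.put(key, new long[] { now, now });
    }

    V get(K key, long now) {
        long[] s = stamps.get(key);
        if (s == null) {
            return null;
        }
        long basis = expireAfterAccess ? Math.max(s[0], s[1]) : s[0];
        if (now - basis > ttlMillis) { // entry older than the TTL: evict it
            values.remove(key);
            stamps.remove(key);
            return null;
        }
        s[1] = now; // under "access" expiry, a read restarts the clock
        return values.get(key);
    }
}
```

With the hive2 settings above (30-second write expiry), repeated reads do not keep an entry alive; with hive1's access expiry, any read within the 600-second window restarts the countdown.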

> Make Hive Metastore client caching user configurable.
> -
>
> Key: DRILL-4161
> URL: https://issues.apache.org/jira/browse/DRILL-4161
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>  Labels: documentation
> Fix For: 1.5.0
>
>
> Drill leverages a LoadingCache in the hive metastore client in order to avoid 
> the long access time to the hive metastore server. However, there is a 
> tradeoff between caching stale data and the likelihood of cache hits. 
> For instance, DRILL-3893 changed the cache invalidation policy to "1 minute 
> after last write" to reduce the chances of hitting stale data. However, it 
> also implies that the cached data is only valid for 1 minute after 
> loading/write.
> It's desirable to allow users to configure the caching policy per their 
> individual use-case requirements. In particular, we should probably allow 
> users to specify:
> 1) the cache invalidation policy: expire after last access, or expire after 
> last write.
> 2) the cache TTL.





[jira] [Commented] (DRILL-4328) Fix for backward compatibility regression caused by DRILL-4198

2016-02-18 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152849#comment-15152849
 ] 

Rahul Challapalli commented on DRILL-4328:
--

There is no simple functional test that can verify this (unless we create 
something which consumes the interface that is being changed/reverted). I 
believe a unit test is sufficient for this.

> Fix for backward compatibility regression caused by DRILL-4198
> --
>
> Key: DRILL-4328
> URL: https://issues.apache.org/jira/browse/DRILL-4328
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.5.0
>
>
> Revert updates made to StoragePlugin interface in DRILL-4198. Instead add the 
> new methods to AbstractStoragePlugin. 





[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152890#comment-15152890
 ] 

ASF GitHub Bot commented on DRILL-4410:
---

Github user minji-kim commented on a diff in the pull request:

https://github.com/apache/drill/pull/380#discussion_r53366006
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestComplexTypeReader.java
 ---
@@ -241,4 +252,49 @@ public void testRepeatedJson() throws Exception {
 .go();
   }
 
+  @Test  // DRILL-4410
+  // ListVector allocation
+  public void test_array() throws Exception{
+
+long numRecords = 10;
+String file1 = "/tmp/" + TestComplexTypeReader.class.getName() + 
"arrays1.json";
--- End diff --

I think these tests all use /tmp. 


https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/impersonation/TestImpersonationMetadata.java#L64


https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/TestDropTable.java#L166


https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestWriter.java#L60


> ListVector causes OversizedAllocationException
> --
>
> Key: DRILL-4410
> URL: https://issues.apache.org/jira/browse/DRILL-4410
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. This 
> happens when the union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to 
> expand the buffer. Max allowed buffer size is reached.
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
> org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
> org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
> org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)





[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152896#comment-15152896
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53366236
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
@@ -87,9 +87,6 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
   newColumns.add(column);
 }
   }
-  if (newColumns.isEmpty()) {
--- End diff --

@amansinha100, I made a slight change to the patch to address the 
comments. Could you please take another look?

Thanks!



> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader in the execution 
> side so that they handle skipAll queries. However, it seems there are other 
> places in the codebase that do not handle skipAll queries efficiently. In 
> particular, in GroupScan or ScanBatchCreator, we replace a NULL or empty 
> column list with the star column. This essentially forces the execution side 
> (the RecordReader) to fetch all the columns of the data source. Such behavior 
> leads to a big performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from dfs.`/Path/To/ParquetTable`;
> {code}
> The query does not require any regular column from the parquet file. However, 
> ParquetRowGroupScan and ParquetScanBatchCreator will put the star column in 
> the column list. If the table has dozens or hundreds of columns, this makes 
> the SCAN operator much more expensive than necessary. 





[jira] [Updated] (DRILL-4281) Drill should support inbound impersonation

2016-02-18 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4281:
-
Reviewer: Chun Chang

> Drill should support inbound impersonation
> --
>
> Key: DRILL-4281
> URL: https://issues.apache.org/jira/browse/DRILL-4281
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
>
> Today Drill supports impersonation *to* external sources. For example, I can 
> authenticate to Drill as myself and then Drill will access HDFS using 
> impersonation.
> In many scenarios we also need impersonation *to* Drill. For example, I might 
> use some front-end tool (such as Tableau) and authenticate to it as myself. 
> That tool (server version) then needs to access Drill to perform queries, and 
> I want those queries to run as myself, not as the Tableau user. While in 
> theory the intermediate tool could store the userid & password for every user 
> to Drill, this isn't a scalable or very secure solution.
> Note that HS2 today does support inbound impersonation, as described here: 
> https://issues.apache.org/jira/browse/HIVE-5155 
> The above is not the best approach, as it is tied to the connection object, 
> which is very coarse-grained and potentially expensive. It would be better if 
> there were a call on the ODBC/JDBC driver to switch the identity on an 
> existing connection. Most modern SQL databases (Oracle, DB2) support such a 
> function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152954#comment-15152954
 ] 

Deneche A. Hakim commented on DRILL-4392:
-

[~sphillips] any ETA on when this could be fixed?

> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                      | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                   |
> | 14           | KENYA          | 0            | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                   |
> | 0            | ALGERIA        | 0            | haggle. carefully final deposits detect slyly agai                                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                    |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> It will also fail one of the pre-commit functional tests [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153156#comment-15153156
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/379#discussion_r53386823
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseGroupScan.java
 ---
@@ -34,6 +34,7 @@
 import java.util.concurrent.TimeUnit;
 
 import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.base.Objects;
--- End diff --

unnecessary import?


> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader on the execution 
> side for skipAll queries. However, there seem to be other places in the 
> codebase that do not handle skipAll queries efficiently. In particular, in 
> GroupScan or ScanBatchCreator, we replace a NULL or empty column list with 
> the star column. This essentially forces the execution side (RecordReader) 
> to fetch all the columns from the data source. Such behavior leads to a 
> large performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. 
> However, ParquetRowGroupScan and ParquetScanBatchCreator will put the star 
> column in the column list. If the table has dozens or hundreds of columns, 
> this makes the SCAN operator much more expensive than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4387) Improve execution side when it handles skipAll query

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153159#comment-15153159
 ] 

ASF GitHub Bot commented on DRILL-4387:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/379#issuecomment-185934019
  
LGTM +1. 


> Improve execution side when it handles skipAll query
> 
>
> Key: DRILL-4387
> URL: https://issues.apache.org/jira/browse/DRILL-4387
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.6.0
>
>
> DRILL-4279 changed the planner side and the RecordReader on the execution 
> side for skipAll queries. However, there seem to be other places in the 
> codebase that do not handle skipAll queries efficiently. In particular, in 
> GroupScan or ScanBatchCreator, we replace a NULL or empty column list with 
> the star column. This essentially forces the execution side (RecordReader) 
> to fetch all the columns from the data source. Such behavior leads to a 
> large performance overhead for the SCAN operator.
> To improve Drill's performance, we should change those places as well, as 
> follow-up work after DRILL-4279.
> One simple example of this problem is:
> {code}
>SELECT DISTINCT substring(dir1, 5) from  dfs.`/Path/To/ParquetTable`;  
> {code}
> The query does not require any regular column from the parquet file. 
> However, ParquetRowGroupScan and ParquetScanBatchCreator will put the star 
> column in the column list. If the table has dozens or hundreds of columns, 
> this makes the SCAN operator much more expensive than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-3869) Trailing semicolon causes web UI to fail

2016-02-18 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-3869.
--

Verified that queries that end with a ";" run successfully from the web UI.

> Trailing semicolon causes web UI to fail
> 
>
> Key: DRILL-3869
> URL: https://issues.apache.org/jira/browse/DRILL-3869
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Andrew
> Fix For: 1.5.0
>
>
> When submitting a query through the web UI, if the user types in a trailing 
> ';' the query will fail with the error message: 
> org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: 
> Encountered ";" at line 1, column 42. Was expecting one of: "OFFSET" ... 
> "FETCH" ... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak

2016-02-18 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4353.


Verified!

> Expired sessions in web server are not cleaning up resources, leading to 
> resource leak
> --
>
> Key: DRILL-4353
> URL: https://issues.apache.org/jira/browse/DRILL-4353
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently we store the session resources (including the DrillClient) in the 
> {{SessionAuthentication}} attribute object, which implements 
> {{HttpSessionBindingListener}}. Whenever a session is invalidated, all 
> attributes are removed, and if an attribute's class implements 
> {{HttpSessionBindingListener}}, the listener is informed. 
> {{SessionAuthentication}}'s implementation of {{HttpSessionBindingListener}} 
> logs out the user, which includes cleaning up the resources, but 
> {{SessionAuthentication}} relies on the ServletContext stored in a 
> thread-local variable (see 
> [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]).
>  For the thread that cleans up expired sessions there is no 
> {{ServletContext}} in the thread-local variable, so the user is not logged 
> out properly, leading to a resource leak.
> Fix: Add an {{HttpSessionEventListener}} to clean up the 
> {{SessionAuthentication}} and resources every time an HttpSession expires 
> or is invalidated.
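The fix follows this pattern (a simplified sketch with stand-in types; the real code uses the servlet API's session listener and Drill's own session resources):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the per-session resources (DrillClient etc.) that must be
// released when a session ends.
class WebSessionResources implements AutoCloseable {
    boolean closed = false;
    @Override public void close() { closed = true; }
}

// Stand-in for an HTTP session holding attributes.
class Session {
    final Map<String, Object> attributes = new HashMap<>();
}

// Analogue of the session-lifecycle listener added by the fix: it is
// invoked for every session destruction, including expiry on the
// background scavenger thread, where no thread-local ServletContext is
// available, so cleanup no longer depends on one.
class SessionCleanupListener {
    void sessionDestroyed(Session session) {
        Object res = session.attributes.get("resources");
        if (res instanceof WebSessionResources) {
            ((WebSessionResources) res).close();
        }
    }
}
```

The key design point is that the listener is keyed to the session lifecycle itself, not to attribute unbinding plus a thread-local context that the expiry thread never sets.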



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak

2016-02-18 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4353:
-
Reviewer: Rahul Challapalli  (was: Krystal)

> Expired sessions in web server are not cleaning up resources, leading to 
> resource leak
> --
>
> Key: DRILL-4353
> URL: https://issues.apache.org/jira/browse/DRILL-4353
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently we store the session resources (including the DrillClient) in the 
> {{SessionAuthentication}} attribute object, which implements 
> {{HttpSessionBindingListener}}. Whenever a session is invalidated, all 
> attributes are removed, and if an attribute's class implements 
> {{HttpSessionBindingListener}}, the listener is informed. 
> {{SessionAuthentication}}'s implementation of {{HttpSessionBindingListener}} 
> logs out the user, which includes cleaning up the resources, but 
> {{SessionAuthentication}} relies on the ServletContext stored in a 
> thread-local variable (see 
> [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]).
>  For the thread that cleans up expired sessions there is no 
> {{ServletContext}} in the thread-local variable, so the user is not logged 
> out properly, leading to a resource leak.
> Fix: Add an {{HttpSessionEventListener}} to clean up the 
> {{SessionAuthentication}} and resources every time an HttpSession expires 
> or is invalidated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153419#comment-15153419
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

GitHub user jinfengni opened a pull request:

https://github.com/apache/drill/pull/383

DRILL-4392: Fix CTAS partition to remove one unnecessary internal fie…

…ld in generated parquet files.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinfengni/incubator-drill DRILL-4392

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #383


commit bc6427685b9b3a7846acbc177cfe6e7e1163ec6e
Author: Jinfeng Ni 
Date:   2016-02-18T23:38:42Z

DRILL-4392: Fix CTAS partition to remove one unnecessary internal field in 
generated parquet files.




> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                      | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                   |
> | 14           | KENYA          | 0            | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                   |
> | 0            | ALGERIA        | 0            | haggle. carefully final deposits detect slyly agai                                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                    |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 

[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153435#comment-15153435
 ] 

Jinfeng Ni commented on DRILL-4392:
---

I submitted a patch for this issue.

It seems the issue was caused by changing MaterializedField.getPath() to 
return a String instead of a SchemaPath. That makes the check for the 
internal partition-related field fail, since one side uses a String while the 
other uses a SchemaPath. The fix is simply to compare the two as Strings.

On a side note, I'm not sure it is the right way to change 
MaterializedField.getPath() to return a String instead of a SchemaPath. 
Returning a String means we have to ensure the case sensitivity of 
comparisons is consistent across the code base, which seems harder to enforce.
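The type mismatch behind the bug can be illustrated with stand-in classes (the real classes are Drill's MaterializedField and SchemaPath; names below are illustrative):

```java
// Stand-in for Drill's SchemaPath: equals() only matches its own type, so
// it never equals a plain String even when the names are identical.
class Path {
    final String name;
    Path(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Path && ((Path) o).name.equals(this.name);
    }
    @Override public int hashCode() { return name.hashCode(); }
}

class PartitionCheck {
    static final String PARTITION_COMPARATOR_FIELD =
            "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R";

    // Broken check: getPath() now returns a String, so comparing it against
    // a Path (SchemaPath) is always false, and the internal field leaks
    // into the written parquet file.
    static boolean isPartitionFieldBroken(String fieldPath) {
        return new Path(PARTITION_COMPARATOR_FIELD).equals(fieldPath);
    }

    // Fixed check: compare String to String.
    static boolean isPartitionFieldFixed(String fieldPath) {
        return PARTITION_COMPARATOR_FIELD.equals(fieldPath);
    }
}
```

The broken variant silently returns false for every field, which is exactly why the partition comparator column survives into the output files.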



> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                      | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                   |
> | 14           | KENYA          | 0            | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                   |
> | 0            | ALGERIA        | 0            | haggle. carefully final deposits detect slyly agai                                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                    |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> It will also fail one of the pre-commit functional tests [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4410) ListVector causes OversizedAllocationException

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153584#comment-15153584
 ] 

ASF GitHub Bot commented on DRILL-4410:
---

Github user minji-kim commented on the pull request:

https://github.com/apache/drill/pull/380#issuecomment-186020856
  
I made a change in the test to use the temporary directory, since the 
previous location seemed questionable. I also added another test in 
TestValueVector. Both tests fail without the change in ListVector.


> ListVector causes OversizedAllocationException
> --
>
> Key: DRILL-4410
> URL: https://issues.apache.org/jira/browse/DRILL-4410
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>
> Reading a large data set with an array/list causes the following problem. 
> This happens when the union type is enabled.
> (org.apache.drill.exec.exception.OversizedAllocationException) Unable to 
> expand the buffer. Max allowed buffer size is reached.
> org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> org.apache.drill.exec.vector.UInt1Vector$Mutator.setSafe():406
> org.apache.drill.exec.vector.complex.ListVector$Mutator.setNotNull():298
> org.apache.drill.exec.vector.complex.ListVector$Mutator.startNewValue():307
> org.apache.drill.exec.vector.complex.impl.UnionListWriter.startList():563
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.writeValue():115
> org.apache.drill.exec.vector.complex.impl.ComplexCopier.copy():100
> org.apache.drill.exec.vector.complex.ListVector.copyFrom():97
> org.apache.drill.exec.vector.complex.ListVector.copyFromSafe():89
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.projectBuildRecord():356
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.executeProbePhase():173
> org.apache.drill.exec.test.generated.HashJoinProbeGen197.probeAndProject():223
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():233
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
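The failure mode in this stack trace, a vector doubling its buffer until it crosses a hard cap, can be sketched as follows (illustrative names and limit; the real logic and the actual cap live in Drill's UInt1Vector.reAlloc()):

```java
// Sketch of the doubling reallocation that throws once the requested size
// crosses a hard cap, as in the reAlloc() frame of the stack trace above.
class OversizedAllocationException extends RuntimeException {
    OversizedAllocationException(String msg) { super(msg); }
}

class GrowableBuffer {
    // Illustrative cap; Drill's actual per-vector limit differs.
    static final long MAX_ALLOCATION = 1L << 20;
    long capacity = 4096;

    void reAlloc() {
        long newCapacity = capacity * 2;
        if (newCapacity > MAX_ALLOCATION) {
            throw new OversizedAllocationException(
                "Unable to expand the buffer. Max allowed buffer size is reached.");
        }
        capacity = newCapacity;
    }
}
```

Copying a large list-typed column row by row (as HashJoinProbe's copyFromSafe path does) keeps triggering reAlloc() until the cap is hit, which is why the error surfaces only on large data sets.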



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153735#comment-15153735
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/383#issuecomment-186053195
  
Fix looks fine. +1

However, I think we should open a separate bug: this should be fixed in 
planning. When we add the partition column, we should also add a projection 
to remove it. 


> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                      | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                   |
> | 14           | KENYA          | 0            | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                   |
> | 0            | ALGERIA        | 0            | haggle. carefully final deposits detect slyly agai                                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                    |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> It will also fail one of the pre-commit functional tests [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153739#comment-15153739
 ] 

ASF GitHub Bot commented on DRILL-4392:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/383#issuecomment-186058545
  
Not projection removal but rather writer rewrite. 


> CTAS with partition writes an internal field into generated parquet files
> -
>
> Key: DRILL-4392
> URL: https://issues.apache.org/jira/browse/DRILL-4392
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Steven Phillips
>Priority: Blocker
>
> On today's master branch:
> {code}
> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | version         | commit_id                                 | commit_message                                                      | commit_time                | build_email      | build_time                 |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> | 1.5.0-SNAPSHOT  | 9a3a5c4ff670a50a49f61f97dd838da59a12f976  | DRILL-4382: Remove dependency on drill-logical from vector package  | 16.02.2016 @ 11:58:48 PST  | j...@apache.org  | 16.02.2016 @ 17:40:44 PST  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------+----------------------------+------------------+----------------------------+
> {code}
> Parquet table created by Drill's CTAS statement has one internal field 
> "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R".   This additional field would not 
> impact non-star query, but would cause incorrect result for star query.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from 
> cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | n_nationkey  | n_name         | n_regionkey  | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R  |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> | 5            | ETHIOPIA       | 0            | ven packages wake quickly. regu                                                                                | true                                    |
> | 15           | MOROCCO        | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                   |
> | 14           | KENYA          | 0            | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                   |
> | 0            | ALGERIA        | 0            | haggle. carefully final deposits detect slyly agai                                                             | false                                   |
> | 16           | MOZAMBIQUE     | 0            | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                   |
> | 24           | UNITED STATES  | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                    |
> +--------------+----------------+--------------+----------------------------------------------------------------------------------------------------------------+-----------------------------------------+
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with 
> partition support. 
> It will also fail one of the pre-commit functional tests [1].
> [1] 
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153741#comment-15153741
 ] 

ASF GitHub Bot commented on DRILL-4275:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/374


> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Deneche A. Hakim
>
> We rely on the E/PStore interfaces to persist data. Even though E/PStore 
> stands for Ephemeral and Persistent stores respectively, the current design 
> for EStore does not extend the interface/functionality of PStore at all, 
> which hints that the EStore abstraction is redundant. This issue proposes a 
> new unified Store interface, replacing the old E/PStore, that exposes an 
> additional method reporting the persistence level, as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings less redundancy, more centralized code, and greater 
> ease of reasoning and maintenance.
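A minimal compilable sketch of the proposed design (assuming Store is generic over its value type V, and adding a put method and a trivial in-memory implementation for illustration):

```java
import java.util.HashMap;
import java.util.Map;

enum StoreMode { EPHEMERAL, PERSISTENT }

// Unified store interface: one abstraction for both ephemeral and
// persistent stores, distinguished only by the reported mode.
interface Store<V> {
    StoreMode getMode();
    V get(String key);
    void put(String key, V value);
}

// Trivial in-memory implementation reporting itself as ephemeral.
class EphemeralStore<V> implements Store<V> {
    private final Map<String, V> map = new HashMap<>();
    @Override public StoreMode getMode() { return StoreMode.EPHEMERAL; }
    @Override public V get(String key) { return map.get(key); }
    @Override public void put(String key, V value) { map.put(key, value); }
}
```

Callers that today branch on EStore vs. PStore would instead program against Store<V> and query getMode() only where persistence actually matters.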



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-18 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes reassigned DRILL-4275:
---

Assignee: Hanifi Gunes  (was: Deneche A. Hakim)

> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> We rely on the E/PStore interfaces to persist data. Even though E/PStore 
> stands for Ephemeral and Persistent stores respectively, the current design 
> for EStore does not extend the interface/functionality of PStore at all, 
> which hints that the EStore abstraction is redundant. This issue proposes a 
> new unified Store interface, replacing the old E/PStore, that exposes an 
> additional method reporting the persistence level, as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings less redundancy, more centralized code, and greater 
> ease of reasoning and maintenance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4275) Refactor e/pstore interfaces and their factories to provide a unified mechanism to access stores

2016-02-18 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes updated DRILL-4275:

Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-4186

> Refactor e/pstore interfaces and their factories to provide a unified 
> mechanism to access stores
> 
>
> Key: DRILL-4275
> URL: https://issues.apache.org/jira/browse/DRILL-4275
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> We rely on the E/PStore interfaces to persist data. Even though E/PStore 
> stands for Ephemeral and Persistent stores respectively, the current design 
> for EStore does not extend the interface/functionality of PStore at all, 
> which hints that the EStore abstraction is redundant. This issue proposes a 
> new unified Store interface, replacing the old E/PStore, that exposes an 
> additional method reporting the persistence level, as follows:
> {code:title=Store interface}
> interface Store<V> {
>   StoreMode getMode();
>   V get(String key);
>   ...
> }
> enum StoreMode {
>   EPHEMERAL,
>   PERSISTENT,
>   ...
> }
> {code}
> The new design brings less redundancy, more centralized code, and greater 
> ease of reasoning and maintenance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)