[GitHub] drill issue #662: DRILL-5051: Fix incorrect result returned in nest query wi...

2016-12-14 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/662
  
@zbdzzg Thank you for the patch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill issue #662: DRILL-5051: Fix incorrect result returned in nest query wi...

2016-12-14 Thread zbdzzg
Github user zbdzzg commented on the issue:

https://github.com/apache/drill/pull/662
  
@sudheeshkatkam Thanks for your review, commit message has been changed.




[jira] [Created] (DRILL-5131) Parquet Writer fails with heap space not available error on TPCDS 1TB data set

2016-12-14 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5131:


 Summary: Parquet Writer fails with heap space not available error 
on TPCDS 1TB data set
 Key: DRILL-5131
 URL: https://issues.apache.org/jira/browse/DRILL-5131
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.9.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=cf2b7c7

The query below fails with an "Out of Heap Space" error and brings down the 
drillbit.

{code}
create table store_sales as select
case when (columns[0]='') then cast(null as integer) else cast(columns[0] as integer) end as ss_sold_date_sk,
case when (columns[1]='') then cast(null as integer) else cast(columns[1] as integer) end as ss_sold_time_sk,
case when (columns[2]='') then cast(null as integer) else cast(columns[2] as integer) end as ss_item_sk,
case when (columns[3]='') then cast(null as integer) else cast(columns[3] as integer) end as ss_customer_sk,
case when (columns[4]='') then cast(null as integer) else cast(columns[4] as integer) end as ss_cdemo_sk,
case when (columns[5]='') then cast(null as integer) else cast(columns[5] as integer) end as ss_hdemo_sk,
case when (columns[6]='') then cast(null as integer) else cast(columns[6] as integer) end as ss_addr_sk,
case when (columns[7]='') then cast(null as integer) else cast(columns[7] as integer) end as ss_store_sk,
case when (columns[8]='') then cast(null as integer) else cast(columns[8] as integer) end as ss_promo_sk,
case when (columns[9]='') then cast(null as integer) else cast(columns[9] as integer) end as ss_ticket_number,
case when (columns[10]='') then cast(null as integer) else cast(columns[10] as integer) end as ss_quantity,
case when (columns[11]='') then cast(null as decimal(7,2)) else cast(columns[11] as decimal(7,2)) end as ss_wholesale_cost,
case when (columns[12]='') then cast(null as decimal(7,2)) else cast(columns[12] as decimal(7,2)) end as ss_list_price,
case when (columns[13]='') then cast(null as decimal(7,2)) else cast(columns[13] as decimal(7,2)) end as ss_sales_price,
case when (columns[14]='') then cast(null as decimal(7,2)) else cast(columns[14] as decimal(7,2)) end as ss_ext_discount_amt,
case when (columns[15]='') then cast(null as decimal(7,2)) else cast(columns[15] as decimal(7,2)) end as ss_ext_sales_price,
case when (columns[16]='') then cast(null as decimal(7,2)) else cast(columns[16] as decimal(7,2)) end as ss_ext_wholesale_cost,
case when (columns[17]='') then cast(null as decimal(7,2)) else cast(columns[17] as decimal(7,2)) end as ss_ext_list_price,
case when (columns[18]='') then cast(null as decimal(7,2)) else cast(columns[18] as decimal(7,2)) end as ss_ext_tax,
case when (columns[19]='') then cast(null as decimal(7,2)) else cast(columns[19] as decimal(7,2)) end as ss_coupon_amt,
case when (columns[20]='') then cast(null as decimal(7,2)) else cast(columns[20] as decimal(7,2)) end as ss_net_paid,
case when (columns[21]='') then cast(null as decimal(7,2)) else cast(columns[21] as decimal(7,2)) end as ss_net_paid_inc_tax,
case when (columns[22]='') then cast(null as decimal(7,2)) else cast(columns[22] as decimal(7,2)) end as ss_net_profit
from dfs.`/drill/testdata/tpcds/text/sf1000/store_sales.dat`;
{code}

Exception from the logs
{code}
2016-12-14 14:23:49,303 [27ae4152-0fd4-aa0f-56db-a21e2f54d6c2:frag:1:14] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in FragmentExecutor.
java.lang.OutOfMemoryError: Java heap space
at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:223) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:239) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:355) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:266) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput.toByteArray(BytesInput.java:174) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:185) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.hadoop.DirectCodecFactory$SnappyCompressor.compress(DirectCodecFactory.java:291) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:94) ~[parquet-hadoop-1.8.1-drill-r0.jar:1.8.1-drill-r0]
at 

[GitHub] drill pull request #693: DRILL-5122: DrillBuf performs expensive logging if ...

2016-12-14 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/693#discussion_r92457956
  
--- Diff: pom.xml ---
@@ -423,21 +423,23 @@
   maven-surefire-plugin
   2.17
   
--Xms512m -Xmx3g -Ddrill.exec.http.enabled=false
-  -Ddrill.exec.sys.store.provider.local.write=false
-  
-Dorg.apache.drill.exec.server.Drillbit.system_options="org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
-  -Ddrill.test.query.printing.silent=true
-  -Ddrill.catastrophic_to_standard_out=true
+-Xms512m -Xmx3g
   -XX:MaxPermSize=512M -XX:MaxDirectMemorySize=3072M
-  -Djava.net.preferIPv4Stack=true
-  -Djava.awt.headless=true
   -XX:+CMSClassUnloadingEnabled -ea
 ${forkCount}
 true
 
   
./exec/jdbc/src/test/resources/storage-plugins.json
 
 
+  false
+  
false
+  
"org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
--- End diff --

The line exceeds 120 char limit. Please fix it.




[GitHub] drill pull request #693: DRILL-5122: DrillBuf performs expensive logging if ...

2016-12-14 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/693#discussion_r92472636
  
--- Diff: exec/memory/base/pom.xml ---
@@ -40,10 +40,21 @@
 
   
 
-
   
+
+  
+
+  
--- End diff --

Missing closing parenthesis in the comment. (Tests .. )




[GitHub] drill pull request #693: DRILL-5122: DrillBuf performs expensive logging if ...

2016-12-14 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/693#discussion_r92462112
  
--- Diff: pom.xml ---
@@ -423,21 +423,23 @@
   maven-surefire-plugin
   2.17
   
--Xms512m -Xmx3g -Ddrill.exec.http.enabled=false
-  -Ddrill.exec.sys.store.provider.local.write=false
-  
-Dorg.apache.drill.exec.server.Drillbit.system_options="org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
-  -Ddrill.test.query.printing.silent=true
-  -Ddrill.catastrophic_to_standard_out=true
+-Xms512m -Xmx3g
   -XX:MaxPermSize=512M -XX:MaxDirectMemorySize=3072M
-  -Djava.net.preferIPv4Stack=true
-  -Djava.awt.headless=true
   -XX:+CMSClassUnloadingEnabled -ea
 ${forkCount}
 true
 
   
./exec/jdbc/src/test/resources/storage-plugins.json
 
 
+  false
+  
false
+  
"org.apache.drill.exec.compile.ClassTransformer.scalar_replacement=on"
+  
true
+  
true
+  
true
+  true
--- End diff --

The "java.net.preferIPv4Stack" property needs to be set as a command-line 
option, since the VM checks it only once, at startup 
([reference](https://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html)).

The Maven 
[documentation](http://maven.apache.org/surefire/maven-surefire-plugin/examples/system-properties.html)
 states that system property variables should be used only for properties that 
can be set after VM startup.

I guess the same applies to "java.awt.headless".
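
One way to check how a property reached a test JVM is to look at the JVM's own input arguments. The snippet below is only a sketch, not part of the PR: the class name `StartupPropertyCheck` and the helper `passedOnCommandLine` are made up for illustration. It relies on the standard `RuntimeMXBean.getInputArguments()` call, which lists the options the JVM was launched with.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Drill or Surefire.
class StartupPropertyCheck {

  /** Returns true if -D<name> or -D<name>=... appears in the given JVM arguments. */
  static boolean passedOnCommandLine(List<String> jvmArgs, String name) {
    String flag = "-D" + name;
    for (String arg : jvmArgs) {
      if (arg.equals(flag) || arg.startsWith(flag + "=")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Options the JVM was actually started with (e.g. what an argLine would contribute).
    List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
    for (String name : new String[] {"java.net.preferIPv4Stack", "java.awt.headless"}) {
      System.out.println(name
          + ": value=" + System.getProperty(name)
          + ", passedOnCommandLine=" + passedOnCommandLine(jvmArgs, name));
    }
  }
}
```

A property that has a value but reports `passedOnCommandLine=false` was set after the VM started, which is the case this comment is concerned about.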




[GitHub] drill issue #686: DRILL-5117: Compile error when query a json file with 1000...

2016-12-14 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/686
  
I think the cause of DRILL-1808 is the same as that of DRILL-5117. We should 
mark them as related or duplicated in the JIRA. 




[GitHub] drill pull request #594: DRILL-4842: SELECT * on JSON data results in Number...

2016-12-14 Thread Serhii-Harnyk
Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/594#discussion_r92478836
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
 ---
@@ -510,10 +517,13 @@ private void writeDataAllText(MapWriter map, 
FieldSelection selection,
   case VALUE_NUMBER_FLOAT:
   case VALUE_NUMBER_INT:
   case VALUE_STRING:
+removeNotNullColumn(fieldName);
--- End diff --

This line was added to clean 'path' when the reader knows it is recovering 
from an error: 
https://github.com/Serhii-Harnyk/drill/blob/9ce6d56b46bcd540697c85cd2f280831dd50b277/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L243.
 The call to removeNotNullColumn() is an attempt to reduce the final size 
of the fieldPathWriter map by removing fields that were added to the map and 
initialized in the current iteration.




[GitHub] drill issue #686: DRILL-5117: Compile error when query a json file with 1000...

2016-12-14 Thread Serhii-Harnyk
Github user Serhii-Harnyk commented on the issue:

https://github.com/apache/drill/pull/686
  
@sudheeshkatkam, running the test testEXTERNAL_SORT(), for example, 
generates the class CopierGen4, in which the methods doSetup() and doEval() 
are not split correctly. With this fix, both are split into smaller methods.
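
For readers unfamiliar with why splitting matters: the JVM limits the bytecode of a single method to 64 KB, so generated code with very many expressions has to be spread over several methods. The snippet below is a rough, self-contained sketch of that general idea only. It is not Drill's code generator; `MethodSplitSketch`, `split` and `render` are hypothetical names, and a fixed statement count stands in for a real size estimate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of method splitting; not Drill's implementation.
class MethodSplitSketch {

  /** Groups statements into chunks of at most maxPerMethod, one chunk per helper method. */
  static List<List<String>> split(List<String> statements, int maxPerMethod) {
    if (maxPerMethod <= 0) {
      throw new IllegalArgumentException("maxPerMethod must be positive");
    }
    List<List<String>> chunks = new ArrayList<>();
    for (int i = 0; i < statements.size(); i += maxPerMethod) {
      int end = Math.min(i + maxPerMethod, statements.size());
      chunks.add(new ArrayList<>(statements.subList(i, end)));
    }
    return chunks;
  }

  /** Renders an outer method that calls each helper in order, followed by the helpers. */
  static String render(String name, List<List<String>> chunks) {
    StringBuilder outer = new StringBuilder("void " + name + "() {\n");
    StringBuilder helpers = new StringBuilder();
    for (int i = 0; i < chunks.size(); i++) {
      String helper = name + i;
      outer.append("  ").append(helper).append("();\n");
      helpers.append("void ").append(helper).append("() {\n");
      for (String statement : chunks.get(i)) {
        helpers.append("  ").append(statement).append("\n");
      }
      helpers.append("}\n");
    }
    return outer.append("}\n").append(helpers).toString();
  }
}
```

With five statements and `maxPerMethod = 2`, `split` yields three chunks, and `render("doEval", chunks)` produces a short `doEval()` that calls `doEval0()`, `doEval1()` and `doEval2()`.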




Re: Http Plugin

2016-12-14 Thread Remzi Düzağaç
Hi,

By http plugin, I was imagining a simple plugin which can query http
resources, such as rest services.
The plugin Charles has mentioned looks promising; it can be a good starting
point. Thank you very much, Charles.

Best

On Tue, Dec 13, 2016 at 3:42 AM, Charles Givre  wrote:

> Maybe this one?
> https://github.com/kevinlynx/drill-storage-http
>
>
>
> > On Dec 12, 2016, at 18:16, Jim Scott  wrote:
> >
> > Can you elaborate on what you mean by http plugin?
> >
> > On Mon, Dec 12, 2016 at 5:08 PM, Remzi Düzağaç 
> wrote:
> >
> >> Hi Guys,
> >>
> >> I would like to implement (or at least try to implement) an http storage
> >> plugin. Is there any guide for plugin development to speed things up?
> >> I could only find a small section in the contribution ideas document. If
> >> there is any document or minimal code base to start from, it would be very
> >> useful.
> >> Thanks
> >>
> >
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> > +1 (347) 746-9281
> > @kingmesal 
> >
>
>


[GitHub] drill issue #686: DRILL-5117: Compile error when query a json file with 1000...

2016-12-14 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/686
  
@Serhii-Harnyk I am curious how this resolves DRILL-1808. Can you provide 
some details?




[GitHub] drill issue #686: DRILL-5117: Compile error when query a json file with 1000...

2016-12-14 Thread jinfengni
Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/686
  
+1

LGTM. Thanks for the PR. 




[GitHub] drill issue #662: DRILL-5051: Fix incorrect result returned in nest query wi...

2016-12-14 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/662
  
+1

Can you please change the commit message to "DRILL-5051: Fix incorrect 
computation of 'fetch' in LimitRecordBatch when 'offset' is specified"? The 
JIRA/PR title is fine as is.
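
To make the 'offset'/'fetch' relationship concrete, here is a small self-contained sketch of how a limit operator can pick rows from successive batches. It is an illustration under assumptions, not the code in LimitRecordBatch: the class `LimitSketch` and its `select` method are hypothetical, and it assumes non-negative values whose sum fits in a `long`.

```java
// Hypothetical illustration of OFFSET/FETCH over batches; not Drill's LimitRecordBatch.
class LimitSketch {
  private final long offset; // number of leading rows to skip
  private final long end;    // index one past the last row to emit (offset + fetch)
  private long seen = 0;     // rows consumed from all previous batches

  LimitSketch(long offset, long fetch) {
    this.offset = offset;
    this.end = offset + fetch; // 'fetch' counts rows after the offset
  }

  /** Returns {startIndexInBatch, rowCount} for the next batch of the given size. */
  int[] select(int batchSize) {
    long batchStart = seen;
    long batchEnd = seen + batchSize;
    seen = batchEnd;
    long from = Math.max(batchStart, offset);
    long to = Math.min(batchEnd, end);
    if (from >= to) {
      return new int[] {0, 0}; // batch lies entirely before the offset or after the limit
    }
    return new int[] {(int) (from - batchStart), (int) (to - from)};
  }
}
```

With offset 5, fetch 10 and batches of 8 rows, `select(8)` returns `{5, 3}`, then `{0, 7}`, then `{0, 0}`, for a total of 10 rows emitted.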




[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...

2016-12-14 Thread sudheeshkatkam
Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/685
  
+1, pending minor change




[GitHub] drill pull request #685: Drill 5043: Function that returns a unique id per s...

2016-12-14 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/685#discussion_r92454190
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java
 ---
@@ -77,4 +76,28 @@ public void eval() {
   out.buffer = buffer;
 }
   }
+
+  /**
+   * Implement "session_id" function. Returns the unique id of the current 
session.
+   */
+  @FunctionTemplate(name = "session_id", scope = 
FunctionTemplate.FunctionScope.SIMPLE)
+  public static class SessionId implements DrillSimpleFunc {
+@Output VarCharHolder out;
+@Inject ContextInformation contextInfo;
+@Inject DrillBuf buffer;
+@Workspace int sessionIdBytesLength;
+
+public void setup() {
--- End diff --

Annotate with `@Override`, here and below.




[GitHub] drill pull request #686: DRILL-5117: Compile error when query a json file wi...

2016-12-14 Thread Serhii-Harnyk
Github user Serhii-Harnyk commented on a diff in the pull request:

https://github.com/apache/drill/pull/686#discussion_r92452238
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/compile/TestLargeFileCompilation.java
 ---
@@ -154,4 +158,20 @@ public void testProject() throws Exception {
 testNoResult(ITERATION_COUNT, LARGE_QUERY_SELECT_LIST);
   }
 
+  @Test
+  public void testSelectAllFromFileWithManyColumns() throws Exception {
+File path = new File(BaseTestQuery.getTempDir("json/input"));
--- End diff --

You are right: with these changes, both tests, testEXTERNAL_SORT and 
testTOP_N_SORT, work, so I enabled them.




[GitHub] drill pull request #594: DRILL-4842: SELECT * on JSON data results in Number...

2016-12-14 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/594#discussion_r92452185
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
 ---
@@ -510,10 +517,13 @@ private void writeDataAllText(MapWriter map, 
FieldSelection selection,
   case VALUE_NUMBER_FLOAT:
   case VALUE_NUMBER_INT:
   case VALUE_STRING:
+removeNotNullColumn(fieldName);
--- End diff --

I feel uncomfortable since the fix added 'removeNotNullColumn(fieldName)' 
on a code path that is supposed to be a hotspot. Could the reader just clean 
'path' if it knows it is recovering from some errors?




[GitHub] drill issue #686: DRILL-5117: Compile error when query a json file with 1000...

2016-12-14 Thread Serhii-Harnyk
Github user Serhii-Harnyk commented on the issue:

https://github.com/apache/drill/pull/686
  
@jinfengni, could you please review the new changes? I squashed all changes 
into a single commit and rebased onto master.




[GitHub] drill pull request #683: DRILL-5114: Rationalize use of Logback logging in u...

2016-12-14 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/683#discussion_r92451306
  
--- Diff: logical/src/test/resources/logback-test.xml ---
@@ -0,0 +1,42 @@
+

[GitHub] drill pull request #683: DRILL-5114: Rationalize use of Logback logging in u...

2016-12-14 Thread chunhui-shi
Github user chunhui-shi commented on a diff in the pull request:

https://github.com/apache/drill/pull/683#discussion_r92451193
  
--- Diff: exec/memory/base/src/test/resources/logback-test.xml ---
@@ -0,0 +1,42 @@
+
--- End diff --

Why can't the test here use the existing logback-test.xml?




[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...

2016-12-14 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/685
  
Looks good.




[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...

2016-12-14 Thread nagarajanchinnasamy
Github user nagarajanchinnasamy commented on the issue:

https://github.com/apache/drill/pull/685
  
@arina-ielchiieva 

  - squashing done :)

