[jira] [Created] (DRILL-6073) Errors in trig function descriptions

2018-01-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6073:
--

 Summary: Errors in trig function descriptions
 Key: DRILL-6073
 URL: https://issues.apache.org/jira/browse/DRILL-6073
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Paul Rogers
Assignee: Bridget Bevens
Priority: Minor


The documentation contains a [Math and Trig|http://drill.apache.org/docs/math-and-trig/] 
page. The information about the trig functions is wrong.

{{asin\(x)}}, {{acos\(x)}}, {{atan\(x)}}: text: "Inverse sine/cosine/tangent of 
angle x in radians". But, of course, an inverse function takes the sin/cos/tan 
value as input and produces radians as output. Thus, the value "x" is not an 
angle in radians; it is the sin/cos/tan value to be inverted.
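
For comparison, the correct relationship, illustrated with java.lang.Math (not 
Drill itself, but the semantics are the standard ones):

{code}
// Inverse trig: the input is a sine value, the output is an angle in radians.
double angle = Math.asin(0.5);        // ~0.5236, i.e. pi/6 radians
double ratio = Math.sin(Math.PI / 6); // ~0.5
{code}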

{{SINH()}}, {{COSH()}}, {{TANH()}}: these each take an input, but the 
placeholder "x" is not shown, incorrectly suggesting that they take no argument.

List of functions: The list of non-trig functions earlier on the page is in the 
form of a table. For symmetry, the list of trig functions should also be a 
table, not a bulleted list.





[jira] [Created] (DRILL-6072) Broken link on Lexical Structure in docs

2018-01-04 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-6072:
--

 Summary: Broken link on Lexical Structure in docs 
 Key: DRILL-6072
 URL: https://issues.apache.org/jira/browse/DRILL-6072
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Paul Rogers
Assignee: Bridget Bevens
Priority: Minor


The [Lexical Structure|http://drill.apache.org/docs/lexical-structure] page has 
a TOC at the top. The link for "Identifiers" is wrong. The target in the TOC is 
"#identifier". The actual link in the text is "#identifiers".

Please fix the TOC to match the text. (Please don't change the actual target 
anchor; I have links that point to that target.)







[GitHub] drill pull request #1047: DRILL-5970: DrillParquetReader always builds the s...

2018-01-04 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1047#discussion_r159799621
  
--- Diff: exec/vector/src/main/codegen/templates/BaseWriter.java ---
@@ -106,37 +114,37 @@
 MapOrListWriter list(String name);
 boolean isMapWriter();
 boolean isListWriter();
-UInt1Writer uInt1(String name);
-UInt2Writer uInt2(String name);
-UInt4Writer uInt4(String name);
-UInt8Writer uInt8(String name);
-VarCharWriter varChar(String name);
-Var16CharWriter var16Char(String name);
-TinyIntWriter tinyInt(String name);
-SmallIntWriter smallInt(String name);
-IntWriter integer(String name);
-BigIntWriter bigInt(String name);
-Float4Writer float4(String name);
-Float8Writer float8(String name);
-BitWriter bit(String name);
-VarBinaryWriter varBinary(String name);
+UInt1Writer uInt1(String name, TypeProtos.DataMode dataMode);
--- End diff --

Really not sure we want to do this. These writers are also used in JSON, 
and are used for every field in every object. Now, every request to get a 
writer will have to pass the mode. This seems like we are making the problem 
far, far more complex than necessary.

The current code has rules for choosing the mode. In general, the mode is 
OPTIONAL for single (scalar) values and REPEATED for repeated (array) values.

The mode passed here *cannot* be REPEATED: just won't work. So, can we pass 
REQUIRED?

Let's think about how these are used in JSON. In JSON, we discover fields 
as we read each object. A field need not appear in the first object; it might 
first appear 20 objects in. Then, after the 25th object, it may never appear again.

So, in JSON, we can *never* use REQUIRED; we *must* use OPTIONAL. Key 
reason: JSON provides no schema and Drill cannot predict what the schema will 
turn out to be.

Now, let's move to Parquet. Parquet does have a schema. In fact, with 
Parquet, we know the schema before we read the first row. And, in Parquet, a 
scalar column can be REQUIRED or OPTIONAL.

All of this suggests a better solution. (Indeed, this is the solution 
implemented in the new column writer layer.) Allow Parquet to declare an 
"early" schema. That is, prior to the first row, call methods that declare each 
column with its cardinality.

Then, when reading the field, always pass the name as today. If the field 
is new, it *must* be OPTIONAL. Otherwise, it will use whatever was used before.

Let's say this another way. Below these methods is a call to an 
`addOrGet()` method. In Parquet, call those methods before the first row to do 
the "add" part. Then, later, the method will do only the "get" part.

The result is that you won't have to modify so many files, won't have to 
complicate the APIs and won't have to worry about a client passing in REPEATED 
mode for a scalar column.


---


Parquet MR version

2018-01-04 Thread Vlad Rozov

Hi everyone,

With parquet-mr 1.9.0 released more than a year ago and parquet-mr 1.8.2 
almost a year ago, should the Drill dependency on parquet-mr be updated 
from the current custom 1.8.1-drill-r0? As far as I can tell, both 1.8.2 
and 1.9.0 include the fixes from the 1.8.1-drill-r0 patch. Would the community 
prefer 1.9.0 over 1.8.2 even if 1.9.0 may introduce some incompatibilities?


Thank you,

Vlad


[GitHub] drill issue #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout(int)

2018-01-04 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1024
  
Ready for a final review. 
All unit tests pass, with the exception of 
`PreparedStatementTest.testServerTriggeredQueryTimeout`. That test is being 
ignored because the timed pause injection is not honoured for a 
PreparedStatement, though it is honoured for a regular Statement. In actual 
dev/functional testing, however, the timeout works, which makes me believe 
there is a limitation in how the test framework injects pauses for a 
PreparedStatement.
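
For reference, a minimal sketch of the standard JDBC usage this PR enables 
(stock java.sql API; the `jdbc:drill:zk=local` URL and `sys.version` table 
follow examples elsewhere in this digest):

```java
import java.sql.*;

public class TimeoutExample {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      stmt.setQueryTimeout(5); // timeout in seconds, per the JDBC spec
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM sys.version")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      } catch (SQLTimeoutException e) {
        System.err.println("Query exceeded the 5s timeout: " + e.getMessage());
      }
    }
  }
}
```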


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2018-01-04 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r159790956
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
@@ -156,29 +156,19 @@ public void cleanUp() {
   }
 
   @Override
-  public int getQueryTimeout() throws AlreadyClosedSqlException
+  public int getQueryTimeout() throws AlreadyClosedSqlException, 
SQLException
   {
 throwIfClosed();
-return 0;  // (No no timeout.)
+return super.getQueryTimeout();
   }
 
   @Override
-  public void setQueryTimeout( int milliseconds )
+  public void setQueryTimeout( int seconds )
   throws AlreadyClosedSqlException,
  InvalidParameterSqlException,
- SQLFeatureNotSupportedException {
+ SQLException {
--- End diff --

+1


---


[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table

2018-01-04 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1076
  
+1 (binding)


---


[jira] [Created] (DRILL-6071) Limit batch size for flatten operator

2018-01-04 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-6071:
---

 Summary: Limit batch size for flatten operator
 Key: DRILL-6071
 URL: https://issues.apache.org/jira/browse/DRILL-6071
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.12.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
 Fix For: 1.13.0


Flatten currently uses an adaptive algorithm to control the outgoing batch 
size. While processing the input batch, it adjusts the number of records in 
the outgoing batch based on memory usage so far. Once memory usage exceeds the 
configured limit, the algorithm becomes more proactive and adjusts the limit 
halfway through and at the end of every batch. All this periodic checking of 
memory usage is unnecessary overhead and hurts performance. It also detects 
the overage only after the fact.

Instead, determine from the incoming batch how many rows the outgoing batch 
should contain. The way to do that is to estimate the average row size of the 
outgoing batch and, from that, compute how many rows fit in a given amount of 
memory. Value vectors provide the information needed to figure this out, as 
sketched below.
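
A rough sketch of the proposed arithmetic (hypothetical names, not Drill's 
actual operator code):

{code}
// Hypothetical sketch: size the outgoing batch up front instead of
// checking memory usage as we go.
static int outgoingRowLimit(long incomingPayloadBytes, // from value vector buffer sizes
                            int incomingRowCount,
                            long memoryLimitBytes) {
  // Average bytes per row, estimated from the incoming batch's vectors.
  long avgRowBytes = Math.max(1, incomingPayloadBytes / incomingRowCount);
  // Number of rows that fit in the configured outgoing-batch budget.
  return (int) Math.max(1, memoryLimitBytes / avgRowBytes);
}
{code}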










[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table

2018-01-04 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1076
  
+1 LGTM. Thanks for the changes.


---


[GitHub] drill issue #1077: DRILL-5068: Create sys.profiles and sys.profiles_json tab...

2018-01-04 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1077
  
Thanks for the changes. +1 LGTM.


---


Re: Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread John Omernik
Not all of us have a required email address? :)

If you have an *@apache.org, @dremio.com, @mapr.com, @maprtech.com, @utk.edu, 
@simba.com, or @sfu.ca* email address, you can create an account.


On Thu, Jan 4, 2018 at 12:38 PM, Abhishek Girish  wrote:

> Like Robert mentioned, we do have a Slack channel, although I'm not sure
> who's the admin now. It's been quiet in there, but we should be able to
> resume using that.
> On Thu, Jan 4, 2018 at 10:19 AM Manjeet Singh 
> wrote:
>
> > Please consider me
> >
> > On 4 Jan 2018 9:40 pm, "Charles Givre"  wrote:
> >
> > > Hello everyone,
> > > I’m curious, if I were to start a Slack channel for Drill users and
> Devs,
> > > would there be interest?
> > > — C
> >
>


Re: Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread Abhishek Girish
Like Robert mentioned, we do have a Slack channel, although I'm not sure
who's the admin now. It's been quiet in there, but we should be able to
resume using that.
On Thu, Jan 4, 2018 at 10:19 AM Manjeet Singh 
wrote:

> Please consider me
>
> On 4 Jan 2018 9:40 pm, "Charles Givre"  wrote:
>
> > Hello everyone,
> > I’m curious, if I were to start a Slack channel for Drill users and Devs,
> > would there be interest?
> > — C
>


Re: Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread Manjeet Singh
Please consider me

On 4 Jan 2018 9:40 pm, "Charles Givre"  wrote:

> Hello everyone,
> I’m curious, if I were to start a Slack channel for Drill users and Devs,
> would there be interest?
> — C


Re: Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread John Omernik
Yes Please!

On Thu, Jan 4, 2018 at 11:36 AM, Robert Wu  wrote:

> Hi,
>
> I think someone created one a while back (under "drillers.slack.com").
>
> Best regards,
>
> Rob
>
> -Original Message-
> From: Charles Givre [mailto:cgi...@gmail.com]
> Sent: Thursday, January 04, 2018 8:10 AM
> To: dev@drill.apache.org; user 
> Subject: Proposed Slack Channel for Drill Users & Devs
>
> Hello everyone,
> I’m curious, if I were to start a Slack channel for Drill users and Devs,
> would there be interest?
> — C
>


RE: Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread Robert Wu
Hi,

I think someone created one a while back (under "drillers.slack.com").

Best regards,

Rob

-Original Message-
From: Charles Givre [mailto:cgi...@gmail.com] 
Sent: Thursday, January 04, 2018 8:10 AM
To: dev@drill.apache.org; user 
Subject: Proposed Slack Channel for Drill Users & Devs

Hello everyone, 
I’m curious, if I were to start a Slack channel for Drill users and Devs, would 
there be interest?
— C


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-04 Thread vdiravka
GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/1083

DRILL-4185: UNION ALL involving empty directory on any side of union …

…all results in Failed query
SchemaLessTable, SchemaLessScan, SchemaLessBatchCreator, and SchemaLessBatch 
classes are introduced. 
The main idea is that an empty directory is a Drill (SchemaLess) table.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-4185

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1083


commit 0aeb9ecc94a25fd19002ffdacdbfd86ec4a8a90f
Author: Vitalii Diravka 
Date:   2017-12-01T20:48:05Z

DRILL-4185: UNION ALL involving empty directory on any side of union all 
results in Failed query




---


[jira] [Created] (DRILL-6070) Hash join with empty tables should not do casting of data types to INT

2018-01-04 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6070:
--

 Summary: Hash join with empty tables should not do casting of data 
types to INT
 Key: DRILL-6070
 URL: https://issues.apache.org/jira/browse/DRILL-6070
 Project: Apache Drill
  Issue Type: Bug
Reporter: Vitalii Diravka
 Fix For: Future


A left join query that uses the HashJoin operator fails with an error, while 
the same query with MergeJoin works fine.

{code}
0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = true;
+---+---+
|  ok   |  summary  |
+---+---+
| true  | planner.enable_hashjoin updated.  |
+---+---+
1 row selected (0.078 seconds)
0: jdbc:drill:zk=local> alter session set `planner.enable_mergejoin` = false;
+---++
|  ok   |  summary   |
+---++
| true  | planner.enable_mergejoin updated.  |
+---++
1 row selected (0.079 seconds)
0: jdbc:drill:zk=local> select t1.a1, t1.b1, t2.a2, t2.b2 from 
dfs.`/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json`
 t1 left join 
dfs.`/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json`
 t2 on t1.b1 = t2.b2;
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts 
between 1. Numeric data
 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARCHAR, Right 
type: INT. Add explicit casts to avoid this error

Fragment 0:0

[Error Id: 2cfc662f-48c2-4e62-a2ea-5a0f33d64c9b on vitalii-pc:31010] 
(state=,code=0)
{code}
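
Until the underlying fix, the workaround suggested by the error message itself 
is an explicit cast on the join key. An untested sketch, written here as the 
query string one would submit (paths shortened from the report above):

{code}
// Hypothetical workaround, per the error message's suggestion: cast the
// join key of the empty-side table so both sides agree on VARCHAR.
String workaround =
    "SELECT t1.a1, t1.b1, t2.a2, t2.b2 " +
    "FROM dfs.`.../jsoninput/nullable1.json` t1 " +        // full paths as above
    "LEFT JOIN dfs.`.../project/pushdown/empty0.json` t2 " +
    "  ON t1.b1 = CAST(t2.b2 AS VARCHAR)";
{code}

For reference, the plan of the failing HashJoin query: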

{code}
00-00Screen : rowType = RecordType(ANY a1, ANY b1, ANY a2, ANY b2): 
rowcount = 1.0, cumulative cost = {2.1 rows, 20.1 cpu, 0.0 io, 0.0 network, 
17.6 memory}, id = 930
00-01  Project(a1=[$0], b1=[$1], a2=[$2], b2=[$3]) : rowType = 
RecordType(ANY a1, ANY b1, ANY a2, ANY b2): rowcount = 1.0, cumulative cost = 
{2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 929
00-02Project(a1=[$1], b1=[$0], a2=[$3], b2=[$2]) : rowType = 
RecordType(ANY a1, ANY b1, ANY a2, ANY b2): rowcount = 1.0, cumulative cost = 
{2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 928
00-03  HashJoin(condition=[=($0, $2)], joinType=[left]) : rowType = 
RecordType(ANY b1, ANY a1, ANY b2, ANY a2): rowcount = 1.0, cumulative cost = 
{2.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 927
00-05Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json,
 numFiles=1, columns=[`b1`, `a1`], 
files=[file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json]]])
 : rowType = RecordType(ANY b1, ANY a1): rowcount = 1.0, cumulative cost = {0.0 
rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 925
00-04Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json,
 numFiles=1, columns=[`b2`, `a2`], 
files=[file:/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json]]])
 : rowType = RecordType(ANY b2, ANY a2): rowcount = 1.0, cumulative cost = {0.0 
rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 926

{code}

A left join with an empty table should not cast data types to INT.
The result should be the same as with the MergeJoin operator:
{code}
0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
+---+---+
|  ok   |  summary  |
+---+---+
| true  | planner.enable_hashjoin updated.  |
+---+---+
1 row selected (0.087 seconds)
0: jdbc:drill:zk=local> alter session set `planner.enable_mergejoin` = true;
+---++
|  ok   |  summary   |
+---++
| true  | planner.enable_mergejoin updated.  |
+---++
1 row selected (0.073 seconds)
0: jdbc:drill:zk=local> select t1.a1, t1.b1, t2.a2, t2.b2 from 
dfs.`/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/target/test-classes/jsoninput/nullable1.json`
 t1 left join 
dfs.`/home/vitalii/IdeaProjects/drill-fork/exec/java-exec/src/test/resources/project/pushdown/empty0.json`
 t2 on t1.b1 = t2.b2;
+-+---+---+---+
| a1  |  b1   |  a2   |  b2   |
+-+---+---+---+
| 1   | abc   | null  | null  |
| 2   | null  | null  | null  |
+-+---+---+---+
2 rows selected (0.624 seconds)
{code}




Proposed Slack Channel for Drill Users & Devs

2018-01-04 Thread Charles Givre
Hello everyone, 
I’m curious, if I were to start a Slack channel for Drill users and Devs, would 
there be interest?
— C

[jira] [Created] (DRILL-6069) Hash agg operator requires large memory amount when planner.width.max_per_node is large

2018-01-04 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6069:
--

 Summary: Hash agg operator requires large memory amount when 
planner.width.max_per_node is large
 Key: DRILL-6069
 URL: https://issues.apache.org/jira/browse/DRILL-6069
 Project: Apache Drill
  Issue Type: Bug
Reporter: Volodymyr Vysotskyi


Queries whose plans contain a few HashAgg operators require a large amount of 
memory when planner.width.max_per_node is large. 
This may be observed with the following physical plan:
{code:xml}
{
  "head" : {
"version" : 1,
"generator" : {
  "type" : "DefaultSqlHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "planner.enable_decimal_data_type",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "exec.hashagg.min_batches_per_partition",
  "num_val" : 1,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "parquet-scan",
"@id" : 131093,
"entries" : [ {
  "path" : "file:/tmp/parquet/ship_mode"
} ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : "/",
  "writable" : false,
  "defaultInputFormat" : null
},
"tmp" : {
  "location" : 
"/home/mapr/drill/exec/java-exec/./target/org.apache.drill.exec.vector.complex.writer.TestJsonReader/dfsTestTmp/1514539108116-0",
  "writable" : true,
  "defaultInputFormat" : null
}
  },
  "formats" : {
"psv" : {
  "type" : "text",
  "extensions" : [ "tbl" ],
  "delimiter" : "|"
},
"csv" : {
  "type" : "text",
  "extensions" : [ "csv" ],
  "delimiter" : ","
},
"tsv" : {
  "type" : "text",
  "extensions" : [ "tsv" ],
  "delimiter" : "\t"
},
"httpd" : {
  "type" : "httpd",
  "logFormat" : "%h %t \"%r\" %>s %b \"%{Referer}i\""
},
"parquet" : {
  "type" : "parquet"
},
"json" : {
  "type" : "json",
  "extensions" : [ "json" ]
},
"pcap" : {
  "type" : "pcap"
},
"avro" : {
  "type" : "avro"
},
"sequencefile" : {
  "type" : "sequencefile",
  "extensions" : [ "seq" ]
},
"csvh" : {
  "type" : "text",
  "extensions" : [ "csvh" ],
  "extractHeader" : true,
  "delimiter" : ","
},
"txt" : {
  "type" : "text",
  "extensions" : [ "txt" ],
  "delimiter" : "\u"
},
"ssv" : {
  "type" : "text",
  "extensions" : [ "ssv" ],
  "delimiter" : " "
},
"csvh-test" : {
  "type" : "text",
  "extensions" : [ "csvh-test" ],
  "skipFirstLine" : true,
  "extractHeader" : true,
  "delimiter" : ","
}
  }
},
"format" : {
  "type" : "parquet"
},
"columns" : [ "`sm_ship_mode_sk`", "`sm_carrier`" ],
"selectionRoot" : "file:/tmp/parquet/ship_mode",
"filter" : "true",
"fileSet" : [ "/tmp/parquet/ship_mode/0_0_0.parquet" ],
"cost" : 20.0
  }, {
"pop" : "filter",
"@id" : 131090,
"child" : 131093,
"expr" : "booleanOr(equal(cast( (`sm_carrier` ) as VARCHAR(200) ), 
'ZOUROS') , equal(cast( (`sm_carrier` ) as VARCHAR(200) ), 'ZHOU') ) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 5.0
  }, {
"pop" : "selection-vector-remover",
"@id" : 131088,
"child" : 131090,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 5.0
  }, {
"pop" : "project",
"@id" : 131086,
"exprs" : [ {
  "ref" : "`sm_ship_mode_sk`",
  "expr" : "cast( (`sm_ship_mode_sk` ) as INT )"
} ],
"child" : 131088,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : 5.0
  }, {
"pop" : "parquet-scan",
"@id" : 131107,
"entries" : [ {
  "path" : "file:/tmp/parquet/date_dim"
} ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "config" : null,
  "workspaces" : {
"root" : {