[GitHub] drill issue #1175: DRILL-6262: IndexOutOfBoundException in RecordBatchSize f...

2018-03-19 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1175
  
Thanks. See how confusing it is? I wrote the darn thing originally and even 
I can't keep the names straight... :-)


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/1175#discussion_r175629518
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) {
 
   // Calculate pure data size.
   if (isVariableWidth) {
-UInt4Vector offsetVector = ((RepeatedValueVector) v).getOffsetVector();
-int innerValueCount = offsetVector.getAccessor().get(valueCount);
 VariableWidthVector dataVector = ((VariableWidthVector) ((RepeatedValueVector) v).getDataVector());
-totalDataSize = dataVector.getOffsetVector().getAccessor().get(innerValueCount);
+totalDataSize = dataVector.getCurrentSizeInBytes();
--- End diff --

@paul-rogers - I don't think `totalDataSize` includes both the offset vector
size and the byte size. It was meant to include only the **pure data size** for
all entries in that column, which is also what the comment suggests.

Instead, `totalNetSize` includes the size of both the data and the offset
vector, and it is what is used to compute the rowWidth.
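The removed lines find the pure data size of a repeated VarChar column with two offset lookups: the outer offset vector maps rows to inner values, and the inner (data) vector's offsets map inner values to bytes. The read at index `valueCount` is presumably what fails for an empty vector, and the new `getCurrentSizeInBytes()` call avoids it. The minimal sketch below models the two quantities distinguished here on plain arrays; the class and method names, the empty-column guard, and the exact offset accounting in `netSize` are assumptions for illustration, not Drill code.

```java
/**
 * Hypothetical model of a repeated VarChar column using plain int arrays
 * (not Drill's ValueVector API):
 *   outerOffsets[r] .. outerOffsets[r + 1] = inner values belonging to row r
 *   innerOffsets[i] .. innerOffsets[i + 1] = bytes of inner value i
 */
final class RepeatedVarCharSizing {

  /** Pure data size ("totalDataSize"): bytes of all inner values, no offsets. */
  static int pureDataSize(int[] outerOffsets, int[] innerOffsets, int valueCount) {
    // Guard the empty case: with no rows there may be no offset entries to read,
    // and indexing outerOffsets[valueCount] would throw.
    if (valueCount == 0 || outerOffsets.length == 0) {
      return 0;
    }
    int innerValueCount = outerOffsets[valueCount]; // first lookup: rows -> inner values
    return innerOffsets[innerValueCount];           // second lookup: inner values -> bytes
  }

  /** Net size ("totalNetSize"): payload plus the 4-byte entries of both offset vectors. */
  static int netSize(int[] outerOffsets, int[] innerOffsets, int valueCount) {
    if (valueCount == 0 || outerOffsets.length == 0) {
      return 0;
    }
    int innerValueCount = outerOffsets[valueCount];
    int offsetBytes = (valueCount + 1) * Integer.BYTES
        + (innerValueCount + 1) * Integer.BYTES;
    return pureDataSize(outerOffsets, innerOffsets, valueCount) + offsetBytes;
  }
}
```

For the rows `["ab", "c"]`, `[]`, `["def"]`, the arrays are `{0, 2, 2, 3}` and `{0, 2, 3, 6}`: the pure data size is 6 bytes, and the net size is 6 + 16 + 16 = 38 bytes.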


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1175#discussion_r175619764
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) {
 
   // Calculate pure data size.
   if (isVariableWidth) {
-UInt4Vector offsetVector = ((RepeatedValueVector) v).getOffsetVector();
-int innerValueCount = offsetVector.getAccessor().get(valueCount);
 VariableWidthVector dataVector = ((VariableWidthVector) ((RepeatedValueVector) v).getDataVector());
-totalDataSize = dataVector.getOffsetVector().getAccessor().get(innerValueCount);
+totalDataSize = dataVector.getCurrentSizeInBytes();
--- End diff --

Good improvement. The original code exposes far too much of the 
implementation.

After all these changes, does the "dataSize" include both the offset vector
and the bytes? It should, otherwise the calculations will be wrong. There are
supposed to be three sizes:

* Payload size: actual data bytes.
* Data size: data + offsets + bits
* Overall size: full length of all vectors.

Payload size is what the user sees. Data size is how we calculate row width 
(since the rows must contain the overhead bytes). Vector length, here, only 
helps compute density, but is generated elsewhere. The point is, keep all three 
in mind, but keep the code separate. Otherwise, it is *very* easy to get 
confused and have the calculations blow up...
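The three sizes can be illustrated for a nullable VarChar column. This is a minimal sketch on plain arrays, not Drill's `RecordBatchSizer`; the class name, the constructor parameters, and the assumption of one null-flag byte per value in the "bits" buffer are illustrative.

```java
/**
 * Hypothetical sketch of the three sizes for a nullable VarChar column.
 * offsets[i] .. offsets[i + 1] is the byte range of value i.
 */
final class ColumnSizes {
  final int valueCount;
  final long payloadBytes; // payload size: actual data bytes the user sees
  final long dataBytes;    // data size: payload + offsets + bits
  final long vectorBytes;  // overall size: full allocated length of all buffers

  ColumnSizes(int[] offsets, int valueCount,
              long dataCapacity, long offsetCapacity, long bitsCapacity) {
    this.valueCount = valueCount;
    this.payloadBytes = valueCount == 0 ? 0 : offsets[valueCount];
    long offsetBytes = valueCount == 0 ? 0 : (long) (valueCount + 1) * Integer.BYTES;
    long bitsBytes = valueCount; // assumption: one null-flag byte per value
    this.dataBytes = payloadBytes + offsetBytes + bitsBytes;
    this.vectorBytes = dataCapacity + offsetCapacity + bitsCapacity;
  }

  /** Row width for batch sizing: based on data size, so overhead bytes count. */
  double rowWidth() {
    return valueCount == 0 ? 0.0 : (double) dataBytes / valueCount;
  }

  /** Density: fraction of the allocated vector memory that holds data. */
  double density() {
    return vectorBytes == 0 ? 0.0 : (double) dataBytes / vectorBytes;
  }
}
```

For the values `"ab"`, `""`, `"cde"`, `"f"` (offsets `{0, 2, 2, 5, 6}`) in buffers of 32, 32, and 8 bytes, the payload is 6 bytes, the data size is 6 + 20 + 4 = 30 bytes, the row width is 7.5 bytes, and the density is 30 / 72, about 0.42. Each size lives in its own field, which keeps the three calculations separate.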


---


[GitHub] drill issue #1176: DRILL-6275: Fixed direct memory reporting in sys.memory.

2018-03-19 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1176
  
LGTM. Verified on a 4-node setup with running queries.
+1


---


[GitHub] drill pull request #1169: DRILL-6243: Added alert box to confirm shutdown of...

2018-03-19 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/1169#discussion_r175598697
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -272,17 +272,19 @@
   }
<#if model.shouldShowAdminInfo() || !model.isAuthEnabled()>
   function shutdown(button) {
-  var requestPath = "/gracefulShutdown";
-  var url = getRequestUrl(requestPath);
-  var result = $.ajax({
-type: 'POST',
-url: url,
-contentType : 'text/plain',
-complete: function(data) {
-alert(data.responseJSON["response"]);
-button.prop('disabled',true).css('opacity',0.5);
-}
-  });
+  if (confirm("Click ok to shutdown")) {
--- End diff --

The message should be more like `"Are you sure you want to shut down the Drillbit
running on " + location.host + " node?"`


---


[GitHub] drill pull request #1169: DRILL-6243: Added alert box to confirm shutdown of...

2018-03-19 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/1169#discussion_r175598108
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -272,17 +272,19 @@
   }
<#if model.shouldShowAdminInfo() || !model.isAuthEnabled()>
   function shutdown(button) {
-  var requestPath = "/gracefulShutdown";
-  var url = getRequestUrl(requestPath);
-  var result = $.ajax({
-type: 'POST',
-url: url,
-contentType : 'text/plain',
-complete: function(data) {
-alert(data.responseJSON["response"]);
-button.prop('disabled',true).css('opacity',0.5);
-}
-  });
+  if (confirm("Click ok to shutdown")) {
+  var requestPath = "/gracefulShutdown";
+  var url = getRequestUrl(requestPath);
+  var result = $.ajax({
+   type: 'POST',
+   url: url,
+   contentType : 'text/plain',
+   complete: function(data) {
+alert(data.responseJSON["response"]);
+button.prop('disabled',true).css('opacity',0.5);
+}
--- End diff --

Please fix the indentation here and below. Also, add an `error:` callback for
the Ajax request, e.g. an alert showing the received error.


---


[GitHub] drill issue #1175: DRILL-6262: IndexOutOfBoundException in RecordBatchSize f...

2018-03-19 Thread ppadma
Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/1175
  
LGTM. +1.


---


[GitHub] drill issue #1176: DRILL-6275: Fixed direct memory reporting in sys.memory.

2018-03-19 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1176
  
Tested fix manually on my laptop.


---


[GitHub] drill pull request #1176: DRILL-6275: Fixed direct memory reporting in sys.m...

2018-03-19 Thread ilooner
GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1176

DRILL-6275: Fixed direct memory reporting in sys.memory.

@kkhatua Thanks for pinpointing the root cause! Please review. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6275

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1176


commit 7f65b7d4b4b9e42dc3597ac9758c39c6ce0903b7
Author: Timothy Farkas 
Date:   2018-03-19T20:16:37Z

DRILL-6275: Fixed direct memory reporting in sys.memory.




---


[jira] [Created] (DRILL-6276) Drill CTAS creates parquet file having page greater than 200 MB.

2018-03-19 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6276:
-

 Summary: Drill CTAS creates parquet file having page greater than 200 MB.
 Key: DRILL-6276
 URL: https://issues.apache.org/jira/browse/DRILL-6276
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.13.0
Reporter: Robert Hou
 Attachments: alltypes_asc_16MB.json

I used this CTAS to create a parquet file from a json file:
{noformat}
create table `alltypes.parquet` as select
  cast(BigIntValue as BigInt) BigIntValue,
  cast(BooleanValue as Boolean) BooleanValue,
  cast (DateValue as Date) DateValue,
  cast (FloatValue as Float) FloatValue,
  cast (DoubleValue as Double) DoubleValue,
  cast (IntegerValue as Integer) IntegerValue,
  cast (TimeValue as Time) TimeValue,
  cast (TimestampValue as Timestamp) TimestampValue,
  cast (IntervalYearValue as INTERVAL YEAR) IntervalYearValue,
  cast (IntervalDayValue as INTERVAL DAY) IntervalDayValue,
  cast (IntervalSecondValue as INTERVAL SECOND) IntervalSecondValue,
  cast (BinaryValue as binary) Binaryvalue,
  cast (VarcharValue as varchar) VarcharValue
from `alltypes.json`;
{noformat}

I ran parquet-tools/parquet-dump:

VarcharValue TV=6885 RL=0 DL=1


page 0:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:17240317 VC:6885

The page size is about 16MB with this 16MB data set. With a similar 1GB data
set, the first page is over 200 MB, and the page sizes then decrease to about 1MB.

VarcharValue TV=208513 RL=0 DL=1


page 0:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:215243750 VC:87433
page 1:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:112350266 VC:43717
page 2:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:52501154 VC:21859
page 3:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:27725498 VC:10930
page 4:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:12181241 VC:5466
page 5:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:11005971 VC:2734
page 6:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1133237 VC:1797
page 7:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1462803 VC:899
page 8:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050967 VC:490
page 9:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1051603 VC:424
page 10:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050919 VC:378
page 11:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050487 VC:345
page 12:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1050783 VC:319
page 13:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1052303 VC:299
page 14:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1053235 VC:282
page 15:  DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:1055979 VC:268

The column is a varchar whose values vary in size from 2 bytes to 5000 bytes.
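To quantify how far each page is from its target, a small helper can parse the `SZ:` (bytes) and `VC:` (value count) fields from the dump lines above. This is a sketch; the class name, its methods, and the 1 MiB target page size used in the example are assumptions, not Drill or parquet-tools APIs or settings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for scanning parquet-dump page lines such as
 *  "page 0:   DLE:RLE RLE:BIT_PACKED VLE:PLAIN SZ:215243750 VC:87433". */
final class PageSizeCheck {
  // SZ = page size in bytes, VC = value count, as printed in the dump.
  private static final Pattern PAGE = Pattern.compile("SZ:(\\d+)\\s+VC:(\\d+)");

  /** Returns {sizeBytes, valueCount}, or null if the line has no SZ/VC fields. */
  static long[] parse(String line) {
    Matcher m = PAGE.matcher(line);
    if (!m.find()) {
      return null;
    }
    return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
  }

  /** Average encoded bytes per value on the page. */
  static double avgBytesPerValue(String line) {
    long[] page = parse(line);
    if (page == null || page[1] == 0) {
      throw new IllegalArgumentException("not a page line with values: " + line);
    }
    return (double) page[0] / page[1];
  }

  /** True if the page is larger than the given target page size. */
  static boolean exceeds(String line, long targetBytes) {
    long[] page = parse(line);
    return page != null && page[0] > targetBytes;
  }
}
```

With an assumed 1 MiB target (`1L << 20`; check the configured page size), `exceeds` flags page 0 of the 1GB run, and `avgBytesPerValue` gives roughly 2,462 bytes per value for that page, which is within the 2 to 5000 byte range of the column.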



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill issue #1175: DRILL-6262: IndexOutOfBoundException in RecordBatchSize f...

2018-03-19 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1175
  
@bitblender - Updated the test. Please review.


---


Drill Hangout tomorrow at 10 am PST

2018-03-19 Thread Boaz Ben-Zvi
We will have our bi-weekly hangout tomorrow, March 20th, at 10 am PST.
Please reply to this post with proposed topics to discuss.

If the proposed topics don't take much time, the remainder of the hangout will
be used to present the design of the Hash-Join Spill: I will describe the basic
changes, and Tim will talk briefly about the memory calculator used in deciding
“when to spill”.
These basic changes to the Hash-Join for spilling are described in the design 
document: 
   
https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit#heading=h.6wkihnd871vj


   The Hangout link: 
 
   https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

   Thank you,

   Boaz





[jira] [Created] (DRILL-6275) drillbit direct_current memory usage is not populated/updated

2018-03-19 Thread Chun Chang (JIRA)
Chun Chang created DRILL-6275:
-

 Summary: drillbit direct_current memory usage is not populated/updated
 Key: DRILL-6275
 URL: https://issues.apache.org/jira/browse/DRILL-6275
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.13.0
Reporter: Chun Chang


We used to keep track of Drill memory usage in sys.memory, which was useful for
detecting memory leaks. This feature seems broken: the direct_current memory
usage is not populated or updated.

{noformat}
0: jdbc:drill:zk=10.10.30.166:5181> select * from sys.memory;
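+--------------+-----------+--------------+------------+----------------+--------------------+-------------+
| hostname     | user_port | heap_current | heap_max   | direct_current | jvm_direct_current | direct_max  |
+--------------+-----------+--------------+------------+----------------+--------------------+-------------+
| 10.10.30.168 | 31010     | 1162636800   | 2147483648 | 0              | 22096              | 10737418240 |
| 10.10.30.169 | 31010     | 1301175040   | 2147483648 | 0              | 22096              | 10737418240 |
| 10.10.30.166 | 31010     | 989448872    | 2147483648 | 0              | 22096              | 10737418240 |
| 10.10.30.167 | 31010     | 1767205312   | 2147483648 | 0              | 22096              | 10737418240 |
+--------------+-----------+--------------+------------+----------------+--------------------+-------------+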
4 rows selected (1.564 seconds)
0: jdbc:drill:zk=10.10.30.166:5181> select * from sys.version;
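+-----------------+------------------------------------------+------------------------------------------------------+---------------------------+--------------------+---------------------------+
| version         | commit_id                                | commit_message                                       | commit_time               | build_email        | build_time                |
+-----------------+------------------------------------------+------------------------------------------------------+---------------------------+--------------------+---------------------------+
| 1.13.0-SNAPSHOT | 534212456cc25a49272838cba91c223f63df7fd2 | Cleanup when closing, and cleanup spill after a kill | 07.03.2018 @ 16:18:27 PST | inram...@gmail.com | 08.03.2018 @ 10:09:28 PST |
+-----------------+------------------------------------------+------------------------------------------------------+---------------------------+--------------------+---------------------------+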
{noformat}





[jira] [Created] (DRILL-6274) MergeJoin Memory Manager is still using Fragmentation Factor

2018-03-19 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6274:


 Summary: MergeJoin Memory Manager is still using Fragmentation Factor
 Key: DRILL-6274
 URL: https://issues.apache.org/jira/browse/DRILL-6274
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.13.0
Reporter: Sorabh Hamirwasia
Assignee: Padma Penumarthy
 Fix For: 1.14.0


MergeJoinMemoryManager is still using
[WORST_CASE_FRAGMENTATION_FACTOR|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java#L156]
when computing the memory for the outgoing batch. This needs to be updated so
that the factor is no longer used.
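To illustrate why a fragmentation allowance shrinks the outgoing batch, here is a back-of-the-envelope sketch that assumes the row count is derived as budget / (row width × factor). The formula, the class and method names, and the example numbers are assumptions; they are not taken from MergeJoinMemoryManager, whose actual computation may differ.

```java
/** Hypothetical back-of-the-envelope sizing for an outgoing batch. */
final class OutgoingBatchSizing {

  /**
   * Rows that fit in a memory budget when each row is estimated at
   * rowWidthBytes and the estimate is inflated by fragmentationFactor
   * (1 = no fragmentation allowance).
   */
  static int rowsThatFit(long memoryBudgetBytes, long rowWidthBytes, int fragmentationFactor) {
    if (rowWidthBytes <= 0 || fragmentationFactor <= 0) {
      throw new IllegalArgumentException("row width and factor must be positive");
    }
    long rows = memoryBudgetBytes / (rowWidthBytes * fragmentationFactor);
    return (int) Math.min(rows, Integer.MAX_VALUE);
  }
}
```

With a 16 MiB budget and 100-byte rows, `rowsThatFit(16L << 20, 100, 1)` gives 167,772 rows, while a factor of 2 gives 83,886: under this formula the factor halves the batch regardless of how fragmented the memory really is.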





[GitHub] drill issue #1152: DRILL-6199: Add support for filter push down and partitio...

2018-03-19 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1152
  
Thanks, @chunhui-shi - marked it as ready-to-commit since the original 
feature was already merged to 1.13. The batch committer this week can take 
another look as well.


---


[GitHub] drill issue #1152: DRILL-6199: Add support for filter push down and partitio...

2018-03-19 Thread chunhui-shi
Github user chunhui-shi commented on the issue:

https://github.com/apache/drill/pull/1152
  
+1, looks good to me.


---


[GitHub] drill issue #1152: DRILL-6199: Add support for filter push down and partitio...

2018-03-19 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1152
  
@HanumathRao, @chunhui-shi any more comments from you? 


---


[GitHub] drill issue #1175: DRILL-6262: IndexOutOfBoundException in RecordBatchSize f...

2018-03-19 Thread sohami
Github user sohami commented on the issue:

https://github.com/apache/drill/pull/1175
  
@ppadma - Please review


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread sohami
GitHub user sohami opened a pull request:

https://github.com/apache/drill/pull/1175

DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty variableWidthVector

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohami/drill DRILL-6262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1175


commit dbde22ea37c486b483601ab551e7fc7c23fb96b8
Author: Sorabh Hamirwasia 
Date:   2018-03-16T23:57:12Z

DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty 
variableWidthVector




---


[jira] [Created] (DRILL-6273) Remove dependency licensed under Category X

2018-03-19 Thread Vlad Rozov (JIRA)
Vlad Rozov created DRILL-6273:
-

 Summary: Remove dependency licensed under Category X
 Key: DRILL-6273
 URL: https://issues.apache.org/jira/browse/DRILL-6273
 Project: Apache Drill
  Issue Type: Task
Reporter: Vlad Rozov
 Fix For: 1.14.0








[jira] [Created] (DRILL-6272) Remove binary jars files from source distribution

2018-03-19 Thread Vlad Rozov (JIRA)
Vlad Rozov created DRILL-6272:
-

 Summary: Remove binary jars files from source distribution
 Key: DRILL-6272
 URL: https://issues.apache.org/jira/browse/DRILL-6272
 Project: Apache Drill
  Issue Type: Task
Reporter: Vlad Rozov
 Fix For: 1.14.0








[jira] [Created] (DRILL-6271) Update copyright range in NOTICE

2018-03-19 Thread Vlad Rozov (JIRA)
Vlad Rozov created DRILL-6271:
-

 Summary: Update copyright range in NOTICE
 Key: DRILL-6271
 URL: https://issues.apache.org/jira/browse/DRILL-6271
 Project: Apache Drill
  Issue Type: Task
Reporter: Vlad Rozov
 Fix For: 1.14.0








[jira] [Created] (DRILL-6270) Add debug startup option flag for drill in embedded and server mode

2018-03-19 Thread Volodymyr Tkach (JIRA)
Volodymyr Tkach created DRILL-6270:
--

 Summary: Add debug startup option flag for drill in embedded and server mode
 Key: DRILL-6270
 URL: https://issues.apache.org/jira/browse/DRILL-6270
 Project: Apache Drill
  Issue Type: Task
Reporter: Volodymyr Tkach
Assignee: Anton Gozhiy


Add an option to run the sqlline.sh and drillbit.sh scripts with -- to enable
the standard Java remote debug options, with the ability to override the port.

Example: drillbit.sh start - 50001
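The "standard Java remote debug options" are the JDWP agent flags passed to the JVM. A minimal sketch of assembling them with an optional port override follows; the class, the method, and the default port are hypothetical, and `suspend=n` is a choice the issue does not specify.

```java
/** Hypothetical helper that builds the JVM remote-debug (JDWP) option. */
final class DebugOpts {
  // Assumption: placeholder default; the issue does not specify a default port.
  static final int DEFAULT_DEBUG_PORT = 50000;

  /** Returns the JDWP agent option, using portOverride when it is non-null. */
  static String jdwpOption(Integer portOverride) {
    int port = portOverride == null ? DEFAULT_DEBUG_PORT : portOverride;
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("invalid debug port: " + port);
    }
    // server=y: the JVM listens for a debugger; suspend=n: start without waiting for one.
    return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" + port;
  }
}
```

`jdwpOption(50001)` yields `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=50001`, matching the port in the example. On JDK 9 and later, a bare port binds to localhost only; `address=*:50001` listens on all interfaces.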





[GitHub] drill issue #1174: DRILL-6250: Sqlline start command with password appears i...

2018-03-19 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1174
  
+1


---


[GitHub] drill issue #1174: DRILL-6250: Sqlline start command with password appears i...

2018-03-19 Thread vladimirtkach
Github user vladimirtkach commented on the issue:

https://github.com/apache/drill/pull/1174
  
@arina-ielchiieva Addressed the code review comments.


---


[GitHub] drill pull request #1174: DRILL-6250: Sqlline start command with password ap...

2018-03-19 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1174#discussion_r175368114
  
--- Diff: common/src/main/java/org/apache/drill/common/config/DrillConfig.java ---
@@ -52,8 +52,8 @@
   public DrillConfig(Config config) {
 super(config);
 logger.debug("Setting up DrillConfig object.");
-logger.trace("Given Config object is:\n{}",
- config.root().render(ConfigRenderOptions.defaults()));
+logger.trace("Given Config object is:\n{}", config.withoutPath("password").withoutPath("sun.java.command")
--- End diff --

Please add a comment explaining why we exclude `sun.java.command`.
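The diff uses Typesafe Config's `Config.withoutPath`, which returns a copy of the config with the given path removed. `sun.java.command` is a JVM system property holding the main class and its arguments, so a password given on the sqlline start command (the scenario in the PR title) would otherwise reach the trace log, presumably because system properties are merged into the Drill config. The sketch below mirrors the idea on a plain `Map` so that it runs without the Typesafe dependency; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: drop sensitive keys before logging a flat config map. */
final class ConfigRedaction {
  // "password" may hold a credential directly; "sun.java.command" holds the JVM
  // launch command line, which can include a password passed as a program argument.
  static final Set<String> SENSITIVE_PATHS =
      new HashSet<>(Arrays.asList("password", "sun.java.command"));

  /** Copy of config without each sensitive path and anything nested beneath it. */
  static Map<String, String> withoutSensitivePaths(Map<String, String> config) {
    Map<String, String> copy = new TreeMap<>(config);
    copy.keySet().removeIf(key -> SENSITIVE_PATHS.stream()
        .anyMatch(path -> key.equals(path) || key.startsWith(path + ".")));
    return copy;
  }

  /** Renders the redacted config as key=value lines, sorted by key. */
  static String render(Map<String, String> config) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : withoutSensitivePaths(config).entrySet()) {
      sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
    }
    return sb.toString();
  }
}
```

A one-line comment like the one on `SENSITIVE_PATHS` would answer the reviewer's question directly in the code.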


---


Re: [ANNOUNCE] Apache Drill release 1.13.0

2018-03-19 Thread Arina Ielchiieva
One more point: starting with this release, Apache Drill no longer supports
JDK 7 and has fully moved to JDK 8.

Kind regards
Arina

On Mon, Mar 19, 2018 at 5:51 AM, Abhishek Girish  wrote:

> Congratulations, everyone, on yet another great release of Apache Drill!
> On Mon, Mar 19, 2018 at 6:57 AM Parth Chandra  wrote:
>
> > On behalf of the Apache Drill community, I am happy to announce the
> > release of Apache Drill 1.13.0.
> >
> > For information about Apache Drill, and to get involved, visit the
> > project website [1].
> >
> > This release of Drill provides the following new features and
> > improvements:
> >
> > - YARN support for Drill [DRILL-1170]
> >
> > - Support HTTP Kerberos auth using SPNEGO [DRILL-5425]
> >
> > - Support SQL syntax highlighting of queries [DRILL-5868]
> >
> > - Drill should support user/distribution specific configuration checks
> >   during startup [DRILL-6068]
> >
> > - Upgrade DRILL to Calcite 1.15.0 [DRILL-5966]
> >
> > - Batch Sizing improvements to reduce memory footprint of operators
> >
> >   - [DRILL-6071 <https://issues.apache.org/jira/browse/DRILL-6071>]
> >     Limit batch size for flatten operator
> >
> >   - [DRILL-6126 <https://issues.apache.org/jira/browse/DRILL-6126>]
> >     Allocate memory for value vectors upfront in flatten operator
> >
> >   - [DRILL-6123 <https://issues.apache.org/jira/browse/DRILL-6123>]
> >     Limit batch size for Merge Join based on memory.
> >
> >   - [DRILL-6177 <https://issues.apache.org/jira/browse/DRILL-6177>]
> >     Merge Join - Allocate memory for outgoing value vectors based on sizes
> >     of incoming batches.
> >
> >
> > For the full list please see release notes [2].
> >
> > The binary and source artifacts are available here [3].
> >
> > Thanks to everyone in the community who contributed to this release!
> >
> > 1. https://drill.apache.org/
> > 2. https://drill.apache.org/docs/apache-drill-1-13-0-release-notes/
> > 3. https://drill.apache.org/download/
> >
>