[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160606654
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,46 @@ else
   fi
 fi
 
-# Default memory settings if none provided by the environment or
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then 
+  # *-auto.sh allows for distrib/user specific checks to be done
+  distribAuto="$DRILL_CONF_DIR/distrib-auto.sh"
+  if [ ! -r "$distribAuto" ]; then distribAuto="$DRILL_HOME/conf/distrib-auto.sh"; fi
+  if [ ! -r "$distribAuto" ]; then distribAuto=""; fi
+  drillAuto="$DRILL_CONF_DIR/drill-auto.sh"
+  if [ ! -r "$drillAuto" ]; then drillAuto="$DRILL_HOME/conf/drill-auto.sh"; fi
--- End diff --

Agreed. The drill-config file follows this for `drill-env.sh`, but checks 
for both locations in case of `distrib-env.sh`.


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160606258
  
--- Diff: distribution/src/resources/distrib-auto.sh ---
@@ -0,0 +1,223 @@
+#!/usr/bin/env bash
--- End diff --

Wasn't sure which would be a suitable place. I reasoned that the "Apache" 
distribution is what we are defining it for, hence the choice


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160606013
  
--- Diff: distribution/src/assemble/bin.xml ---
@@ -345,6 +345,16 @@
   0755
   conf
 
+
+  src/resources/drill-auto.sh
--- End diff --

AutoConfig was the intent, but it looked a bit verbose. How about 
`drill-autocheck.sh`? :)


---


[GitHub] drill issue #1087: Attempt to fix memory leak in Parquet

2018-01-09 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1087
  
@parthchandra can you please review this fix?


---


[GitHub] drill pull request #1087: Attempt to fix memory leak in Parquet

2018-01-09 Thread sachouche
GitHub user sachouche opened a pull request:

https://github.com/apache/drill/pull/1087

Attempt to fix memory leak in Parquet

**Problem Description**
This is an extremely rare leak which I was able to emulate by putting a 
sleep in the AsyncPageReader right after reading the page and before enqueuing it in 
the result queue. This is how the issue could manifest itself in a real-life 
scenario:
- The AsyncPageReader reads a page into a buffer but has not yet enqueued the 
result (the thread got preempted)
- The Parquet Scan thread blocks waiting on the task (the Future object is dequeued)
- A cancel is received and the Scan thread is interrupted
- Future.get() returns (the Future object is lost)
- The Scan thread executes the release logic
- The Scan thread is unable to interrupt the AsyncPageReader thread since the 
Future object is lost
- The AsyncPageReader thread resumes and enqueues the DrillBuf in the result 
queue
- This results in a leak since the buffer is never properly released

**Fix Description**
- The fix is straightforward: we peek at the Future object during the 
blocking get() call instead of dequeuing it first
- This way, an exception (such as an interrupt) leaves the Future 
object in the task queue
- The cleanup logic can then guarantee the DrillBuf object is released 
by either the AsyncPageReader or the Parquet Scan thread
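
A minimal, hypothetical sketch of the peek-before-get pattern described above, using plain `java.util.concurrent` types; the class and method names are illustrative and are not Drill's actual code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Sketch: block on the head task WITHOUT removing it from the queue first.
// If get() throws (e.g. an interrupt on cancel), the Future stays in the
// queue, so the cleanup logic can still find and cancel it.
public class PeekBeforeGet {

    static <T> T takeResult(BlockingQueue<Future<T>> tasks)
            throws InterruptedException, ExecutionException {
        Future<T> head = tasks.peek();   // look, but do not dequeue yet
        if (head == null) {
            return null;
        }
        T result = head.get();           // may throw InterruptedException;
                                         // the Future remains in the queue
        tasks.remove(head);              // dequeue only after a successful get
        return result;
    }

    // Cleanup drains whatever is still queued and cancels each task,
    // interrupting the producer thread so its buffer can be released.
    static <T> void cleanup(BlockingQueue<Future<T>> tasks) {
        Future<T> f;
        while ((f = tasks.poll()) != null) {
            f.cancel(true);
        }
    }
}
```

With the original dequeue-then-get ordering, an interrupt during `get()` loses the only reference to the in-flight task; peeking first is what lets the cleanup path still reach it.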

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachouche/drill DRILL-6079

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1087


commit 52030d1d9cc3b8992a10ade8c7126d66e785043a
Author: Salim Achouche 
Date:   2017-12-22T19:50:56Z

Attempt to fix memory leak in Parquet




---


[GitHub] drill issue #1001: JIRA DRILL-5879: Like operator performance improvements

2018-01-09 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1001
  
@sachouche can you update this PR?


---


Re: Parquet MR version

2018-01-09 Thread Vlad Rozov
Support for byte buffers, aka "PARQUET-77: ByteBuffer use in read and 
write paths", is included in both the 1.8.2 and 1.9.0 parquet-mr releases. Based on 
the following discussion 
https://lists.apache.org/thread.html/e1a3d1e49d3b775cb5e1417c882dad7bd2f038e00237ff2a9057c608@1454714357@%3Cdev.drill.apache.org%3E 
Drill should move to a released version of parquet-mr. I'll file a 
JIRA and summarize my findings.


Thank you,

Vlad

On 1/5/18 02:23, salim achouche wrote:

I don’t think it will be a lot of work to do so, as the main changes (from the 
Parquet branch) are related to passing direct buffers vs. using the Java heap.

FYI - I would also think that we should contribute to the Parquet code base so 
that an abstraction (e.g., ByteBuffer) is used on top of byte arrays, and 
Drill will not need to use a private Parquet branch. I guess the key is to 
figure out why such an approach was not followed in the first place.

Regards,
Salim


On Jan 5, 2018, at 2:37 AM, Vlad Rozov  wrote:

Hi everyone,

With parquet-mr 1.9.0 released more than a year ago and parquet-mr 1.8.2 almost 
a year ago, should the Drill dependency on parquet-mr be updated from the 
current custom 1.8.1-drill-r0? As far as I can tell, both 1.8.2 and 1.9.0 have 
the fixes from the 1.8.1-drill-r0 patch. Would the community prefer 1.9.0 over 1.8.2 
even if 1.9.0 may introduce some incompatibilities?

Thank you,

Vlad




[GitHub] drill issue #367: DRILL-4364: Image Metadata Format Plugin

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/367
  
@nagix Did you get a chance to rebase this commit?


---


[GitHub] drill issue #456: DRILL-4566: TDigest, median, and quantile functions

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/456
  
Is anyone reviewing this?


---


[jira] [Created] (DRILL-6079) Memory leak caused by ParquetRowGroupScan

2018-01-09 Thread salim achouche (JIRA)
salim achouche created DRILL-6079:
-

 Summary: Memory leak caused by ParquetRowGroupScan
 Key: DRILL-6079
 URL: https://issues.apache.org/jira/browse/DRILL-6079
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Reporter: salim achouche
Assignee: salim achouche
Priority: Minor


Concurrency tests with assertions enabled indicate a memory leak in the Parquet 
scanner code:

2017-10-25 17:28:52,149 [260ed3bc-dbdf-8b4a-66d6-b2bd804e8c74:frag:11:63] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Memory was leaked by query. Memory leaked: (2097152)
Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
(res/actual/peak/limit)


Fragment 11:63

[Error Id: 5927becb-f000-43db-95b5-93b33064a6fd on mperf113.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Memory was leaked by query. Memory leaked: (2097152)
Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
(res/actual/peak/limit)


Fragment 11:63

[Error Id: 5927becb-f000-43db-95b5-93b33064a6fd on mperf113.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
 ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301)
 [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
 [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-mapr.jar:1.11.0-mapr]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory 
leaked: (2097152)
Allocator(op:11:63:6:ParquetRowGroupScan) 100/2097152/7348224/100 
(res/actual/peak/limit)

at 
org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:520) 
~[drill-memory-base-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.ops.AbstractOperatorExecContext.close(AbstractOperatorExecContext.java:86)
 ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:108)
 ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:435)
 ~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:424) 
~[drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:324)
 [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
 [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr]
... 5 common frames omitted
2017-10-25 17:28:52,149 [260ed3bc-dbdf-8b4a-66d6-b2bd804e8c74:frag:6:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Memory was leaked by query. Memory leaked: (2097152)
Allocator(op:6:0:5:ParquetRowGroupScan) 100/2097152/19947520/100 
(res/actual/peak/limit)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1059#discussion_r160542798
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
@@ -333,4 +339,55 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
   test(RESET_JOIN_OPTIMIZATION);
 }
   }
+
+  private static void buildFile(String fileName, String[] data, File testDir) throws IOException {
+    try (PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
+  for (String line : data) {
+out.println(line);
+  }
+}
+  }
+
+  public static void testWithEmptyJoin(File testDir, String joinType,
+String joinPattern, long result) throws Exception {
+buildFile("dept.json", new String[0], testDir);
+String query = String.format(testEmptyJoin, joinType);
+testPlanMatchingPatterns(query, new String[]{joinPattern}, new String[]{});
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("cnt")
+.baselineValues(result)
+.build().run();
+  }
+
+  @Test
+  public void testNestedLeftJoinWithEmptyTable() throws Exception {
+try {
+  test(DISABLE_NLJ_SCALAR);
+  testWithEmptyJoin(dirTestWatcher.getRootDir(), "left outer", nlpattern, 1155L);
+} finally {
+  test(RESET_HJ);
--- End diff --

Please use `client.resetSessionOption(...)`.


---


[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1059#discussion_r160542923
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestNestedLoopJoin.java ---
@@ -333,4 +339,55 @@ public void testNlJoinWithLargeRightInputSuccess() throws Exception {
   test(RESET_JOIN_OPTIMIZATION);
 }
   }
+
+  private static void buildFile(String fileName, String[] data, File testDir) throws IOException {
+    try (PrintWriter out = new PrintWriter(new FileWriter(new File(testDir, fileName)))) {
+  for (String line : data) {
+out.println(line);
+  }
+}
+  }
+
+  public static void testWithEmptyJoin(File testDir, String joinType,
+String joinPattern, long result) throws Exception {
+buildFile("dept.json", new String[0], testDir);
+String query = String.format(testEmptyJoin, joinType);
+testPlanMatchingPatterns(query, new String[]{joinPattern}, new String[]{});
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("cnt")
+.baselineValues(result)
+.build().run();
+  }
+
+  @Test
+  public void testNestedLeftJoinWithEmptyTable() throws Exception {
+try {
+  test(DISABLE_NLJ_SCALAR);
--- End diff --

Please use `client.setSessionOption(...)` to make it clear what we are 
doing here.


---


[GitHub] drill pull request #1059: DRILL-5851: Empty table during a join operation wi...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1059#discussion_r160540854
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/AbstractRecordBatch.java ---
@@ -228,4 +228,20 @@ public WritableBatch getWritableBatch() {
   public VectorContainer getOutgoingContainer() {
 throw new UnsupportedOperationException(String.format(" You should not 
call getOutgoingContainer() for class %s", this.getClass().getCanonicalName()));
   }
+
+  public void drainStream(IterOutcome stream, int input, RecordBatch batch) {
+    if (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
+      for (final VectorWrapper<?> wrapper : batch) {
+        wrapper.getValueVector().clear();
+      }
+      batch.kill(true);
+      stream = next(input, batch);
+      while (stream == IterOutcome.OK_NEW_SCHEMA || stream == IterOutcome.OK) {
+        for (final VectorWrapper<?> wrapper : batch) {
+          wrapper.getValueVector().clear();
+        }
+        stream = next(input, batch);
+      }
--- End diff --

In one-on-one discussions it became clear that fixing this issue is out of 
the scope of this particular task. That's fine. Please just file a JIRA to 
track the potential problem so we don't forget about it.


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160537845
  
--- Diff: distribution/src/resources/distrib-auto.sh ---
@@ -0,0 +1,223 @@
+#!/usr/bin/env bash
--- End diff --

In general, "distrib" files are meant to be added by Drill distributions, 
not by Drill itself. If we want to provide the memory checking in Drill, then 
it should go into some file other than "distrib."


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160538214
  
--- Diff: distribution/src/assemble/bin.xml ---
@@ -345,6 +345,16 @@
   0755
   conf
 
+
+  src/resources/drill-auto.sh
--- End diff --

"-auto"? Is this for the self-driving car feature of Drill? :-)

Is this meant to be for startup-time environment checks? For other add-on 
processing? Perhaps we can pick a name that makes that a bit clearer.


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160539584
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,46 @@ else
   fi
 fi
 
-# Default memory settings if none provided by the environment or
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then 
+  # *-auto.sh allows for distrib/user specific checks to be done
+  distribAuto="$DRILL_CONF_DIR/distrib-auto.sh"
+  if [ ! -r "$distribAuto" ]; then distribAuto="$DRILL_HOME/conf/distrib-auto.sh"; fi
+  if [ ! -r "$distribAuto" ]; then distribAuto=""; fi
+  drillAuto="$DRILL_CONF_DIR/drill-auto.sh"
+  if [ ! -r "$drillAuto" ]; then drillAuto="$DRILL_HOME/conf/drill-auto.sh"; fi
--- End diff --

This part is just a bit tricky. Drill- and distribution-specific files do, 
in fact, reside in `$DRILL_HOME/conf`. But user files reside in a number of 
places. So, please use:

- `$DRILL_HOME/conf` for `distrib-auto.sh`
- `$DRILL_CONF_DIR` for user-supplied files (such as `drill-auto.sh`)


---


[GitHub] drill pull request #1082: DRILL-5741: Automatically manage memory allocation...

2018-01-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1082#discussion_r160539726
  
--- Diff: distribution/src/resources/drill-config.sh ---
@@ -180,18 +251,46 @@ else
   fi
 fi
 
-# Default memory settings if none provided by the environment or
+# Checking if being executed in context of Drillbit and not SQLLine
+if [ "$DRILLBIT_CONTEXT" == "1" ]; then 
+  # *-auto.sh allows for distrib/user specific checks to be done
+  distribAuto="$DRILL_CONF_DIR/distrib-auto.sh"
+  if [ ! -r "$distribAuto" ]; then distribAuto="$DRILL_HOME/conf/distrib-auto.sh"; fi
+  if [ ! -r "$distribAuto" ]; then distribAuto=""; fi
+  drillAuto="$DRILL_CONF_DIR/drill-auto.sh"
+  if [ ! -r "$drillAuto" ]; then drillAuto="$DRILL_HOME/conf/drill-auto.sh"; fi
+  if [ ! -r "$drillAuto" ]; then drillAuto=""; fi
+
+  # Enforcing checks in order (distrib-auto.sh , drill-auto.sh)
+  # (NOTE: A script is executed only if it has relevant executable lines)
+  if [ -n "$distribAuto" ] && [ $(executableLineCount $distribAuto) -gt 0 ]; then
+. "$distribAuto"
+if [ $? -gt 0 ]; then fatal_error "Aborting Drill Startup due failed checks from $distribAuto"; fi
+  fi
+  if [ -n "$drillAuto" ] && [ $(executableLineCount $drillAuto) -gt 0 ]; then
--- End diff --

To make the code a bit simpler, touch each file once:

* Check if it exists.
* Invoke it if so.
* Handle any errors that are reported.



---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160525590
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java ---
@@ -56,65 +56,50 @@ private void prepareTables(final String tableName, boolean refreshMetadata) thro
   public void testFix4376() throws Exception {
 prepareTables("4376_1", true);
 
-testBuilder()
-  .sqlQuery("SELECT COUNT(*) AS `count` FROM dfs.tmp.`4376_1/60*`")
-  .ordered()
-  .baselineColumns("count").baselineValues(1984L)
-  .go();
+int actualRecordCount = testSql("SELECT * FROM dfs.tmp.`4376_1/60*`");
+int expectedRecordCount = 1984;
+assertEquals(String.format("Received unexpected number of rows in output: expected=%d, received=%s",
--- End diff --

`expected=%d` -> `expected = %d`. Please correct here and in other cases.
  


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160525146
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
@@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
 // turns on the verbose errors in tests
// server side stacktraces are added to the message before sending back to the client
 test("ALTER SESSION SET `exec.errors.verbose` = true");
+emptyDirCreating();
+  }
+
+  /**
+   * Creates an empty directory under dfs.root schema.
+   */
+  private static void emptyDirCreating() {
--- End diff --

Please rename -> `createEmptyDirectory`.


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160526940
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -250,20 +250,12 @@ private boolean metaDataFileExists(FileSystem fs, FileStatus dir) throws IOException {
 }
 
 boolean isDirReadable(DrillFileSystem fs, FileStatus dir) {
-  Path p = new Path(dir.getPath(), ParquetFileWriter.PARQUET_METADATA_FILE);
   try {
-if (fs.exists(p)) {
-  return true;
-} else {
-
-  if (metaDataFileExists(fs, dir)) {
-return true;
-  }
-  List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
-  return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
-}
+// There should be at least one file, which is readable by Drill
+List<FileStatus> statuses = DrillFileSystemUtil.listFiles(fs, dir.getPath(), false);
+return !statuses.isEmpty() && super.isFileReadable(fs, statuses.get(0));
--- End diff --

Why did you remove the `metaDataFileExists(fs, dir)` check? 
`DrillFileSystemUtil` won't list dot files.
Plus, it seems you break the assumptions highlighted in the `isReadable` method 
above.


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160524662
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/SchemalessScan.java ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
+import org.apache.drill.exec.planner.logical.DynamicDrillTable;
+import org.apache.drill.exec.proto.CoordinationProtos;
+
+import java.util.List;
+
+/**
+ *  The type of scan operator that allows scanning schemaless tables ({@link DynamicDrillTable} with null selection)
+ */
+@JsonTypeName("schemaless-scan")
+public class SchemalessScan extends AbstractGroupScan implements SubScan {
+
+  public SchemalessScan(String userName) {
+super(userName);
+  }
+
+  public SchemalessScan(AbstractGroupScan that) {
+super(that);
+  }
+
+  @Override
+  public void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException {
+  }
+
+  @Override
+  public SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException {
+return this;
+  }
+
+  @Override
+  public int getMaxParallelizationWidth() {
+return 1;
+  }
+
+  @Override
+  public String getDigest() {
+return toString();
--- End diff --

What value will `toString` return?


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160525926
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestUnionAll.java ---
@@ -1197,4 +1197,64 @@ public void testFieldWithDots() throws Exception {
   .baselineValues("1", "2", "1", null, "a")
   .go();
   }
-}
\ No newline at end of file
+
+  @Test
+  public void testUnionAllRightEmptyDir() throws Exception {
--- End diff --

1. Does union (aka union distinct) also work fine?
2. What about the case when both directories in union / union all are empty?
  


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160525739
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/TestEmptyInputSql.java ---
@@ -177,4 +177,33 @@ public void testQueryEmptyCsv() throws Exception {
 .run();
   }
 
+  @Test
+  public void testEmptyDirectory() throws Exception {
+final BatchSchema expectedSchema = new SchemaBuilder().build();
+
+testBuilder()
+.sqlQuery("select * from dfs.`%s`", EMPTY_DIR_NAME)
+.schemaBaseLine(expectedSchema)
+.build()
+.run();
+  }
+
+  @Test
+  public void testEmptyDirectoryAndFieldInQuery() throws Exception {
+final List<Pair<SchemaPath, TypeProtos.MajorType>> expectedSchema = Lists.newArrayList();
+final TypeProtos.MajorType majorType = TypeProtos.MajorType.newBuilder()
+.setMinorType(TypeProtos.MinorType.INT) // field "key" is absent in schema-less table
--- End diff --

schema-less -> schemaless


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160524828
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/RefreshMetadataHandler.java ---
@@ -78,19 +78,24 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, RelConversionException {
 
   final Table table = schema.getTable(tableName);
 
-  if(table == null){
+  if(table == null) {
--- End diff --

Please add space before `if`, here and below.


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r160525303
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java ---
@@ -119,6 +125,15 @@ public static void setupDefaultTestCluster() throws Exception {
 // turns on the verbose errors in tests
// server side stacktraces are added to the message before sending back to the client
 test("ALTER SESSION SET `exec.errors.verbose` = true");
+emptyDirCreating();
--- End diff --

Why do we create an empty directory for all tests that extend this class? I 
guess we need to create it only for those tests that need it.


---


[GitHub] drill issue #652: DRILL-4990:Use new HDFS API access instead of listStatus t...

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/652
  
@ppadma can you rebase this with the current master, and include the 
workaround? It does not make sense to hold up this commit for so long if a 
workaround for the Windows platform is sufficient.


---


[GitHub] drill issue #440: Elasticsearch storage plugin

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/440
  
@hamdanuk this PR has too many commits, which makes it nearly 
impossible to review. Would you like to rebase your work on the latest Drill 
1.12.0 and try again?


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2018-01-09 Thread sachouche
Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r160514755
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VLAbstractEntryReader.java ---
@@ -0,0 +1,215 @@

+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.nio.ByteBuffer;
+
+import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.ColumnPrecisionInfo;
+import org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.PageDataInfo;
+import org.apache.drill.exec.util.MemoryUtils;
+
+/** Abstract class for sub-classes implementing several algorithms for loading a Bulk Entry */
+abstract class VLAbstractEntryReader {
+
+  /** byte buffer used for buffering page data */
+  protected final ByteBuffer buffer;
+  /** Page Data Information */
+  protected final PageDataInfo pageInfo;
+  /** expected precision type: fixed or variable length */
+  protected final ColumnPrecisionInfo columnPrecInfo;
+  /** Bulk entry */
+  protected final VLColumnBulkEntry entry;
+
+  /**
+   * CTOR.
+   * @param _buffer byte buffer for data buffering (within CPU cache)
+   * @param _pageInfo page being processed information
+   * @param _columnPrecInfo column precision information
+   * @param _entry reusable bulk entry object
+   */
+  VLAbstractEntryReader(ByteBuffer _buffer,
+PageDataInfo _pageInfo,
+ColumnPrecisionInfo _columnPrecInfo,
+VLColumnBulkEntry _entry) {
+
+this.buffer = _buffer;
+this.pageInfo   = _pageInfo;
+this.columnPrecInfo = _columnPrecInfo;
+this.entry  = _entry;
+  }
+
+  /**
+   * @param valuesToRead maximum values to read within the current page
+   * @return a bulk entry object
+   */
+  abstract VLColumnBulkEntry getEntry(int valsToReadWithinPage);
+
+  /**
+   * Indicates whether to use bulk processing
+   */
+  protected final boolean bulkProcess() {
+return columnPrecInfo.bulkProcess;
+  }
+
+  /**
+   * Loads new data into the buffer if empty or the force flag is set.
+   *
+   * @param force flag to force loading new data into the buffer
+   */
+  protected final boolean load(boolean force) {
+
+if (!force && buffer.hasRemaining()) {
+  return true; // NOOP
+}
+
+// We're here either because the buffer is empty or we want to force a 
new load operation.
+// In the case of force, there might be unprocessed data (still in the 
buffer) which is fine
+// since the caller updates the page data buffer's offset only for the 
data it has consumed; this
+// means unread data will be loaded again but this time will be 
positioned in the beginning of the
+// buffer. This can happen only for the last entry in the buffer when 
either of its length or value
+// is incomplete.
+buffer.clear();
+
+int remaining = remainingPageData();
+int toCopy = remaining > buffer.capacity() ? buffer.capacity() : remaining;
+
+if (toCopy == 0) {
+  return false;
+}
+
+pageInfo.pageData.getBytes(pageInfo.pageDataOff, buffer.array(), buffer.position(), toCopy);
--- End diff --

@parthchandra I know this is counter intuitive but this path makes it 
faster and I had VTune to prove it; let me explain it:
- The logic is to fetch small chunks of data from main memory to the L1 
cache (using 4k buffers) for processing
- This logic doesn't introduce any extra memory bandwidth penalty since in 
all cases data has to move from main memory to the CPU and back to main memory 
(if modified)
- Note that the CPU has optimizations that disable 
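The staging pattern in the quoted load() method can be sketched standalone. This is an illustrative reconstruction, not Drill's actual code: the class and member names (Stager, STAGE_SIZE) are hypothetical, and plain byte arrays stand in for Drill's page buffers.

```java
// Sketch of the staging pattern: pull small chunks of a large "page" into a
// cache-sized buffer, process them there, and advance the page offset only by
// what was actually consumed.
class Stager {
    static final int STAGE_SIZE = 4096; // ~4k, small enough to stay in L1 cache

    // Copies the next chunk of pageData (from pageOff) into stage.
    // Returns the number of bytes staged; 0 means the page is exhausted.
    static int load(byte[] pageData, int pageOff, byte[] stage) {
        int remaining = pageData.length - pageOff;
        int toCopy = Math.min(remaining, stage.length);
        if (toCopy > 0) {
            System.arraycopy(pageData, pageOff, stage, 0, toCopy);
        }
        return toCopy;
    }

    // Example consumer: sums all bytes of a page, one stage-full at a time.
    static long sum(byte[] pageData) {
        byte[] stage = new byte[STAGE_SIZE];
        long total = 0;
        int off = 0;
        int n;
        while ((n = load(pageData, off, stage)) > 0) {
            for (int i = 0; i < n; i++) {
                total += stage[i];
            }
            off += n; // advance only by the bytes actually consumed
        }
        return total;
    }
}
```

A real implementation would reload a partially consumed last entry from the new offset, exactly as the quoted comment describes.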

[GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...

2018-01-09 Thread sachouche
Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/1060
  
Before I reply to the provided comments I want first to thank both Parth 
and Paul for taking time to review this Pull Request.
@parthchandra Regarding the High Level Comments

- FS Access Manager
The attached document listed several possible improvements in terms of CPU, 
Partitioning (changes to the query plan), and IO layer. I didn't want to club 
all the changes in a single pull request; instead I started with the Pull 
Request with the highest priority (in terms of performance improvement) which 
is CPU related enhancements.

- Drill Memory Package
I understand where you're coming from; I have analyzed this task and came 
up with a solution which I felt was the cleanest. Let me take you through my 
thinking process and the alternative solutions that I have considered:
a) Why did I use these low-level APIs?
The Bulk Parquet reader uses the Drill Buffer APIs to copy data chunks to 
the CPU cache; this means that memory sanity checks are always performed. 
During code profiling I have noticed the Hotspot was using too many 
instructions (byte based assembly was used); I have researched ways to improve 
this and found some Hadoop and Compression java code which utilized 64bit 
access instructions when manipulating byte arrays (e.g., 
[Snappy](https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/snappy/SnappyRawCompressor.java)).
  This makes a huge difference
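The 64-bit access idea can be illustrated with a small, hedged sketch. ByteBuffer's absolute getLong is used here only as a portable stand-in for the low-level intrinsic APIs the comment refers to; this is not the PR's code, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of word-at-a-time byte array comparison: one 64-bit load-and-compare
// covers 8 byte-wise compares, with a plain byte loop for the 0..7-byte tail.
class WordCompare {
    static boolean equalRange(byte[] a, byte[] b, int len) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.nativeOrder());
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.nativeOrder());
        int i = 0;
        for (; i + 8 <= len; i += 8) {   // bulk loop: 8 bytes per step
            if (ba.getLong(i) != bb.getLong(i)) {
                return false;
            }
        }
        for (; i < len; i++) {           // tail loop: remaining bytes
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The win comes from the bulk loop doing one comparison instruction where the byte-wise version needs eight.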

b) Safety
As indicated previously, native data is always accessed through the Drill 
Buffer APIs (that is, accessed data is sanitized). During bulk processing, the 
new utility has additional checks which are triggered when assertions are 
enabled. This means all test-suite and QA tests (which run with assertions On) 
will be able to catch any faulty memory access; note also the checks are 
simpler since they only pertain to byte array boundary checks.
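The assertion-gated checks described under (b) could look roughly like this sketch (hypothetical names, not the PR's actual utility): the check runs only when assertions are enabled, so the test suite catches faulty accesses at no cost to production runs.

```java
// Sketch of an assertion-gated bounds check: free when assertions are
// disabled, but catches out-of-bounds accesses in runs started with -ea.
class BoundsChecked {
    static boolean inBounds(byte[] array, int offset, int length) {
        return offset >= 0 && length >= 0 && offset + length <= array.length;
    }

    static byte firstByte(byte[] array, int offset, int length) {
        // the whole check disappears unless assertions are enabled
        assert inBounds(array, offset, length)
            : "faulty access: off=" + offset + " len=" + length;
        return array[offset];
    }
}
```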

c) Abstraction
I agree that the Drill code base should follow a common code path when 
accessing native memory (minimize chances of introducing bugs), though I felt 
the Drill Memory package was the main abstraction since it was centralized and 
thus easier to test.

d) Alternatives
I have considered using the Netty io.netty.util.internal.PlatformDependent0 
to access the Java intrinsics APIs though this class has a package scope and 
the wrapper public class io.netty.util.internal.PlatformDependent doesn't 
expose this low level functionality. Thus the work around would have been to 
create MemoryUtils class under the io.netty.util.internal package. I didn't 
think this was a cleaner approach than just using Java's intrinsic APIs 
directly (note also that Java9 will expose such intrinsics under a public JDK 
package instead of sun.misc). Please let me know if you would rather prefer the 
io.netty.util.internal route instead.

- Changes to the Vector Code
I had few discussions with Paul about the need to reconcile DRILL-5657; the 
agreement was to proceed with the current implementation and then apply 
refactoring (future releases) as my changes are backward compatible (new public 
APIs added). 

- Arrow Integration
I will create a task to analyze the impact of the Arrow Integration on the 
Parquet Reader.

- UserBitShared.proto
This class was updated to deal with the implicit columns optimization. I 
have added two fields to the NullableVarChar class to capture the logical 
number of rows and the fact this feature was active. This allowed me to use 
different Accessor and Mutator implementation. You and I had a discussion about 
this and the agreement was to:
a) The new optimization is configurable (current default is off); though we 
might change the default at some point
b) Detect C++ based clients and turn off this optimization (that is, the 
implicit columns enhancements); I didn't implement this task yet as I forgot 
the name of the property that you have indicated (this should be quite fast to 
implement though)

- Testing
Yes, QA helped me to run the test-suite with the two new options enabled. 
The new changes didn't modify any of the encoding/compression logic but I'll be 
happy to add any test(s) you deem necessary.

- Code Style
Thank you for the pointers; I'll stick to your feedback in future changes 
to ease the review process.





---


[GitHub] drill issue #458: DRILL-4573: Zero copy LIKE, REGEXP_MATCHES, SUBSTR

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/458
  
@jcmcote Did you get a chance to follow up on @jinfengni 's comment on this 
PR?


---


[GitHub] drill issue #814: Add clickable localhost URL

2018-01-09 Thread ebuildy
Github user ebuildy commented on the issue:

https://github.com/apache/drill/pull/814
  
Hello, nope, no JIRA ticket associated.


---


[GitHub] drill issue #814: Add clickable localhost URL

2018-01-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/814
  
@ebuildy is there a JIRA associated with this PR?


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160492087
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java 
---
@@ -539,7 +553,12 @@ public void setValueLengthSafe(int index, int length) {
 }
 
 
-public void setSafe(int index, int start, int end, DrillBuf buffer){
+<#if type.minor == "VarDecimal">
--- End diff --

I suppose the condition should be negated, or the if and else branches should be swapped.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160489866
  
--- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
@@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder 
h) {
 
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || 
minor.class == "Decimal28Sparse" || minor.class == "Decimal38Sparse" || 
minor.class == "Decimal28Dense" || minor.class == "Decimal38Dense")>
   public void write${minor.class}(<#list fields as field>${field.type} 
${field.name}<#if field_has_next>, ) {
-mutator.addSafe(idx(), <#list fields as field>${field.name}<#if 
field_has_next>, );
+mutator.addSafe(idx(), <#list fields as field><#if field.name == 
"scale"><#break>${field.name}<#if field_has_next && 
fields[field_index+1].name != "scale" >, );
--- End diff --

I think it would be better to replace these checks by a check for 
`minor.class`


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160480509
  
--- Diff: exec/java-exec/src/main/codegen/templates/SqlAccessors.java ---
@@ -127,6 +127,25 @@ public String getString(int index) {
 }
   <#break>
 
+<#case "VarDecimal">
+
+@Override
+public String getString(int index){
+<#if mode=="Nullable">
+if(ac.isNull(index)){
+  return null;
+}
+
+try {
--- End diff --

I don't see the necessity of the try-catch there. Could you please explain?


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160481004
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/StringOutputRecordWriter.java ---
@@ -146,7 +146,7 @@ public void writeField() throws IOException {
 // TODO: error check
 addField(fieldId, reader.readObject().toString());
 
-  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary">
+  <#elseif minor.class == "VarChar" || minor.class == "Var16Char" || 
minor.class == "VarBinary" || minor.class == "VarDecimal">
--- End diff --

Could these types be moved into the previous `elseif`?
  


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160458242
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastDecimalVarchar.java ---
@@ -150,6 +150,14 @@ public void setup() {
 
 public void eval() {
 
+<#if type.from.contains("VarDecimal")>
+java.math.BigDecimal bigDecimal = 
org.apache.drill.exec.util.DecimalUtility.getBigDecimalFromDrillBuf(in.buffer, 
in.start, in.end - in.start, in.scale);
+String str = bigDecimal.toString();
+out.buffer = buffer;
+out.start = 0;
+out.end = Math.min((int)len.value, str.length());
+out.buffer.setBytes(0, str.getBytes());
--- End diff --

What about the case when `str` has a length greater than `len`? I think it 
would be better to use `setBytes(int index, byte[] src, int srcIndex, int 
length)` method here.
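The suggestion amounts to bounding the copy by the declared length. A sketch with a plain byte[] standing in for the DrillBuf (writeTruncated is a hypothetical name):

```java
import java.nio.charset.StandardCharsets;

// Sketch of a length-bounded write: at most maxLen bytes of the decimal's
// string form are copied, so an over-long value is truncated rather than
// overrunning the copy. Analogous to out.buffer.setBytes(0, src, 0, n).
class BoundedCopy {
    static int writeTruncated(String str, byte[] out, int maxLen) {
        byte[] src = str.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(maxLen, src.length);
        System.arraycopy(src, 0, out, 0, n);
        return n; // becomes out.end
    }
}
```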


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160492409
  
--- Diff: 
exec/vector/src/main/java/org/apache/drill/exec/util/DecimalUtility.java ---
@@ -159,9 +159,20 @@ public static BigDecimal 
getBigDecimalFromSparse(DrillBuf data, int startIndex,
 }
 
 public static BigDecimal getBigDecimalFromDrillBuf(DrillBuf bytebuf, 
int start, int length, int scale) {
+  if (length <= 0) {
+// if the length is somehow non-positive, interpret this as zero
+//System.out.println("getBigDecimal forces 0 with start " + start 
+ " len " + length);
+  try {
+  throw new Exception("hi there");
--- End diff --

Please remove


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160476071
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
@@ -102,7 +111,578 @@
 <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
 <#list comparisonTypesDecimal.decimalTypes as type>
 
-<#if type.name.endsWith("Sparse")>
+<#if type.name.endsWith("VarDecimal")>
+
+<@pp.changeOutputFile 
name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java" />
+
+<#include "/@includes/license.ftl" />
+
+package org.apache.drill.exec.expr.fn.impl;
+
+<#include "/@includes/vv_imports.ftl" />
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import 
org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
+import org.apache.drill.exec.expr.holders.*;
+import org.apache.drill.exec.record.RecordBatch;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.DrillBuf;
+
+import java.nio.ByteBuffer;
+
+@SuppressWarnings("unused")
+public class ${type.name}Functions {
+private static void initBuffer(DrillBuf buffer) {
+// for VarDecimal, this method of setting initial size is actually 
only a very rough heuristic.
+int size = (${type.storage} * 
(org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
+buffer = buffer.reallocIfNeeded(size);
+ }
+
+@FunctionTemplate(name = "subtract", scope = 
FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE, nulls = 
NullHandling.NULL_IF_NULL)
--- End diff --

Please specify the correct `FunctionTemplate` scope, returnType and 
checkPrecisionRange for all functions in this class.
  


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160473858
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions2.java
 ---
@@ -108,9 +108,12 @@ public void output() {
 out.buffer = buffer;
 out.start  = 0;
 out.scale = outputScale.value;
-out.precision = 38;
 java.math.BigDecimal average = 
((java.math.BigDecimal)(value.obj)).divide(java.math.BigDecimal.valueOf(count.value,
 0), out.scale, java.math.BigDecimal.ROUND_HALF_UP);
+<#if type.inputType.contains("VarDecimal")>
--- End diff --

It would be better to swap the if and else branches and negate the if 
condition for better readability.


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160490947
  
--- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java 
---
@@ -327,13 +327,17 @@ public Mutator getMutator(){
 return v;
   }
 
-  public void copyFrom(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
+  protected void copyFromUnsafe(int fromIndex, int thisIndex, 
Nullable${minor.class}Vector from){
--- End diff --

Are the changes in this class necessary?


---


[GitHub] drill pull request #570: DRILL-4834 decimal implementation is vulnerable to ...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on a diff in the pull request:

https://github.com/apache/drill/pull/570#discussion_r160465652
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/Decimal/CastIntDecimal.java ---
@@ -68,15 +68,31 @@ public void setup() {
 
 public void eval() {
 out.scale = (int) scale.value;
+
+<#if !type.to.endsWith("VarDecimal")>
 out.precision = (int) precision.value;
+
 
-<#if type.to == "Decimal9" || type.to == "Decimal18">
+<#if type.to.endsWith("VarDecimal")>
+out.start = 0;
+out.buffer = buffer;
+String s = Long.toString((long)in.value);
+for (int i = 0; i < out.scale; ++i) {  // add 0's to get unscaled integer
+s += "0";
--- End diff --

I think the usage of `StringBuilder` will be more suitable here.
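For illustration, the StringBuilder variant of the zero-padding loop might look like the following (unscaled is a hypothetical helper name); it avoids allocating a new String on every appended zero.

```java
// Sketch of the StringBuilder suggestion: build the unscaled digit string
// once instead of reallocating a String per appended "0".
class ZeroPad {
    static String unscaled(long value, int scale) {
        StringBuilder sb = new StringBuilder(Long.toString(value));
        for (int i = 0; i < scale; i++) {
            sb.append('0'); // amortized O(1) per appended digit
        }
        return sb.toString();
    }
}
```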


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160491764
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
+);
+   //Confirmed to be UnSupported as of Drill-1.12.0
+   /* cross|natural|primary|foreign|references|grant */
+
+var builtinConstants = (
+"true|false"
+);
+
+//Drill-specific
+var builtinFunctions = (
+//Math and Trigonometric
+
"abs|cbrt|ceil|ceiling|degrees|e|exp|floor|log|log|log10|lshift|mod|negative|pi|pow|radians|rand|round|round|rshift|sign|sqrt|trunc|trunc|"
 +
--- End diff --

Sounds good, could you please then just file a Jira for this improvement?


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160491309
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
--- End diff --

There is a small challenge with this list. The 300+ keywords are a mix of 
reserved SQL keywords, functions and other words. I could be wrong, but I think 
some of these are not even supported (e.g. TRIGGER).
I would need to separate the functions and SQL keywords for the colour 
highlighting to appear correctly.
Let me walk through the list and at least identify the SQL keywords. 
Functions were already covered separately.


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160485223
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js
 ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
--- End diff --

+1
I did see this initially, but ignored it as I made progress. Guess I got 
trained to ignore the contents in the end :)
Will fix this.


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160482817
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
+);
+   //Confirmed to be UnSupported as of Drill-1.12.0
+   /* cross|natural|primary|foreign|references|grant */
+
+var builtinConstants = (
+"true|false"
+);
+
+//Drill-specific
+var builtinFunctions = (
+//Math and Trigonometric
+
"abs|cbrt|ceil|ceiling|degrees|e|exp|floor|log|log|log10|lshift|mod|negative|pi|pow|radians|rand|round|round|rshift|sign|sqrt|trunc|trunc|"
 +
--- End diff --

I tried looking at that option, but it is quite difficult to figure out how 
the libraries read these files. We could do a dynamic generation of this list, 
when the {sys.functions} table does get implemented. 


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160482221
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
--- End diff --

This list I could add. Adding everything else is a bit harder. I'll update 
the list in an add-on commit and attach it to this PR. Thnx!


---


Re: Drill Hangout Jan 9, 2018

2018-01-09 Thread Vlad Rozov

Should we also talk about Java 9 and Hadoop 3.x support?

Thank you,

Vlad

On 1/9/18 08:36, Arina Yelchiyeva wrote:

Mark Java 8 support as critical for 1.13.0 release?
For example, in Calcite 1.16 support for Java 7 will be dropped.


On Tue, Jan 9, 2018 at 9:34 AM, Aditya Allamraju  wrote:


We will have our first bi-weekly hangout in the 2018 tomorrow at 10 am.
Please reply to this post with proposed topics to discuss.

Hangout link: https://plus.google.com/hangouts/_/event/

ci4rdiju8bv04a64efj

5fedd0lc

Thank you,

Vlad





Re: Drill Hangout Jan 9, 2018

2018-01-09 Thread Vlad Rozov

Hi Aditya,

Yes, it is open to everyone, with the exception that hangouts allow only 10 
connections.


Thank you,

Vlad

On 1/8/18 23:34, Aditya Allamraju wrote:

Hi Vlad,

If this is open to everyone in this group, i would like to join this
hangout.
Please let me know.

Thanks
Aditya

On Tue, Jan 9, 2018 at 1:22 AM, Vlad Rozov  wrote:


We will have our first bi-weekly hangout in the 2018 tomorrow at 10 am.
Please reply to this post with proposed topics to discuss.

Hangout link: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj
5fedd0lc

Thank you,

Vlad





Re: Drill Hangout Jan 9, 2018

2018-01-09 Thread Arina Yelchiyeva
Mark Java 8 support as critical for 1.13.0 release?
For example, in Calcite 1.16 support for Java 7 will be dropped.


On Tue, Jan 9, 2018 at 9:34 AM, Aditya Allamraju  wrote:

> Hi Vlad,
>
> If this is open to everyone in this group, i would like to join this
> hangout.
> Please let me know.
>
> Thanks
> Aditya
>
> On Tue, Jan 9, 2018 at 1:22 AM, Vlad Rozov  wrote:
>
> > We will have our first bi-weekly hangout in the 2018 tomorrow at 10 am.
> > Please reply to this post with proposed topics to discuss.
> >
> > Hangout link: https://plus.google.com/hangouts/_/event/
> ci4rdiju8bv04a64efj
> > 5fedd0lc
> >
> > Thank you,
> >
> > Vlad
> >
>


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160457025
  
--- Diff: 
exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js
 ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
+"use strict";
+
+exports.snippetText = "snippet info\n\
+   select * from INFORMATION_SCHEMA.${1:};\n\
+snippet sysmem\n\
--- End diff --

Also, snippets are case-sensitive. Is that a common practice?


---


Re: PCAP Issues

2018-01-09 Thread Ted Dunning
The pcap format is there almost entirely just to give us a record structure.

The stream could give us that just as easily.

So that is one thing that could be done.

Another thought is that data could be buffered for a (very) short time and
groups of packets could be sent as a single message. That would decrease
streaming overhead. I would vote against that change at first, but it would
make the pcap format relevant again.



On Tue, Jan 9, 2018 at 6:45 AM, John Omernik  wrote:

> I've been following this and am also now playing with them more.  A couple
> of tidbits that I think are worth re-mentioning:
>
> 1. pcapng: the ability to read this format too will be handy.
>
> 2. The flags are really cool, and the work here is great.
>
> The other thing I'd like to consider, given the recent work on the Kafka
> plugin (and, I am guessing, a similar plugin for MapR Streams): what if
> we were able to have a tool like daemonlogger or tcpdump read from a
> network interface and write directly to a Kafka/MapR Streams topic?
>
> My thought is this: you write to the topic and partition based on a hash
> of known data components (like the 5-tuple of protocol, sport, dport,
> saddr, and daddr). From there, you could have things subscribe to the topic
> and process, you could ETL, you could query.  There would be some concern
> at high scale, but even that has methods to address it.  One of the
> challenges we have in infosec is getting processing close enough to the
> capture, OR capturing to files, moving them around, and having processes
> created to read and process the files.
>
> It would be cool to combine the pcap plugin directly with the Kafka/Streams
> plugin in Drill and be able to read live pcaps!
>
> One big challenge is that the pcap format has a header, but it seems to be
> a simple header that we "may" be able to forge upon dropping the needle on
> a stream of packet records.  I am just brainstorming aloud right now (I
> should consider waiting until at least 3 cups of coffee before I make this
> inner dialog public).
>
> Curious on thoughts here.
>
>
>
>
> On Fri, Jan 5, 2018 at 10:01 AM, Charles Givre  wrote:
>
> > I assumed (incorrectly as it turned out) that I was not reading the
> > offsets correctly or something.
> >
> >
> > > On Jan 5, 2018, at 10:59, Ted Dunning  wrote:
> > >
> > > Yeah... I got the same result.
> > >
> > > I will push my latest shortly (without the boolean stuff)
> > >
> > > On Fri, Jan 5, 2018 at 7:52 AM, Charles Givre 
> wrote:
> > >
> > >> Ted,
> > >> I’m wondering if we’ve uncovered an unsupported, undocumented feature
> in
> > >> Drill with respect to the BitWriter.  I made the following changes:
> > >>
> > >>
> > >> To PcapRecordReader.java, I added:
> > >>
> > >>
> > >> case "ack_flag":
> > >>  if (packet.isTcpPacket()) {
> > >>int flag = (packet.getAckFlag()) ? 1 : 0;
> > >>setBooleanColumnValue(flag, pci, count);
> > >>  }
> > >>  break;
> > >> ...
> > >>
> > >> private void setBooleanColumnValue(final int data, final
> > >> ProjectedColumnInfo pci, final int count) {
> > >>  ((NullableBitVector.Mutator) pci.vv.getMutator())
> > >>.setSafe(count, data);
> > >> }
> > >>
> > >> Then I added this to PcapFormatUtils
> > >>
> > >> public static int getBit(final byte[] buf, final int posByte, final
> int
> > >> posBit ) {
> > >>  byte valByte = buf[posByte];
> > >>  return valByte >> (8 - (posBit + 1)) & 0x0001;
> > >> }
> > >>
> > >> I added a column in the Schema.java and a function in Packet.java to
> get
> > >> the bit.  I was getting NPEs and I’m wondering now if the cause wasn’t
> > my
> > >> code but rather a problem with the bitwriter.  I’m going to play with
> > this
> > >> and see if I can get Drill to write a True/False value at all, and
> > report
> > >> back.  If the BitWriter is throwing NPE, I’ll create a JIRA for it.
> > >>
> > >> Thanks for your work on the PCAP format reader.
> > >> —C
> > >>
> > >>
> > >>
> > >>> On Jan 3, 2018, at 17:33, Ted Dunning  wrote:
> > >>>
> > >>> Don't think that will work.
> > >>>
> > >>> I tried what seemed obvious and got a NPE. Joys of Drill.
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Jan 3, 2018 at 1:31 PM, Charles Givre 
> > wrote:
> > >>>
> >  This isn’t the most elegant example, but should do the trick.
> > 
>  https://github.com/cgivre/drill-network-functions/blob/master/src/main/java/org/apache/drill/contrib/function/IsPrivateIP.java
> > 
> > 
> > 
> > 
> > > On Jan 3, 2018, at 16:09, Ted Dunning 
> wrote:
> > >
> > > On Wed, Jan 3, 2018 at 12:17 PM, Charles Givre 
> > >> wrote:
> > >
> > >> HI Ted,
> > >> This is really looking amazing.  Thank you so much for 

[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160447919
  
--- Diff: exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
--- End diff --

We have more keywords that would be nice to highlight and that are in the
https://drill.apache.org/docs/reserved-keywords/ list, for example: alter,
function, explain, session, system, etc. Maybe we can add them in this PR?


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160452751
  
--- Diff: exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/mode-sql.js ---
@@ -1 +1,128 @@

-ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text_highlight_rules").TextHighlightRules,s=function(){var
 
e="select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|primary|key|if|foreign|not|references|default|null|inner|cross|natural|database|drop|grant",t="true|false",n="avg|count|first|last|max|min|sum|ucase|lcase|mid|len|round|rank|now|format|coalesce|ifnull|isnull|nvl",r="int|numeric|decimal|date|varchar|char|bigint|float|double|bit|binary|text|set|timestamp|money|real|number|integer",i=this.createKeywordMapper({"support.function":n,keyword:e,"constant.language":t,"storage.type":r},"identifier",!0);this.$rules={start:[{token:"comment",regex:"--.*$"},{token:"comment",start:"/\\*",end:"\\*/"},{token:"string",regex:'".*?"'},{token:"string",regex
 
:"'.*?'"},{token:"string",regex:"`.*?`"},{token:"constant.numeric",regex:"[+-]?\\d+(?:(?:\\.\\d*)?(?:[eE][+-]?\\d+)?)?\\b"},{token:i,regex:"[a-zA-Z_$][a-zA-Z0-9_$]*\\b"},{token:"keyword.operator",regex:"\\+|\\-|\\/|\\/\\/|%|<@>|@>|<@|&|\\^|~|<|>|<=|=>|==|!=|<>|="},{token:"paren.lparen",regex:"[\\(]"},{token:"paren.rparen",regex:"[\\)]"},{token:"text",regex:"\\s+"}]},this.normalizeRules()};r.inherits(s,i),t.SqlHighlightRules=s}),ace.define("ace/mode/sql",["require","exports","module","ace/lib/oop","ace/mode/text","ace/mode/sql_highlight_rules"],function(e,t,n){"use
 strict";var 
r=e("../lib/oop"),i=e("./text").Mode,s=e("./sql_highlight_rules").SqlHighlightRules,o=function(){this.HighlightRules=s,this.$behaviour=this.$defaultBehaviour};r.inherits(o,i),function(){this.lineCommentStart="--",this.$id="ace/mode/sql"}.call(o.prototype),t.Mode=o})
\ No newline at end of file
+/**
+ * Drill SQL Definition (Forked from SqlServer definition)
+ */
+

+ace.define("ace/mode/sql_highlight_rules",["require","exports","module","ace/lib/oop","ace/mode/text_highlight_rules"],
 function(require, exports, module) {
+"use strict";
+
+var oop = require("../lib/oop");
+var TextHighlightRules = 
require("./text_highlight_rules").TextHighlightRules;
+
+var SqlHighlightRules = function() {
+
+//TODO: https://drill.apache.org/docs/reserved-keywords/
+   //e.g. Cubing operators like ROLLUP are not listed
+var keywords = (
+
"select|insert|update|delete|from|where|and|or|group|by|order|limit|offset|having|as|case|"
 +
+
"when|else|end|type|left|right|join|on|outer|desc|asc|union|create|table|key|if|"
 +
+"not|default|null|inner|database|drop|" +
+   "flatten|kvgen|columns"
+);
+   //Confirmed to be UnSupported as of Drill-1.12.0
+   /* cross|natural|primary|foreign|references|grant */
+
+var builtinConstants = (
+"true|false"
+);
+
+//Drill-specific
+var builtinFunctions = (
+//Math and Trignometric
+
"abs|cbrt|ceil|ceiling|degrees|e|exp|floor|log|log|log10|lshift|mod|negative|pi|pow|radians|rand|round|round|rshift|sign|sqrt|trunc|trunc|"
 +
--- End diff --

I believe in future we'll have a system table with Drill functions. Would it
be possible to pass the list of functions to the JS script instead of
hard-coding them?


---


[GitHub] drill pull request #1084: DRILL-5868: Support SQL syntax highlighting of que...

2018-01-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1084#discussion_r160449936
  
--- Diff: exec/java-exec/src/main/resources/rest/static/js/ace-code-editor/snippets/sql.js ---
@@ -0,0 +1,46 @@
+/**
+ * Drill SQL Syntax Snippets
+ */
+
+ace.define("ace/snippets/sql",["require","exports","module"], 
function(require, exports, module) {
--- End diff --

1. Could you please add information about the snippets to the Jira
description? We might want to add it to the Drill documentation later so
users know which snippets can be used. Let's say, it's hard to guess the
snippets without looking into the code :)

2. When using snippets, I saw the following warning.


![image](https://user-images.githubusercontent.com/15086720/34730558-6a34fde4-f568-11e7-81e9-3e6d62091189.png)
Could you please check?
  


---


Re: PCAP Issues

2018-01-09 Thread John Omernik
I've been following this and am also now playing with them more.  A couple
of tidbits that I think are worth re-mentioning:

1. pcapng: the ability to read this format too will be handy.

2. The flags are really cool, and the work here is great.

The other thing I'd like to consider, given the recent work on the Kafka
plugin (and, I am guessing, a similar plugin for MapR Streams): what if
we were able to have a tool like daemonlogger or tcpdump read from a
network interface and write directly to a Kafka/MapR Streams topic?

My thought is this: you write to the topic and partition based on a hash
of known data components (like the 5-tuple of protocol, sport, dport,
saddr, and daddr). From there, you could have things subscribe to the topic
and process, you could ETL, you could query.  There would be some concern
at high scale, but even that has methods to address it.  One of the
challenges we have in infosec is getting processing close enough to the
capture, OR capturing to files, moving them around, and having processes
created to read and process the files.

It would be cool to combine the pcap plugin directly with the Kafka/Streams
plugin in Drill and be able to read live pcaps!

One big challenge is that the pcap format has a header, but it seems to be a
simple header that we "may" be able to forge upon dropping the needle on a
stream of packet records.  I am just brainstorming aloud right now (I should
consider waiting until at least 3 cups of coffee before I make this inner
dialog public).

Curious on thoughts here.
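[Editor's sketch] The 5-tuple partitioning idea above can be sketched roughly as follows. This is a hypothetical illustration only — the class name, method, and field names are invented and are not part of any Drill or Kafka plugin API. The point is simply that hashing the flow key to a partition index keeps all packets of one flow in the same partition:

```java
import java.util.Objects;

// Hypothetical sketch: map a packet's 5-tuple to a Kafka-style partition
// index so that every packet of a given flow lands in the same partition.
public class FlowPartitioner {

    static int partitionFor(String proto, String srcAddr, int srcPort,
                            String dstAddr, int dstPort, int numPartitions) {
        // Objects.hash combines the fields; floorMod keeps the result
        // non-negative even when the hash is negative.
        int h = Objects.hash(proto, srcAddr, srcPort, dstAddr, dstPort);
        return Math.floorMod(h, numPartitions);
    }

    public static void main(String[] args) {
        int p1 = partitionFor("tcp", "10.0.0.1", 51000, "10.0.0.2", 80, 16);
        int p2 = partitionFor("tcp", "10.0.0.1", 51000, "10.0.0.2", 80, 16);
        System.out.println(p1 == p2);           // same flow -> same partition
        System.out.println(p1 >= 0 && p1 < 16); // index stays in range
    }
}
```

One design note: the hash here is order-sensitive, so return traffic (with src and dst swapped) would land in a different partition; a production version would probably normalize direction first, e.g. by sorting the two endpoints, so both directions of a conversation share a partition.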




On Fri, Jan 5, 2018 at 10:01 AM, Charles Givre  wrote:

> I assumed (incorrectly as it turned out) that I was not reading the
> offsets correctly or something.
>
>
> > On Jan 5, 2018, at 10:59, Ted Dunning  wrote:
> >
> > Yeah... I got the same result.
> >
> > I will push my latest shortly (without the boolean stuff)
> >
> > On Fri, Jan 5, 2018 at 7:52 AM, Charles Givre  wrote:
> >
> >> Ted,
> >> I’m wondering if we’ve uncovered an unsupported, undocumented feature in
> >> Drill with respect to the BitWriter.  I made the following changes:
> >>
> >>
> >> To PcapRecordReader.java, I added:
> >>
> >>
> >> case "ack_flag":
> >>  if (packet.isTcpPacket()) {
> >>int flag = (packet.getAckFlag()) ? 1 : 0;
> >>setBooleanColumnValue(flag, pci, count);
> >>  }
> >>  break;
> >> ...
> >>
> >> private void setBooleanColumnValue(final int data, final
> >> ProjectedColumnInfo pci, final int count) {
> >>  ((NullableBitVector.Mutator) pci.vv.getMutator())
> >>.setSafe(count, data);
> >> }
> >>
> >> Then I added this to PcapFormatUtils
> >>
> >> public static int getBit(final byte[] buf, final int posByte, final int
> >> posBit ) {
> >>  byte valByte = buf[posByte];
> >>  return valByte >> (8 - (posBit + 1)) & 0x0001;
> >> }
> >>
> >> I added a column in the Schema.java and a function in Packet.java to get
> >> the bit.  I was getting NPEs and I’m wondering now if the cause wasn’t
> my
> >> code but rather a problem with the bitwriter.  I’m going to play with
> this
> >> and see if I can get Drill to write a True/False value at all, and
> report
> >> back.  If the BitWriter is throwing NPE, I’ll create a JIRA for it.
> >>
> >> Thanks for your work on the PCAP format reader.
> >> —C
> >>
> >>
> >>
> >>> On Jan 3, 2018, at 17:33, Ted Dunning  wrote:
> >>>
> >>> Don't think that will work.
> >>>
> >>> I tried what seemed obvious and got a NPE. Joys of Drill.
> >>>
> >>>
> >>>
> >>> On Wed, Jan 3, 2018 at 1:31 PM, Charles Givre 
> wrote:
> >>>
>  This isn’t the most elegant example, but should do the trick.
> 
>  https://github.com/cgivre/drill-network-functions/blob/master/src/main/java/org/apache/drill/contrib/function/IsPrivateIP.java
> 
> 
> 
> 
> > On Jan 3, 2018, at 16:09, Ted Dunning  wrote:
> >
> > On Wed, Jan 3, 2018 at 12:17 PM, Charles Givre 
> >> wrote:
> >
> >> HI Ted,
> >> This is really looking amazing.  Thank you so much for working on
>  this.  I
> >> wanted to ask whether you’ve tried the BitWriter to write a Boolean
>  value?
> >> I’ve done that in UDFs.
> >
> >
> > Point me to an example. I think booleans would be vastly better.
> 
> 
> >>
> >>
>
>


[GitHub] drill pull request #1047: DRILL-5970: DrillParquetReader always builds the s...

2018-01-09 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1047#discussion_r160419416
  
--- Diff: exec/vector/src/main/codegen/templates/BaseWriter.java ---
@@ -106,37 +114,37 @@
 MapOrListWriter list(String name);
 boolean isMapWriter();
 boolean isListWriter();
-UInt1Writer uInt1(String name);
-UInt2Writer uInt2(String name);
-UInt4Writer uInt4(String name);
-UInt8Writer uInt8(String name);
-VarCharWriter varChar(String name);
-Var16CharWriter var16Char(String name);
-TinyIntWriter tinyInt(String name);
-SmallIntWriter smallInt(String name);
-IntWriter integer(String name);
-BigIntWriter bigInt(String name);
-Float4Writer float4(String name);
-Float8Writer float8(String name);
-BitWriter bit(String name);
-VarBinaryWriter varBinary(String name);
+UInt1Writer uInt1(String name, TypeProtos.DataMode dataMode);
--- End diff --

> The mode passed here cannot be REPEATED: it just won't work. So, can we
pass REQUIRED?

Yes, we can with my improvements - for the OPTIONAL dataMode we will pass
Nullable vectors, and the usual ones for REQUIRED.

The proposed alternative is interesting; I will think about how to implement
it in the best way. Maybe replacing SingleMapWriter with OptionalMapWriter
and RequiredMapWriter could be done.

Note: some good code refactoring was done here, so this PR shouldn't be
closed.


---


[GitHub] drill issue #1066: DRILL-3993: Changes to support Calcite 1.15

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1066
  
@chunhui-shi, I have made additional fixes in new commits (the commits after
"DRILL-3993: Changes after code review", 3120762). Could you please take a
look? Also, I have created a pull request on incubator-calcite:
https://github.com/mapr/incubator-calcite/pull/16


---


[GitHub] drill issue #1049: DRILL-5971: Fix INT64, INT32 logical types in complex par...

2018-01-09 Thread vvysotskyi
Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/1049
  
@parthchandra, thanks for the pull request! LGTM, +1.


---


[GitHub] drill issue #1023: DRILL-5922 Fixed Child Allocator Leak. DRILL-5926 Stabali...

2018-01-09 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1023
  
@paul-rogers For some reason this wasn't included in a batch commit. After
rebasing onto master I noticed some tests were failing sporadically, so I've
included a one-line fix for DRILL-6003 in the last commit. Could you please
take a look and +1 again? Hopefully the change will go in this time.


---