Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Chunhui Shi
Hi Arina,


Could we consider including DRILL-5089 in 1.12.0? It is about lazy loading of 
schema for storage plugins. Could you or Paul take a look at the pull request 
for this JIRA, https://github.com/apache/drill/pull/1032? I think both of you 
are familiar with this part of the code.


Thanks,


Chunhui


From: Arina Yelchiyeva 
Sent: Thursday, November 9, 2017 8:11:35 AM
To: dev@drill.apache.org
Subject: Re: [DISCUSS] Drill 1.12.0 release

Yes, they are already in master.

On Thu, Nov 9, 2017 at 6:05 PM, Charles Givre  wrote:

> We’re including the Networking functions in this release right?
>
> > On Nov 9, 2017, at 11:04, Arina Yelchiyeva 
> wrote:
> >
> > If the changes are done before the cut-off date (targeting mid-November),
> > it will be possible to include this Jira.
> >
> > On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre  wrote:
> >
> >> Hi Arina,
> >> Can we include DRILL-4091, support for additional GIS operations, in
> >> version 1.12?  In general the code looked pretty good.  There was a
> >> missing unit test, which the developer has since submitted, and some
> >> minor formatting issues which I’m still waiting on.
> >> Thanks,
> >> —C
> >>
> >>
> >>
> >>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva 
> >> wrote:
> >>>
> >>> Current status:
> >>>
> >>> Blocker:
> >>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R.,
> >>> code reviewer - ?) - in progress.
> >>>
> >>> Targeted for 1.12 release:
> >>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code
> >>> reviewer - Arina) - code review in final stage.
> >>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh,
> >>> code reviewer - Paul) - in review.
> >>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
> >>> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) -
> >>> waiting for the code review.
> >>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina,
> >>> code reviewer - Tim) - waiting for the code review.
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
> >>> arina.yelchiy...@gmail.com> wrote:
> >>>
>  Current status:
> 
>  Targeted for 1.12 release:
>  DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather
>  than mock (developer - Paul, code reviewer - ?) - waiting for the code
>  review
>  DRILL-5842: Refactor and simplify the fragment, operator contexts for
>  testing (developer - Paul, code reviewer - ?) - waiting for the code
>  review
>  DRILL-5834: Adding network functions (developer - Charles, code
>  reviewer - Arina) - awaiting changes after code review
>  DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer -
>  Arina) - waiting for the code review
>  DRILL-5772: Enable UTF-8 support in query string by default
>  (developer - Arina, code reviewer - Paul) - finalizing approach
>  DRILL-4779: Kafka storage plugin support (developer - Anil, code
>  reviewer - ?) - finishing implementation
> 
>  Under question:
>  DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
>  reviewer - ?) - waiting for the status update from the developer
> 
>  Please feel free to suggest other items that are targeted for the 1.12
>  release. There are many Jiras that have the fix version marked as 1.12;
>  it would be good if developers revisited them and updated the fix
>  version to the actual one.
>  Link to the dashboard - https://issues.apache.org/
>  jira/secure/RapidBoard.jspa?rapidView=185&projectKey=DRILL&view=detail
> 
>  Kind regards
>  Arina
> 
> 
>  On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra 
> >> wrote:
> 
> > I'm waiting to merge the SSL changes in; waiting a couple of days more
> > to see if there are any further comments before I merge.
> >
> > On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers 
> wrote:
> >
> >> Hi Arina,
> >>
> >> In addition to my own PRs, there are several in the “active” queue that
> >> we could get in if we can just push them over the line and clear the
> >> queue. The owners of the PRs should check if we are waiting on them to
> >> take action.
> >>
> >> 977 DRILL-5849: Add freemarker lib to dependencyManagement to ensure prop…
> >> 976 DRILL-5797: Choose parquet reader from read columns
> >> 975 DRILL-5743: Handling column family and column scan for hbase
> >> 973 DRILL-5775: Select * query on a maprdb binary table fails
> >> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested
> >> field (reg. of DRILL-4264)
> >> 950 Drill 5431: SSL Support
> >> 949 

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

2017-11-09 Thread chunhui-shi
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1032

DRILL-5089: Dynamically load schema of storage plugin only when neede…

…d for every query

For each query, loading all storage plugins and all workspaces under the file 
system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill, which loads 
the corresponding storage plugin's schema only when needed.

The INFORMATION_SCHEMA reader is made to read the full schema information and 
load second-level schemas accordingly.

For workspaces under the same FileSystem, there is no need to create a 
FileSystem instance for each workspace.

The fs.access API is used to check permissions; it is available since HDFS 
2.6, except for the Windows + local file system case.
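
As a rough sketch of the idea (not the actual patch): the root schema resolves
and caches a plugin's schema only the first time a query touches it. All names
below are hypothetical, not Drill's real classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: resolve child schemas on demand instead of eagerly
// registering every storage plugin at query start.
class LazyRootSchema {

  interface SchemaFactory {
    Object loadSchema(String name);  // builds one storage plugin's schema
  }

  private final Map<String, Object> loaded = new ConcurrentHashMap<>();
  private final SchemaFactory factory;

  LazyRootSchema(SchemaFactory factory) {
    this.factory = factory;
  }

  // Called during query planning; each plugin schema is built at most once.
  Object getSubSchema(String name) {
    return loaded.computeIfAbsent(name, factory::loadSchema);
  }
}
```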

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chunhui-shi/drill DRILL-5089-pull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1032


commit a381677c59a7371733bae12ad4896b7cc927da5e
Author: chunhui-shi 
Date:   2017-11-03T00:06:25Z

DRILL-5089: Dynamically load schema of storage plugin only when needed for 
every query

For each query, loading all storage plugins and all workspaces under the file 
system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill, which loads 
the corresponding storage plugin's schema only when needed.

The INFORMATION_SCHEMA reader is made to read the full schema information and 
load second-level schemas accordingly.

For workspaces under the same FileSystem, there is no need to create a 
FileSystem instance for each workspace.

The fs.access API is used to check permissions; it is available since HDFS 
2.6, except for the Windows + local file system case.




---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-11-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150160362
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Ok, so I think I see how you've been trying to help me test the server-side 
timeout.

You are hoping to have a unit test force `awaitFirstMessage()` to throw the 
exception by preventing the server from sending back any batch of data, since 
the sample test data doesn't allow any query to run sufficiently long. All the 
current tests I've added essentially have already delivered the data from the 
'Drill Server' to the 'DrillClient', but the application downstream has not 
consumed it.

Your suggestion of putting a `pause` before the `execute()` call got me 
thinking that the timer had already begun after Statement initialization. My 
understanding now is that you're simply asking to block the SCREEN operator 
from sending back any batches, so the DrillCursor should time out waiting for 
the first batch. In fact, I'm thinking that I don't even need a pause: the 
DrillCursor waits all the while for something from the SCREEN operator that 
never comes and eventually times out.

However, since the controls injection essentially applies to the Connection 
(`alter session ...`), any other unit tests executing in parallel on the same 
connection would be affected by it. So, I would need to also undo this at the 
end of the test if the connection is reused, or fork off a connection 
exclusively for this test.

Was that what you've been suggesting all along?


---


[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...

2017-11-09 Thread weijietong
Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/889
  
@amansinha100 thanks for sharing the information. Got your point. I think 
your proposal on 
[CALCITE-1048](https://issues.apache.org/jira/browse/CALCITE-1048) is feasible. 
Since [CALCITE-794](https://issues.apache.org/jira/browse/CALCITE-794) was 
completed in version 1.6, there seems to be a better solution (get the least 
maxRowCount across all the rels of the RelSubset). But since Drill's Calcite 
version is still based on 1.4, I support your current temporary solution. I 
only wonder whether the maxRowCount of the explicitly searched RelNode (such 
as DrillAggregateRel) can represent the best RelNode's maxRowCount?


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-11-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150157923
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

I don't think you are wrong, but I think the interpretation of the timeout 
is ambiguous. My understanding, based on what drivers like Oracle's do, is to 
start the timeout only when the execute call is made. So, for a regular 
Statement object, mere initialization (or even setting the timeout) should not 
be the basis for starting the timer. 
With regard to whether we are testing the period when only the DrillCursor 
is in operation, we'd need a query that runs long enough to time out before 
the server can send back anything for the very first time. 
`awaitFirstMessage()` already has the timeout applied there, and it worked in 
some of my longer-running sample queries. If you're hinting at this, then yes, 
it certainly doesn't hurt to have the test, although the timeout does 
guarantee exactly that.

I'm not familiar with the Drillbit Injection feature, so let me tinker a 
bit to confirm it before I update the PR.
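
For reference, the JDBC contract being discussed: the timeout set via
`Statement.setQueryTimeout()` only starts counting once `execute()` is called.
A generic (non-Drill-specific) sketch; the URL and query are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLTimeoutException;
import java.sql.Statement;

public class QueryTimeoutExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement()) {
      stmt.setQueryTimeout(5);  // seconds; the timer has NOT started yet
      try (ResultSet rs = stmt.executeQuery("SELECT * FROM cp.`employee.json`")) {
        while (rs.next()) {
          // consume rows; whether the timeout also covers this fetch phase
          // is driver-specific, which is exactly the ambiguity discussed above
        }
      } catch (SQLTimeoutException e) {
        // thrown once the 5-second budget is exhausted after execute() began
      }
    }
  }
}
```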


---


[GitHub] drill pull request #1031: DRILL-5917: Ban org.json:json library in Drill

2017-11-09 Thread vrozov
GitHub user vrozov opened a pull request:

https://github.com/apache/drill/pull/1031

DRILL-5917: Ban org.json:json library in Drill

@arina-ielchiieva Please review
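
For context, banning a dependency in a Maven build is usually done with the
maven-enforcer-plugin's bannedDependencies rule. A sketch of what such a
configuration typically looks like (assuming that is the mechanism used in
this PR):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>ban-org-json</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <!-- fail the build if org.json:json appears anywhere in the
                   dependency tree, including transitive dependencies -->
              <exclude>org.json:json</exclude>
            </excludes>
          </bannedDependencies>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```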

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vrozov/drill DRILL-5917

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1031


commit 525989d8603954bfd5e12a680acde7ac00e26300
Author: Vlad Rozov 
Date:   2017-11-08T01:07:38Z

DRILL-5917: Ban org.json:json library in Drill




---


[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1014
  
Not sure the description here is entirely correct. Let's separate two 
concepts: the plugin (code) and the plugin definition (the stuff in JSON.)

Plugin definitions are stored in ZK and retrieved by the Foreman. There may 
be some form of race condition in the Foreman, but that's not my focus here.

The plugin *definition* is read by the Foreman and serialized into the 
physical plan. Each worker reads the definition from the physical plan. For 
this reason, the worker's definition can never be out of date: it is the 
definition used when serializing the plan.

Further, Drill allows table functions which provide query-time name/value 
pair settings for format plugin properties. The only way these can work is to 
be serialized along with the query. So, the actual serialized format plugin 
definition, included with the query, includes both the ZK information and the 
table function information.
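
For example, a table function supplies per-query format properties that must
travel with the serialized plan (the file path and option values here are
illustrative; the option names come from the text format plugin):

```sql
SELECT * FROM table(dfs.`/data/example.csv`(
    type => 'text', fieldDelimiter => ',', extractHeader => true));
```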


---


[jira] [Created] (DRILL-5950) Allow JSON files to be splittable - for sequence of objects format

2017-11-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5950:
--

 Summary: Allow JSON files to be splittable - for sequence of 
objects format
 Key: DRILL-5950
 URL: https://issues.apache.org/jira/browse/DRILL-5950
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers


The JSON format plugin is not currently splittable. This means that every JSON 
file must be read by only a single thread. By contrast, text files are 
splittable.

The key barrier to allowing JSON files to be splittable is the lack of a good 
mechanism to find the start of a record at some arbitrary point in the file. 
Text readers handle this by scanning forward looking for (say) the newline that 
separates records. (Though this process can be thrown off if a newline appears 
in a quoted value, and the start quote appears before the split point.)

However, as was discovered in a previous JSON fix, Drill's form of JSON does 
provide the tools. In standard JSON, a list of records must be structured as a 
list:

{code}
[ { text: "first record"},
  { text: "second record"},
  ...
  { text: "final record" }
]
{code}

In this form, it is impossible to find the start of a record without parsing 
from the first character onwards.

But, Drill uses a common, but non-standard, JSON structure that dispenses with 
the array and the commas between records:

{code}
{ text: "first record" }
{ text: "second record" }
...
{ text: "last record" }
{code}

This form does unambiguously allow finding the start of a record: simply scan 
until we find a "}" token followed by a "{" token, possibly separated by white 
space. That sequence is not valid JSON and only occurs between records in the 
sequence-of-records format.
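
A minimal sketch of such a boundary scan (hypothetical, not Drill code); for
brevity it ignores the case where the byte pair appears inside a quoted
string:

{code}
// Scan forward from an arbitrary split offset to the first position where
// a '}' is followed, over optional whitespace, by a '{' -- the boundary
// between two records in sequence-of-objects JSON. A production version
// must also track whether the scan is inside a quoted string.
static int findRecordStart(byte[] buf, int from) {
  for (int i = from; i < buf.length; i++) {
    if (buf[i] != '}') {
      continue;
    }
    int j = i + 1;
    while (j < buf.length && Character.isWhitespace(buf[j])) {
      j++;
    }
    if (j < buf.length && buf[j] == '{') {
      return j;   // next record starts here
    }
  }
  return -1;      // no record boundary in this buffer
}
{code}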






[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins

2017-11-09 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1014
  
@arina-ielchiieva 

- The parts addressing DRILL-4640 and DRILL-5166 LGTM
- I think the fix for DRILL-5771 LGTM, but I would like to write down what I 
think is happening and confirm with you that my understanding is correct. This 
is mostly a learning exercise for me since I am not very familiar with 
this part of the code :).

In DRILL-5771 there were two issues.

## Race Conditions With Format Plugins

### Issue

The following used to happen before the fix:

  1. When using an existing format plugin, the **FormatPlugin** would 
create a **DrillTable** with a **NamedFormatPluginConfig** which only contains 
the name of the format plugin to use.
  1. The **ScanOperator** created for a **DrillTable** will contain the 
**NamedFormatPluginConfig**
  1. When the **ScanOperators** are serialized into the physical plan, the 
serialized **ScanOperator** will only contain the name of the format plugin to 
use.
  1. When a worker deserializes the physical plan to do a scan, he gets the 
name of the **FormatPluginConfig** to use.
  1. The worker then looks up the correct **FormatPlugin** in the 
**FormatCreator** using the name he has.
  1. The worker can get into trouble if the **FormatPlugins** he has cached 
in his **FormatCreator** are out of sync with the rest of the cluster.

### Fix

Race conditions are eliminated because the **DrillTables** returned by the 
**FormatPlugins** no longer contain a **NamedFormatPluginConfig**; they contain 
the full **FormatPluginConfig**, not just a name alias. So when a query is 
executed:
  1. The ScanOperator contains the complete **FormatPluginConfig**
  1. When the physical plan is serialized it contains the complete 
**FormatPluginConfig** for each scan operator.
  1. When a worker node deserializes the ScanOperator, it also has the 
complete **FormatPluginConfig**, so it can reconstruct the **FormatPlugin** 
correctly. Previously, the worker had to do a lookup using the 
**FormatPlugin** name in the **FormatCreator**, whose cache may be out of 
sync with the rest of the cluster. 

## FormatPluginConfig Equals and HashCode

### Issue

The **FileSystemPlugin** looks up the **FormatPlugin** corresponding to a 
**FormatPluginConfig** in formatPluginsByConfig. However, the 
**FormatPluginConfig** implementations didn't override equals and hashCode, so 
those map lookups could fail.
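
As a generic illustration of why that matters for map lookups (class and field
names here are hypothetical, not the actual Drill configs):

```java
import java.util.Objects;

// A config used as a HashMap key must override equals and hashCode;
// otherwise two configs with identical settings never match in
// formatPluginsByConfig.
public class ExampleFormatConfig {
  private final String extension;
  private final boolean extractHeader;

  public ExampleFormatConfig(String extension, boolean extractHeader) {
    this.extension = extension;
    this.extractHeader = extractHeader;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ExampleFormatConfig)) {
      return false;
    }
    ExampleFormatConfig that = (ExampleFormatConfig) o;
    return extractHeader == that.extractHeader
        && Objects.equals(extension, that.extension);
  }

  @Override
  public int hashCode() {
    return Objects.hash(extension, extractHeader);
  }
}
```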




---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-11-09 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150131105
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Yes? Or am I wrong? (Asking because the rest of the sentence suggests I was 
right in my interpretation of the test.) Maybe we can/should test both? I would 
have liked to test for the first batch, but it's not possible to access the 
query id until `statement.execute()`, and I'd need it to unpause the request.


---


[GitHub] drill issue #1025: DRILL-5936: Refactor MergingRecordBatch based on code rev...

2017-11-09 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1025
  
@amansinha100 can you review this change?


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-11-09 Thread kkhatua
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150127658
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Yes. So I'm testing for the part where the batch has been fetched by the 
DrillCursor but not consumed via the DrillResultSetImpl. That's why I found the 
need for pausing the Screen operator odd and, hence, the question.


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-11-09 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150119338
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

I wonder if we actually test the timeout during DrillCursor operations. It 
seems your test relies on the user being slow to read data from the result set, 
although the data has already been fetched by the client. Am I wrong?


---


[GitHub] drill pull request #1028: DRILL-5943: Avoid the strong check introduced by D...

2017-11-09 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1028#discussion_r150118518
  
--- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.hpp ---
@@ -59,6 +59,12 @@ class SaslAuthenticatorImpl {
 
 const char *getErrorMessage(int errorCode);
 
+static const std::string KERBEROS_SIMPLE_NAME;
+
+static const std::string KERBEROS_SASL_NAME;
--- End diff --

do we need to expose it? (it looks like we only look for the keys)


---


[jira] [Created] (DRILL-5949) JSON format options should be part of plugin config; not session options

2017-11-09 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5949:
--

 Summary: JSON format options should be part of plugin config; not 
session options
 Key: DRILL-5949
 URL: https://issues.apache.org/jira/browse/DRILL-5949
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers


Drill provides a JSON record reader. Drill provides two ways to configure this 
reader:

* Using the JSON plugin configuration.
* Using a set of session options.

The plugin configuration defines the file suffix associated with JSON files. 
The session options are:

* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}

Suppose I have two JSON files from different sources (and keep them in 
distinct directories.) For the one, I want {{all_text_mode}} off, as the data 
is nicely formatted. Also, my numbers are fine, so I want 
{{read_numbers_as_double}} off.

But, the other file is a mess and uses a rather ad-hoc format. So, I want these 
two options turned on.

As it turns out I often query both files. Today, I must set the session options 
one way to query my "clean" file, then reverse them to query the "dirty" file.

Next, I want to join the two files. How do I set the options one way for the 
"clean" file, and the other for the "dirty" file within the *same query*? Can't.

Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It 
has a variety of options. But they are *not* session options; they are instead 
options in the plugin definition. This allows me to, say, have a plugin config 
for CSV-with-headers files that I get from source A, and a different plugin 
config for my CSV-without-headers files from source B.

Suppose we applied the text reader technique to the JSON reader. We'd move the 
session options listed above into the JSON format plugin. Then, I can define 
one plugin for my "clean" files, and a different plugin config for my "dirty" 
files.

What's more, I can then use table functions to adjust the format for each file 
as needed within a single query. Since table functions are part of a query, I 
can add them to a view that I define for the various JSON files.

The result is a far simpler user experience than the tedium of resetting 
session options for every query.
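
A hypothetical sketch of what such a format config could look like in the
storage plugin JSON, with option names simply mirroring today's session
options (none of these keys exist yet; this is the proposal, not current
behavior):

{code}
"formats": {
  "json_clean": {
    "type": "json",
    "extensions": ["json"],
    "allTextMode": false,
    "readNumbersAsDouble": false
  },
  "json_dirty": {
    "type": "json",
    "extensions": ["json"],
    "allTextMode": true,
    "readNumbersAsDouble": true
  }
}
{code}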





[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r15009
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java 
---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader 
ea,
   }
 }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator<Object> {
--- End diff --

This is used in the DrillTestWrapper to verify the ordering of results. I 
agree this is not suitable for equality tests, but it's intended to be used 
only for ordering tests. I didn't add support for all the supported RowSet 
types because we would first have to move DrillTestWrapper to use RowSets 
(currently it uses Maps and Lists to represent data). So this comparator is 
not used by RowSets today; the intention is to move DrillTestWrapper to 
RowSets and then make the comparator operate on RowSets, but that will be an 
incremental process.


---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150097140
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Ah, forgot that the file defines an interface, not a class. (The situation 
I described was an interface nested inside a class.) So, you're good.


---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096444
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java
 ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS = new 
ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+.put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+.put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+.put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+.put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+.put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+.put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+.build();
+
+  private final RowSet rowSet;
+  private final Map<String, String> customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+this.rowSet = Preconditions.checkNotNull(rowSet);
+Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset 
is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final 
String columnFormatter) {
+Preconditions.checkNotNull(columnName);
+Preconditions.checkNotNull(columnFormatter);
+
+Iterator<MaterializedField> fields = rowSet
+  .schema()
+  .batch()
+  .iterator();
+
+boolean hasColumn = false;
+
+while (!hasColumn && fields.hasNext()) {
+  hasColumn = fields.next()
+.getName()
+.equals(columnName);
+}
+
+final String message = String.format("(%s) is not a valid column", 
columnName);
+Preconditions.checkArgument(hasColumn, message);
+
+customFormatters.put(columnName, columnFormatter);
+
+return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Sounds Good


---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150096261
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

IntelliJ gave a warning that the modifier is redundant. Also, an interface 
nested inside another interface is public by default.

https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/
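
For reference, a minimal example of the rule being cited:

```java
// Members of an interface are implicitly public, so the nested interface
// below is public (and static) even without explicit modifiers.
interface Outer {
  interface Nested {
    void doSomething();
  }
}
```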


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087815
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = 
LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, 
KafkaSubScan subScan, List<RecordBatch> children)
+  throws ExecutionSetupException {
+Preconditions.checkArgument(children.isEmpty());
+List<RecordReader> readers = Lists.newArrayList();
+List<SchemaPath> columns = null;
+for (KafkaSubScan.KafkaSubScanSpec scanSpec : 
subScan.getPartitionSubScanSpecList()) {
+  try {
+if ((columns = subScan.getCoulmns()) == null) {
+  columns = GroupScan.ALL_COLUMNS;
+}
--- End diff --

When will the columns be null? Not sure this is a valid state. However, as 
noted above, an empty list is a valid state (used for `COUNT(*)` queries.)


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087650
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = 
LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, 
KafkaSubScan subScan, List<RecordBatch> children)
+  throws ExecutionSetupException {
+Preconditions.checkArgument(children.isEmpty());
+List<RecordReader> readers = Lists.newArrayList();
+List<SchemaPath> columns = null;
+for (KafkaSubScan.KafkaSubScanSpec scanSpec : 
subScan.getPartitionSubScanSpecList()) {
+  try {
+if ((columns = subScan.getCoulmns()) == null) {
+  columns = GroupScan.ALL_COLUMNS;
+}
--- End diff --

The column list can be shared by all readers, and so can be created outside 
of the loop over scan specs.
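
In other words, a sketch of the suggested refactor (names taken from the diff
above, with the `getColumns()` spelling fix suggested in a separate comment):

```java
// Resolve the projected columns once; every reader shares the same list.
List<SchemaPath> columns = subScan.getColumns();
if (columns == null) {
  columns = GroupScan.ALL_COLUMNS;
}
List<RecordReader> readers = Lists.newArrayList();
for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
  readers.add(new KafkaRecordReader(scanSpec, columns, context,
      subScan.getKafkaStoragePlugin()));
}
```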


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087581
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = 
LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, 
KafkaSubScan subScan, List<RecordBatch> children)
+  throws ExecutionSetupException {
+Preconditions.checkArgument(children.isEmpty());
+List<RecordReader> readers = Lists.newArrayList();
+List<SchemaPath> columns = null;
+for (KafkaSubScan.KafkaSubScanSpec scanSpec : 
subScan.getPartitionSubScanSpecList()) {
+  try {
+if ((columns = subScan.getCoulmns()) == null) {
+  columns = GroupScan.ALL_COLUMNS;
+}
--- End diff --

`getCoulmns()` --> `getColumns()`


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150088237
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java
 ---
@@ -0,0 +1,61 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = 
LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, 
KafkaSubScan subScan, List<RecordBatch> children)
+  throws ExecutionSetupException {
+Preconditions.checkArgument(children.isEmpty());
+List<RecordReader> readers = Lists.newArrayList();
+List<SchemaPath> columns = null;
+for (KafkaSubScan.KafkaSubScanSpec scanSpec : 
subScan.getPartitionSubScanSpecList()) {
+  try {
+if ((columns = subScan.getCoulmns()) == null) {
+  columns = GroupScan.ALL_COLUMNS;
+}
+readers.add(new KafkaRecordReader(scanSpec, columns, context, 
subScan.getKafkaStoragePlugin()));
+  } catch (Exception e) {
+logger.error("KafkaRecordReader creation failed for subScan:  " + 
subScan + ".",e);
--- End diff --

Here we are catching all errors, putting a generic message into the 
log, and sending a generic exception up the stack. Better is to throw a 
`UserException` at the actual point of failure so we can tell the user exactly 
what is wrong. Then, here, have a `catch` block for `UserException` that simply 
rethrows, while handling all other exceptions as is done now.
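
A sketch of that pattern, assuming Drill's `UserException` builder API (the
specific builder methods below are illustrative):

```java
try {
  readers.add(new KafkaRecordReader(scanSpec, columns, context,
      subScan.getKafkaStoragePlugin()));
} catch (UserException e) {
  throw e;  // already carries a user-facing message; just rethrow
} catch (Exception e) {
  // Wrap anything else with enough context for the user to act on.
  throw UserException.dataReadError(e)
      .message("Failed to create Kafka record reader")
      .addContext("Sub-scan", subScan.toString())
      .build(logger);
}
```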


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150086292
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, 
List<SchemaPath> projectedColumns,
+  FragmentContext context, KafkaStoragePlugin plugin) {
+setColumns(projectedColumns);
+this.enableAllTextMode = 
context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+this.readNumbersAsDouble = context.getOptions()
+
.getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+this.unionEnabled = 
context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+this.plugin = plugin;
+this.subScanSpec = subScanSpec;
+this.endOffset = subScanSpec.getEndOffset();
+this.kafkaPollTimeOut = 
Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> 
projectedColumns) {
+Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+if (!isStarQuery()) {
+  for (SchemaPath column : projectedColumns) {
+transformed.add(column);
+  }
+} else {
+  transformed.add(Utilities.STAR_COLUMN);
+}
+return transformed;
+  }
+
+  @Override
+  public void setup(OperatorContext context, OutputMutator output) throws 
ExecutionSetupException {
+this.writer = new VectorContainerWriter(output, 

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150084784
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, 
List<SchemaPath> projectedColumns,
+  FragmentContext context, KafkaStoragePlugin plugin) {
+setColumns(projectedColumns);
+this.enableAllTextMode = 
context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+this.readNumbersAsDouble = context.getOptions()
+
.getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+this.unionEnabled = 
context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+this.plugin = plugin;
+this.subScanSpec = subScanSpec;
+this.endOffset = subScanSpec.getEndOffset();
+this.kafkaPollTimeOut = 
Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> 
projectedColumns) {
+Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+if (!isStarQuery()) {
+  for (SchemaPath column : projectedColumns) {
+transformed.add(column);
+  }
+} else {
+  transformed.add(Utilities.STAR_COLUMN);
+}
+return transformed;
+  }
+
+  @Override
+  public void setup(OperatorContext context, OutputMutator output) throws 
ExecutionSetupException {
+this.writer = new VectorContainerWriter(output, 

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150081972
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = 
LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, 
List<SchemaPath> projectedColumns,
+  FragmentContext context, KafkaStoragePlugin plugin) {
+setColumns(projectedColumns);
+this.enableAllTextMode = 
context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+this.readNumbersAsDouble = context.getOptions()
+
.getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+this.unionEnabled = 
context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+this.plugin = plugin;
+this.subScanSpec = subScanSpec;
+this.endOffset = subScanSpec.getEndOffset();
+this.kafkaPollTimeOut = 
Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> 
projectedColumns) {
--- End diff --

Preparing columns is really quite difficult with many cases to handle. 
Drill appears to allow projection of the form `a.b`, `a.c` which means that `a` 
is a map and we wish to project just `b` and `c` from `a`.

As it turns out, there is an active project to replace the ad-hoc 
projection logic in each reader with a common, complete implementation. That 
work may be ready by Drill 1.13, so whatever is done here is temporary.

Finally, the code checks for 

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150084335
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java
 ---
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static 
org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions().getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+    }
+    return transformed;
+  }
+
+  @Override
+  public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException {
+    this.writer = new VectorContainerWriter(output, 

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150086335
  

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150087367
  

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150082711
  

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150083767
  

[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150086785
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
       this.time = new Date(startTime);
       this.foreman = foreman;
       this.link = generateLink(drillConfig, foreman, queryId);
-      this.query = query.substring(0, Math.min(query.length(), 150));
+      this.query = extractQuerySnippet(query);
       this.state = state;
       this.user = user;
       this.totalCost = totalCost;
       this.queueName = queueName;
     }
 
+    private String extractQuerySnippet(String queryText) {
+      //Extract upto max char limit as snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      //Trimming down based on line-count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
+        int linesConstructed = 0;
+        StringBuilder lineCappedQuerySnippet = new StringBuilder();
+        String[] queryParts = sizeCappedQuerySnippet.split(System.lineSeparator());
+        for (String qPart : queryParts) {
+          lineCappedQuerySnippet.append(qPart);
+          if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {
+            lineCappedQuerySnippet.append(System.lineSeparator());
--- End diff --

Do we want to append a new line, or maybe a space, for better readability?


---


[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150085841
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
+    private String extractQuerySnippet(String queryText) {
--- End diff --

1. I usually place private methods at the end of the class.
2. We can add Javadoc here explaining that we first limit the original query 
size and, if the size fits but the query has too many lines, we limit the line 
count as well for better readability in the Web UI.


---


[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1029#discussion_r150086018
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java ---
@@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long
+    private String extractQuerySnippet(String queryText) {
+      //Extract upto max char limit as snippet
+      String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
+      //Trimming down based on line-count
+      if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {
--- End diff --

1. We can create a variable for `sizeCappedQuerySnippet.split(System.lineSeparator())` so we do the split only once.
2. Please remove the extra spaces inside the `if` clauses: `if ( QUERY_SNIPPET_MAX_LINES < 
sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {` -> `if 
(QUERY_SNIPPET_MAX_LINES < splittedQuery.length) {`, and likewise in `if ( 
++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {` in the code below.
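
(A sketch folding both review points into one revision; illustrative only, with the constants as in the PR.)

```
/**
 * Extracts a snippet of the query for the profiles list: the text is first
 * capped at QUERY_SNIPPET_MAX_CHAR characters; if the capped text still spans
 * more than QUERY_SNIPPET_MAX_LINES lines, it is trimmed by line count as
 * well for better readability in the Web UI.
 */
private String extractQuerySnippet(String queryText) {
  String snippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
  String[] queryLines = snippet.split(System.lineSeparator()); // split only once
  if (QUERY_SNIPPET_MAX_LINES < queryLines.length) {
    StringBuilder capped = new StringBuilder();
    for (int i = 0; i < QUERY_SNIPPET_MAX_LINES; i++) {
      capped.append(queryLines[i]);
      if (i + 1 < QUERY_SNIPPET_MAX_LINES) {
        capped.append(System.lineSeparator());
      }
    }
    return capped.toString();
  }
  return snippet;
}
```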


---


[GitHub] drill issue #1021: DRILL-5923: Display name for query state

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1021
  
Well, I don't have a strong preference here; we can use an array, as long as 
Prasad makes it nicely documented as in your example rather than in one line.
```
String displayNames[] = {
  "First Value", // FIRST_VALUE = 0
  "Second Value", // SECOND_VALUE = 1
  ...
};
```


---


Errors Building Drill

2017-11-09 Thread Charles Givre
Hello all,
I’m getting the following errors when I try to build Drill from source with the 
tests.  If I skip the mongodb test it builds fine, but I’m not sure what could 
be causing this.  Any suggestions?
— C


2017-11-09T15:13:25.047-0500 I NETWORK  [conn2] end connection 127.0.0.1:58580 
(1 connection now open)
[mongod output] 2017-11-09T15:13:25.158-0500 I CONTROL  [conn6] now exiting
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] shutdown: going 
to close listening sockets...
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] closing 
listening socket: 5
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] closing 
listening socket: 6
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] removing socket 
file: /tmp/mongodb-27022.sock
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] shutdown: going 
to flush diaglog...
[mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK  [conn6] shutdown: going 
to close sockets...
[mongod output] 2017-11-09T15:13:25.158-0500 I STORAGE  [conn6] 
WiredTigerKVEngine shutting down
[mongod output] 2017-11-09T15:13:25.311-0500 I STORAGE  [conn6] shutdown: 
removing fs lock...
[mongod output] 2017-11-09T15:13:25.312-0500 I CONTROL  [conn6] dbexit:  rc: 0
[mongod output] 

Results :

Tests in error: 
  MongoTestSuit.initMongo:231 » IO Could not start process: 

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0



[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073673
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java ---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea,
       }
     }
   }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator<Object> {
--- End diff --

Defined here, but not used in this file. Does not include all types that 
Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not 
allow for ranges for floats & doubles as does JUnit. (Two floats are seldom 
exactly equal.)
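
(For the float/double point, a tolerance-based check in the spirit of JUnit's `assertEquals(expected, actual, delta)` could look like this; sketch only, not part of the PR.)

```
// Floating-point values compare equal when they differ by at most epsilon.
private static boolean approxEquals(double a, double b, double epsilon) {
  return Math.abs(a - b) <= epsilon;
}
```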


---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150072992
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java ---
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+    .put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+    .put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+    .put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+    .put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+    .put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+    .put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+    .build();
+
+  private final RowSet rowSet;
+  private final Map<String, String> customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+    this.rowSet = Preconditions.checkNotNull(rowSet);
+    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
+    Preconditions.checkNotNull(columnName);
+    Preconditions.checkNotNull(columnFormatter);
+
+    Iterator<MaterializedField> fields = rowSet
+      .schema()
+      .batch()
+      .iterator();
+
+    boolean hasColumn = false;
+
+    while (!hasColumn && fields.hasNext()) {
+      hasColumn = fields.next()
+        .getName()
+        .equals(columnName);
+    }
+
+    final String message = String.format("(%s) is not a valid column", columnName);
+    Preconditions.checkArgument(hasColumn, message);
+
+    customFormatters.put(columnName, columnFormatter);
+
+    return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Great! This does not yet handle nested tuples or arrays; in part because 
the row set work for that is still sitting in PR #914. You can update this to 
be aware of maps and map arrays once that PR is committed.
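
(For reference, usage of the builder as quoted above would look roughly like this; `rowSet`, the column name, and the output path are placeholders.)

```
// Write a row set to a JSON file, formatting the hypothetical "balance"
// column with two decimal places.
File out = new File("/tmp/example.json");
new JsonFileBuilder(rowSet)
    .setCustomFormatter("balance", "%.2f")
    .build(out);
```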


---


[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r150073945
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java ---
@@ -85,8 +85,7 @@
* new row set with the updated columns, then merge the new
* and old row sets to create a new immutable row set.
*/
-
-  public interface RowSetWriter extends TupleWriter {
+  interface RowSetWriter extends TupleWriter {
--- End diff --

Aren't nested interfaces `protected` by default? Just had to change one 
from default to `public` so I could use it in another package...
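
(For reference: in Java, member types of an interface are implicitly `public`, while member types of a class default to package-private; nothing defaults to `protected`. So if `RowSet` is itself an interface, dropping the `public` keyword here changes nothing.)

```
// Java visibility rules for nested types:
public interface RowSet {
  interface RowSetWriter { }  // implicitly public (member of an interface)
}

class Holder {
  interface Hidden { }        // package-private (member of a class)
}
```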


---


[GitHub] drill issue #1021: DRILL-5923: Display name for query state

2017-11-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1021
  
@arina-ielchiieva, it helps to think about the source of the enum. This is 
a Protobuf enum. The ordinal values cannot change; they are a contract between 
sender and receiver. We can add new ones, or retire old ones, but otherwise the 
values are frozen in time.

The array approach captures this reality. We could document the array 
better:
```
String displayNames[] = {
  "First Value", // FIRST_VALUE = 0
  "Second Value", // SECOND_VALUE = 1
  ...
};
```
We can also do a bounds check:
```
if (enumValue.ordinal() >= displayNames.length) {
  return enumValue.toString();
} else {
  return displayNames[enumValue.ordinal()];
}
```
But, IMHO a map seems overkill for such a simple task. Yes, it works, but 
is unnecessary. As they say, "make it as simple as possible (but no simpler)."
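
(Put together, the lookup might be a short method of this shape; sketch only, with `DISPLAY_NAMES` standing in for the documented array above and `QueryState` for the protobuf enum.)

```
public static String displayName(QueryState state) {
  int i = state.ordinal();
  // Fall back to the protobuf name for values added after the array was written.
  return i < DISPLAY_NAMES.length ? DISPLAY_NAMES[i] : state.toString();
}
```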


---


[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit

2017-11-09 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/921#discussion_r150047464
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
@@ -348,6 +354,21 @@ public void run() {
  */
   }
 
+  /*
+   * Check if the foreman is ONLINE. If not, don't accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException {
+    DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+    Collection<DrillbitEndpoint> dbs = drillbitContext.getAvailableBits();
--- End diff --

I was thinking of encapsulating the code from lines 360 to 367 into a boolean 
isOnline(), since all the values in that code are derived from the current 
DrillbitContext. Then your code would be simplified to:
```
public void checkForemanState() throws ForemanException {
  if (!drillbitContext.isOnline()) {
    throw new ForemanException("Query submission failed since Foreman is shutting down.");
  }
}
```
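
(The suggested DrillbitContext.isOnline() could then be a sketch of this shape, assuming the DrillbitEndpoint.State enum this PR introduces.)

```
// Sketch only: true when this drillbit's registered endpoint reports ONLINE.
public boolean isOnline() {
  DrillbitEndpoint endpoint = getEndpoint();
  return endpoint.getState() == DrillbitEndpoint.State.ONLINE;
}
```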


---


[GitHub] drill issue #1029: DRILL-5867: List profiles in pages rather than a long ver...

2017-11-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1029
  
Snapshot when testing with a search filter for FAILED queries and navigating 
to page 2 of that list. Information about the number of filtered items, etc. is 
also provided.

![image](https://user-images.githubusercontent.com/4335237/32622085-a8826c56-c536-11e7-9a18-7a09142b250e.png)



---


[GitHub] drill issue #1029: DRILL-5867: List profiles in pages rather than a long ver...

2017-11-09 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1029
  
Snapshot when rendering the defaults (10 per page) from a pre-loaded set of 
the latest 123 profiles.

![image](https://user-images.githubusercontent.com/4335237/32621917-412a90ba-c536-11e7-9d51-83220ce072d3.png)
The query snippet is restricted to at most 8 lines and indicates that there 
is more to the query text with a trailing `...`.


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150028303
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+<?xml version="1.0"?>
+
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
--- End diff --

What is the reason to define the `kafka.TestSuite` property?


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150029170
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <configuration>
--- End diff --

It will be better to go with the default `maven-surefire-plugin` 
configuration unless there is a good justification to use a custom config. Most 
of the time this can be achieved by using the default test naming convention.


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150030044
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <configuration>
+          <includes>
+            <include>${kafka.TestSuite}</include>
+          </includes>
+          <excludes>
+            <exclude>**/TestKafkaQueries.java</exclude>
+          </excludes>
+          <systemProperties>
+            <property>
+              <name>logback.log.dir</name>
+              <value>${project.build.directory}/surefire-reports</value>
+            </property>
+          </systemProperties>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
--- End diff --

Why is it necessary to exclude zookeeper? If a specific version of 
zookeeper is required, will it be better to explicitly add zookeeper to the 
dependency management?


---


[jira] [Created] (DRILL-5948) The wrong number of batches is displayed

2017-11-09 Thread Vlad (JIRA)
Vlad created DRILL-5948:
---

 Summary: The wrong number of batches is displayed
 Key: DRILL-5948
 URL: https://issues.apache.org/jira/browse/DRILL-5948
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Vlad


I suppose that when you execute a query with a small amount of data, Drill must 
create 1 batch, but here you can see that Drill created 2 batches. I think this 
is wrong behaviour for Drill. The full JSON profile will be in the attachment.
{code}
"fragmentProfile": [
  {
    "majorFragmentId": 0,
    "minorFragmentProfile": [
      {
        "state": 3,
        "minorFragmentId": 0,
        "operatorProfile": [
          {
            "inputProfile": [
              {
                "records": 1,
                "batches": 2,
                "schemas": 1
              }
            ],
            "operatorId": 2,
            "operatorType": 29,
            "setupNanos": 0,
            "processNanos": 1767363740,
            "peakLocalMemoryAllocated": 639120,
            "waitNanos": 25787
          },
{code}

Steps to reproduce:
# Create a JSON file with 1 row
# Execute a star query with this file, for example 
{code:sql}
select * from dfs.`/path/to/your/file/example.json`
{code}
# Go to the Profile page on the UI, and open info about your query
# Open JSON profile



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150019576
  
--- Diff: contrib/storage-kafka/pom.xml ---
@@ -0,0 +1,130 @@
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
--- End diff --

If the setting is necessary, it will be better to set it at the root pom.


---


Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Arina Yelchiyeva
Yes, they are already in master.

On Thu, Nov 9, 2017 at 6:05 PM, Charles Givre  wrote:

> We’re including the Networking functions in this release, right?
>
> > On Nov 9, 2017, at 11:04, Arina Yelchiyeva 
> wrote:
> >
> > If the changes are done before the cut-off date (targeting mid-November),
> > it will be possible to include this Jira.
> >
> > On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre  wrote:
> >
> >> Hi Arina,
> >> Can we include DRILL-4091 Support for additional GIS operations in
> version
> >> 1.12?  In general the code looked pretty good.  There was a unit test
> >> missing which the developer submitted and some minor formatting issues
> >> which I’m still waiting on.
> >> Thanks,
> >> —C
> >>
> >>
> >>
> >>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva  >
> >> wrote:
> >>>
> >>> Current status:
> >>>
> >>> Blocker:
> >>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R.,
> code
> >>> reviewer - ?) - in progress.
> >>>
> >>> Targeted for 1.12 release:
> >>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code
> >> reviewer -
> >>> Arina) - code review in final stage.
> >>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh,
> code
> >>> reviewer - Paul) - in review.
> >>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
> >>> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) -
> waiting
> >>> for the code review.
> >>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina,
> code
> >>> reviewer - Tim) - waiting for the code review.
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
> >>> arina.yelchiy...@gmail.com> wrote:
> >>>
>  Current status:
> 
>  Targeted for 1.12 release:
>  DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather
> >> than
>  mock (developer - Paul, code reviewer - ?) - waiting for the code
> review
>  DRILL-5842: Refactor and simplify the fragment, operator contexts for
>  testing (developer - Paul, code reviewer - ?) - waiting for the code
>  review
>  DRILL-5834: Adding network functions (developer - Charles, code
> reviewer
>  - Arina) - waiting changes after code review
>  DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer -
> >> Arina) - waiting
>  for the code review
>  DRILL-5772: Enable UTF-8 support in query string by default
> (developer -
>  Arina, code reviewer - Paul) - finalizing approach
>  DRILL-4779: Kafka storage plugin support (developer - Anil, code
> >> reviewer
>  - ?) - finishing implementation
> 
>  Under question:
>  DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
>  reviewer - ?) - waiting for the status update from the developer
> 
 Please feel free to suggest other items that are targeted for the 1.12 release.
>  There are many Jiras that have fix version marked as 1.12, it would be
> >> good
>  if developers revisit them and update fix version to the actual one.
>  Link to the dashboard - https://issues.apache.org/
>  jira/secure/RapidBoard.jspa?rapidView=185=
> DRILL=detail
> 
>  Kind regards
>  Arina
> 
> 
>  On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra 
> >> wrote:
> 
> > I'm waiting to merge the SSL  changes in. Waiting a couple of days
> >> more to
> > see if there are any more comments before I merge the changes in.
> >
> > On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers 
> wrote:
> >
> >> Hi Arina,
> >>
> >> In addition to my own PRs, there are several in the “active” queue
> >> that
> > we
> >> could get in if we can just push them over the line and clear the
> >> queue.
> >> The owners of the PRs should check if we are waiting on them to take
> > action.
> >>
> >> 977 DRILL-5849: Add freemarker lib to dependencyManagement to
> >> ensure
> >> prop…
> >> 976 DRILL-5797: Choose parquet reader from read columns
> >> 975 DRILL-5743: Handling column family and column scan for hbase
> >> 973 DRILL-5775: Select * query on a maprdb binary table fails
> >> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of
> nested
> >> field (reg. of DRILL-4264)
> >> 950 Drill 5431: SSL Support
> >> 949 DRILL-5795: Parquet Filter push down at rowgroup level
> >> 936 DRILL-5772: Add unit tests to indicate how utf-8 support can
> >> be
> >> enabled / disabled in Drill
> >> 904 DRILL-5717: change some date time test cases with specific
> >> timezone or Local
> >> 892 DRILL-5645: negation of expression causes null pointer
> >> exception
> >> 889 DRILL-5691: enhance scalar sub queries checking for the
> > cartesian
> >> join
> >>
> >> (Items not 

Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Charles Givre
We’re including the Networking functions in this release, right?

> On Nov 9, 2017, at 11:04, Arina Yelchiyeva  wrote:
> 
> If the changes are done before the cut-off date (targeting mid-November), it
> will be possible to include this Jira.
> 
> On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre  wrote:
> 
>> Hi Arina,
>> Can we include DRILL-4091 Support for additional GIS operations in version
>> 1.12?  In general the code looked pretty good.  There was a unit test
>> missing which the developer submitted and some minor formatting issues
>> which I’m still waiting on.
>> Thanks,
>> —C
>> 
>> 
>> 
>>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva 
>> wrote:
>>> 
>>> Current status:
>>> 
>>> Blocker:
>>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code
>>> reviewer - ?) - in progress.
>>> 
>>> Targeted for 1.12 release:
>>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code
>> reviewer -
>>> Arina) - code review in final stage.
>>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code
>>> reviewer - Paul) - in review.
>>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
>>> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting
>>> for the code review.
>>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code
>>> reviewer - Tim) - waiting for the code review.
>>> 
>>> Kind regards
>>> Arina
>>> 
>>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
>>> arina.yelchiy...@gmail.com> wrote:
>>> 
 Current status:
 
 Targeted for 1.12 release:
 DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather
>> than
 mock (developer - Paul, code reviewer - ?) - waiting for the code review
 DRILL-5842: Refactor and simplify the fragment, operator contexts for
 testing (developer - Paul, code reviewer - ?) - waiting for the code
 review
 DRILL-5834: Adding network functions (developer - Charles, code reviewer
 - Arina) - waiting changes after code review
 DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer -
>> Arina) - waiting
 for the code review
 DRILL-5772: Enable UTF-8 support in query string by default (developer -
 Arina, code reviewer - Paul) - finalizing approach
 DRILL-4779: Kafka storage plugin support (developer - Anil, code
>> reviewer
 - ?) - finishing implementation
 
 Under question:
 DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
 reviewer - ?) - waiting for the status update from the developer
 
 Please feel free to suggest other items that are targeted for the 1.12 release.
 There are many Jiras that have fix version marked as 1.12, it would be
>> good
 if developers revisit them and update fix version to the actual one.
 Link to the dashboard - https://issues.apache.org/
 jira/secure/RapidBoard.jspa?rapidView=185=DRILL=detail
 
 Kind regards
 Arina
 
 
 On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra 
>> wrote:
 
> I'm waiting to merge the SSL  changes in. Waiting a couple of days
>> more to
> see if there are any more comments before I merge the changes in.
> 
> On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers  wrote:
> 
>> Hi Arina,
>> 
>> In addition to my own PRs, there are several in the “active” queue
>> that
> we
>> could get in if we can just push them over the line and clear the
>> queue.
>> The owners of the PRs should check if we are waiting on them to take
> action.
>> 
>> 977 DRILL-5849: Add freemarker lib to dependencyManagement to
>> ensure
>> prop…
>> 976 DRILL-5797: Choose parquet reader from read columns
>> 975 DRILL-5743: Handling column family and column scan for hbase
>> 973 DRILL-5775: Select * query on a maprdb binary table fails
>> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested
>> field (reg. of DRILL-4264)
>> 950 Drill 5431: SSL Support
>> 949 DRILL-5795: Parquet Filter push down at rowgroup level
>> 936 DRILL-5772: Add unit tests to indicate how utf-8 support can
>> be
>> enabled / disabled in Drill
>> 904 DRILL-5717: change some date time test cases with specific
>> timezone or Local
>> 892 DRILL-5645: negation of expression causes null pointer
>> exception
>> 889 DRILL-5691: enhance scalar sub queries checking for the
> cartesian
>> join
>> 
>> (Items not on the list above have become “inactive” for a variety of
>> reasons.)
>> 
>> Thanks,
>> 
>> - Paul
>> 
>>> On Oct 9, 2017, at 9:57 AM, Paul Rogers  wrote:
>>> 
>>> Hi Arina,
>>> 
>>> I’d like to include the following that are needed to finish up the
>> “managed” sort and spill-to-disk for 

Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Arina Yelchiyeva
If the changes are done before the cut-off date (targeting mid-November), it
will be possible to include this Jira.

On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre  wrote:

> Hi Arina,
> Can we include DRILL-4091 Support for additional GIS operations in version
> 1.12?  In general the code looked pretty good.  There was a unit test
> missing which the developer submitted and some minor formatting issues
> which I’m still waiting on.
> Thanks,
> —C
>
>
>
> > On Nov 9, 2017, at 10:58, Arina Yelchiyeva 
> wrote:
> >
> > Current status:
> >
> > Blocker:
> > DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code
> > reviewer - ?) - in progress.
> >
> > Targeted for 1.12 release:
> > DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code
> reviewer -
> > Arina) - code review in final stage.
> > DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code
> > reviewer - Paul) - in review.
> > DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
> > mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting
> > for the code review.
> > DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code
> > reviewer - Tim) - waiting for the code review.
> >
> > Kind regards
> > Arina
> >
> > On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> >> Current status:
> >>
> >> Targeted for 1.12 release:
> >> DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather
> than
> >> mock (developer - Paul, code reviewer - ?) - waiting for the code review
> >> DRILL-5842: Refactor and simplify the fragment, operator contexts for
> >> testing (developer - Paul, code reviewer - ?) - waiting for the code
> >> review
> >> DRILL-5834: Adding network functions (developer - Charles, code reviewer
> >> - Arina) - waiting changes after code review
> >> DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer -
> Arina) - waiting
> >> for the code review
> >> DRILL-5772: Enable UTF-8 support in query string by default (developer -
> >> Arina, code reviewer - Paul) - finalizing approach
> >> DRILL-4779: Kafka storage plugin support (developer - Anil, code
> reviewer
> >> - ?) - finishing implementation
> >>
> >> Under question:
> >> DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
> >> reviewer - ?) - waiting for the status update from the developer
> >>
> >> Please feel free to suggest other items that are targeted for the 1.12 release.
> >> There are many Jiras that have fix version marked as 1.12, it would be
> good
> >> if developers revisit them and update fix version to the actual one.
> >> Link to the dashboard - https://issues.apache.org/
> >> jira/secure/RapidBoard.jspa?rapidView=185=DRILL=detail
> >>
> >> Kind regards
> >> Arina
> >>
> >>
> >> On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra 
> wrote:
> >>
> >>> I'm waiting to merge the SSL  changes in. Waiting a couple of days
> more to
> >>> see if there are any more comments before I merge the changes in.
> >>>
> >>> On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers  wrote:
> >>>
>  Hi Arina,
> 
>  In addition to my own PRs, there are several in the “active” queue
> that
> >>> we
>  could get in if we can just push them over the line and clear the
> queue.
>  The owners of the PRs should check if we are waiting on them to take
> >>> action.
> 
>  977 DRILL-5849: Add freemarker lib to dependencyManagement to
> ensure
>  prop…
>  976 DRILL-5797: Choose parquet reader from read columns
>  975 DRILL-5743: Handling column family and column scan for hbase
>  973 DRILL-5775: Select * query on a maprdb binary table fails
>  972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested
>  field (reg. of DRILL-4264)
>  950 Drill 5431: SSL Support
>  949 DRILL-5795: Parquet Filter push down at rowgroup level
>  936 DRILL-5772: Add unit tests to indicate how utf-8 support can
> be
>  enabled / disabled in Drill
>  904 DRILL-5717: change some date time test cases with specific
>  timezone or Local
>  892 DRILL-5645: negation of expression causes null pointer
> exception
>  889 DRILL-5691: enhance scalar sub queries checking for the
> >>> cartesian
>  join
> 
>  (Items not on the list above have become “inactive” for a variety of
>  reasons.)
> 
>  Thanks,
> 
>  - Paul
> 
> > On Oct 9, 2017, at 9:57 AM, Paul Rogers  wrote:
> >
> > Hi Arina,
> >
> > I’d like to include the following that are needed to finish up the
>  “managed” sort and spill-to-disk for hash agg:
> >
> > #928: DRILL-5716: Queue-driven memory allocation
> > #958, DRILL-5808: Reduce memory allocator strictness for "managed"
>  operators
> > #960, DRILL-5815: Option to set 

Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Charles Givre
Hi Arina, 
Can we include DRILL-4091 Support for additional GIS operations in version 
1.12?  In general the code looked pretty good.  There was a unit test missing 
which the developer submitted and some minor formatting issues which I’m still 
waiting on. 
Thanks,
—C 



> On Nov 9, 2017, at 10:58, Arina Yelchiyeva  wrote:
> 
> Current status:
> 
> Blocker:
> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code
> reviewer - ?) - in progress.
> 
> Targeted for 1.12 release:
> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code reviewer -
> Arina) - code review in final stage.
> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code
> reviewer - Paul) - in review.
> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting
> for the code review.
> DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code
> reviewer - Tim) - waiting for the code review.
> 
> Kind regards
> Arina
> 
> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
> 
>> Current status:
>> 
>> Targeted for 1.12 release:
>> DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather than
>> mock (developer - Paul, code reviewer - ?) - waiting for the code review
>> DRILL-5842: Refactor and simplify the fragment, operator contexts for
>> testing (developer - Paul, code reviewer - ?) - waiting for the code
>> review
>> DRILL-5834: Adding network functions (developer - Charles, code reviewer
>> - Arina) - waiting changes after code review
>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer - Arina) - 
>> waiting
>> for the code review
>> DRILL-5772: Enable UTF-8 support in query string by default (developer -
>> Arina, code reviewer - Paul) - finalizing approach
>> DRILL-4779: Kafka storage plugin support (developer - Anil, code reviewer
>> - ?) - finishing implementation
>> 
>> Under question:
>> DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
>> reviewer - ?) - waiting for the status update from the developer
>> 
>> Please feel free to suggest other items that are targeted for the 1.12 release.
>> There are many Jiras that have fix version marked as 1.12, it would be good
>> if developers revisit them and update fix version to the actual one.
>> Link to the dashboard - https://issues.apache.org/
>> jira/secure/RapidBoard.jspa?rapidView=185=DRILL=detail
>> 
>> Kind regards
>> Arina
>> 
>> 
>> On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra  wrote:
>> 
>>> I'm waiting to merge the SSL  changes in. Waiting a couple of days more to
>>> see if there are any more comments before I merge the changes in.
>>> 
>>> On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers  wrote:
>>> 
 Hi Arina,
 
 In addition to my own PRs, there are several in the “active” queue that
>>> we
 could get in if we can just push them over the line and clear the queue.
 The owners of the PRs should check if we are waiting on them to take
>>> action.
 
 977 DRILL-5849: Add freemarker lib to dependencyManagement to ensure
 prop…
 976 DRILL-5797: Choose parquet reader from read columns
 975 DRILL-5743: Handling column family and column scan for hbase
 973 DRILL-5775: Select * query on a maprdb binary table fails
 972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested
 field (reg. of DRILL-4264)
 950 Drill 5431: SSL Support
 949 DRILL-5795: Parquet Filter push down at rowgroup level
 936 DRILL-5772: Add unit tests to indicate how utf-8 support can be
 enabled / disabled in Drill
 904 DRILL-5717: change some date time test cases with specific
 timezone or Local
 892 DRILL-5645: negation of expression causes null pointer exception
 889 DRILL-5691: enhance scalar sub queries checking for the
>>> cartesian
 join
 
 (Items not on the list above have become “inactive” for a variety of
 reasons.)
 
 Thanks,
 
 - Paul
 
> On Oct 9, 2017, at 9:57 AM, Paul Rogers  wrote:
> 
> Hi Arina,
> 
> I’d like to include the following that are needed to finish up the
 “managed” sort and spill-to-disk for hash agg:
> 
> #928: DRILL-5716: Queue-driven memory allocation
> #958, DRILL-5808: Reduce memory allocator strictness for "managed"
 operators
> #960, DRILL-5815: Option to set query memory as percent of total
> 
> The following is needed to resolve issues with HBase support in empty
 batches:
> 
> #968, DRILL-5830: Resolve regressions to MapR DB from DRILL-5546
> 
> The following are nice-to-haves that build on work already done in
>>> this
 release, and that some of my own work depends on:
> 
> #970, DRILL-5832: Migrate OperatorFixture to use 

Re: [DISCUSS] Drill 1.12.0 release

2017-11-09 Thread Arina Yelchiyeva
Current status:

Blocker:
DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code
reviewer - ?) - in progress.

Targeted for 1.12 release:
DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code reviewer -
Arina) - code review in final stage.
DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code
reviewer - Paul) - in review.
DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN
mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting
for the code review.
DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code
reviewer - Tim) - waiting for the code review.

Kind regards
Arina

On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Current status:
>
> Targeted for 1.12 release:
> DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather than
> mock (developer - Paul, code reviewer - ?) - waiting for the code review
> DRILL-5842: Refactor and simplify the fragment, operator contexts for
> testing (developer - Paul, code reviewer - ?) - waiting for the code
> review
> DRILL-5834: Adding network functions (developer - Charles, code reviewer
> - Arina) - awaiting changes after code review
> DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer - Arina) -
> waiting for the code review
> DRILL-5772: Enable UTF-8 support in query string by default (developer -
> Arina, code reviewer - Paul) - finalizing approach
> DRILL-4779: Kafka storage plugin support (developer - Anil, code reviewer
> - ?) - finishing implementation
>
> Under question:
> DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code
> reviewer - ?) - waiting for the status update from the developer
>
> Please feel free to suggest other items that are targeted for the 1.12 release.
> There are many Jiras that have fix version marked as 1.12; it would be good
> if developers revisit them and update the fix version to the actual one.
> Link to the dashboard - https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185=DRILL=detail
>
> Kind regards
> Arina
>
>
> On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra  wrote:
>
>> I'm waiting to merge the SSL changes in. I'll wait a couple more days to
>> see if there are any further comments before merging.
>>
>> On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers  wrote:
>>
>> > Hi Arina,
>> >
>> > In addition to my own PRs, there are several in the “active” queue that
>> we
>> > could get in if we can just push them over the line and clear the queue.
>> > The owners of the PRs should check if we are waiting on them to take
>> action.
>> >
>> > 977 DRILL-5849: Add freemarker lib to dependencyManagement to ensure
>> > prop…
>> > 976 DRILL-5797: Choose parquet reader from read columns
>> > 975 DRILL-5743: Handling column family and column scan for hbase
>> > 973 DRILL-5775: Select * query on a maprdb binary table fails
>> > 972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested
>> > field (reg. of DRILL-4264)
>> > 950 Drill 5431: SSL Support
>> > 949 DRILL-5795: Parquet Filter push down at rowgroup level
>> > 936 DRILL-5772: Add unit tests to indicate how utf-8 support can be
>> > enabled / disabled in Drill
>> > 904 DRILL-5717: change some date time test cases with specific
>> > timezone or Local
>> > 892 DRILL-5645: negation of expression causes null pointer exception
>> > 889 DRILL-5691: enhance scalar sub queries checking for the
>> cartesian
>> > join
>> >
>> > (Items not on the list above have become “inactive” for a variety of
>> > reasons.)
>> >
>> > Thanks,
>> >
>> > - Paul
>> >
>> > > On Oct 9, 2017, at 9:57 AM, Paul Rogers  wrote:
>> > >
>> > > Hi Arina,
>> > >
>> > > I’d like to include the following that are needed to finish up the
>> > “managed” sort and spill-to-disk for hash agg:
>> > >
>> > > #928: DRILL-5716: Queue-driven memory allocation
>> > > #958, DRILL-5808: Reduce memory allocator strictness for "managed"
>> > operators
>> > > #960, DRILL-5815: Option to set query memory as percent of total
>> > >
>> > > The following is needed to resolve issues with HBase support in empty
>> > batches:
>> > >
>> > > #968, DRILL-5830: Resolve regressions to MapR DB from DRILL-5546
>> > >
>> > > The following are nice-to-haves that build on work already done in
>> this
>> > release, and that some of my own work depends on:
>> > >
>> > > #970, DRILL-5832: Migrate OperatorFixture to use SystemOptionManager
>> > rather than mock
>> > > #978: DRILL-5842: Refactor and simplify the fragment, operator
>> contexts
>> > for testing
>> > >
>> > > The following is not needed for 1.12 per-se, but is the foundation
>> for a
>> > project I’m working on; would be good to get this in after 2-3 months of
>> > review time:
>> > >
>> > > #921, foundation for batch size limitation
>> > >
>> > > The key issue with each of the above is that they 

[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread akumarb2010
Github user akumarb2010 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150002516
  
--- Diff: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/DrillKafkaConfig.java
 ---
@@ -0,0 +1,31 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+public class DrillKafkaConfig {
+
+  /**
+   * Timeout for fetching messages from Kafka
--- End diff --

Thanks Paul, this is a very good point and it makes perfect sense to add 
these as Drill session options instead of Drill config properties. We are 
working on these changes.
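
For readers following along, Drill session options of that era are declared 
with a typed validator and read back through the `OptionManager`. A hedged 
sketch of what the move could look like; the option name, default, and 
placement are illustrative assumptions, not the final PR:

```java
// Sketch only: hypothetical option name and default for the fetch timeout.
public static final String KAFKA_POLL_TIMEOUT = "store.kafka.poll.timeout";
public static final TypeValidators.PositiveLongValidator KAFKA_POLL_TIMEOUT_VALIDATOR =
    new TypeValidators.PositiveLongValidator(KAFKA_POLL_TIMEOUT, Long.MAX_VALUE, 200);

// Inside the record reader, the current session value is read at runtime:
long pollTimeout = context.getOptions().getOption(KAFKA_POLL_TIMEOUT_VALIDATOR);
```

A user can then tune it per session with `ALTER SESSION SET 
`store.kafka.poll.timeout` = 5000;` instead of editing the config file and 
restarting drillbits.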


---


[GitHub] drill pull request #1021: DRILL-5923: Display name for query state

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149997750
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+Map<String, String> displayNames = Maps.newHashMap();
--- End diff --

1. Please use `Map<QueryState, String>` since you're already receiving 
`QueryState` as an input parameter in the method.
Besides, it would guarantee you did not make a mistake writing query state 
enum names.
2. `queryStateDisplayName` -> `queryStateDisplayNames`


---


[GitHub] drill pull request #1021: DRILL-5923: Display name for query state

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149998367
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+Map<String, String> displayNames = Maps.newHashMap();
+displayNames.put("STARTING", "Starting");
+displayNames.put("RUNNING", "Running");
+displayNames.put("COMPLETED", "Succeeded");
+displayNames.put("CANCELED", "Canceled");
+displayNames.put("FAILED", "Failed");
+displayNames.put("CANCELLATION_REQUESTED", "Cancellation Requested");
+displayNames.put("ENQUEUED", "Enqueued");
+queryStateDisplayName = Collections.unmodifiableMap(displayNames);
+  }
+
+
+  /**
+   * Utility to return display name for query state
+   * @param queryState
+   * @return display string for query state
+   */
+  public final static String getQueryStateDisplayName(QueryState queryState) {
+String state = queryState.name();
+if (queryStateDisplayName.containsKey(state)) {
--- End diff --

This would be more optimal:
```
String state = queryStateDisplayNames.get(queryState);
if (state == null) {
  state = "Unknown State";
}
return state;
```
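
Putting the comments on this file together, a minimal sketch of how the 
finished utility might look. This is an illustration of the suggestions, not 
the committed code; keying an `EnumMap` directly by `QueryState` is my choice 
here (the diff above uses a Guava `HashMap` keyed by strings):

```java
package org.apache.drill.exec.server.rest.profile;

import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;

public class ProfileUtil {
  // Maps each query state directly to its display name; keying by the enum
  // (rather than by its name() string) rules out typos in state names.
  private static final Map<QueryState, String> queryStateDisplayNames;

  static {
    Map<QueryState, String> displayNames = new EnumMap<>(QueryState.class);
    displayNames.put(QueryState.STARTING, "Starting");
    displayNames.put(QueryState.RUNNING, "Running");
    displayNames.put(QueryState.COMPLETED, "Succeeded");
    displayNames.put(QueryState.CANCELED, "Canceled");
    displayNames.put(QueryState.FAILED, "Failed");
    displayNames.put(QueryState.CANCELLATION_REQUESTED, "Cancellation Requested");
    displayNames.put(QueryState.ENQUEUED, "Enqueued");
    queryStateDisplayNames = Collections.unmodifiableMap(displayNames);
  }

  /**
   * Returns the display name for the given query state, falling back to a
   * generic label for states that have no mapping.
   */
  public static String getQueryStateDisplayName(QueryState queryState) {
    String displayName = queryStateDisplayNames.get(queryState);
    return displayName != null ? displayName : "Unknown State";
  }
}
```

On Java 8 the lookup body collapses to 
`return queryStateDisplayNames.getOrDefault(queryState, "Unknown State");`.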


---


[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...

2017-11-09 Thread vvysotskyi
Github user vvysotskyi commented on the issue:

https://github.com/apache/drill/pull/904
  
@weijietong, thanks for the pull request, +1


---


[GitHub] drill issue #1020: DRILL-5921: Display counter metrics in table

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1020
  
+1, LGTM.


---


[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149994673
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

Well, sounds good then, thanks for making the changes.


---


[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins

2017-11-09 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1014
  
@ilooner can you please review this?


---


[GitHub] drill issue #1030: DRILL-5941: Skip header / footer improvements for Hive st...

2017-11-09 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1030
  
@ppadma can you review this?


---


[GitHub] drill pull request #1021: DRILL-5923: Display name for query state

2017-11-09 Thread prasadns14
Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149974787
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java
 ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+public class QueryStateDisplayName {
+  // Values should correspond to the QueryState enum in UserBitShared.proto
--- End diff --

@arina-ielchiieva 
yes, a map will definitely make it easier to visualize the mapping. 
Made the changes.


---


[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table

2017-11-09 Thread prasadns14
Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149972346
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

@arina-ielchiieva,
I have considered reusing existing methods before deciding to have a 
separate method.
With the above suggestion, the table will now look as below:

drill.connections.rpc.control.encrypted|  {count: 0}

'|' here is the column delimiter. Do we want to display only the number in the 
second column or a key/value pair? 
I just wanted it to be consistent with the other metrics tables (so I print 
value.count).

Removed meters section. 


---


[GitHub] drill pull request #1030: DRILL-5941: Skip header / footer improvements for ...

2017-11-09 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1030

DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:
1. When a table has a header / footer, process input splits of the same file in 
one reader (bug fix for DRILL-5941).
2. Apply skip header logic during reader initialization only once, to avoid 
checks while reading the data (DRILL-5106).
3. Apply skip footer logic only when the footer count is more than 0; otherwise 
default processing will be done without buffering data in a queue (DRILL-5106).

Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic 
during readers initialization.
It will have three implementations:
a. Default (each input split gets its own reader);
b. Empty (for empty tables);
c. InputSplitGroups (applied when the table has a header / footer and input 
splits of the same file should be processed together).

2. AbstractRecordsInspector was introduced to improve performance when the 
table footer count is less than or equal to 0.
It will have two implementations:
a. Default (records will be processed one by one without buffering);
b. SkipFooter (a queue will be used to buffer the N records that should be 
skipped at the end of file processing; see the sketch after this list).

3. Allow HiveAbstractReader to have multiple input splits.
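
A minimal, self-contained sketch of the skip-footer buffering idea in 2b 
(illustrative names only, not the actual classes from this patch): hold back 
the last footerCount records, emit a record only once it can no longer be 
part of the footer, and discard whatever remains when the file ends.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sketch, not the patch code: buffers candidate footer records. */
class SkipFooterBuffer<T> {
  private final int footerCount;
  private final Deque<T> buffer;

  SkipFooterBuffer(int footerCount) {
    this.footerCount = footerCount;
    this.buffer = new ArrayDeque<>(footerCount + 1);
  }

  /** Accepts the next record; returns a record safe to emit, or null while filling. */
  T offer(T record) {
    buffer.addLast(record);
    // Until more than footerCount records are buffered, any of them
    // could still turn out to be part of the footer.
    return buffer.size() > footerCount ? buffer.removeFirst() : null;
  }

  // When the reader hits end of file, the footerCount records still
  // sitting in the buffer are exactly the footer and are simply dropped.
}
```

This also shows why point 3 matters: when the footer count is 0 the queue is 
pure overhead, so the Default inspector bypasses it entirely.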

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5941

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1030






---


[GitHub] drill issue #1026: DRILL-5919: Add non-numeric support for JSON processing

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1026
  
Thanks, +1, LGTM.


---


[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...

2017-11-09 Thread weijietong
Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/904
  
done


---


[GitHub] drill issue #1026: DRILL-5919: Add non-numeric support for JSON processing

2017-11-09 Thread vladimirtkach
Github user vladimirtkach commented on the issue:

https://github.com/apache/drill/pull/1026
  
@arina-ielchiieva made code changes according to your comments.


---


[GitHub] drill pull request #1026: DRILL-5919: Add non-numeric support for JSON proce...

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1026#discussion_r149903182
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java
 ---
@@ -0,0 +1,167 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+String query = String.format("select * from 
dfs.`%s`",file.getAbsolutePath());
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("alter session set `store.json.reader.non_numeric_numbers` = 
true");
+  testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("nan", "inf")
+.baselineValues(Double.NaN, Double.POSITIVE_INFINITY)
+.build()
+.run();
+} finally {
+  test("alter session reset `store.json.reader.non_numeric_numbers`");
+  FileUtils.deleteQuietly(file);
+}
+  }
+
+  @Test(expected = UserRemoteException.class)
+  public void testNonNumericFailure() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+test("alter session set `store.json.reader.non_numeric_numbers` = 
false");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("select * from dfs.`%s`;", file.getAbsolutePath());
+} catch (UserRemoteException e) {
+  assertThat(e.getMessage(), containsString("Error parsing JSON"));
+  throw e;
+} finally {
+  test("alter session reset `store.json.reader.non_numeric_numbers`");
+  FileUtils.deleteQuietly(file);
+}
+  }
+
+  @Test
+  public void testCreateTableNonNumerics() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
+String json = "{\"nan\":NaN, \"inf\":Infinity}";
+String tableName = "ctas_test";
+try {
+  FileUtils.writeStringToFile(file, json);
+  test("alter session set `store.json.reader.non_numeric_numbers` = 
true");
+  test("alter session set `store.json.writer.non_numeric_numbers` = 
true");
+  test("alter session set `store.format`='json'");
+  test("create table dfs_test.tmp.`%s` as select * from dfs.`%s`;", 
tableName, file.getAbsolutePath());
+
+  // ensuring that `NaN` and `Infinity` tokens ARE NOT enclosed with 
double quotes
+  File resultFile = new File(new 
File(getDfsTestTmpSchemaLocation(),tableName),"0_0_0.json");
+  String resultJson = FileUtils.readFileToString(resultFile);
+  int nanIndex = resultJson.indexOf("NaN");
+  assertFalse("`NaN` must not be enclosed with \"\" ", 
resultJson.charAt(nanIndex - 1) == '"');
+  assertFalse("`NaN` must not be enclosed with \"\" ", 
resultJson.charAt(nanIndex + "NaN".length()) == '"');
+  int infIndex = resultJson.indexOf("Infinity");
+  assertFalse("`Infinity` must not be enclosed with \"\" ", 
resultJson.charAt(infIndex - 1) == '"');
+  

[GitHub] drill pull request #1026: DRILL-5919: Add non-numeric support for JSON proce...

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1026#discussion_r149903705
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java
 ---
@@ -0,0 +1,167 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to you under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.drill.exec.vector.complex.writer;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.io.FileUtils;
+import org.apache.drill.BaseTestQuery;
+import org.apache.drill.common.exceptions.UserRemoteException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.rpc.user.QueryDataBatch;
+import org.apache.drill.exec.vector.VarCharVector;
+import org.junit.Test;
+
+import java.io.File;
+import java.util.List;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.*;
+
+public class TestJsonNonNumerics extends BaseTestQuery {
+
+  @Test
+  public void testNonNumericSelect() throws Exception {
+File file = new File(getTempDir(""), "nan_test.json");
--- End diff --

It's better to pass a dir name as well, rather than an empty string. Ex: 
`getTempDir("test_nan")`


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893057
  
--- Diff: contrib/storage-kafka/src/test/resources/logback-test.xml ---
@@ -0,0 +1,51 @@
+
--- End diff --

Please remove. We now have a common logging configuration for all modules in 
drill-common.


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893459
  
--- Diff: 
contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/cluster/EmbeddedZKQuorum.java
 ---
@@ -0,0 +1,83 @@
+/**
--- End diff --

The Apache header should be in the form of a comment, not Javadoc. Please 
update it here and in the other newly added files.
Hopefully somebody will add this to checkstyle so we won't have to keep 
reminding people about it.


---


[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r149893582
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ---
@@ -343,4 +343,4 @@ public void close() {
   }
 }
   }
-}
+}
--- End diff --

Please revert changes in this file.


---


[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1020#discussion_r149891566
  
--- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl ---
@@ -138,21 +154,14 @@
   });
 };
 
-function updateOthers(metrics) {
-  $.each(["counters", "meters"], function(i, key) {
-if(! $.isEmptyObject(metrics[key])) {
-  $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2));
-}
-  });
-};
-
 var update = function() {
   $.get("/status/metrics", function(metrics) {
 updateGauges(metrics.gauges);
 updateBars(metrics.gauges);
 if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, 
"timers");
 if(! $.isEmptyObject(metrics.histograms)) 
createTable(metrics.histograms, "histograms");
-updateOthers(metrics);
+if(! $.isEmptyObject(metrics.counters)) 
createCountersTable(metrics.counters);
+if(! $.isEmptyObject(metrics.meters)) 
$("#metersVal").html(JSON.stringify(metrics.meters, null, 2));
--- End diff --

@prasadns14
1. Thanks for adding the screenshots.
2. Most of the code in `createTable` and `createCountersTable` coincides. I 
suggest you make one function, for example, with three parameters: 
`createTable(metric, name, addReportingClass)`. When you don't need to add 
the reporting class, you'll call this method with false. Our goal here is to 
generify existing methods rather than adding new specific ones with almost 
the same content.
3. If we don't have any meters, let's remove them.



---


[GitHub] drill pull request #1021: DRILL-5923: Display name for query state

2017-11-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149889024
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java
 ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+public class QueryStateDisplayName {
+  // Values should correspond to the QueryState enum in UserBitShared.proto
--- End diff --

@prasadns14 
1. When using an array you depend on the enum value order. If for some reason 
the value order changes, your approach will work incorrectly.
2. Having a map will show which key is mapped to which value, and it will be 
much easier to understand that, for example, COMPLETED is mapped to Succeeded.

I recommend you use the map approach.


---