Re: [DISCUSS] Drill 1.12.0 release
Hi Arina,

Could we consider including DRILL-5089 in 1.12.0? It is about lazily loading schema for storage plugins. Could you or Paul take a look at the pull request for this JIRA, https://github.com/apache/drill/pull/1032? I think both of you are familiar with this part.

Thanks,
Chunhui

From: Arina Yelchiyeva
Sent: Thursday, November 9, 2017 8:11:35 AM
To: dev@drill.apache.org
Subject: Re: [DISCUSS] Drill 1.12.0 release

Yes, they are already in master.

On Thu, Nov 9, 2017 at 6:05 PM, Charles Givre wrote:
> We’re including the Networking functions in this release right?
>
> On Nov 9, 2017, at 11:04, Arina Yelchiyeva wrote:
> > If changes will be done before the cut-off date, targeting mid November, then it will be possible to include this Jira.
> >
> > On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre wrote:
> >> Hi Arina,
> >> Can we include DRILL-4091, support for additional GIS operations, in version 1.12? In general the code looked pretty good. There was a missing unit test, which the developer submitted, and some minor formatting issues which I’m still waiting on.
> >> Thanks,
> >> —C
> >>
> >>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva wrote:
> >>>
> >>> Current status:
> >>>
> >>> Blocker:
> >>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code reviewer - ?) - in progress.
> >>>
> >>> Targeted for 1.12 release:
> >>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code reviewer - Arina) - code review in final stage.
> >>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code reviewer - Paul) - in review.
> >>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting for the code review.
> >>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code reviewer - Tim) - waiting for the code review.
> >>>
> >>> Kind regards
> >>> Arina
> >>>
> >>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> wrote:
> >>>> Current status:
> >>>>
> >>>> Targeted for 1.12 release:
> >>>> DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather than mock (developer - Paul, code reviewer - ?) - waiting for the code review
> >>>> DRILL-5842: Refactor and simplify the fragment, operator contexts for testing (developer - Paul, code reviewer - ?) - waiting for the code review
> >>>> DRILL-5834: Adding network functions (developer - Charles, code reviewer - Arina) - waiting for changes after code review
> >>>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer - Arina) - waiting for the code review
> >>>> DRILL-5772: Enable UTF-8 support in query string by default (developer - Arina, code reviewer - Paul) - finalizing approach
> >>>> DRILL-4779: Kafka storage plugin support (developer - Anil, code reviewer - ?) - finishing implementation
> >>>>
> >>>> Under question:
> >>>> DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code reviewer - ?) - waiting for a status update from the developer
> >>>>
> >>>> Please feel free to suggest other items that are targeted for the 1.12 release. There are many Jiras that have the fix version marked as 1.12; it would be good if developers revisit them and update the fix version to the actual one. Link to the dashboard - https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185=DRILL=detail
> >>>>
> >>>> Kind regards
> >>>> Arina
> >>>>
> >>>> On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra wrote:
> >>>>> I'm waiting to merge the SSL changes in. Waiting a couple of days more to see if there are any more comments before I merge the changes in.
> >>>>>
> >>>>> On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers wrote:
> >>>>>> Hi Arina,
> >>>>>>
> >>>>>> In addition to my own PRs, there are several in the “active” queue that we could get in if we can just push them over the line and clear the queue. The owners of the PRs should check if we are waiting on them to take action.
> >>>>>>
> >>>>>> 977 DRILL-5849: Add freemarker lib to dependencyManagement to ensure prop…
> >>>>>> 976 DRILL-5797: Choose parquet reader from read columns
> >>>>>> 975 DRILL-5743: Handling column family and column scan for hbase
> >>>>>> 973 DRILL-5775: Select * query on a maprdb binary table fails
> >>>>>> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested field (reg. of DRILL-4264)
> >>>>>> 950 Drill 5431: SSL Support
> >>>>>> 949
[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...
GitHub user chunhui-shi opened a pull request:

https://github.com/apache/drill/pull/1032

DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

For each query, loading all storage plugins and all workspaces under file system plugins is not needed. This patch uses DynamicRootSchema as the root schema for Drill, which loads the corresponding storage plugin only when needed. For info-schema queries, it reads the full schema information and loads second-level schemas accordingly. For workspaces under the same FileSystem, there is no need to create a FileSystem for each workspace; the fs.access API is used to check permissions, which is available after HDFS 2.6, except for the Windows + local file system case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chunhui-shi/drill DRILL-5089-pull

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1032.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1032

commit a381677c59a7371733bae12ad4896b7cc927da5e
Author: chunhui-shi
Date: 2017-11-03T00:06:25Z

    DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150160362

--- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
        throw new AlreadyClosedSqlException( "ResultSet is already closed." );
      }
    }
+
+    //Implicit check for whether timeout is set
+    if (elapsedTimer != null) {
--- End diff --

Ok, so I think I see how you've been trying to help me test the server-side timeout. You are hoping to have a unit test force awaitFirstMessage() to throw the exception by preventing the server from sending back any batch of data, since the sample test data doesn't allow any query to run sufficiently long. All the current tests I've added essentially have already delivered the data from the 'Drill Server' to the 'DrillClient', but the application downstream has not consumed it. Your suggestion of putting a `pause` before the `execute()` call got me thinking that the timer had already begun after Statement initialization. My understanding now is that you're simply asking to block any SCREEN operator from sending back any batches, so the DrillCursor should time out waiting for the first batch. In fact, I'm thinking that I don't even need a pause: the DrillCursor waits for something from the SCREEN operator that never comes and eventually times out. However, since the control injection essentially applies to the Connection (`alter session ...`), any other unit tests executing in parallel on the same connection would be affected by this. So, I would need to also undo this at the end of the test if the connection is reused, or fork off a connection exclusively for this. Was that what you've been suggesting all along?

---
[GitHub] drill issue #889: DRILL-5691: enhance scalar sub queries checking for the ca...
Github user weijietong commented on the issue:

https://github.com/apache/drill/pull/889

@amansinha100 thanks for sharing the information. Got your point. I think your proposal on [CALCITE-1048](https://issues.apache.org/jira/browse/CALCITE-1048) is possible. Since [CALCITE-794](https://issues.apache.org/jira/browse/CALCITE-794) was completed in version 1.6, there seems to be a better solution (get the least max row count over all the rels of the RelSubset). But since Drill's Calcite version is still based on 1.4, I support your current temporary solution. I only wonder whether the explicitly searched RelNode's (such as DrillAggregateRel) maxRowCount can represent the best RelNode's maxRowCount?

---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150157923

--- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
        throw new AlreadyClosedSqlException( "ResultSet is already closed." );
      }
    }
+
+    //Implicit check for whether timeout is set
+    if (elapsedTimer != null) {
--- End diff --

I don't think you are wrong, but I think the interpretation of the timeout is ambiguous. My understanding, based on what drivers like Oracle's do, is to start the timeout only when the execute call is made. So, for a regular Statement object, just initialization (or even setting the timeout) should not be the basis for starting the timer. With regards to whether we are testing for the time when only the DrillCursor is in operation, we'd need a query that runs sufficiently long to time out before the server can send back anything for the very first time. The `awaitFirstMessage()` already has the timeout applied there, and it worked in some of my longer-running sample queries. If you're hinting towards this, then yes, it certainly doesn't hurt to have the test, although the timeout does guarantee exactly that. I'm not familiar with the Drillbit Injection feature, so let me tinker a bit to confirm it before I update the PR.

---
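The execute-scoped timeout semantics described above can be sketched as follows. This is a hypothetical illustration, not the actual Drill JDBC implementation: the class and method names are made up, and it only shows the idea that the clock starts at execute() rather than at Statement creation.

```java
import java.sql.SQLTimeoutException;

// Hypothetical sketch of an execute-scoped query timeout: the clock starts
// when execute() runs, not when the Statement is created, and every fetch
// checks the deadline.
public class QueryTimeout {
  private final long timeoutMillis;
  private long startMillis = -1;   // negative means "not started yet"

  public QueryTimeout(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // Invoked when Statement.execute() is called.
  public void start() {
    startMillis = System.currentTimeMillis();
  }

  // Invoked on each fetch; throws once the deadline has passed.
  // Before start() is called, this never throws, mirroring the point that
  // merely creating the Statement should not start the timer.
  public void throwIfTimedOut() throws SQLTimeoutException {
    if (startMillis >= 0 && timeoutMillis > 0
        && System.currentTimeMillis() - startMillis > timeoutMillis) {
      throw new SQLTimeoutException(
          "Query timed out after " + timeoutMillis + " ms");
    }
  }
}
```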
[GitHub] drill pull request #1031: DRILL-5917: Ban org.json:json library in Drill
GitHub user vrozov opened a pull request: https://github.com/apache/drill/pull/1031 DRILL-5917: Ban org.json:json library in Drill @arina-ielchiieva Please review You can merge this pull request into a Git repository by running: $ git pull https://github.com/vrozov/drill DRILL-5917 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1031.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1031 commit 525989d8603954bfd5e12a680acde7ac00e26300 Author: Vlad RozovDate: 2017-11-08T01:07:38Z DRILL-5917: Ban org.json:json library in Drill ---
[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1014

Not sure the description here is entirely correct. Let's separate two concepts: the plugin (code) and the plugin definition (the stuff in JSON). Plugin definitions are stored in ZK and retrieved by the Foreman. There may be some form of race condition in the Foreman, but that's not my focus here. The plugin *definition* is read by the Foreman and serialized into the physical plan. Each worker reads the definition from the physical plan. For this reason, the worker's definition can never be out of date: it is the definition used when serializing the plan. Further, Drill allows table functions, which provide query-time name/value pair settings for format plugin properties. The only way these can work is to be serialized along with the query. So the actual serialized format plugin definition, included with the query, includes both the ZK information and the table function information.

---
[jira] [Created] (DRILL-5950) Allow JSON files to be splittable - for sequence of objects format
Paul Rogers created DRILL-5950:
--
Summary: Allow JSON files to be splittable - for sequence of objects format
Key: DRILL-5950
URL: https://issues.apache.org/jira/browse/DRILL-5950
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers

The JSON format plugin is not currently splittable. This means that every JSON file must be read by only a single thread. By contrast, text files are splittable.

The key barrier to allowing JSON files to be splittable is the lack of a good mechanism to find the start of a record at some arbitrary point in the file. Text readers handle this by scanning forward looking for (say) the newline that separates records. (Though this process can be thrown off if a newline appears in a quoted value and the start quote appears before the split point.)

However, as was discovered in a previous JSON fix, Drill's form of JSON does provide the tools. In standard JSON, a list of records must be structured as a list:

{code}
[ { text: "first record"},
  { text: "second record"},
  ...
  { text: "final record" } ]
{code}

In this form, it is impossible to find the start of a record without parsing from the first character onwards. But Drill uses a common, though non-standard, JSON structure that dispenses with the array and the commas between records:

{code}
{ text: "first record" }
{ text: "second record" }
...
{ text: "last record" }
{code}

This form does unambiguously allow finding the start of a record: simply scan until we find the token pair } {, possibly separated by white space. That sequence is not valid JSON and only occurs between records in the sequence-of-records format.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
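As an illustrative sketch of the boundary scan described in the issue (the class and method names here are made up; this is not Drill's reader code), the scan could look like the following. Note it shares the same caveat the issue mentions for text files: a "} {" sequence inside a quoted value could fool it.

```java
import java.nio.charset.StandardCharsets;

// Sketch of finding a record boundary in the sequence-of-objects JSON form:
// scan forward for a '}' followed, after optional whitespace, by a '{'.
public class JsonSplitFinder {
  // Returns the offset of the '{' that begins the next record at or after
  // 'from', or -1 if no boundary is found.
  public static int findRecordStart(byte[] data, int from) {
    for (int i = from; i < data.length; i++) {
      if (data[i] == '}') {
        int j = i + 1;
        while (j < data.length && Character.isWhitespace(data[j])) {
          j++;
        }
        if (j < data.length && data[j] == '{') {
          return j;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] doc = "{ \"a\": 1 } { \"a\": 2 }".getBytes(StandardCharsets.UTF_8);
    System.out.println(findRecordStart(doc, 0)); // prints 11
  }
}
```

A reader assigned a split starting at an arbitrary offset would call this once to skip the partial record at the front of its split, then parse records normally from the returned offset.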
[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1014

@arina-ielchiieva
- The parts addressing DRILL-4640 and DRILL-5166 LGTM.
- I think the fix for DRILL-5771 LGTM, but I would like to write down what I think is happening and confirm with you that my understanding is correct. This is mostly just a learning exercise for me since I am not very familiar with this part of the code :). In DRILL-5771 there were two issues.

## Race Conditions With Format Plugins

### Issue

The following used to happen before the fix:

1. When using an existing format plugin, the **FormatPlugin** would create a **DrillTable** with a **NamedFormatPluginConfig**, which only contains the name of the format plugin to use.
1. The **ScanOperator** created for a **DrillTable** will contain the **NamedFormatPluginConfig**.
1. When the **ScanOperators** are serialized into the physical plan, the serialized **ScanOperator** will only contain the name of the format plugin to use.
1. When a worker deserializes the physical plan to do a scan, it gets the name of the **FormatPluginConfig** to use.
1. The worker then looks up the corresponding **FormatPlugin** in the **FormatCreator** using that name.
1. The worker can get into trouble if the **FormatPlugins** cached in its **FormatCreator** are out of sync with the rest of the cluster.

### Fix

Race conditions are eliminated because the **DrillTables** returned by the **FormatPlugins** no longer contain a **NamedFormatPluginConfig**; they contain the full **FormatPluginConfig**, not just a name alias. So when a query is executed:

1. The ScanOperator contains the complete **FormatPluginConfig**.
1. When the physical plan is serialized, it contains the complete **FormatPluginConfig** for each scan operator.
1. When a worker node deserializes the ScanOperator, it also has the complete **FormatPluginConfig**, so it can reconstruct the **FormatPlugin** correctly, whereas previously the worker had to do a lookup using the **FormatPlugin** name in the **FormatCreator**, whose cache may be out of sync with the rest of the cluster.

## FormatPluginConfig Equals and HashCode

### Issue

The **FileSystemPlugin** looks up the **FormatPlugins** corresponding to a **FormatPluginConfig** in formatPluginsByConfig. However, the **FormatPluginConfig** implementations didn't override equals and hashCode.

---
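To illustrate the last point with a made-up config class (not one of Drill's actual configs): without value-based `equals`/`hashCode`, a deserialized config is a distinct instance, so a `HashMap` lookup such as `formatPluginsByConfig.get(config)` falls back to identity comparison and misses.

```java
import java.util.Objects;

// Hypothetical format-plugin config with value-based equality, so two
// instances carrying the same settings hash to the same bucket and compare
// equal when used as a HashMap key.
public class CsvFormatConfig {
  private final String extension;
  private final boolean skipFirstLine;

  public CsvFormatConfig(String extension, boolean skipFirstLine) {
    this.extension = extension;
    this.skipFirstLine = skipFirstLine;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CsvFormatConfig)) {
      return false;
    }
    CsvFormatConfig that = (CsvFormatConfig) o;
    return skipFirstLine == that.skipFirstLine
        && Objects.equals(extension, that.extension);
  }

  @Override
  public int hashCode() {
    return Objects.hash(extension, skipFirstLine);
  }
}
```

With these overrides, a config deserialized from the physical plan keys into the same map entry as the locally registered config carrying the same settings.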
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150131105

--- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
        throw new AlreadyClosedSqlException( "ResultSet is already closed." );
      }
    }
+
+    //Implicit check for whether timeout is set
+    if (elapsedTimer != null) {
--- End diff --

Yes, I'm wrong? (Asking because the rest of the sentence suggests I was right in my interpretation of the test.) Maybe we can/should test both? I would have liked to test for the first batch, but it's not possible to access the query id until `statement.execute()`, and I'd need it to unpause the request.

---
[GitHub] drill issue #1025: DRILL-5936: Refactor MergingRecordBatch based on code rev...
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1025 @amansinha100 can you review this change? ---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user kkhatua commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150127658

--- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
        throw new AlreadyClosedSqlException( "ResultSet is already closed." );
      }
    }
+
+    //Implicit check for whether timeout is set
+    if (elapsedTimer != null) {
--- End diff --

Yes. So I'm testing for the part when the batch has been fetched by the DrillCursor but not consumed via the DrillResultSetImpl. That's why I found the need for pausing the Screen operator odd and, hence, the question.

---
[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r150119338

--- Diff: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws AlreadyClosedSqlException,
        throw new AlreadyClosedSqlException( "ResultSet is already closed." );
      }
    }
+
+    //Implicit check for whether timeout is set
+    if (elapsedTimer != null) {
--- End diff --

I wonder if we actually test timeout during DrillCursor operations. It seems your test relies on the user being slow to read data from the result set although the data has already been fetched by the client. Am I wrong?

---
[GitHub] drill pull request #1028: DRILL-5943: Avoid the strong check introduced by D...
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/1028#discussion_r150118518

--- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.hpp ---
@@ -59,6 +59,12 @@ class SaslAuthenticatorImpl {
    const char *getErrorMessage(int errorCode);
+
+    static const std::string KERBEROS_SIMPLE_NAME;
+
+    static const std::string KERBEROS_SASL_NAME;
--- End diff --

Do we need to expose it? (It looks like we only look for the keys.)

---
[jira] [Created] (DRILL-5949) JSON format options should be part of plugin config; not session options
Paul Rogers created DRILL-5949:
--
Summary: JSON format options should be part of plugin config; not session options
Key: DRILL-5949
URL: https://issues.apache.org/jira/browse/DRILL-5949
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.12.0
Reporter: Paul Rogers

Drill provides a JSON record reader, and two ways to configure this reader:

* Using the JSON plugin configuration.
* Using a set of session options.

The plugin configuration defines the file suffix associated with JSON files. The session options are:

* {{store.json.all_text_mode}}
* {{store.json.read_numbers_as_double}}
* {{store.json.reader.skip_invalid_records}}
* {{store.json.reader.print_skipped_invalid_record_number}}

Suppose I have two JSON files from different sources (and keep them in distinct directories). For one, I want {{all_text_mode}} off, as the data is nicely formatted. Also, my numbers are fine, so I want {{read_numbers_as_double}} off. But the other file is a mess and uses a rather ad-hoc format, so I want these two options turned on.

As it turns out, I often query both files. Today, I must set the session options one way to query my "clean" file, then reverse them to query the "dirty" file. Next, I want to join the two files. How do I set the options one way for the "clean" file, and the other way for the "dirty" file within the *same query*? I can't.

Now, consider the text format plugin that can read CSV, TSV, PSV and so on. It has a variety of options, but they are *not* session options; they are instead options in the plugin definition. This allows me to, say, have a plugin config for the CSV-with-headers files that I get from source A, and a different plugin config for my CSV-without-headers files from source B.

Suppose we applied the text reader technique to the JSON reader. We'd move the session options listed above into the JSON format plugin. Then, I can define one plugin config for my "clean" files, and a different plugin config for my "dirty" files.
What's more, I can then use table functions to adjust the format for each file as needed within a single query. Since table functions are part of a query, I can add them to a view that I define for the various JSON files. The result is a far simpler user experience than the tedium of resetting session options for every query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r15009

--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java ---
@@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea,
      }
    }
  }
+
+  // TODO make a native RowSetComparison comparator
+  public static class ObjectComparator implements Comparator {
--- End diff --

This is used in the DrillTestWrapper to verify the ordering of results. I agree this is not suitable for equality tests, but it's intended to be used only for ordering tests. I didn't add support for all the supported RowSet types because we would first have to move DrillTestWrapper to use RowSets (currently it uses Maps and Lists to represent data). The intention is to move DrillTestWrapper to use RowSets and then make this comparator operate on RowSets, but that will be an incremental process.

---
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150097140 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- Ah, forgot that the file defines an interface, not a class. (The situation I described was an interface nested inside a class.) So, you're good. ---
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150096444 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder {
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS =
+      new ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+          .put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+          .put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+          .put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+          .put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+          .put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+          .put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+          .build();
+
+  private final RowSet rowSet;
+  private final Map<String, String> customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+    this.rowSet = Preconditions.checkNotNull(rowSet);
+    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
+    Preconditions.checkNotNull(columnName);
+    Preconditions.checkNotNull(columnFormatter);
+
+    Iterator<MaterializedField> fields = rowSet
+        .schema()
+        .batch()
+        .iterator();
+
+    boolean hasColumn = false;
+
+    while (!hasColumn && fields.hasNext()) {
+      hasColumn = fields.next()
+          .getName()
+          .equals(columnName);
+    }
+
+    final String message = String.format("(%s) is not a valid column", columnName);
+    Preconditions.checkArgument(hasColumn, message);
+
+    customFormatters.put(columnName, columnFormatter);
+
+    return this;
+  }
+
+  public void build(File tableFile) throws IOException {
--- End diff --

Sounds good.

---
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150096261 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- IntelliJ gave a warning that the modifier is redundant. Also an interface nested inside another interface is public by default. https://beginnersbook.com/2016/03/nested-or-inner-interfaces-in-java/ ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087815 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java --- @@ -0,0 +1,61 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.drill.exec.store.kafka;
+
+import java.util.List;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.impl.BatchCreator;
+import org.apache.drill.exec.physical.impl.ScanBatch;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.store.RecordReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+
+public class KafkaScanBatchCreator implements BatchCreator<KafkaSubScan> {
+  static final Logger logger = LoggerFactory.getLogger(KafkaScanBatchCreator.class);
+
+  @Override
+  public CloseableRecordBatch getBatch(FragmentContext context, KafkaSubScan subScan, List<RecordBatch> children)
+      throws ExecutionSetupException {
+    Preconditions.checkArgument(children.isEmpty());
+    List<RecordReader> readers = Lists.newArrayList();
+    List<SchemaPath> columns = null;
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --

When will the columns be null? Not sure this is a valid state. However, as noted above, an empty list is a valid state (used for `COUNT(*)` queries).

---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087650 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java --- @@ -0,0 +1,61 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+    for (KafkaSubScan.KafkaSubScanSpec scanSpec : subScan.getPartitionSubScanSpecList()) {
+      try {
+        if ((columns = subScan.getCoulmns()) == null) {
+          columns = GroupScan.ALL_COLUMNS;
+        }
--- End diff --

The column list can be shared by all readers, and so can be created outside of the loop over scan specs.

---
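The hoisting the reviewer suggests can be sketched with simplified stand-in types. `SubScanStub` and the `String`-typed column lists below are hypothetical, standing in for `KafkaSubScan` and `List<SchemaPath>` only to keep the sketch self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for KafkaSubScan: columns may be null (project all)
// or an explicit list; scan specs are reduced to partition names.
class SubScanStub {
  static final List<String> ALL_COLUMNS = Collections.singletonList("*");
  private final List<String> columns;
  private final List<String> scanSpecs;
  SubScanStub(List<String> columns, List<String> scanSpecs) {
    this.columns = columns;
    this.scanSpecs = scanSpecs;
  }
  List<String> getColumns() { return columns; }
  List<String> getScanSpecs() { return scanSpecs; }
}

public class HoistColumns {
  // The projected-column list does not depend on the scan spec, so it is
  // resolved once before the loop and shared by every reader.
  static List<List<String>> createReaders(SubScanStub subScan) {
    List<String> columns = subScan.getColumns();
    if (columns == null) {
      columns = SubScanStub.ALL_COLUMNS;
    }
    List<List<String>> readers = new ArrayList<>();
    for (String spec : subScan.getScanSpecs()) {
      readers.add(columns); // each "reader" receives the same shared list
    }
    return readers;
  }

  public static void main(String[] args) {
    SubScanStub scan = new SubScanStub(null, Arrays.asList("topic-0", "topic-1"));
    List<List<String>> readers = createReaders(scan);
    System.out.println(readers.size() + " readers, shared columns: "
        + (readers.get(0) == readers.get(1)));
  }
}
```

Besides avoiding redundant work, hoisting makes it obvious that every reader sees an identical projection, which the per-iteration assignment in the original obscures.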
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087581 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java ---
+        if ((columns = subScan.getCoulmns()) == null) {
--- End diff --

`getCoulmns()` --> `getColumns()`

---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150088237 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaScanBatchCreator.java ---
+        readers.add(new KafkaRecordReader(scanSpec, columns, context, subScan.getKafkaStoragePlugin()));
+      } catch (Exception e) {
+        logger.error("KafkaRecordReader creation failed for subScan: " + subScan + ".", e);
--- End diff --

Here we are catching all errors, putting a generic message into the log, and sending a generic exception up the stack. Better is to use a `UserException` at the actual point of failure so we can tell the user exactly what is wrong. Then, here, have a `catch` block for `UserException` that simply rethrows, while handling all other exceptions as is done here.

---
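The catch-and-rethrow pattern the reviewer describes can be sketched as follows. `UserExceptionStub` is a hypothetical stand-in for Drill's `UserException`, used only to keep the example self-contained:

```java
// Hypothetical stand-in for org.apache.drill.common.exceptions.UserException.
class UserExceptionStub extends RuntimeException {
  UserExceptionStub(String message) { super(message); }
}

public class ReaderCreation {
  // A UserException raised at the point of failure already carries a precise,
  // user-facing message, so it is rethrown untouched; anything else gets the
  // generic wrapper, with the original exception attached as the cause.
  static String createReader(RuntimeException failure) {
    try {
      if (failure != null) {
        throw failure;
      }
      return "reader";
    } catch (UserExceptionStub e) {
      throw e;
    } catch (Exception e) {
      throw new RuntimeException("KafkaRecordReader creation failed", e);
    }
  }

  public static void main(String[] args) {
    try {
      createReader(new UserExceptionStub("column 'x' not found in topic"));
    } catch (UserExceptionStub e) {
      System.out.println("precise message kept: " + e.getMessage());
    }
    try {
      createReader(new IllegalStateException("broker unreachable"));
    } catch (RuntimeException e) {
      System.out.println("wrapped: " + e.getMessage()
          + " / cause: " + e.getCause().getMessage());
    }
  }
}
```

The key point is that the more specific `catch` clause must precede the general one, so the detailed message reaches the user instead of being swallowed by the generic wrapper.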
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150086292 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
+ */
+package org.apache.drill.exec.store.kafka;
+
+import static org.apache.drill.exec.store.kafka.DrillKafkaConfig.DRILL_KAFKA_POLL_TIMEOUT;
+
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.kafka.KafkaSubScan.KafkaSubScanSpec;
+import org.apache.drill.exec.store.kafka.decoders.MessageReader;
+import org.apache.drill.exec.store.kafka.decoders.MessageReaderFactory;
+import org.apache.drill.exec.util.Utilities;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+
+public class KafkaRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(KafkaRecordReader.class);
+  public static final long DEFAULT_MESSAGES_PER_BATCH = 4000;
+
+  private VectorContainerWriter writer;
+  private MessageReader messageReader;
+
+  private boolean unionEnabled;
+  private KafkaConsumer<byte[], byte[]> kafkaConsumer;
+  private KafkaStoragePlugin plugin;
+  private KafkaSubScanSpec subScanSpec;
+  private long kafkaPollTimeOut;
+  private long endOffset;
+
+  private long currentOffset;
+  private long totalFetchTime = 0;
+
+  private List<TopicPartition> partitions;
+  private final boolean enableAllTextMode;
+  private final boolean readNumbersAsDouble;
+
+  private Iterator<ConsumerRecord<byte[], byte[]>> messageIter;
+
+  public KafkaRecordReader(KafkaSubScan.KafkaSubScanSpec subScanSpec, List<SchemaPath> projectedColumns,
+      FragmentContext context, KafkaStoragePlugin plugin) {
+    setColumns(projectedColumns);
+    this.enableAllTextMode = context.getOptions().getOption(ExecConstants.KAFKA_ALL_TEXT_MODE).bool_val;
+    this.readNumbersAsDouble = context.getOptions()
+        .getOption(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE).bool_val;
+    this.unionEnabled = context.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);
+    this.plugin = plugin;
+    this.subScanSpec = subScanSpec;
+    this.endOffset = subScanSpec.getEndOffset();
+    this.kafkaPollTimeOut = Long.valueOf(plugin.getConfig().getDrillKafkaProps().getProperty(DRILL_KAFKA_POLL_TIMEOUT));
+  }
+
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
+    Set<SchemaPath> transformed = Sets.newLinkedHashSet();
+    if (!isStarQuery()) {
+      for (SchemaPath column : projectedColumns) {
+        transformed.add(column);
+      }
+    } else {
+      transformed.add(Utilities.STAR_COLUMN);
+    }
+    return transformed;
+  }
+
+  @Override
+  public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException {
+    this.writer = new VectorContainerWriter(output,
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150084784 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150081972 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
+  @Override
+  protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> projectedColumns) {
--- End diff --

Preparing columns is really quite difficult with many cases to handle. Drill appears to allow projection of the form `a.b`, `a.c`, which means that `a` is a map and we wish to project just `b` and `c` from `a`. As it turns out, there is an active project to replace the ad-hoc projection logic in each reader with a common, complete implementation. That work may be ready by Drill 1.13, so whatever is done here is temporary. Finally, the code checks for
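The `a.b`, `a.c` case the reviewer mentions can be illustrated by grouping dotted projection paths by their root segment. This sketch flattens `SchemaPath` to a plain `String` and is not Drill's actual projection logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectionGrouping {
  // Group projection paths by root: `a.b` and `a.c` collapse into map column
  // `a` with requested members {b, c}; a bare `d` is a top-level column with
  // no member list.
  static Map<String, List<String>> groupByRoot(List<String> paths) {
    Map<String, List<String>> byRoot = new LinkedHashMap<>();
    for (String path : paths) {
      int dot = path.indexOf('.');
      String root = dot < 0 ? path : path.substring(0, dot);
      byRoot.computeIfAbsent(root, k -> new ArrayList<>());
      if (dot >= 0) {
        byRoot.get(root).add(path.substring(dot + 1));
      }
    }
    return byRoot;
  }

  public static void main(String[] args) {
    System.out.println(groupByRoot(Arrays.asList("a.b", "a.c", "d")));
  }
}
```

Even this toy version shows why per-reader projection logic multiplies edge cases (nested members, mixed map and scalar roots), which is the motivation for the common implementation the reviewer refers to.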
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150084335 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150086335 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150087367 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150082711 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150083767 --- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaRecordReader.java ---
[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150086785 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long this.time = new Date(startTime); this.foreman = foreman; this.link = generateLink(drillConfig, foreman, queryId); - this.query = query.substring(0, Math.min(query.length(), 150)); + this.query = extractQuerySnippet(query); this.state = state; this.user = user; this.totalCost = totalCost; this.queueName = queueName; } +private String extractQuerySnippet(String queryText) { + //Extract upto max char limit as snippet + String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR)); + //Trimming down based on line-count + if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) { +int linesConstructed = 0; +StringBuilder lineCappedQuerySnippet = new StringBuilder(); +String[] queryParts = sizeCappedQuerySnippet.split(System.lineSeparator()); +for (String qPart : queryParts) { + lineCappedQuerySnippet.append(qPart); + if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) { +lineCappedQuerySnippet.append(System.lineSeparator()); --- End diff -- Do we want to append with new line or maybe space for better readability? ---
[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150085841 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long this.time = new Date(startTime); this.foreman = foreman; this.link = generateLink(drillConfig, foreman, queryId); - this.query = query.substring(0, Math.min(query.length(), 150)); + this.query = extractQuerySnippet(query); this.state = state; this.user = user; this.totalCost = totalCost; this.queueName = queueName; } +private String extractQuerySnippet(String queryText) { --- End diff -- 1. I usually place private methods at the end of the class. 2. We can add a javadoc here explaining that first we limit the original query size, and if the size fits but the query has too many lines, we limit those as well for better readability in the Web UI. ---
[GitHub] drill pull request #1029: DRILL-5867: List profiles in pages rather than a l...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1029#discussion_r150086018 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileResources.java --- @@ -93,13 +96,35 @@ public ProfileInfo(DrillConfig drillConfig, String queryId, long startTime, long this.time = new Date(startTime); this.foreman = foreman; this.link = generateLink(drillConfig, foreman, queryId); - this.query = query.substring(0, Math.min(query.length(), 150)); + this.query = extractQuerySnippet(query); this.state = state; this.user = user; this.totalCost = totalCost; this.queueName = queueName; } +private String extractQuerySnippet(String queryText) { + //Extract upto max char limit as snippet + String sizeCappedQuerySnippet = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR)); + //Trimming down based on line-count + if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) { --- End diff -- 1. We can create variable for `sizeCappedQuerySnippet.split(System.lineSeparator())` so we do split only once. 2. Please remove spaces in `if` clause: `if ( QUERY_SNIPPET_MAX_LINES < sizeCappedQuerySnippet.split(System.lineSeparator()).length ) {` -> `if (QUERY_SNIPPET_MAX_LINES < splittedQuery.length) {` and in `if ( ++linesConstructed < QUERY_SNIPPET_MAX_LINES ) {` in the code below. ---
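Folding the review suggestions above (split only once, document the behavior, keep the method body simple) into one place, the helper might look like the sketch below. The constant values and standalone class shape are assumptions for illustration; the real code lives in `ProfileResources.ProfileInfo`.

```java
class QuerySnippet {
  // Assumed limits; the actual constants are defined in ProfileResources.
  static final int QUERY_SNIPPET_MAX_CHAR = 150;
  static final int QUERY_SNIPPET_MAX_LINES = 8;

  /**
   * First caps the query text at QUERY_SNIPPET_MAX_CHAR characters; if the
   * capped text still spans more than QUERY_SNIPPET_MAX_LINES lines, keeps
   * only the leading lines for readability on the Web UI.
   */
  static String extractQuerySnippet(String queryText) {
    String capped = queryText.substring(0, Math.min(queryText.length(), QUERY_SNIPPET_MAX_CHAR));
    String[] lines = capped.split(System.lineSeparator()); // split exactly once
    if (lines.length <= QUERY_SNIPPET_MAX_LINES) {
      return capped;
    }
    StringBuilder snippet = new StringBuilder();
    for (int i = 0; i < QUERY_SNIPPET_MAX_LINES; i++) {
      if (i > 0) {
        snippet.append(System.lineSeparator());
      }
      snippet.append(lines[i]);
    }
    return snippet.toString();
  }
}
```

Whether the kept lines are rejoined with line separators (as here) or with spaces is exactly the open question raised in the first review comment above.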
[GitHub] drill issue #1021: DRILL-5923: Display name for query state
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1021 Well, I don't have a strong preference here; we can use an array, as long as Prasad documents it nicely as in your example rather than in one line. ``` String displayNames[] = { "First Value", // FIRST_VALUE = 0 "Second Value", // SECOND_VALUE = 1 ... }; ``` ---
Errors Building Drill
Hello all, I’m getting the following errors when I try to build Drill from source with the tests. If I skip the mongodb test it builds fine, but I’m not sure what could be causing this. Any suggestions? — C 2017-11-09T15:13:25.047-0500 I NETWORK [conn2] end connection 127.0.0.1:58580 (1 connection now open) [mongod output] 2017-11-09T15:13:25.158-0500 I CONTROL [conn6] now exiting [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] shutdown: going to close listening sockets... [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] closing listening socket: 5 [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] closing listening socket: 6 [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] removing socket file: /tmp/mongodb-27022.sock [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] shutdown: going to flush diaglog... [mongod output] 2017-11-09T15:13:25.158-0500 I NETWORK [conn6] shutdown: going to close sockets... [mongod output] 2017-11-09T15:13:25.158-0500 I STORAGE [conn6] WiredTigerKVEngine shutting down [mongod output] 2017-11-09T15:13:25.311-0500 I STORAGE [conn6] shutdown: removing fs lock... [mongod output] 2017-11-09T15:13:25.312-0500 I CONTROL [conn6] dbexit: rc: 0 [mongod output] Results : Tests in error: MongoTestSuit.initMongo:231 » IO Could not start process: Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150073673 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetComparison.java --- @@ -255,4 +257,39 @@ private void verifyArray(String colLabel, ArrayReader ea, } } } + + // TODO make a native RowSetComparison comparator + public static class ObjectComparator implements Comparator { --- End diff -- Defined here, but not used in this file. Does not include all types that Drill supports (via the RowSet): Date, byte arrays, BigDecimal, etc. Does not allow for ranges for floats & doubles as does JUnit. (Two floats are seldom exactly equal.) ---
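On the float-equality point raised above: a tolerance-based comparison in the style of JUnit's `assertEquals(double expected, double actual, double delta)` could be folded into such a comparator. A minimal sketch, where the method name and delta handling are illustrative, not part of RowSetComparison:

```java
class ApproxCompare {
  /**
   * Compares two doubles with an absolute tolerance, as JUnit's
   * assertEquals(expected, actual, delta) does. Two independently computed
   * floating-point values are seldom bit-for-bit equal, so a delta is needed.
   */
  static boolean approxEquals(double expected, double actual, double delta) {
    // Identical values (including infinities of the same sign) compare equal
    // before the subtraction, which would otherwise yield NaN for infinities.
    if (Double.compare(expected, actual) == 0) {
      return true;
    }
    return Math.abs(expected - actual) <= delta;
  }
}
```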
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150072992 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/file/JsonFileBuilder.java --- @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.drill.test.rowSet.file;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.test.rowSet.RowSet;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class JsonFileBuilder
+{
+  public static final String DEFAULT_DOUBLE_FORMATTER = "%f";
+  public static final String DEFAULT_INTEGER_FORMATTER = "%d";
+  public static final String DEFAULT_LONG_FORMATTER = "%d";
+  public static final String DEFAULT_STRING_FORMATTER = "\"%s\"";
+  public static final String DEFAULT_DECIMAL_FORMATTER = "%s";
+  public static final String DEFAULT_PERIOD_FORMATTER = "%s";
+
+  public static final Map<ColumnAccessor.ValueType, String> DEFAULT_FORMATTERS = new ImmutableMap.Builder<ColumnAccessor.ValueType, String>()
+    .put(ColumnAccessor.ValueType.DOUBLE, DEFAULT_DOUBLE_FORMATTER)
+    .put(ColumnAccessor.ValueType.INTEGER, DEFAULT_INTEGER_FORMATTER)
+    .put(ColumnAccessor.ValueType.LONG, DEFAULT_LONG_FORMATTER)
+    .put(ColumnAccessor.ValueType.STRING, DEFAULT_STRING_FORMATTER)
+    .put(ColumnAccessor.ValueType.DECIMAL, DEFAULT_DECIMAL_FORMATTER)
+    .put(ColumnAccessor.ValueType.PERIOD, DEFAULT_PERIOD_FORMATTER)
+    .build();
+
+  private final RowSet rowSet;
+  private final Map<String, String> customFormatters = Maps.newHashMap();
+
+  public JsonFileBuilder(RowSet rowSet) {
+    this.rowSet = Preconditions.checkNotNull(rowSet);
+    Preconditions.checkArgument(rowSet.rowCount() > 0, "The given rowset is empty.");
+  }
+
+  public JsonFileBuilder setCustomFormatter(final String columnName, final String columnFormatter) {
+    Preconditions.checkNotNull(columnName);
+    Preconditions.checkNotNull(columnFormatter);
+
+    Iterator<MaterializedField> fields = rowSet
+      .schema()
+      .batch()
+      .iterator();
+
+    boolean hasColumn = false;
+
+    while (!hasColumn && fields.hasNext()) {
+      hasColumn = fields.next()
+        .getName()
+        .equals(columnName);
+    }
+
+    final String message = String.format("(%s) is not a valid column", columnName);
+    Preconditions.checkArgument(hasColumn, message);
+
+    customFormatters.put(columnName, columnFormatter);
+
+    return this;
+  }
+
+  public void build(File tableFile) throws IOException { --- End diff -- Great! This does not yet handle nested tuples or arrays; in part because the row set work for that is still sitting in PR #914. You can update this to be aware of maps and map arrays once that PR is committed. ---
[GitHub] drill pull request #984: DRILL-5783 Made a unit test for generated Priority ...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/984#discussion_r150073945 --- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSet.java --- @@ -85,8 +85,7 @@ * new row set with the updated columns, then merge the new * and old row sets to create a new immutable row set. */ - - public interface RowSetWriter extends TupleWriter { + interface RowSetWriter extends TupleWriter { --- End diff -- Aren't nested interfaces `protected` by default? Just had to change one from default to `public` so I could use it in another package... ---
[GitHub] drill issue #1021: DRILL-5923: Display name for query state
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1021 @arina-ielchiieva, it helps to think about the source of the enum. This is a Protobuf enum. The ordinal values cannot change; they are a contract between sender and receiver. We can add new ones, or retire old ones, but otherwise the values are frozen in time. The array approach captures this reality. We could document the array better:
```
String displayNames[] = {
  "First Value",  // FIRST_VALUE = 0
  "Second Value", // SECOND_VALUE = 1
  ...
};
```
We can also do a bounds check:
```
if (enumValue.ordinal() >= displayNames.length) {
  return enumValue.toString();
} else {
  return displayNames[enumValue.ordinal()];
}
```
But, IMHO a map seems overkill for such a simple task. Yes, it works, but is unnecessary. As they say, "make it as simple as possible (but no simpler)." ---
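Assembled into compilable form, the array-lookup-plus-bounds-check idea above might look like the following sketch. The enum constants and display strings here are placeholders for illustration, not Drill's actual `QueryState` values.

```java
class QueryStateNames {
  // Illustrative stand-in for the protobuf-generated enum; ordinals are
  // frozen by the protobuf contract between sender and receiver.
  enum State { STARTING, RUNNING, COMPLETED }

  // One display name per ordinal, documented next to its frozen value.
  static final String[] DISPLAY_NAMES = {
    "Starting",   // STARTING = 0
    "Running",    // RUNNING = 1
    "Completed",  // COMPLETED = 2
  };

  /**
   * Looks up the display name by ordinal; falls back to the enum's own name
   * if a newly added ordinal has no display name yet (the bounds check).
   */
  static String displayName(State state) {
    if (state.ordinal() >= DISPLAY_NAMES.length) {
      return state.toString();
    }
    return DISPLAY_NAMES[state.ordinal()];
  }
}
```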
[GitHub] drill pull request #921: DRILL-4286 Graceful shutdown of drillbit
Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/921#discussion_r150047464 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java --- @@ -348,6 +354,21 @@ public void run() { */ }
+  /*
+   * Check if the foreman is ONLINE. If not, don't accept any new queries.
+   */
+  public void checkForemanState() throws ForemanException {
+    DrillbitEndpoint foreman = drillbitContext.getEndpoint();
+    Collection<DrillbitEndpoint> dbs = drillbitContext.getAvailableBits();
--- End diff -- I was thinking of encapsulating code from lines 360 to 367 into a boolean isOnline(), since all the values in that code are derived from the current DrillbitContext. Then your code would be simplified to
```
public void checkForemanState() throws ForemanException {
  if (!drillbitContext.isOnline()) {
    throw new ForemanException("Query submission failed since Foreman is shutting down.");
  }
}
```
---
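The encapsulation suggested above could be sketched as below. `DrillbitContextSketch`, its state enum, and the plain `Exception` are simplified stand-ins for Drill's `DrillbitContext`, drillbit state, and `ForemanException`; the real `isOnline()` would derive its answer from the registered endpoints rather than a stored field.

```java
// Simplified stand-in: the decision "is this drillbit accepting queries?"
// lives in the context object, not in Foreman.
class DrillbitContextSketch {
  enum State { STARTUP, ONLINE, QUIESCENT, OFFLINE }

  private final State state;

  DrillbitContextSketch(State state) {
    this.state = state;
  }

  /** Encapsulates the availability check the reviewer proposes. */
  boolean isOnline() {
    return state == State.ONLINE;
  }
}

class ForemanSketch {
  /** Rejects new queries unless the drillbit context reports ONLINE. */
  static void checkForemanState(DrillbitContextSketch ctx) throws Exception {
    if (!ctx.isOnline()) {
      throw new Exception("Query submission failed since Foreman is shutting down.");
    }
  }
}
```

The design benefit is that every caller that needs the availability decision gets the same one-line check, and the derivation from endpoints is changed in exactly one place.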
[GitHub] drill issue #1029: DRILL-5867: List profiles in pages rather than a long ver...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1029 Snapshot when testing with a search filter for FAILED queries and navigating to page 2 of that list. Information about the number of filtered items, etc. is also provided. ![image](https://user-images.githubusercontent.com/4335237/32622085-a8826c56-c536-11e7-9a18-7a09142b250e.png) ---
[GitHub] drill issue #1029: DRILL-5867: List profiles in pages rather than a long ver...
Github user kkhatua commented on the issue: https://github.com/apache/drill/pull/1029 Snapshot when rendering the defaults (10 per page) from a pre-loaded set of the latest 123 profiles ![image](https://user-images.githubusercontent.com/4335237/32621917-412a90ba-c536-11e7-9d51-83220ce072d3.png) The query snippet is restricted to 8 lines at most and indicates if there is more to the query text with a trailing set of `...` ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150028303 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
+  </properties> --- End diff -- What is the reason to define `kafka.TestSuite` property? ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150029170 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
--- End diff -- It will be better to go with the default `maven-surefire-plugin` configuration unless there is a good justification to use custom config. Most of the time this can be achieved by using default test name convention. ---
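For context on the "default test name convention" mentioned above: per the Surefire documentation, the plugin's built-in includes match the patterns below, so naming test classes accordingly removes the need for a custom `<includes>` block entirely. This is a reference fragment, not part of the reviewed pom; note that the `**/*Tests.java` pattern was added in newer Surefire releases and may be absent in older ones.

```xml
<!-- Surefire's default includes; no <configuration> is needed
     if test classes follow these naming patterns. -->
<includes>
  <include>**/Test*.java</include>
  <include>**/*Test.java</include>
  <include>**/*Tests.java</include> <!-- newer Surefire versions only -->
  <include>**/*TestCase.java</include>
</includes>
```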
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150030044 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <kafka.version>0.11.0.1</kafka.version>
+    <kafka.TestSuite>**/KafkaTestSuit.class</kafka.TestSuite>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-surefire-plugin</artifactId>
+        <configuration>
+          <includes>
+            <include>${kafka.TestSuite}</include>
+          </includes>
+          <excludes>
+            <exclude>**/TestKafkaQueries.java</exclude>
+          </excludes>
+          <systemProperties>
+            <property>
+              <name>logback.log.dir</name>
+              <value>${project.build.directory}/surefire-reports</value>
+            </property>
+          </systemProperties>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+ --- End diff -- Why is it necessary to exclude zookeeper? If a specific version of zookeeper is required, will it be better to explicitly add zookeeper to the dependency management? ---
[jira] [Created] (DRILL-5948) The wrong number of batches is displayed
Vlad created DRILL-5948: --- Summary: The wrong number of batches is displayed Key: DRILL-5948 URL: https://issues.apache.org/jira/browse/DRILL-5948 Project: Apache Drill Issue Type: Bug Affects Versions: 1.11.0 Reporter: Vlad I suppose that when you execute a query over a small amount of data, Drill should create one batch, but here you can see that Drill created two batches. I think this is wrong behaviour for Drill. The full JSON file is in the attachment. {code:html} "fragmentProfile": [ { "majorFragmentId": 0, "minorFragmentProfile": [ { "state": 3, "minorFragmentId": 0, "operatorProfile": [ { "inputProfile": [ { "records": 1, "batches": 2, "schemas": 1 } ], "operatorId": 2, "operatorType": 29, "setupNanos": 0, "processNanos": 1767363740, "peakLocalMemoryAllocated": 639120, "waitNanos": 25787 }, {code} Steps to reproduce: # Create a JSON file with one row # Execute a star query with this file, for example {code:sql} select * from dfs.`/path/to/your/file/example.json` {code} # Go to the Profile page in the UI and open the info about your query # Open the JSON profile -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r150019576 --- Diff: contrib/storage-kafka/pom.xml --- @@ -0,0 +1,130 @@
+<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
+  xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.12.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-storage-kafka</artifactId>
+  <name>contrib/kafka-storage-plugin</name>
+
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> --- End diff -- If the setting is necessary, it will be better to set it at the root pom. ---
Re: [DISCUSS] Drill 1.12.0 release
Yes, they are already in master. On Thu, Nov 9, 2017 at 6:05 PM, Charles Givre wrote: > We’re including the Networking functions in this release right? > > > On Nov 9, 2017, at 11:04, Arina Yelchiyeva > wrote: > > > > If changes will be done before cut off date, targeting mid November that > it > > will be possible to include this Jira. > > > > On Thu, Nov 9, 2017 at 6:03 PM, Charles Givre wrote: > > > >> Hi Arina, > >> Can we include DRILL-4091 Support for additional GIS operations in > version > >> 1.12? In general the code looked pretty good. There was a unit test > >> missing which the developer submitted and some minor formatting issues > >> which I’m still waiting on. > >> Thanks, > >> —C > >> > >> > >> > >>> On Nov 9, 2017, at 10:58, Arina Yelchiyeva > > >> wrote: > >>> > >>> Current status: > >>> > >>> Blocker: > >>> DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., > code > >>> reviewer - ?) - in progress. > >>> > >>> Targeted for 1.12 release: > >>> DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code > >> reviewer - > >>> Arina) - code review in final stage. > >>> DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, > code > >>> reviewer - Paul) - in review. > >>> DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN > >>> mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - > waiting > >>> for the code review. > >>> DRILL-5771: Fix serDe errors for format plugins (developer - Arina, > code > >>> reviewer - Tim) - waiting for the code review. > >>> > >>> Kind regards > >>> Arina > >>> > >>> On Fri, Oct 20, 2017 at 1:49 PM, Arina Yelchiyeva < > >>> arina.yelchiy...@gmail.com> wrote: > >>> > Current status: > > Targeted for 1.12 release: > DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather > >> than > mock (developer - Paul, code reviewer - ?)
- waiting for the code > review > DRILL-5842: Refactor and simplify the fragment, operator contexts for > testing (developer - Paul, code reviewer - ?) - waiting for the code > review > DRILL-5834: Adding network functions (developer - Charles, code > reviewer > - Arina) - waiting changes after code review > DRILL-5337: OpenTSDB plugin (developer - Dmitriy, code reviewer - > >> Arina) - waiting > for the code review > DRILL-5772: Enable UTF-8 support in query string by default > (developer - > Arina, code reviewer - Paul) - finalizing approach > DRILL-4779: Kafka storage plugin support (developer - Anil, code > >> reviewer > - ?) - finishing implementation > > Under question: > DRILL-4286: Graceful shutdown of drillbit (developer - Jyothsna, code > reviewer - ?) - waiting for the status update from the developer > > Please free to suggest other items that are targeted for 1.12 release. > There are many Jiras that have fix version marked as 1.12, it would be > >> good > if developers revisit them and update fix version to the actual one. > Link to the dashboard - https://issues.apache.org/ > jira/secure/RapidBoard.jspa?rapidView=185= > DRILL=detail > > Kind regards > Arina > > > On Wed, Oct 11, 2017 at 2:42 AM, Parth Chandra > >> wrote: > > > I'm waiting to merge the SSL changes in. Waiting a couple of days > >> more to > > see if there are any more comments before I merge the changes in. > > > > On Mon, Oct 9, 2017 at 10:28 AM, Paul Rogers > wrote: > > > >> Hi Arina, > >> > >> In addition to my own PRs, there are several in the “active” queue > >> that > > we > >> could get in if we can just push them over the line and clear the > >> queue. > >> The owners of the PRs should check if we are waiting on them to take > > action. 
> >> > >> 977 DRILL-5849: Add freemarker lib to dependencyManagement to > >> ensure > >> prop… > >> 976 DRILL-5797: Choose parquet reader from read columns > >> 975 DRILL-5743: Handling column family and column scan for hbase > >> 973 DRILL-5775: Select * query on a maprdb binary table fails > >> 972 DRILL-5838: Fix MaprDB filter pushdown for the case of > nested > >> field (reg. of DRILL-4264) > >> 950 Drill 5431: SSL Support > >> 949 DRILL-5795: Parquet Filter push down at rowgroup level > >> 936 DRILL-5772: Add unit tests to indicate how utf-8 support can > >> be > >> enabled / disabled in Drill > >> 904 DRILL-5717: change some date time test cases with specific > >> timezone or Local > >> 892 DRILL-5645: negation of expression causes null pointer > >> exception > >> 889 DRILL-5691: enhance scalar sub queries checking for the > > cartesian > >> join > >> > >> (Items not
Re: [DISCUSS] Drill 1.12.0 release
We’re including the Networking functions in this release right?
Re: [DISCUSS] Drill 1.12.0 release
If changes are done before the cut-off date (targeting mid-November), it will be possible to include this Jira.
Re: [DISCUSS] Drill 1.12.0 release
Hi Arina,
Can we include DRILL-4091 Support for additional GIS operations in version 1.12? In general the code looked pretty good. There was a unit test missing which the developer submitted and some minor formatting issues which I’m still waiting on.
Thanks,
—C
Re: [DISCUSS] Drill 1.12.0 release
Current status:

Blocker:
DRILL-5917: Ban org.json:json library in Drill (developer - Vlad R., code reviewer - ?) - in progress.

Targeted for 1.12 release:
DRILL-5337: OpenTSDB plugin (developer - Dmitriy & Vlad S., code reviewer - Arina) - code review in final stage.
DRILL-4779: Kafka storage plugin support (developer - Anil & Kamesh, code reviewer - Paul) - in review.
DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism (developer - Sorabh, code reviewer - Parth & Laurent) - waiting for the code review.
DRILL-5771: Fix serDe errors for format plugins (developer - Arina, code reviewer - Tim) - waiting for the code review.

Kind regards
Arina

> On Oct 9, 2017, at 9:57 AM, Paul Rogers wrote:
>
> Hi Arina,
>
> I’d like to include the following that are needed to finish up the “managed” sort and spill-to-disk for hash agg:
>
> #928: DRILL-5716: Queue-driven memory allocation
> #958, DRILL-5808: Reduce memory allocator strictness for "managed" operators
> #960, DRILL-5815: Option to set query memory as percent of total
>
> The following is needed to resolve issues with HBase support in empty batches:
>
> #968, DRILL-5830: Resolve regressions to MapR DB from DRILL-5546
>
> The following are nice-to-haves that build on work already done in this release, and that some of my own work depends on:
>
> #970, DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather than mock
> #978: DRILL-5842: Refactor and simplify the fragment, operator contexts for testing
>
> The following is not needed for 1.12 per se, but is the foundation for a project I’m working on; would be good to get this in after 2-3 months of review time:
>
> #921, foundation for batch size limitation
>
> The key issue with each of the above is that they
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user akumarb2010 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1027#discussion_r150002516

--- Diff: contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/DrillKafkaConfig.java ---
@@ -0,0 +1,31 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+public class DrillKafkaConfig {
+
+  /**
+   * Timeout for fetching messages from Kafka

--- End diff --

Thanks Paul, this is a very good point and it perfectly makes sense to add them as Drill session options instead of Drill config properties. We are working on these changes.

---
[GitHub] drill pull request #1021: DRILL-5923: Display name for query state
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149997750

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+    Map<String, String> displayNames = Maps.newHashMap();

--- End diff --

1. Please use `Map<QueryState, String>` since you're already receiving `QueryState` as an input parameter in the method. Besides, it would guarantee you did not make a mistake writing query state enum names.
2. `queryStateDisplayName` -> `queryStateDisplayNames`

---
[GitHub] drill pull request #1021: DRILL-5923: Display name for query state
Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149998367

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileUtil.java ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+import java.util.Collections;
+import java.util.Map;
+
+import com.google.common.collect.Maps;
+
+public class ProfileUtil {
+  // Mapping query state names to display names
+  private static final Map<String, String> queryStateDisplayName;
+
+  static {
+    Map<String, String> displayNames = Maps.newHashMap();
+    displayNames.put("STARTING", "Starting");
+    displayNames.put("RUNNING", "Running");
+    displayNames.put("COMPLETED", "Succeeded");
+    displayNames.put("CANCELED", "Canceled");
+    displayNames.put("FAILED", "Failed");
+    displayNames.put("CANCELLATION_REQUESTED", "Cancellation Requested");
+    displayNames.put("ENQUEUED", "Enqueued");
+    queryStateDisplayName = Collections.unmodifiableMap(displayNames);
+  }
+
+  /**
+   * Utility to return display name for query state
+   * @param queryState
+   * @return display string for query state
+   */
+  public final static String getQueryStateDisplayName(QueryState queryState) {
+    String state = queryState.name();
+    if (queryStateDisplayName.containsKey(state)) {

--- End diff --

This would be more optimal:

```
String state = queryStateDisplayNames.get(queryState);
if (state == null) {
  state = "Unknown State";
}
return state;
```

---
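The two suggestions in the review above (an enum-keyed map instead of String keys, and a single `get()` with a null fallback instead of `containsKey()` plus `get()`) can be sketched together as below. This is an illustrative, self-contained sketch: the nested `QueryState` enum and the class name are stand-ins, not Drill's actual `UserBitShared.QueryResult.QueryState` proto enum or `ProfileUtil` class.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

class ProfileUtilSketch {
  // Stand-in for Drill's UserBitShared.QueryResult.QueryState enum
  enum QueryState { STARTING, RUNNING, COMPLETED, CANCELED, FAILED, CANCELLATION_REQUESTED, ENQUEUED }

  // Enum-keyed map: a misspelled state name fails at compile time, unlike String keys
  private static final Map<QueryState, String> DISPLAY_NAMES;

  static {
    Map<QueryState, String> names = new EnumMap<>(QueryState.class);
    names.put(QueryState.STARTING, "Starting");
    names.put(QueryState.RUNNING, "Running");
    names.put(QueryState.COMPLETED, "Succeeded");
    names.put(QueryState.CANCELED, "Canceled");
    names.put(QueryState.FAILED, "Failed");
    names.put(QueryState.CANCELLATION_REQUESTED, "Cancellation Requested");
    names.put(QueryState.ENQUEUED, "Enqueued");
    DISPLAY_NAMES = Collections.unmodifiableMap(names);
  }

  // Single get() with a null check replaces the containsKey() + get() double lookup
  static String getDisplayName(QueryState state) {
    String name = DISPLAY_NAMES.get(state);
    return name != null ? name : "Unknown State";
  }

  public static void main(String[] args) {
    System.out.println(getDisplayName(QueryState.COMPLETED)); // prints "Succeeded"
  }
}
```

`EnumMap` also gives array-backed, allocation-free lookups, which suits a hot path like rendering the profiles page.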
[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...
Github user vvysotskyi commented on the issue: https://github.com/apache/drill/pull/904 @weijietong, thanks for the pull request, +1 ---
[GitHub] drill issue #1020: DRILL-5921: Display counter metrics in table
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1020 +1, LGTM. ---
[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149994673 --- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl --- @@ -138,21 +154,14 @@ }); }; -function updateOthers(metrics) { - $.each(["counters", "meters"], function(i, key) { -if(! $.isEmptyObject(metrics[key])) { - $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2)); -} - }); -}; - var update = function() { $.get("/status/metrics", function(metrics) { updateGauges(metrics.gauges); updateBars(metrics.gauges); if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers"); if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms"); -updateOthers(metrics); +if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters); +if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2)); --- End diff -- Well, sounds good then, thanks for making the changes. ---
[GitHub] drill issue #1014: DRILL-5771: Fix serDe errors for format plugins
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1014 @ilooner can you please review this? ---
[GitHub] drill issue #1030: DRILL-5941: Skip header / footer improvements for Hive st...
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1030 @ppadma can you review this? ---
[GitHub] drill pull request #1021: DRILL-5923: Display name for query state
Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1021#discussion_r149974787

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java ---
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.server.rest.profile;
+
+import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState;
+
+public class QueryStateDisplayName {
+  // Values should correspond to the QueryState enum in UserBitShared.proto

--- End diff --

@arina-ielchiieva yes, a map will definitely make it easier to visualize the mapping. Made the changes.

---
[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table
Github user prasadns14 commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149972346 --- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl --- @@ -138,21 +154,14 @@ }); }; -function updateOthers(metrics) { - $.each(["counters", "meters"], function(i, key) { -if(! $.isEmptyObject(metrics[key])) { - $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2)); -} - }); -}; - var update = function() { $.get("/status/metrics", function(metrics) { updateGauges(metrics.gauges); updateBars(metrics.gauges); if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers"); if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms"); -updateOthers(metrics); +if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters); +if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2)); --- End diff -- @arina-ielchiieva, I have considered reusing existing methods before deciding to have a separate method. With the above suggestion, the table will now look as below- drill.connections.rpc.control.encrypted| {count: 0} '|' here is column delimiter. Do we want to display only the number in the second column or a key/value pair? I just wanted it to be consistent with the other metrics tables. (so I print value.count) Removed meters section. ---
[GitHub] drill pull request #1030: DRILL-5941: Skip header / footer improvements for ...
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1030

DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:
1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).
2. Apply skip-header logic only once, during reader initialization, to avoid checks while reading the data (DRILL-5106).
3. Apply skip-footer logic only when the footer count is greater than 0; otherwise default processing is done without buffering data in a queue (DRILL-5106).

Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization. It will have three implementations:
a. Default (each input split gets its own reader);
b. Empty (for empty tables);
c. InputSplitGroups (applied when a table has a header / footer and input splits of the same file should be processed together).
2. AbstractRecordsInspector was introduced to improve performance when the table footer count is less than or equal to 0. It will have two implementations:
a. Default (records are processed one by one without buffering);
b. SkipFooter (a queue is used to buffer the N records that should be skipped at the end of file processing).
3. Allow HiveAbstractReader to have multiple input splits.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5941

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1030.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1030

---
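The heart of change 1, routing all input splits of one file to a single reader so that skip-header applies only at the start of the file and skip-footer only at its end, can be sketched as below. This is a simplified illustration, not Drill's actual code: the `InputSplit` class here is a stand-in for the Hadoop/Hive split types, reduced to a path and a start offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitGrouper {
  // Simplified stand-in for a Hadoop input split: a file path plus a start offset
  static final class InputSplit {
    final String path;
    final long start;
    InputSplit(String path, long start) { this.path = path; this.start = start; }
  }

  // Group splits by file so one reader sees the whole file and can apply
  // header / footer logic across split boundaries
  static Map<String, List<InputSplit>> groupByFile(List<InputSplit> splits) {
    Map<String, List<InputSplit>> groups = new LinkedHashMap<>();
    for (InputSplit split : splits) {
      groups.computeIfAbsent(split.path, k -> new ArrayList<>()).add(split);
    }
    // Order each file's splits by offset so the header sits at the start of the
    // first split and the footer at the end of the last one
    for (List<InputSplit> group : groups.values()) {
      group.sort(Comparator.comparingLong(s -> s.start));
    }
    return groups;
  }

  public static void main(String[] args) {
    List<InputSplit> splits = new ArrayList<>();
    splits.add(new InputSplit("/data/a.csv", 128));
    splits.add(new InputSplit("/data/b.csv", 0));
    splits.add(new InputSplit("/data/a.csv", 0));
    Map<String, List<InputSplit>> groups = groupByFile(splits);
    System.out.println(groups.size()); // prints 2
  }
}
```

With splits grouped and ordered this way, a per-file reader can skip the header once before the first split and buffer only the last N records for footer skipping, which is the behavior the overview describes.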
[GitHub] drill issue #1026: DRILL-5919: Add non-numeric support for JSON processing
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1026 Thanks, +1, LGTM. ---
[GitHub] drill issue #904: DRILL-5717: change some date time test cases with specific...
Github user weijietong commented on the issue: https://github.com/apache/drill/pull/904 done ---
[GitHub] drill issue #1026: DRILL-5919: Add non-numeric support for JSON processing
Github user vladimirtkach commented on the issue: https://github.com/apache/drill/pull/1026 @arina-ielchiieva made code changes according to your comments. ---
[GitHub] drill pull request #1026: DRILL-5919: Add non-numeric support for JSON proce...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1026#discussion_r149903182 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java --- @@ -0,0 +1,167 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to you under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +package org.apache.drill.exec.vector.complex.writer; + +import com.google.common.collect.ImmutableMap; +import org.apache.commons.io.FileUtils; +import org.apache.drill.BaseTestQuery; +import org.apache.drill.common.exceptions.UserRemoteException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.RecordBatchLoader; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.rpc.user.QueryDataBatch; +import org.apache.drill.exec.vector.VarCharVector; +import org.junit.Test; + +import java.io.File; +import java.util.List; + +import static org.hamcrest.CoreMatchers.containsString; +import static org.junit.Assert.*; + +public class TestJsonNonNumerics extends BaseTestQuery { + + @Test + public void testNonNumericSelect() throws Exception { +File file = new File(getTempDir(""), "nan_test.json"); +String json = "{\"nan\":NaN, \"inf\":Infinity}"; +String query = String.format("select * from dfs.`%s`",file.getAbsolutePath()); +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `store.json.reader.non_numeric_numbers` = true"); + testBuilder() +.sqlQuery(query) +.unOrdered() +.baselineColumns("nan", "inf") +.baselineValues(Double.NaN, Double.POSITIVE_INFINITY) +.build() +.run(); +} finally { + test("alter session reset `store.json.reader.non_numeric_numbers`"); + FileUtils.deleteQuietly(file); +} + } + + @Test(expected = UserRemoteException.class) + public void testNonNumericFailure() throws Exception { +File file = new File(getTempDir(""), "nan_test.json"); +test("alter session set `store.json.reader.non_numeric_numbers` = false"); +String json = "{\"nan\":NaN, \"inf\":Infinity}"; +try { + FileUtils.writeStringToFile(file, json); + test("select * from dfs.`%s`;", file.getAbsolutePath()); +} catch (UserRemoteException e) { + assertThat(e.getMessage(), containsString("Error parsing JSON")); + throw e; +} finally { + test("alter session reset 
`store.json.reader.non_numeric_numbers`"); + FileUtils.deleteQuietly(file); +} + } + + @Test + public void testCreateTableNonNumerics() throws Exception { +File file = new File(getTempDir(""), "nan_test.json"); +String json = "{\"nan\":NaN, \"inf\":Infinity}"; +String tableName = "ctas_test"; +try { + FileUtils.writeStringToFile(file, json); + test("alter session set `store.json.reader.non_numeric_numbers` = true"); + test("alter session set `store.json.writer.non_numeric_numbers` = true"); + test("alter session set `store.format`='json'"); + test("create table dfs_test.tmp.`%s` as select * from dfs.`%s`;", tableName, file.getAbsolutePath()); + + // ensuring that `NaN` and `Infinity` tokens ARE NOT enclosed with double quotes + File resultFile = new File(new File(getDfsTestTmpSchemaLocation(),tableName),"0_0_0.json"); + String resultJson = FileUtils.readFileToString(resultFile); + int nanIndex = resultJson.indexOf("NaN"); + assertFalse("`NaN` must not be enclosed with \"\" ", resultJson.charAt(nanIndex - 1) == '"'); + assertFalse("`NaN` must not be enclosed with \"\" ", resultJson.charAt(nanIndex + "NaN".length()) == '"'); + int infIndex = resultJson.indexOf("Infinity"); + assertFalse("`Infinity` must not be enclosed with \"\" ", resultJson.charAt(infIndex - 1) == '"'); +
[GitHub] drill pull request #1026: DRILL-5919: Add non-numeric support for JSON proce...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1026#discussion_r149903705 --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/vector/complex/writer/TestJsonNonNumerics.java --- @@ -0,0 +1,167 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to you under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +package org.apache.drill.exec.vector.complex.writer; + +import com.google.common.collect.ImmutableMap; +import org.apache.commons.io.FileUtils; +import org.apache.drill.BaseTestQuery; +import org.apache.drill.common.exceptions.UserRemoteException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.RecordBatchLoader; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.rpc.user.QueryDataBatch; +import org.apache.drill.exec.vector.VarCharVector; +import org.junit.Test; + +import java.io.File; +import java.util.List; + +import static org.hamcrest.CoreMatchers.containsString; +import static org.junit.Assert.*; + +public class TestJsonNonNumerics extends BaseTestQuery { + + @Test + public void testNonNumericSelect() throws Exception { +File file = new File(getTempDir(""), "nan_test.json"); --- End diff -- It's better to pass a dir name as well, rather than an empty string. Ex: `getTempDir("test_nan")` ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893057 --- Diff: contrib/storage-kafka/src/test/resources/logback-test.xml --- @@ -0,0 +1,51 @@ + --- End diff -- Please remove. We now have a common logging configuration for all modules in the drill-common module. ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893459 --- Diff: contrib/storage-kafka/src/test/java/org/apache/drill/exec/store/kafka/cluster/EmbeddedZKQuorum.java --- @@ -0,0 +1,83 @@ +/** --- End diff -- The Apache header should be in the form of a comment, not a Javadoc. Please update it here and in the other newly added files. Hopefully somebody will add this to checkstyle so we won't have to remind about it all the time. ---
[GitHub] drill pull request #1027: DRILL-4779 : Kafka storage plugin
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1027#discussion_r149893582 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java --- @@ -343,4 +343,4 @@ public void close() { } } } -} +} --- End diff -- Please revert the changes in this file. ---
[GitHub] drill pull request #1020: DRILL-5921: Display counter metrics in table
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1020#discussion_r149891566 --- Diff: exec/java-exec/src/main/resources/rest/metrics/metrics.ftl --- @@ -138,21 +154,14 @@ }); }; -function updateOthers(metrics) { - $.each(["counters", "meters"], function(i, key) { -if(! $.isEmptyObject(metrics[key])) { - $("#" + key + "Val").html(JSON.stringify(metrics[key], null, 2)); -} - }); -}; - var update = function() { $.get("/status/metrics", function(metrics) { updateGauges(metrics.gauges); updateBars(metrics.gauges); if(! $.isEmptyObject(metrics.timers)) createTable(metrics.timers, "timers"); if(! $.isEmptyObject(metrics.histograms)) createTable(metrics.histograms, "histograms"); -updateOthers(metrics); +if(! $.isEmptyObject(metrics.counters)) createCountersTable(metrics.counters); +if(! $.isEmptyObject(metrics.meters)) $("#metersVal").html(JSON.stringify(metrics.meters, null, 2)); --- End diff -- @prasadns14 1. Thanks for adding the screenshots. 2. Most of the code in `createTable` and `createCountersTable` coincides. I suggest you make one function, for example with three parameters: `createTable(metric, name, addReportingClass)`. When you don't need to add the reporting class, you'd call it with `false`. Our goal here is to generify the existing methods rather than add new specific ones with almost the same content. 3. If we don't have any meters, let's remove them. ---
[GitHub] drill pull request #1021: DRILL-5923: Display name for query state
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1021#discussion_r149889024 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/QueryStateDisplayName.java --- @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.server.rest.profile; + +import org.apache.drill.exec.proto.UserBitShared.QueryResult.QueryState; + +public class QueryStateDisplayName { + // Values should correspond to the QueryState enum in UserBitShared.proto --- End diff -- @prasadns14 1. When using an array you depend on the enum value order; if the value order changes for some reason, your approach will work incorrectly. 2. A map makes it explicit which key is mapped to which value, so it is much easier to see that, for example, COMPLETED is mapped to "Succeeded". I recommend the map approach. ---
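The map-based approach recommended above can be sketched like this. This is a hypothetical illustration only: the state names and display strings below are assumptions (the real values come from the QueryState enum in UserBitShared.proto), and the class shape is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an explicit map from enum names to display names keeps
// working even if the enum's ordinal order changes, unlike a positional array,
// and it documents which state maps to which label.
public class QueryStateDisplayNames {

  // Keys mirror assumed QueryState values; values are assumed display labels.
  private static final Map<String, String> DISPLAY_NAMES = new LinkedHashMap<>();
  static {
    DISPLAY_NAMES.put("STARTING", "Starting");
    DISPLAY_NAMES.put("RUNNING", "Running");
    DISPLAY_NAMES.put("COMPLETED", "Succeeded");
    DISPLAY_NAMES.put("CANCELED", "Canceled");
    DISPLAY_NAMES.put("FAILED", "Failed");
  }

  // Falls back to the raw enum name for any state without a mapping,
  // so an unmapped state degrades gracefully instead of throwing.
  static String displayNameFor(String state) {
    return DISPLAY_NAMES.getOrDefault(state, state);
  }

  public static void main(String[] args) {
    System.out.println(displayNameFor("COMPLETED")); // prints "Succeeded"
  }
}
```

With an array indexed by ordinal, inserting a new enum value in the middle of the proto definition would silently shift every label after it; the map fails loudly (falls back to the raw name) instead.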