Re: Actual vectorization execution
Yes, Panama is an OpenJDK project, and Intel is the prime contributor. A team at our company has tried this feature, and I plan to try it in Drill. As @aman points out, it will take some work to validate its stability and the newer JDK version. Gandiva brings a great idea: using LLVM to generate dynamic code. But a JVM project cannot cross the JNI boundary to C++ code for free, so Gandiva is designed to operate on a batch of data to amortize the JNI invocation cost. That probably suits only physical operators like Project and Filter, which can work as one JNI invocation per RecordBatch over off-heap batch data. For HashAggregate and HashJoin, which operate on grouped row data, the JNI invocations would be too frequent and would hurt performance. Panama's intrinsic vector API has no JNI cost and gives us more flexibility in coding. I don't know whether we have a plan to move to Arrow; if not, we can borrow Gandiva's idea and implement some operators, or parts of their logic, in C++. I agree with Paul's description of Drill's current execution: it has columnar in-memory data, but not actual vectorized execution. Even a for-loop over a column's data is not guaranteed to generate SIMD code via the JVM JIT. Of course, this still performs well compared to others, as it is CPU-pipeline friendly and has good data locality. I have researched papers such as [1] to find a way to make operators like HashAggregate and HashJoin vectorized, but Java lacks the ability to directly generate vectorized code. It makes me worry that a rewritten operator would not show any notable performance improvement. So I look forward to Panama. [1] http://www.cs.columbia.edu/~orestis/sigmod15.pdf On Sat, Jun 30, 2018 at 4:08 AM Paul Rogers wrote: > Hi Weijie, > > As it turns out, vectorized processing in Drill is more aspirational than > operational at this point in time. 
> > The code used in Drill is not actually vector-based even though the data > itself is columnar. Drill generally does row-wise operations because > row-wise operations fit the SQL semantics better than column-wise > operations. There is generally a "loop over all rows" block of code that > calls into a "do something for column a" block, followed by a "do something > for column b" block, etc. > > For vectorized processing, the loops have to be inverted: "do all column > a" followed by "do all column b". That is not always possible, however. > > Further, many of Drill's readers produce Nullable types. In this case, > every value carries a null/not-null flag which must be checked for each > data value. It is unlikely that CPU instructions exist for this case. > > So, a first step is to research how various operators could be vectorized. > For example, how would we handle a "WHERE x = 10" case in a way that would > benefit from vectorization? How about a "SUM(x)" case? > > Once that is sorted out (there are likely research papers that explain how > others have done it), we can move on to changing the generated code (the > loop-over-all-rows code) to use the newer design. > > Thanks, > - Paul > > > > On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha < > amansi...@gmail.com> wrote: > > Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? Not > Intel-specific. > It would be quite a bit of work to test and certify with Intel's JVM, which > may still be in the experimental stage. > Also, you may have seen the Gandiva project for Apache Arrow, which aims to > improve vectorization for operations > on Arrow buffers (this requires integration with Arrow). > > I assume the test program or workload you were running was already written > to exploit vectorization. Have you also looked into > Drill's code-gen to see which ones are amenable to vectorization? We > could start with some small use case and expand. 
> > [1] > http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf > > On Fri, Jun 29, 2018 at 3:23 AM weijie tong > wrote: > > > Hi all: > > > > I have investigated the JIT assembly code of some vector-friendly Java > > code with the JITWatch tool. I found that the JVM did not generate the expected > AVX > > code. According to conclusions from JVM experts, the JVM only supports > > some restricted usage cases for generating AVX code. > > > > I found that Intel has started a project called Panama, which supplies an > > intrinsic vector API to actually execute AVX code. Here is the reference ( > > > > > https://software.intel.com/en-us/articles/vector-api-developer-program-for-java > > ) > > . It also supports off-heap calculation. According to our JVM team, the > > vector API will be released in JDK 11. > > > > So I wonder whether we can distribute Intel's current JVM as a > supplied > > default JVM to users (like the Spark distribution with a default Scala) and, > as > > an option, rewrite parts of our operator code against this new > > vector API. > > >
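Paul's two examples above can be sketched in plain Java (class and method names are illustrative, not Drill's actual generated code): a column-at-a-time "WHERE x = 10" that emits a selection vector of matching row indexes, and a "SUM(x)" over a nullable column whose per-value null check is exactly the pattern that is hard to map onto SIMD instructions. Tight, branch-light loops of the first shape are the ones the JIT *may* auto-vectorize, though, as discussed in the thread, nothing guarantees it.

```java
// Hypothetical column-at-a-time ("vectorized") operator kernels.
public class ColumnKernels {

  // "WHERE x = 10": scan the whole column and emit matching row indexes
  // (a selection vector) instead of branching per row inside a row loop.
  public static int[] filterEquals(int[] column, int key) {
    int[] sel = new int[column.length];
    int count = 0;
    for (int i = 0; i < column.length; i++) {
      // branch-free accumulation: the comparison contributes 0 or 1
      sel[count] = i;
      count += (column[i] == key) ? 1 : 0;
    }
    return java.util.Arrays.copyOf(sel, count);
  }

  // "SUM(x)" over a nullable column: the null/not-null flag per value is
  // the case Paul notes is unlikely to map onto CPU vector instructions.
  public static long sumNullable(int[] column, boolean[] isSet) {
    long sum = 0;
    for (int i = 0; i < column.length; i++) {
      sum += isSet[i] ? column[i] : 0; // per-value condition defeats many vectorizers
    }
    return sum;
  }

  public static void main(String[] args) {
    int[] x = {10, 7, 10, 3, 10};
    System.out.println(java.util.Arrays.toString(filterEquals(x, 10))); // [0, 2, 4]
    System.out.println(sumNullable(x, new boolean[]{true, true, false, true, true})); // 30
  }
}
```

The selection-vector form lets downstream operators consume only the matching rows without re-evaluating the predicate, which is the usual first step in the papers cited above.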
[GitHub] HanumathRao commented on issue #1347: DRILL-6545: Projection Push down into Lateral Join operator.
HanumathRao commented on issue #1347: DRILL-6545: Projection Push down into Lateral Join operator. URL: https://github.com/apache/drill/pull/1347#issuecomment-401513962 @amansinha100 Thank you for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sachouche commented on issue #1355: DRILL-6560: Enhanced the batch statistics logging enablement
sachouche commented on issue #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355#issuecomment-401513955 @bitblender, can you please review this PR? Thanks!
[GitHub] sachouche opened a new pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement
sachouche opened a new pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement URL: https://github.com/apache/drill/pull/1355 This PR is reserved for internal Drill testing. It provides more logging control when testing the batch sizing functionality. At this time, only the Parquet reader consumes this functionality. Other operators need to be retrofitted to use the centralized batch sizing logging utility.
[GitHub] sachouche commented on issue #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
sachouche commented on issue #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader URL: https://github.com/apache/drill/pull/1354#issuecomment-401513254 @kkhatua, can you please review this fix? Thanks!
[GitHub] sachouche opened a new pull request #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader
sachouche opened a new pull request #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader URL: https://github.com/apache/drill/pull/1354 Reserving same-size intermediary buffers to handle the case of a false positive; that is, a column is first thought to be fixed length (after sampling), then reverted to variable length as we read more values.
[jira] [Created] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader
salim achouche created DRILL-6570: - Summary: IndexOutOfBoundsException when using Flat Parquet Reader Key: DRILL-6570 URL: https://issues.apache.org/jira/browse/DRILL-6570 Project: Apache Drill Issue Type: Improvement Components: Storage - Parquet Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 * The Parquet Reader creates a reusable bulk entry based on the column precision * It uses the column precision for optimizing the intermediary heap buffers * It first detected that the column was fixed length, but then reverted this assumption when the column changed precision * This step was fine, except that the bulk entry's memory requirement changed while the code didn't update the bulk entry's intermediary buffers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
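The failure mode described above can be illustrated with a minimal, hypothetical sketch (class and method names are invented for illustration, not Drill's actual Parquet reader code): a reusable bulk entry sizes its intermediary buffer from a sampled "fixed" precision, so when the fixed-length assumption is reverted, the buffer capacity must be re-checked rather than reused at its stale size.

```java
// Hypothetical model of the DRILL-6570 bug and its fix direction: the
// intermediary heap buffer is sized from a sampled precision, and must be
// grown when the column turns out to need more bytes than sampled.
public class BulkEntryBuffer {
  private byte[] buffer;

  public BulkEntryBuffer(int sampledPrecision, int batchSize) {
    // initial size assumes the sampled (fixed-length) precision holds
    buffer = new byte[sampledPrecision * batchSize];
  }

  // The fix: when the fixed-length assumption is reverted, update the
  // intermediary buffer instead of indexing past its stale capacity.
  public void ensureCapacity(int neededBytes) {
    if (buffer.length < neededBytes) {
      buffer = new byte[neededBytes];
    }
  }

  public int capacity() {
    return buffer.length;
  }
}
```

Without the `ensureCapacity` step, writing the now-larger values into the originally sized buffer is what produces the IndexOutOfBoundsException.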
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199308452 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.store.sys.PersistentStore; + + +/** + * Storage plugins handler is an additional service for updating storage plugins configs from the file + */ +public interface StoragePluginsHandler { + + /** + * Update incoming storage plugins configs from persistence store if present, otherwise bootstrap plugins configs. 
+ * One of the params should be null, second shouldn't + * + * @param persistentStore the last storage plugins configs from persistence store + * @param bootstrapPlugins bootstrap storage plugins, which are used in case of first Drill start up + * @return all storage plugins, which should be loaded into persistence store + */ + void loadPlugins(PersistentStore persistentStore, StoragePlugins bootstrapPlugins); Review comment: I thought about it. There were some difficulties with it: * One - we would need more calls to the persistent store: we would have to fetch all plugins from it one by one with an iterator. In my case we don't even fetch the plugins (if the `enabled` status is defined in the new plugin). * Arina proposed making the handler responsible not just for updating, but for loading plugin configs too. That way it is more independent and the logic becomes simpler (annotations show the nullability of variables)
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199308790 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandlerService.java ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.annotation.Nullable; +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} conf file + * + * TODO: DRILL-6564: It can be improved with configs versioning and service of creating + * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} + */ +public class StoragePluginsHandlerService implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsHandlerService.class); + + private final LogicalPlanPersistence hoconLogicalPlanPersistence; + + public StoragePluginsHandlerService(DrillbitContext context) { +hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(), +new ObjectMapper(new HoconFactory())); + } + + @Override + public void loadPlugins(@NotNull PersistentStore persistentStore, + @Nullable StoragePlugins bootstrapPlugins) { +// if bootstrapPlugins is not null -- fresh Drill set up +StoragePlugins pluginsToBeWrittenToPersistentStore; + 
+StoragePlugins newPlugins = getNewStoragePlugins(); + +if (newPlugins != null) { + pluginsToBeWrittenToPersistentStore = new StoragePlugins(new HashMap<>()); + Optional.ofNullable(bootstrapPlugins) + .ifPresent(pluginsToBeWrittenToPersistentStore::putAll); + + for (Map.Entry newPlugin : newPlugins) { +String pluginName = newPlugin.getKey(); +StoragePluginConfig oldPluginConfig = Optional.ofNullable(bootstrapPlugins) +.map(plugins -> plugins.getConfig(pluginName)) +.orElse(persistentStore.get(pluginName)); +StoragePluginConfig updatedStatusPluginConfig = updatePluginStatus(oldPluginConfig, newPlugin.getValue()); +pluginsToBeWrittenToPersistentStore.put(pluginName, updatedStatusPluginConfig); + } +} else { + pluginsToBeWrittenToPersistentStore = bootstrapPlugins; +} + +// load pluginsToBeWrittenToPersistentStore to Persistent Store +Optional.ofNullable(pluginsToBeWrittenToPersistentStore) +.ifPresent(plugins -> plugins.forEach(plugin -> persistentStore.put(plugin.getKey(), plugin.getValue(; + } + + /** + * Helper method to identify the enabled status for new storage plugins config. If this status is absent in the updater + * file, the status is kept from the configs, which are going to be updated + * + * @param oldPluginConfig current storage plugin config from Persistent Store or bootstrap config file + * @param newPluginConfig new storage plugin config + * @return new storage plugin config with updated enabled status + */ + private StoragePluginConfig updatePluginStatus(@Nullable StoragePluginConfig oldPluginConfig, + @NotNull StoragePluginConfig newPluginConfig) { +if
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199307744 ## File path: contrib/storage-kafka/src/main/resources/bootstrap-storage-plugins.json ## @@ -2,8 +2,8 @@ "storage":{ kafka : { type:"kafka", - enabled: false, - kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"} + kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"}, + enabled: false Review comment: Currently, after deserialization, all plugin configs are shown in the Drill UI with the enabled status at the end. The order is changed in the config files too, for consistency.
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199310178 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,69 +121,73 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); +try { + StoragePlugins bootstrapPlugins = pluginSystemTable.getAll().hasNext() ? null : loadBootstrapPlugins(); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsHandlerService(context); + storagePluginsUpdaterService.loadPlugins(pluginSystemTable, bootstrapPlugins); + + // Defines enabled plugins + defineEnabledPlugins(); +} catch (IOException e) { + logger.error("Failure setting up storage enabledPlugins. Drillbit exiting.", e); + throw new IllegalStateException(e); +} } - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { -try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ - if (!pluginSystemTable.getAll().hasNext()) { -// bootstrap load the config since no plugins are stored. 
-logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); -Collection urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); -if (urls != null && !urls.isEmpty()) { - logger.info("Loading the storage plugin configs from URLs {}.", urls); - Map pluginURLMap = Maps.newHashMap(); - for (URL url : urls) { -String pluginsData = Resources.toString(url, Charsets.UTF_8); -StoragePlugins plugins = lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class); -for (Map.Entry config : plugins) { - if (!definePluginConfig(config.getKey(), config.getValue())) { -logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.", -config.getKey(), pluginURLMap.get(config.getKey()), url); -continue; - } - pluginURLMap.put(config.getKey(), url); -} - } -} else { - throw new IOException("Failure finding " + ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE); -} + /** + * Read bootstrap storage plugins {@link ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh + * instantiating of Drill + * + * @return bootstrap storage plugins + * @throws IOException if a read error occurs + */ + private StoragePlugins loadBootstrapPlugins() throws IOException { +// bootstrap load the config since no plugins are stored. +logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); +Set urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); +if (urls != null && !urls.isEmpty()) { + logger.info("Loading the storage plugin configs from URLs {}.", urls); + StoragePlugins bootstrapPlugins = new StoragePlugins(new HashMap<>()); + for (URL url : urls) { +String pluginsData = Resources.toString(url, Charsets.UTF_8); + bootstrapPlugins.putAll(lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class)); Review comment: Agree. Simpler doesn't mean better. Returned the previous logic. 
[GitHub] ilooner commented on issue #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation
ilooner commented on issue #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation URL: https://github.com/apache/drill/pull/1333#issuecomment-401509765 @priteshm will take a look Monday. @vrozov please fix the conflict and Travis failures.
[GitHub] Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#issuecomment-401508476 @arina-ielchiieva, on second thought, I think having a separate profile is better. Created a new profile called 'docker'. Steps remain mostly the same: Build the project (any profile can be provided; if it's the 'docker' profile, the Docker image is built as well): `$ mvn clean install` Build the Docker image and push it to Docker Hub (or any container registry): ``` cd distribution mvn dockerfile:build -Pdocker mvn dockerfile:push -Pdocker ```
[jira] [Created] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_
Robert Hou created DRILL-6569: - Summary: Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet Key: DRILL-6569 URL: https://issues.apache.org/jira/browse/DRILL-6569 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Pritesh Maker Fix For: 1.14.0 This is TPCDS Query 19. Query: /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql SELECT i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, Sum(ss_ext_sales_price) ext_price FROM date_dim, store_sales, item, customer, customer_address, store WHERE d_date_sk = ss_sold_date_sk AND ss_item_sk = i_item_sk AND i_manager_id = 38 AND d_moy = 12 AND d_year = 1998 AND ss_customer_sk = c_customer_sk AND c_current_addr_sk = ca_address_sk AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5) AND ss_store_sk = s_store_sk GROUP BY i_brand, i_brand_id, i_manufact_id, i_manufact ORDER BY ext_price DESC, i_brand, i_brand_id, i_manufact_id, i_manufact LIMIT 100; Here is the stack trace: 2018-06-29 07:00:32 INFO DrillTestLogger:348 - Exception: java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet Fragment 4:26 [Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010] (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243 hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199 org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57 
org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417 org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54 org.apache.drill.exec.physical.impl.ScanBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 org.apache.drill.exec.record.AbstractRecordBatch.next():172 
org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199306981 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,69 +121,73 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); +try { + StoragePlugins bootstrapPlugins = pluginSystemTable.getAll().hasNext() ? null : loadBootstrapPlugins(); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsHandlerService(context); + storagePluginsUpdaterService.loadPlugins(pluginSystemTable, bootstrapPlugins); + + // Defines enabled plugins + defineEnabledPlugins(); +} catch (IOException e) { + logger.error("Failure setting up storage enabledPlugins. Drillbit exiting.", e); + throw new IllegalStateException(e); +} } - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { -try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ - if (!pluginSystemTable.getAll().hasNext()) { -// bootstrap load the config since no plugins are stored. 
-logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); -Collection urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); -if (urls != null && !urls.isEmpty()) { - logger.info("Loading the storage plugin configs from URLs {}.", urls); - Map pluginURLMap = Maps.newHashMap(); - for (URL url : urls) { -String pluginsData = Resources.toString(url, Charsets.UTF_8); -StoragePlugins plugins = lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class); -for (Map.Entry config : plugins) { - if (!definePluginConfig(config.getKey(), config.getValue())) { -logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.", -config.getKey(), pluginURLMap.get(config.getKey()), url); -continue; - } - pluginURLMap.put(config.getKey(), url); -} - } -} else { - throw new IOException("Failure finding " + ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE); -} + /** + * Read bootstrap storage plugins {@link ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh + * instantiating of Drill + * + * @return bootstrap storage plugins + * @throws IOException if a read error occurs + */ + private StoragePlugins loadBootstrapPlugins() throws IOException { +// bootstrap load the config since no plugins are stored. 
+logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); +Set urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); +if (urls != null && !urls.isEmpty()) { + logger.info("Loading the storage plugin configs from URLs {}.", urls); + StoragePlugins bootstrapPlugins = new StoragePlugins(new HashMap<>()); + for (URL url : urls) { +String pluginsData = Resources.toString(url, Charsets.UTF_8); + bootstrapPlugins.putAll(lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class)); Review comment: This behavior differs from the existing one: here, in the case of a duplicate plugin config, we choose the last one, whereas previously we logged a warning and kept the first-seen plugin.
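The semantic difference the reviewer points out can be shown with a toy merge (plain `Map<String, String>` stands in for Drill's `StoragePlugins`; this is a sketch of the two merge policies, not Drill code):

```java
// Merging plugin configs read from several classpath URLs.
// putAll silently lets the last duplicate win; the previous code kept
// the first occurrence and logged a warning for the later ones.
public class PluginMerge {

  // last-seen-wins: the behavior of the new putAll-based code
  public static java.util.Map<String, String> mergeLastWins(
      java.util.List<java.util.Map<String, String>> sources) {
    java.util.Map<String, String> merged = new java.util.LinkedHashMap<>();
    for (java.util.Map<String, String> src : sources) {
      merged.putAll(src); // a duplicate key silently overwrites the earlier entry
    }
    return merged;
  }

  // first-seen-wins: the previous behavior the reviewer notes was lost
  public static java.util.Map<String, String> mergeFirstWins(
      java.util.List<java.util.Map<String, String>> sources) {
    java.util.Map<String, String> merged = new java.util.LinkedHashMap<>();
    for (java.util.Map<String, String> src : sources) {
      src.forEach(merged::putIfAbsent); // duplicates are ignored, not overwritten
    }
    return merged;
  }
}
```

`putIfAbsent` gives the first-seen policy in one call; the warning about the duplicate would still have to be logged separately.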
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199218902 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.store.sys.PersistentStore; + + +/** + * Storage plugins handler is an additional service for updating storage plugins configs from the file + */ +public interface StoragePluginsHandler { + + /** + * Update incoming storage plugins configs from persistence store if present, otherwise bootstrap plugins configs. 
+ * Exactly one of the params should be null + * + * @param bootstrapPlugins bootstrap storage plugins, which are used in case of first Drill start up + * @param persistentStore the last storage plugins configs from persistence store + * @return all storage plugins, which should be loaded into persistence store + */ + StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore persistentStore); Review comment: Also, I think the signature of the function should accept only `StoragePlugins`; it's then up to the caller to call it with the bootstrap configuration or the one inside the PersistentStore. Basically, the handler will get an initial configuration and produce a final configuration using that initial config and its own logic.
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199305835 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java ## @@ -17,22 +17,51 @@ */ package org.apache.drill.exec.store; +import com.fasterxml.jackson.annotation.JsonProperty; import org.apache.drill.common.logical.StoragePluginConfig; import com.fasterxml.jackson.annotation.JsonTypeName; -@JsonTypeName("named") +@JsonTypeName(NamedStoragePluginConfig.NAME) public class NamedStoragePluginConfig extends StoragePluginConfig { - public String name; + + public static final String NAME = "named"; + + private final String name; + + public NamedStoragePluginConfig(@JsonProperty("name") String name) { +this.name = name; + } + + public String getName() { +return name; + } @Override - public boolean equals(Object o) { -return this == o; + public boolean equals(Object obj) { +if (this == obj) { + return true; +} +if (obj == null) { + return false; +} +if (getClass() != obj.getClass()) { + return false; +} +NamedStoragePluginConfig other = (NamedStoragePluginConfig) obj; +if (name == null) { + return other.name == null; +} else { + return name.equals(other.name); +} } @Override public int hashCode() { -return name.hashCode(); +final int prime = 31; +int result = 1; +result = prime * result + ((name == null) ? 0 : name.hashCode()); Review comment: how about just `return prime + ((name == null) ? 0 : name.hashCode());`
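sohami's one-liner drops the redundant `result = 1` accumulator. Going one step further, `java.util.Objects` reduces both methods to null-safe one-liners; this is a sketch of the same single-field contract, not the code actually in the PR (the class name here is hypothetical):

```java
import java.util.Objects;

public class NamedConfigSketch {
  private final String name;

  public NamedConfigSketch(String name) {
    this.name = name;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    // Objects.equals handles the name == null branches from the diff.
    return Objects.equals(name, ((NamedConfigSketch) obj).name);
  }

  @Override
  public int hashCode() {
    // Not numerically identical to prime + name.hashCode(), but the
    // equals/hashCode contract only requires equal names to hash equally.
    return Objects.hashCode(name);
  }
}
```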
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199305514 ## File path: contrib/storage-kafka/src/main/resources/bootstrap-storage-plugins.json ## @@ -2,8 +2,8 @@ "storage":{ kafka : { type:"kafka", - enabled: false, - kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"} + kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"}, + enabled: false Review comment: is there any specific reason why this `enabled` setting is moved to end for all the plugins ?
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199307586 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandlerService.java ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.annotation.Nullable; +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} conf file + * + * TODO: DRILL-6564: It can be improved with configs versioning and service of creating + * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} + */ +public class StoragePluginsHandlerService implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsHandlerService.class); + + private final LogicalPlanPersistence hoconLogicalPlanPersistence; + + public StoragePluginsHandlerService(DrillbitContext context) { +hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(), +new ObjectMapper(new HoconFactory())); + } + + @Override + public void loadPlugins(@NotNull PersistentStore persistentStore, + @Nullable StoragePlugins bootstrapPlugins) { +// if bootstrapPlugins is not null -- fresh Drill set up +StoragePlugins pluginsToBeWrittenToPersistentStore; + 
+StoragePlugins newPlugins = getNewStoragePlugins(); + +if (newPlugins != null) { + pluginsToBeWrittenToPersistentStore = new StoragePlugins(new HashMap<>()); + Optional.ofNullable(bootstrapPlugins) + .ifPresent(pluginsToBeWrittenToPersistentStore::putAll); + + for (Map.Entry newPlugin : newPlugins) { +String pluginName = newPlugin.getKey(); +StoragePluginConfig oldPluginConfig = Optional.ofNullable(bootstrapPlugins) +.map(plugins -> plugins.getConfig(pluginName)) +.orElse(persistentStore.get(pluginName)); +StoragePluginConfig updatedStatusPluginConfig = updatePluginStatus(oldPluginConfig, newPlugin.getValue()); +pluginsToBeWrittenToPersistentStore.put(pluginName, updatedStatusPluginConfig); + } +} else { + pluginsToBeWrittenToPersistentStore = bootstrapPlugins; +} + +// load pluginsToBeWrittenToPersistentStore to Persistent Store +Optional.ofNullable(pluginsToBeWrittenToPersistentStore) +.ifPresent(plugins -> plugins.forEach(plugin -> persistentStore.put(plugin.getKey(), plugin.getValue(; + } + + /** + * Helper method to identify the enabled status for new storage plugins config. If this status is absent in the updater + * file, the status is kept from the configs, which are going to be updated + * + * @param oldPluginConfig current storage plugin config from Persistent Store or bootstrap config file + * @param newPluginConfig new storage plugin config + * @return new storage plugin config with updated enabled status + */ + private StoragePluginConfig updatePluginStatus(@Nullable StoragePluginConfig oldPluginConfig, + @NotNull StoragePluginConfig newPluginConfig) { +if
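One subtlety worth noting in the `Optional` chain above: `.orElse(persistentStore.get(pluginName))` evaluates the store lookup eagerly, even when the bootstrap config is present, whereas `.orElseGet(...)` defers it until the `Optional` is actually empty. A stand-in demo (the `storeLookup` helper and counter are hypothetical, not Drill's `PersistentStore` API):

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class OrElseDemo {

  // Stand-in for a persistent-store read; counts how often it is hit.
  static String storeLookup(AtomicInteger lookups) {
    lookups.incrementAndGet();
    return "from-store";
  }

  // orElse: the argument is evaluated unconditionally, so the store is
  // consulted even when bootstrapValue is non-null.
  static String resolveEager(String bootstrapValue, AtomicInteger lookups) {
    return Optional.ofNullable(bootstrapValue).orElse(storeLookup(lookups));
  }

  // orElseGet: the supplier runs only when the Optional is empty.
  static String resolveLazy(String bootstrapValue, AtomicInteger lookups) {
    return Optional.ofNullable(bootstrapValue).orElseGet(() -> storeLookup(lookups));
  }

  public static void main(String[] args) {
    AtomicInteger eager = new AtomicInteger();
    AtomicInteger lazy = new AtomicInteger();
    resolveEager("from-bootstrap", eager); // store still hit: eager == 1
    resolveLazy("from-bootstrap", lazy);   // store skipped:  lazy == 0
    System.out.println(eager.get() + " " + lazy.get());
  }
}
```

For an in-memory map the extra lookup is harmless, but against a remote persistent store (e.g. ZooKeeper-backed) the eager form does one avoidable round trip per plugin.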
[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199306162 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.store.sys.PersistentStore; + + +/** + * Storage plugins handler is an additional service for updating storage plugins configs from the file + */ +public interface StoragePluginsHandler { + + /** + * Update incoming storage plugins configs from persistence store if present, otherwise bootstrap plugins configs. 
+ * Exactly one of the params should be null + * + * @param persistentStore the last storage plugins configs from persistence store + * @param bootstrapPlugins bootstrap storage plugins, which are used in case of first Drill start up + * @return all storage plugins, which should be loaded into persistence store + */ + void loadPlugins(PersistentStore persistentStore, StoragePlugins bootstrapPlugins); Review comment: It would be cleaner if `loadPlugins` accepted all the `StoragePluginConfig`s instead of the store and bootstrapPlugins. The handler should behave as if `loadPlugins` gets the existing plugins and then returns the updated plugins. The caller will decide how to pass the existing plugins, i.e. either from the store or the bootstrap file, and what to do with the updated plugins.
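The shape sohami suggests can be sketched with a plain `Map<String, Boolean>` standing in for `StoragePlugins` (all names below are hypothetical, not the PR's API): the handler becomes a pure transformation of configs, and the caller owns both the source of the input and the persistence of the output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reshaped handler: loadPlugins takes the existing configs
// (the caller chose bootstrap file or persistent store) and returns the
// updated configs (the caller decides how to persist them).
interface PluginConfigHandler {
  Map<String, Boolean> loadPlugins(Map<String, Boolean> existing);
}

public class HandlerSketch {

  // Toy handler standing in for "apply enabled-status overrides":
  // it force-disables the kafka plugin and leaves the rest untouched.
  static final PluginConfigHandler DISABLE_KAFKA = existing -> {
    Map<String, Boolean> updated = new HashMap<>(existing);
    updated.put("kafka", Boolean.FALSE);
    return updated;
  };

  static Map<String, Boolean> run(Map<String, Boolean> existing) {
    return DISABLE_KAFKA.loadPlugins(existing);
  }

  public static void main(String[] args) {
    Map<String, Boolean> bootstrap = new HashMap<>();
    bootstrap.put("kafka", Boolean.TRUE);
    bootstrap.put("dfs", Boolean.TRUE);
    System.out.println(run(bootstrap)); // kafka disabled, dfs untouched
  }
}
```

With this split there is no "exactly one param must be null" precondition to document at all, which is the cleanliness the review is after.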
[jira] [Created] (DRILL-6568) Jenkins Regression: TPCDS query 68 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
Robert Hou created DRILL-6568: - Summary: Jenkins Regression: TPCDS query 68 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Key: DRILL-6568 URL: https://issues.apache.org/jira/browse/DRILL-6568 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Khurram Faraaz Fix For: 1.14.0 This is TPCDS Query 68. Query: /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query68.sql SELECT c_last_name, c_first_name, ca_city, bought_city, ss_ticket_number, extended_price, extended_tax, list_price FROM (SELECT ss_ticket_number, ss_customer_sk, ca_city bought_city, Sum(ss_ext_sales_price) extended_price, Sum(ss_ext_list_price) list_price, Sum(ss_ext_tax) extended_tax FROM store_sales, date_dim, store, household_demographics, customer_address WHERE store_sales.ss_sold_date_sk = date_dim.d_date_sk AND store_sales.ss_store_sk = store.s_store_sk AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk AND store_sales.ss_addr_sk = customer_address.ca_address_sk AND date_dim.d_dom BETWEEN 1 AND 2 AND ( household_demographics.hd_dep_count = 8 OR household_demographics.hd_vehicle_count = 3 ) AND date_dim.d_year IN ( 1998, 1998 + 1, 1998 + 2 ) AND store.s_city IN ( 'Fairview', 'Midway' ) GROUP BY ss_ticket_number, ss_customer_sk, ss_addr_sk, ca_city) dn, customer, customer_address current_addr WHERE ss_customer_sk = c_customer_sk AND customer.c_current_addr_sk = current_addr.ca_address_sk AND current_addr.ca_city <> bought_city ORDER BY c_last_name, ss_ticket_number LIMIT 100; Here is the stack trace: 2018-06-29 07:00:32 INFO DrillTestLogger:348 - Exception: java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Fragment 0:0 [Error Id: edbe3477-805e-4f1f-8405-d5c194dc28c2 on atsqa6c87.qa.lab:31010] (java.lang.IllegalStateException) Unexpected EMIT 
outcome received in buildSchema phase org.apache.drill.exec.physical.impl.TopN.TopNBatch.buildSchema():178 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():87 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.physical.impl.BaseRootExec.next():103 org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 org.apache.drill.exec.physical.impl.BaseRootExec.next():93 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 
java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1904) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:64) at oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630) at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109) at
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199276235 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillIOException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file + * + * TODO: it can be improved with configs versioning and service of creating {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} + */ +public class StoragePluginsUpdater implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class); + + final DrillbitContext context; Review comment: Done
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198884749 ## File path: common/src/main/java/org/apache/drill/exec/metrics/DrillMetrics.java ## @@ -36,11 +36,11 @@ private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillMetrics.class); public static final boolean METRICS_JMX_OUTPUT_ENABLED = - SystemPropertyUtil.getBoolean("drill.metrics.jmx.enabled", true); + SystemPropertyUtil.getBoolean("drill.exec.metrics.jmx.enabled", true); Review comment: The issue comes from `drill-override-example.conf`: https://github.com/apache/drill/blob/master/distribution/src/resources/drill-override-example.conf#L22 https://github.com/apache/drill/blob/master/distribution/src/resources/drill-override-example.conf#L66 Since drill.apache.org documents the options without `exec` [link](https://drill.apache.org/docs/monitoring-metrics/#disabling-drill-metrics), I reverted the constant names and changed `drill-override-example.conf`
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199073387 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -194,25 +215,21 @@ public void init() throws DrillbitStartupException { * @param config plugin config * @param plugin plugin implementation */ - - public void definePlugin(String name, StoragePluginConfig config, StoragePlugin plugin) { -addPlugin(name, plugin); -definePluginConfig(name, config); - } - - private boolean definePluginConfig(String name, StoragePluginConfig config) { -return pluginSystemTable.putIfAbsent(name, config); + @VisibleForTesting + public void addPluginTonPersistenceStoreIfAbsent(String name, StoragePluginConfig config, StoragePlugin plugin) { Review comment: Done
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198892158 ## File path: distribution/src/resources/drill-override-example.conf ## @@ -251,6 +251,9 @@ drill.exec: { protocol: "TLSv1.2", #ssl provider. May be "JDK" or "OPENSSL". Default is "JDK" provider: "JDK" + }, + storage_plugins.update: { Review comment: The option is removed
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198932079 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); - } - - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ + StoragePlugins pluginsToBeWrittenToPersistenceStore = null; + final boolean isPluginsUpdaterEnabled = context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE); Review comment: the option is removed
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198883586 ## File path: common/src/main/java/org/apache/drill/common/exceptions/DrillIOException.java ## @@ -19,24 +19,23 @@ import java.io.IOException; -public class DrillIOException extends IOException{ +public class DrillIOException extends IOException { static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillIOException.class); public DrillIOException() { super(); } - public DrillIOException(String message, Throwable cause) { -super(message, cause); + public DrillIOException(Throwable cause, String formatMessage, Object...args) { Review comment: It makes sense. Done
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199073178 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -63,17 +65,15 @@ import com.google.common.cache.CacheLoader; import com.google.common.cache.LoadingCache; import com.google.common.cache.RemovalListener; -import com.google.common.cache.RemovalNotification; -import com.google.common.collect.Lists; -import com.google.common.collect.Maps; -import com.google.common.collect.Sets; import com.google.common.io.Resources; public class StoragePluginRegistryImpl implements StoragePluginRegistry { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginRegistryImpl.class); + /** Drill supported plugins */ private Map> availablePlugins = Collections.emptyMap(); - private final StoragePluginMap plugins = new StoragePluginMap(); + /** Enabled plugins */ Review comment: This is not a comment; it is a one-line JavaDoc for the variable. I have changed the format to a three-line JavaDoc
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199073307 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); - } - - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ + StoragePlugins pluginsToBeWrittenToPersistenceStore = null; + final boolean isPluginsUpdaterEnabled = context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE); if (!pluginSystemTable.getAll().hasNext()) { -// bootstrap load the config since no plugins are stored. 
-logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); -Collection urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); -if (urls != null && !urls.isEmpty()) { - logger.info("Loading the storage plugin configs from URLs {}.", urls); - Map pluginURLMap = Maps.newHashMap(); - for (URL url : urls) { -String pluginsData = Resources.toString(url, Charsets.UTF_8); -StoragePlugins plugins = lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class); -for (Map.Entry config : plugins) { - if (!definePluginConfig(config.getKey(), config.getValue())) { -logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.", -config.getKey(), pluginURLMap.get(config.getKey()), url); -continue; - } - pluginURLMap.put(config.getKey(), url); -} - } -} else { - throw new IOException("Failure finding " + ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE); +// create registered plugins defined in "storage-plugins.json" +StoragePlugins bootstrapPlugins = readBootstrapStoragePlugins(); +if (isPluginsUpdaterEnabled) { + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsUpdater(context); + pluginsToBeWrittenToPersistenceStore = storagePluginsUpdaterService.updatePlugins(bootstrapPlugins, null); +} +if (pluginsToBeWrittenToPersistenceStore == null) { + pluginsToBeWrittenToPersistenceStore = bootstrapPlugins; +} + } else { +if (isPluginsUpdaterEnabled) { + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsUpdater(context); + pluginsToBeWrittenToPersistenceStore = storagePluginsUpdaterService.updatePlugins(null, pluginSystemTable); } } - Map activePlugins = new HashMap(); - for (Map.Entry entry : Lists.newArrayList(pluginSystemTable.getAll())) { -String name = entry.getKey(); -StoragePluginConfig config = entry.getValue(); -if (config.isEnabled()) { - try { -StoragePlugin plugin = create(name, config); -activePlugins.put(name, 
plugin); - } catch (ExecutionSetupException e) { -logger.error("Failure while setting up StoragePlugin with name: '{}', disabling.", name, e); -config.setEnabled(false); -pluginSystemTable.put(name, config); - } + // load plugins to persistence store + if (pluginsToBeWrittenToPersistenceStore != null) { +for (Entry plugin : pluginsToBeWrittenToPersistenceStore) { + pluginSystemTable.put(plugin.getKey(), plugin.getValue()); } } - activePlugins.put(INFORMATION_SCHEMA_PLUGIN, new InfoSchemaStoragePlugin(new InfoSchemaConfig(), context, - INFORMATION_SCHEMA_PLUGIN)); - activePlugins.put(SYS_PLUGIN, new SystemTablePlugin(SystemTablePluginConfig.INSTANCE, context, SYS_PLUGIN)); - - return activePlugins; + defineEnabledPlugins(); } catch (IOException e) { - logger.error("Failure setting up storage plugins. Drillbit exiting.", e); + logger.error("Failure setting up storage enabledPlugins. Drillbit exiting.", e); throw new IllegalStateException(e); } } + /** + * Read bootstrap storage plugins {@link ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh + * instantiating of Drill + * + * @return bootstrap storage plugins + * @throws
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199075637 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.store; + +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.store.sys.PersistentStore; + + +/** + * Storage plugins handler is an additional service for updating storage plugins configs from the file + */ +public interface StoragePluginsHandler { + + /** + * Update incoming storage plugins configs from persistence store if present, otherwise bootstrap plugins configs. 
+ * One of the params should be null, second shouldn't + * + * @param bootstrapPlugins bootstrap storage plugins, which are used in case of first Drill start up + * @param persistentStore the last storage plugins configs from persistence store + * @return all storage plugins, which should be loaded into persistence store + */ + StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore persistentStore); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
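The two call modes this javadoc describes (bootstrap file on a fresh start, persistent store afterwards) can be sketched in isolation. Note that `PluginConfigs`, `PluginInitSketch`, and the string-valued configs below are simplified illustrative stand-ins, not Drill's actual `StoragePlugins`/`StoragePluginsHandler` types:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Drill's StoragePlugins container (illustrative only).
class PluginConfigs {
    final Map<String, String> configs = new HashMap<>();
}

class PluginInitSketch {

    // Mirrors the updatePlugins(bootstrapPlugins, persistentStore) contract from
    // the javadoc above: exactly one of the two sources is non-null.
    static PluginConfigs updatePlugins(PluginConfigs bootstrap, Map<String, String> store) {
        if ((bootstrap == null) == (store == null)) {
            throw new IllegalArgumentException("Exactly one source must be non-null");
        }
        PluginConfigs result = new PluginConfigs();
        if (bootstrap != null) {
            result.configs.putAll(bootstrap.configs); // first start: seed from bootstrap file
        } else {
            result.configs.putAll(store);             // later starts: seed from persistent store
        }
        return result;
    }

    // Registry-side dispatch: an empty persistent store means a fresh Drill start.
    static PluginConfigs init(Map<String, String> persistentStore, PluginConfigs bootstrap) {
        return persistentStore.isEmpty()
            ? updatePlugins(bootstrap, null)
            : updatePlugins(null, persistentStore);
    }
}
```

The `init` dispatch mirrors the registry's check of whether the storage plugins system table already has entries.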
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199079357 ## File path: distribution/src/deb/control/conffiles ## @@ -16,3 +16,4 @@ /etc/drill/conf/drill-override.conf /etc/drill/conf/logback.xml /etc/drill/conf/drill-env.sh +/etc/drill/conf/storage-plugins-updates.conf Review comment: It should be removed. Thanks for catching this.
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199076849 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillIOException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file + * + * TODO: it can be improved with configs versioning and service of creating {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} + */ +public class StoragePluginsUpdater implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class); + + final DrillbitContext context; + final LogicalPlanPersistence hoconLogicalPlanPersistence; + + public StoragePluginsUpdater(DrillbitContext context) { +this.context = context; +hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(), +new ObjectMapper(new HoconFactory())); + } + + @Override + public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore persistentStore) { +StoragePlugins plugins = null; +Set urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false); +try { + if (!urlSet.isEmpty()) { +if (urlSet.size() 
!= 1) { + throw new DrillIOException("More than one %s file is placed in Drill's classpath", Review comment: Done
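The guard under review (fail when the classpath contributes more than one updater file, since it would be ambiguous which one wins) can be sketched independently of Drill's `ClassPathScanner`. `requireSingleResource` is a hypothetical helper, made generic here, and it throws an unchecked exception where Drill's code throws a checked `DrillIOException`:

```java
import java.util.Set;

class ResourceGuard {

    // Returns the single classpath hit for a resource, or null when it is absent.
    // Drill's version works on Set<URL> and throws a checked DrillIOException;
    // this sketch is generic and unchecked to keep the example short.
    static <T> T requireSingleResource(Set<T> hits, String resourceName) {
        if (hits.isEmpty()) {
            return null; // no updater file on the classpath: proceed without updating
        }
        if (hits.size() != 1) {
            throw new IllegalStateException(String.format(
                "More than one %s file is placed in Drill's classpath", resourceName));
        }
        return hits.iterator().next();
    }
}
```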
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199073271 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); - } - - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ + StoragePlugins pluginsToBeWrittenToPersistenceStore = null; + final boolean isPluginsUpdaterEnabled = context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE); if (!pluginSystemTable.getAll().hasNext()) { -// bootstrap load the config since no plugins are stored. 
-logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration."); -Collection urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false); -if (urls != null && !urls.isEmpty()) { - logger.info("Loading the storage plugin configs from URLs {}.", urls); - Map pluginURLMap = Maps.newHashMap(); - for (URL url : urls) { -String pluginsData = Resources.toString(url, Charsets.UTF_8); -StoragePlugins plugins = lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class); -for (Map.Entry config : plugins) { - if (!definePluginConfig(config.getKey(), config.getValue())) { -logger.warn("Duplicate plugin instance '{}' defined in [{}, {}], ignoring the later one.", -config.getKey(), pluginURLMap.get(config.getKey()), url); -continue; - } - pluginURLMap.put(config.getKey(), url); -} - } -} else { - throw new IOException("Failure finding " + ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE); +// create registered plugins defined in "storage-plugins.json" +StoragePlugins bootstrapPlugins = readBootstrapStoragePlugins(); +if (isPluginsUpdaterEnabled) { + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsUpdater(context); + pluginsToBeWrittenToPersistenceStore = storagePluginsUpdaterService.updatePlugins(bootstrapPlugins, null); +} +if (pluginsToBeWrittenToPersistenceStore == null) { + pluginsToBeWrittenToPersistenceStore = bootstrapPlugins; +} + } else { +if (isPluginsUpdaterEnabled) { + StoragePluginsHandler storagePluginsUpdaterService = new StoragePluginsUpdater(context); + pluginsToBeWrittenToPersistenceStore = storagePluginsUpdaterService.updatePlugins(null, pluginSystemTable); } } - Map activePlugins = new HashMap(); - for (Map.Entry entry : Lists.newArrayList(pluginSystemTable.getAll())) { -String name = entry.getKey(); -StoragePluginConfig config = entry.getValue(); -if (config.isEnabled()) { - try { -StoragePlugin plugin = create(name, config); -activePlugins.put(name, 
plugin); - } catch (ExecutionSetupException e) { -logger.error("Failure while setting up StoragePlugin with name: '{}', disabling.", name, e); -config.setEnabled(false); -pluginSystemTable.put(name, config); - } + // load plugins to persistence store Review comment: done
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199276413 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillIOException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file + * + * TODO: it can be improved with configs versioning and service of creating {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} + */ +public class StoragePluginsUpdater implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class); + + final DrillbitContext context; + final LogicalPlanPersistence hoconLogicalPlanPersistence; + + public StoragePluginsUpdater(DrillbitContext context) { +this.context = context; +hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(), +new ObjectMapper(new HoconFactory())); + } + + @Override + public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore persistentStore) { +StoragePlugins plugins = null; +Set urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false); +try { + if (!urlSet.isEmpty()) { +if (urlSet.size() 
!= 1) { + throw new DrillIOException("More than one %s file is placed in Drill's classpath", + CommonConstants.STORAGE_PLUGINS_UPDATER_FILE); +} +URL fileUrl = urlSet.iterator().next(); +plugins = new StoragePlugins(new HashMap<>()); +if (bootstrapPlugins != null) { + plugins.putAll(bootstrapPlugins); +} + +String newPluginsData = Resources.toString(fileUrl, Charsets.UTF_8); +StoragePlugins newPlugins = hoconLogicalPlanPersistence.getMapper().readValue(newPluginsData, StoragePlugins.class); + +for (Map.Entry newPlugin : newPlugins) { + String pluginName = newPlugin.getKey(); + StoragePluginConfig oldPluginConfig = persistentStore == null ? bootstrapPlugins.getConfig(pluginName) : + persistentStore.get(pluginName); + StoragePluginConfig updatedStatusPluginConfig = + determineUndefinedEnabledStatus(oldPluginConfig, newPlugin.getValue()); + plugins.put(pluginName, updatedStatusPluginConfig); +} + } +} catch (IOException e) { + logger.error("Failures are obtained while updating storage plugins from %s file. Proceed without updating", Review comment: Done
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198886347 ## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveSchemaFactory.java ## @@ -72,29 +71,27 @@ public HiveSchemaFactory(final HiveStoragePlugin plugin, final String name, fina isDrillImpersonationEnabled = plugin.getContext().getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED); try { + // TODO: DRILL-6412. Clients for plugin should be instantiated only for the case, when plugin is enabled processUserMetastoreClient = DrillHiveMetaStoreClient.createCloseableClientWithCaching(hiveConf); -} catch (MetaException e) { - throw new ExecutionSetupException("Failure setting up Hive metastore client.", e); -} -metaStoreClientLoadingCache = CacheBuilder -.newBuilder() -.expireAfterAccess(10, TimeUnit.MINUTES) -.maximumSize(5) // Up to 5 clients for impersonation-enabled. -.removalListener(new RemovalListener() { - @Override - public void onRemoval(RemovalNotification notification) { + metaStoreClientLoadingCache = CacheBuilder + .newBuilder() + .expireAfterAccess(10, TimeUnit.MINUTES) + .maximumSize(5) // Up to 5 clients for impersonation-enabled. + .removalListener((RemovalListener) notification -> { DrillHiveMetaStoreClient client = notification.getValue(); client.close(); - } -}) -.build(new CacheLoader() { - @Override - public DrillHiveMetaStoreClient load(String userName) throws Exception { -return DrillHiveMetaStoreClient.createClientWithAuthz(processUserMetastoreClient, hiveConf, userName); - } -}); + }) + .build(new CacheLoader() { +@Override +public DrillHiveMetaStoreClient load(String userName) throws Exception { + return DrillHiveMetaStoreClient.createClientWithAuthz(processUserMetastoreClient, hiveConf, userName); +} + }); +} catch (MetaException e) { Review comment: If the exception is obtained the further logic from constructor doesn't make sense. 
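The Guava `CacheBuilder` shown in the `HiveSchemaFactory` diff keeps at most five impersonated metastore clients and closes each evicted client via the `removalListener`. That design — a small, bounded, per-user client cache that closes rather than leaks evicted clients — can be sketched with only the JDK; `FakeClient` and `ClientCache` below are illustrative stand-ins (not Drill classes), and the sketch omits the time-based `expireAfterAccess` eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for a closeable per-user metastore client.
class FakeClient {
    boolean closed = false;
    void close() { closed = true; }
}

// Access-ordered LRU map capped at maxSize; evicted clients are closed,
// mirroring the removalListener in the Guava cache above.
class ClientCache extends LinkedHashMap<String, FakeClient> {
    private final int maxSize;

    ClientCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true -> LRU semantics
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, FakeClient> eldest) {
        if (size() > maxSize) {
            eldest.getValue().close(); // close, don't leak, the evicted client
            return true;
        }
        return false;
    }

    FakeClient getOrCreate(String userName) {
        return computeIfAbsent(userName, u -> new FakeClient());
    }
}
```

Guava's `maximumSize(5)` plus the removal listener gives this same close-on-evict behavior, with thread safety and timed expiry on top, which is why the real code uses it.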
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199270039 ## File path: distribution/src/resources/storage-plugins-example.conf ## @@ -0,0 +1,66 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This file involves storage plugins configs, which can be updated on the Drill start-up. This is controlled with a +# "drill.exec.storage_plugins.update.enabled" boot option, see https://drill.apache.org/docs/start-up-options/. +# This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information. + + "storage":{ +cp: { + type: "file", + connection: "classpath:///", + formats: { +"csv" : { + type: "text", + extensions: [ "csv" ], + delimiter: "," +} + } +} + } + "storage":{ +dfs: { + type: "file", + enabled: false, Review comment: Done
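One HOCON detail worth noting about this example file: when the same key appears more than once with object values, HOCON merges the objects rather than rejecting the file, so the repeated `"storage"` blocks act as a single `storage` object. Under that reading the file is equivalent to one merged block (abbreviated here, with plugin bodies elided as `...`):

```hocon
"storage": {
  cp:    { type: "file", connection: "classpath:///", formats: { ... } }
  dfs:   { type: "file", enabled: false, connection: "hdfs:///", workspaces: { ... }, formats: { ... } }
  mongo: { type: "mongo", enabled: true }
}
```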
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199093883 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java ## @@ -25,6 +25,14 @@ public class NamedStoragePluginConfig extends StoragePluginConfig { public String name; + public String getName() { +return name; + } + + public void setName(String name) { +this.name = name; + } + Review comment: This is not my code, so I hadn't paid attention to it before. I have edited these methods.
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199266170 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java ## @@ -83,15 +84,13 @@ * @param name * @param plugin */ - void addPlugin(String name, StoragePlugin plugin); + void addPluginToEnabled(String name, StoragePlugin plugin); Review comment: It is not the same: this method adds a single plugin to the existing map of enabled plugins. I suppose `addEnabledPlugin` looks good.
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199276671 ## File path: exec/java-exec/src/main/resources/drill-module.conf ## @@ -393,7 +393,11 @@ drill.exec: { //port hunting for drillbits. Enabled only for testing purposes. port_hunt : false, // Allow drillbit to bind to loopback address in distributed mode. Enabled only for testing purposes. - allow_loopback_address_binding : false + allow_loopback_address_binding : false, + // Allow overwrite storage plugins configs from the storage-plugins.conf file in the process of drillbit starting + storage_plugins.update: { +enabled: true Review comment: the option is removed
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199276599 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.common.base.Charsets; +import com.google.common.io.Resources; +import com.jasonclawson.jackson.dataformat.hocon.HoconFactory; +import org.apache.drill.common.config.CommonConstants; +import org.apache.drill.common.config.LogicalPlanPersistence; +import org.apache.drill.common.exceptions.DrillIOException; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.common.scanner.ClassPathScanner; +import org.apache.drill.exec.planner.logical.StoragePlugins; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.sys.PersistentStore; + +import javax.validation.constraints.NotNull; +import java.io.IOException; +import java.net.URL; +import java.util.HashMap; +import java.util.Map; +import java.util.Set; + +/** + * Drill plugins handler, which allows to update storage plugins configs from the + * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file + * + * TODO: it can be improved with configs versioning and service of creating {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} + */ +public class StoragePluginsUpdater implements StoragePluginsHandler { + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class); + + final DrillbitContext context; + final LogicalPlanPersistence hoconLogicalPlanPersistence; + + public StoragePluginsUpdater(DrillbitContext context) { +this.context = context; +hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(), +new ObjectMapper(new HoconFactory())); + } + + @Override + public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore persistentStore) { +StoragePlugins plugins = null; +Set urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false); +try { + if (!urlSet.isEmpty()) { +if (urlSet.size() 
!= 1) { + throw new DrillIOException("More than one %s file is placed in Drill's classpath", + CommonConstants.STORAGE_PLUGINS_UPDATER_FILE); +} +URL fileUrl = urlSet.iterator().next(); +plugins = new StoragePlugins(new HashMap<>()); +if (bootstrapPlugins != null) { + plugins.putAll(bootstrapPlugins); +} + +String newPluginsData = Resources.toString(fileUrl, Charsets.UTF_8); +StoragePlugins newPlugins = hoconLogicalPlanPersistence.getMapper().readValue(newPluginsData, StoragePlugins.class); + +for (Map.Entry newPlugin : newPlugins) { + String pluginName = newPlugin.getKey(); + StoragePluginConfig oldPluginConfig = persistentStore == null ? bootstrapPlugins.getConfig(pluginName) : + persistentStore.get(pluginName); + StoragePluginConfig updatedStatusPluginConfig = + determineUndefinedEnabledStatus(oldPluginConfig, newPlugin.getValue()); + plugins.put(pluginName, updatedStatusPluginConfig); +} + } +} catch (IOException e) { + logger.error("Failures are obtained while updating storage plugins from %s file. Proceed without updating", + CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, e); + return bootstrapPlugins; +} +return plugins; + } + + /** + * Helper method to identify the enabled status for new storage plugins config. If this status is absent in the updater + * file, the status is kept from the configs, which are going to be updated + * + * @param oldPluginConfig current storage plugin config from persistence store or bootstrap config file + * @param newPluginConfig new storage plugin config + * @return new storage plugin config with updated enabled status + */ + private
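The rule documented in the (truncated) javadoc above — keep the old enabled status when the updater file does not set one — can be sketched as follows. `Cfg` with a nullable `Boolean enabled` is a simplified stand-in for `StoragePluginConfig`, and the default-to-disabled branch for brand-new plugins is an assumption of this sketch, not confirmed Drill behavior:

```java
// Simplified stand-in for StoragePluginConfig: 'enabled' is null when the
// updater file does not specify a status.
class Cfg {
    Boolean enabled;
    Cfg(Boolean enabled) { this.enabled = enabled; }
}

class EnabledStatusMerge {

    // If the new config leaves 'enabled' undefined, inherit it from the old
    // config (persistent store or bootstrap); a brand-new plugin with no old
    // config defaults to disabled here (an assumption for this sketch).
    static Cfg merge(Cfg oldCfg, Cfg newCfg) {
        if (newCfg.enabled == null) {
            newCfg.enabled = (oldCfg != null && Boolean.TRUE.equals(oldCfg.enabled));
        }
        return newCfg;
    }
}
```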
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199104274 ## File path: distribution/src/resources/storage-plugins-example.conf ## @@ -0,0 +1,66 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This file involves storage plugins configs, which can be updated on the Drill start-up. This is controlled with a +# "drill.exec.storage_plugins.update.enabled" boot option, see https://drill.apache.org/docs/start-up-options/. +# This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information. + + "storage":{ +cp: { + type: "file", + connection: "classpath:///", + formats: { +"csv" : { + type: "text", + extensions: [ "csv" ], + delimiter: "," +} + } +} + } + "storage":{ +dfs: { + type: "file", + enabled: false, + connection: "hdfs:///", + workspaces: { +"root": { + "location": "/", + "writable": false, + "defaultInputFormat": null, + "allowAccessOutsideWorkspace": false +} + }, + formats: { +"parquet": { + "type": "parquet" +} + } +} + } + "storage":{ +mongo : { + type:"mongo", + enabled: true, Review comment: Actually this is the same. 
https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json#L5 I have edited it here and in all bootstrap-storage-plugins for consistency.
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198882971 ## File path: common/src/main/java/org/apache/drill/common/config/CommonConstants.java ## @@ -31,4 +31,7 @@ /** Override configuration file name. (Classpath resource pathname.) */ String CONFIG_OVERRIDE_RESOURCE_PATHNAME = "drill-override.conf"; + /** Override plugins configs file name. (Classpath resource pathname.) */ + String STORAGE_PLUGINS_UPDATER_FILE = "storage-plugins.conf"; Review comment: looks good. Done
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198931779 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ## @@ -248,6 +248,12 @@ private ExecConstants() { */ public static final String DEFAULT_TEMPORARY_WORKSPACE = "drill.exec.default_temporary_workspace"; + /** + * Storage plugins configs will be updated in the case, when + * {@link org.apache.drill.common.config.CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} file is present + */ + public static final String STORAGE_PLUGINS_UPDATE = "drill.exec.storage_plugins.update.enabled"; Review comment: The option isn't really helpful. Removed.
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198928608 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginMap.java ## @@ -111,7 +112,7 @@ public StoragePlugin get(String name) { return nameMap.entrySet().iterator(); } - public Iterable names() { + public Set names() { Review comment: At least `Collection` is needed to create a new Set from the existing one. `Set` is required for plugin names, since it guarantees there are no duplicate names. I have renamed the method to `getNames()` and added javadoc.
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199077537 ## File path: logical/src/main/java/org/apache/drill/common/config/LogicalPlanPersistence.java ## @@ -37,12 +37,12 @@ public class LogicalPlanPersistence { private ObjectMapper mapper; Review comment: Done
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198928698 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java ## @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) throws Exception { } @Override - public void init() throws DrillbitStartupException { + public void init() { availablePlugins = findAvailablePlugins(classpathScan); -// create registered plugins defined in "storage-plugins.json" -plugins.putAll(createPlugins()); - } - - @SuppressWarnings("resource") - private Map createPlugins() throws DrillbitStartupException { try { - /* - * Check if the storage plugins system table has any entries. If not, load the boostrap-storage-plugin file into - * the system table. - */ + StoragePlugins pluginsToBeWrittenToPersistenceStore = null; Review comment: renamed and the variable is placed in `StoragePluginsHandlerService` class
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r199105919 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java ## @@ -25,6 +25,14 @@ public class NamedStoragePluginConfig extends StoragePluginConfig { public String name; + public String getName() { Review comment: Good suggestion, done.
[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#discussion_r198926537 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java ## @@ -83,15 +84,13 @@ * @param name Review comment: Done
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199230386 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + Review comment: We can get rid of 'push' if you'd like. 'build' is good enough for most purposes. I'm not sure about 'package'.
[GitHub] kkhatua commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations
kkhatua commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations URL: https://github.com/apache/drill/pull/1350#issuecomment-401503825 I'd be more comfortable with single plugins to avoid any possible corruption (e.g. format, etc) as debugging could be a nightmare.
[GitHub] ppadma commented on issue #1324: DRILL-6310: limit batch size for hash aggregate
ppadma commented on issue #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#issuecomment-401498074 @Ben-Zvi Thank you Boaz for the thorough review and giving me a +1 finally.
[jira] [Created] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
Robert Hou created DRILL-6567: - Summary: Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException. Key: DRILL-6567 URL: https://issues.apache.org/jira/browse/DRILL-6567 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Pritesh Maker Fix For: 1.14.0 This is TPCDS Query 93. Query: /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql SELECT ss_customer_sk, Sum(act_sales) sumsales FROM (SELECT ss_item_sk, ss_ticket_number, ss_customer_sk, CASE WHEN sr_return_quantity IS NOT NULL THEN ( ss_quantity - sr_return_quantity ) * ss_sales_price ELSE ( ss_quantity * ss_sales_price ) END act_sales FROM store_sales LEFT OUTER JOIN store_returns ON ( sr_item_sk = ss_item_sk AND sr_ticket_number = ss_ticket_number ), reason WHERE sr_reason_sk = r_reason_sk AND r_reason_desc = 'reason 38') t GROUP BY ss_customer_sk ORDER BY sumsales, ss_customer_sk LIMIT 100; Here is the stack trace: 2018-06-29 07:00:32 INFO DrillTestLogger:348 - Exception: java.sql.SQLException: INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException Setup failed for null Fragment 4:56 [Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010] (org.apache.drill.common.exceptions.ExecutionSetupException) java.lang.reflect.UndeclaredThrowableException org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30 org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327 org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245 org.apache.drill.exec.physical.impl.ScanBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276 org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238 org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147 org.apache.drill.exec.record.AbstractRecordBatch.next():172 org.apache.drill.exec.record.AbstractRecordBatch.next():119 org.apache.drill.exec.record.AbstractRecordBatch.next():109 org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118 org.apache.drill.exec.record.AbstractRecordBatch.next():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():103 org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152 org.apache.drill.exec.physical.impl.BaseRootExec.next():93 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 java.security.AccessController.doPrivileged():-2 javax.security.auth.Subject.doAs():422 org.apache.hadoop.security.UserGroupInformation.doAs():1595 org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 
org.apache.drill.common.SelfCleaningRunnable.run():38 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 Caused By (java.util.concurrent.ExecutionException) java.lang.reflect.UndeclaredThrowableException java.util.concurrent.FutureTask.report():122 java.util.concurrent.FutureTask.get():192 org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():320 org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245 org.apache.drill.exec.physical.impl.ScanBatch.next():164 org.apache.drill.exec.record.AbstractRecordBatch.next():119
[jira] [Created] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.
Robert Hou created DRILL-6566: - Summary: Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase. Key: DRILL-6566 URL: https://issues.apache.org/jira/browse/DRILL-6566 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Pritesh Maker Fix For: 1.14.0 This is TPCDS Query 66. Query: /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql SELECT w_warehouse_name, w_warehouse_sq_ft, w_city, w_county, w_state, w_country, ship_carriers, year1, Sum(jan_sales) AS jan_sales, Sum(feb_sales) AS feb_sales, Sum(mar_sales) AS mar_sales, Sum(apr_sales) AS apr_sales, Sum(may_sales) AS may_sales, Sum(jun_sales) AS jun_sales, Sum(jul_sales) AS jul_sales, Sum(aug_sales) AS aug_sales, Sum(sep_sales) AS sep_sales, Sum(oct_sales) AS oct_sales, Sum(nov_sales) AS nov_sales, Sum(dec_sales) AS dec_sales, Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, Sum(jan_net) AS jan_net, Sum(feb_net) AS feb_net, Sum(mar_net) AS mar_net, Sum(apr_net) AS apr_net, Sum(may_net) AS may_net, Sum(jun_net) AS jun_net, Sum(jul_net) AS jul_net, Sum(aug_net) AS aug_net, Sum(sep_net) AS 
sep_net, Sum(oct_net) AS oct_net, Sum(nov_net) AS nov_net, Sum(dec_net) AS dec_net FROM (SELECT w_warehouse_name, w_warehouse_sq_ft, w_city, w_county, w_state, w_country, 'ZOUROS' || ',' || 'ZHOU' AS ship_carriers, d_yearAS year1, Sum(CASE WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS jan_sales, Sum(CASE WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS feb_sales, Sum(CASE WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS mar_sales, Sum(CASE WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS apr_sales, Sum(CASE WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS may_sales, Sum(CASE WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS jun_sales, Sum(CASE WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS jul_sales, Sum(CASE WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS aug_sales, Sum(CASE WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS sep_sales, Sum(CASE WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS oct_sales, Sum(CASE WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS nov_sales, Sum(CASE WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity ELSE 0 END) AS dec_sales, Sum(CASE WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS jan_net, Sum(CASE WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS feb_net, Sum(CASE WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS mar_net, Sum(CASE WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS apr_net, Sum(CASE WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS may_net, Sum(CASE WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS jun_net, Sum(CASE WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS jul_net, Sum(CASE WHEN d_moy = 8 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS aug_net, 
Sum(CASE WHEN d_moy = 9 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS sep_net, Sum(CASE WHEN d_moy = 10 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS oct_net, Sum(CASE WHEN d_moy = 11 THEN ws_net_paid_inc_ship * ws_quantity ELSE 0 END) AS nov_net, Sum(CASE WHEN d_moy = 12 THEN
[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r199289952 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java ## @@ -694,7 +694,7 @@ public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode, } htIdxHolder.value = currentIdx; return addedBatch ? PutStatus.NEW_BATCH_ADDED : -(freeIndex + 1 > totalIndexSize) ? +(freeIndex + 1 > prevIndexSize + batchHolders.get(batchHolders.size()-1).getTargetBatchRowCount()) ? Review comment: `prevIndexSize` is used in two places: here and when checking whether a new batch is needed. Both use it the same way (by adding the size of the last batch), and both are part of the **HOT** code path, called for every row. So if this variable could be set to already include the size of the last batch, a lot of computation would be saved. Would that be simple? We would need to know the size of the last batch when setting this variable.
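The hoisting suggested above can be sketched as follows (a hypothetical, simplified illustration; `IndexBound`, `onBatchesChanged`, and `isBatchFull` are assumed names, not the actual `HashTableTemplate` code):

```java
// Sketch of caching the hot-path threshold: instead of computing
// prevIndexSize + lastBatch.getTargetBatchRowCount() on every put() call
// (the hot path), cache the sum and refresh it only when the set of batch
// holders changes (the cold path).
class IndexBound {
    private int totalIndexSize; // prevIndexSize + row count of the last batch

    // Cold path: called only when a batch holder is added or removed.
    void onBatchesChanged(int prevIndexSize, int lastBatchRowCount) {
        this.totalIndexSize = prevIndexSize + lastBatchRowCount;
    }

    // Hot path: a single comparison per row, no per-row re-addition.
    boolean isBatchFull(int freeIndex) {
        return freeIndex + 1 > totalIndexSize;
    }
}
```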
[GitHub] priteshm commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality
priteshm commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality URL: https://github.com/apache/drill/pull/1330#issuecomment-401478271 I do see that a similar approach is followed for other operators for batch sizing. I do agree with @ilooner that it would be good to discuss alternative proposals on a separate JIRA.
[GitHub] priteshm commented on issue #1272: DRILL-5977: Filter Pushdown in Drill-Kafka plugin
priteshm commented on issue #1272: DRILL-5977: Filter Pushdown in Drill-Kafka plugin URL: https://github.com/apache/drill/pull/1272#issuecomment-401474974 @akumarb2010 is this ready to be committed?
[jira] [Created] (DRILL-6565) cume_dist does not return enough rows
Robert Hou created DRILL-6565: - Summary: cume_dist does not return enough rows Key: DRILL-6565 URL: https://issues.apache.org/jira/browse/DRILL-6565 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Pritesh Maker Attachments: drillbit.log.7802 This query should return 64 rows but only returns 38 rows: alter session set `planner.width.max_per_node` = 1; alter session set `planner.width.max_per_query` = 1; select * from ( select cume_dist() over (order by Index) IntervalSecondValuea, Index from (select * from dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet` order by BigIntvalue)) d where d.Index = 1; I tried to reproduce the problem by using a smaller table, but it does not reproduce. I tried to reproduce the problem without the outside select statement, but it does not reproduce. Here is the explain plan: {noformat}
| 00-00 Screen : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {757200.0 rows, 1.1573335922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4034
00-01 ProjectAllowDup(IntervalSecondValuea=[$0], Index=[$1]) : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, cumulative cost = {756000.0 rows, 1.1572135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4033
00-02 Project(w0$o0=[$1], $0=[$0]) : rowType = RecordType(DOUBLE w0$o0, ANY $0): rowcount = 12000.0, cumulative cost = {744000.0 rows, 1.1548135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4032
00-03 SelectionVectorRemover : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {732000.0 rows, 1.1524135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4031
00-04 Filter(condition=[=($0, 1)]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {72.0 rows, 1.1512135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4030
00-05 Window(window#0=[window(partition {} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [CUME_DIST()])]) : rowType = RecordType(ANY $0, DOUBLE w0$o0): rowcount = 8.0, cumulative cost = {64.0 rows, 1.1144135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4029
00-06 SelectionVectorRemover : rowType = RecordType(ANY $0): rowcount = 8.0, cumulative cost = {56.0 rows, 1.0984135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4028
00-07 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY $0): rowcount = 8.0, cumulative cost = {48.0 rows, 1.0904135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4027
00-08 Project($0=[ITEM($0, 'Index')]) : rowType = RecordType(ANY $0): rowcount = 8.0, cumulative cost = {40.0 rows, 5692067.961455824 cpu, 0.0 io, 0.0 network, 128.0 memory}, id = 4026
00-09 SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, cumulative cost = {32.0 rows, 5612067.961455824 cpu, 0.0 io, 0.0 network, 128.0 memory}, id = 4025
00-10 Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, cumulative cost = {24.0 rows, 5532067.961455824 cpu, 0.0 io, 0.0 network, 128.0 memory}, id = 4024
00-11 Project(T2¦¦**=[$0], BigIntvalue=[$1]) : rowType = RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, cumulative cost = {16.0 rows, 32.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4023
00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet]], selectionRoot=maprfs:/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet, numFiles=1, numRowGroups=6, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY BigIntvalue): rowcount = 8.0, cumulative cost = {8.0 rows, 16.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4022
{noformat} I have attached the drillbit.log. 
The commit id is: | 1.14.0-SNAPSHOT | aa127b70b1e46f7f4aa19881f25eda583627830a | DRILL-6523: Fix NPE for describe of partial schema | 22.06.2018 @ 11:28:23 PDT | r...@mapr.com | 23.06.2018 @ 02:05:10 PDT | fourvarchar_asc_nulls95.q -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r199274012 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java ## @@ -568,8 +576,10 @@ private void retryAfterOOM(boolean batchAdded) throws RetryAfterSpillException { if ( batchAdded ) { logger.trace("OOM - Removing index {} from the batch holders list",batchHolders.size() - 1); BatchHolder bh = batchHolders.remove(batchHolders.size() - 1); + prevIndexSize = batchHolders.size() > 1 ? (batchHolders.size()-1) * BATCH_SIZE : 0; + totalIndexSize = batchHolders.size() * BATCH_SIZE; // update freeIndex to point to end of last batch + 1 - freeIndex = batchHolders.size() * BATCH_SIZE + 1; + freeIndex = totalIndexSize + 1; Review comment: No need for "+ 1" -- the free index starts at batch offset of zero.
[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r199274859 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java ## @@ -684,7 +694,7 @@ public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode, } htIdxHolder.value = currentIdx; return addedBatch ? PutStatus.NEW_BATCH_ADDED : -(freeIndex + 1 > (batchHolders.size() * BATCH_SIZE)) ? +(freeIndex + 1 > totalIndexSize) ? Review comment: Problem: the original code "overflew" the (64K - 1) limit by adding one, hence this check returned KEY_ADDED_LAST, which signaled to the caller that a batch had just been filled -- a good opportunity to check whether a spill is needed (since a new batch would surely be allocated soon). With batch sizes below 64K this no longer happens, so the code misses some of the "is a spill needed now" checks.
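The boundary signal described above can be illustrated with a small sketch (hypothetical and heavily simplified from the real `put()` logic; `PutStatusSketch`, its parameters, and the fixed 64K constant are assumptions for illustration):

```java
// Simplified illustration of the batch-boundary signal discussed above.
// With a fixed BATCH_SIZE of 64K slots per batch, freeIndex + 1 crosses the
// total-capacity boundary exactly when a batch has just been filled, so the
// caller receives KEY_ADDED_LAST and can check whether a spill is needed.
class PutStatusSketch {
    static final int BATCH_SIZE = 1 << 16; // 64K slots per batch

    enum PutStatus { KEY_ADDED, KEY_ADDED_LAST }

    // freeIndex is the next free index after the key was added; numBatches *
    // BATCH_SIZE is the total capacity of all allocated batch holders.
    static PutStatus put(int freeIndex, int numBatches) {
        return (freeIndex + 1 > numBatches * BATCH_SIZE)
            ? PutStatus.KEY_ADDED_LAST   // the last batch just filled up
            : PutStatus.KEY_ADDED;
    }
}
```

With variable (smaller) batch row counts, the `numBatches * BATCH_SIZE` bound no longer lines up with the actual fill point, which is exactly the missed-check problem raised in the comment.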
[GitHub] Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#issuecomment-401466019 I'll work on adding the instructions into both the docs dir on this repo and the write-up on Parth's repo once the PR is in.
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199272513 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + Review comment: Here is how we can explicitly build, tag and push (any one action at a time). Inside the distribution directory, execute: `mvn -Papache-release dockerfile:build`, `mvn -Papache-release dockerfile:tag`, or `mvn -Papache-release dockerfile:push`. Build and tag happen on the user's system; push uploads the image to the container registry on Docker Hub: https://hub.docker.com/r/drill/apache-drill-centos/tags/ In the root directory, one can simply run the command below to build and tag the container; the image can optionally be pushed to Docker Hub separately. `mvn clean install -Papache-release -DskipTests`
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199272414 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + + build + push + + + + + drill/apache-drill-centos Review comment: Sure. How about an even simpler tag: drill/apache-drill:1.14.0 The default tag would be for CentOS. We can have additional tags like you mentioned: drill/apache-drill:1.14.0-centos (alias for 1.14.0), drill/apache-drill:1.14.0-ubuntu, drill/apache-drill:1.14.0-alpine, etc. Also, sure, the Docker Hub repo can be moved into a Maven property. Should we keep the drill Docker Hub repo as the default, or just make it a local repository by default instead?
Re: Actual vectorization execution
Hi Weijie, As it turns out, vectorized processing in Drill is more aspirational than operational at this point in time. The code used in Drill is not actually vector-based even though the data itself is columnar. Drill generally does row-wise operations because row-wise operations fit the SQL semantics better than column-wise operations. There is generally a "loop over all rows" block of code that calls into a "do something for column a" block, followed by a "do something for column b" block, etc. For vectorized processing, the loops have to be inverted: "do all column a" followed by "do all column b". That is not always possible, however. Further, many of Drill's readers produce Nullable types. In this case, every value carries a null/not-null flag which must be checked for each data value. It is unlikely that CPU instructions exist for this case. So, a first step is to research how various operators could be vectorized. For example, how would we handle a "WHERE x = 10" case in a way that would benefit from vectorization? How about a "SUM(x)" case? Once that is sorted out (there are likely research papers that explain how others have done it), we can move on to changing the generated code (the loop-over-all-rows code) to use the newer design. Thanks, - Paul On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha wrote: Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? Not Intel specific. It would be quite a bit of work to test and certify with Intel's JVM, which may still be in the experimental stage. Also, you may have seen the Gandiva project for Apache Arrow, which aims to improve vectorization for operations on Arrow buffers (this requires integration with Arrow). I assume the test program or workload you were running was already written to exploit vectorization. Have you also looked into Drill's code-gen to see which parts are amenable to vectorization? We could start with some small use case and expand. 
[1] http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf On Fri, Jun 29, 2018 at 3:23 AM weijie tong wrote: > HI all: > > I have investigated the JIT assembly code of some vector-friendly Java code with the JITWatch tool, and found that the JVM did not generate the expected AVX > code. According to some conclusions from JVM experts, the JVM only supports some restricted usage cases for generating AVX code. > > I found that Intel has started a project called Panama, which supplies an > intrinsic vector API to actually execute AVX code. Here is the reference ( > > https://software.intel.com/en-us/articles/vector-api-developer-program-for-java > ) > . It also supports off-heap calculation. From our JVM team's message, the > vector API will be released in JDK 11. > > So I wonder whether we can distribute Intel's current JVM as a supplied > default JVM to users (like Spark's distribution with a default Scala) and, as > an option, rewrite parts of our operator code according to this new > vector API. >
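The restricted auto-vectorization cases Weijie mentions can be illustrated with a hedged sketch (whether HotSpot actually emits AVX for either loop depends on the JVM version and flags, and can be inspected with JITWatch or -XX:+PrintAssembly as described in the thread; the class and method names here are illustrative):

```java
// Two semantically simple loops. HotSpot's auto-vectorizer (the SuperWord
// optimization) typically handles the plain countable loop, while loops
// with cross-iteration dependencies generally stay scalar -- which is why
// an explicit vector API (Project Panama) is attractive.
class AutoVecSketch {
    // Simple countable loop over arrays: a classic SuperWord candidate.
    static void add(int[] dst, int[] x, int[] y) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = x[i] + y[i];
        }
    }

    // Loop-carried dependency: each iteration needs the previous result,
    // so it generally cannot be auto-vectorized.
    static void prefixSum(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] += a[i - 1];
        }
    }
}
```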
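The loop inversion Paul describes can be sketched as follows (a simplified illustration; Drill's generated code is different, and the plain arrays here stand in for Drill's value vectors):

```java
// Row-wise vs. column-wise evaluation of per-column sums over columns a and b.
// Drill's generated code today resembles rowWise(); a vectorized design would
// invert the loops as in columnWise(), giving the JIT tight per-column loops
// that are much easier to auto-vectorize (SIMD).
class LoopInversionSketch {
    static long[] rowWise(int[] a, int[] b) {
        long sumA = 0, sumB = 0;
        for (int row = 0; row < a.length; row++) { // loop over all rows
            sumA += a[row];  // "do something for column a"
            sumB += b[row];  // "do something for column b"
        }
        return new long[] { sumA, sumB };
    }

    static long[] columnWise(int[] a, int[] b) {
        long sumA = 0, sumB = 0;
        for (int row = 0; row < a.length; row++) sumA += a[row]; // do all column a
        for (int row = 0; row < b.length; row++) sumB += b[row]; // do all column b
        return new long[] { sumA, sumB };
    }
}
```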
[GitHub] ilooner commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
ilooner commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199263635 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + + build + push + + + + + drill/apache-drill-centos Review comment: I suggest not keeping a separate repository based on os. Typically there is one repository and the os is included in the tag for the image. ``` drill/apache-drill:1.14-centos drill/apache-drill:1.14-alpine ```
[GitHub] ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality URL: https://github.com/apache/drill/pull/1330#issuecomment-401453225 @vrozov My understanding was the following. QA has set up automated tests of both the performance of batch sizing and its correctness on a real cluster. Each batch sizing change has unit tests to validate batch size. But on a real cluster with real data, the only viable way for QA to validate the batch sizes output by an operator right now is through logging. Since Drill takes testing on real clusters seriously and aims to do more than just unit tests, I think this is perfectly acceptable. Since logging has overhead, and QA wanted to automate both the performance and correctness tests, they required the ability to turn logging off via sqlline. This was the approach agreed on by developers and testers in the Drill community including @sachouche, @bitblender, @ppadma, robert (don't know his github id), and @priteshm. Given the scope of agreement in the community, the fact that similar changes have already been merged, and also the minor impact on the Drill code itself (~20 lines), I suggest moving this discussion to a separate change. In my investigation I was not able to find a viable alternative to this approach; @vrozov, perhaps you could present an alternative approach on the dev list and lead the proposal. It would be a great help moving forward. In the meantime the changes proposed here represent a valuable performance improvement for the Drill community, so let's not hold up this change over this.
[GitHub] vrozov commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality
vrozov commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality URL: https://github.com/apache/drill/pull/1330#issuecomment-40105 If logging needs to be enabled just for QA purposes, I still don't see why it needs to be enabled per query or be dynamic and can't be managed by logback.xml. Additionally, if I understand the "QA purpose" correctly, batch sizes need to be validated by unit testing, not by looking at the logs.
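For reference, the logback.xml-only approach vrozov describes would look roughly like the sketch below; the logger name is a hypothetical example, not Drill's actual class name, and `scan`/`scanPeriod` let logback re-read the file at runtime without restarting the Drillbit.

```xml
<!-- logback.xml (sketch): enable batch-size logging for one class only. -->
<!-- The logger name below is a hypothetical example, not a verified Drill class. -->
<configuration scan="true" scanPeriod="30 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <logger name="org.apache.drill.exec.store.parquet.BatchSizer" level="DEBUG"/>
  <root level="WARN">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

The limitation ilooner raises below is visible here: this configuration is global to the JVM, not scoped to a single query.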
[jira] [Created] (DRILL-6563) TPCDS query 10 has regressed
Khurram Faraaz created DRILL-6563: - Summary: TPCDS query 10 has regressed Key: DRILL-6563 URL: https://issues.apache.org/jira/browse/DRILL-6563 Project: Apache Drill Issue Type: Bug Components: Query Planning Optimization Affects Versions: 1.14.0 Reporter: Khurram Faraaz Attachments: 24ca3c6c-90e1-a4bf-6c6f-3f981fa2d043.sys.drill, query10.fast_plan_old_commit, tpcds_query_10_plan_slow_140d09e.pdf, tpcds_query_plan_10_140d09e.txt TPC-DS query 10 has regressed in performance, from 3.5 seconds to execute on Apache Drill 1.14.0 commit b92f599 to 7 min 51.851 sec on Apache Drill 1.14.0 commit 140d09e. The query was executed over SF1 parquet views on a 4-node cluster. Query plans from the old and newer commits are attached, along with the query profile from the newer commit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199251973 ## File path: distribution/Dockerfile ## @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +FROM centos:7 +ARG VERSION + +RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf /var/cache/yum Review comment: Done. Please take a look and suggest if any of the comments can be rephrased.
[GitHub] Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations
Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations URL: https://github.com/apache/drill/pull/1350#issuecomment-401441469 Filed DRILL-6562 to track this. Regarding one vs multiple files, my reasoning was to keep it more human readable and flexible - giving options to selectively import any plugin as needed. So suggested a compressed folder with each of the plugin files. But your approach sounds good. I'm thinking the "Export All" option would be used for All or None scenarios. And the regular per plugin "Export" option being introduced in this PR can support the flexible per plugin import.
[jira] [Created] (DRILL-6562) Provide option to Export All storage plugin configurations at once
Abhishek Girish created DRILL-6562: -- Summary: Provide option to Export All storage plugin configurations at once Key: DRILL-6562 URL: https://issues.apache.org/jira/browse/DRILL-6562 Project: Apache Drill Issue Type: Improvement Components: Client - HTTP Reporter: Abhishek Girish Assignee: Abhishek Girish Follow-up to DRILL-4580. Provide ability to export all storage plugin configurations at once, with a new "Export All" option on the Storage page of the Drill web UI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] KazydubB commented on issue #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement
KazydubB commented on issue #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement URL: https://github.com/apache/drill/pull/1337#issuecomment-401439843 WIP: going to exclude unneeded dependencies from dependencies added to dependencyManagement.
[GitHub] arina-ielchiieva commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations
arina-ielchiieva commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations URL: https://github.com/apache/drill/pull/1350#issuecomment-401437213 Sounds fair regarding the enhancement Jira. I think keeping the export button even in update mode is also good. Please create a Jira for the import / export all enhancement. Also, why do you think export all should put each plugin in a different file? It can export all in one (similar to what we have for bootstrap plugins).
[GitHub] KazydubB commented on a change in pull request #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement
KazydubB commented on a change in pull request #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement URL: https://github.com/apache/drill/pull/1337#discussion_r199242382 ## File path: pom.xml ## @@ -1552,6 +1553,11 @@ + +org.apache.zookeeper +zookeeper Review comment: I've tried upgrading all curator dependencies to 2.12.0; it seems to work as expected, but curator-client's size is much larger than the current one's, so one needs to raise the max-size check of the drill-jdbc-all module by an extra 1-2 MB. That is better handled as a separate task. However, I did add the curator dependencies to dependencyManagement explicitly.
Re: Travis CI Timed Out
The timeouts are a sporadic issue. You can click on the failed build to bring up Travis. For now, since you are a committer you will have a restart-build button to the right of the console (non-committers don't get this button for some reason); this will retry the build, and it will probably pass. Vitalii is proposing some fixes for the timeout issue, so hopefully we won't have to deal with it for much longer. On Thu, Jun 28, 2018 at 9:21 PM, Charles Givre wrote: > I submitted some changes and the following happened. Any suggestions? > > Audit done. > > [INFO] > [INFO] --- maven-enforcer-plugin:1.3.1:enforce (avoid_bad_dependencies) @ drill-sqlline --- > [INFO] > [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ drill-sqlline --- > [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT.jar to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.jar > [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/pom.xml to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.pom > [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT-tests.jar to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT-tests.jar > [INFO] ---- > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Drill Root POM .. SUCCESS [ 11.686 s] > [INFO] tools/Parent Pom ... SUCCESS [ 1.118 s] > [INFO] tools/freemarker codegen tooling ... SUCCESS [ 7.546 s] > [INFO] Drill Protocol . SUCCESS [ 1.690 s] > [INFO] Common (Logical Plan, Base expressions) SUCCESS [ 6.200 s] > [INFO] Logical Plan, Base expressions . SUCCESS [ 6.076 s] > [INFO] exec/Parent Pom SUCCESS [ 2.769 s] > [INFO] exec/memory/Parent Pom . SUCCESS [ 0.825 s] > [INFO] exec/memory/base ... SUCCESS [ 3.922 s] > [INFO] exec/rpc ... SUCCESS [ 1.866 s] > [INFO] exec/Vectors ... SUCCESS [ 7.492 s] > [INFO] contrib/Parent Pom . SUCCESS [ 2.568 s] > [INFO] contrib/data/Parent Pom SUCCESS [ 0.693 s] > [INFO] contrib/data/tpch-sample-data .. SUCCESS [ 3.937 s] > [INFO] exec/Java Execution Engine . SUCCESS [28:30 min] > [INFO] exec/JDBC Driver using dependencies SUCCESS [02:47 min] > [INFO] JDBC JAR with all dependencies . SUCCESS [ 42.480 s] > [INFO] Drill-on-YARN .. SUCCESS [ 15.818 s] > [INFO] contrib/kudu-storage-plugin SUCCESS [ 10.498 s] > [INFO] contrib/opentsdb-storage-plugin SUCCESS [ 19.961 s] > [INFO] contrib/mongo-storage-plugin ... SUCCESS [ 52.961 s] > [INFO] contrib/hbase-storage-plugin ... SUCCESS [01:18 min] > [INFO] contrib/jdbc-storage-plugin SUCCESS [ 25.873 s] > [INFO] contrib/hive-storage-plugin/Parent Pom . SUCCESS [ 3.396 s] > [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SUCCESS [ 37.665 s] > [INFO] contrib/mapr-format-plugin . SUCCESS [ 5.200 s] > [INFO] contrib/hive-storage-plugin/core ... SUCCESS [ 10.537 s] > [INFO] contrib/drill-gis-plugin ... SUCCESS [ 21.492 s] > [INFO] contrib/kafka-storage-plugin ... SUCCESS [ 6.602 s] > [INFO] Packaging and Distribution Assembly SUCCESS [ 58.576 s] > [INFO] contrib/sqlline SUCCESS [ 1.570 s] > [INFO] --- >
[GitHub] vvysotskyi commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator
vvysotskyi commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator URL: https://github.com/apache/drill/pull/1353#discussion_r199237071 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java ## @@ -394,4 +401,30 @@ public void testLateral_HashAgg_with_nulls() throws Exception { .baselineValues("dd",222L) .build().run(); } + + @Test + public void testUnnestTopN() throws Exception { Review comment: Thanks, done.
[GitHub] Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations
Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations URL: https://github.com/apache/drill/pull/1350#issuecomment-401424556 @arina-ielchiieva, I thought about that - but it's a bit more complicated in how I'd like it to be: An "Export All" button should export each of the storage plugin configurations into JSON files and generate a single compressed folder which contains each of these JSON files individually. And an "Import" button which can support importing either a single JSON file or a compressed folder which can create one or more storage plugins. I think having an individual export button is helpful. We could keep it in the "Update" page (image 2 above). We could remove it from the main storage plugin list page (image 1) if you think it's cluttering the UI. In a later PR, we can add support for the "Export All" and "Import" (one or more) buttons. Would that be okay?
[GitHub] ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality URL: https://github.com/apache/drill/pull/1330#issuecomment-401424230 @vrozov I discussed the logging with @sachouche. The issue is that some information needed to be logged for QA purposes on a per-query basis, and for ease of testing needed to be toggled from sqlline. This was the approach agreed on by the batch sizing team, so at this point we're coming in at the end of a series of discussions. Nevertheless I spent about half a day trying to find a way this use case could be met in a logback-centric way, and I could not find a reasonable alternative that could be done without a large amount of work. The main issue is that slf4j / logback has global logging configurations, which makes it impossible to turn on logging on a per-query basis. We can use MDC to include a query id in the log messages and we could filter by query, but even that is not ideal since you are generating logs you don't need for all queries. Then there is the issue of updating the logging configuration. The only logback-native method is to enable periodic scanning of the logback.xml file and to update the logback.xml file on all nodes when you want to enable certain logging. That is not a great experience for someone automating tests, or debugging. The conclusion we came to was that Drill needs a real logging solution, and to really do it right you would have to go down the rabbit hole: 1. Enable per-query logging configurations, not just a global logging configuration. To do this we would have to use thread-local loggers instead of static loggers, or something like https://clogr.io/, or a custom ContextSelector https://stackoverflow.com/questions/38208617/practical-use-of-logback-context-selectors. 2. Allow either sqlline configuration of log levels or web UI configuration of log levels (like Apex had). 3. Include a logging context with the query context to communicate the logging config for a query. These are non-trivial, have a lot of corner cases, and require design discussions with the community. For the interim, Salim is making the options controlling logging internal, so users will not see them. Then he is going to have a follow-up PR to consolidate the logging across all the batch sizing operators. Later, when we have a real logging solution, we can delete these options and do the right thing.
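To illustrate point 3 above, the per-query logging context could carry a query id in a thread-local, the way slf4j's MDC carries diagnostic values. This is a minimal plain-Java sketch; all class and method names here are illustrative assumptions, not actual Drill code.

```java
// Sketch of a per-query "logging context": a thread-local query id that
// tags every log line, so per-query lines can be filtered from a shared
// log stream. Names are illustrative, not actual Drill classes.
public class QueryLogContext {
    private static final ThreadLocal<String> QUERY_ID = new ThreadLocal<>();

    static void enter(String queryId) { QUERY_ID.set(queryId); }
    static void exit() { QUERY_ID.remove(); }

    // Prefix a message with the current query id, if any.
    static String tag(String msg) {
        String id = QUERY_ID.get();
        return "[query=" + (id == null ? "none" : id) + "] " + msg;
    }

    public static void main(String[] args) {
        enter("q1");
        System.out.println(tag("batch size = 1048576 bytes"));
        exit();
        System.out.println(tag("no active query"));
    }
}
```

This is essentially what slf4j's MDC does; the harder part ilooner describes is making the *configuration* (log levels, enabled loggers) follow the query rather than the JVM.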
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199230469 ## File path: distribution/Dockerfile ## @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +FROM centos:7 +ARG VERSION + +RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf /var/cache/yum Review comment: Yes, that's right. I can add comments.
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199230386 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + Review comment: We can get rid of 'push' if you'd like. 'build' is good enough for most purposes. I'm not sure about 'package'.
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199230067 ## File path: distribution/pom.xml ## @@ -485,6 +485,36 @@ + + + apache-release + + + +com.spotify +dockerfile-maven-plugin +1.4.3 + + +docker-image + + build + push + + + + + drill/apache-drill-centos Review comment: 1. This is configurable - anyone can update it to any private or public Docker hub or any Container Registry. Do you have a different approach in mind? 2. I briefly touched upon this in the description. Since the ask was for an official Docker image for AD, it makes sense to build this once during the release. Anyone can download the Docker image at any time, using a simple pull. Everything else remains the same. I also chose this approach to prevent Docker images from being generated every time. 3. Tim and I have. We intend to give access to the 'drill' Docker Hub Organization to all committers. Perhaps, starting with release managers for every release. It's a good central place to have all related Docker images for Drill. 4. Sure.
[jira] [Created] (DRILL-6561) Lateral excluding the columns from output container provided by projection push into rules
Sorabh Hamirwasia created DRILL-6561: Summary: Lateral excluding the columns from output container provided by projection push into rules Key: DRILL-6561 URL: https://issues.apache.org/jira/browse/DRILL-6561 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Sorabh Hamirwasia Assignee: Sorabh Hamirwasia Fix For: 1.14.0 With DRILL-6545, LateralPop will have information about the list of columns to be excluded from the Lateral output container. Mostly this is used to avoid producing the original repeated column in the Lateral output if it's not required by the projection list. This is needed because, without it, Lateral has to copy the repeated column N times, where N is the number of rows in the right incoming batch, for each left incoming batch row. This copy was very costly from both a memory and a latency perspective, so avoiding it is a must for the Lateral-Unnest case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container
Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container URL: https://github.com/apache/drill/pull/1348#discussion_r199228648 ## File path: distribution/Dockerfile ## @@ -0,0 +1,28 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +FROM centos:7 +ARG VERSION + +RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf /var/cache/yum + +COPY target/apache-drill-$VERSION.tar.gz /tmp +RUN mkdir /opt/drill +RUN tar -xvzf /tmp/apache-drill-$VERSION.tar.gz --directory=/opt/drill --strip-components 1 + +ENTRYPOINT /opt/drill/bin/drill-embedded Review comment: So in the current implementation, when a user connects to the container using the docker run command, it will start Drill in embedded mode and go into the Sqlline prompt. Once they quit, the Drillbit process stops and the container exits automatically. Each time with the docker run command, you get a Sqlline session. Please see the description above for an example. In my original implementation, I have an init script which runs forever (keeping the container alive until someone explicitly stops it). That requires ZK. I discussed with Tim and we thought getting embedded mode into the Apache Drill codebase would be a good first step. We can then discuss the approach we'd like to take to get the stand-alone mode working. See: https://github.com/Agirish/drill-containers/blob/master/dockerfiles/apache-drill-centos-1.13.0/init-script https://github.com/Agirish/drill-containers/blob/master/dockerfiles/apache-zookeeper-centos-3.4.10/init-script
[jira] [Created] (DRILL-6560) Allow options for controlling the batch size per operator
salim achouche created DRILL-6560: - Summary: Allow options for controlling the batch size per operator Key: DRILL-6560 URL: https://issues.apache.org/jira/browse/DRILL-6560 Project: Apache Drill Issue Type: Improvement Components: Execution - Flow Reporter: salim achouche Assignee: salim achouche Fix For: 1.14.0 This Jira is for internal Drill DEV use; the following capabilities are needed for automating the batch sizing functionality testing: * Control the enablement of batch sizing statistics at session (per query) and server level (all queries) * Control the granularity of batch sizing statistics (summary or verbose) * Control the set of operators that should log batch statistics -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: DRILL-6519 Travis CI timing out
We are just finalizing the list of tests we are going to be marked as slow. Once done, this would be an easy fix :) Kind regards, Arina On Fri, Jun 29, 2018 at 7:41 PM Charles Givre wrote: > Hi Vitalii > I saw the discussion but is there anything that I can do to prevent this > from happening? > > Sent from my iPhone > > > On Jun 29, 2018, at 12:34, Vitalii Diravka > wrote: > > > > Hi Charles, > > > > Recently there was a discussion in dev mailing list regarding this. > Please > > take a look. > > Also I have created Jira ticket [1]. > > > > [1] https://issues.apache.org/jira/browse/DRILL-6559 > > > > Kind regards > > Vitalii > > > > > >> On Fri, Jun 29, 2018 at 5:43 PM Charles Givre wrote: > >> > >> Hi Arina, > >> Could you or someone take a look at the CI. I’ve submitted the changes, > >> the CI seems to build successfully, but times out and thus fails the > check. > >> Thanks, > >> — C >
Re: [DISCUSSION] Travis build failures
Or we can be more radical here and exclude all tests from the contrib module? On Fri, Jun 29, 2018 at 7:28 PM Vitalii Diravka wrote: > Agree, TestTpchExplain and TestTpchPlanning are fast enough. > TestTpchLimit0 has PlannerTest category, but it is slower than the above > tests. > Actually only TestTpchSingleMode and TestTpchLimit0 are candidates for > excluding. > I am not sure that it is enough for Travis (45 seconds on my machine). What > about Mongo unit tests? > > I have created Jira for it [1]. > > [1] https://issues.apache.org/jira/browse/DRILL-6559 > > Kind regards > Vitalii > > > On Wed, Jun 27, 2018 at 10:29 PM Aman Sinha wrote: > > > Sounds good but why exclude the planning tests ? Only the TPC-H runtime > > tests should be excluded I think. > > > > On Wed, Jun 27, 2018 at 12:16 PM Timothy Farkas > wrote: > > > > > +1 > > > > > > On Wed, Jun 27, 2018 at 10:00 AM, Vitalii Diravka < > > > vitalii.dira...@gmail.com > > > > wrote: > > > > > > > This is a topic from last Hangout meeting. > > > > > > > > Sometimes Drill Travis Build fails because of job run expires. > > > > Certainly, the right way is to accelerate Drill execution :) > > > > > > > > Nevertheless I believe we could consider excluding some more tests > from > > > > Travis Build. > > > > We can add all TPCH tests ( > > > > TestTpchLimit0, TestTpchExplain, TestTpchPlanning, TestTpchExplain) > to > > > the > > > > SlowTest category. > > > > Actually it isn't urgent, but we can consider it for future, if this > > > > happens more often. > > > > > > > > Kind regards > > > > Vitalii > > > > > > > > > >
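For reference, the mechanics of the SlowTest proposal are: annotate a test class with `@Category(SlowTest.class)` (a JUnit 4 category) and exclude that category in the Travis build via surefire. The pom snippet below is a sketch; the category class name and profile wiring are assumed from the discussion, not verified against Drill's tree.

```xml
<!-- pom.xml sketch: exclude @Category(SlowTest.class) tests in a Travis profile. -->
<!-- The category class name below is an assumption, not a verified Drill class. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <excludedGroups>org.apache.drill.categories.SlowTest</excludedGroups>
  </configuration>
</plugin>
```

Running `mvn install` with such a profile active would then skip only the tagged TPC-H tests while keeping the planning tests Aman wants retained.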
Re: DRILL-6519 Travis CI timing out
Hi Vitalii I saw the discussion but is there anything that I can do to prevent this from happening? Sent from my iPhone > On Jun 29, 2018, at 12:34, Vitalii Diravka wrote: > > Hi Charles, > > Recently there was a discussion in dev mailing list regarding this. Please > take a look. > Also I have created Jira ticket [1]. > > [1] https://issues.apache.org/jira/browse/DRILL-6559 > > Kind regards > Vitalii > > >> On Fri, Jun 29, 2018 at 5:43 PM Charles Givre wrote: >> >> Hi Arina, >> Could you or someone take a look at the CI. I’ve submitted the changes, >> the CI seems to build successfully, but times out and thus fails the check. >> Thanks, >> — C
[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199217282

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ##

@@ -417,16 +417,9 @@ public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(Parquet
           // column does not appear in this file, skip it
           continue;
         }
-        Statistics statistics = footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-        Integer max = (Integer) statistics.genericGetMax();
-        if (statistics.hasNonNullValue()) {
-          if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-            return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-          }
-        } else {
-          // no statistics, go check the first page
-          return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-        }
+        IntStatistics statistics = (IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();

Review comment: I don't see any specific code style in regards to spacing for casting (`grep -I -R \(\([a-zA-Z0-9_]*\)[a-zA-Z0-9_] * | grep -c java`). It seems to be a preference of a contributor.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
Re: DRILL-6519 Travis CI timing out
Hi Charles, Recently there was a discussion on the dev mailing list regarding this. Please take a look. Also, I have created a Jira ticket [1]. [1] https://issues.apache.org/jira/browse/DRILL-6559 Kind regards Vitalii On Fri, Jun 29, 2018 at 5:43 PM Charles Givre wrote: > Hi Arina, > Could you or someone take a look at the CI. I’ve submitted the changes, > the CI seems to build successfully, but times out and thus fails the check. > Thanks, > — C
[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199215493

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: It does not matter where it comes from (it actually comes from `RowGroupInfo`). The condition needs to be intercepted prior to calling `isAllNulls()` as `isAllNulls()` would return the wrong result (whether true or false) for a negative row count.
Re: [DISCUSSION] Travis build failures
Agree, TestTpchExplain and TestTpchPlanning are fast enough. TestTpchLimit0 has PlannerTest category, but it is slower than the above tests. Actually only TestTpchSingleMode and TestTpchLimit0 are candidates for excluding. I am not sure that it is enough for Travis (45 seconds on my machine). What about Mongo unit tests? I have created Jira for it [1]. [1] https://issues.apache.org/jira/browse/DRILL-6559 Kind regards Vitalii On Wed, Jun 27, 2018 at 10:29 PM Aman Sinha wrote: > Sounds good but why exclude the planning tests ? Only the TPC-H runtime > tests should be excluded I think. > > On Wed, Jun 27, 2018 at 12:16 PM Timothy Farkas wrote: > > > +1 > > > > On Wed, Jun 27, 2018 at 10:00 AM, Vitalii Diravka < > > vitalii.dira...@gmail.com > > > wrote: > > > > > This is a topic from last Hangout meeting. > > > > > > Sometimes Drill Travis Build fails because of job run expires. > > > Certainly, the right way is to accelerate Drill execution :) > > > > > > Nevertheless I believe we could consider excluding some more tests from > > > Travis Build. > > > We can add all TPCH tests ( > > > TestTpchLimit0, TestTpchExplain, TestTpchPlanning, TestTpchExplain) to > > the > > > SlowTest category. > > > Actually it isn't urgent, but we can consider it for future, if this > > > happens more often. > > > > > > Kind regards > > > Vitalii > > > > > >
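For context, the SlowTest-category mechanism discussed above works by tagging a test class with a runtime-retained marker and having the runner exclude that group. The following self-contained sketch imitates the idea with only JDK reflection — the `SlowTest` annotation, the class names, and the `filter` helper are illustrative stand-ins, not Drill's real wiring (Drill uses JUnit 4's `@Category` with Surefire's `excludedGroups` property, e.g. `-DexcludedGroups=org.apache.drill.categories.SlowTest`):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CategoryFilterDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @interface SlowTest {}             // stand-in for a JUnit category marker

    @SlowTest
    static class TestTpchLimit0 {}     // tagged: excluded from the CI run

    static class TestTpchExplain {}    // untagged: kept in the CI run

    /** Keep only classes NOT annotated with the excluded category. */
    static List<String> filter(List<Class<?>> tests) {
        return tests.stream()
                .filter(c -> !c.isAnnotationPresent(SlowTest.class))
                .map(Class::getSimpleName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filter(
                Arrays.asList(TestTpchLimit0.class, TestTpchExplain.class)));
        // prints [TestTpchExplain]
    }
}
```

This is why adding `@Category(SlowTest.class)` to the TPCH classes is enough: no test code changes, only the annotation plus a build-time exclusion flag.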
[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199212137

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: But this negative rowCount could only come from the statistics; I don't think that bad statistics should prevent the user from executing the query.
[jira] [Created] (DRILL-6559) Travis timing out
Vitalii Diravka created DRILL-6559:
--
Summary: Travis timing out
Key: DRILL-6559
URL: https://issues.apache.org/jira/browse/DRILL-6559
Project: Apache Drill
Issue Type: Improvement
Components: Tools, Build & Test
Affects Versions: 1.13.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
Fix For: 1.14.0

There was a decision to exclude some more TPCH unit tests from the Travis CI build by adding TestTpchSingleMode and TestTpchLimit0 (optionally). Excluding other test cases is still under discussion. https://lists.apache.org/thread.html/35f41f16d1679029e6089e447e2a85534243e9d5904116114c5168e8@%3Cdev.drill.apache.org%3E

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] sohami commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator
sohami commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator
URL: https://github.com/apache/drill/pull/1353#discussion_r199210264

## File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java ##

@@ -394,4 +401,30 @@ public void testLateral_HashAgg_with_nulls() throws Exception {
       .baselineValues("dd",222L)
       .build().run();
   }
+
+  @Test
+  public void testUnnestTopN() throws Exception {

Review comment: Please make this test part of `TestLateralPlans.java` where we have other plan verification tests.
[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199210267

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: I hope for the reverse. A negative row count passed to this method indicates a bug, and a bug means an incorrect result. I'd rather fail a query than give a wrong result.
[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199209381

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: Ok, then please just describe what will happen during query execution if a negative row count comes into this method: will the query fail, or will the statistics be ignored so that there is no filter push-down? I hope for the second.
[jira] [Created] (DRILL-6558) Drill query fails when file name contains semicolon
Volodymyr Vysotskyi created DRILL-6558:
--
Summary: Drill query fails when file name contains semicolon
Key: DRILL-6558
URL: https://issues.apache.org/jira/browse/DRILL-6558
Project: Apache Drill
Issue Type: Bug
Reporter: Volodymyr Vysotskyi

Queries on tables whose name contains such a character:

{code:sql}
select * from dfs.`/tmp/q:0`
{code}

fail with the error:

{noformat}
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: java.net.URISyntaxException: Relative path in absolute URI: q:0 SQL Query null
[Error Id: 34fafee1-8fbe-4fe0-9fcb-ddcc926bb192 on user515050-pc:31010]
  (java.lang.IllegalArgumentException) java.net.URISyntaxException: Relative path in absolute URI: q:0
    org.apache.hadoop.fs.Path.initialize():205
    org.apache.hadoop.fs.Path.<init>():171
    org.apache.hadoop.fs.Path.<init>():93
    org.apache.hadoop.fs.Globber.glob():253
    org.apache.hadoop.fs.FileSystem.globStatus():1655
    org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
    org.apache.drill.exec.store.dfs.FileSelection.create():274
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
    org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
    org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
    org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
    org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
    org.apache.calcite.jdbc.CalciteSchema.getTable():257
    org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
    org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
    org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
    org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
    org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
    org.apache.calcite.sql.validate.AbstractNamespace.validate():84
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
    org.apache.calcite.sql.SqlSelect.validate():226
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
    org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
    org.apache.drill.exec.planner.sql.SqlConverter.validate():190
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():644
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():204
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():176
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
    org.apache.drill.exec.work.foreman.Foreman.runSQL():567
    org.apache.drill.exec.work.foreman.Foreman.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
  Caused By (java.net.URISyntaxException) Relative path in absolute URI: q:0
    java.net.URI.checkPath():1823
    java.net.URI.<init>():745
    org.apache.hadoop.fs.Path.initialize():202
    org.apache.hadoop.fs.Path.<init>():171
    org.apache.hadoop.fs.Path.<init>():93
    org.apache.hadoop.fs.Globber.glob():253
    org.apache.hadoop.fs.FileSystem.globStatus():1655
    org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
    org.apache.drill.exec.store.dfs.FileSelection.create():274
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
    org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
    org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
    org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
    org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
    org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
    org.apache.calcite.jdbc.CalciteSchema.getTable():257
    org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
    org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
    org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
    org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
    org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
    org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
    org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
{noformat}
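The root cause is reproducible with just the JDK: the character at issue in `/tmp/q:0` is a colon, which Hadoop's `Path` treats as a URI scheme separator, and `java.net.URI` then rejects a relative path paired with a scheme. A small sketch of that parsing step — the manual split below imitates what `Path.initialize()` does and is not Hadoop's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    /** Imitates Hadoop Path's scheme split for a name like "q:0". */
    static String failureReason(String name) {
        int colon = name.indexOf(':');
        String scheme = name.substring(0, colon); // "q" is taken as a URI scheme
        String path = name.substring(colon + 1);  // "0" is then a relative path
        try {
            // Same constructor family Path.initialize() uses; URI.checkPath()
            // rejects a non-absolute path when a scheme is present.
            new URI(scheme, null, path, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason(); // "Relative path in absolute URI"
        }
    }

    public static void main(String[] args) {
        System.out.println(failureReason("q:0"));
    }
}
```

So a fix has to either escape the colon before the name reaches `Path`, or avoid round-tripping raw file names through `java.net.URI`.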
[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199208654

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: To validate input. It can't give the correct answer if the input is junk (`rowCount` is negative).
[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199207853

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ##

@@ -417,16 +417,9 @@ public static DateCorruptionStatus checkForCorruptDateValuesInStatistics(Parquet
           // column does not appear in this file, skip it
           continue;
         }
-        Statistics statistics = footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-        Integer max = (Integer) statistics.genericGetMax();
-        if (statistics.hasNonNullValue()) {
-          if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-            return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-          }
-        } else {
-          // no statistics, go check the first page
-          return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-        }
+        IntStatistics statistics = (IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();

Review comment: Please see the parquet format spec. `DATE` is always `int32`.
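The logic being simplified in this hunk is a tri-state decision per column, and the cast to `IntStatistics` lets it read a primitive max instead of unboxing `genericGetMax()` (which can be null when no non-null value was recorded). A hedged, standalone sketch — `THRESHOLD` is a placeholder, not Drill's real `DATE_CORRUPTION_THRESHOLD` value, and the loop over columns is omitted:

```java
public class DateCorruptionSketch {
    enum DateCorruptionStatus { META_SHOWS_CORRUPTION, META_UNCLEAR_TEST_VALUES, META_OK }

    // Placeholder only: Drill's real threshold is a specific
    // days-since-epoch constant for corrupt DATE values.
    static final int THRESHOLD = 10_000_000;

    /**
     * Per-column check: no statistics -> inconclusive (inspect the first
     * data page); max beyond the threshold -> dates are corrupt; otherwise ok.
     */
    static DateCorruptionStatus check(boolean hasNonNullValue, int max) {
        if (!hasNonNullValue) {
            return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
        }
        return max > THRESHOLD
                ? DateCorruptionStatus.META_SHOWS_CORRUPTION
                : DateCorruptionStatus.META_OK;
    }

    public static void main(String[] args) {
        System.out.println(check(false, 0));          // inconclusive, no stats
        System.out.println(check(true, 20_000_000));  // above threshold
    }
}
```

Since `DATE` is always `int32` in the parquet format, the cast cannot fail for a well-formed file, which is what makes the simplification safe.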
[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199207189

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java ##

@@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
    *
    * @param stat parquet column statistics
    * @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *         False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and false otherwise
    */
   static boolean isAllNulls(Statistics stat, long rowCount) {
-    return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+    Preconditions.checkArgument(rowCount >= 0, String.format("negative rowCount %d is not valid", rowCount));

Review comment: I think this check is not needed here :) Why should this method check what was given to it? Its job is just to compare the row count and the number of nulls.
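The fail-fast behavior under debate can be seen in a small, self-contained sketch. Guava's `Preconditions.checkArgument` is replaced here with a plain-Java check so nothing outside the JDK is needed, and parquet's `Statistics` is replaced with a minimal stand-in (`Stats`); both substitutions are for illustration only:

```java
public class AllNullsCheck {
    /** Minimal stand-in for parquet's column Statistics. */
    static class Stats {
        final boolean numNullsSet;
        final long numNulls;
        Stats(boolean numNullsSet, long numNulls) {
            this.numNullsSet = numNullsSet;
            this.numNulls = numNulls;
        }
    }

    /**
     * Mirrors the shape of ParquetPredicatesHelper.isAllNulls: true only when
     * null statistics are present and every row is null. The argument check is
     * the point of contention above: a negative rowCount signals an upstream
     * bug, so the method fails fast rather than returning a plausible-looking
     * wrong answer that could drive an incorrect filter push-down.
     */
    static boolean isAllNulls(Stats stat, long rowCount) {
        if (rowCount < 0) { // Guava: Preconditions.checkArgument(rowCount >= 0, ...)
            throw new IllegalArgumentException(
                    String.format("negative rowCount %d is not valid", rowCount));
        }
        return stat.numNullsSet && stat.numNulls == rowCount;
    }

    public static void main(String[] args) {
        System.out.println(isAllNulls(new Stats(true, 5), 5));  // true
        System.out.println(isAllNulls(new Stats(false, 0), 5)); // false: nulls not set
    }
}
```

The trade-off in the thread is exactly this: failing the query loudly on a negative row count versus silently skipping the push-down optimization.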