Re: Actual vectorization execution

2018-06-29 Thread weijie tong
Yes, Panama is an OpenJDK project; Intel is the prime contributor. A team at
our company has tried this feature, and I plan to try it in Drill. As @aman
points out, it will take some work to validate its stability and the newer
JDK version.

Gandiva brings a great idea: use LLVM to generate code dynamically. But a JVM
project cannot cheaply cross the JNI boundary into C++ code, so Gandiva is
designed to operate on a batch of data to amortize the JNI invocation cost.
That may only suit physical operators like Project and Filter, which can work
as one JNI invocation per RecordBatch over off-heap batch data. For
HashAggregate and HashJoin, which operate on grouped rows, the JNI invocations
would be too frequent and would cost performance. Panama's intrinsic vector
API has no JNI cost and gives us more flexibility in how we code.
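
The amortization argument can be shown with a small sketch (hypothetical stub
code, not Drill or Gandiva code): a fixed per-call cost, such as a JNI
crossing, is paid once per row in the row-wise scheme but only once per
RecordBatch in the batch scheme.

```java
// Hypothetical sketch: amortizing a fixed per-call cost (e.g. a JNI
// crossing) by operating on whole batches instead of single rows.
public class BatchAmortization {
    static int calls = 0;  // counts boundary crossings

    // stands in for a native (JNI) evaluation of one value
    static long evalOne(long v) { calls++; return v * 2; }

    // stands in for a native evaluation of a whole batch: one crossing
    static long[] evalBatch(long[] batch) {
        calls++;
        long[] out = new long[batch.length];
        for (int i = 0; i < batch.length; i++) {
            out[i] = batch[i] * 2;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] batch = new long[4096];
        calls = 0;
        for (long v : batch) { evalOne(v); }
        System.out.println("row-wise crossings: " + calls);  // 4096

        calls = 0;
        evalBatch(batch);
        System.out.println("batch crossings: " + calls);     // 1
    }
}
```

For operators that touch one row at a time per probe, like HashJoin, the
row-wise shape is hard to avoid, which is why the per-call cost dominates
there.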

I don't know whether we have a plan to move to Arrow. If not, we could borrow
Gandiva's idea and implement some operators, or parts of their logic, in C++.

I agree with Paul's description of Drill's current execution. It is only
columnar in memory, not actually vectorized execution. Even a for-loop over a
column's data is not guaranteed to be compiled to SIMD code by the JVM JIT. Of
course, this still performs well compared to others, as it is CPU-pipeline
friendly and has good data locality. I have studied papers like [1] looking
for a way to make operators such as HashAggregate and HashJoin vectorized, but
Java lacks the ability to directly generate vectorized code. That makes me
worry that a rewritten operator would not show any notable performance
improvement. So I look forward to Panama.

[1] http://www.cs.columbia.edu/~orestis/sigmod15.pdf
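
To make the distinction concrete, here is a minimal Java sketch (illustrative
only, not Drill's actual generated code) of the two loop shapes for something
like SUM(y) WHERE x = 10 over columnar arrays. The tight per-column loops in
the second form are the shape a JIT is most likely to auto-vectorize, though
as noted above there is no guarantee it emits SIMD instructions.

```java
// Illustrative only: row-at-a-time vs column-at-a-time evaluation of
// SUM(y) WHERE x = 10 over two columns stored as plain arrays.
public class ColumnLoops {
    // Row-wise: one loop over rows, touching every column per iteration.
    static long rowWise(long[] x, long[] y) {
        long sum = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 10) {
                sum += y[i];
            }
        }
        return sum;
    }

    // Column-wise: first materialize a selection bitmap from column x,
    // then sum column y under it. Each loop touches a single column,
    // which is the SIMD-friendly shape.
    static long columnWise(long[] x, long[] y) {
        boolean[] sel = new boolean[x.length];
        for (int i = 0; i < x.length; i++) {
            sel[i] = (x[i] == 10);
        }
        long sum = 0;
        for (int i = 0; i < y.length; i++) {
            sum += sel[i] ? y[i] : 0;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] x = {10, 5, 10, 7};
        long[] y = {1, 2, 3, 4};
        System.out.println(rowWise(x, y));     // 4
        System.out.println(columnWise(x, y));  // 4
    }
}
```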

On Sat, Jun 30, 2018 at 4:08 AM Paul Rogers 
wrote:

> Hi Weijie,
>
> As it turns out, vectorized processing in Drill is more aspirational than
> operational at this point in time.
>
> The code used in Drill is not actually vector-based even though the data
> itself is columnar. Drill generally does row-wise operations because
> row-wise operations fit the SQL semantics better than column-wise
> operations. There is generally a "loop over all rows" block of code that
> calls into a "do something for column a" block, followed by a "do something
> for column b" block, etc.
>
> For vectorized processing, the loops have to be inverted: "do all column
> a" followed by "do all column b". That is not always possible, however.
>
> Further, many of Drill's readers produce Nullable types. In this case,
> every value carries a null/not-null flag which must be checked for each
> data value. It is unlikely that CPU instructions exist for this case.
>
> So, a first step is to research how various operators could be vectorized.
> For example, how would we handle a "WHERE x = 10" case in a way that would
> benefit from vectorization? How about a "SUM(x)" case?
>
> Once that is sorted out (there are likely research papers that explain how
> others have done it), we can move onto changing the generated code (the
> loop-over-all-rows code) to use the newer design.
>
> Thanks,
> - Paul
>
>
>
> On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha <
> amansi...@gmail.com> wrote:
>
>  Hi Weijie, the Panama project is an OpenJDK initiative, right [1]? Not
> Intel specific.
> It would be quite a bit of work to test and certify with Intel's JVM, which
> may still be in the experimental stage.
> Also, you may have seen the Gandiva project for Apache Arrow, which aims to
> improve vectorization for operations
> on Arrow buffers (this requires integration with Arrow).
>
> I assume the test program or workload you were running was already written
> to exploit vectorization.  Have you also looked into
> Drill's code-gen to see which operators are amenable to vectorization?  We
> could start with some small use case and expand.
>
> [1]
> http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf
>
> On Fri, Jun 29, 2018 at 3:23 AM weijie tong 
> wrote:
>
> > Hi all:
> >
> > I have investigated the JIT assembly code of some vector-friendly Java
> > code with the JITWatch tool, and found that the JVM did not generate the
> > expected AVX code. According to conclusions from JVM experts, the JVM
> > only supports some restricted usage patterns that generate AVX code.
> >
> > I found that Intel has started a project called Panama, which supplies
> > an intrinsic vector API to actually execute AVX code. Here is the
> > reference (
> https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
> > ). It also supports off-heap calculation. From our JVM team's message,
> > the vector API will be released in JDK 11.
> >
> > So I wonder whether we can distribute Intel's current JVM as a supplied
> > default JVM to users (like Spark's distribution with a default Scala)
> > and, as an option, rewrite parts of our operator code according to this
> > new vector API.
> >
>


[GitHub] HanumathRao commented on issue #1347: DRILL-6545: Projection Push down into Lateral Join operator.

2018-06-29 Thread GitBox
HanumathRao commented on issue #1347: DRILL-6545: Projection Push down into 
Lateral Join operator.
URL: https://github.com/apache/drill/pull/1347#issuecomment-401513962
 
 
   @amansinha100  Thank you for the review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sachouche commented on issue #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-06-29 Thread GitBox
sachouche commented on issue #1355: DRILL-6560: Enhanced the batch statistics 
logging enablement
URL: https://github.com/apache/drill/pull/1355#issuecomment-401513955
 
 
   @bitblender, can you please review this PR?
   
   Thanks!




[GitHub] sachouche opened a new pull request #1355: DRILL-6560: Enhanced the batch statistics logging enablement

2018-06-29 Thread GitBox
sachouche opened a new pull request #1355: DRILL-6560: Enhanced the batch 
statistics logging enablement
URL: https://github.com/apache/drill/pull/1355
 
 
   This PR is reserved for internal Drill testing. It provides more logging 
control when testing the batch sizing functionality. At this time, only the 
Parquet reader consumes this functionality. Other operators need to be 
retrofitted to use the centralized batch sizing logging utility.




[GitHub] sachouche commented on issue #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader

2018-06-29 Thread GitBox
sachouche commented on issue #1354: DRILL-6570: Fixed IndexOutofBoundException 
in Parquet Reader
URL: https://github.com/apache/drill/pull/1354#issuecomment-401513254
 
 
   @kkhatua, can you please review this fix?
   
   Thanks!




[GitHub] sachouche opened a new pull request #1354: DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader

2018-06-29 Thread GitBox
sachouche opened a new pull request #1354: DRILL-6570: Fixed 
IndexOutofBoundException in Parquet Reader
URL: https://github.com/apache/drill/pull/1354
 
 
   Reserving same-size intermediary buffers to handle the false-positive case; 
that is, a column is first thought to be fixed length (after sampling), then 
reverted to variable length as more values are read.




[jira] [Created] (DRILL-6570) IndexOutOfBoundsException when using Flat Parquet Reader

2018-06-29 Thread salim achouche (JIRA)
salim achouche created DRILL-6570:
-

 Summary: IndexOutOfBoundsException when using Flat Parquet  Reader
 Key: DRILL-6570
 URL: https://issues.apache.org/jira/browse/DRILL-6570
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Parquet
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


* The Parquet Reader creates a reusable bulk entry based on the column precision
 * It uses the column precision to optimize the intermediary heap buffers
 * It first detected the column as fixed length, but reverted that assumption 
when the column's precision changed
 * Reverting was fine, except that the bulk entry's memory requirement changed 
while the code didn't update the bulk entry's intermediary buffers
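
The failure mode and the fix can be sketched in a few lines of hypothetical
Java (the class and method names here are illustrative, not the actual Drill
reader code): a buffer sized from the sampled "fixed length" precision
overflows once the column reverts to variable length, unless the intermediary
buffers are re-reserved at the larger size.

```java
import java.util.Arrays;

// Hypothetical illustration of the DRILL-6570 failure mode; not the
// actual Parquet reader code.
public class BulkEntrySketch {
    // Copy a value into an intermediary buffer of the given capacity.
    static byte[] copyWithCapacity(byte[] value, int capacity) {
        if (value.length > capacity) {
            throw new IndexOutOfBoundsException(
                "value " + value.length + " > buffer " + capacity);
        }
        return Arrays.copyOf(value, value.length);
    }

    public static void main(String[] args) {
        int sampledPrecision = 4;          // column looked fixed-length when sampled
        byte[] laterValue = new byte[16];  // column reverted to variable length

        // Bug shape: buffer still sized for the sampled precision.
        try {
            copyWithCapacity(laterValue, sampledPrecision);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("overflow: " + e.getMessage());
        }

        // Fix shape: reserve intermediary buffers at the larger size
        // once the precision assumption is reverted.
        int reserved = Math.max(sampledPrecision, laterValue.length);
        copyWithCapacity(laterValue, reserved);  // no overflow
    }
}
```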

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199308452
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+
+/**
+ * Storage plugins handler is an additional service for updating storage 
plugins configs from the file
+ */
+public interface StoragePluginsHandler {
+
+  /**
+   * Update incoming storage plugin configs from the persistent store if 
present; otherwise bootstrap the plugin configs.
+   * One of the params should be null, the other shouldn't.
+   *
+   * @param persistentStore the last storage plugins configs from persistence 
store
+   * @param bootstrapPlugins bootstrap storage plugins, which are used in case 
of first Drill start up
+   * @return all storage plugins, which should be loaded into persistence store
+   */
+  void loadPlugins(PersistentStore<StoragePluginConfig> persistentStore, 
StoragePlugins bootstrapPlugins);
 
 Review comment:
   I thought about it. There were some difficulties with it:
   * For one, we would need to make more calls to the PS, fetching all plugins 
from the PS one by one with an iterator.
 In my case we might not even get the plugins (if the `enabled` status is 
defined in the new plugin).
   * Arina proposed making the handler responsible not just for updating, but 
for loading plugin configs too. That way it is more independent and the logic 
becomes simpler (annotations show nullability of variables)




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199308790
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandlerService.java
 ##
 @@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.annotation.Nullable;
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows updating storage plugin configs from 
the
+ * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} conf file
+ *
+ * TODO: DRILL-6564: It can be improved with configs versioning and service of 
creating
+ * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF}
+ */
+public class StoragePluginsHandlerService implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(StoragePluginsHandlerService.class);
+
+  private final LogicalPlanPersistence hoconLogicalPlanPersistence;
+
+  public StoragePluginsHandlerService(DrillbitContext context) {
+hoconLogicalPlanPersistence = new 
LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(),
+new ObjectMapper(new HoconFactory()));
+  }
+
+  @Override
+  public void loadPlugins(@NotNull PersistentStore<StoragePluginConfig> 
persistentStore,
+  @Nullable StoragePlugins bootstrapPlugins) {
+// if bootstrapPlugins is not null -- fresh Drill set up
+StoragePlugins pluginsToBeWrittenToPersistentStore;
+
+StoragePlugins newPlugins = getNewStoragePlugins();
+
+if (newPlugins != null) {
+  pluginsToBeWrittenToPersistentStore = new StoragePlugins(new 
HashMap<>());
+  Optional.ofNullable(bootstrapPlugins)
+  .ifPresent(pluginsToBeWrittenToPersistentStore::putAll);
+
+  for (Map.Entry<String, StoragePluginConfig> newPlugin : newPlugins) {
+String pluginName = newPlugin.getKey();
+StoragePluginConfig oldPluginConfig = 
Optional.ofNullable(bootstrapPlugins)
+.map(plugins -> plugins.getConfig(pluginName))
+.orElse(persistentStore.get(pluginName));
+StoragePluginConfig updatedStatusPluginConfig = 
updatePluginStatus(oldPluginConfig, newPlugin.getValue());
+pluginsToBeWrittenToPersistentStore.put(pluginName, 
updatedStatusPluginConfig);
+  }
+} else {
+  pluginsToBeWrittenToPersistentStore = bootstrapPlugins;
+}
+
+// load pluginsToBeWrittenToPersistentStore to Persistent Store
+Optional.ofNullable(pluginsToBeWrittenToPersistentStore)
+.ifPresent(plugins -> plugins.forEach(plugin -> 
persistentStore.put(plugin.getKey(), plugin.getValue(;
+  }
+
+  /**
+   * Helper method to identify the enabled status for a new storage plugin 
config. If this status is absent in the updater
+   * file, the status is kept from the configs that are going to be updated
+   *
+   * @param oldPluginConfig current storage plugin config from Persistent 
Store or bootstrap config file
+   * @param newPluginConfig new storage plugin config
+   * @return new storage plugin config with updated enabled status
+   */
+  private StoragePluginConfig updatePluginStatus(@Nullable StoragePluginConfig 
oldPluginConfig,
+ @NotNull StoragePluginConfig 
newPluginConfig) {
+if 

[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199307744
 
 

 ##
 File path: 
contrib/storage-kafka/src/main/resources/bootstrap-storage-plugins.json
 ##
 @@ -2,8 +2,8 @@
   "storage":{
 kafka : {
   type:"kafka",
-  enabled: false,
-  kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : 
"drill-consumer"}
+  kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : 
"drill-consumer"},
+  enabled: false
 
 Review comment:
   Currently, after deserialization, all plugin configs are shown in the Drill 
UI with the enabled status at the end.
   It is changed in the config files too, for consistency. 




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199310178
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,69 +121,73 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
+try {
+  StoragePlugins bootstrapPlugins = pluginSystemTable.getAll().hasNext() ? 
null : loadBootstrapPlugins();
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsHandlerService(context);
+  storagePluginsUpdaterService.loadPlugins(pluginSystemTable, 
bootstrapPlugins);
+
+  // Defines enabled plugins
+  defineEnabledPlugins();
+} catch (IOException e) {
+  logger.error("Failure setting up storage enabledPlugins.  Drillbit 
exiting.", e);
+  throw new IllegalStateException(e);
+}
   }
 
-  @SuppressWarnings("resource")
-  private Map createPlugins() throws 
DrillbitStartupException {
-try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
-  if (!pluginSystemTable.getAll().hasNext()) {
-// bootstrap load the config since no plugins are stored.
-logger.info("No storage plugin instances configured in persistent 
store, loading bootstrap configuration.");
-Collection urls = 
ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, 
false);
-if (urls != null && !urls.isEmpty()) {
-  logger.info("Loading the storage plugin configs from URLs {}.", 
urls);
-  Map pluginURLMap = Maps.newHashMap();
-  for (URL url : urls) {
-String pluginsData = Resources.toString(url, Charsets.UTF_8);
-StoragePlugins plugins = 
lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class);
-for (Map.Entry config : plugins) {
-  if (!definePluginConfig(config.getKey(), config.getValue())) {
-logger.warn("Duplicate plugin instance '{}' defined in [{}, 
{}], ignoring the later one.",
-config.getKey(), pluginURLMap.get(config.getKey()), url);
-continue;
-  }
-  pluginURLMap.put(config.getKey(), url);
-}
-  }
-} else {
-  throw new IOException("Failure finding " + 
ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE);
-}
+  /**
+   * Read bootstrap storage plugins {@link 
ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh
+   * instantiation of Drill
+   *
+   * @return bootstrap storage plugins
+   * @throws IOException if a read error occurs
+   */
+  private StoragePlugins loadBootstrapPlugins() throws IOException {
+// bootstrap load the config since no plugins are stored.
+logger.info("No storage plugin instances configured in persistent store, 
loading bootstrap configuration.");
+Set<URL> urls = 
ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, 
false);
+if (urls != null && !urls.isEmpty()) {
+  logger.info("Loading the storage plugin configs from URLs {}.", urls);
+  StoragePlugins bootstrapPlugins = new StoragePlugins(new HashMap<>());
+  for (URL url : urls) {
+String pluginsData = Resources.toString(url, Charsets.UTF_8);
+
bootstrapPlugins.putAll(lpPersistence.getMapper().readValue(pluginsData, 
StoragePlugins.class));
 
 Review comment:
   Agree. Simpler doesn't mean better. 
   Returned the previous logic.




[GitHub] ilooner commented on issue #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-06-29 Thread GitBox
ilooner commented on issue #1333: DRILL-6410: Memory leak in Parquet Reader 
during cancellation
URL: https://github.com/apache/drill/pull/1333#issuecomment-401509765
 
 
   @priteshm will take a look Monday.
   @vrozov please fix the conflict and Travis failures.




[GitHub] Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker 
Container
URL: https://github.com/apache/drill/pull/1348#issuecomment-401508476
 
 
   @arina-ielchiieva , on second thought, I think having a separate profile is 
better. Created a new profile called 'docker'. 
   Steps remain mostly similar:
   Build the project (any profile can be provided; if the 'docker' profile is 
used, the Docker image is also built):
   `$ mvn clean install`
   Build the Docker image and push it to Docker Hub (or any container registry):
   ```
   cd distribution
   mvn dockerfile:build -Pdocker
   mvn dockerfile:push -Pdocker
   ```




[jira] [Created] (DRILL-6569) Jenkins Regression: TPCDS query 19 fails with INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6569:
-

 Summary: Jenkins Regression: TPCDS query 19 fails with 
INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 in file 
maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet
 Key: DRILL-6569
 URL: https://issues.apache.org/jira/browse/DRILL-6569
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 19.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query19.sql

SELECT i_brand_id  brand_id,
i_brand brand,
i_manufact_id,
i_manufact,
Sum(ss_ext_sales_price) ext_price
FROM   date_dim,
store_sales,
item,
customer,
customer_address,
store
WHERE  d_date_sk = ss_sold_date_sk
AND ss_item_sk = i_item_sk
AND i_manager_id = 38
AND d_moy = 12
AND d_year = 1998
AND ss_customer_sk = c_customer_sk
AND c_current_addr_sk = ca_address_sk
AND Substr(ca_zip, 1, 5) <> Substr(s_zip, 1, 5)
AND ss_store_sk = s_store_sk
GROUP  BY i_brand,
i_brand_id,
i_manufact_id,
i_manufact
ORDER  BY ext_price DESC,
i_brand,
i_brand_id,
i_manufact_id,
i_manufact
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: INTERNAL_ERROR ERROR: Can not read value at 2 in block 0 
in file maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet

Fragment 4:26

[Error Id: 6401a71e-7a5d-4a10-a17c-16873fc3239b on atsqa6c88.qa.lab:31010]

  (hive.org.apache.parquet.io.ParquetDecodingException) Can not read value at 2 
in block 0 in file 
maprfs:///drill/testdata/tpcds_sf100/parquet/store_sales/1_13_1.parquet

hive.org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue():243
hive.org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue():227

org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():199

org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next():57

org.apache.drill.exec.store.hive.readers.HiveAbstractReader.hasNextValue():417
org.apache.drill.exec.store.hive.readers.HiveParquetReader.next():54
org.apache.drill.exec.physical.impl.ScanBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109

[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199306981
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,69 +121,73 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
+try {
+  StoragePlugins bootstrapPlugins = pluginSystemTable.getAll().hasNext() ? 
null : loadBootstrapPlugins();
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsHandlerService(context);
+  storagePluginsUpdaterService.loadPlugins(pluginSystemTable, 
bootstrapPlugins);
+
+  // Defines enabled plugins
+  defineEnabledPlugins();
+} catch (IOException e) {
+  logger.error("Failure setting up storage enabledPlugins.  Drillbit 
exiting.", e);
+  throw new IllegalStateException(e);
+}
   }
 
-  @SuppressWarnings("resource")
-  private Map createPlugins() throws 
DrillbitStartupException {
-try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
-  if (!pluginSystemTable.getAll().hasNext()) {
-// bootstrap load the config since no plugins are stored.
-logger.info("No storage plugin instances configured in persistent 
store, loading bootstrap configuration.");
-Collection urls = 
ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, 
false);
-if (urls != null && !urls.isEmpty()) {
-  logger.info("Loading the storage plugin configs from URLs {}.", 
urls);
-  Map pluginURLMap = Maps.newHashMap();
-  for (URL url : urls) {
-String pluginsData = Resources.toString(url, Charsets.UTF_8);
-StoragePlugins plugins = 
lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class);
-for (Map.Entry config : plugins) {
-  if (!definePluginConfig(config.getKey(), config.getValue())) {
-logger.warn("Duplicate plugin instance '{}' defined in [{}, 
{}], ignoring the later one.",
-config.getKey(), pluginURLMap.get(config.getKey()), url);
-continue;
-  }
-  pluginURLMap.put(config.getKey(), url);
-}
-  }
-} else {
-  throw new IOException("Failure finding " + 
ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE);
-}
+  /**
+   * Read bootstrap storage plugins {@link 
ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh
+   * instantiation of Drill
+   *
+   * @return bootstrap storage plugins
+   * @throws IOException if a read error occurs
+   */
+  private StoragePlugins loadBootstrapPlugins() throws IOException {
+// bootstrap load the config since no plugins are stored.
+logger.info("No storage plugin instances configured in persistent store, 
loading bootstrap configuration.");
+Set urls = 
ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, 
false);
+if (urls != null && !urls.isEmpty()) {
+  logger.info("Loading the storage plugin configs from URLs {}.", urls);
+  StoragePlugins bootstrapPlugins = new StoragePlugins(new HashMap<>());
+  for (URL url : urls) {
+String pluginsData = Resources.toString(url, Charsets.UTF_8);
+
bootstrapPlugins.putAll(lpPersistence.getMapper().readValue(pluginsData, 
StoragePlugins.class));
 
 Review comment:
   This behavior is different from the existing one: here, in case of a duplicate plugin config, we choose the last one, whereas previously we logged a warning and kept the first plugin seen.
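The difference the reviewer points out boils down to `putAll` versus `putIfAbsent` semantics when the same plugin name appears in more than one bootstrap file. A minimal illustrative sketch (names are my own, not Drill's API) contrasting the two:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch of the two merge semantics under discussion:
 * putAll lets a later duplicate overwrite an earlier one ("last wins"),
 * while putIfAbsent keeps the first definition, matching the old
 * warn-and-skip behavior ("first wins").
 */
public class MergeSemantics {

  // "Last wins": StoragePlugins.putAll-style merging of bootstrap files.
  public static Map<String, String> lastWins(Map<String, String> first, Map<String, String> second) {
    Map<String, String> merged = new LinkedHashMap<>(first);
    merged.putAll(second); // duplicate keys in `second` overwrite `first`
    return merged;
  }

  // "First wins": the previous definePluginConfig/putIfAbsent behavior.
  public static Map<String, String> firstWins(Map<String, String> first, Map<String, String> second) {
    Map<String, String> merged = new LinkedHashMap<>(first);
    for (Map.Entry<String, String> e : second.entrySet()) {
      merged.putIfAbsent(e.getKey(), e.getValue()); // duplicate is ignored
    }
    return merged;
  }
}
```

With two files both defining a `kafka` plugin, `lastWins` keeps the second definition while `firstWins` keeps the first, which is exactly the behavioral change flagged in this comment.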


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199218902
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+
+/**
+ * Storage plugins handler is an additional service for updating storage 
plugins configs from the file
+ */
+public interface StoragePluginsHandler {
+
+  /**
+   * Updates incoming storage plugins configs from the persistent store if present, otherwise from the bootstrap
+   * plugins configs. Exactly one of the params should be null, the other non-null.
+   *
+   * @param bootstrapPlugins bootstrap storage plugins, which are used in case of a first Drill start up
+   * @param persistentStore the last storage plugins configs from the persistent store
+   * @return all storage plugins, which should be loaded into the persistent store
+   */
+  StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore<StoragePluginConfig> persistentStore);
 
 Review comment:
   Also, I think the signature of the function should accept only `StoragePlugins`; it is then up to the caller to call it with the bootstrap configuration or the one inside PersistentStore. Basically, the handler gets an initial configuration and produces a final configuration using that initial config and its own logic.




[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199305835
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java
 ##
 @@ -17,22 +17,51 @@
  */
 package org.apache.drill.exec.store;
 
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.drill.common.logical.StoragePluginConfig;
 
 import com.fasterxml.jackson.annotation.JsonTypeName;
 
-@JsonTypeName("named")
+@JsonTypeName(NamedStoragePluginConfig.NAME)
 public class NamedStoragePluginConfig extends StoragePluginConfig {
-  public String name;
+
+  public static final String NAME = "named";
+
+  private final String name;
+
+  public NamedStoragePluginConfig(@JsonProperty("name") String name) {
+this.name = name;
+  }
+
+  public String getName() {
+return name;
+  }
 
   @Override
-  public boolean equals(Object o) {
-return this == o;
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj == null) {
+  return false;
+}
+if (getClass() != obj.getClass()) {
+  return false;
+}
+NamedStoragePluginConfig other = (NamedStoragePluginConfig) obj;
+if (name == null) {
+  return other.name == null;
+} else {
+  return name.equals(other.name);
+}
   }
 
   @Override
   public int hashCode() {
-return name.hashCode();
+final int prime = 31;
+int result = 1;
+result = prime * result + ((name == null) ? 0 : name.hashCode());
 
 Review comment:
   how about just `return prime + ((name == null) ? 0 : name.hashCode());`
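For a single field the suggested one-liner is equivalent to the generated boilerplate, since `prime * 1 + h` reduces to `prime + h`. A quick illustrative check (class name is mine):

```java
/**
 * Quick check that the reviewer's one-liner is equivalent to the
 * IDE-generated hashCode boilerplate for a single field:
 * prime * 1 + h == prime + h.
 */
public class HashEquivalence {
  private static final int PRIME = 31;

  // IDE-generated style, as in the diff under review.
  static int generated(String name) {
    int result = 1;
    result = PRIME * result + ((name == null) ? 0 : name.hashCode());
    return result;
  }

  // The suggested one-liner (with balanced parentheses).
  static int oneLiner(String name) {
    return PRIME + ((name == null) ? 0 : name.hashCode());
  }
}
```

Both forms return 31 for a null name, so `equals`/`hashCode` consistency is unaffected by the simplification.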




[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199305514
 
 

 ##
 File path: 
contrib/storage-kafka/src/main/resources/bootstrap-storage-plugins.json
 ##
 @@ -2,8 +2,8 @@
   "storage":{
 kafka : {
   type:"kafka",
-  enabled: false,
-      kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"}
+      kafkaConsumerProps: {"bootstrap.servers":"localhost:9092", "group.id" : "drill-consumer"},
+  enabled: false
 
 Review comment:
   Is there any specific reason why this `enabled` setting is moved to the end for all the plugins?




[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199307586
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandlerService.java
 ##
 @@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.annotation.Nullable;
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows updating storage plugins configs from the
+ * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF} conf file
+ *
+ * TODO: DRILL-6564: It can be improved with configs versioning and service of 
creating
+ * {@link CommonConstants#STORAGE_PLUGINS_OVERRIDE_CONF}
+ */
+public class StoragePluginsHandlerService implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsHandlerService.class);
+
+  private final LogicalPlanPersistence hoconLogicalPlanPersistence;
+
+  public StoragePluginsHandlerService(DrillbitContext context) {
+    hoconLogicalPlanPersistence = new LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(),
+        new ObjectMapper(new HoconFactory()));
+  }
+
+  @Override
+  public void loadPlugins(@NotNull PersistentStore<StoragePluginConfig> persistentStore,
+                          @Nullable StoragePlugins bootstrapPlugins) {
+// if bootstrapPlugins is not null -- fresh Drill set up
+StoragePlugins pluginsToBeWrittenToPersistentStore;
+
+StoragePlugins newPlugins = getNewStoragePlugins();
+
+if (newPlugins != null) {
+      pluginsToBeWrittenToPersistentStore = new StoragePlugins(new HashMap<>());
+      Optional.ofNullable(bootstrapPlugins)
+          .ifPresent(pluginsToBeWrittenToPersistentStore::putAll);
+
+      for (Map.Entry<String, StoragePluginConfig> newPlugin : newPlugins) {
+        String pluginName = newPlugin.getKey();
+        StoragePluginConfig oldPluginConfig = Optional.ofNullable(bootstrapPlugins)
+            .map(plugins -> plugins.getConfig(pluginName))
+            .orElse(persistentStore.get(pluginName));
+        StoragePluginConfig updatedStatusPluginConfig = updatePluginStatus(oldPluginConfig, newPlugin.getValue());
+        pluginsToBeWrittenToPersistentStore.put(pluginName, updatedStatusPluginConfig);
+      }
+} else {
+  pluginsToBeWrittenToPersistentStore = bootstrapPlugins;
+}
+
+// load pluginsToBeWrittenToPersistentStore to Persistent Store
+    Optional.ofNullable(pluginsToBeWrittenToPersistentStore)
+        .ifPresent(plugins -> plugins.forEach(plugin -> persistentStore.put(plugin.getKey(), plugin.getValue())));
+  }
+
+  /**
+   * Helper method to identify the enabled status for new storage plugins 
config. If this status is absent in the updater
+   * file, the status is kept from the configs, which are going to be updated
+   *
+   * @param oldPluginConfig current storage plugin config from Persistent 
Store or bootstrap config file
+   * @param newPluginConfig new storage plugin config
+   * @return new storage plugin config with updated enabled status
+   */
+  private StoragePluginConfig updatePluginStatus(@Nullable StoragePluginConfig oldPluginConfig,
+                                                 @NotNull StoragePluginConfig newPluginConfig) {
+if 

[GitHub] sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199306162
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+
+/**
+ * Storage plugins handler is an additional service for updating storage 
plugins configs from the file
+ */
+public interface StoragePluginsHandler {
+
+  /**
+   * Updates incoming storage plugins configs from the persistent store if present, otherwise from the bootstrap
+   * plugins configs. Exactly one of the params should be null, the other non-null.
+   *
+   * @param persistentStore the last storage plugins configs from the persistent store
+   * @param bootstrapPlugins bootstrap storage plugins, which are used in case of a first Drill start up
+   */
+  void loadPlugins(PersistentStore<StoragePluginConfig> persistentStore, StoragePlugins bootstrapPlugins);
 
 Review comment:
   It would be cleaner if `loadPlugins` accepted all the `StoragePluginConfig`s instead of the store and bootstrapPlugins. The handler should behave as if `loadPlugins` gets the existing plugins and returns updated plugins. The caller will decide how to pass the existing plugins, i.e. either from the store or the bootstrap file, and what to do with the updated plugins.
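The shape the reviewer is asking for can be sketched as a handler that is a pure function from initial configs to final configs, with the caller owning the choice of source and the persistence of the result. This is a hypothetical sketch with illustrative names, not Drill's actual API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the suggested signature: the handler receives an
 * initial plugin configuration and returns an updated one. The caller decides
 * whether the initial configs come from the persistent store or the bootstrap
 * file, and what to do with the result.
 */
interface PluginsHandler {
  Map<String, String> updatePlugins(Map<String, String> initialConfigs);
}

public class HandlerSketch {
  // The caller supplies the initial configs and receives the final configs;
  // no store or bootstrap reference leaks into the handler itself.
  public static Map<String, String> run(PluginsHandler handler, Map<String, String> initial) {
    return handler.updatePlugins(initial);
  }
}
```

The design benefit is testability: the handler's override logic can be exercised with plain maps, without standing up a persistent store.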




[jira] [Created] (DRILL-6568) Jenkins Regression: TPCDS query 68 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6568:
-

 Summary: Jenkins Regression: TPCDS query 68 fails with 
IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
 Key: DRILL-6568
 URL: https://issues.apache.org/jira/browse/DRILL-6568
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Khurram Faraaz
 Fix For: 1.14.0


This is TPCDS Query 68.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/original/maprdb/json/query68.sql

SELECT c_last_name,
       c_first_name,
       ca_city,
       bought_city,
       ss_ticket_number,
       extended_price,
       extended_tax,
       list_price
FROM   (SELECT ss_ticket_number,
               ss_customer_sk,
               ca_city                 bought_city,
               Sum(ss_ext_sales_price) extended_price,
               Sum(ss_ext_list_price)  list_price,
               Sum(ss_ext_tax)         extended_tax
        FROM   store_sales,
               date_dim,
               store,
               household_demographics,
               customer_address
        WHERE  store_sales.ss_sold_date_sk = date_dim.d_date_sk
               AND store_sales.ss_store_sk = store.s_store_sk
               AND store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
               AND store_sales.ss_addr_sk = customer_address.ca_address_sk
               AND date_dim.d_dom BETWEEN 1 AND 2
               AND ( household_demographics.hd_dep_count = 8
                      OR household_demographics.hd_vehicle_count = 3 )
               AND date_dim.d_year IN ( 1998, 1998 + 1, 1998 + 2 )
               AND store.s_city IN ( 'Fairview', 'Midway' )
        GROUP  BY ss_ticket_number,
                  ss_customer_sk,
                  ss_addr_sk,
                  ca_city) dn,
       customer,
       customer_address current_addr
WHERE  ss_customer_sk = c_customer_sk
       AND customer.c_current_addr_sk = current_addr.ca_address_sk
       AND current_addr.ca_city <> bought_city
ORDER  BY c_last_name,
          ss_ticket_number
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Unexpected EMIT 
outcome received in buildSchema phase

Fragment 0:0

[Error Id: edbe3477-805e-4f1f-8405-d5c194dc28c2 on atsqa6c87.qa.lab:31010]

  (java.lang.IllegalStateException) Unexpected EMIT outcome received in 
buildSchema phase
org.apache.drill.exec.physical.impl.TopN.TopNBatch.buildSchema():178
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():87
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.physical.impl.BaseRootExec.next():103
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1904)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:64)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109)
at 

[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199276235
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillIOException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows to update storage plugins configs from 
the
+ * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file
+ *
+ * TODO: it can be improved with configs versioning and service of creating 
{@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE}
+ */
+public class StoragePluginsUpdater implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class);
+
+  final DrillbitContext context;
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198884749
 
 

 ##
 File path: common/src/main/java/org/apache/drill/exec/metrics/DrillMetrics.java
 ##
 @@ -36,11 +36,11 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillMetrics.class);
 
   public static final boolean METRICS_JMX_OUTPUT_ENABLED =
-  SystemPropertyUtil.getBoolean("drill.metrics.jmx.enabled", true);
+  SystemPropertyUtil.getBoolean("drill.exec.metrics.jmx.enabled", true);
 
 Review comment:
   The issue comes from `drill-override-example.conf`:
   https://github.com/apache/drill/blob/master/distribution/src/resources/drill-override-example.conf#L22
   https://github.com/apache/drill/blob/master/distribution/src/resources/drill-override-example.conf#L66
   Since drill.apache.org documents the options without `exec`
   [link](https://drill.apache.org/docs/monitoring-metrics/#disabling-drill-metrics),
   I reverted the constant names and changed `drill-override-example.conf`.
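The constant under discussion is resolved from a JVM system property with a default, as in `SystemPropertyUtil.getBoolean("drill.metrics.jmx.enabled", true)`. A minimal sketch of that assumed behavior (illustrative only; Drill's actual `SystemPropertyUtil` may differ in details):

```java
/**
 * Minimal sketch of a SystemPropertyUtil.getBoolean-style helper: resolve a
 * flag such as drill.metrics.jmx.enabled from a JVM system property, falling
 * back to a default when the property is unset or empty.
 */
public class BoolProp {
  public static boolean getBoolean(String key, boolean defaultValue) {
    String value = System.getProperty(key);
    if (value == null) {
      return defaultValue;
    }
    value = value.trim();
    // An empty value keeps the default; anything else parses as a boolean.
    return value.isEmpty() ? defaultValue : Boolean.parseBoolean(value);
  }
}
```

This is why the property name matters: a user setting `drill.exec.metrics.jmx.enabled` would silently fall back to the default if the code reads `drill.metrics.jmx.enabled`, hence reverting to the documented names.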
   




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199073387
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -194,25 +215,21 @@ public void init() throws DrillbitStartupException {
* @param config plugin config
* @param plugin plugin implementation
*/
-
-  public void definePlugin(String name, StoragePluginConfig config, 
StoragePlugin plugin) {
-addPlugin(name, plugin);
-definePluginConfig(name, config);
-  }
-
-  private boolean definePluginConfig(String name, StoragePluginConfig config) {
-return pluginSystemTable.putIfAbsent(name, config);
+  @VisibleForTesting
+  public void addPluginTonPersistenceStoreIfAbsent(String name, 
StoragePluginConfig config, StoragePlugin plugin) {
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198892158
 
 

 ##
 File path: distribution/src/resources/drill-override-example.conf
 ##
 @@ -251,6 +251,9 @@ drill.exec: {
 protocol: "TLSv1.2",
 #ssl provider. May be "JDK" or "OPENSSL". Default is "JDK"
 provider: "JDK"
+  },
+  storage_plugins.update: {
 
 Review comment:
   The option is removed




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198932079
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
-  }
-
-  @SuppressWarnings("resource")
-  private Map<String, StoragePlugin> createPlugins() throws DrillbitStartupException {
 try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
+  StoragePlugins pluginsToBeWrittenToPersistenceStore = null;
+      final boolean isPluginsUpdaterEnabled = context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE);
 
 Review comment:
   the option is removed




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198883586
 
 

 ##
 File path: 
common/src/main/java/org/apache/drill/common/exceptions/DrillIOException.java
 ##
 @@ -19,24 +19,23 @@
 
 import java.io.IOException;
 
-public class DrillIOException extends IOException{
+public class DrillIOException extends IOException {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillIOException.class);
 
   public DrillIOException() {
 super();
   }
 
-  public DrillIOException(String message, Throwable cause) {
-super(message, cause);
+  public DrillIOException(Throwable cause, String formatMessage, 
Object...args) {
 
 Review comment:
   It makes sense. Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199073178
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -63,17 +65,15 @@
 import com.google.common.cache.CacheLoader;
 import com.google.common.cache.LoadingCache;
 import com.google.common.cache.RemovalListener;
-import com.google.common.cache.RemovalNotification;
-import com.google.common.collect.Lists;
-import com.google.common.collect.Maps;
-import com.google.common.collect.Sets;
 import com.google.common.io.Resources;
 
 public class StoragePluginRegistryImpl implements StoragePluginRegistry {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(StoragePluginRegistryImpl.class);
 
+  /** Drill supported plugins */
   private Map> availablePlugins = 
Collections.emptyMap();
-  private final StoragePluginMap plugins = new StoragePluginMap();
+  /** Enabled plugins */
 
 Review comment:
   This is not a comment; it is a one-line Javadoc for the variable.
   I have changed the format to a three-line Javadoc.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199073307
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
-  }
-
-  @SuppressWarnings("resource")
-  private Map createPlugins() throws 
DrillbitStartupException {
 try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
+  StoragePlugins pluginsToBeWrittenToPersistenceStore = null;
+      final boolean isPluginsUpdaterEnabled = context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE);
   if (!pluginSystemTable.getAll().hasNext()) {
-// bootstrap load the config since no plugins are stored.
-        logger.info("No storage plugin instances configured in persistent store, loading bootstrap configuration.");
-        Collection<URL> urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false);
-if (urls != null && !urls.isEmpty()) {
-  logger.info("Loading the storage plugin configs from URLs {}.", 
urls);
-  Map<String, URL> pluginURLMap = Maps.newHashMap();
-  for (URL url : urls) {
-String pluginsData = Resources.toString(url, Charsets.UTF_8);
-StoragePlugins plugins = 
lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class);
-for (Map.Entry<String, StoragePluginConfig> config : plugins) {
-  if (!definePluginConfig(config.getKey(), config.getValue())) {
-logger.warn("Duplicate plugin instance '{}' defined in [{}, 
{}], ignoring the later one.",
-config.getKey(), pluginURLMap.get(config.getKey()), url);
-continue;
-  }
-  pluginURLMap.put(config.getKey(), url);
-}
-  }
-} else {
-  throw new IOException("Failure finding " + 
ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE);
+// create registered plugins defined in "storage-plugins.json"
+StoragePlugins bootstrapPlugins = readBootstrapStoragePlugins();
+if (isPluginsUpdaterEnabled) {
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsUpdater(context);
+  pluginsToBeWrittenToPersistenceStore = 
storagePluginsUpdaterService.updatePlugins(bootstrapPlugins, null);
+}
+if (pluginsToBeWrittenToPersistenceStore == null) {
+  pluginsToBeWrittenToPersistenceStore = bootstrapPlugins;
+}
+  } else {
+if (isPluginsUpdaterEnabled) {
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsUpdater(context);
+  pluginsToBeWrittenToPersistenceStore = 
storagePluginsUpdaterService.updatePlugins(null, pluginSystemTable);
 }
   }
 
-  Map<String, StoragePlugin> activePlugins = new HashMap<>();
-  for (Map.Entry<String, StoragePluginConfig> entry : Lists.newArrayList(pluginSystemTable.getAll())) {
-String name = entry.getKey();
-StoragePluginConfig config = entry.getValue();
-if (config.isEnabled()) {
-  try {
-StoragePlugin plugin = create(name, config);
-activePlugins.put(name, plugin);
-  } catch (ExecutionSetupException e) {
-logger.error("Failure while setting up StoragePlugin with name: 
'{}', disabling.", name, e);
-config.setEnabled(false);
-pluginSystemTable.put(name, config);
-  }
+  // load plugins to persistence store
+  if (pluginsToBeWrittenToPersistenceStore != null) {
+for (Entry<String, StoragePluginConfig> plugin : pluginsToBeWrittenToPersistenceStore) {
+  pluginSystemTable.put(plugin.getKey(), plugin.getValue());
 }
   }
 
-  activePlugins.put(INFORMATION_SCHEMA_PLUGIN, new 
InfoSchemaStoragePlugin(new InfoSchemaConfig(), context,
-  INFORMATION_SCHEMA_PLUGIN));
-  activePlugins.put(SYS_PLUGIN, new 
SystemTablePlugin(SystemTablePluginConfig.INSTANCE, context, SYS_PLUGIN));
-
-  return activePlugins;
+  defineEnabledPlugins();
 } catch (IOException e) {
-  logger.error("Failure setting up storage plugins.  Drillbit exiting.", 
e);
+  logger.error("Failure setting up storage enabledPlugins.  Drillbit 
exiting.", e);
   throw new IllegalStateException(e);
 }
   }
 
+  /**
+   * Read bootstrap storage plugins {@link 
ExecConstants#BOOTSTRAP_STORAGE_PLUGINS_FILE} files for the first fresh
+   * instantiating of Drill
+   *
+   * @return bootstrap storage plugins
+   * @throws 
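The removed bootstrap loader in the diff above keeps the first definition when the same plugin name appears under several bootstrap URLs, and only warns about the later ones. A minimal sketch of that merge rule, using plain maps instead of Drill's `StoragePlugins` type (names here are illustrative, not Drill code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models the duplicate handling of the removed bootstrap loader:
// when several bootstrap URLs define the same plugin name, the first
// definition wins and later ones are ignored (Drill logs a warning).
public class BootstrapMerge {

  static Map<String, String> mergeBootstrap(List<Map<String, String>> perUrlConfigs) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> configs : perUrlConfigs) {
      for (Map.Entry<String, String> e : configs.entrySet()) {
        // putIfAbsent keeps the earlier definition of a duplicated name
        merged.putIfAbsent(e.getKey(), e.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> first = Map.of("dfs", "file-config-v1");
    Map<String, String> second = Map.of("dfs", "file-config-v2", "mongo", "mongo-config");
    Map<String, String> merged = mergeBootstrap(List.of(first, second));
    System.out.println(merged.get("dfs"));  // file-config-v1
    System.out.println(merged.size());      // 2
  }
}
```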

[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199075637
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsHandler.java
 ##
 @@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+
+/**
+ * Storage plugins handler is an additional service for updating storage 
plugins configs from the file
+ */
+public interface StoragePluginsHandler {
+
+  /**
+   * Update incoming storage plugins configs from persistence store if 
present, otherwise bootstrap plugins configs.
+   * One of the params should be null, second shouldn't
+   *
+   * @param bootstrapPlugins bootstrap storage plugins, which are used in case 
of first Drill start up
+   * @param persistentStore the last storage plugins configs from persistence 
store
+   * @return all storage plugins, which should be loaded into persistence store
+   */
+  StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore<StoragePluginConfig> persistentStore);
 
 Review comment:
   Done
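The contract in the interface Javadoc above (exactly one of the two parameters is non-null) can be sketched with plain maps standing in for Drill's `StoragePlugins` and `PersistentStore` types; this is an illustrative model, not Drill code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the updatePlugins() contract: exactly one of
// bootstrapPlugins (first start-up) and persistentStore (later starts)
// is non-null; the result is the base set that the updater file is
// merged onto.
public class UpdatePluginsContract {

  static Map<String, Boolean> basePlugins(Map<String, Boolean> bootstrapPlugins,
                                          Map<String, Boolean> persistentStore) {
    if ((bootstrapPlugins == null) == (persistentStore == null)) {
      throw new IllegalArgumentException(
          "exactly one of bootstrapPlugins and persistentStore must be non-null");
    }
    return new HashMap<>(bootstrapPlugins != null ? bootstrapPlugins : persistentStore);
  }

  public static void main(String[] args) {
    Map<String, Boolean> bootstrap = new HashMap<>();
    bootstrap.put("cp", true);
    System.out.println(basePlugins(bootstrap, null).get("cp"));  // true
  }
}
```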




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199079357
 
 

 ##
 File path: distribution/src/deb/control/conffiles
 ##
 @@ -16,3 +16,4 @@
 /etc/drill/conf/drill-override.conf
 /etc/drill/conf/logback.xml
 /etc/drill/conf/drill-env.sh
+/etc/drill/conf/storage-plugins-updates.conf
 
 Review comment:
   It should be removed. Thanks for catching this.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199076849
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillIOException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows to update storage plugins configs from 
the
+ * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file
+ *
+ * TODO: it can be improved with configs versioning and service of creating 
{@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE}
+ */
+public class StoragePluginsUpdater implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class);
+
+  final DrillbitContext context;
+  final LogicalPlanPersistence hoconLogicalPlanPersistence;
+
+  public StoragePluginsUpdater(DrillbitContext context) {
+this.context = context;
+hoconLogicalPlanPersistence = new 
LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(),
+new ObjectMapper(new HoconFactory()));
+  }
+
+  @Override
+  public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore<StoragePluginConfig> persistentStore) {
+StoragePlugins plugins = null;
+Set<URL> urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false);
+try {
+  if (!urlSet.isEmpty()) {
+if (urlSet.size() != 1) {
+  throw new DrillIOException("More than one %s file is placed in 
Drill's classpath",
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199073271
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
-  }
-
-  @SuppressWarnings("resource")
-  private Map<String, StoragePlugin> createPlugins() throws DrillbitStartupException {
 try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
+  StoragePlugins pluginsToBeWrittenToPersistenceStore = null;
+  final boolean isPluginsUpdaterEnabled = 
context.getConfig().getBoolean(ExecConstants.STORAGE_PLUGINS_UPDATE);
   if (!pluginSystemTable.getAll().hasNext()) {
-// bootstrap load the config since no plugins are stored.
-logger.info("No storage plugin instances configured in persistent 
store, loading bootstrap configuration.");
-Collection<URL> urls = ClassPathScanner.forResource(ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE, false);
-if (urls != null && !urls.isEmpty()) {
-  logger.info("Loading the storage plugin configs from URLs {}.", 
urls);
-  Map<String, URL> pluginURLMap = Maps.newHashMap();
-  for (URL url : urls) {
-String pluginsData = Resources.toString(url, Charsets.UTF_8);
-StoragePlugins plugins = 
lpPersistence.getMapper().readValue(pluginsData, StoragePlugins.class);
-for (Map.Entry<String, StoragePluginConfig> config : plugins) {
-  if (!definePluginConfig(config.getKey(), config.getValue())) {
-logger.warn("Duplicate plugin instance '{}' defined in [{}, 
{}], ignoring the later one.",
-config.getKey(), pluginURLMap.get(config.getKey()), url);
-continue;
-  }
-  pluginURLMap.put(config.getKey(), url);
-}
-  }
-} else {
-  throw new IOException("Failure finding " + 
ExecConstants.BOOTSTRAP_STORAGE_PLUGINS_FILE);
+// create registered plugins defined in "storage-plugins.json"
+StoragePlugins bootstrapPlugins = readBootstrapStoragePlugins();
+if (isPluginsUpdaterEnabled) {
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsUpdater(context);
+  pluginsToBeWrittenToPersistenceStore = 
storagePluginsUpdaterService.updatePlugins(bootstrapPlugins, null);
+}
+if (pluginsToBeWrittenToPersistenceStore == null) {
+  pluginsToBeWrittenToPersistenceStore = bootstrapPlugins;
+}
+  } else {
+if (isPluginsUpdaterEnabled) {
+  StoragePluginsHandler storagePluginsUpdaterService = new 
StoragePluginsUpdater(context);
+  pluginsToBeWrittenToPersistenceStore = 
storagePluginsUpdaterService.updatePlugins(null, pluginSystemTable);
 }
   }
 
-  Map<String, StoragePlugin> activePlugins = new HashMap<>();
-  for (Map.Entry<String, StoragePluginConfig> entry : Lists.newArrayList(pluginSystemTable.getAll())) {
-String name = entry.getKey();
-StoragePluginConfig config = entry.getValue();
-if (config.isEnabled()) {
-  try {
-StoragePlugin plugin = create(name, config);
-activePlugins.put(name, plugin);
-  } catch (ExecutionSetupException e) {
-logger.error("Failure while setting up StoragePlugin with name: 
'{}', disabling.", name, e);
-config.setEnabled(false);
-pluginSystemTable.put(name, config);
-  }
+  // load plugins to persistence store
 
 Review comment:
   done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199276413
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillIOException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows to update storage plugins configs from 
the
+ * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file
+ *
+ * TODO: it can be improved with configs versioning and service of creating 
{@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE}
+ */
+public class StoragePluginsUpdater implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class);
+
+  final DrillbitContext context;
+  final LogicalPlanPersistence hoconLogicalPlanPersistence;
+
+  public StoragePluginsUpdater(DrillbitContext context) {
+this.context = context;
+hoconLogicalPlanPersistence = new 
LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(),
+new ObjectMapper(new HoconFactory()));
+  }
+
+  @Override
+  public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore<StoragePluginConfig> persistentStore) {
+StoragePlugins plugins = null;
+Set<URL> urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false);
+try {
+  if (!urlSet.isEmpty()) {
+if (urlSet.size() != 1) {
+  throw new DrillIOException("More than one %s file is placed in 
Drill's classpath",
+  CommonConstants.STORAGE_PLUGINS_UPDATER_FILE);
+}
+URL fileUrl = urlSet.iterator().next();
+plugins = new StoragePlugins(new HashMap<>());
+if (bootstrapPlugins != null) {
+  plugins.putAll(bootstrapPlugins);
+}
+
+String newPluginsData = Resources.toString(fileUrl, Charsets.UTF_8);
+StoragePlugins newPlugins = 
hoconLogicalPlanPersistence.getMapper().readValue(newPluginsData, 
StoragePlugins.class);
+
+for (Map.Entry<String, StoragePluginConfig> newPlugin : newPlugins) {
+  String pluginName = newPlugin.getKey();
+  StoragePluginConfig oldPluginConfig = persistentStore == null ? 
bootstrapPlugins.getConfig(pluginName) :
+  persistentStore.get(pluginName);
+  StoragePluginConfig updatedStatusPluginConfig =
+  determineUndefinedEnabledStatus(oldPluginConfig, 
newPlugin.getValue());
+  plugins.put(pluginName, updatedStatusPluginConfig);
+}
+  }
+} catch (IOException e) {
+  logger.error("Failures are obtained while updating storage plugins from 
%s file. Proceed without updating",
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198886347
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveSchemaFactory.java
 ##
 @@ -72,29 +71,27 @@ public HiveSchemaFactory(final HiveStoragePlugin plugin, 
final String name, fina
 isDrillImpersonationEnabled = 
plugin.getContext().getConfig().getBoolean(ExecConstants.IMPERSONATION_ENABLED);
 
 try {
+  // TODO: DRILL-6412. Clients for plugin should be instantiated only for 
the case, when plugin is enabled
   processUserMetastoreClient =
   DrillHiveMetaStoreClient.createCloseableClientWithCaching(hiveConf);
-} catch (MetaException e) {
-  throw new ExecutionSetupException("Failure setting up Hive metastore 
client.", e);
-}
 
-metaStoreClientLoadingCache = CacheBuilder
-.newBuilder()
-.expireAfterAccess(10, TimeUnit.MINUTES)
-.maximumSize(5) // Up to 5 clients for impersonation-enabled.
-.removalListener(new RemovalListener<String, DrillHiveMetaStoreClient>() {
-  @Override
-  public void onRemoval(RemovalNotification<String, DrillHiveMetaStoreClient> notification) {
+  metaStoreClientLoadingCache = CacheBuilder
+  .newBuilder()
+  .expireAfterAccess(10, TimeUnit.MINUTES)
+  .maximumSize(5) // Up to 5 clients for impersonation-enabled.
+  .removalListener((RemovalListener<String, DrillHiveMetaStoreClient>) notification -> {
 DrillHiveMetaStoreClient client = notification.getValue();
 client.close();
-  }
-})
-.build(new CacheLoader<String, DrillHiveMetaStoreClient>() {
-  @Override
-  public DrillHiveMetaStoreClient load(String userName) throws 
Exception {
-return 
DrillHiveMetaStoreClient.createClientWithAuthz(processUserMetastoreClient, 
hiveConf, userName);
-  }
-});
+  })
+  .build(new CacheLoader<String, DrillHiveMetaStoreClient>() {
+@Override
+public DrillHiveMetaStoreClient load(String userName) throws 
Exception {
+  return 
DrillHiveMetaStoreClient.createClientWithAuthz(processUserMetastoreClient, 
hiveConf, userName);
+}
+  });
+} catch (MetaException e) {
 
 Review comment:
   If the exception is thrown, the further logic in the constructor doesn't make sense.
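The point of the comment above is that moving the whole initialization into a single try block stops the dependent cache setup from running once client creation has failed. A generic sketch of that pattern (the class and method names here are placeholders, not Drill's Hive code):

```java
// Generic sketch: wrap dependent initialization steps in one try block,
// so a failure in the first step skips the steps that depend on it and
// is rethrown as a single setup exception.
public class SetupPattern {

  static class SetupException extends Exception {
    SetupException(String msg, Throwable cause) { super(msg, cause); }
  }

  static String createClient(boolean fail) throws Exception {
    if (fail) {
      throw new Exception("metastore unreachable");
    }
    return "client";
  }

  static String init(boolean fail) throws SetupException {
    try {
      String client = createClient(fail);
      // cache setup depends on the client; never runs if creation failed
      return "cache(" + client + ")";
    } catch (Exception e) {
      throw new SetupException("Failure setting up metastore client.", e);
    }
  }

  public static void main(String[] args) throws SetupException {
    System.out.println(init(false));  // cache(client)
  }
}
```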




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199270039
 
 

 ##
 File path: distribution/src/resources/storage-plugins-example.conf
 ##
 @@ -0,0 +1,66 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file involves storage plugins configs, which can be updated on the 
Drill start-up. This is controlled with a
+# "drill.exec.storage_plugins.update.enabled" boot option, see 
https://drill.apache.org/docs/start-up-options/.
+# This file is in HOCON format, see 
https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
+
+  "storage":{
+cp: {
+  type: "file",
+  connection: "classpath:///",
+  formats: {
+"csv" : {
+  type: "text",
+  extensions: [ "csv" ],
+  delimiter: ","
+}
+  }
+}
+  }
+  "storage":{
+dfs: {
+  type: "file",
+  enabled: false,
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199093883
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java
 ##
 @@ -25,6 +25,14 @@
 public class NamedStoragePluginConfig extends StoragePluginConfig {
   public String name;
 
+  public String getName() {
+return name;
+  }
+
+  public void setName(String name) {
+this.name = name;
+  }
+
 
 Review comment:
   It is not my code, so I hadn't paid attention to this.
   I have edited these methods.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199266170
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java
 ##
 @@ -83,15 +84,13 @@
* @param name
* @param plugin
*/
-  void addPlugin(String name, StoragePlugin plugin);
+  void addPluginToEnabled(String name, StoragePlugin plugin);
 
 Review comment:
   It is not the same: this method adds one plugin to the existing Map of enabled plugins.
   I suppose `addEnabledPlugin` looks good.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199276671
 
 

 ##
 File path: exec/java-exec/src/main/resources/drill-module.conf
 ##
 @@ -393,7 +393,11 @@ drill.exec: {
   //port hunting for drillbits. Enabled only for testing purposes.
   port_hunt : false,
   // Allow drillbit to bind to loopback address in distributed mode. Enabled 
only for testing purposes.
-  allow_loopback_address_binding : false
+  allow_loopback_address_binding : false,
+  // Allow overwrite storage plugins configs from the storage-plugins.conf 
file in the process of drillbit starting
+  storage_plugins.update: {
+enabled: true
 
 Review comment:
   The option is removed.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199276599
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginsUpdater.java
 ##
 @@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.google.common.base.Charsets;
+import com.google.common.io.Resources;
+import com.jasonclawson.jackson.dataformat.hocon.HoconFactory;
+import org.apache.drill.common.config.CommonConstants;
+import org.apache.drill.common.config.LogicalPlanPersistence;
+import org.apache.drill.common.exceptions.DrillIOException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.scanner.ClassPathScanner;
+import org.apache.drill.exec.planner.logical.StoragePlugins;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStore;
+
+import javax.validation.constraints.NotNull;
+import java.io.IOException;
+import java.net.URL;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Drill plugins handler, which allows to update storage plugins configs from 
the
+ * {@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} conf file
+ *
+ * TODO: it can be improved with configs versioning and service of creating 
{@link CommonConstants#STORAGE_PLUGINS_UPDATER_FILE}
+ */
+public class StoragePluginsUpdater implements StoragePluginsHandler {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(StoragePluginsUpdater.class);
+
+  final DrillbitContext context;
+  final LogicalPlanPersistence hoconLogicalPlanPersistence;
+
+  public StoragePluginsUpdater(DrillbitContext context) {
+this.context = context;
+hoconLogicalPlanPersistence = new 
LogicalPlanPersistence(context.getConfig(), context.getClasspathScan(),
+new ObjectMapper(new HoconFactory()));
+  }
+
+  @Override
+  public StoragePlugins updatePlugins(StoragePlugins bootstrapPlugins, PersistentStore<StoragePluginConfig> persistentStore) {
+StoragePlugins plugins = null;
+Set<URL> urlSet = ClassPathScanner.forResource(CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, false);
+try {
+  if (!urlSet.isEmpty()) {
+if (urlSet.size() != 1) {
+  throw new DrillIOException("More than one %s file is placed in 
Drill's classpath",
+  CommonConstants.STORAGE_PLUGINS_UPDATER_FILE);
+}
+URL fileUrl = urlSet.iterator().next();
+plugins = new StoragePlugins(new HashMap<>());
+if (bootstrapPlugins != null) {
+  plugins.putAll(bootstrapPlugins);
+}
+
+String newPluginsData = Resources.toString(fileUrl, Charsets.UTF_8);
+StoragePlugins newPlugins = 
hoconLogicalPlanPersistence.getMapper().readValue(newPluginsData, 
StoragePlugins.class);
+
+for (Map.Entry<String, StoragePluginConfig> newPlugin : newPlugins) {
+  String pluginName = newPlugin.getKey();
+  StoragePluginConfig oldPluginConfig = persistentStore == null ? 
bootstrapPlugins.getConfig(pluginName) :
+  persistentStore.get(pluginName);
+  StoragePluginConfig updatedStatusPluginConfig =
+  determineUndefinedEnabledStatus(oldPluginConfig, 
newPlugin.getValue());
+  plugins.put(pluginName, updatedStatusPluginConfig);
+}
+  }
+} catch (IOException e) {
+  logger.error("Failures are obtained while updating storage plugins from 
%s file. Proceed without updating",
+  CommonConstants.STORAGE_PLUGINS_UPDATER_FILE, e);
+  return bootstrapPlugins;
+}
+return plugins;
+  }
+
+  /**
+   * Helper method to identify the enabled status for new storage plugins 
config. If this status is absent in the updater
+   * file, the status is kept from the configs, which are going to be updated
+   *
+   * @param oldPluginConfig current storage plugin config from persistence 
store or bootstrap config file
+   * @param newPluginConfig new storage plugin config
+   * @return new storage plugin config with updated enabled status
+   */
+  private 
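The enabled-status rule described in the Javadoc above (keep the old status when the updater file omits it) can be sketched as a small helper; `Boolean` models an optional enabled flag here, and the names are illustrative rather than Drill's actual code:

```java
// Illustrative sketch of the enabled-status resolution: a null
// newEnabled means the updater file did not set the flag, so the
// status of the old config (persistent store or bootstrap) is kept.
public class EnabledStatus {

  static boolean resolve(Boolean oldEnabled, Boolean newEnabled) {
    if (newEnabled != null) {
      return newEnabled;                      // updater file sets the flag explicitly
    }
    return oldEnabled != null && oldEnabled;  // otherwise keep the old status
  }

  public static void main(String[] args) {
    System.out.println(resolve(true, null));   // true  (status kept)
    System.out.println(resolve(true, false));  // false (explicitly disabled)
  }
}
```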

[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199104274
 
 

 ##
 File path: distribution/src/resources/storage-plugins-example.conf
 ##
 @@ -0,0 +1,66 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This file contains storage plugin configs, which can be updated on Drill start-up. This is controlled with the
+# "drill.exec.storage_plugins.update.enabled" boot option, see 
https://drill.apache.org/docs/start-up-options/.
+# This file is in HOCON format, see 
https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
+
+  "storage":{
+cp: {
+  type: "file",
+  connection: "classpath:///",
+  formats: {
+"csv" : {
+  type: "text",
+  extensions: [ "csv" ],
+  delimiter: ","
+}
+  }
+}
+  }
+  "storage":{
+dfs: {
+  type: "file",
+  enabled: false,
+  connection: "hdfs:///",
+  workspaces: {
+"root": {
+  "location": "/",
+  "writable": false,
+  "defaultInputFormat": null,
+  "allowAccessOutsideWorkspace": false
+}
+  },
+  formats: {
+"parquet": {
+  "type": "parquet"
+}
+  }
+}
+  }
+  "storage":{
+mongo : {
+  type:"mongo",
+  enabled: true,
 
 Review comment:
   Actually this is the same. 
   
https://github.com/apache/drill/blob/master/contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json#L5
   
   I have edited it here and in all bootstrap-storage-plugins for consistency. 
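Since HOCON merges duplicate object keys, the three separate `"storage"` blocks in the example file above are equivalent to one merged object. A sketch of the merged form, showing only the `cp` and `mongo` entries:

```hocon
"storage": {
  cp: {
    type: "file",
    connection: "classpath:///",
    formats: { "csv": { type: "text", extensions: [ "csv" ], delimiter: "," } }
  },
  mongo: {
    type: "mongo",
    enabled: true
  }
}
```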


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198882971
 
 

 ##
 File path: 
common/src/main/java/org/apache/drill/common/config/CommonConstants.java
 ##
 @@ -31,4 +31,7 @@
   /** Override configuration file name.  (Classpath resource pathname.) */
   String CONFIG_OVERRIDE_RESOURCE_PATHNAME = "drill-override.conf";
 
+  /** Override plugins configs file name.  (Classpath resource pathname.) */
+  String STORAGE_PLUGINS_UPDATER_FILE = "storage-plugins.conf";
 
 Review comment:
   looks good. Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198931779
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java
 ##
 @@ -248,6 +248,12 @@ private ExecConstants() {
*/
   public static final String DEFAULT_TEMPORARY_WORKSPACE = 
"drill.exec.default_temporary_workspace";
 
+  /**
+   * Storage plugin configs will be updated when the
+   * {@link 
org.apache.drill.common.config.CommonConstants#STORAGE_PLUGINS_UPDATER_FILE} 
file is present
+   */
+  public static final String STORAGE_PLUGINS_UPDATE = 
"drill.exec.storage_plugins.update.enabled";
 
 Review comment:
   The option isn't really helpful. Removed.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198928608
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginMap.java
 ##
 @@ -111,7 +112,7 @@ public StoragePlugin get(String name) {
 return nameMap.entrySet().iterator();
   }
 
-  public Iterable<String> names() {
+  public Set<String> names() {
 
 Review comment:
   At least `Collection` is needed for creating a new Set from an existing one.
   It is a requirement for plugin names: `Set` guarantees there are no duplicate plugin names.
   I have renamed it to `getNames()` and added javadoc.
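The point about `Set` vs `Iterable` can be shown with a small, self-contained sketch; the plugin names here are hypothetical, and `getNames` below is a stand-in, not Drill's actual method:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PluginNames {
  // Returning Set instead of Iterable both allows copying
  // (e.g. new LinkedHashSet<>(names)) and guarantees unique names.
  public static Set<String> getNames(List<String> registered) {
    return new LinkedHashSet<>(registered);  // duplicates collapse, insertion order kept
  }

  public static void main(String[] args) {
    Set<String> names = getNames(List.of("cp", "dfs", "mongo", "dfs"));
    System.out.println(names);  // prints "[cp, dfs, mongo]"
  }
}
```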




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199077537
 
 

 ##
 File path: 
logical/src/main/java/org/apache/drill/common/config/LogicalPlanPersistence.java
 ##
 @@ -37,12 +37,12 @@
 public class LogicalPlanPersistence {
   private ObjectMapper mapper;
 
 Review comment:
   Done




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198928698
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java
 ##
 @@ -121,71 +117,96 @@ public StoragePlugin load(StoragePluginConfig config) 
throws Exception {
   }
 
   @Override
-  public void init() throws DrillbitStartupException {
+  public void init() {
 availablePlugins = findAvailablePlugins(classpathScan);
 
-// create registered plugins defined in "storage-plugins.json"
-plugins.putAll(createPlugins());
-  }
-
-  @SuppressWarnings("resource")
-  private Map<String, StoragePluginConfig> createPlugins() throws DrillbitStartupException {
 try {
-  /*
-   * Check if the storage plugins system table has any entries. If not, 
load the boostrap-storage-plugin file into
-   * the system table.
-   */
+  StoragePlugins pluginsToBeWrittenToPersistenceStore = null;
 
 Review comment:
   renamed and the variable is placed in `StoragePluginsHandlerService` class




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r199105919
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/NamedStoragePluginConfig.java
 ##
 @@ -25,6 +25,14 @@
 public class NamedStoragePluginConfig extends StoragePluginConfig {
   public String name;
 
+  public String getName() {
 
 Review comment:
   Good suggestion, done.




[GitHub] vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins Handler

2018-06-29 Thread GitBox
vdiravka commented on a change in pull request #1345: DRILL-6494: Drill Plugins 
Handler
URL: https://github.com/apache/drill/pull/1345#discussion_r198926537
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java
 ##
 @@ -83,15 +84,13 @@
* @param name
 
 Review comment:
   Done




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199230386
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+<profile>
+  <id>apache-release</id>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>com.spotify</groupId>
+        <artifactId>dockerfile-maven-plugin</artifactId>
+        <version>1.4.3</version>
+        <executions>
+          <execution>
+            <id>docker-image</id>
 
 Review comment:
   We can get rid of 'push' if you'd like. 'build' is good enough for most 
purposes. I'm not sure about 'package'. 




[GitHub] kkhatua commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations

2018-06-29 Thread GitBox
kkhatua commented on issue #1350: DRILL-4580: Support for exporting storage 
plugin configurations
URL: https://github.com/apache/drill/pull/1350#issuecomment-401503825
 
 
   I'd be more comfortable with single plugins to avoid any possible corruption 
(e.g. format, etc) as debugging could be a nightmare. 




[GitHub] ppadma commented on issue #1324: DRILL-6310: limit batch size for hash aggregate

2018-06-29 Thread GitBox
ppadma commented on issue #1324: DRILL-6310: limit batch size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#issuecomment-401498074
 
 
   @Ben-Zvi Thank you Boaz for the thorough review and giving me a +1 finally. 




[jira] [Created] (DRILL-6567) Jenkins Regression: TPCDS query 93 fails with INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6567:
-

 Summary: Jenkins Regression: TPCDS query 93 fails with 
INTERNAL_ERROR ERROR: java.lang.reflect.UndeclaredThrowableException.
 Key: DRILL-6567
 URL: https://issues.apache.org/jira/browse/DRILL-6567
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 93.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/hive/parquet/query93.sql

SELECT ss_customer_sk,
Sum(act_sales) sumsales
FROM   (SELECT ss_item_sk,
ss_ticket_number,
ss_customer_sk,
CASE
WHEN sr_return_quantity IS NOT NULL THEN
( ss_quantity - sr_return_quantity ) * ss_sales_price
ELSE ( ss_quantity * ss_sales_price )
END act_sales
FROM   store_sales
LEFT OUTER JOIN store_returns
ON ( sr_item_sk = ss_item_sk
AND sr_ticket_number = ss_ticket_number ),
reason
WHERE  sr_reason_sk = r_reason_sk
AND r_reason_desc = 'reason 38') t
GROUP  BY ss_customer_sk
ORDER  BY sumsales,
ss_customer_sk
LIMIT 100;

Here is the stack trace:
2018-06-29 07:00:32 INFO  DrillTestLogger:348 - 
Exception:

java.sql.SQLException: INTERNAL_ERROR ERROR: 
java.lang.reflect.UndeclaredThrowableException

Setup failed for null
Fragment 4:56

[Error Id: 3c72c14d-9362-4a9b-affb-5cf937bed89e on atsqa6c82.qa.lab:31010]

  (org.apache.drill.common.exceptions.ExecutionSetupException) 
java.lang.reflect.UndeclaredThrowableException

org.apache.drill.common.exceptions.ExecutionSetupException.fromThrowable():30
org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():327
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
org.apache.drill.exec.physical.impl.ScanBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch():276

org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides():238
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema():218
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():147
org.apache.drill.exec.record.AbstractRecordBatch.next():172
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema():118
org.apache.drill.exec.record.AbstractRecordBatch.next():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():103

org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():152
org.apache.drill.exec.physical.impl.BaseRootExec.next():93
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748
  Caused By (java.util.concurrent.ExecutionException) 
java.lang.reflect.UndeclaredThrowableException
java.util.concurrent.FutureTask.report():122
java.util.concurrent.FutureTask.get():192
org.apache.drill.exec.store.hive.readers.HiveAbstractReader.setup():320
org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas():245
org.apache.drill.exec.physical.impl.ScanBatch.next():164
org.apache.drill.exec.record.AbstractRecordBatch.next():119
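For context on the reported error: `java.lang.reflect.UndeclaredThrowableException` is thrown when a reflective proxy (as used, for example, by Hadoop's `UserGroupInformation.doAs` machinery) propagates a checked exception that the proxied interface does not declare. A minimal standalone reproduction, unrelated to Drill's actual code:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

public class UndeclaredDemo {

  // An interface whose method declares no checked exceptions.
  public interface Reader {
    void setup();
  }

  // Invoke setup() through a proxy whose handler throws a checked
  // exception; return the message recovered from the wrapper.
  public static String provoke() {
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      throw new Exception("setup failed");  // not declared by Reader.setup()
    };
    Reader reader = (Reader) Proxy.newProxyInstance(
        Reader.class.getClassLoader(), new Class<?>[] { Reader.class }, handler);
    try {
      reader.setup();
      return "no exception";
    } catch (UndeclaredThrowableException e) {
      // The proxy wraps the undeclared checked exception; the original is the cause.
      return e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(provoke());  // prints "setup failed"
  }
}
```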


[jira] [Created] (DRILL-6566) Jenkins Regression: TPCDS query 66 fails with RESOURCE ERROR: One or more nodes ran out of memory while executing the query. AGGR OOM at First Phase.

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6566:
-

 Summary: Jenkins Regression: TPCDS query 66 fails with RESOURCE 
ERROR: One or more nodes ran out of memory while executing the query.  AGGR OOM 
at First Phase.
 Key: DRILL-6566
 URL: https://issues.apache.org/jira/browse/DRILL-6566
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Fix For: 1.14.0


This is TPCDS Query 66.

Query: 
/root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf1/hive-generated-parquet/hive1_native/query66.sql

SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
ship_carriers,
year1,
Sum(jan_sales) AS jan_sales,
Sum(feb_sales) AS feb_sales,
Sum(mar_sales) AS mar_sales,
Sum(apr_sales) AS apr_sales,
Sum(may_sales) AS may_sales,
Sum(jun_sales) AS jun_sales,
Sum(jul_sales) AS jul_sales,
Sum(aug_sales) AS aug_sales,
Sum(sep_sales) AS sep_sales,
Sum(oct_sales) AS oct_sales,
Sum(nov_sales) AS nov_sales,
Sum(dec_sales) AS dec_sales,
Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot,
Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot,
Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot,
Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot,
Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot,
Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot,
Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot,
Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot,
Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot,
Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot,
Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot,
Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot,
Sum(jan_net)   AS jan_net,
Sum(feb_net)   AS feb_net,
Sum(mar_net)   AS mar_net,
Sum(apr_net)   AS apr_net,
Sum(may_net)   AS may_net,
Sum(jun_net)   AS jun_net,
Sum(jul_net)   AS jul_net,
Sum(aug_net)   AS aug_net,
Sum(sep_net)   AS sep_net,
Sum(oct_net)   AS oct_net,
Sum(nov_net)   AS nov_net,
Sum(dec_net)   AS dec_net
FROM   (SELECT w_warehouse_name,
w_warehouse_sq_ft,
w_city,
w_county,
w_state,
w_country,
'ZOUROS'
|| ','
|| 'ZHOU' AS ship_carriers,
d_year AS year1,
Sum(CASE
WHEN d_moy = 1 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jan_sales,
Sum(CASE
WHEN d_moy = 2 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS feb_sales,
Sum(CASE
WHEN d_moy = 3 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS mar_sales,
Sum(CASE
WHEN d_moy = 4 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS apr_sales,
Sum(CASE
WHEN d_moy = 5 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS may_sales,
Sum(CASE
WHEN d_moy = 6 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jun_sales,
Sum(CASE
WHEN d_moy = 7 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS jul_sales,
Sum(CASE
WHEN d_moy = 8 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS aug_sales,
Sum(CASE
WHEN d_moy = 9 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS sep_sales,
Sum(CASE
WHEN d_moy = 10 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS oct_sales,
Sum(CASE
WHEN d_moy = 11 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS nov_sales,
Sum(CASE
WHEN d_moy = 12 THEN ws_ext_sales_price * ws_quantity
ELSE 0
END)  AS dec_sales,
Sum(CASE
WHEN d_moy = 1 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jan_net,
Sum(CASE
WHEN d_moy = 2 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS feb_net,
Sum(CASE
WHEN d_moy = 3 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS mar_net,
Sum(CASE
WHEN d_moy = 4 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS apr_net,
Sum(CASE
WHEN d_moy = 5 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS may_net,
Sum(CASE
WHEN d_moy = 6 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jun_net,
Sum(CASE
WHEN d_moy = 7 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS jul_net,
Sum(CASE
WHEN d_moy = 8 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS aug_net,
Sum(CASE
WHEN d_moy = 9 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS sep_net,
Sum(CASE
WHEN d_moy = 10 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS oct_net,
Sum(CASE
WHEN d_moy = 11 THEN ws_net_paid_inc_ship * ws_quantity
ELSE 0
END)  AS nov_net,
Sum(CASE
WHEN d_moy = 12 THEN 

[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate

2018-06-29 Thread GitBox
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r199289952
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
 ##
 @@ -694,7 +694,7 @@ public PutStatus put(int incomingRowIdx, IndexPointer 
htIdxHolder, int hashCode,
 }
 htIdxHolder.value = currentIdx;
 return  addedBatch ? PutStatus.NEW_BATCH_ADDED :
-(freeIndex + 1 > totalIndexSize) ?
+(freeIndex + 1 > prevIndexSize + 
batchHolders.get(batchHolders.size()-1).getTargetBatchRowCount()) ?
 
 Review comment:
   `prevIndexSize` is used in two places, here and when checking if a new batch 
is needed. Both use it the same way (by adding the size of the last batch). 
Both places are part of the **HOT** code, called for every row.
  So maybe if this variable could be set to include the size of the last batch, a lot of computation would be saved.
  Would it be simple? We would need to know the size of the last batch when setting this variable.
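The suggestion above — folding the last batch's size into the maintained bound so the per-row hot path avoids a list lookup — can be sketched in isolation. The names and sizes below are illustrative only, not the actual `HashTableTemplate` fields:

```java
public class IndexBoundSketch {
  static final int BATCH_SIZE = 1 << 16;  // 64K slots per batch holder

  // Before: the hot path recomputes the bound on every row
  // (in the real code, the last term is a batchHolders list lookup).
  public static boolean batchFullSlow(int freeIndex, int prevIndexSize, int lastBatchRowCount) {
    return freeIndex + 1 > prevIndexSize + lastBatchRowCount;
  }

  // After: the bound is maintained once, whenever a batch is added or removed.
  public static boolean batchFullFast(int freeIndex, int totalIndexSize) {
    return freeIndex + 1 > totalIndexSize;  // totalIndexSize already includes the last batch
  }

  public static void main(String[] args) {
    int prev = BATCH_SIZE;         // one full previous batch
    int lastCount = 1024;          // current (smaller) target batch row count
    int total = prev + lastCount;  // maintained incrementally instead of recomputed
    int freeIndex = prev + lastCount - 1;
    System.out.println(batchFullSlow(freeIndex, prev, lastCount)
        == batchFullFast(freeIndex, total));  // prints "true": same decision
  }
}
```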
   




[GitHub] priteshm commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

2018-06-29 Thread GitBox
priteshm commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch 
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401478271
 
 
   I do see that a similar approach is followed for other operators for batch 
sizing. I do agree with @ilooner that it would be good to discuss alternative 
proposals on a separate JIRA.




[GitHub] priteshm commented on issue #1272: DRILL-5977: Filter Pushdown in Drill-Kafka plugin

2018-06-29 Thread GitBox
priteshm commented on issue #1272: DRILL-5977: Filter Pushdown in Drill-Kafka 
plugin
URL: https://github.com/apache/drill/pull/1272#issuecomment-401474974
 
 
   @akumarb2010 is this ready to be committed? 




[jira] [Created] (DRILL-6565) cume_dist does not return enough rows

2018-06-29 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6565:
-

 Summary: cume_dist does not return enough rows
 Key: DRILL-6565
 URL: https://issues.apache.org/jira/browse/DRILL-6565
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker
 Attachments: drillbit.log.7802

This query should return 64 rows but only returns 38 rows:
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.width.max_per_query` = 1;
select * from (
select cume_dist() over (order by Index) IntervalSecondValuea, Index from 
(select * from 
dfs.`/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet` order 
by BigIntvalue)) d where d.Index = 1;

I tried to reproduce the problem by using a smaller table, but it does not 
reproduce.  I tried to reproduce the problem without the outside select 
statement, but it does not reproduce.

Here is the explain plan:
{noformat}
| 00-00Screen : rowType = RecordType(DOUBLE IntervalSecondValuea, ANY 
Index): rowcount = 12000.0, cumulative cost = {757200.0 rows, 
1.1573335922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4034
00-01  ProjectAllowDup(IntervalSecondValuea=[$0], Index=[$1]) : rowType = 
RecordType(DOUBLE IntervalSecondValuea, ANY Index): rowcount = 12000.0, 
cumulative cost = {756000.0 rows, 1.1572135922911648E7 cpu, 0.0 io, 0.0 
network, 192.0 memory}, id = 4033
00-02Project(w0$o0=[$1], $0=[$0]) : rowType = RecordType(DOUBLE w0$o0, 
ANY $0): rowcount = 12000.0, cumulative cost = {744000.0 rows, 
1.1548135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4032
00-03  SelectionVectorRemover : rowType = RecordType(ANY $0, DOUBLE 
w0$o0): rowcount = 12000.0, cumulative cost = {732000.0 rows, 
1.1524135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4031
00-04Filter(condition=[=($0, 1)]) : rowType = RecordType(ANY $0, 
DOUBLE w0$o0): rowcount = 12000.0, cumulative cost = {72.0 rows, 
1.1512135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4030
00-05  Window(window#0=[window(partition {} order by [0] range 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [CUME_DIST()])]) : rowType = 
RecordType(ANY $0, DOUBLE w0$o0): rowcount = 8.0, cumulative cost = 
{64.0 rows, 1.1144135922911648E7 cpu, 0.0 io, 0.0 network, 192.0 
memory}, id = 4029
00-06SelectionVectorRemover : rowType = RecordType(ANY $0): 
rowcount = 8.0, cumulative cost = {56.0 rows, 1.0984135922911648E7 cpu, 
0.0 io, 0.0 network, 192.0 memory}, id = 4028
00-07  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
$0): rowcount = 8.0, cumulative cost = {48.0 rows, 1.0904135922911648E7 
cpu, 0.0 io, 0.0 network, 192.0 memory}, id = 4027
00-08Project($0=[ITEM($0, 'Index')]) : rowType = 
RecordType(ANY $0): rowcount = 8.0, cumulative cost = {40.0 rows, 
5692067.961455824 cpu, 0.0 io, 0.0 network, 128.0 memory}, id = 4026
00-09  SelectionVectorRemover : rowType = 
RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, 
cumulative cost = {32.0 rows, 5612067.961455824 cpu, 0.0 io, 0.0 network, 
128.0 memory}, id = 4025
00-10Sort(sort0=[$1], dir0=[ASC]) : rowType = 
RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, 
cumulative cost = {24.0 rows, 5532067.961455824 cpu, 0.0 io, 0.0 network, 
128.0 memory}, id = 4024
00-11  Project(T2¦¦**=[$0], BigIntvalue=[$1]) : rowType 
= RecordType(DYNAMIC_STAR T2¦¦**, ANY BigIntvalue): rowcount = 8.0, 
cumulative cost = {16.0 rows, 32.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 4023
00-12Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet]],
 
selectionRoot=maprfs:/drill/testdata/batch_memory/fourvarchar_asc_nulls_16MB_1GB.parquet,
 numFiles=1, numRowGroups=6, usedMetadataFile=false, columns=[`**`]]]) : 
rowType = RecordType(DYNAMIC_STAR **, ANY BigIntvalue): rowcount = 8.0, 
cumulative cost = {8.0 rows, 16.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 4022
{noformat}

I have attached the drillbit.log.

The commit id is:
| 1.14.0-SNAPSHOT  | aa127b70b1e46f7f4aa19881f25eda583627830a  | DRILL-6523: 
Fix NPE for describe of partial schema  | 22.06.2018 @ 11:28:23 PDT  | 
r...@mapr.com  | 23.06.2018 @ 02:05:10 PDT  |

fourvarchar_asc_nulls95.q



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate

2018-06-29 Thread GitBox
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r199274012
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
 ##
 @@ -568,8 +576,10 @@ private void retryAfterOOM(boolean batchAdded) throws 
RetryAfterSpillException {
 if ( batchAdded ) {
   logger.trace("OOM - Removing index {} from the batch holders 
list",batchHolders.size() - 1);
   BatchHolder bh = batchHolders.remove(batchHolders.size() - 1);
+  prevIndexSize = batchHolders.size() > 1 ? (batchHolders.size()-1) * 
BATCH_SIZE : 0;
+  totalIndexSize = batchHolders.size() * BATCH_SIZE;
   // update freeIndex to point to end of last batch + 1
-  freeIndex = batchHolders.size() * BATCH_SIZE + 1;
+  freeIndex = totalIndexSize + 1;
 
 Review comment:
   No need for "+ 1" -- the free index starts at batch offset of zero.
   




[GitHub] Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate

2018-06-29 Thread GitBox
Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r199274859
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
 ##
 @@ -684,7 +694,7 @@ public PutStatus put(int incomingRowIdx, IndexPointer 
htIdxHolder, int hashCode,
 }
 htIdxHolder.value = currentIdx;
 return  addedBatch ? PutStatus.NEW_BATCH_ADDED :
-(freeIndex + 1 > (batchHolders.size() * BATCH_SIZE)) ?
+(freeIndex + 1 > totalIndexSize) ?
 
 Review comment:
   Problem: The original code "overflowed" the (64K - 1) by adding one, hence this check returned KEY_ADDED_LAST,
which signaled to the caller that a batch had just been filled -- a good opportunity to check whether a spill is
needed (since a new batch would surely be allocated soon).
  By going below 64K, this no longer works, hence the code misses some of the "is spill needed now" checks.
   




[GitHub] Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on issue #1348: DRILL-6346: Create an Official Drill Docker 
Container
URL: https://github.com/apache/drill/pull/1348#issuecomment-401466019
 
 
   I'll work on adding the instructions into both the docs dir on this repo and 
the write-up on Parth's repo once the PR is in. 




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199272513
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+<profile>
+  <id>apache-release</id>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>com.spotify</groupId>
+        <artifactId>dockerfile-maven-plugin</artifactId>
+        <version>1.4.3</version>
+        <executions>
+          <execution>
+            <id>docker-image</id>
 
 Review comment:
   Here is how we can explicitly build, tag and push (any one action at a 
time):
   
   Inside the distribution directory, executing:
   `mvn -Papache-release dockerfile:build`
   `mvn -Papache-release dockerfile:tag`
   `mvn -Papache-release dockerfile:push`
   
   Build and tag happen on the user's system. Push uploads the image to the 
container registry on Docker Hub: 
https://hub.docker.com/r/drill/apache-drill-centos/tags/
   
   From the root directory, one can simply run the command below to build and tag 
the container. It can optionally be pushed to Docker Hub separately. 
`mvn clean install -Papache-release -DskipTests`




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199272414
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+
+  <profile>
+    <id>apache-release</id>
+    <build>
+      <plugins>
+        <plugin>
+          <groupId>com.spotify</groupId>
+          <artifactId>dockerfile-maven-plugin</artifactId>
+          <version>1.4.3</version>
+          <executions>
+            <execution>
+              <id>docker-image</id>
+              <goals>
+                <goal>build</goal>
+                <goal>push</goal>
+              </goals>
+            </execution>
+          </executions>
+          <configuration>
+            <repository>drill/apache-drill-centos</repository>
 
 Review comment:
   Sure. How about an even simpler tag: 
   drill/apache-drill:1.14.0
   The default tag will be for CentOS. We can have additional tags like you 
mentioned:
   drill/apache-drill:1.14.0-centos (alias for 1.14.0)
   drill/apache-drill:1.14.0-ubuntu
   drill/apache-drill:1.14.0-alpine 
   etc.
   
   Also, sure, the Docker Hub repo can be moved into a Maven property. Should we 
keep the drill Docker Hub repo as the default, or just make it a local 
repository by default instead?




Re: Actual vectorization execution

2018-06-29 Thread Paul Rogers
Hi Weijie,

As it turns out, vectorized processing in Drill is more aspirational than 
operational at this point in time.

The code used in Drill is not actually vector-based even though the data itself 
is columnar. Drill generally does row-wise operations because row-wise 
operations fit the SQL semantics better than column-wise operations. There is 
generally a "loop over all rows" block of code that calls into a "do something 
for column a" block, followed by a "do something for column b" block, etc.

For vectorized processing, the loops have to be inverted: "do all column a" 
followed by "do all column b". That is not always possible, however.

Further, many of Drill's readers produce Nullable types. In this case, every 
value carries a null/not-null flag which must be checked for each data value. 
It is unlikely that CPU instructions exist for this case.
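
For illustration only, here is a minimal Java sketch (invented names, not 
Drill's generated code) of summing a nullable column two ways: the row-wise 
form carries a per-value branch on the null flag, while the columnar form 
replaces the branch with an arithmetic mask -- the shape that gives the JIT a 
better chance to emit SIMD instructions:

```java
public class ColumnarSumSketch {

  // Row-wise style: one pass that interleaves the null check with the add.
  // The per-value branch tends to block SIMD code generation by the JIT.
  static long sumRowWise(long[] values, boolean[] isSet) {
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      if (isSet[i]) {
        sum += values[i];
      }
    }
    return sum;
  }

  // Column-wise style: turn the null flag into an arithmetic mask so the
  // loop body is branch-free. The mask is 0 (null) or all-ones (set).
  static long sumColumnar(long[] values, boolean[] isSet) {
    long sum = 0;
    for (int i = 0; i < values.length; i++) {
      sum += values[i] & -(isSet[i] ? 1L : 0L);
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] v = {10, 20, 30, 40};
    boolean[] set = {true, false, true, true};
    System.out.println(sumRowWise(v, set));   // prints 80
    System.out.println(sumColumnar(v, set));  // prints 80
  }
}
```

Whether the JIT actually vectorizes the masked loop depends on the JVM version 
and flags; the point is only the structural difference between the two loops.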

So, a first step is to research how various operators could be vectorized. For 
example, how would we handle a "WHERE x = 10" case in a way that would benefit 
from vectorization? How about a "SUM(x)" case?

Once that is sorted out (there are likely research papers that explain how 
others have done it), we can move onto changing the generated code (the 
loop-over-all-rows code) to use the newer design.

Thanks,
- Paul

 

On Friday, June 29, 2018, 10:30:04 AM PDT, Aman Sinha  
wrote:  
 
 Hi Weijie,  the Panama project is an OpenJDK initiative, right [1]? It is not
Intel-specific.
It would be quite a bit of work to test and certify with Intel's JVM, which
may still be in an experimental stage.
Also, you may have seen the Gandiva project for Apache Arrow, which aims to
improve vectorization for operations
on Arrow buffers (this requires integration with Arrow).

I assume the test program or workload you were running was already written
to exploit vectorization.  Have you also looked into
Drill's code-gen to see which parts are amenable to vectorization?  We
could start with some small use case and expand.

[1]
http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf

On Fri, Jun 29, 2018 at 3:23 AM weijie tong  wrote:

> Hi all:
>
>   I have investigated some vector-friendly Java code's JIT assembly output
> with the JITWatch tool, and found that the JVM did not generate the expected
> AVX code. According to conclusions from JVM experts, the JVM only supports
> some restricted usage patterns that generate AVX code.
>
>   I found Intel has started a project called Panama, which supplies an
> intrinsic vector API to actually execute AVX code. Here is the reference (
>
> https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
> )
> . It also supports off-heap calculation.  From our JVM team's message, the
> vector API will be released in JDK 11.
>
>   So I wonder whether we can distribute Intel's current JVM as a supplied
> default JVM to users (like the Spark distribution bundling a default Scala)
> and, as an option, rewrite parts of our operator code against this new
> vector API.
>
  

[GitHub] ilooner commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
ilooner commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199263635
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+
+  <profile>
+    <id>apache-release</id>
+    <build>
+      <plugins>
+        <plugin>
+          <groupId>com.spotify</groupId>
+          <artifactId>dockerfile-maven-plugin</artifactId>
+          <version>1.4.3</version>
+          <executions>
+            <execution>
+              <id>docker-image</id>
+              <goals>
+                <goal>build</goal>
+                <goal>push</goal>
+              </goals>
+            </execution>
+          </executions>
+          <configuration>
+            <repository>drill/apache-drill-centos</repository>
 
 Review comment:
   I suggest not keeping a separate repository based on os. Typically there is 
one repository and the os is included in the tag for the image.
   
   ```
   drill/apache-drill:1.14-centos
   drill/apache-drill:1.14-alpine
   ```




[GitHub] ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

2018-06-29 Thread GitBox
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch 
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401453225
 
 
   @vrozov My understanding was the following. QA has set up automatic tests of 
both the performance of batch sizing as well as correctness on a real cluster. 
Each batch sizing change has unit tests to validate batch size. But on a real 
cluster with real data, the only viable way for QA to validate the batch sizes 
output by an operator right now is through logging. Since Drill takes testing 
on real clusters seriously and aims to do more than just unit tests, I think 
this is perfectly acceptable.
   
   Since logging has overhead, and QA wanted to automate both the performance 
and correctness tests, they required the ability to turn logging off via 
sqlline. This was the approach agreed on by developers and testers in the Drill 
community including @sachouche, @bitblender, @ppadma, robert (don't know his 
github id), and @priteshm.
   
   Given the scope of agreement in the community, the fact that similar changes 
have already been merged, and the minor impact on the Drill code itself (~20 
lines), I suggest moving this discussion to a separate change. In my 
investigation I was not able to find a viable alternative to this approach; 
@vrozov, perhaps you could present an alternative approach on the dev list and 
lead the proposal. It would be a great help moving forward.
   
   In the meantime, the changes proposed here represent a valuable performance 
improvement for the Drill community, so let's not hold up this change over this.
   
   




[GitHub] vrozov commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

2018-06-29 Thread GitBox
vrozov commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch 
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-40105
 
 
   If logging needs to be enabled just for QA purposes, I still don't see why 
it needs to be enabled per query or be dynamic, and why it can't be managed by 
logging.xml. Additionally, if I understand the "QA purpose" correctly, batch 
sizes need to be validated by unit testing, not by looking at the logs.




[jira] [Created] (DRILL-6563) TPCDS query 10 has regressed

2018-06-29 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6563:
-

 Summary: TPCDS query 10 has regressed 
 Key: DRILL-6563
 URL: https://issues.apache.org/jira/browse/DRILL-6563
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Khurram Faraaz
 Attachments: 24ca3c6c-90e1-a4bf-6c6f-3f981fa2d043.sys.drill, 
query10.fast_plan_old_commit, tpcds_query_10_plan_slow_140d09e.pdf, 
tpcds_query_plan_10_140d09e.txt

TPC-DS query 10 has regressed in performance, from taking 3.5 seconds to execute 
on Apache Drill 1.14.0 commit b92f599, to 7 min 51.851 sec to complete 
execution on Apache Drill 1.14.0 commit 140d09e. The query was executed over SF1 
parquet views on a 4-node cluster.

Query plans from the old and newer commits are attached, along with the query 
profile from the newer commit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199251973
 
 

 ##
 File path: distribution/Dockerfile
 ##
 @@ -0,0 +1,28 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM centos:7
+ARG VERSION
+
+RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf 
/var/cache/yum
 
 Review comment:
   Done. Please take a look and suggest if any of the comments can be rephrased.




[GitHub] Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations

2018-06-29 Thread GitBox
Agirish commented on issue #1350: DRILL-4580: Support for exporting storage 
plugin configurations
URL: https://github.com/apache/drill/pull/1350#issuecomment-401441469
 
 
   Filed DRILL-6562 to track this. 
   
   Regarding one vs multiple files, my reasoning was to keep it more human 
readable and flexible, giving options to selectively import any plugin as 
needed, so I suggested a compressed folder with each of the plugin files. But 
your approach sounds good. I'm thinking the "Export All" option would be used 
for all-or-none scenarios, and the regular per-plugin "Export" option being 
introduced in this PR can support the flexible per-plugin import. 




[jira] [Created] (DRILL-6562) Provide option to Export All storage plugin configurations at once

2018-06-29 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-6562:
--

 Summary: Provide option to Export All storage plugin 
configurations at once
 Key: DRILL-6562
 URL: https://issues.apache.org/jira/browse/DRILL-6562
 Project: Apache Drill
  Issue Type: Improvement
  Components: Client - HTTP
Reporter: Abhishek Girish
Assignee: Abhishek Girish


Follow-up to DRILL-4580.

Provide the ability to export all storage plugin configurations at once, with a 
new "Export All" option on the Storage page of the Drill web UI.





[GitHub] KazydubB commented on issue #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement

2018-06-29 Thread GitBox
KazydubB commented on issue #1337: Upgrade ZooKeeper patch version to 3.4.12 
and add Apache Curator to dependencyManagement
URL: https://github.com/apache/drill/pull/1337#issuecomment-401439843
 
 
   WIP: going to exclude unneeded dependencies from dependencies added to 
dependencyManagement.




[GitHub] arina-ielchiieva commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations

2018-06-29 Thread GitBox
arina-ielchiieva commented on issue #1350: DRILL-4580: Support for exporting 
storage plugin configurations
URL: https://github.com/apache/drill/pull/1350#issuecomment-401437213
 
 
   Sounds fair regarding the enhancement Jira. I think keeping the export 
button even in update mode is also good. Please create a Jira for the import / 
export-all enhancement. Also, why do you think export-all should export each 
plugin in a different file? It could export all in one (similar to what we have 
for bootstrap plugins).




[GitHub] KazydubB commented on a change in pull request #1337: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement

2018-06-29 Thread GitBox
KazydubB commented on a change in pull request #1337: Upgrade ZooKeeper patch 
version to 3.4.12 and add Apache Curator to dependencyManagement
URL: https://github.com/apache/drill/pull/1337#discussion_r199242382
 
 

 ##
 File path: pom.xml
 ##
 @@ -1552,6 +1553,11 @@
   
 
   
+      <dependency>
+        <groupId>org.apache.zookeeper</groupId>
+        <artifactId>zookeeper</artifactId>
 
 Review comment:
   I've tried upgrading all curator dependencies to 2.12.0, and it seems to 
work as expected, but curator-client is much larger than the current one, so 
the max size of the drill-jdbc-all module needs to be raised by an extra 
1-2 MB. That is better handled as a separate task.
   However, I did add the curator dependencies to dependencyManagement explicitly.




Re: Travis CI Timed Out

2018-06-29 Thread Timothy Farkas
The timeouts are a sporadic issue. You can click on the failed build to
bring up Travis. Since you are a committer, you will have a restart
build button to the right of the console (non-committers don't get this
button for some reason); this will try the build again and it will probably
pass. Vitali is proposing some fixes for the timeout issue, so hopefully we
won't have to deal with it for much longer.


​

On Thu, Jun 28, 2018 at 9:21 PM, Charles Givre  wrote:

> I submitted some changes and the following happened.  Any suggestions?
>
>
> Audit done.
>
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (avoid_bad_dependencies) @ drill-sqlline ---
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ drill-sqlline ---
> [INFO] Installing /home/travis/build/apache/
> drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT.jar to
> /home/travis/.m2/repository/org/apache/drill/contrib/
> drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.jar
> [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/pom.xml
> to /home/travis/.m2/repository/org/apache/drill/contrib/
> drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.pom
> [INFO] Installing /home/travis/build/apache/
> drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT-tests.jar to
> /home/travis/.m2/repository/org/apache/drill/contrib/
> drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT-tests.jar
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Drill Root POM ............................. SUCCESS [ 11.686 s]
> [INFO] tools/Parent Pom .................................. SUCCESS [  1.118 s]
> [INFO] tools/freemarker codegen tooling .................. SUCCESS [  7.546 s]
> [INFO] Drill Protocol .................................... SUCCESS [  1.690 s]
> [INFO] Common (Logical Plan, Base expressions) ........... SUCCESS [  6.200 s]
> [INFO] Logical Plan, Base expressions .................... SUCCESS [  6.076 s]
> [INFO] exec/Parent Pom ................................... SUCCESS [  2.769 s]
> [INFO] exec/memory/Parent Pom ............................ SUCCESS [  0.825 s]
> [INFO] exec/memory/base .................................. SUCCESS [  3.922 s]
> [INFO] exec/rpc .......................................... SUCCESS [  1.866 s]
> [INFO] exec/Vectors ...................................... SUCCESS [  7.492 s]
> [INFO] contrib/Parent Pom ................................ SUCCESS [  2.568 s]
> [INFO] contrib/data/Parent Pom ........................... SUCCESS [  0.693 s]
> [INFO] contrib/data/tpch-sample-data ..................... SUCCESS [  3.937 s]
> [INFO] exec/Java Execution Engine ........................ SUCCESS [28:30 min]
> [INFO] exec/JDBC Driver using dependencies ............... SUCCESS [02:47 min]
> [INFO] JDBC JAR with all dependencies .................... SUCCESS [ 42.480 s]
> [INFO] Drill-on-YARN ..................................... SUCCESS [ 15.818 s]
> [INFO] contrib/kudu-storage-plugin ....................... SUCCESS [ 10.498 s]
> [INFO] contrib/opentsdb-storage-plugin ................... SUCCESS [ 19.961 s]
> [INFO] contrib/mongo-storage-plugin ...................... SUCCESS [ 52.961 s]
> [INFO] contrib/hbase-storage-plugin ...................... SUCCESS [01:18 min]
> [INFO] contrib/jdbc-storage-plugin ....................... SUCCESS [ 25.873 s]
> [INFO] contrib/hive-storage-plugin/Parent Pom ............ SUCCESS [  3.396 s]
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ...... SUCCESS [ 37.665 s]
> [INFO] contrib/mapr-format-plugin ........................ SUCCESS [  5.200 s]
> [INFO] contrib/hive-storage-plugin/core .................. SUCCESS [ 10.537 s]
> [INFO] contrib/drill-gis-plugin .......................... SUCCESS [ 21.492 s]
> [INFO] contrib/kafka-storage-plugin ...................... SUCCESS [  6.602 s]
> [INFO] Packaging and Distribution Assembly ............... SUCCESS [ 58.576 s]
> [INFO] contrib/sqlline ................................... SUCCESS [  1.570 s]
> [INFO] ------------------------------------------------------------------------
> 

[GitHub] vvysotskyi commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator

2018-06-29 Thread GitBox
vvysotskyi commented on a change in pull request #1353: DRILL-6553: Fix TopN 
for unnest operator
URL: https://github.com/apache/drill/pull/1353#discussion_r199237071
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -394,4 +401,30 @@ public void testLateral_HashAgg_with_nulls() throws 
Exception {
   .baselineValues("dd",222L)
   .build().run();
   }
+
+  @Test
+  public void testUnnestTopN() throws Exception {
 
 Review comment:
   Thanks, done.




[GitHub] Agirish commented on issue #1350: DRILL-4580: Support for exporting storage plugin configurations

2018-06-29 Thread GitBox
Agirish commented on issue #1350: DRILL-4580: Support for exporting storage 
plugin configurations
URL: https://github.com/apache/drill/pull/1350#issuecomment-401424556
 
 
   @arina-ielchiieva, I thought about that - but it's a bit more involved in 
how I'd like it to be: an "Export All" button should export each of the 
storage plugin configurations into JSON files and generate a single compressed 
folder which contains each of these JSON files individually. 
   And an "Import" button which can support importing either a single JSON file 
or a compressed folder, which can create one or more storage plugins. 
   
   I think having an individual export button is helpful. We could keep it in 
the "Update" page (image 2 above). We could remove it from the main storage 
plugin list page (image 1) if you think it's cluttering the UI. 
   
   In a later PR, we can add support for the "Export All" and "Import" (one or 
more) buttons. Would that be okay? 
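
   The compressed-folder shape described above could be sketched as follows 
(a hypothetical helper using only java.util.zip, not the PR's code; the method 
and parameter names are invented):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ExportAllSketch {

  // Packs each plugin's JSON config into a single zip archive, one entry per
  // plugin, so individual files can still be re-imported selectively.
  static byte[] exportAll(Map<String, String> pluginJsonByName) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (Map.Entry<String, String> e : pluginJsonByName.entrySet()) {
        zip.putNextEntry(new ZipEntry(e.getKey() + ".json"));
        zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);  // in-memory streams should not fail
    }
    return bytes.toByteArray();
  }
}
```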




[GitHub] ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

2018-06-29 Thread GitBox
ilooner commented on issue #1330: DRILL-6147: Adding Columnar Parquet Batch 
Sizing functionality
URL: https://github.com/apache/drill/pull/1330#issuecomment-401424230
 
 
   @vrozov I discussed the logging with @sachouche. The issue is that some 
information needed to be logged for QA purposes on a per-query basis, and for 
ease of testing needed to be toggled from sqlline. This was the approach agreed 
on by the batch sizing team, so at this point we're coming in at the end 
of a series of discussions. Nevertheless, I spent about half a day looking for 
a way this use case could be met in a logback-centric way, and I could not find 
a reasonable alternative that could be done without a large amount of work.
   
   The main issue is that slf4j / logback has global logging configurations, 
which makes it impossible to turn on logging on a per-query basis. We can use 
MDC to include a query id in the log messages and we could filter by query, but 
even that is not ideal, since you are generating logs you don't need for all 
queries. Then there is the issue of updating the logging configuration. The 
only logback-native method is to enable periodic scanning of the logback.xml 
file and to update the logback.xml file on all nodes when you want to enable 
certain logging. That is not a great experience for someone automating tests, 
or debugging.
   
   The conclusion we came to was that Drill needs a real logging solution, and 
to really do it right you would have to go down the rabbit hole.
   
 1. Enable per query logging configurations, not just a global logging 
configuration. To do this we would have to use thread local loggers instead of 
static loggers, or something like https://clogr.io/, or a custom 
ContextSelector 
https://stackoverflow.com/questions/38208617/practical-use-of-logback-context-selectors.
 2. Allow either sql line configuration of log levels or web ui 
configuration of log levels (like Apex had).
 3. Include a logging context with the query context to communicate the 
logging config for a query.
   
   These are non-trivial, have a lot of corner cases, and require design 
discussions with the community.
   
   In the interim, Salim is making the options controlling logging internal, so 
users will not see them. Then he is going to have a follow-up PR to consolidate 
the logging across all the batch sizing operators. Later, when we have a real 
logging solution, we can delete these options and do the right thing.
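
   As a rough sketch of the per-query toggle idea (all names here are 
hypothetical, invented for illustration; this is not Drill's implementation), a 
gate kept outside the global logback configuration might look like:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class QueryLogGate {
  // Query ids that have opted in to verbose batch-size logging.
  private static final Set<String> verboseQueries = ConcurrentHashMap.newKeySet();

  // Toggled from session-option handling (e.g. an ALTER SESSION SET).
  static void enable(String queryId)  { verboseQueries.add(queryId); }
  static void disable(String queryId) { verboseQueries.remove(queryId); }

  static boolean isVerbose(String queryId) { return verboseQueries.contains(queryId); }

  // Call sites guard expensive batch-size logging on the gate, so the
  // logging cost is paid only for queries that opted in.
  static void logBatchSize(String queryId, int rows, long bytes) {
    if (isVerbose(queryId)) {
      System.out.printf("query=%s rows=%d bytes=%d%n", queryId, rows, bytes);
    }
  }
}
```

A real solution would still need the per-query configuration plumbing described 
in points 1-3 above; this only shows why a non-global gate sidesteps logback's 
global configuration.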
   




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199230469
 
 

 ##
 File path: distribution/Dockerfile
 ##
 @@ -0,0 +1,28 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM centos:7
+ARG VERSION
+
+RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf 
/var/cache/yum
 
 Review comment:
   Yes, that's right. I can add comments. 




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199230386
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+
+  <profile>
+    <id>apache-release</id>
+    <build>
+      <plugins>
+        <plugin>
+          <groupId>com.spotify</groupId>
+          <artifactId>dockerfile-maven-plugin</artifactId>
+          <version>1.4.3</version>
+          <executions>
+            <execution>
+              <id>docker-image</id>
+              <goals>
 
 Review comment:
   We can get rid of 'push' if you'd like. 'build' is good enough for most 
purposes. I'm not sure about 'package'. 




[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199230067
 
 

 ##
 File path: distribution/pom.xml
 ##
 @@ -485,6 +485,36 @@
 
   
 
+
+  <profile>
+    <id>apache-release</id>
+    <build>
+      <plugins>
+        <plugin>
+          <groupId>com.spotify</groupId>
+          <artifactId>dockerfile-maven-plugin</artifactId>
+          <version>1.4.3</version>
+          <executions>
+            <execution>
+              <id>docker-image</id>
+              <goals>
+                <goal>build</goal>
+                <goal>push</goal>
+              </goals>
+            </execution>
+          </executions>
+          <configuration>
+            <repository>drill/apache-drill-centos</repository>
 
 Review comment:
   1. This is configurable - anyone can update it to any private or public 
Docker Hub or any container registry. Do you have a different approach in mind?
   2. I briefly touched upon this in the description. Since the ask was for an 
official Docker image for Apache Drill, it makes sense to build this once 
during the release. Anyone can download the Docker image at any time, using a 
simple pull. Everything else remains the same. I also chose this approach to 
prevent Docker images from being generated every time. 
   
   3. Tim and I have. We intend to give access to the 'drill' Docker Hub 
organization to all committers, perhaps starting with release managers for 
every release. It's a good central place to have all related Docker images for 
Drill.
   
   4. Sure. 




Re: Actual vectorization execution

2018-06-29 Thread Aman Sinha
Hi Weijie,  the Panama project is an OpenJDK initiative, right [1]? It is not
Intel-specific.
It would be quite a bit of work to test and certify with Intel's JVM, which
may still be in an experimental stage.
Also, you may have seen the Gandiva project for Apache Arrow, which aims to
improve vectorization for operations
on Arrow buffers (this requires integration with Arrow).

I assume the test program or workload you were running was already written
to exploit vectorization.  Have you also looked into
Drill's code-gen to see which parts are amenable to vectorization?  We
could start with some small use case and expand.

[1]
http://www.oracle.com/technetwork/java/jvmls2016-ajila-vidstedt-3125545.pdf

On Fri, Jun 29, 2018 at 3:23 AM weijie tong  wrote:

> Hi all:
>
>   I have investigated some vector-friendly Java code's JIT assembly output
> with the JITWatch tool, and found that the JVM did not generate the expected
> AVX code. According to conclusions from JVM experts, the JVM only supports
> some restricted usage patterns that generate AVX code.
>
>   I found Intel has started a project called Panama, which supplies an
> intrinsic vector API to actually execute AVX code. Here is the reference (
>
> https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
> )
> . It also supports off-heap calculation.  From our JVM team's message, the
> vector API will be released in JDK 11.
>
>   So I wonder whether we can distribute Intel's current JVM as a supplied
> default JVM to users (like the Spark distribution bundling a default Scala)
> and, as an option, rewrite parts of our operator code against this new
> vector API.
>


[jira] [Created] (DRILL-6561) Lateral excluding the columns from output container provided by projection push into rules

2018-06-29 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6561:


 Summary: Lateral excluding the columns from output container 
provided by projection push into rules
 Key: DRILL-6561
 URL: https://issues.apache.org/jira/browse/DRILL-6561
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia
 Fix For: 1.14.0


With DRILL-6545, LateralPop will have information about the list of columns to 
be excluded from the Lateral output container. Mostly this is used to avoid 
producing the original repeated column in the Lateral output if it is not 
required by the projection list. This is needed because, in its absence, Lateral 
has to copy the repeated column N times, where N is the number of rows in the 
right incoming batch for each left incoming batch row. This copy is very costly 
from both a memory and a latency perspective. Hence avoiding it is a must for 
the Lateral-Unnest case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Agirish commented on a change in pull request #1348: DRILL-6346: Create an Official Drill Docker Container

2018-06-29 Thread GitBox
Agirish commented on a change in pull request #1348: DRILL-6346: Create an 
Official Drill Docker Container
URL: https://github.com/apache/drill/pull/1348#discussion_r199228648
 
 

 ##
 File path: distribution/Dockerfile
 ##
 @@ -0,0 +1,28 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM centos:7
+ARG VERSION
+
+RUN yum install -y java-1.8.0-openjdk-devel which ; yum clean all ; rm -rf 
/var/cache/yum
+
+COPY target/apache-drill-$VERSION.tar.gz /tmp
+RUN mkdir /opt/drill
+RUN tar -xvzf /tmp/apache-drill-$VERSION.tar.gz --directory=/opt/drill 
--strip-components 1
+
+ENTRYPOINT /opt/drill/bin/drill-embedded
 
 Review comment:
   So in the current implementation, when a user starts the container with the 
docker run command, Drill starts in embedded mode and drops into the Sqlline 
prompt. Once they quit, the Drillbit process stops and the container exits 
automatically. Each docker run gives you a Sqlline session. Please see the 
description above for an example. 
   
   In my original implementation, I had an init script which runs forever 
(keeping the container alive until someone explicitly stops it). That requires 
ZK. I discussed with Tim and we thought getting embedded mode into the Apache 
Drill codebase would be a good first step. We can then discuss the approach 
we'd like to take to get the stand-alone mode working. 
   
   See: 
https://github.com/Agirish/drill-containers/blob/master/dockerfiles/apache-drill-centos-1.13.0/init-script
   
https://github.com/Agirish/drill-containers/blob/master/dockerfiles/apache-zookeeper-centos-3.4.10/init-script
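The lifecycle described above can be sketched with two commands. This is a hedged CLI sketch, not the official workflow: the image tag and version are illustrative, and the build context assumes the Dockerfile from the diff sits in the distribution module.

```shell
# Build the image from the distribution module; VERSION must match the
# apache-drill tarball produced by the build (value here is illustrative).
docker build -t apache-drill:1.14.0 --build-arg VERSION=1.14.0 distribution/

# -it attaches a terminal so the ENTRYPOINT's Sqlline prompt is usable;
# --rm deletes the container when Sqlline exits, matching the
# "quit Sqlline, container exits" behavior described above.
docker run -it --rm apache-drill:1.14.0
```

An always-on Drillbit would instead need a long-running init script plus ZooKeeper, as in the linked drill-containers repositories.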


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-6560) Allow options for controlling the batch size per operator

2018-06-29 Thread salim achouche (JIRA)
salim achouche created DRILL-6560:
-

 Summary: Allow options for controlling the batch size per operator
 Key: DRILL-6560
 URL: https://issues.apache.org/jira/browse/DRILL-6560
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Flow
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.14.0


This Jira is for internal Drill DEV use; the following capabilities are needed 
for automating the batch-sizing functionality testing:
 * Control the enablement of batch-sizing statistics at the session (per query) 
and server (all queries) levels
 * Control the granularity of batch-sizing statistics (summary or verbose)
 * Control the set of operators that should log batch statistics





Re: DRILL-6519 Travis CI timing out

2018-06-29 Thread Arina Yelchiyeva
We are just finalizing the list of tests that are going to be marked as slow.
Once done, this will be an easy fix :)

Kind regards,
Arina

On Fri, Jun 29, 2018 at 7:41 PM Charles Givre  wrote:

> Hi Vitalii
> I saw the discussion but is there anything that I can do to prevent this
> from happening?
>
> Sent from my iPhone
>
> > On Jun 29, 2018, at 12:34, Vitalii Diravka 
> wrote:
> >
> > Hi Charles,
> >
> > Recently there was a discussion in dev mailing list regarding this.
> Please
> > take a look.
> > Also I have created Jira ticket [1].
> >
> > [1] https://issues.apache.org/jira/browse/DRILL-6559
> >
> > Kind regards
> > Vitalii
> >
> >
> >> On Fri, Jun 29, 2018 at 5:43 PM Charles Givre  wrote:
> >>
> >> Hi Arina,
> >> Could you or someone take a look at the CI.  I’ve submitted the changes,
> >> the CI seems to build successfully, but times out and thus fails the
> check.
> >> Thanks,
> >> — C
>


Re: [DISCUSSION] Travis build failures

2018-06-29 Thread Arina Yelchiyeva
Or we can be more radical here and exclude all tests from the contrib module?

On Fri, Jun 29, 2018 at 7:28 PM Vitalii Diravka 
wrote:

> Agree, TestTpchExplain and TestTpchPlanning are fast enough.
> TestTpchLimit0 has the PlannerTest category, but it is slower than the above
> tests.
> Actually only TestTpchSingleMode and TestTpchLimit0 are candidates for
> exclusion.
> I am not sure that it is enough for Travis (45 seconds on my machine). What
> about the Mongo unit tests?
>
> I have created Jira for it [1].
>
> [1] https://issues.apache.org/jira/browse/DRILL-6559
>
> Kind regards
> Vitalii
>
>
> On Wed, Jun 27, 2018 at 10:29 PM Aman Sinha  wrote:
>
> > Sounds good, but why exclude the planning tests? Only the TPC-H runtime
> > tests should be excluded, I think.
> >
> > On Wed, Jun 27, 2018 at 12:16 PM Timothy Farkas 
> wrote:
> >
> > > +1
> > >
> > > On Wed, Jun 27, 2018 at 10:00 AM, Vitalii Diravka <
> > > vitalii.dira...@gmail.com
> > > > wrote:
> > >
> > > > This is a topic from the last Hangout meeting.
> > > >
> > > > Sometimes the Drill Travis build fails because the job run expires.
> > > > Certainly, the right way is to accelerate Drill execution :)
> > > >
> > > > Nevertheless I believe we could consider excluding some more tests from
> > > > the Travis build.
> > > > We can add all TPCH tests (TestTpchLimit0, TestTpchExplain,
> > > > TestTpchPlanning, TestTpchExplain) to the SlowTest category.
> > > > Actually it isn't urgent, but we can consider it for the future, if this
> > > > happens more often.
> > > >
> > > > Kind regards
> > > > Vitalii
> > > >
> > >
> >
>
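The exclusion mechanism discussed here is JUnit's @Category annotation combined with Surefire's excludedGroups. The self-contained sketch below imitates that marker-interface pattern without JUnit on the classpath; the annotation, the selection method, and the test classes are stand-ins for illustration, not Drill's real ones.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for JUnit's @Category / excludedGroups mechanism:
// a marker interface names the group, an annotation ties it to a test
// class, and the "runner" skips classes whose category is excluded.
public class CategoryDemo {
    interface SlowTest {}                   // marker, like Drill's SlowTest

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Category {
        Class<?> value();
    }

    @Category(SlowTest.class)
    static class TestTpchLimit0 {}          // would be excluded on Travis

    static class TestTpchExplain {}         // fast, stays in the build

    // Returns the names of test classes that are NOT in the excluded group.
    static List<String> runnable(Class<?> excludedGroup, Class<?>... testClasses) {
        List<String> toRun = new ArrayList<>();
        for (Class<?> test : testClasses) {
            Category c = test.getAnnotation(Category.class);
            if (c == null || c.value() != excludedGroup) {
                toRun.add(test.getSimpleName());
            }
        }
        return toRun;
    }

    public static void main(String[] args) {
        System.out.println(runnable(SlowTest.class,
                TestTpchLimit0.class, TestTpchExplain.class));  // [TestTpchExplain]
    }
}
```

In the real build, Surefire's excludedGroups property names the category class, so marking a test slow is a one-line annotation change.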


Re: DRILL-6519 Travis CI timing out

2018-06-29 Thread Charles Givre
Hi Vitalii 
I saw the discussion but is there anything that I can do to prevent this from 
happening? 

Sent from my iPhone

> On Jun 29, 2018, at 12:34, Vitalii Diravka  wrote:
> 
> Hi Charles,
> 
> Recently there was a discussion in dev mailing list regarding this. Please
> take a look.
> Also I have created Jira ticket [1].
> 
> [1] https://issues.apache.org/jira/browse/DRILL-6559
> 
> Kind regards
> Vitalii
> 
> 
>> On Fri, Jun 29, 2018 at 5:43 PM Charles Givre  wrote:
>> 
>> Hi Arina,
>> Could you or someone take a look at the CI.  I’ve submitted the changes,
>> the CI seems to build successfully, but times out and thus fails the check.
>> Thanks,
>> — C


[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199217282
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -417,16 +417,9 @@ public static DateCorruptionStatus 
checkForCorruptDateValuesInStatistics(Parquet
 // column does not appear in this file, skip it
 continue;
   }
-  Statistics statistics = 
footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-  Integer max = (Integer) statistics.genericGetMax();
-  if (statistics.hasNonNullValue()) {
-if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-  return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-}
-  } else {
-// no statistics, go check the first page
-return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-  }
+  IntStatistics statistics = 
(IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
 
 Review comment:
   I don't see any specific code style with regard to spacing after casts (grep 
-I -R \(\([a-zA-Z0-9_]*\)[a-zA-Z0-9_] * | grep -c java). It seems to be a 
matter of contributor preference.




Re: DRILL-6519 Travis CI timing out

2018-06-29 Thread Vitalii Diravka
Hi Charles,

Recently there was a discussion in dev mailing list regarding this. Please
take a look.
Also I have created Jira ticket [1].

[1] https://issues.apache.org/jira/browse/DRILL-6559

Kind regards
Vitalii


On Fri, Jun 29, 2018 at 5:43 PM Charles Givre  wrote:

> Hi Arina,
> Could you or someone take a look at the CI.  I’ve submitted the changes,
> the CI seems to build successfully, but times out and thus fails the check.
> Thanks,
> — C


[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199215493
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   It does not matter where it comes from (it actually comes from 
`RowGroupInfo`). The condition needs to be intercepted prior to calling 
`isAllNulls()` as `isAllNulls()` would return the wrong result (whether true or 
false) for a negative row count.




Re: [DISCUSSION] Travis build failures

2018-06-29 Thread Vitalii Diravka
Agree, TestTpchExplain and TestTpchPlanning are fast enough.
TestTpchLimit0 has the PlannerTest category, but it is slower than the above
tests.
Actually only TestTpchSingleMode and TestTpchLimit0 are candidates for
exclusion.
I am not sure that it is enough for Travis (45 seconds on my machine). What
about the Mongo unit tests?

I have created Jira for it [1].

[1] https://issues.apache.org/jira/browse/DRILL-6559

Kind regards
Vitalii


On Wed, Jun 27, 2018 at 10:29 PM Aman Sinha  wrote:

> Sounds good, but why exclude the planning tests? Only the TPC-H runtime
> tests should be excluded, I think.
>
> On Wed, Jun 27, 2018 at 12:16 PM Timothy Farkas  wrote:
>
> > +1
> >
> > On Wed, Jun 27, 2018 at 10:00 AM, Vitalii Diravka <
> > vitalii.dira...@gmail.com
> > > wrote:
> >
> > > This is a topic from the last Hangout meeting.
> > >
> > > Sometimes the Drill Travis build fails because the job run expires.
> > > Certainly, the right way is to accelerate Drill execution :)
> > >
> > > Nevertheless I believe we could consider excluding some more tests from
> > > the Travis build.
> > > We can add all TPCH tests (TestTpchLimit0, TestTpchExplain,
> > > TestTpchPlanning, TestTpchExplain) to the SlowTest category.
> > > Actually it isn't urgent, but we can consider it for the future, if this
> > > happens more often.
> > >
> > > Kind regards
> > > Vitalii
> > >
> >
>


[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor 
code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199212137
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   But this negative rowCount could only come from the statistics, and I don't 
think that bad statistics should prevent the user from executing the query.




[jira] [Created] (DRILL-6559) Travis timing out

2018-06-29 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6559:
--

 Summary: Travis timing out
 Key: DRILL-6559
 URL: https://issues.apache.org/jira/browse/DRILL-6559
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Affects Versions: 1.13.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.14.0


There was a decision to exclude some more TPCH unit tests from the Travis CI 
build by adding TestTpchSingleMode and TestTpchLimit0 (optionally) to the 
SlowTest category.

Excluding other test cases is under discussion.

https://lists.apache.org/thread.html/35f41f16d1679029e6089e447e2a85534243e9d5904116114c5168e8@%3Cdev.drill.apache.org%3E





[GitHub] sohami commented on a change in pull request #1353: DRILL-6553: Fix TopN for unnest operator

2018-06-29 Thread GitBox
sohami commented on a change in pull request #1353: DRILL-6553: Fix TopN for 
unnest operator
URL: https://github.com/apache/drill/pull/1353#discussion_r199210264
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/lateraljoin/TestE2EUnnestAndLateral.java
 ##
 @@ -394,4 +401,30 @@ public void testLateral_HashAgg_with_nulls() throws 
Exception {
   .baselineValues("dd",222L)
   .build().run();
   }
+
+  @Test
+  public void testUnnestTopN() throws Exception {
 
 Review comment:
   Please make this test part of `TestLateralPlans.java` where we have other 
plan verification tests.




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199210267
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   I hope for the reverse. A negative row count passed to this method indicates 
a bug, and a bug means an incorrect result. I'd rather fail a query than give a 
wrong result.
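The fail-fast position under debate can be sketched as follows. `ColumnStats` is a simplified stand-in for Parquet's Statistics class (only the fields this check needs are modeled), so this is an illustration of the guard, not Drill's actual code.

```java
// Simplified sketch of the guard under discussion. ColumnStats stands in
// for parquet's Statistics; only the null-count fields are modeled.
public class NullStatsDemo {
    static class ColumnStats {
        final boolean numNullsSet;
        final long numNulls;
        ColumnStats(boolean set, long nulls) {
            numNullsSet = set;
            numNulls = nulls;
        }
    }

    // Fail-fast variant: a negative rowCount indicates an upstream bug,
    // so reject it instead of returning a plausible-looking boolean.
    static boolean isAllNulls(ColumnStats stat, long rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException(
                String.format("negative rowCount %d is not valid", rowCount));
        }
        return stat.numNullsSet && stat.numNulls == rowCount;
    }

    public static void main(String[] args) {
        ColumnStats allNull = new ColumnStats(true, 10);
        System.out.println(isAllNulls(allNull, 10));  // true
        System.out.println(isAllNulls(allNull, 12));  // false
        try {
            isAllNulls(allNull, -1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The lenient alternative argued for elsewhere in this thread would instead return false (treating bad statistics as "no information"), which keeps the query running but silently disables the null-based filter push-down.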




[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor 
code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199209381
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   Ok, then please just describe what will happen during query execution if a 
negative row count comes into this method: will the query fail, or will the 
statistics be ignored so that there is no filter push-down? I hope for the 
second.




[jira] [Created] (DRILL-6558) Drill query fails when file name contains colon

2018-06-29 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6558:
--

 Summary: Drill query fails when file name contains colon
 Key: DRILL-6558
 URL: https://issues.apache.org/jira/browse/DRILL-6558
 Project: Apache Drill
  Issue Type: Bug
Reporter: Volodymyr Vysotskyi


Queries on tables whose name contains a colon:
{code:sql}
select * from dfs.`/tmp/q:0`
{code}
fail with the error:
{noformat}
org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
java.net.URISyntaxException: Relative path in absolute URI: q:0

SQL Query null

[Error Id: 34fafee1-8fbe-4fe0-9fcb-ddcc926bb192 on user515050-pc:31010]

(java.lang.IllegalArgumentException) java.net.URISyntaxException: Relative path 
in absolute URI: q:0
 org.apache.hadoop.fs.Path.initialize():205
 org.apache.hadoop.fs.Path.<init>():171
 org.apache.hadoop.fs.Path.<init>():93
 org.apache.hadoop.fs.Globber.glob():253
 org.apache.hadoop.fs.FileSystem.globStatus():1655
 org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
 org.apache.drill.exec.store.dfs.FileSelection.create():274
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
 org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
 org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
 
org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
 org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
 org.apache.calcite.jdbc.CalciteSchema.getTable():257
 org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
 org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
 org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
 
org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
 org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
 org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
 org.apache.calcite.sql.validate.AbstractNamespace.validate():84
 org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():947
 org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():928
 org.apache.calcite.sql.SqlSelect.validate():226
 org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():903
 org.apache.calcite.sql.validate.SqlValidatorImpl.validate():613
 org.apache.drill.exec.planner.sql.SqlConverter.validate():190
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():644
 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():204
 org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():176
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
 org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
 org.apache.drill.exec.work.foreman.Foreman.runSQL():567
 org.apache.drill.exec.work.foreman.Foreman.run():266
 java.util.concurrent.ThreadPoolExecutor.runWorker():1149
 java.util.concurrent.ThreadPoolExecutor$Worker.run():624
 java.lang.Thread.run():748
 Caused By (java.net.URISyntaxException) Relative path in absolute URI: q:0
 java.net.URI.checkPath():1823
 java.net.URI.<init>():745
 org.apache.hadoop.fs.Path.initialize():202
 org.apache.hadoop.fs.Path.<init>():171
 org.apache.hadoop.fs.Path.<init>():93
 org.apache.hadoop.fs.Globber.glob():253
 org.apache.hadoop.fs.FileSystem.globStatus():1655
 org.apache.drill.exec.store.dfs.DrillFileSystem.globStatus():547
 org.apache.drill.exec.store.dfs.FileSelection.create():274
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():607
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create():408
 org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry():96
 org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get():90
 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable():561
 
org.apache.drill.exec.store.dfs.FileSystemSchemaFactory$FileSystemSchema.getTable():132
 org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable():82
 org.apache.calcite.jdbc.CalciteSchema.getTable():257
 org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom():1022
 org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry():979
 org.apache.calcite.prepare.CalciteCatalogReader.getTable():123
 
org.apache.drill.exec.planner.sql.SqlConverter$DrillCalciteCatalogReader.getTable():650
 
org.apache.drill.exec.planner.sql.SqlConverter$DrillValidator.validateFrom():260
 org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():3219
 org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
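The root cause in the trace above is reproducible with the JDK alone: Hadoop's Path treats the text before the first colon as a URI scheme, so "q:0" becomes scheme "q" with relative path "0", and java.net.URI rejects a scheme combined with a non-absolute path. The class name below is illustrative.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Reproduces the failure from the stack trace without Hadoop: Path
// parses "q:0" as scheme "q" + relative path "0" and then calls this
// same multi-argument URI constructor, whose checkPath() rejects a
// non-absolute path when a scheme is present.
public class ColonPathDemo {
    public static void main(String[] args) {
        try {
            new URI("q", null, "0", null, null);
            System.out.println("accepted");
        } catch (URISyntaxException e) {
            System.out.println(e.getReason());  // Relative path in absolute URI
        }
    }
}
```

Note that parsing the single string "q:0" would succeed as an opaque URI; it is the component-wise constructor used by Path.initialize() that performs the stricter path check seen in the trace.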
 

[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199208654
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   To validate the input. It can't give the correct answer if the input is junk 
(a negative `rowCount`).




[GitHub] vrozov commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
vrozov commented on a change in pull request #1349: DRILL-6554: Minor code 
improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199207853
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -417,16 +417,9 @@ public static DateCorruptionStatus 
checkForCorruptDateValuesInStatistics(Parquet
 // column does not appear in this file, skip it
 continue;
   }
-  Statistics statistics = 
footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
-  Integer max = (Integer) statistics.genericGetMax();
-  if (statistics.hasNonNullValue()) {
-if (max > ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD) {
-  return DateCorruptionStatus.META_SHOWS_CORRUPTION;
-}
-  } else {
-// no statistics, go check the first page
-return DateCorruptionStatus.META_UNCLEAR_TEST_VALUES;
-  }
+  IntStatistics statistics = 
(IntStatistics)footer.getBlocks().get(rowGroupIndex).getColumns().get(colIndex).getStatistics();
 
 Review comment:
   Please see parquet format spec. `DATE` is always `int32`.
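The check being refactored in this diff can be sketched with a stand-in for IntStatistics. Everything here is a simplification for illustration: the threshold value is invented (Drill's real constant is ParquetReaderUtility.DATE_CORRUPTION_THRESHOLD), and META_LOOKS_OK stands in for the real caller simply moving on to the next column.

```java
// Simplified sketch of the date-corruption check. IntStats stands in for
// parquet's IntStatistics; THRESHOLD is illustrative, not Drill's value.
public class DateCorruptionDemo {
    enum Status { META_SHOWS_CORRUPTION, META_UNCLEAR_TEST_VALUES, META_LOOKS_OK }

    static class IntStats {
        final boolean hasNonNullValue;
        final int max;
        IntStats(boolean hasValue, int max) {
            this.hasNonNullValue = hasValue;
            this.max = max;
        }
    }

    static final int THRESHOLD = 5_000_000;  // illustrative threshold

    // DATE is physically int32 (days since epoch); corrupted writers
    // stored Julian day numbers instead, which sit far above any sane max,
    // so an implausibly large max in the statistics signals corruption.
    static Status check(IntStats stats) {
        if (!stats.hasNonNullValue) {
            return Status.META_UNCLEAR_TEST_VALUES;  // no stats: inspect first page
        }
        return stats.max > THRESHOLD
            ? Status.META_SHOWS_CORRUPTION
            : Status.META_LOOKS_OK;
    }

    public static void main(String[] args) {
        System.out.println(check(new IntStats(true, 17_000)));     // META_LOOKS_OK
        System.out.println(check(new IntStats(true, 6_000_000)));  // META_SHOWS_CORRUPTION
        System.out.println(check(new IntStats(false, 0)));         // META_UNCLEAR_TEST_VALUES
    }
}
```

Since the parquet format stores DATE as int32, casting directly to the int statistics type (as the diff does) avoids the boxed Integer comparison of the old code.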




[GitHub] arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor code improvements in parquet statistics handling

2018-06-29 Thread GitBox
arina-ielchiieva commented on a change in pull request #1349: DRILL-6554: Minor 
code improvements in parquet statistics handling
URL: https://github.com/apache/drill/pull/1349#discussion_r199207189
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -39,22 +40,21 @@ static boolean isNullOrEmpty(Statistics stat) {
*
* @param stat parquet column statistics
* @param rowCount number of rows in the parquet file
-   * @return True if all rows are null in the parquet file
-   *  False if at least one row is not null.
+   * @return true if all rows are null in the parquet file and 
false otherwise
*/
   static boolean isAllNulls(Statistics stat, long rowCount) {
-return stat.isNumNullsSet() && stat.getNumNulls() == rowCount;
+Preconditions.checkArgument(rowCount >= 0, String.format("negative 
rowCount %d is not valid", rowCount));
 
 Review comment:
   I think this check is not needed here :) Why should this method check what 
was given to it? Its job is just to compare the row count and the number of 
nulls.



