Re: release: third_party/

2017-01-31 Thread Harshad Deshmukh

Hello all,

At long last, we have dealt with the third party issue. Some highlights:

1. Most of the libraries are now downloaded through a shell script.

2. The download links point to the release versions of the libraries.

3. After downloading the source code, we apply appropriate patches.

Some libraries don't have an official release yet, so we will wait until 
they have a release. Until then, we have copied their entire source code 
to our third party directory. I have updated the build instructions 
based on these changes.


Thanks Julian, Marc and Zuyu for your help.

On 01/23/2017 09:57 PM, Jignesh Patel wrote:

Thanks Zuyu for the nice summary! For this release, I’d support going with 
Harshad’s lead, which is a single script to download the third party libraries.

Dear Harshad: If your life is simpler with any of the other options (e.g. because 
of the issue you are having with an old version of Ubuntu in Travis), then feel 
free to go with the Mesos approach.

Cheers,
Jignesh

On 1/23/17, 12:26 AM, "Zuyu Zhang"  wrote:

 FYI, there are some Apache projects in C++ (
 https://projects.apache.org/projects.html?language), and more on GitHub (
 https://github.com/apache?language=c%2B%2B), including incubator projects.
 
 Here is a summary of how they typically deal with third parties in a release:
 
 - Apache Mesos (https://github.com/apache/mesos) has most third parties
   in release tarballs, along with patches.
 - Apache Kudu (https://github.com/apache/kudu) has multiple scripts to
   download and build third parties.
 - Apache NiFi - MiNiFi (https://github.com/apache/nifi-minifi-cpp)
   includes the whole codebase of third parties.
 
 Cheers,

 Zuyu
 





--
Thanks,
Harshad



Release Signing

2017-01-31 Thread Marc Spehlmann
One of the steps that must take place before releasing a release tarball is
to have the release managers digitally sign the tarball.

Hakan, Jignesh, Harshad: I think you are the release managers. Please
follow this guide

http://quickstep.apache.org/release-signing/

to
1) create a key pair
2) upload the public key to a public keyserver
3) (bonus for now) add the public key to a KEYS file in the root of
quickstep.

When the release tarball is ready, we can sign it.

To be fair, I'm not totally sure how this works. It seems to me that
everyone has to sign the release with their own private key, meaning that
the tarball must be copied to each PC where a private key is held and then
signed there? That seems cumbersome.

Anyway, steps 1 and 2 are straightforward and need to be done before we
resolve that last problem.

Cheers,
Marc


[GitHub] incubator-quickstep pull request #172: QUICKSTEP-69 Query optimization with ...

2017-01-31 Thread hbdeshmukh
Github user hbdeshmukh commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/172#discussion_r98729248
  
--- Diff: query_optimizer/rules/InjectJoinFilters.cpp ---
@@ -0,0 +1,439 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/InjectJoinFilters.hpp"
+
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/LIPFilterConfiguration.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/FilterJoin.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+#include "query_optimizer/rules/PruneColumns.hpp"
+#include "types/TypeID.hpp"
+#include "types/TypedValue.hpp"
+#include "utility/lip_filter/LIPFilter.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr InjectJoinFilters::apply(const P::PhysicalPtr &input) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+      std::static_pointer_cast<const P::TopLevelPlan>(input);
+  cost_model_.reset(
+      new cost::StarSchemaSimpleCostModel(
+          top_level_plan->shared_subplans()));
+  lip_filter_configuration_.reset(new P::LIPFilterConfiguration());
+
+  // Step 1. Transform applicable HashJoin nodes to FilterJoin nodes.
+  P::PhysicalPtr output = transformHashJoinToFilters(input);
+
+  // Step 2. Push down FilterJoin nodes to be evaluated early.
+  output = pushDownFilters(output);
+
+  // Step 3. Add Selection nodes for attaching the LIPFilters, if necessary.
+  output = addFilterAnchors(output, false);
+
+  // Step 4. Because of the pushdown of FilterJoin nodes, there are optimization
+  // opportunities for projecting columns early.
+  output = PruneColumns().apply(output);
+
+  // Step 5. For each FilterJoin node, attach its corresponding LIPFilter to
+  // proper nodes.
+  concretizeAsLIPFilters(output, nullptr);
+
+  if (!lip_filter_configuration_->getBuildInfoMap().empty() ||
+      !lip_filter_configuration_->getProbeInfoMap().empty()) {
+    output = std::static_pointer_cast<const P::TopLevelPlan>(output)
+        ->copyWithLIPFilterConfiguration(
+              P::LIPFilterConfigurationPtr(lip_filter_configuration_.release()));
+  }
+
+  return output;
+}
+
+bool InjectJoinFilters::isTransformable(
+    const physical::HashJoinPtr &hash_join) const {
+  // Conditions for replacing a HashJoin with a FilterJoin:
+
+  // No residual predicate.
+  if (hash_join->residual_predicate() != nullptr) {
+    return false;
+  }
+  // Single attribute equi-join.
+  if (hash_join->right_join_attributes().size() > 1) {
+    return false;
+  }
+  // All the output attributes must be from the probe side.
+  if (!E::SubsetOfExpressions(hash_join->getOutputAttributes(),
+                              hash_join->left()->getOutputAttributes())) {
+    return false;
+  }
+  switch (hash_join->join_type()) {
+    case P::HashJoin::JoinType::kInnerJoin: {
+      // In the case of inner join, the build side join attributes must be unique.
+      if (!cost_model_->impliesUniqueAttributes(hash_join->right(),
+

[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/174


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #172: QUICKSTEP-69 Query optimization with ...

2017-01-31 Thread hbdeshmukh
Github user hbdeshmukh commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/172#discussion_r98724664
  
--- Diff: query_optimizer/rules/InjectJoinFilters.cpp ---
@@ -0,0 +1,439 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/InjectJoinFilters.hpp"
+
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/LIPFilterConfiguration.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/FilterJoin.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+#include "query_optimizer/rules/PruneColumns.hpp"
+#include "types/TypeID.hpp"
+#include "types/TypedValue.hpp"
+#include "utility/lip_filter/LIPFilter.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr InjectJoinFilters::apply(const P::PhysicalPtr &input) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+      std::static_pointer_cast<const P::TopLevelPlan>(input);
+  cost_model_.reset(
+      new cost::StarSchemaSimpleCostModel(
+          top_level_plan->shared_subplans()));
+  lip_filter_configuration_.reset(new P::LIPFilterConfiguration());
+
+  // Step 1. Transform applicable HashJoin nodes to FilterJoin nodes.
+  P::PhysicalPtr output = transformHashJoinToFilters(input);
+
+  // Step 2. Push down FilterJoin nodes to be evaluated early.
+  output = pushDownFilters(output);
+
+  // Step 3. Add Selection nodes for attaching the LIPFilters, if necessary.
+  output = addFilterAnchors(output, false);
+
+  // Step 4. Because of the pushdown of FilterJoin nodes, there are optimization
+  // opportunities for projecting columns early.
+  output = PruneColumns().apply(output);
+
+  // Step 5. For each FilterJoin node, attach its corresponding LIPFilter to
+  // proper nodes.
+  concretizeAsLIPFilters(output, nullptr);
+
+  if (!lip_filter_configuration_->getBuildInfoMap().empty() ||
+      !lip_filter_configuration_->getProbeInfoMap().empty()) {
+    output = std::static_pointer_cast<const P::TopLevelPlan>(output)
+        ->copyWithLIPFilterConfiguration(
+              P::LIPFilterConfigurationPtr(lip_filter_configuration_.release()));
+  }
+
+  return output;
+}
+
+bool InjectJoinFilters::isTransformable(
+    const physical::HashJoinPtr &hash_join) const {
+  // Conditions for replacing a HashJoin with a FilterJoin:
+
+  // No residual predicate.
+  if (hash_join->residual_predicate() != nullptr) {
+    return false;
+  }
+  // Single attribute equi-join.
+  if (hash_join->right_join_attributes().size() > 1) {
+    return false;
+  }
+  // All the output attributes must be from the probe side.
+  if (!E::SubsetOfExpressions(hash_join->getOutputAttributes(),
+                              hash_join->left()->getOutputAttributes())) {
+    return false;
+  }
+  switch (hash_join->join_type()) {
+    case P::HashJoin::JoinType::kInnerJoin: {
+      // In the case of inner join, the build side join attributes must be unique.
+      if (!cost_model_->impliesUniqueAttributes(hash_join->right(),
+

[GitHub] incubator-quickstep pull request #172: QUICKSTEP-69 Query optimization with ...

2017-01-31 Thread hbdeshmukh
Github user hbdeshmukh commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/172#discussion_r98722409
  
--- Diff: query_optimizer/rules/InjectJoinFilters.hpp ---
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_INJECT_JOIN_FILTERS_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_INJECT_JOIN_FILTERS_HPP_
+
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/physical/LIPFilterConfiguration.hpp"
+#include "query_optimizer/physical/FilterJoin.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+/** \addtogroup OptimizerRules
+ *  @{
+ */
+
+/**
+ * @brief Rule that applies to a physical plan to transform HashJoin nodes into
+ *        FilterJoin nodes.
+ *
+ * This is an optimization that strength-reduces HashJoins to FilterJoins
+ * (implemented as LIPFilters attached to some anchoring operators where the
+ * filters get applied). Briefly speaking, the idea is that in the case that
+ * (1) the join attribute has consecutive integer values bounded in a reasonably
+ * small range AND (2) the output attributes are all from the probe-side table,
+ * we can eliminate the HashJoin by building a BitVector on the build-side
+ * attribute and using the BitVector to filter the probe-side table.
+ */
+class InjectJoinFilters : public Rule<physical::Physical> {
+ public:
+  /**
+   * @brief Constructor.
+   */
+  InjectJoinFilters() {}
+
+  ~InjectJoinFilters() override {}
+
+  std::string getName() const override {
+    return "TransformFilterJoins";
+  }
+
+  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
+
+ private:
+  // Check whether a HashJoin can be transformed into a FilterJoin.
+  bool isTransformable(const physical::HashJoinPtr &hash_join) const;
+
+  // Transform applicable HashJoin nodes into FilterJoin nodes.
+  physical::PhysicalPtr transformHashJoinToFilters(
+      const physical::PhysicalPtr ) const;
+
+  // Push down FilterJoin nodes to be evaluated early.
+  physical::PhysicalPtr pushDownFilters(const physical::PhysicalPtr ) const;
+
+  // Add Selection node, if necessary, for anchoring the LIP filters built by
+  // FilterJoin nodes.
+  physical::PhysicalPtr addFilterAnchors(const physical::PhysicalPtr ,
+                                         const bool ancestor_can_anchor_filter) const;
+
+  // Setup lip_filter_configuration_ with the transformed plan tree.
+  void concretizeAsLIPFilters(const physical::PhysicalPtr ,
+                              const physical::PhysicalPtr _node) const;
+
+  physical::PhysicalPtr pushDownFiltersInternal(
+  const physical::PhysicalPtr _child,
+  const physical::PhysicalPtr _child,
+  const physical::FilterJoinPtr _join) const;
+
+  bool findExactMinMaxValuesForAttributeHelper(
+  const physical::PhysicalPtr _plan,
+  const expressions::AttributeReferencePtr ,
+  std::int64_t *min_cpp_value,
+  std::int64_t *max_cpp_value) const;
+
+  std::unique_ptr<cost::StarSchemaSimpleCostModel> cost_model_;
+  std::unique_ptr<physical::LIPFilterConfiguration> lip_filter_configuration_;
+
+  // 1G bits = 128MB
--- End diff --

This could be made a GFLAG variable in a later refactoring. You can add a 
TODO for this. 
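
For readers of the thread, here is a rough, standalone sketch of the idea in the class comment above: build a bit vector over the build-side join keys, then use it to filter the probe side. This is not the code in the pull request. `BuildBitVectorFilter`, `ProbeWithFilter`, and `kMaxFilterBits` are hypothetical names, and the cap is only an assumption that mirrors the "1G bits = 128MB" remark.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cap on the filter size: 2^30 bits, i.e. 128 MB of bitmap.
constexpr std::uint64_t kMaxFilterBits = std::uint64_t{1} << 30;

// Builds a bit vector covering [min_value, max_value] from the build-side
// join keys. Returns false when the range is empty or exceeds the cap, in
// which case a caller would keep the ordinary hash join.
bool BuildBitVectorFilter(const std::vector<std::int64_t> &build_keys,
                          const std::int64_t min_value,
                          const std::int64_t max_value,
                          std::vector<bool> *filter) {
  if (max_value < min_value) {
    return false;
  }
  // Unsigned subtraction avoids signed overflow for extreme bounds.
  const std::uint64_t range = static_cast<std::uint64_t>(max_value) -
                              static_cast<std::uint64_t>(min_value);
  if (range >= kMaxFilterBits) {
    return false;
  }
  filter->assign(static_cast<std::size_t>(range + 1), false);
  for (const std::int64_t key : build_keys) {
    if (key >= min_value && key <= max_value) {
      const std::uint64_t offset = static_cast<std::uint64_t>(key) -
                                   static_cast<std::uint64_t>(min_value);
      (*filter)[static_cast<std::size_t>(offset)] = true;
    }
  }
  return true;
}

// Keeps only the probe-side keys whose bit is set in the filter.
std::vector<std::int64_t> ProbeWithFilter(
    const std::vector<std::int64_t> &probe_keys,
    const std::int64_t min_value,
    const std::vector<bool> &filter) {
  std::vector<std::int64_t> matches;
  for (const std::int64_t key : probe_keys) {
    if (key < min_value) {
      continue;
    }
    const std::uint64_t offset = static_cast<std::uint64_t>(key) -
                                 static_cast<std::uint64_t>(min_value);
    if (offset < filter.size() && filter[static_cast<std::size_t>(offset)]) {
      matches.push_back(key);
    }
  }
  return matches;
}
```

Filtering this way behaves like a semi-join on the probe side, which is consistent with the conditions quoted in the .cpp diff: output attributes come only from the probe side, and for inner joins the build-side join attributes must be unique.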



[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98716718
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Updated.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98714653
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Please add a `NOTE` comment regarding two `Selection`s after applying this 
rule.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98714161
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Oh, I made a mistake: the `InjectFilterJoin` optimization will not generate 
unnecessary `Selection`s, but `PushDownLowCostDisjunctivePredicate` may 
generate two `Selection`s in a chain. Currently this is not an issue, since 
the `Selection`s for tables with cardinality around 100 are very lightweight. 
We may add a `FuseSelection` optimization later.
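
A minimal sketch of what such a `FuseSelection` pass might look like, assuming a toy plan representation. Nothing here is Quickstep's API: `Row`, `Predicate`, `PlanNode`, and `FuseSelections` are hypothetical names introduced only for illustration. The idea is that two `Selection`s in a chain are equivalent to one `Selection` whose predicate is the conjunction of both.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Toy stand-ins for a tuple and a filter predicate (hypothetical).
using Row = std::vector<int>;
using Predicate = std::function<bool(const Row &)>;

// Toy plan node: a node without a child is a leaf (table scan); a node
// with a child and a non-empty filter plays the role of a Selection.
struct PlanNode {
  Predicate filter;
  std::shared_ptr<const PlanNode> child;
};
using PlanPtr = std::shared_ptr<const PlanNode>;

// Rewrites Selection(outer, Selection(inner, x)) into
// Selection(inner AND outer, x), bottom-up along the chain.
PlanPtr FuseSelections(const PlanPtr &node) {
  if (node == nullptr || node->child == nullptr) {
    return node;
  }
  const PlanPtr child = FuseSelections(node->child);
  if (node->filter && child->filter) {
    const Predicate outer = node->filter;
    const Predicate inner = child->filter;
    return std::make_shared<PlanNode>(PlanNode{
        [inner, outer](const Row &row) { return inner(row) && outer(row); },
        child->child});
  }
  return std::make_shared<PlanNode>(PlanNode{node->filter, child});
}
```

Because the fused predicate evaluates the inner filter first and short-circuits, a row rejected by the inner `Selection` never reaches the outer predicate, just as in the unfused chain.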




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98711958
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Yes it will only create necessary `Selection`s. I.e. there won't be two 
selections in a chain that can be fused into one.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98711534
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

I think 
[here](https://github.com/apache/incubator-quickstep/pull/174/files#diff-ca3b59cc48fbc383291bdfefaf5de128R188)
 we add a new `Physical::Select`.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710981
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---
@@ -0,0 +1,218 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/LogicalAnd.hpp"
+#include "query_optimizer/expressions/LogicalOr.hpp"
+#include "query_optimizer/expressions/PatternMatcher.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/NestedLoopsJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const P::PhysicalPtr &input) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+      std::static_pointer_cast<const P::TopLevelPlan>(input);
+  cost_model_.reset(
+      new cost::StarSchemaSimpleCostModel(
+          top_level_plan->shared_subplans()));
+
+  collectApplicablePredicates(input);
+
+  if (!applicable_predicates_.empty()) {
+    // Apply the selected predicates to stored relations.
+    return attachPredicates(input);
+  } else {
+    return input;
+  }
+}
+
+void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates(
+    const physical::PhysicalPtr &input) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) {
+    // Consider only stored relations with small cardinality as targets.
+    if (cost_model_->estimateCardinality(input) <= 100u) {
+      applicable_nodes_.emplace_back(input, &table_reference->attribute_list());
+    }
+    return;
+  }
+
+  for (const auto &child : input->children()) {
+    collectApplicablePredicates(child);
+  }
+
+  E::PredicatePtr filter_predicate = nullptr;
+  switch (input->getPhysicalType()) {
+    case P::PhysicalType::kAggregate: {
+      filter_predicate =
+          std::static_pointer_cast<const P::Aggregate>(input)->filter_predicate();
+      break;
+    }
+    case P::PhysicalType::kHashJoin: {
+      const P::HashJoinPtr hash_join =
+          std::static_pointer_cast<const P::HashJoin>(input);
+      if (hash_join->join_type() == P::HashJoin::JoinType::kInnerJoin) {
+        filter_predicate = hash_join->residual_predicate();
+      }
+      break;
+    }
+    case P::PhysicalType::kNestedLoopsJoin: {
+      filter_predicate =
+          std::static_pointer_cast<const P::NestedLoopsJoin>(input)->join_predicate();
+      break;
+    }
+    case P::PhysicalType::kSelection: {
+      filter_predicate =
+          std::static_pointer_cast<const P::Selection>(input)->filter_predicate();
+      break;
+    }
+    default:
+      break;
+  }
+
+  E::LogicalOrPtr disjunctive_predicate;
+  if (filter_predicate == nullptr ||
+      !E::SomeLogicalOr::MatchesWithConditionalCast(filter_predicate, &disjunctive_predicate)) {
+    return;
+  }
+
+  // Consider only disjunctive normal form, i.e. disjunction of 

[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710964
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp ---
@@ -0,0 +1,118 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+/** \addtogroup OptimizerRules
+ *  @{
+ */
+
+/**
+ * @brief Rule that applies to a physical plan to push down low-cost disjunctive
+ *        predicate when proper conditions are met.
+ *
+ * Here we elaborate the conditions.
+ *
+ * Let
+ *   P = p_{1,1} AND ... AND p_{1, m_1} OR ... OR p_{n,1} AND ... AND p_{n, m_n}
+ * be a predicate in disjunctive normal form.
+ *
+ * Now consider each small-cardinality relation R, if for each i in 1..n, there
+ * exists at least one predicate p_{i, k_i} that is applicable to R. Then we can
+ * construct a new predicate
+ *   P' = p_{1, k_1} OR ... OR p_{n, k_n}
+ * and push down P' to be applied to R.
+ *
+ * Also, if any conjunctive component in P contains more than one predicate that
+ * is applicable to R, then we can combine all these applicable predicates as a
+ * conjunctive component in P'.
+ *
+ * Finally, note that if there exists a conjunctive component that contains no
+ * predicate applicable to R. Then the condition fails and we cannot do a push
+ * down for R.
+ */
+class PushDownLowCostDisjunctivePredicate : public Rule<physical::Physical> {
+ public:
+  /**
+   * @brief Constructor.
+   */
+  PushDownLowCostDisjunctivePredicate() {}
+
+  ~PushDownLowCostDisjunctivePredicate() override {}
+
+  std::string getName() const override {
+    return "PushDownLowCostDisjunctivePredicate";
+  }
+
+  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
+
+ private:
+  struct PredicateInfo {
+    PredicateInfo() {}
+    inline void add(expressions::PredicatePtr predicate) {
+      predicates.emplace_back(predicate);
+    }
+    std::vector<expressions::PredicatePtr> predicates;
+  };
+
+  void collectApplicablePredicates(const physical::PhysicalPtr &input);
+  physical::PhysicalPtr attachPredicates(const physical::PhysicalPtr ) const;
--- End diff --

Updated.
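
To illustrate the condition described in the class comment, here is a small sketch that derives P' for one relation R from a predicate P in disjunctive normal form. It is an illustration under simplifying assumptions, not the implementation in this pull request: `Atom`, `Conjunction`, `Dnf`, and `DerivePushedDownPredicate` are hypothetical names, and each atomic predicate is assumed to reference exactly one relation. Since every conjunctive component of P implies its sub-conjunction in P', P implies P', so filtering R with P' cannot drop a row that could still satisfy P.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an atomic predicate: its text and the single
// relation it references (a simplifying assumption for this sketch).
struct Atom {
  std::string text;
  std::string relation;
};
using Conjunction = std::vector<Atom>;  // p_{i,1} AND ... AND p_{i,m_i}
using Dnf = std::vector<Conjunction>;   // OR over conjunctive components

// Derives P' for `relation`: each component of P contributes the
// conjunction of its atoms applicable to the relation. If any component
// has no applicable atom, the push-down is not possible and the function
// returns false with an empty result.
bool DerivePushedDownPredicate(const Dnf &predicate,
                               const std::string &relation,
                               Dnf *pushed_down) {
  pushed_down->clear();
  for (const Conjunction &component : predicate) {
    Conjunction applicable;
    for (const Atom &atom : component) {
      if (atom.relation == relation) {
        applicable.push_back(atom);
      }
    }
    if (applicable.empty()) {
      pushed_down->clear();
      return false;
    }
    pushed_down->push_back(applicable);
  }
  return !pushed_down->empty();
}
```

For example, with P = (r.a = 1 AND s.x = 2) OR (r.a = 3 AND r.b = 4 AND s.y = 5), the sketch yields P' = (r.a = 1) OR (r.a = 3 AND r.b = 4) for relation r, and no push-down for a relation that does not appear in every component.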




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710865
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---
@@ -0,0 +1,218 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/LogicalAnd.hpp"
+#include "query_optimizer/expressions/LogicalOr.hpp"
+#include "query_optimizer/expressions/PatternMatcher.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/NestedLoopsJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const P::PhysicalPtr &input) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+      std::static_pointer_cast<const P::TopLevelPlan>(input);
+  cost_model_.reset(
+      new cost::StarSchemaSimpleCostModel(
+          top_level_plan->shared_subplans()));
+
+  collectApplicablePredicates(input);
+
+  if (!applicable_predicates_.empty()) {
+    // Apply the selected predicates to stored relations.
+    return attachPredicates(input);
+  } else {
+    return input;
+  }
+}
+
+void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates(
+    const physical::PhysicalPtr &input) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) {
+    // Consider only stored relations with small cardinality as targets.
+    if (cost_model_->estimateCardinality(input) <= 100u) {
--- End diff --

It is okay to be conservative here, and the statistics are more accurate once 
the `\analyze` command has been executed.

Added a gflag.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98708887
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

The three rule blocks from `PushDownLowCostDisjunctivePredicate` to 
`ReorderColumns` can be arranged in any order.

`PushDownLowCostDisjunctivePredicate` will not generate unnecessary 
`Selection`s, and the physical generator currently has no rule that fuses 
`Selection`s.

I added a comment indicating that, for now, new rules should be added 
before `AttachLIPFilters`, because rules after it need extra handling of 
`LIPFilterConfiguration` for transformed nodes.
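
The ordering constraint can be pictured with a toy rule pipeline. This is a minimal sketch, not the code in `PhysicalGenerator.cpp`: `Rule`, `NamedRule` and the string-based "plan" are hypothetical stand-ins, and the rule list is abbreviated to the names mentioned in this thread.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an optimizer rule: each rule rewrites a "plan", modeled
// here as a string that records which rules have been applied.
class Rule {
 public:
  virtual ~Rule() = default;
  virtual std::string apply(const std::string &plan) const = 0;
};

// Hypothetical rule that only appends its own name to the plan.
class NamedRule : public Rule {
 public:
  explicit NamedRule(std::string name) : name_(std::move(name)) {}

  std::string apply(const std::string &plan) const override {
    return plan + " -> " + name_;
  }

 private:
  std::string name_;
};

// Applies the rules one after another, in the order they were added.
std::string optimizePlan(const std::string &initial_plan) {
  std::vector<std::unique_ptr<Rule>> rules;
  rules.emplace_back(new NamedRule("PruneColumns"));
  rules.emplace_back(new NamedRule("PushDownLowCostDisjunctivePredicate"));
  rules.emplace_back(new NamedRule("ReorderColumns"));
  // New rules should be added above this line: a rule that runs after
  // AttachLIPFilters would have to keep the LIP filter configuration
  // consistent for every node it transforms.
  rules.emplace_back(new NamedRule("AttachLIPFilters"));

  std::string plan = initial_plan;
  for (const std::unique_ptr<Rule> &rule : rules) {
    plan = rule->apply(plan);
  }
  return plan;
}
```

Calling `optimizePlan("plan")` walks the vector front to back, so a rule's position in the vector is its position in the optimization sequence.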




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98708919
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -27,6 +27,7 @@
 #include "query_optimizer/logical/Logical.hpp"
 #include "query_optimizer/physical/Physical.hpp"
 #include "query_optimizer/rules/AttachLIPFilters.hpp"
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
--- End diff --

Updated.




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98691158
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---
@@ -0,0 +1,218 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/LogicalAnd.hpp"
+#include "query_optimizer/expressions/LogicalOr.hpp"
+#include "query_optimizer/expressions/PatternMatcher.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/NestedLoopsJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const P::PhysicalPtr &input) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+      std::static_pointer_cast<const P::TopLevelPlan>(input);
+  cost_model_.reset(
+      new cost::StarSchemaSimpleCostModel(
+          top_level_plan->shared_subplans()));
+
+  collectApplicablePredicates(input);
+
+  if (!applicable_predicates_.empty()) {
+    // Apply the selected predicates to stored relations.
+    return attachPredicates(input);
+  } else {
+    return input;
+  }
+}
+
+void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates(
+    const physical::PhysicalPtr &input) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) {
+    // Consider only stored relations with small cardinality as targets.
+    if (cost_model_->estimateCardinality(input) <= 100u) {
--- End diff --

Just curious about `100u` here. My concern is that the result of 
`estimateCardinality` for a small relation can be two orders of magnitude 
larger than the actual cardinality; I have seen this in the 
`query_optimizer execution_generator` unit test.

Also, I'd suggest using a `gflags` flag for `100u`, replacing 
`kCardinalityThreshold` defined in the header file.
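
A rough sketch of the suggestion, under stated assumptions: `cardinality_threshold` is a hypothetical plain global standing in for a gflags flag (with gflags it would be declared with `DEFINE_uint64` and read through the generated `FLAGS_` variable), and `isPushDownTarget` is an assumed helper name, not a function from the pull request.

```cpp
#include <cstdint>

// Hypothetical stand-in for a gflags flag. A plain global keeps the sketch
// self-contained; the default mirrors the `100u` literal in the diff.
static std::uint64_t cardinality_threshold = 100u;

// Assumed helper mirroring the `<= 100u` check: a stored relation is a
// push-down target only if its estimated cardinality does not exceed the
// configured threshold.
bool isPushDownTarget(std::uint64_t estimated_cardinality) {
  return estimated_cardinality <= cardinality_threshold;
}
```

With the default of 100, a relation holding 5 tuples whose estimate is inflated 100-fold to 500 is rejected; raising the threshold to 1000 lets it through without recompiling, which is the point of making the value a flag.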




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98689643
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule<P::Physical>>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Could we have some comments regarding the order of `rules`, i.e., why put 
`PushDownLowCostDisjunctivePredicate` here?

If I understand correctly, `PushDownLowCostDisjunctivePredicate` may 
generate new `Physical Select`s. Would the remaining `rules` merge multiple 
`Physical Select`s into one?




[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98688681
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -27,6 +27,7 @@
 #include "query_optimizer/logical/Logical.hpp"
 #include "query_optimizer/physical/Physical.hpp"
 #include "query_optimizer/rules/AttachLIPFilters.hpp"
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
--- End diff --

Code style: includes are sorted in alphabetical order, so move this line after the next one.




[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/171

