[GitHub] incubator-quickstep pull request #19: Improve text scan operator

2016-06-09 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/19

Improve text scan operator

This PR updates the `TextScanOperator` to improve its performance.

There are three main changes:
(1) Pass `text_offset` and `text_segment_size` as parameters to each 
`TextScanWorkOrder` instead of really loading the data. Then each 
`TextScanWorkOrder` reads the corresponding piece of data directly from disk.
(2) Avoid extra string copying by passing `const char **` buffer pointers 
into `parseRow()` and `extractFieldString()`.
(3) Use `ColumnVectorsValueAccessor` as the temporary container to store 
the parsed tuples. Then call `output_destination_->bulkInsertTuples()` to bulk 
insert the tuples.

**Note 1:** This updated version follows the semantics of the old 
`TextScanOperator` except that it does not support the backslash + newline 
escaping, e.g.
(a)
```
\

```
which is semantically equivalent to
(b)
```
\n
```
The updated version supports (b) but not (a). As (a) incurs extra logic 
that complicates code. Meanwhile, format (a) seems to be specific to 
PostgreSQL, and the 
[documentation](http://www.postgresql.org/docs/9.6/static/sql-copy.html) of 
PostgreSQL 9.6 says:
_It is strongly recommended that applications generating COPY data convert 
data newlines and carriage returns to the \n and \r sequences respectively. At 
present it is possible to represent a data carriage return by a backslash and 
carriage return, and to represent a data newline by a backslash and newline. 
However, these representations might not be accepted in future releases. They 
are also highly vulnerable to corruption if the COPY file is transferred across 
different machines (for example, from Unix to Windows or vice versa)._

**Note 2:** This PR relies on the fix from #18 to work correctly for 
loading `TYPE compressed_columnstore` tables.

**Note 3:** Using 40 workers, the expected loading time on cloudlab 
machines with current SQL-benchmark settings are ~465s for SSB SF100 and ~1050s 
for TPCH SF100.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
improve-text-scan-operator-column-vectors

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19


commit 55b06fab1bd336f2cc7ee4bd557d3328a428e4ab
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2016-06-09T08:18:37Z

Improve text scan operator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #19: Improve text scan operator

2016-06-09 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/19#discussion_r66470811
  
--- Diff: relational_operators/TextScanOperator.cpp ---
@@ -155,116 +63,50 @@ bool TextScanOperator::getAllWorkOrders(
   InsertDestination *output_destination =
   query_context->getInsertDestination(output_destination_index_);
 
-  if (parallelize_load_) {
-// Parallel implementation: Split work orders are generated for each 
file
-// being bulk-loaded. (More than one file can be loaded, because we 
support
-// glob() semantics in file name.) These work orders read the input 
file,
-// and split them in the blobs that can be parsed independently.
-if (blocking_dependencies_met_) {
-  if (!work_generated_) {
-// First, generate text-split work orders.
-for (const auto  : files) {
-  container->addNormalWorkOrder(
-  new TextSplitWorkOrder(query_id_,
- file,
- process_escape_sequences_,
- storage_manager,
- op_index_,
- scheduler_client_id,
- bus),
-  op_index_);
-  ++num_split_work_orders_;
-}
-work_generated_ = true;
-return false;
-  } else {
-// Check if there are blobs to parse.
-while (!text_blob_queue_.empty()) {
-  const TextBlob blob_work = text_blob_queue_.popOne();
-  container->addNormalWorkOrder(
-  new TextScanWorkOrder(query_id_,
-blob_work.blob_id,
-blob_work.size,
-field_terminator_,
-process_escape_sequences_,
-output_destination,
-storage_manager),
-  op_index_);
-}
-// Done if all split work orders are completed, and no blobs are 
left to
-// process.
-return num_done_split_work_orders_.load(std::memory_order_acquire) 
== num_split_work_orders_ &&
-   text_blob_queue_.empty();
-  }
-}
-return false;
-  } else {
-// Serial implementation.
-if (blocking_dependencies_met_ && !work_generated_) {
-  for (const auto  : files) {
+  // Text segment size set to 256KB.
+  constexpr std::size_t kTextSegmentSize = 0x4u;
+
+  if (blocking_dependencies_met_ && !work_generated_) {
+for (const std::string  : files) {
+  // Use standard C libary to retrieve the file size.
+  FILE *fp = std::fopen(file.c_str(), "rb");
+  std::fseek(fp, 0, SEEK_END);
+  const std::size_t file_size = std::ftell(fp);
+  std::fclose(fp);
+
+  std::size_t text_offset = 0;
+  while (text_offset < file_size) {
 container->addNormalWorkOrder(
 new TextScanWorkOrder(query_id_,
   file,
+  text_offset,
+  std::min(kTextSegmentSize, file_size - 
text_offset),
   field_terminator_,
   process_escape_sequences_,
   output_destination,
   storage_manager),
 op_index_);
+text_offset += kTextSegmentSize;
--- End diff --

Yes alternatively we can do
```
std::size_t text_segment_size = std::min(kTextSegmentSize, file_size - 
text_offset);

... // addNormalWorkOrder

text_offset += text_segment_size;
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #18: Fix a potential segfault in Compressed...

2016-06-09 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/18

Fix a potential segfault in CompressedBlockBuilder

This PR fixes a potential bug in `CompressedBlockBuilder`. This fix is 
necessary for a subsequent PR on `TextScanOperator` where each work order first 
constructs a `ColumnVectorsValueAccessor` and then bulk insert the value 
accessor into a `CompressedColumnStoreTupleStorageSubBlock`.

In the typical scenario, `CompressedBlockBuilder` first collects tuples 
from a `ValueAccesor` into a `PtrVector` , then calls `buildDictionary` 
to process the tuples. However, the collected tuples do not own the underlying 
attribute values' **`out_of_line_data`** if the value is of `Char` or `VarChar` 
type. This incurs segmentation fault if the `ValueAccessor` (which owns the 
`out_of_line_data`) gets released before `buildDictionary` is completed.

The fix is just to ensure that the `PtrVector` owns the underlying 
`out_of_line_data`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
fix-compressed-block-builder-tuple-values-not-owned

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18


commit e95b312d665ea3e6b1353b62f1efca19c303d48e
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2016-06-09T05:43:16Z

Fix a potential segfault with CompressedBlockBuilder




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #34: Bug fixed in \analyze command and reuse code.

2016-06-15 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/34
  
LGTM! Merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #40: Minor Change: Added move constructor i...

2016-06-27 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/40#discussion_r68642507
  
--- Diff: query_optimizer/logical/Sort.hpp ---
@@ -137,6 +150,19 @@ class Sort : public Logical {
 addChild(input_);
   }
 
+  Sort(const LogicalPtr ,
+   const std::vector 
&_attributes,
+   const std::vector &_ascending,
+   const std::vector &_first_flags,
+   const int limit)
+  : input_(input),
+sort_attributes_(std::move(sort_attributes)),
+sort_ascending_(std::move(sort_ascending)),
+nulls_first_flags_(std::move(nulls_first_flags)),
+limit_(limit) {
+addChild(input_);
+  }
--- End diff --

Remove `const` for all rvalue reference (`&&`) arguments, otherwise it is 
not really doing the move construction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #73: Move hash join's probe and build node ...

2016-08-03 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/73#discussion_r73290304
  
--- Diff: query_optimizer/rules/BottomUpRule.hpp ---
@@ -80,6 +81,14 @@ class BottomUpRule : public Rule {
*/
   virtual TreeNodePtr applyToNode(const TreeNodePtr ) = 0;
 
+  /**
+   * @brief Initialization code to be used for each node.
--- End diff --

The initialization is only applied with the tree's root node but not "each 
node".
Other places look good. I'll do this minor revision and then merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #76: Visualize execution plan DAGs annotated with ...

2016-08-03 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/76
  
Thanks Harshad!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #74: Removed the redundant query id in the optimiz...

2016-08-03 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/74
  
This change looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #51: Minor changes in profiling work order ...

2016-07-06 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/51#discussion_r69846306
  
--- Diff: query_execution/Foreman.cpp ---
@@ -238,16 +238,17 @@ void Foreman::printWorkOrderProfilingResults(const 
std::size_t query_id,
   const std::vector<
   std::tuple<std::size_t, std::size_t, std::size_t>>
   _times = policy_enforcer_->getProfilingResults(query_id);
-  fputs("Worker ID, NUMA Socket, Operator ID, Time (microseconds)\n", out);
+  fputs("Query ID,Worker ID,NUMA Socket,Operator ID,Time 
(microseconds)\n", out);
   for (auto workorder_entry : recorded_times) {
 // Note: Index of the "worker thread index" in the tuple is 0.
 const std::size_t worker_id = std::get<0>(workorder_entry);
 fprintf(out,
-"%lu, %d, %lu, %lu\n",
+"%lu,%lu,%d,%lu,%lu\n",
+query_id,
 worker_id,
 worker_directory_->getNUMANode(worker_id),
-std::get<1>(workorder_entry),
-std::get<2>(workorder_entry));
+std::get<1>(workorder_entry),  // Oeprator ID.
--- End diff --

Probably the comment should be `// Operator ID.`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #51: Minor changes in profiling work order output.

2016-07-06 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/51
  
LGTM except for one minor typo.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #79: Constructed Generators once in the optimizer.

2016-08-04 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/79
  
@zuyu The `Generator`'s can be stateless (need some refactoring), but each 
query should have its own `OptimizerContext`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #81: Refactored QueryProcessor.

2016-08-06 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/81
  
LGTM. Merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #73: Move hash join's probe and build node ...

2016-08-02 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/73#discussion_r73208331
  
--- Diff: query_optimizer/rules/SwapProbeBuild.hpp ---
@@ -0,0 +1,47 @@
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_SWAP_PROBE_BUILD_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_SWAP_PROBE_BUILD_HPP_
+
+#include 
--- End diff --

Add `#include ` as `std::unique_ptr` is used in this file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #73: Move hash join's probe and build node ...

2016-08-02 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/73#discussion_r73204750
  
--- Diff: query_optimizer/rules/BottomUpRule.hpp ---
@@ -80,6 +81,8 @@ class BottomUpRule : public Rule {
*/
   virtual TreeNodePtr applyToNode(const TreeNodePtr ) = 0;
 
+  virtual void init(const TreeNodePtr ) = 0;
--- End diff --

Let's provide a default no-op implementation for `init()` here (instead of 
pure virtual) so we don't have to implement it for every subclass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #73: Move hash join's probe and build node ...

2016-08-02 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/73#discussion_r73207435
  
--- Diff: query_optimizer/rules/SwapProbeBuild.cpp ---
@@ -0,0 +1,64 @@
+#include "query_optimizer/rules/SwapProbeBuild.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+
+
+namespace quickstep {
+namespace optimizer {
+
+P::PhysicalPtr SwapProbeBuild::applyToNode(const P::PhysicalPtr ) {
+  P::HashJoinPtr hash_join;
+
+  if (P::SomeHashJoin::MatchesWithConditionalCast(input, _join)
+  && hash_join->join_type() == P::HashJoin::JoinType::kInnerJoin) {
+P::PhysicalPtr left = hash_join->left();
+P::PhysicalPtr right = hash_join->right();
+
+
+
--- End diff --

Remove the extra blank lines here (at most 1 line).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #73: Move hash join's probe and build node decisio...

2016-08-02 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/73
  
LGTM except for some minor places. Also fix the style check issue then 
we're good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #73: Move hash join's probe and build node ...

2016-08-01 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/73#discussion_r73079256
  
--- Diff: query_optimizer/rules/SwapProbeBuild.hpp ---
@@ -0,0 +1,46 @@
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_SWAP_PROBE_BUILD_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_SWAP_PROBE_BUILD_HPP_
+
+#include 
+
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "query_optimizer/rules/BottomUpRule.hpp"
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+/** \addtogroup OptimizerRules
+ *  @{
+ */
+
+namespace P = ::quickstep::optimizer::physical;
+namespace E = ::quickstep::optimizer::expressions;
+namespace C = ::quickstep::optimizer::cost;
+
+/**
+ * @brief Rule that applies to a physical plan to arrange probe and
+ *build side based on the cardinalities.
+ */
+class SwapProbeBuild : public BottomUpRule {
+ public:
+  SwapProbeBuild() {
+  }
+
+  std::string getName() const override { return "SwapProbeBuild"; }
+
+ protected:
+  P::PhysicalPtr applyToNode(const P::PhysicalPtr ) override;
+
+ private:
+  DISALLOW_COPY_AND_ASSIGN(SwapProbeBuild);
+
+  std::unique_ptr cost_model_;
--- End diff --

Currently use `SimpleCostModel` instead of `StarSchemaSimpleCostModel` for 
deciding the build/probe sides.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

2017-02-02 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/177

Reduce the number of group-by attributes by pulling tables up aggregation.

This PR implements an optimization (physical plan transformation) that 
pulls a table up an aggregation if many of that table's attributes serve as 
group-by attributes in the aggregation. We do the optimization because it is 
relatively slow to aggregate with a large set of group-by attributes, as well 
as to avoid copying the table's many attributes all the way up a chain of 
operators.

For example, let `R` be a relation with `PRIMARY KEY x` and attributes `y`, 
`z`. Let `S` be a relation with `FOREIGN KEY u` refering to `R.x` and attribute 
`v`. Then the optimization rule will transform the physical plan:

```
Aggregate(
  [input relation]: HashJoin(
  [probe relation]: S
  [build relation]: R
  [join expression]: S.u = R.x
  [project attributes]: v, x, y, z
)
  [aggregate expression]: SUM(v) AS sum_v
  [group-by attributes]: x, y, z
)
``` 
into:
```
HashJoin(
  [probe relation]: Aggregate(
  [input relation]: S
  [aggregate expression]: SUM(v) AS sum_v
  [group-by attribute]: u
) AS T
  [build relation]: R
  [join expression]: T.u = R.x
  [project attributes]: sum_v, x, y, z
)
```

This optimization improves the performance of TPC-H Q10 from ~13s to ~5.7s, 
with scale factor 100 on a cloudlab machine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
reduce-group-by-attrs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #177


commit 83d1b592445eb27ad34d56ded14618b019bdd11d
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-30T00:36:14Z

Reduce the number of group-by attributes by pulling tables up aggregations




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #:

2017-02-03 Thread jianqiao
Github user jianqiao commented on the pull request:


https://github.com/apache/incubator-quickstep/commit/4f8fdbe8451aed1ad1c07a8badb5be85bee1ff57#commitcomment-20738519
  
In relational_operators/TextScanOperator.cpp:
In relational_operators/TextScanOperator.cpp on line 129:
Yes there can be probably a problem with that. I will look into it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #:

2017-02-03 Thread jianqiao
Github user jianqiao commented on the pull request:


https://github.com/apache/incubator-quickstep/commit/4f8fdbe8451aed1ad1c07a8badb5be85bee1ff57#commitcomment-20738416
  
In relational_operators/TextScanOperator.cpp:
In relational_operators/TextScanOperator.cpp on line 129:
Did you mean that the system will crash (e.g. segfault) with `EOF` here?
Is there some example (schema + data) for the case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98716718
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98714161
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Oh I made a mistake that actually the `InjectFilterJoin` optimization will 
not generate unnecessary `Selection`s, but 
`PushDownLowCostDisjunctivePredicate` may generate two `Selection`s in a chain. 
Currently this is a not an issue since the `Selection`s for tables with 
cardinality around 100 are very light-weight. We may add a `FuseSelection` 
optimization later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98708919
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -27,6 +27,7 @@
 #include "query_optimizer/logical/Logical.hpp"
 #include "query_optimizer/physical/Physical.hpp"
 #include "query_optimizer/rules/AttachLIPFilters.hpp"
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98708887
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

The three rules blocks from `PushDownLowCostDisjunctivePredicate` to 
`ReorderColumns` can be arranged in any order.

`PushDownLowCostDisjunctivePredicate` will not generate unnecessary 
`Selection`s. And currently in the physical generate there's no rule that fuses 
`Selection`s.

I add a comment to indicate that currently new rules should better be added 
before `AttachLIPFilters` because rules after that needs extra handling of 
`LIPFilterConfiguration` for transformed nodes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710981
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---
@@ -0,0 +1,218 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/LogicalAnd.hpp"
+#include "query_optimizer/expressions/LogicalOr.hpp"
+#include "query_optimizer/expressions/PatternMatcher.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/NestedLoopsJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const 
P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+ std::static_pointer_cast(input);
+  cost_model_.reset(
+  new cost::StarSchemaSimpleCostModel(
+  top_level_plan->shared_subplans()));
+
+  collectApplicablePredicates(input);
+
+  if (!applicable_predicates_.empty()) {
+// Apply the selected predicates to stored relations.
+return attachPredicates(input);
+  } else {
+return input;
+  }
+}
+
+void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates(
+const physical::PhysicalPtr ) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, 
_reference)) {
+// Consider only stored relations with small cardinality as targets.
+if (cost_model_->estimateCardinality(input) <= 100u) {
+  applicable_nodes_.emplace_back(input, 
_reference->attribute_list());
+}
+return;
+  }
+
+  for (const auto  : input->children()) {
+collectApplicablePredicates(child);
+  }
+
+  E::PredicatePtr filter_predicate = nullptr;
+  switch (input->getPhysicalType()) {
+case P::PhysicalType::kAggregate: {
+  filter_predicate =
+  std::static_pointer_cast(input)->filter_predicate();
+  break;
+}
+case P::PhysicalType::kHashJoin: {
+  const P::HashJoinPtr hash_join =
+  std::static_pointer_cast(input);
+  if (hash_join->join_type() == P::HashJoin::JoinType::kInnerJoin) {
+filter_predicate = hash_join->residual_predicate();
+  }
+  break;
+}
+case P::PhysicalType::kNestedLoopsJoin: {
+  filter_predicate =
+  std::static_pointer_cast(input)->join_predicate();
+  break;
+}
+case P::PhysicalType::kSelection: {
+  filter_predicate =
+  std::static_pointer_cast(input)->filter_predicate();
+  break;
+}
+default:
+  break;
+  }
+
+  E::LogicalOrPtr disjunctive_predi

[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710964
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp ---
@@ -0,0 +1,118 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef 
QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_
+#define 
QUICKSTEP_QUERY_OPTIMIZER_RULES_PUSH_DOWN_LOW_COST_DISJUNCTIVE_PREDICATE_HPP_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+/** \addtogroup OptimizerRules
+ *  @{
+ */
+
+/**
+ * @brief Rule that applies to a physical plan to push down low-cost 
disjunctive
+ *predicate when proper conditions are met.
+ *
+ * Here we elaborate the conditions.
+ *
+ * Let
+ *   P = p_{1,1} AND ... AND p_{1, m_1} OR ... OR p_{n,1} AND ... AND 
p_{n, m_n}
+ * be a predicate in disjunctive normal form.
+ *
+ * Now consider each small-cardinality relation R, if for each i in 1..n, 
there
+ * exists at least one predicate p_{i, k_i} that is applicable to R. Then 
we can
+ * construct a new predicate
+ *   P' = p_{1, k_1} OR ... OR p_{n, k_n}
+ * and push down P' to be applied to R.
+ *
+ * Also, if any conjunctive component in P contains more than one 
predicate that
+ * is applicable to R, then we can combine all these applicable predicates 
as a
+ * conjunctive component in P'.
+ *
+ * Finally, note that if there exists a conjunctive component that 
contains no
+ * predicate applicable to R. Then the condition fails and we cannot do a 
push
+ * down for R.
+ */
+class PushDownLowCostDisjunctivePredicate : public 
Rule {
+ public:
+  /**
+   * @brief Constructor.
+   */
+  PushDownLowCostDisjunctivePredicate() {}
+
+  ~PushDownLowCostDisjunctivePredicate() override {}
+
+  std::string getName() const override {
+return "PushDownLowCostDisjunctivePredicate";
+  }
+
+  physical::PhysicalPtr apply(const physical::PhysicalPtr ) override;
+
+ private:
+  struct PredicateInfo {
+PredicateInfo() {}
+inline void add(expressions::PredicatePtr predicate) {
+  predicates.emplace_back(predicate);
+}
+std::vector predicates;
+  };
+
+  void collectApplicablePredicates(const physical::PhysicalPtr );
+  physical::PhysicalPtr attachPredicates(const physical::PhysicalPtr 
) const;
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98710865
  
--- Diff: query_optimizer/rules/PushDownLowCostDisjunctivePredicate.cpp ---
@@ -0,0 +1,218 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/PushDownLowCostDisjunctivePredicate.hpp"
+
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/LogicalAnd.hpp"
+#include "query_optimizer/expressions/LogicalOr.hpp"
+#include "query_optimizer/expressions/PatternMatcher.hpp"
+#include "query_optimizer/expressions/Predicate.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/NestedLoopsJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr PushDownLowCostDisjunctivePredicate::apply(const 
P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  const P::TopLevelPlanPtr top_level_plan =
+ std::static_pointer_cast(input);
+  cost_model_.reset(
+  new cost::StarSchemaSimpleCostModel(
+  top_level_plan->shared_subplans()));
+
+  collectApplicablePredicates(input);
+
+  if (!applicable_predicates_.empty()) {
+// Apply the selected predicates to stored relations.
+return attachPredicates(input);
+  } else {
+return input;
+  }
+}
+
+void PushDownLowCostDisjunctivePredicate::collectApplicablePredicates(
+const physical::PhysicalPtr ) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, 
_reference)) {
+// Consider only stored relations with small cardinality as targets.
+if (cost_model_->estimateCardinality(input) <= 100u) {
--- End diff --

It is okay to be conservative and the stat is more accurate if the 
`\analyze` command gets executed.

Added a gflag.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-31 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/174#discussion_r98711958
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -108,6 +109,7 @@ P::PhysicalPtr PhysicalGenerator::generateInitialPlan(
 P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   std::vector<std::unique_ptr<Rule>> rules;
   rules.emplace_back(new PruneColumns());
+  rules.emplace_back(new PushDownLowCostDisjunctivePredicate());
--- End diff --

Yes it will only create necessary `Selection`s. I.e. there won't be two 
selections in a chain that can be fused into one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #180: Fix the bug with SingleIdentityHashFi...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/180#discussion_r99859847
  
--- Diff: query_optimizer/rules/AttachLIPFilters.cpp ---
@@ -128,7 +129,7 @@ void AttachLIPFilters::attachLIPFilters(
   lip_filter_configuration_->addBuildInfo(
   P::SingleIdentityHashFilterBuildInfo::Create(
   pair.second->source_attribute,
-  pair.second->estimated_cardinality * 8),
+  std::max(64uL, pair.second->estimated_cardinality * 8u)),
--- End diff --

It seems that we won't need to change this value. If there are scenarios 
later that the configuration of this value makes a difference, then we can add 
a flag for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/177#discussion_r99865183
  
--- Diff: query_optimizer/Optimizer.cpp ---
@@ -30,10 +30,11 @@ void Optimizer::generateQueryHandle(const 
ParseStatement _statement,
 OptimizerContext *optimizer_context,
 QueryHandle *query_handle) {
   LogicalGenerator logical_generator(optimizer_context);
+  PhysicalGenerator physical_generator(optimizer_context);
--- End diff --

This might be done later together with the same change to 
`LogicalGenerator`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/177#discussion_r99865239
  
--- Diff: query_optimizer/rules/ReduceGroupByAttributes.cpp ---
@@ -0,0 +1,217 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReduceGroupByAttributes.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "catalog/CatalogRelation.hpp"
+#include "query_optimizer/OptimizerContext.hpp"
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/ExpressionUtil.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/Aggregate.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/physical/TopLevelPlan.hpp"
+#include "query_optimizer/rules/PruneColumns.hpp"
+
+#include "gflags/gflags.h"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+DEFINE_uint64(reduce_group_by_attributes_threshold, 3u,
+  "The threshold for a stored relation's number of attributes 
in a "
+  "group-by clause for the ReduceGroupByAttributes 
optimization "
+  "rule to pull the stored relation up the aggregation");
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReduceGroupByAttributes::apply(const P::PhysicalPtr ) 
{
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+  cost_model_.reset(new cost::StarSchemaSimpleCostModel(
+  std::static_pointer_cast(input)->shared_subplans()));
+
+  P::PhysicalPtr output = applyInternal(input);
+  if (output != input) {
+output = PruneColumns().apply(output);
+  }
+  return output;
+}
+
+P::PhysicalPtr ReduceGroupByAttributes::applyInternal(const P::PhysicalPtr 
) {
+  std::vector new_children;
+  for (const P::PhysicalPtr  : input->children()) {
+new_children.push_back(applyInternal(child));
+  }
+
+  if (new_children != input->children()) {
+return applyToNode(input->copyWithNewChildren(new_children));
+  } else {
+return applyToNode(input);
+  }
+}
+
+P::PhysicalPtr ReduceGroupByAttributes::applyToNode(const P::PhysicalPtr 
) {
+  P::TableReferencePtr table_reference;
+  if (P::SomeTableReference::MatchesWithConditionalCast(input, 
_reference)) {
+// Collect the attributes-to-TableReference mapping info.
+for (const auto  : table_reference->attribute_list()) {
+  source_.emplace(attr->id(), std::make_pair(table_reference, attr));
+}
+return input;
+  }
+
+  P::AggregatePtr aggregate;
+  if (!P::SomeAggregate::MatchesWithConditionalCast(input, ) ||
+  aggregate->grouping_expressions().size() <= 1u) {
+return input;
+  }
+
+  // Divide the group-by attributes into groups based on their source 
table.
+  std::map<P::TableReferencePtr, std::vector> 
table_attributes;
--- End diff --

We would expect tables attributes to contain a small number of entries (# 
of storage tables for the attributes in one group-by clause, typically 1 to 3). 
Also consider that there an iteration of all the map elements below. It might 
be bette

[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99867096
  
--- Diff: query_optimizer/ExecutionGenerator.cpp ---
@@ -371,6 +378,109 @@ void ExecutionGenerator::dropAllTemporaryRelations() {
   }
 }
 
+bool ExecutionGenerator::canUseCollisionFreeAggregation(
+const P::AggregatePtr ,
+const std::size_t estimated_num_groups,
+std::size_t *max_num_groups) const {
+#ifdef QUICKSTEP_DISTRIBUTED
+  // Currently we cannot do this fast path with the distributed setting. 
See
+  // the TODOs at InitializeAggregationOperator::getAllWorkOrderProtos() 
and
+  // FinalizeAggregationOperator::getAllWorkOrderProtos().
+  return false;
+#endif
+
+  // Supports only single group-by key.
+  if (aggregate->grouping_expressions().size() != 1) {
--- End diff --

Yes we will have a followup PR to improve TPC-H Q01.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99882605
  
--- Diff: query_optimizer/ExecutionGenerator.cpp ---
@@ -371,6 +378,109 @@ void ExecutionGenerator::dropAllTemporaryRelations() {
   }
 }
 
+bool ExecutionGenerator::canUseCollisionFreeAggregation(
+const P::AggregatePtr ,
+const std::size_t estimated_num_groups,
+std::size_t *max_num_groups) const {
+#ifdef QUICKSTEP_DISTRIBUTED
+  // Currently we cannot do this fast path with the distributed setting. 
See
+  // the TODOs at InitializeAggregationOperator::getAllWorkOrderProtos() 
and
+  // FinalizeAggregationOperator::getAllWorkOrderProtos().
+  return false;
+#endif
+
+  // Supports only single group-by key.
+  if (aggregate->grouping_expressions().size() != 1) {
+return false;
+  }
+
+  // We need to know the exact min/max stats of the group-by key.
+  // So it must be a CatalogAttribute (but not an expression).
+  E::AttributeReferencePtr group_by_key_attr;
+  const E::ExpressionPtr agg_expr = 
aggregate->grouping_expressions().front();
+  if (!E::SomeAttributeReference::MatchesWithConditionalCast(agg_expr, 
_by_key_attr)) {
+return false;
+  }
+
+  bool min_value_stat_is_exact;
+  bool max_value_stat_is_exact;
+  const TypedValue min_value =
+  cost_model_for_aggregation_->findMinValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  const TypedValue max_value =
+  cost_model_for_aggregation_->findMaxValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  if (min_value.isNull() || max_value.isNull() ||
+  (!min_value_stat_is_exact) || (!max_value_stat_is_exact)) {
+return false;
+  }
+
+  std::int64_t min_cpp_value;
+  std::int64_t max_cpp_value;
+  switch (group_by_key_attr->getValueType().getTypeID()) {
+case TypeID::kInt: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+case TypeID::kLong: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+default:
+  return false;
+  }
+
+  // TODO(jianqiao):
+  // 1. Handle the case where min_cpp_value is below 0 or far greater than 
0.
+  // 2. Reason about the table size bound (e.g. by checking memory size) 
instead
+  //of hardcoding it as a gflag.
+  if (min_cpp_value < 0 ||
+  max_cpp_value >= FLAGS_collision_free_vector_table_max_size ||
+  max_cpp_value / static_cast(estimated_num_groups) > 256.0) {
+return false;
+  }
+
+  for (const auto _expr : aggregate->aggregate_expressions()) {
+const E::AggregateFunctionPtr agg_func =
+std::static_pointer_cast(agg_expr->expression());
+
+if (agg_func->is_distinct()) {
+  return false;
    +}
+
+// TODO(jianqiao): Support AggregationID::AVG.
+switch (agg_func->getAggregate().getAggregationID()) {
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99883469
  
--- Diff: query_optimizer/ExecutionGenerator.hpp ---
@@ -427,7 +445,7 @@ class ExecutionGenerator {
   /**
* @brief The cost model to use for estimating aggregation hash table 
size.
*/
-  std::unique_ptr cost_model_for_aggregation_;
+  std::unique_ptr 
cost_model_for_aggregation_;
--- End diff --

We need some extra analysis done in `StarSchemaSimpleCostModel`. I will 
refactor the analysis part out of `StarSchemaSimpleCostModel` into 
`AnalysisUtil` in a follow-up PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99874032
  
--- Diff: query_optimizer/ExecutionGenerator.cpp ---
@@ -371,6 +378,109 @@ void ExecutionGenerator::dropAllTemporaryRelations() {
   }
 }
 
+bool ExecutionGenerator::canUseCollisionFreeAggregation(
+const P::AggregatePtr ,
+const std::size_t estimated_num_groups,
+std::size_t *max_num_groups) const {
+#ifdef QUICKSTEP_DISTRIBUTED
+  // Currently we cannot do this fast path with the distributed setting. 
See
+  // the TODOs at InitializeAggregationOperator::getAllWorkOrderProtos() 
and
+  // FinalizeAggregationOperator::getAllWorkOrderProtos().
+  return false;
+#endif
+
+  // Supports only single group-by key.
+  if (aggregate->grouping_expressions().size() != 1) {
+return false;
+  }
+
+  // We need to know the exact min/max stats of the group-by key.
+  // So it must be a CatalogAttribute (but not an expression).
+  E::AttributeReferencePtr group_by_key_attr;
+  const E::ExpressionPtr agg_expr = 
aggregate->grouping_expressions().front();
+  if (!E::SomeAttributeReference::MatchesWithConditionalCast(agg_expr, 
_by_key_attr)) {
+return false;
+  }
+
+  bool min_value_stat_is_exact;
+  bool max_value_stat_is_exact;
+  const TypedValue min_value =
+  cost_model_for_aggregation_->findMinValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  const TypedValue max_value =
+  cost_model_for_aggregation_->findMaxValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  if (min_value.isNull() || max_value.isNull() ||
+  (!min_value_stat_is_exact) || (!max_value_stat_is_exact)) {
+return false;
+  }
+
+  std::int64_t min_cpp_value;
+  std::int64_t max_cpp_value;
+  switch (group_by_key_attr->getValueType().getTypeID()) {
+case TypeID::kInt: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+case TypeID::kLong: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+default:
+  return false;
--- End diff --

We can support more types later. For any type/any number of group-by keys, 
if we can define a one-to-one mapping function that maps the keys to 
range-bounded integers, then this aggregation is applicable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99882633
  
--- Diff: query_optimizer/ExecutionGenerator.cpp ---
@@ -371,6 +378,109 @@ void ExecutionGenerator::dropAllTemporaryRelations() {
   }
 }
 
+bool ExecutionGenerator::canUseCollisionFreeAggregation(
+const P::AggregatePtr ,
+const std::size_t estimated_num_groups,
+std::size_t *max_num_groups) const {
+#ifdef QUICKSTEP_DISTRIBUTED
+  // Currently we cannot do this fast path with the distributed setting. 
See
+  // the TODOs at InitializeAggregationOperator::getAllWorkOrderProtos() 
and
+  // FinalizeAggregationOperator::getAllWorkOrderProtos().
+  return false;
+#endif
+
+  // Supports only single group-by key.
+  if (aggregate->grouping_expressions().size() != 1) {
+return false;
+  }
+
+  // We need to know the exact min/max stats of the group-by key.
+  // So it must be a CatalogAttribute (but not an expression).
+  E::AttributeReferencePtr group_by_key_attr;
+  const E::ExpressionPtr agg_expr = 
aggregate->grouping_expressions().front();
+  if (!E::SomeAttributeReference::MatchesWithConditionalCast(agg_expr, 
_by_key_attr)) {
+return false;
+  }
+
+  bool min_value_stat_is_exact;
+  bool max_value_stat_is_exact;
+  const TypedValue min_value =
+  cost_model_for_aggregation_->findMinValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  const TypedValue max_value =
+  cost_model_for_aggregation_->findMaxValueStat(
+  aggregate, group_by_key_attr, _value_stat_is_exact);
+  if (min_value.isNull() || max_value.isNull() ||
+  (!min_value_stat_is_exact) || (!max_value_stat_is_exact)) {
+return false;
+  }
+
+  std::int64_t min_cpp_value;
+  std::int64_t max_cpp_value;
+  switch (group_by_key_attr->getValueType().getTypeID()) {
+case TypeID::kInt: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+case TypeID::kLong: {
+  min_cpp_value = min_value.getLiteral();
+  max_cpp_value = max_value.getLiteral();
+  break;
+}
+default:
+  return false;
+  }
+
+  // TODO(jianqiao):
+  // 1. Handle the case where min_cpp_value is below 0 or far greater than 
0.
+  // 2. Reason about the table size bound (e.g. by checking memory size) 
instead
+  //of hardcoding it as a gflag.
+  if (min_cpp_value < 0 ||
+  max_cpp_value >= FLAGS_collision_free_vector_table_max_size ||
+  max_cpp_value / static_cast(estimated_num_groups) > 256.0) {
+return false;
+  }
+
+  for (const auto _expr : aggregate->aggregate_expressions()) {
+const E::AggregateFunctionPtr agg_func =
+std::static_pointer_cast(agg_expr->expression());
+
+if (agg_func->is_distinct()) {
+  return false;
    +}
+
+// TODO(jianqiao): Support AggregationID::AVG.
+switch (agg_func->getAggregate().getAggregationID()) {
+  case AggregationID::kCount:  // Fall through
+  case AggregationID::kSum:
+break;
+  default:
+return false;
+}
+
+const auto  = agg_func->getArguments();
+if (arguments.size() > 1) {
+  return false;
+}
+
+if (arguments.size() == 1) {
+  switch (arguments.front()->getValueType().getTypeID()) {
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99897008
  
--- Diff: storage/PackedPayloadHashTable.cpp ---
@@ -0,0 +1,463 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "storage/PackedPayloadHashTable.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "expressions/aggregation/AggregationHandle.hpp"
+#include "storage/HashTableKeyManager.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageBlockInfo.hpp"
+#include "storage/StorageConstants.hpp"
+#include "storage/StorageManager.hpp"
+#include "storage/ValueAccessor.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "threading/SpinMutex.hpp"
+#include "threading/SpinSharedMutex.hpp"
+#include "types/Type.hpp"
+#include "types/containers/ColumnVectorsValueAccessor.hpp"
+#include "utility/Alignment.hpp"
+#include "utility/Macros.hpp"
+#include "utility/PrimeNumber.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+PackedPayloadHashTable::PackedPayloadHashTable(
+const std::vector _types,
+const std::size_t num_entries,
+const std::vector ,
+StorageManager *storage_manager)
+: key_types_(key_types),
+  num_handles_(handles.size()),
+  handles_(handles),
+  total_payload_size_(ComputeTotalPayloadSize(handles)),
+  storage_manager_(storage_manager),
+  kBucketAlignment(alignof(std::atomic)),
+  kValueOffset(sizeof(std::atomic) + sizeof(std::size_t)),
+  key_manager_(key_types_, kValueOffset + total_payload_size_),
+  bucket_size_(ComputeBucketSize(key_manager_.getFixedKeySize())) {
+  std::size_t payload_offset_running_sum = sizeof(SpinMutex);
+  for (const auto *handle : handles) {
+payload_offsets_.emplace_back(payload_offset_running_sum);
+payload_offset_running_sum += handle->getPayloadSize();
+  }
+
+  // NOTE(jianqiao): Potential memory leak / double freeing by copying from
+  // init_payload to buckets if payload contains out of line data.
+  init_payload_ =
+  static_cast(calloc(this->total_payload_size_, 1));
+  DCHECK(init_payload_ != nullptr);
+
+  for (std::size_t i = 0; i < num_handles_; ++i) {
+handles_[i]->initPayload(init_payload_ + payload_offsets_[i]);
+  }
+
+  // Bucket size always rounds up to the alignment requirement of the 
atomic
+  // size_t "next" pointer at the front or a ValueT, whichever is larger.
+  //
+  // Give base HashTable information about what key components are stored
+  // inline from 'key_manager_'.
+  setKeyInline(key_manager_.getKeyInline());
+
+  // Pick out a prime number of slots and calculate storage requirements.
+  std::size_t num_slots_tmp =
+  get_next_prime_number(num_entries * kHashTableLoadFactor);
+  std::size_t required_memory =
+  sizeof(Header) + num_slots_tmp * sizeof(std::atomic) +
+  (num_slots_tmp / kHashTableLoadFactor) *
+  (bucket_size_ + key_manager_.getEstimatedVariableKeySize());
+  std::size_t num_storage_slots =
+  this->storage_manager_->SlotsNeededForBytes(required_memory);
+  if (num_storage_slots == 0) {
--- End diff --

All the code in `PackedPayloadHashTable` is from the original `HashTable` 
implementation. We may have an overall code-style revision later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99897239
  
--- Diff: storage/AggregationOperationState.hpp ---
@@ -156,6 +152,29 @@ class AggregationOperationState {
   const CatalogDatabaseLite );
 
   /**
+   * @brief Get the number of partitions to be used for initializing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for initializing the 
aggregation.
+   **/
+  std::size_t getNumInitializationPartitions() const;
+
+  /**
+   * @brief Get the number of partitions to be used for finalizing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for finalizing the 
aggregation.
+   **/
+  std::size_t getNumFinalizationPartitions() const;
--- End diff --

The number of partitions can be different for initialization and 
finalization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #179: QUICKSTEP-70-71 Improve aggregation performa...

2017-02-07 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/179
  
For the question about `PartitionedHashTablePool` and `HashTablePool`. Note 
that their use patterns are different so perhaps it is not natural to merge 
them into one class.

`PartitionedHashTablePool` creates **a fixed number** of hash tables **on 
its construction**. The use pattern is that every `AggregationWorkOrder` 
updates **all** of these hash tables and every `FinalizeAggregationWorkOrder` 
updates one of these hash tables.

`HashTablePool` creates hash tables **on demand**. The current use pattern 
is that every `AggregationWorkOrder` checkouts **exclusively** one hash table, 
updates the hash table, and returns the hash table back to the pool. Then only 
one `FinalizeAggregationWorkOrder` is created to merge all the tables in the 
pool to the final hash table.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #179: QUICKSTEP-70-71 Improve aggregation performa...

2017-02-07 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/179
  
Updated, and tested locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99914989
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -353,187 +353,286 @@ bool AggregationOperationState::ProtoIsValid(
   return true;
 }
 
-void AggregationOperationState::aggregateBlock(const block_id input_block,
-   LIPFilterAdaptiveProber 
*lip_filter_adaptive_prober) {
-  if (group_by_list_.empty()) {
-aggregateBlockSingleState(input_block);
-  } else {
-aggregateBlockHashTable(input_block, lip_filter_adaptive_prober);
+bool AggregationOperationState::checkAggregatePartitioned(
+const std::size_t estimated_num_groups,
+const std::vector _distinct,
+const std::vector<std::unique_ptr> _by,
+const std::vector _functions) 
const {
+  // If there's no aggregation, return false.
+  if (aggregate_functions.empty()) {
+return false;
+  }
+  // Check if there's a distinct operation involved in any aggregate, if so
+  // the aggregate can't be partitioned.
+  for (auto distinct : is_distinct) {
+if (distinct) {
+  return false;
+}
+  }
+  // There's no distinct aggregation involved, Check if there's at least 
one
+  // GROUP BY operation.
+  if (group_by.empty()) {
+return false;
+  }
+
+  // Currently we require that all the group-by keys are ScalarAttributes 
for
+  // the convenient of implementing copy elision.
+  // TODO(jianqiao): relax this requirement.
+  for (const auto _by_element : group_by) {
+if (group_by_element->getAttributeIdForValueAccessor() == 
kInvalidAttributeID) {
+  return false;
+}
   }
+
+  // There are GROUP BYs without DISTINCT. Check if the estimated number of
+  // groups is large enough to warrant a partitioned aggregation.
+  return estimated_num_groups >
+ static_cast(
+ FLAGS_partition_aggregation_num_groups_threshold);
+  return false;
 }
 
-void AggregationOperationState::finalizeAggregate(
-InsertDestination *output_destination) {
-  if (group_by_list_.empty()) {
-finalizeSingleState(output_destination);
+std::size_t AggregationOperationState::getNumInitializationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumInitializationPartitions();
   } else {
-finalizeHashTable(output_destination);
+return 0u;
   }
 }
 
-void AggregationOperationState::mergeSingleState(
-const std::vector<std::unique_ptr> _state) {
-  DEBUG_ASSERT(local_state.size() == single_states_.size());
-  for (std::size_t agg_idx = 0; agg_idx < handles_.size(); ++agg_idx) {
-if (!is_distinct_[agg_idx]) {
-  handles_[agg_idx]->mergeStates(*local_state[agg_idx],
- single_states_[agg_idx].get());
-}
+std::size_t AggregationOperationState::getNumFinalizationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumFinalizationPartitions();
+  } else if (is_aggregate_partitioned_) {
+return partitioned_group_by_hashtable_pool_->getNumPartitions();
+  } else  {
+return 1u;
   }
 }
 
-void AggregationOperationState::aggregateBlockSingleState(
-const block_id input_block) {
-  // Aggregate per-block state for each aggregate.
-  std::vector<std::unique_ptr> local_state;
+void AggregationOperationState::initialize(const std::size_t partition_id) 
{
+  if (is_aggregate_collision_free_) {
+static_cast(
+collision_free_hashtable_.get())->initialize(partition_id);
+  } else {
+LOG(FATAL) << "AggregationOperationState::initializeState() "
--- End diff --

If the code reaches here, then the system will probably actually crash. 
I.e. it is like a `CHECK` but we still put the `else` branch here for further 
updates.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99928307
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -353,187 +353,286 @@ bool AggregationOperationState::ProtoIsValid(
   return true;
 }
 
-void AggregationOperationState::aggregateBlock(const block_id input_block,
-   LIPFilterAdaptiveProber 
*lip_filter_adaptive_prober) {
-  if (group_by_list_.empty()) {
-aggregateBlockSingleState(input_block);
-  } else {
-aggregateBlockHashTable(input_block, lip_filter_adaptive_prober);
+bool AggregationOperationState::checkAggregatePartitioned(
+const std::size_t estimated_num_groups,
+const std::vector _distinct,
+const std::vector<std::unique_ptr> _by,
+const std::vector _functions) 
const {
+  // If there's no aggregation, return false.
+  if (aggregate_functions.empty()) {
+return false;
+  }
+  // Check if there's a distinct operation involved in any aggregate, if so
+  // the aggregate can't be partitioned.
+  for (auto distinct : is_distinct) {
+if (distinct) {
+  return false;
+}
+  }
+  // There's no distinct aggregation involved, Check if there's at least 
one
+  // GROUP BY operation.
+  if (group_by.empty()) {
+return false;
+  }
+
+  // Currently we require that all the group-by keys are ScalarAttributes 
for
+  // the convenient of implementing copy elision.
+  // TODO(jianqiao): relax this requirement.
+  for (const auto _by_element : group_by) {
+if (group_by_element->getAttributeIdForValueAccessor() == 
kInvalidAttributeID) {
+  return false;
+}
   }
+
+  // There are GROUP BYs without DISTINCT. Check if the estimated number of
+  // groups is large enough to warrant a partitioned aggregation.
+  return estimated_num_groups >
+ static_cast(
+ FLAGS_partition_aggregation_num_groups_threshold);
+  return false;
 }
 
-void AggregationOperationState::finalizeAggregate(
-InsertDestination *output_destination) {
-  if (group_by_list_.empty()) {
-finalizeSingleState(output_destination);
+std::size_t AggregationOperationState::getNumInitializationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumInitializationPartitions();
   } else {
-finalizeHashTable(output_destination);
+return 0u;
   }
 }
 
-void AggregationOperationState::mergeSingleState(
-const std::vector<std::unique_ptr> _state) {
-  DEBUG_ASSERT(local_state.size() == single_states_.size());
-  for (std::size_t agg_idx = 0; agg_idx < handles_.size(); ++agg_idx) {
-if (!is_distinct_[agg_idx]) {
-  handles_[agg_idx]->mergeStates(*local_state[agg_idx],
- single_states_[agg_idx].get());
-}
+std::size_t AggregationOperationState::getNumFinalizationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumFinalizationPartitions();
+  } else if (is_aggregate_partitioned_) {
+return partitioned_group_by_hashtable_pool_->getNumPartitions();
+  } else  {
+return 1u;
   }
 }
 
-void AggregationOperationState::aggregateBlockSingleState(
-const block_id input_block) {
-  // Aggregate per-block state for each aggregate.
-  std::vector<std::unique_ptr> local_state;
+void AggregationOperationState::initialize(const std::size_t partition_id) 
{
+  if (is_aggregate_collision_free_) {
+static_cast(
+collision_free_hashtable_.get())->initialize(partition_id);
+  } else {
+LOG(FATAL) << "AggregationOperationState::initializeState() "
+   << "is not supported by this aggregation";
+  }
+}
 
+void AggregationOperationState::aggregateBlock(const block_id input_block,
+   LIPFilterAdaptiveProber 
*lip_filter_adaptive_prober) {
   BlockReference block(
   storage_manager_->getBlock(input_block, input_relation_));
+  const auto _store = block->getTupleStorageSubBlock();
+  std::unique_ptr 
base_accessor(tuple_store.createValueAccessor());
+  std::unique_ptr shared_accessor;
+  ValueAccessor *accessor = base_accessor.get();
 
+  // Apply the predicate first, then the LIPFilters, to generate a 
TupleIdSequence
+  // as the existence map for the tuples.
   std::unique_ptr matches;
   if (predicate_ != nullptr) {
-std::unique

[GitHub] incubator-quickstep issue #179: QUICKSTEP-70-71 Improve aggregation performa...

2017-02-07 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/179
  
There is a `CMakeLists` to be updated -- do not merge at this moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99918342
  
--- Diff: storage/AggregationOperationState.hpp ---
@@ -156,6 +152,29 @@ class AggregationOperationState {
   const CatalogDatabaseLite );
 
   /**
+   * @brief Get the number of partitions to be used for initializing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for initializing the 
aggregation.
+   **/
+  std::size_t getNumInitializationPartitions() const;
+
+  /**
+   * @brief Get the number of partitions to be used for finalizing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for finalizing the 
aggregation.
+   **/
+  std::size_t getNumFinalizationPartitions() const;
--- End diff --

In the distributed scenario, you may need the table to be "really 
partitioned" (i.e. divided into several tables) in a different sense of what 
partitions refer to here. The partitions here cannot be scheduled to difference 
nodes since they have to update the same table.

So later when we make `CollisionFreeVectorTable` support really partitioned 
input relations, we will can rename all the "partitions" here to some other 
name, say `segments`.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99929968
  
--- Diff: relational_operators/WorkOrderFactory.cpp ---
@@ -186,6 +186,7 @@ WorkOrder* WorkOrderFactory::ReconstructFromProto(const 
serialization::WorkOrder
   LOG(INFO) << "Creating FinalizeAggregationWorkOrder in Shiftboss " 
<< shiftboss_index;
   return new FinalizeAggregationWorkOrder(
   proto.query_id(),
+  0uL,
--- End diff --

Added a TODO comment:

```
// TODO(quickstep-team): Handle inner-table partitioning in the distributed 
setting.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99930761
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -556,80 +655,83 @@ void AggregationOperationState::finalizeSingleState(
   output_destination->insertTuple(Tuple(std::move(attribute_values)));
 }
 
-void AggregationOperationState::mergeGroupByHashTables(
-AggregationStateHashTableBase *src, AggregationStateHashTableBase 
*dst) {
-  HashTableMergerFast merger(dst);
-  (static_cast<FastHashTable<true, false, true, false> *>(src))
-  ->forEachCompositeKeyFast();
+void AggregationOperationState::finalizeHashTable(
+const std::size_t partition_id,
+InsertDestination *output_destination) {
+  if (is_aggregate_collision_free_) {
+finalizeHashTableImplCollisionFree(partition_id, output_destination);
+  } else if (is_aggregate_partitioned_) {
+finalizeHashTableImplPartitioned(partition_id, output_destination);
+  } else {
+DCHECK_EQ(0u, partition_id);
+finalizeHashTableImplThreadPrivate(output_destination);
+  }
 }
 
-void AggregationOperationState::finalizeHashTable(
+void AggregationOperationState::finalizeHashTableImplCollisionFree(
+const std::size_t partition_id,
+InsertDestination *output_destination) {
+  std::vector<std::unique_ptr> final_values;
+  CollisionFreeVectorTable *hash_table =
+  static_cast(collision_free_hashtable_.get());
+
+  const std::size_t max_length =
+  hash_table->getNumTuplesInFinalizationPartition(partition_id);
+  ColumnVectorsValueAccessor complete_result;
+
+  DCHECK_EQ(1u, group_by_types_.size());
+  const Type *key_type = group_by_types_.front();
+  DCHECK(NativeColumnVector::UsableForType(*key_type));
+
+  std::unique_ptr key_cv(
+  new NativeColumnVector(*key_type, max_length));
+  hash_table->finalizeKey(partition_id, key_cv.get());
+  complete_result.addColumn(key_cv.release());
+
+  for (std::size_t i = 0; i < handles_.size(); ++i) {
+const Type *result_type = handles_[i]->getResultType();
+DCHECK(NativeColumnVector::UsableForType(*result_type));
+
+std::unique_ptr result_cv(
+new NativeColumnVector(*result_type, max_length));
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99915265
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -353,187 +353,286 @@ bool AggregationOperationState::ProtoIsValid(
   return true;
 }
 
-void AggregationOperationState::aggregateBlock(const block_id input_block,
-   LIPFilterAdaptiveProber 
*lip_filter_adaptive_prober) {
-  if (group_by_list_.empty()) {
-aggregateBlockSingleState(input_block);
-  } else {
-aggregateBlockHashTable(input_block, lip_filter_adaptive_prober);
+bool AggregationOperationState::checkAggregatePartitioned(
+const std::size_t estimated_num_groups,
+const std::vector _distinct,
+const std::vector<std::unique_ptr> _by,
+const std::vector _functions) 
const {
+  // If there's no aggregation, return false.
+  if (aggregate_functions.empty()) {
+return false;
+  }
+  // Check if there's a distinct operation involved in any aggregate, if so
+  // the aggregate can't be partitioned.
+  for (auto distinct : is_distinct) {
+if (distinct) {
+  return false;
+}
+  }
+  // There's no distinct aggregation involved, Check if there's at least 
one
+  // GROUP BY operation.
+  if (group_by.empty()) {
+return false;
+  }
+
+  // Currently we require that all the group-by keys are ScalarAttributes 
for
+  // the convenient of implementing copy elision.
+  // TODO(jianqiao): relax this requirement.
+  for (const auto _by_element : group_by) {
+if (group_by_element->getAttributeIdForValueAccessor() == 
kInvalidAttributeID) {
+  return false;
+}
   }
+
+  // There are GROUP BYs without DISTINCT. Check if the estimated number of
+  // groups is large enough to warrant a partitioned aggregation.
+  return estimated_num_groups >
+ static_cast(
+ FLAGS_partition_aggregation_num_groups_threshold);
+  return false;
 }
 
-void AggregationOperationState::finalizeAggregate(
-InsertDestination *output_destination) {
-  if (group_by_list_.empty()) {
-finalizeSingleState(output_destination);
+std::size_t AggregationOperationState::getNumInitializationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumInitializationPartitions();
   } else {
-finalizeHashTable(output_destination);
+return 0u;
--- End diff --

Currently we create the `InitializeAggregationOperator` only for 
collision-free aggregations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99922836
  
--- Diff: relational_operators/InitializeAggregationOperator.cpp ---
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "relational_operators/InitializeAggregationOperator.hpp"
+
+#include 
+
+#include "query_execution/QueryContext.hpp"
+#include "query_execution/WorkOrderProtosContainer.hpp"
+#include "query_execution/WorkOrdersContainer.hpp"
+#include "relational_operators/WorkOrder.pb.h"
+#include "storage/AggregationOperationState.hpp"
+
+#include "glog/logging.h"
+
+#include "tmb/id_typedefs.h"
+
+namespace quickstep {
+
+bool InitializeAggregationOperator::getAllWorkOrders(
+WorkOrdersContainer *container,
+QueryContext *query_context,
+StorageManager *storage_manager,
+const tmb::client_id scheduler_client_id,
+tmb::MessageBus *bus) {
+  if (!started_) {
+AggregationOperationState *agg_state =
+query_context->getAggregationState(aggr_state_index_);
+DCHECK(agg_state != nullptr);
+
+for (std::size_t part_id = 0;
+ part_id < agg_state->getNumInitializationPartitions();
+ ++part_id) {
+  container->addNormalWorkOrder(
+  new InitializeAggregationWorkOrder(query_id_,
+ part_id,
+ agg_state),
+  op_index_);
+}
+started_ = true;
+  }
+  return true;
+}
+
+// TODO(quickstep-team) : Think about how the number of partitions could be
+// accessed in this function. Until then, we can't use partitioned 
aggregation
+// initialization with the distributed version.
--- End diff --

The "partition" here is kind of inner-table partitioning that is not 
related to the partitioning of storage blocks. To get the "number of 
partitions" here, we need the "AggregationOperationState" object, which is not 
available at this moment.

One solution is to calculate the # of partitions value before creating the 
`AggregationOperationState` object. That requires quite some refactoring. We 
can see if we need to do that later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/177#discussion_r99963866
  
--- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
@@ -0,0 +1,143 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+class OptimizerContext;
+
+/**
+ * @brief Rule that applies to a physical plan to reduce the number of 
group-by
+ *attributes for Aggregate nodes (to improve performance) by 
pulling
+ *joins up the aggregations.
+ *
+ * For example, let R be a relation with PRIMARY KEY x and attributes y, 
z. Let
+ * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. 
Then the
+ * optimization rule will transform the physical plan:
+ *   Aggregate(
+ * [input relation]: HashJoin(
+ * [probe relation]: S
+ * [build relation]: R
+ * [join expression]: S.u = R.x
+ * [project attributes]: v, x, y, z
+ *   )
+ * [aggregate expression]: SUM(v) AS sum_v
+ * [group-by attributes]: x, y, z
+ *   )
+ *
+ * into:
+ *   HashJoin(
+ * [probe relation]: Aggregate(
+ * [input relation]: S
+ * [aggregate expression]: SUM(v) AS sum_v
+ * [group-by attribute]: u
+ *   ) AS T
+ * [build relation]: R
+ * [join expression]: T.u = R.x
+ * [project attributes]: sum_v, x, y, z
+ *   )
+ */
+class ReduceGroupByAttributes : public Rule {
+ public:
+  /**
+   * @brief Constructor.
+   *
+   * @param optimizer_context The optimizer context.
+   */
+  explicit ReduceGroupByAttributes(OptimizerContext *optimizer_context)
+  : optimizer_context_(optimizer_context) {}
+
+  ~ReduceGroupByAttributes() override {}
+
+  std::string getName() const override {
+return "ReduceGroupByAttributes";
+  }
+
+  physical::PhysicalPtr apply(const physical::PhysicalPtr ) override;
+
+ private:
+  struct AttributeInfo {
+AttributeInfo(const expressions::AttributeReferencePtr _in,
+  const bool is_unique_in,
+  const bool is_fixed_length_in,
+  const std::size_t maximum_size_in)
+: attribute(attribute_in),
+  is_unique(is_unique_in),
+  is_fixed_length(is_fixed_length_in),
+  maximum_size(maximum_size_in) {}
+
+// In the situation that there are multiple attributes that can serve 
as the
+// group-by key, we define an ordering based on aggregation 
performance (e.g.
+// it is faster to do aggregation with a fix-length attribute as the 
group-by
+// key than with a variable-length attribute).
+inline static bool IsBetterThan(const AttributeInfo *lhs,
+const AttributeInfo *rhs) {
+  if (lhs->is_unique != rhs->is_unique) {
+return lhs->is_unique;
+  }
+  if (lhs->is_fixed_length != rhs->is_fixed_length) {
+return lhs->is_fix

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/177#discussion_r99964422
  
--- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
@@ -0,0 +1,143 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+class OptimizerContext;
+
+/**
+ * @brief Rule that applies to a physical plan to reduce the number of 
group-by
+ *attributes for Aggregate nodes (to improve performance) by 
pulling
+ *joins up the aggregations.
+ *
+ * For example, let R be a relation with PRIMARY KEY x and attributes y, 
z. Let
+ * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. 
Then the
+ * optimization rule will transform the physical plan:
+ *   Aggregate(
+ * [input relation]: HashJoin(
+ * [probe relation]: S
+ * [build relation]: R
+ * [join expression]: S.u = R.x
+ * [project attributes]: v, x, y, z
+ *   )
+ * [aggregate expression]: SUM(v) AS sum_v
+ * [group-by attributes]: x, y, z
+ *   )
+ *
+ * into:
+ *   HashJoin(
+ * [probe relation]: Aggregate(
+ * [input relation]: S
+ * [aggregate expression]: SUM(v) AS sum_v
+ * [group-by attribute]: u
+ *   ) AS T
+ * [build relation]: R
+ * [join expression]: T.u = R.x
+ * [project attributes]: sum_v, x, y, z
+ *   )
--- End diff --

I will update the header later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #173: Simplified the SelectOperator w/ partitions.

2017-02-02 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/173
  
Merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #179: QUICKSTEP-70-71 Improve aggregation performa...

2017-02-05 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/179
  
Currently there is a gflag for setting the range upbound:

https://github.com/apache/incubator-quickstep/pull/179/files#diff-3d4b5df01ed8edbe255e096eefbfc342R440


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99916031
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -353,187 +353,286 @@ bool AggregationOperationState::ProtoIsValid(
   return true;
 }
 
-void AggregationOperationState::aggregateBlock(const block_id input_block,
-   LIPFilterAdaptiveProber 
*lip_filter_adaptive_prober) {
-  if (group_by_list_.empty()) {
-aggregateBlockSingleState(input_block);
-  } else {
-aggregateBlockHashTable(input_block, lip_filter_adaptive_prober);
+bool AggregationOperationState::checkAggregatePartitioned(
+const std::size_t estimated_num_groups,
+const std::vector _distinct,
+const std::vector<std::unique_ptr> _by,
+const std::vector _functions) 
const {
+  // If there's no aggregation, return false.
+  if (aggregate_functions.empty()) {
+return false;
+  }
+  // Check if there's a distinct operation involved in any aggregate, if so
+  // the aggregate can't be partitioned.
+  for (auto distinct : is_distinct) {
+if (distinct) {
+  return false;
+}
+  }
+  // There's no distinct aggregation involved, Check if there's at least 
one
+  // GROUP BY operation.
+  if (group_by.empty()) {
+return false;
+  }
+
+  // Currently we require that all the group-by keys are ScalarAttributes 
for
+  // the convenient of implementing copy elision.
+  // TODO(jianqiao): relax this requirement.
+  for (const auto _by_element : group_by) {
+if (group_by_element->getAttributeIdForValueAccessor() == 
kInvalidAttributeID) {
+  return false;
+}
   }
+
+  // There are GROUP BYs without DISTINCT. Check if the estimated number of
+  // groups is large enough to warrant a partitioned aggregation.
+  return estimated_num_groups >
+ static_cast(
+ FLAGS_partition_aggregation_num_groups_threshold);
+  return false;
 }
 
-void AggregationOperationState::finalizeAggregate(
-InsertDestination *output_destination) {
-  if (group_by_list_.empty()) {
-finalizeSingleState(output_destination);
+std::size_t AggregationOperationState::getNumInitializationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumInitializationPartitions();
   } else {
-finalizeHashTable(output_destination);
+return 0u;
   }
 }
 
-void AggregationOperationState::mergeSingleState(
-const std::vector<std::unique_ptr> _state) {
-  DEBUG_ASSERT(local_state.size() == single_states_.size());
-  for (std::size_t agg_idx = 0; agg_idx < handles_.size(); ++agg_idx) {
-if (!is_distinct_[agg_idx]) {
-  handles_[agg_idx]->mergeStates(*local_state[agg_idx],
- single_states_[agg_idx].get());
-}
+std::size_t AggregationOperationState::getNumFinalizationPartitions() 
const {
+  if (is_aggregate_collision_free_) {
+return static_cast(
+collision_free_hashtable_.get())->getNumFinalizationPartitions();
+  } else if (is_aggregate_partitioned_) {
+return partitioned_group_by_hashtable_pool_->getNumPartitions();
+  } else  {
+return 1u;
   }
 }
 
-void AggregationOperationState::aggregateBlockSingleState(
-const block_id input_block) {
-  // Aggregate per-block state for each aggregate.
-  std::vector<std::unique_ptr> local_state;
+void AggregationOperationState::initialize(const std::size_t partition_id) 
{
--- End diff --

The "partition" here is somehow not related to catalog or relations. I.e. 
it refers to the manual segmentation of the memory / hash table entries but not 
the partitioning of storage blocks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99901875
  
--- Diff: storage/AggregationOperationState.hpp ---
@@ -156,6 +152,29 @@ class AggregationOperationState {
   const CatalogDatabaseLite );
 
   /**
+   * @brief Get the number of partitions to be used for initializing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for initializing the 
aggregation.
+   **/
+  std::size_t getNumInitializationPartitions() const;
+
+  /**
+   * @brief Get the number of partitions to be used for finalizing the
+   *aggregation.
+   *
+   * @return The number of partitions to be used for finalizing the 
aggregation.
+   **/
+  std::size_t getNumFinalizationPartitions() const;
--- End diff --

Currently initialization is for `memset` the storage memory. It is much 
faster than finalization so we expect to have a smaller number of 
initialization partitions than finalization partitions if the storage memory is 
not large.

E.g. suppose the vector table has 1 million entries of `int` keys and 
`std::int64_t` states -- amount to 12MB storage memory. Then we may want 80 
partitions to finalize the table, but would not create 80 work orders to 
initialize the 12MB memory.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #179: QUICKSTEP-70-71 Improve aggregation p...

2017-02-07 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/179#discussion_r99884619
  
--- Diff: query_optimizer/ExecutionGenerator.cpp ---
@@ -1495,9 +1607,28 @@ void ExecutionGenerator::convertAggregate(
   }
 
   if (!group_by_types.empty()) {
-// Right now, only SeparateChaining is supported.
-aggr_state_proto->set_hash_table_impl_type(
-serialization::HashTableImplType::SEPARATE_CHAINING);
+const std::size_t estimated_num_groups =
+
cost_model_for_aggregation_->estimateNumGroupsForAggregate(physical_plan);
+
+std::size_t max_num_groups;
+const bool can_use_collision_free_aggregation =
+canUseCollisionFreeAggregation(physical_plan,
+   estimated_num_groups,
+   _num_groups);
+
+if (can_use_collision_free_aggregation) {
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #180: Fix the bug with SingleIdentityHashFilter wh...

2017-02-05 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/180
  
Assigned to @zuyu.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #177: Reduce the number of group-by attributes by ...

2017-02-05 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/177
  
This optimization is somehow different from `AggregatePushDown`. I updated 
the example in this PR's description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #174: Push down disjunctive predicates to f...

2017-01-30 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/174

Push down disjunctive predicates to filter small-cardinality stored 
relation early.

This PR implements an optimization (physical plan transformation) that 
pushes down disjunctive predicate to filter stored relations early when proper 
conditions are met.

Here we elaborate the conditions. Let
```
P = p_{1,1} AND ... AND p_{1, m_1}
OR
...
OR
p_{n,1} AND ... AND p_{n, m_n}
```
be a predicate in _disjunctive normal form_.
 
Now consider each small-cardinality relation R, if for each `i` in `1..n`, 
there exists at least one predicate `p_{i, k_i}` that is applicable to R. Then 
we can construct a new predicate
```
P' = p_{1, k_1} OR ... OR p_{n, k_n}
```
and push down `P'` to be applied to R.
 
Also, if any conjunctive component in `P` contains more than one predicate 
that is applicable to R, then we can combine all these applicable predicates as 
a conjunctive component in `P'`.
 
Finally, note that if there exists a conjunctive component that contains no 
predicate applicable to R. Then the condition fails and we cannot do a push 
down for R.

This optimization improves the performance of TPC-H Q07 from ~17 seconds to 
~4 seconds, with SF100 on a cloudlab machine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
push-down-disjunctive-predicate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/174.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #174


commit 6a184c8ad981343dd50f04c76d3937bcdce34ddc
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-30T07:02:19Z

Push down low cost disjunctive predicates to filter the stored relations 
early




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98543218
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -504,123 +616,61 @@ void HashInnerJoinWorkOrder::execute() {
 // hash join is below a reasonable threshold so that we don't blow up
 // temporary memory requirements to an unreasonable degree.
 if (residual_predicate_ != nullptr) {
-  std::pair<std::vector, std::vector> 
filtered_matches;
+  PairOfVectors filtered_matches;
+
   for (std::size_t i = 0; i < build_tids.size(); ++i) {
 if (residual_predicate_->matchesForJoinedTuples(*build_accessor,
 build_relation_id,
 build_tids[i],
 *probe_accessor,
 probe_relation_id,
 probe_tids[i])) {
-  filtered_matches.first.push_back(build_tids[i]);
-  filtered_matches.second.push_back(probe_tids[i]);
+  filtered_matches.first.emplace_back(build_tids[i]);
+  filtered_matches.second.emplace_back(probe_tids[i]);
 }
   }
 
   build_block_entry.second = std::move(filtered_matches);
 }
 
-// TODO(chasseur): If all the output expressions are ScalarAttributes,
-// we could implement a similar fast-path to 
StorageBlock::selectSimple()
-// that avoids a copy.
-//
 // TODO(chasseur): See TODO in NestedLoopsJoinOperator.cpp about 
limiting
 // the size of materialized temporary results. In common usage, this
 // probably won't be an issue for hash-joins, but in the worst case a 
hash
 // join can still devolve into a cross-product.
-//
-// NOTE(chasseur): We could also create one big 
ColumnVectorsValueAccessor
-// and accumulate all the results across multiple block pairs into it
-// before inserting anything into output blocks, but this would require
-// some significant API extensions to the expressions system for a 
dubious
-// benefit (probably only a real performance win when there are very 
few
-// matching tuples in each individual inner block but very many inner
-// blocks with at least one match).
-
-// We now create ordered value accessors for both build and probe side,
-// using the joined tuple TIDs. Note that we have to use this 
Lambda-based
-// invocation method here because the accessors don't have a virtual
-// function that creates such an 
OrderedTupleIdSequenceAdapterValueAccessor.
-std::unique_ptr ordered_build_accessor, 
ordered_probe_accessor;
-InvokeOnValueAccessorNotAdapter(
-build_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_build_accessor.reset(
-  
accessor->createSharedOrderedTupleIdSequenceAdapter(build_tids));
-});
-
-if (probe_accessor->isTupleIdSequenceAdapter()) {
-  InvokeOnTupleIdSequenceAdapterValueAccessor(
-probe_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_probe_accessor.reset(
-
accessor->createSharedOrderedTupleIdSequenceAdapter(probe_tids));
-});
-} else {
-  InvokeOnValueAccessorNotAdapter(
-probe_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_probe_accessor.reset(
-
accessor->createSharedOrderedTupleIdSequenceAdapter(probe_tids));
-});
-}
-
 
 // We also need a temp value accessor to store results of any scalar 
expressions.
 ColumnVectorsValueAccessor temp_result;
+if (!non_trivial_expressions.empty()) {
+  // The getAllValuesForJoin function below needs joined tuple IDs as a
+  // vector of pair of (build-tuple-ID, probe-tuple-ID), and we have a 
pair
+  // of (build-tuple-IDs-vector, probe-tuple-IDs-vector). So we'll 
have to
+  // zip our two vectors together.
+  VectorOfPairs zipped_joined_tuple_ids;
+  zipped_joined_tuple_ids.reserve(build_tids.size());
+  for (std::size_t i = 0; i < build_tids.size(); ++i) {
+zipped_joined_tuple_ids.emplace_back(build_tids[i], probe_tids[i]);
+  }
 
-// Create a map of ValueAccessors and what attributes we want to pick 
from them
-std::vector<std::pair>> 
accessor_attribute_map;
-const std::vector accessors{
-   

[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98546082
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
+}
+
+if (new_children != input->children()) {
+  return input->copyWithNewChildren(new_children);
+} else {
+  return input;
+}
+  }
+
+  // Collect the maximal chain of transformable nodes.
+  std::vector nodes;
+  for (P::PhysicalPtr node = input; IsTransformable(node); node = 
node->children().front()) {
+nodes.emplace_back(node);
+  }
+  std::reverse(nodes.begin(), nodes.end());
+
+  // A greedy algorithm that reorders the output attributes based on the 
GEN/KILL
+  // intervals. This algorithm works well with SSB/TPCH queries and is not 
likely
+  // to make the plans worse for whatever queries.
+  std::unordered_map<E::ExprId, std::size_t> base, gen, kill;
+
+  const P::PhysicalPtr base_node =
+  applyInternal(nodes.front()->children().front(), false);
+  P::TableReferencePtr base_table;
+  if (P::SomeTableReference::MatchesWithConditionalCast(base_node, 
_table)) {
+  }
+
+  const std::vector base_attrs =
+  nodes.front()->children().front()->getOutputAttributes();
+  for (std::size_t i = 0; i < base_attrs.size(); ++i) {
+base.emplace(base_attrs[i]->id(), i);
+  }
+
+  for (std::size_t i = 0; i < nodes.size(); ++i) {
+for (const auto  : nodes[i]->getOutputAttributes()) {
+  const E::ExprId attr_id = attr->id();
+  if (gen.find(attr_id) == gen.end()) {
+gen.emplace(attr_id, i);
+  }
+  kill[attr_id] = i;
+}
+  }
+
+  const auto comparator = [, , ](const E::NamedExpressionPtr 
,
+   const E::Name

[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98543122
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -504,123 +616,61 @@ void HashInnerJoinWorkOrder::execute() {
 // hash join is below a reasonable threshold so that we don't blow up
 // temporary memory requirements to an unreasonable degree.
 if (residual_predicate_ != nullptr) {
-  std::pair<std::vector, std::vector> 
filtered_matches;
+  PairOfVectors filtered_matches;
+
   for (std::size_t i = 0; i < build_tids.size(); ++i) {
 if (residual_predicate_->matchesForJoinedTuples(*build_accessor,
 build_relation_id,
 build_tids[i],
 *probe_accessor,
 probe_relation_id,
 probe_tids[i])) {
-  filtered_matches.first.push_back(build_tids[i]);
-  filtered_matches.second.push_back(probe_tids[i]);
+  filtered_matches.first.emplace_back(build_tids[i]);
+  filtered_matches.second.emplace_back(probe_tids[i]);
 }
   }
 
   build_block_entry.second = std::move(filtered_matches);
 }
 
-// TODO(chasseur): If all the output expressions are ScalarAttributes,
-// we could implement a similar fast-path to 
StorageBlock::selectSimple()
-// that avoids a copy.
-//
 // TODO(chasseur): See TODO in NestedLoopsJoinOperator.cpp about 
limiting
 // the size of materialized temporary results. In common usage, this
 // probably won't be an issue for hash-joins, but in the worst case a 
hash
 // join can still devolve into a cross-product.
-//
-// NOTE(chasseur): We could also create one big 
ColumnVectorsValueAccessor
-// and accumulate all the results across multiple block pairs into it
-// before inserting anything into output blocks, but this would require
-// some significant API extensions to the expressions system for a 
dubious
-// benefit (probably only a real performance win when there are very 
few
-// matching tuples in each individual inner block but very many inner
-// blocks with at least one match).
-
-// We now create ordered value accessors for both build and probe side,
-// using the joined tuple TIDs. Note that we have to use this 
Lambda-based
-// invocation method here because the accessors don't have a virtual
-// function that creates such an 
OrderedTupleIdSequenceAdapterValueAccessor.
-std::unique_ptr ordered_build_accessor, 
ordered_probe_accessor;
-InvokeOnValueAccessorNotAdapter(
-build_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_build_accessor.reset(
-  
accessor->createSharedOrderedTupleIdSequenceAdapter(build_tids));
-});
-
-if (probe_accessor->isTupleIdSequenceAdapter()) {
-  InvokeOnTupleIdSequenceAdapterValueAccessor(
-probe_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_probe_accessor.reset(
-
accessor->createSharedOrderedTupleIdSequenceAdapter(probe_tids));
-});
-} else {
-  InvokeOnValueAccessorNotAdapter(
-probe_accessor.get(),
-[&](auto *accessor) -> void {  // NOLINT(build/c++11)
-  ordered_probe_accessor.reset(
-
accessor->createSharedOrderedTupleIdSequenceAdapter(probe_tids));
-});
-}
-
 
 // We also need a temp value accessor to store results of any scalar 
expressions.
 ColumnVectorsValueAccessor temp_result;
+if (!non_trivial_expressions.empty()) {
+  // The getAllValuesForJoin function below needs joined tuple IDs as a
+  // vector of pair of (build-tuple-ID, probe-tuple-ID), and we have a 
pair
+  // of (build-tuple-IDs-vector, probe-tuple-IDs-vector). So we'll 
have to
+  // zip our two vectors together.
+  VectorOfPairs zipped_joined_tuple_ids;
+  zipped_joined_tuple_ids.reserve(build_tids.size());
--- End diff --

`reserve` is for pre-allocating memory and avoids buffer reallocation 
during the loop.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at in

[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98544439
  
--- Diff: storage/ValueAccessor.hpp ---
@@ -305,6 +305,21 @@ class ValueAccessor {
   const TupleIdSequence _sequence) = 0;
 
   /**
+   * @brief Create a new OrderedTupleIdSequenceAdapterValueAccessor that 
wraps
+   *this ValueAccessor.
--- End diff --

`OrderedTupleIdSequenceAdapterValueAccessor` has its header comments at 
`ValueAccessor.hpp` line 548.
`OrderedTupleIdSequence` has its parameter comment below.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98544534
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -484,7 +565,38 @@ void HashInnerJoinWorkOrder::execute() {
   const relation_id build_relation_id = build_relation_.getID();
   const relation_id probe_relation_id = probe_relation_.getID();
 
-  for (std::pair, 
std::vector>>
+  // Create a map of ValueAccessors and what attributes we want to pick 
from them.
+  std::vector<std::pair>> 
accessor_attribute_map;
+  const std::size_t build_index = 0, probe_index = 1, temp_index = 2;
+  for (std::size_t i = 0; i < 3; ++i) {
+accessor_attribute_map.emplace_back(
+nullptr /* place holder */,
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98544852
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -484,7 +565,38 @@ void HashInnerJoinWorkOrder::execute() {
   const relation_id build_relation_id = build_relation_.getID();
   const relation_id probe_relation_id = probe_relation_.getID();
 
-  for (std::pair, 
std::vector>>
+  // Create a map of ValueAccessors and what attributes we want to pick 
from them.
+  std::vector<std::pair>> 
accessor_attribute_map;
+  const std::size_t build_index = 0, probe_index = 1, temp_index = 2;
+  for (std::size_t i = 0; i < 3; ++i) {
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98545487
  
--- Diff: query_optimizer/PhysicalGenerator.cpp ---
@@ -109,6 +114,13 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
   } else {
 rules.emplace_back(new SwapProbeBuild());
   }
+  if (FLAGS_reorder_columns) {
+// NOTE(jianqiao): This optimization relies on the fact that the 
intermediate
+// relations all have SPLIT_ROW_STORE layouts. If this fact gets 
changed, the
+// optimization algorithm may need to be updated and the performance 
impact
+// should be re-evaluated.
--- End diff --

Currently this cannot be checked because the temporary relations' layouts 
are generated in `ExecutionGenerator` when they get created.

In the future we will reason about the layouts during query optimization 
and add corresponding data structures.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98542466
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -63,6 +65,9 @@ namespace quickstep {
 
 namespace {
 
+typedef std::vector<std::pair<tuple_id, tuple_id>> VectorOfPairs;
+typedef std::pair<std::vector, std::vector> 
PairOfVectors;
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98543320
  
--- Diff: query_optimizer/rules/ReorderColumns.hpp ---
@@ -0,0 +1,75 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REORDER_COLUMNS_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REORDER_COLUMNS_HPP_
+
+#include 
+
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/rules/Rule.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+namespace optimizer {
+
+/** \addtogroup OptimizerRules
+ *  @{
+ */
+
+/**
+ * @brief Rule that applies to a physical plan to adjust the orderings of 
some
+ *intermediate nodes' output attributes to improve copy 
performance.
+ *
+ * @note This optimization is based on the fact that the intermediate 
relations
+ *   all have SPLIT_ROW_STORE layouts. If this fact gets changed, the 
rule's
+ *   algorithm may need to be updated and the performance impact 
should be
+ *   re-evaluated.
+ */
+class ReorderColumns : public Rule {
+ public:
+  /**
+   * @brief Constructor.
+   */
+  ReorderColumns() {}
+
+  ~ReorderColumns() override {}
+
+  std::string getName() const override {
+return "ReorderColumns";
+  }
+
+  physical::PhysicalPtr apply(const physical::PhysicalPtr ) override;
+
+ private:
+  physical::PhysicalPtr applyInternal(const physical::PhysicalPtr ,
+  bool lock_ordering);
+
+  // Whether the physical node can
+  inline static bool IsTransformable(const physical::PhysicalPtr );
+
+  DISALLOW_COPY_AND_ASSIGN(ReorderColumns);
+};
+
+/** @} */
+
+}  // namespace optimizer
+}  // namespace quickstep
+
+#endif /* QUICKSTEP_QUERY_OPTIMIZER_RULES_REORDER_COLUMNS_HPP_ */
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98541113
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
+}
+
+if (new_children != input->children()) {
+  return input->copyWithNewChildren(new_children);
+} else {
+  return input;
+}
+  }
+
+  // Collect the maximal chain of transformable nodes.
+  std::vector nodes;
+  for (P::PhysicalPtr node = input; IsTransformable(node); node = 
node->children().front()) {
+nodes.emplace_back(node);
+  }
+  std::reverse(nodes.begin(), nodes.end());
+
+  // A greedy algorithm that reorders the output attributes based on the 
GEN/KILL
+  // intervals. This algorithm works well with SSB/TPCH queries and is not 
likely
+  // to make the plans worse for whatever queries.
+  std::unordered_map<E::ExprId, std::size_t> base, gen, kill;
+
+  const P::PhysicalPtr base_node =
+  applyInternal(nodes.front()->children().front(), false);
+  P::TableReferencePtr base_table;
+  if (P::SomeTableReference::MatchesWithConditionalCast(base_node, 
_table)) {
+  }
--- End diff --

Removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98538859
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
+}
+
+if (new_children != input->children()) {
+  return input->copyWithNewChildren(new_children);
+} else {
+  return input;
+}
+  }
+
+  // Collect the maximal chain of transformable nodes.
+  std::vector nodes;
+  for (P::PhysicalPtr node = input; IsTransformable(node); node = 
node->children().front()) {
+nodes.emplace_back(node);
+  }
+  std::reverse(nodes.begin(), nodes.end());
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98541098
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
+}
+
+if (new_children != input->children()) {
+  return input->copyWithNewChildren(new_children);
+} else {
+  return input;
+}
+  }
+
+  // Collect the maximal chain of transformable nodes.
+  std::vector nodes;
+  for (P::PhysicalPtr node = input; IsTransformable(node); node = 
node->children().front()) {
+nodes.emplace_back(node);
+  }
+  std::reverse(nodes.begin(), nodes.end());
+
+  // A greedy algorithm that reorders the output attributes based on the 
GEN/KILL
+  // intervals. This algorithm works well with SSB/TPCH queries and is not 
likely
+  // to make the plans worse for whatever queries.
+  std::unordered_map<E::ExprId, std::size_t> base, gen, kill;
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98538359
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
--- End diff --

Updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-30 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/171#discussion_r98541504
  
--- Diff: query_optimizer/rules/ReorderColumns.cpp ---
@@ -0,0 +1,205 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "query_optimizer/rules/ReorderColumns.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/NamedExpression.hpp"
+#include "query_optimizer/physical/HashJoin.hpp"
+#include "query_optimizer/physical/PatternMatcher.hpp"
+#include "query_optimizer/physical/Physical.hpp"
+#include "query_optimizer/physical/PhysicalType.hpp"
+#include "query_optimizer/physical/Selection.hpp"
+#include "query_optimizer/physical/TableReference.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+namespace optimizer {
+
+namespace E = ::quickstep::optimizer::expressions;
+namespace P = ::quickstep::optimizer::physical;
+
+P::PhysicalPtr ReorderColumns::apply(const P::PhysicalPtr ) {
+  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
+
+  return applyInternal(input, true);
+}
+
+P::PhysicalPtr ReorderColumns::applyInternal(const P::PhysicalPtr ,
+ bool lock_ordering) {
+  // We have to guarantee that the top level ordering of the columns remain
+  // unchanged so that the output columns are ordered as specified by the 
user.
+  // So here we use the flag "lock_ordering" to skip the first 
transformable
+  // node (i.e. the first Selection or HashJoin).
+  const bool is_not_transformable = !IsTransformable(input);
+  const bool skip_transform = lock_ordering || is_not_transformable;
+  lock_ordering = lock_ordering && is_not_transformable;
+
+  if (skip_transform) {
+std::vector new_children;
+for (const P::PhysicalPtr  : input->children()) {
+  new_children.emplace_back(applyInternal(child, lock_ordering));
+}
+
+if (new_children != input->children()) {
+  return input->copyWithNewChildren(new_children);
+} else {
+  return input;
+}
+  }
+
+  // Collect the maximal chain of transformable nodes.
+  std::vector nodes;
+  for (P::PhysicalPtr node = input; IsTransformable(node); node = 
node->children().front()) {
+nodes.emplace_back(node);
+  }
+  std::reverse(nodes.begin(), nodes.end());
+
+  // A greedy algorithm that reorders the output attributes based on the 
GEN/KILL
+  // intervals. This algorithm works well with SSB/TPCH queries and is not 
likely
+  // to make the plans worse for whatever queries.
+  std::unordered_map<E::ExprId, std::size_t> base, gen, kill;
+
+  const P::PhysicalPtr base_node =
+  applyInternal(nodes.front()->children().front(), false);
+  P::TableReferencePtr base_table;
+  if (P::SomeTableReference::MatchesWithConditionalCast(base_node, 
_table)) {
+  }
+
+  const std::vector base_attrs =
+  nodes.front()->children().front()->getOutputAttributes();
+  for (std::size_t i = 0; i < base_attrs.size(); ++i) {
+base.emplace(base_attrs[i]->id(), i);
+  }
+
+  for (std::size_t i = 0; i < nodes.size(); ++i) {
+for (const auto  : nodes[i]->getOutputAttributes()) {
+  const E::ExprId attr_id = attr->id();
+  if (gen.find(attr_id) == gen.end()) {
+gen.emplace(attr_id, i);
+  }
+  kill[attr_id] = i;
+}
+  }
+
+  const auto comparator = [, , ](const E::NamedExpressionPtr 
,
+   const E::Name

[GitHub] incubator-quickstep pull request #170: Add unit test for CatalogRelationStat...

2017-01-29 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/170

Add unit test for CatalogRelationStatistics

This PR adds unit tests for CatalogRelationStatistics. It tests that the 
`\analyze` command can produce correct stats, and that the `is_exact_` flag be 
set properly with insert / update / delete / analyze operations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep exact-stat-unittest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #170


commit d2328293578075d6214f508c7f6a4bab936a0d1e
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-24T02:54:51Z

Add unit test for CatalogRelationStatistics




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-29 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/171

QUICKSTEP-68 Reorder intermediate relations'  attributes to improve copy 
performance.

Currently, all the intermediate relations (i.e. temporary relations created 
during query execution) have `SPLIT_ROW_STORE` layouts. One existing 
optimization that improves copy performance for `SPLIT_ROW_STORE` layout is 
that, if two (or more) attributes are consecutive in both the source relation 
and the destination relation, then we can memory-copy the attributes' byte 
stream with just one `std::memcpy` call.

This PR is a complementary improvement that it adjusts the ordering of the 
attributes in the physical plan to maximize the opportunity of copying 
consecutive attributes – i.e. to minimize the number of `std::memcpy` calls.

This PR uses a simple greedy algorithm to reorder the attributes, which 
works well with SSB/TPCH workloads. The algorithm may be further improved later.

The feature can be turned on/off (default is **on**) with the 
`-reorder_columns` flag.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep reorder-attrs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit 2c1c0905b18e7ecd018a3a05b9db03fdf987720b
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-13T00:41:17Z

Reorder output attribute order to improve copy performance.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #170: Add unit test for CatalogRelationStatistics

2017-01-29 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/170
  
Assigned to @zuyu.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-29 Thread jianqiao
Github user jianqiao closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/171


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #172: Query optimization with ExactFilter

2017-01-29 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/172

Query optimization with ExactFilter

This is a follow-up optimization based on the facility provided by 
LIPFilters. Note that LIP (lookahead information passing) is an optimization 
that we can inject efficient filters (e.g. bloom filters) into 
Select/HashJoin/Aggregate operators to pre-filter the input relations.

This PR strength-reduces `HashJoin`s (including inner/semi/anti joins) into 
`FilterJoin`s. The semantics of a `FilterJoin` is simple: if certain conditions 
are met, we can build a bit vector from the build side and use the bit vector 
to _filter_ the probe side.

The execution part is slightly more optimized: a `FilterJoin` will not 
always be converted into a `SelectOperator` plus a `LIPFilter` as its semantics 
indicates. Instead, in most situations we can avoid creating the 
`SelectOperator` by attaching the `LIPFilter` properly to some downstream 
operators – thus avoid unnecessary materialization of intermediate relations.


Below shows the performance improvement for SSB scale factor 100 on a 
cloudlab machine:

**SSB SF100**|**master (ms)**|**w/ ExactFilter (ms)**
:-:|:-:|:-:
Q01|709|574
Q02|648|593
Q03|605|564
Q04|906|675
Q05|754|457
Q06|498|549
Q07|1687|1696
Q08|598|591
Q09|481|470
Q10|450|442
Q11|1208|882
Q12|876|656
Q13|515|475
Total|9937|8625

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep exact-filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #172






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #171: QUICKSTEP-68 Reorder intermediate rel...

2017-01-29 Thread jianqiao
GitHub user jianqiao reopened a pull request:

https://github.com/apache/incubator-quickstep/pull/171

QUICKSTEP-68 Reorder intermediate relations'  attributes to improve copy 
performance.

Currently, all the intermediate relations (i.e. temporary relations created 
during query execution) have `SPLIT_ROW_STORE` layouts. One existing 
optimization that improves copy performance for `SPLIT_ROW_STORE` layout is 
that, if two (or more) attributes are consecutive in both the source relation 
and the destination relation, then we can memory-copy the attributes' byte 
stream with just one `std::memcpy` call.

This PR is a complementary improvement that it adjusts the ordering of the 
attributes in the physical plan to maximize the opportunity of copying 
consecutive attributes – i.e. to minimize the number of `std::memcpy` calls.

This PR uses a simple greedy algorithm to reorder the attributes, which 
works well with SSB/TPCH workloads. The algorithm may be further improved later.

The feature can be turned on/off (default is **on**) with the 
`-reorder_columns` flag.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep reorder-attrs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit 62b1742ef29bc8dea19d08fac6e0e04f45dc3f8a
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-13T00:41:17Z

Reorder output attribute order to improve copy performance.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #170: Add unit test for CatalogRelationStat...

2017-01-29 Thread jianqiao
Github user jianqiao closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/170


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #170: Add unit test for CatalogRelationStatistics

2017-01-29 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/170
  
Updated. Now closing and reopening the PR to restart travis.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #170: Add unit test for CatalogRelationStat...

2017-01-29 Thread jianqiao
GitHub user jianqiao reopened a pull request:

https://github.com/apache/incubator-quickstep/pull/170

Add unit test for CatalogRelationStatistics

This PR adds unit tests for CatalogRelationStatistics. It tests that the 
`\analyze` command can produce correct stats, and that the `is_exact_` flag be 
set properly with insert / update / delete / analyze operations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep exact-stat-unittest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #170


commit 258661fc249da2abf92572f5a1875d897b545568
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-24T02:54:51Z

Add unit test for CatalogRelationStatistics




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #165: Added Operator support for Partitione...

2017-01-20 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/165#discussion_r97168067
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -207,49 +207,43 @@ bool HashJoinOperator::getAllNonOuterJoinWorkOrders(
 query_context->getScalarGroup(selection_index_);
 InsertDestination *output_destination =
 query_context->getInsertDestination(output_destination_index_);
-const JoinHashTable _table =
-*(query_context->getJoinHashTable(hash_table_index_));
 
 if (probe_relation_is_stored_) {
-  if (!started_) {
-for (const block_id probe_block_id : probe_relation_block_ids_) {
+  if (started_) {
+return true;
+  }
+
+  for (std::size_t part_id = 0; part_id < num_partitions_; ++part_id) {
+const JoinHashTable _table =
+*(query_context->getJoinHashTable(hash_table_index_, part_id));
+
+for (const block_id probe_block_id : 
probe_relation_block_ids_[part_id]) {
   container->addNormalWorkOrder(
-  new JoinWorkOrderClass(query_id_,
- build_relation_,
- probe_relation_,
- join_key_attributes_,
- any_join_key_attributes_nullable_,
- probe_block_id,
- residual_predicate,
- selection,
- hash_table,
- output_destination,
- storage_manager,
+  new JoinWorkOrderClass(query_id_, build_relation_, 
probe_relation_, join_key_attributes_,
+ any_join_key_attributes_nullable_, 
num_partitions_, part_id, probe_block_id,
+ residual_predicate, selection, 
hash_table, output_destination, storage_manager,
  
CreateLIPFilterAdaptiveProberHelper(lip_deployment_index_, query_context)),
   op_index_);
 }
-started_ = true;
   }
-  return started_;
+  started_ = true;
+  return true;
 } else {
-  while (num_workorders_generated_ < probe_relation_block_ids_.size()) 
{
-container->addNormalWorkOrder(
-new JoinWorkOrderClass(
-query_id_,
-build_relation_,
-probe_relation_,
-join_key_attributes_,
-any_join_key_attributes_nullable_,
-probe_relation_block_ids_[num_workorders_generated_],
-residual_predicate,
-selection,
-hash_table,
-output_destination,
-storage_manager,
-CreateLIPFilterAdaptiveProberHelper(lip_deployment_index_, 
query_context)),
-op_index_);
-++num_workorders_generated_;
-  }  // end while
+  for (std::size_t part_id = 0; part_id < num_partitions_; ++part_id) {
--- End diff --

How many partitions would we expect to have in typical situations?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #165: Added Operator support for Partitione...

2017-01-20 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/165#discussion_r97169709
  
--- Diff: relational_operators/HashJoinOperator.cpp ---
@@ -207,49 +207,43 @@ bool HashJoinOperator::getAllNonOuterJoinWorkOrders(
 query_context->getScalarGroup(selection_index_);
 InsertDestination *output_destination =
 query_context->getInsertDestination(output_destination_index_);
-const JoinHashTable _table =
-*(query_context->getJoinHashTable(hash_table_index_));
 
 if (probe_relation_is_stored_) {
-  if (!started_) {
-for (const block_id probe_block_id : probe_relation_block_ids_) {
+  if (started_) {
+return true;
+  }
+
+  for (std::size_t part_id = 0; part_id < num_partitions_; ++part_id) {
+const JoinHashTable _table =
+*(query_context->getJoinHashTable(hash_table_index_, part_id));
+
+for (const block_id probe_block_id : 
probe_relation_block_ids_[part_id]) {
   container->addNormalWorkOrder(
-  new JoinWorkOrderClass(query_id_,
- build_relation_,
- probe_relation_,
- join_key_attributes_,
- any_join_key_attributes_nullable_,
- probe_block_id,
- residual_predicate,
- selection,
- hash_table,
- output_destination,
- storage_manager,
+  new JoinWorkOrderClass(query_id_, build_relation_, 
probe_relation_, join_key_attributes_,
+ any_join_key_attributes_nullable_, 
num_partitions_, part_id, probe_block_id,
+ residual_predicate, selection, 
hash_table, output_destination, storage_manager,
  
CreateLIPFilterAdaptiveProberHelper(lip_deployment_index_, query_context)),
   op_index_);
 }
-started_ = true;
   }
-  return started_;
+  started_ = true;
+  return true;
 } else {
-  while (num_workorders_generated_ < probe_relation_block_ids_.size()) 
{
-container->addNormalWorkOrder(
-new JoinWorkOrderClass(
-query_id_,
-build_relation_,
-probe_relation_,
-join_key_attributes_,
-any_join_key_attributes_nullable_,
-probe_relation_block_ids_[num_workorders_generated_],
-residual_predicate,
-selection,
-hash_table,
-output_destination,
-storage_manager,
-CreateLIPFilterAdaptiveProberHelper(lip_deployment_index_, 
query_context)),
-op_index_);
-++num_workorders_generated_;
-  }  // end while
+  for (std::size_t part_id = 0; part_id < num_partitions_; ++part_id) {
--- End diff --

I see. It should be good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #165: Added Operator support for Partitione...

2017-01-20 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/165#discussion_r97165573
  
--- Diff: storage/StorageBlockInfo.hpp ---
@@ -49,6 +50,8 @@ static constexpr int kBlockIdDomainLengthInDigits = 
std::numeric_limits::digits10;
 static constexpr block_id_domain kMaxDomain = 
static_cast(0x);
 
+typedef std::vector blocks_in_partition;
--- End diff --

`block_in_partition` is not a primitive type. Perhaps use 
`BlockInPartition`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #167: Added test for Partitioned Hash Join.

2017-01-28 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/167
  
LGTM. Merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #190: Use BitVectorExactFilter as LIPFilter...

2017-02-22 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/190

Use BitVectorExactFilter as LIPFilter implementation when applicable

In the previous exact-filter PR, it was not by default use 
`BitVectorExactFilter` as the LIPFilter implementation when applicable -- 
`BitVectorExactFilter` was only used for executing `FilterJoin` nodes.

This PR is a quick update that uses the `BitVectorExactFilter` when 
applicable.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep use-exact-filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/190.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #190


commit 2b2d7ba1970ade47b1170cd7974cb2fc53f7ba71
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-02-22T20:06:55Z

Use BitVector as LIPFilter implementation when applicable




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #191: Fix the problem in the "Fuse Aggregat...

2017-02-24 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/191

Fix the problem in the "Fuse Aggregate with HashLeftOuterJoin" PR.

This PR addresses the bugs as pointed out in #185 as well as a few style 
issues.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep fix-fuse-agg-hash

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #191


commit a300c0420e1ff586b0cc6ce1eb2ecaafbc2eb8b3
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-02-24T16:46:06Z

Fix the problem with CrossReferenceCoalesceAggregate.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #192: Use partitioned aggregation for singl...

2017-02-24 Thread jianqiao
Github user jianqiao closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/192


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #193: Use partitioned aggregation for singl...

2017-02-24 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/193

Use partitioned aggregation for single-function DISTINCT aggregation.

This PR uses the partitioned aggregation routine for parallelizing DISTINCT 
aggregation. It improves the performance of TPC-H Q16 from ~10s to ~2s with 
scale factor 100 on CloudLab machine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
parallel-distinct-agg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #189: patches for missed linenoise changes

2017-02-21 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/189
  
LGTM. Thanks @cramja for the fix!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #185: Fuse Aggregate with HashLeftOuterJoin...

2017-02-08 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/185

Fuse Aggregate with HashLeftOuterJoin to accelerate evaluation

This PR fuses `Aggregate` with `HashJoin` (left outer) into the 
`CrossReferenceCoalesceAggregate` node to enable fast-path evaluation for a 
class of queries.

Here we briefly describe the semantics of 
`CrossReferenceCoalesceAggregate`: Let `L` be a table with PRIMARY KEY `u`. Let 
`R` be a table with FOREIGN KEY `x` referring to `L(u)`. Then 
`CrossReferenceCoalesceAggregate` represents a common class of analytical 
queries that
> For each u in L, COUNT/SUM the records in R that correspond to u (i.e. 
those records satisfying R.x = L.u). In the case that there is no record for u 
in R, use 0 as the result value.

That is, `CrossReferenceCoalesceAggregate` represents queries in the 
following forms:
```
-- Form 1 --
SELECT u, COALESCE(count, 0), COALESCE(sum, 0)
FROM L LEFT OUTER JOIN (
  SELECT x, COUNT(*) AS count, SUM(y) AS sum
  FROM R
  GROUP BY x) Rt ON L.u = Rt.x;
```
or (COUNT only) 
```
-- Form 2 --
SELECT u, COUNT(x) AS count
FROM L LEFT OUTER JOIN R ON L.u = R.x
GROUP BY u;
```
where `L`, `R`, `u`, `v` and `SUM(...)/COUNT(...)` are arguments to 
`CrossReferenceCoalesceAggregate`.

The fast-path evaluation is that, we first use `L.u` to build the 
`existence_map` for a `CollisionFreeVectorTable` which is then used to perform 
aggregation on `R` grouped by `R.x`. At the time of aggregation finalization, 
any `existence_map` entry that is set by `L.u` but not updated with `R`'s 
records will have aggregation result as `0` by default -- this implicitly 
achieves the left outer join + coalesce 0 semantics.

This optimization improves TPC-H Q13's performance from ~15.5s to ~3s on a 
cloudlab machine with scale factor 100.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
aggregate-on-left-outer-join

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #185


commit a28b1e4d77ee12466b0801a5a7c5185f7a83e7f8
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-01-30T20:46:39Z

Fuse Aggregate with LeftOuterJoin to accelerate evaluation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #163: Marked LIP as a non-default argument.

2017-01-17 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/163
  
LGTM. Is there some special reason for doing this change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #160: Added optimizer support regarding hash parti...

2017-01-17 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/160
  
LGTM except some minor style issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #160: Added optimizer support regarding has...

2017-01-17 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/160#discussion_r96450595
  
--- Diff: query_optimizer/physical/CreateTable.hpp ---
@@ -107,8 +115,9 @@ class CreateTable : public Physical {
   static CreateTablePtr Create(
   const std::string _name,
   const std::vector ,
-  const std::shared_ptr 
_properties) {
-return CreateTablePtr(new CreateTable(relation_name, attributes, 
block_properties));
+  const std::shared_ptr 
_properties,
+  const std::shared_ptr 
_scheme_header_proto) {
+return CreateTablePtr(new CreateTable(relation_name, attributes, 
block_properties, partition_scheme_header_proto));
--- End diff --

Add a comment for `partition_scheme_header_proto` in method header.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #160: Added optimizer support regarding has...

2017-01-17 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/160#discussion_r96445324
  
--- Diff: query_optimizer/resolver/Resolver.cpp ---
@@ -478,9 +484,32 @@ L::LogicalPtr Resolver::resolveCreateTable(
   std::shared_ptr
   block_properties(resolveBlockProperties(create_table_statement));
 
-  return L::CreateTable::Create(relation_name, attributes, 
block_properties);
+  std::shared_ptr 
partition_scheme_header_proto(
+  resolvePartitionClause(create_table_statement));
+
+  return L::CreateTable::Create(relation_name, attributes, 
block_properties, partition_scheme_header_proto);
+}
+
+namespace {
+
+attribute_id GetAttributeIdFromName(const 
PtrList_definition_list,
--- End diff --

Need a space between type and parameter here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #160: Added optimizer support regarding has...

2017-01-17 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/160#discussion_r96449905
  
--- Diff: query_optimizer/logical/CreateTable.hpp ---
@@ -100,8 +108,9 @@ class CreateTable : public Logical {
   static CreateTablePtr Create(
   const std::string _name,
   const std::vector ,
-  const std::shared_ptr 
_properties) {
-return CreateTablePtr(new CreateTable(relation_name, attributes, 
block_properties));
+  const std::shared_ptr 
_properties,
+  const std::shared_ptr 
_scheme_header_proto) {
--- End diff --

Add a comment for `partition_scheme_header_proto` in method header (and 
mention that `partition_scheme_header_proto` can be `nullptr` if not specified).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #198: patch to fix gcc compile error gflags

2017-02-28 Thread jianqiao
Github user jianqiao commented on the issue:

https://github.com/apache/incubator-quickstep/pull/198
  
Great! Thanks Marc @cramja.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] incubator-quickstep-0.1.0RC5

2017-03-01 Thread Jianqiao
+1. I'm good with the release.

Best,
Jianqiao

2017-03-01 17:21 GMT-06:00 HAKAN MEMISOGLU <memiso...@cs.wisc.edu>:

> Hi Marc,
>
> +1 from me. It works with Docker.
>
> However I also get an error when I try to build it with Mac.
> The linker cannot find some symbols that protoc uses.
>
> > On Mar 1, 2017, at 4:30 PM, Marc Spehlmann <spehl.apa...@gmail.com>
> wrote:
> >
> > Jianqiao and I worked today to merge some PRs that we thought should go
> in
> > the first release. Accordingly we have created another candidate.
> >
> > As per Apache, *we require 3 +1 votes* from project members to make this
> an
> > official release.
> >
> > Before voting, please test the release (build, run ctest). For a guide on
> > how to test, GOTO release/README.md
> >
> > This vote will remain open for 72 hours or until we find issues that we
> can
> > quickly correct and create another candidate.
> >
> > more details on
> >
> > https://cwiki.apache.org/confluence/display/QUICKSTEP/How+To+Release
> >
> > --Marc
>
>


[GitHub] incubator-quickstep pull request #196: Fix gcc build error with PackedPayloa...

2017-02-27 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/196

Fix gcc build error with PackedPayloadHashTable 

This PR fixes the gcc build error (missing `this` pointer for calling a 
method in a capture-all-by-reference lambda function).

Note: Modified `travis.yml` for testing, do not merge yet.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep fix-gcc-build

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit fa9d5bba0b200f15b3f177a4efb02843e2d31522
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-02-27T20:45:50Z

Fix PackedPayloadHashTable for gcc build

commit 77b9864ac5e878589fa5334918487bf61e219610
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2017-02-27T20:46:45Z

Simplify build for debugging




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #106: Improve StarSchemaSimpleCostModel to ...

2016-10-04 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/106

Improve StarSchemaSimpleCostModel to provide better group cardinality 
estimation for aggregations.

This PR revises `StarSchemaSimpleCostModel` to provide better group 
cardinality estimation for aggregations. This information will be used by a 
subsequent PR from Harshad to decide when to use _partitioned hash tables_.

The estimation relies on the _number of distinct values_ statistic of each 
group-by attribute, which can be gathered by running `\analyze` command on the 
corresponding table.

This PR does not affect the query plans for SSB queries. It changes some of 
the TPC-H query plans (and a subsequent `LIPFilter` PR will bring more changes 
to TPC-H query plans).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/incubator-quickstep 
estimate-num-distinct-values

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #106


commit ad1f8c406954778d332f02b680e08e0118727436
Author: Jianqiao Zhu <jianq...@cs.wisc.edu>
Date:   2016-10-04T04:58:30Z

Improve StarSchemaSimpleCostModel to provide closer estimation of number of 
groups for hash aggregation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


  1   2   3   >