[GitHub] incubator-quickstep pull request #240: Fix the issues with the common subexp...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/240


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/237#discussion_r113071179
  
--- Diff: query_optimizer/expressions/Scalar.hpp ---
@@ -65,10 +67,49 @@ class Scalar : public Expression {
   const std::unordered_map& 
substitution_map)
   const = 0;
 
+  /**
+   * @brief Check whether this scalar is semantically equivalent to \p 
other.
+   *
+   * @note The fact that two scalars are semantically equal brings more
+   *   optimization opportunities, e.g. common subexpression 
elimination.
+   *   Meanwhile, it is always safe to assume that two scalars are not 
equal.
+   *
+   * @return True if this scalar equals \p other; false otherwise.
+   */
+  virtual bool equals(const ScalarPtr ) const {
+return false;
+  }
+
+  /**
+   * @brief Get a hash of this scalar.
+   *
+   * @return A hash of this scalar.
+   */
+  std::size_t hash() const {
+if (hash_cache_ == nullptr) {
+  hash_cache_ = std::make_unique(computeHash());
+}
+return *hash_cache_;
--- End diff --

This will be used by all `Scalar` expressions during the optimization 
phase, right?

Consider that only `0` will be hashed to `0`, I think it is ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070938
  
--- Diff: query_optimizer/expressions/Scalar.hpp ---
@@ -65,10 +67,49 @@ class Scalar : public Expression {
   const std::unordered_map& 
substitution_map)
   const = 0;
 
+  /**
+   * @brief Check whether this scalar is semantically equivalent to \p 
other.
+   *
+   * @note The fact that two scalars are semantically equal brings more
+   *   optimization opportunities, e.g. common subexpression 
elimination.
+   *   Meanwhile, it is always safe to assume that two scalars are not 
equal.
+   *
+   * @return True if this scalar equals \p other; false otherwise.
+   */
+  virtual bool equals(const ScalarPtr ) const {
+return false;
+  }
+
+  /**
+   * @brief Get a hash of this scalar.
+   *
+   * @return A hash of this scalar.
+   */
+  std::size_t hash() const {
+if (hash_cache_ == nullptr) {
+  hash_cache_ = std::make_unique(computeHash());
+}
+return *hash_cache_;
--- End diff --

Also `std::hash` returns `std::size_t`, so it may be more consistent to use 
`std::size_t` as hash value type here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070668
  
--- Diff: query_optimizer/expressions/Scalar.hpp ---
@@ -65,10 +67,49 @@ class Scalar : public Expression {
   const std::unordered_map& 
substitution_map)
   const = 0;
 
+  /**
+   * @brief Check whether this scalar is semantically equivalent to \p 
other.
+   *
+   * @note The fact that two scalars are semantically equal brings more
+   *   optimization opportunities, e.g. common subexpression 
elimination.
+   *   Meanwhile, it is always safe to assume that two scalars are not 
equal.
+   *
+   * @return True if this scalar equals \p other; false otherwise.
+   */
+  virtual bool equals(const ScalarPtr ) const {
+return false;
+  }
+
+  /**
+   * @brief Get a hash of this scalar.
+   *
+   * @return A hash of this scalar.
+   */
+  std::size_t hash() const {
+if (hash_cache_ == nullptr) {
+  hash_cache_ = std::make_unique(computeHash());
+}
+return *hash_cache_;
--- End diff --

There is no performance consideration here and the pointer makes it clear 
for the "null" case, since `0` itself can be a valid hash value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070383
  
--- Diff: query_optimizer/expressions/CommonSubexpression.hpp ---
@@ -0,0 +1,141 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
+#define QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
+
+#include 
+#include 
+#include 
+#include 
+
+#include "query_optimizer/expressions/AttributeReference.hpp"
+#include "query_optimizer/expressions/ExprId.hpp"
+#include "query_optimizer/expressions/Expression.hpp"
+#include "query_optimizer/expressions/ExpressionType.hpp"
+#include "query_optimizer/expressions/Scalar.hpp"
+#include "utility/Macros.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+class Scalar;
+class Type;
+
+namespace optimizer {
+namespace expressions {
+
+/** \addtogroup OptimizerExpressions
+ *  @{
+ */
+
+class CommonSubexpression;
+typedef std::shared_ptr CommonSubexpressionPtr;
+
+/**
+ * @brief A wrapper expression that represents a common subexpression.
+ */
+class CommonSubexpression : public Scalar {
+ public:
+  ExpressionType getExpressionType() const override {
--- End diff --

We may or may not add an `override` destructor since it is a no-op. 
Considering that all other subclasses of `Scalar` do not have override 
destructors, we can just omit it for consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/237#discussion_r113066931
  
--- Diff: expressions/scalar/ScalarCaseExpression.cpp ---
@@ -420,15 +429,15 @@ void 
ScalarCaseExpression::MultiplexNativeColumnVector(
 void ScalarCaseExpression::MultiplexIndirectColumnVector(
 const TupleIdSequence *source_sequence,
 const TupleIdSequence _matches,
-IndirectColumnVector *case_result,
+const IndirectColumnVector _result,
 IndirectColumnVector *output) {
   if (source_sequence == nullptr) {
 TupleIdSequence::const_iterator output_pos_it = case_matches.begin();
 for (std::size_t input_pos = 0;
- input_pos < case_result->size();
+ input_pos < case_result.size();
  ++input_pos, ++output_pos_it) {
   output->positionalWriteTypedValue(*output_pos_it,
-
case_result->moveTypedValue(input_pos));
--- End diff --

`moveTypedValue ` still makes sense as a public method that might be used 
in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #240: Fix the issues with the common subexp...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/240#discussion_r113066825
  
--- Diff: query_optimizer/rules/ExtractCommonSubexpression.hpp ---
@@ -111,7 +111,7 @@ class ExtractCommonSubexpression : public 
Rule {
   bool visitAndCount(
   const expressions::ExpressionPtr ,
   ScalarCounter *counter,
-  ScalarHashable *hashable);
+  ScalarHashable *hashable) const;
--- End diff --

Good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #240: Fix the issues with the common subexp...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/240#discussion_r113066280
  
--- Diff: query_optimizer/rules/ExtractCommonSubexpression.hpp ---
@@ -111,7 +111,7 @@ class ExtractCommonSubexpression : public 
Rule {
   bool visitAndCount(
   const expressions::ExpressionPtr ,
   ScalarCounter *counter,
-  ScalarHashable *hashable);
+  ScalarHashable *hashable) const;
--- End diff --

These three methods involve the invoking of 
`optimizer_context_->nextExprId()` which modifies  the context (symbol id 
counter) of `optimizer_context_`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/239


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #232: QUICKSTEP-87 Adds network cli interface.

2017-04-24 Thread zuyu
Github user zuyu commented on the issue:

https://github.com/apache/incubator-quickstep/pull/232
  
I mean, if this class is always a base class, move the constructor in
protected session.
On Mon, Apr 24, 2017 at 12:58 PM Marc S  wrote:

> *@cramja* commented on this pull request.
> --
>
> In cli/IOInterface.hpp
> 

> :
>
> > + * specific language governing permissions and limitations
> + * under the License.
> + **/
> +
> +#ifndef QUICKSTEP_CLI_IO_INTERFACE_HPP_
> +#define QUICKSTEP_CLI_IO_INTERFACE_HPP_
> +
> +#include 
> +
> +/**
> + * Virtual base defines a generic, file-based interface around IO.
> + */
> +class IOInterface {
> + public:
> +  IOInterface() {}
> +
>
> I removed the constructor but ran into a compiler error:
>
> 
/Users/cramja/workspace/quickstep/incubator-quickstep/cli/NetworkIO.cpp:50:12: 
error: constructor for 'quickstep::NetworkIO' must explicitly initialize the 
base class 'quickstep::IOInterface' which does not have a default constructor
> NetworkIO::NetworkIO() {
>^
> 
/Users/cramja/workspace/quickstep/incubator-quickstep/cli/IOInterface.hpp:62:7: 
note: 'quickstep::IOInterface' declared here
> class IOInterface {
>   ^
>
> On doing the suggestion in the output, that too gave a compiler error.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> 
,
> or mute the thread
> 

> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #240: Fix the issues with the common subexp...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/240#discussion_r113044397
  
--- Diff: query_optimizer/rules/ExtractCommonSubexpression.hpp ---
@@ -111,7 +111,7 @@ class ExtractCommonSubexpression : public 
Rule {
   bool visitAndCount(
   const expressions::ExpressionPtr ,
   ScalarCounter *counter,
-  ScalarHashable *hashable);
+  ScalarHashable *hashable) const;
--- End diff --

All methods below in this file could be marked as `const`:

* `transformExpressions`
* `transformExpression`
* `visitAndTransform`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #240: Fix the issues with the common subexpression...

2017-04-24 Thread zuyu
Github user zuyu commented on the issue:

https://github.com/apache/incubator-quickstep/pull/240
  
@jianqiao We missed the following comments in this PR:

* [Remove unused `moveTypedValue` method in 
`types/containers/ColumnVector.hpp`](https://github.com/apache/incubator-quickstep/pull/237#discussion_r112827764)
* [Change `hash_cache_` as a `std::uint64_t`, instead of a smart 
pointer](https://github.com/apache/incubator-quickstep/pull/237#discussion_r112828399)
* [`override` destructor needed for 
`CommonSubexpression`](https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847577)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #232: QUICKSTEP-87 Adds network cli interfa...

2017-04-24 Thread cramja
Github user cramja commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/232#discussion_r113040305
  
--- Diff: CMakeLists.txt ---
@@ -145,6 +145,7 @@ if (ENABLE_VECTOR_PREDICATE_SHORT_CIRCUIT)
   )
 endif()
 
+option(ENABLE_NETWORK_CLI "Allows use of the network cli" ON)
--- End diff --

That makes sense. I will change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #240: Fix the issues with the common subexp...

2017-04-24 Thread cramja
Github user cramja commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/240#discussion_r113038771
  
--- Diff: query_optimizer/rules/ReuseAggregateExpressions.cpp ---
@@ -194,7 +194,7 @@ P::PhysicalPtr ReuseAggregateExpressions::applyToNode(
 sum_it == ref_map.end() ? kInvalidRef : sum_it->second.front();
 
 for (const std::size_t idx : avg_it->second) {
-  agg_refs[idx].reset(new AggregateReference(sum_ref, count_ref));
+  agg_refs[idx] = std::make_unique(sum_ref, 
count_ref);
--- End diff --

What is the difference between using the assignment and `make_unique` 
versus `reset` and `new`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep issue #239: Add ThreadPrivateCompactKeyHashTable for agg...

2017-04-24 Thread hbdeshmukh
Github user hbdeshmukh commented on the issue:

https://github.com/apache/incubator-quickstep/pull/239
  
Looks good to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread hbdeshmukh
Github user hbdeshmukh commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r113015464
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -715,7 +719,18 @@ void AggregationOperationState::finalizeHashTable(
 finalizeHashTableImplPartitioned(partition_id, output_destination);
   } else {
 DCHECK_EQ(0u, partition_id);
-finalizeHashTableImplThreadPrivate(output_destination);
+DCHECK(group_by_hashtable_pool_ != nullptr);
+switch (group_by_hashtable_pool_->getHashTableImplType()) {
+  case HashTableImplType::kSeparateChaining:
--- End diff --

It's better if we are consistent in naming the HashTableImplType and the 
corresponding finalize hash table function name. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112991169
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "storage/ThreadPrivateCompactKeyHashTable.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "expressions/aggregation/AggregationHandle.hpp"
+#include "expressions/aggregation/AggregationID.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageBlockInfo.hpp"
+#include "storage/StorageManager.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "types/Type.hpp"
+#include "types/TypeID.hpp"
+#include "types/containers/ColumnVectorsValueAccessor.hpp"
+#include "utility/ScopedBuffer.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+namespace {
+
+#define CASE_KEY_SIZE(value) \
+  case value: return functor(std::integral_constant())
+
+template 
+auto InvokeOnKeySize(const std::size_t key_size, const FunctorT ) {
+  switch (key_size) {
+CASE_KEY_SIZE(1);
+CASE_KEY_SIZE(2);
+CASE_KEY_SIZE(3);
+CASE_KEY_SIZE(4);
+CASE_KEY_SIZE(5);
+CASE_KEY_SIZE(6);
+CASE_KEY_SIZE(7);
+CASE_KEY_SIZE(8);
+default:
+  break;
+  }
+  LOG(FATAL) << "Unexpected key size: " << key_size;
+}
+
+#undef CASE_KEY_SIZE
+
+}  // namespace
+
+constexpr std::size_t ThreadPrivateCompactKeyHashTable::kKeyCodeSize;
+
+ThreadPrivateCompactKeyHashTable::ThreadPrivateCompactKeyHashTable(
+const std::vector _types,
+const std::size_t num_entries,
+const std::vector ,
+StorageManager *storage_manager)
+: key_types_(key_types),
+  handles_(handles),
+  total_state_size_(0),
+  num_buckets_(0),
+  buckets_allocated_(0),
+  storage_manager_(storage_manager) {
+  // Cache key sizes.
+  for (const Type *key_type : key_types) {
+DCHECK(!key_type->isVariableLength());
+DCHECK(!key_type->isNullable());
+key_sizes_.emplace_back(key_type->maximumByteLength());
+  }
+
+  for (const AggregationHandle *handle : handles) {
+const std::vector arg_types = handle->getArgumentTypes();
+DCHECK_LE(arg_types.size(), 1u);
+DCHECK(arg_types.empty() || !arg_types.front()->isNullable());
+
+// Figure out state size.
+std::size_t state_size = 0;
+switch (handle->getAggregationID()) {
+  case AggregationID::kCount: {
+state_size = sizeof(std::int64_t);
+break;
+  }
+  case AggregationID::kSum: {
+DCHECK_EQ(1u, arg_types.size());
+switch (arg_types.front()->getTypeID()) {
+  case TypeID::kInt:  // Fall through
+  case TypeID::kLong:
+state_size = sizeof(std::int64_t);
+break;
+  case TypeID::kFloat:  // Fall through
+  case TypeID::kDouble:
+state_size = sizeof(double);
+break;
+  default:
+LOG(FATAL) << "Unexpected argument type";
+}
+break;
+  }
+  default:
+LOG(FATAL) << "Unexpected AggregationID";
+}
+state_sizes_.emplace_back(state_size);
+total_state_size_ += state_size;
+  }
+
+  // Calculate required memory size for keys and states.
+  const std::size_t required_memory =
+  num_entries * (kKeyCodeSize + total_state_size_);
+  const std::size_t num_storage_slots =
+  storage_manager_->SlotsNeededForBytes(required_memory);
+
+  // Use storage manager to allocate memory.
+  const block_id blob_id = storage_manager->createBlob(num_storage_slots);
+  blob_ = storage_manager->getBlobMutable(blob_id);

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112884563
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "storage/ThreadPrivateCompactKeyHashTable.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "expressions/aggregation/AggregationHandle.hpp"
+#include "expressions/aggregation/AggregationID.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageBlockInfo.hpp"
+#include "storage/StorageManager.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "types/Type.hpp"
+#include "types/TypeID.hpp"
+#include "types/containers/ColumnVectorsValueAccessor.hpp"
+#include "utility/ScopedBuffer.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+namespace {
+
+#define CASE_KEY_SIZE(value) \
+  case value: return functor(std::integral_constant())
+
+template 
+auto InvokeOnKeySize(const std::size_t key_size, const FunctorT ) {
+  switch (key_size) {
+CASE_KEY_SIZE(1);
+CASE_KEY_SIZE(2);
+CASE_KEY_SIZE(3);
+CASE_KEY_SIZE(4);
+CASE_KEY_SIZE(5);
+CASE_KEY_SIZE(6);
+CASE_KEY_SIZE(7);
+CASE_KEY_SIZE(8);
+default:
+  break;
+  }
+  LOG(FATAL) << "Unexpected key size: " << key_size;
+}
+
+#undef CASE_KEY_SIZE
+
+}  // namespace
+
+constexpr std::size_t ThreadPrivateCompactKeyHashTable::kKeyCodeSize;
+
+ThreadPrivateCompactKeyHashTable::ThreadPrivateCompactKeyHashTable(
+const std::vector _types,
+const std::size_t num_entries,
+const std::vector ,
+StorageManager *storage_manager)
+: key_types_(key_types),
+  handles_(handles),
+  total_state_size_(0),
+  num_buckets_(0),
+  buckets_allocated_(0),
+  storage_manager_(storage_manager) {
+  // Cache key sizes.
+  for (const Type *key_type : key_types) {
+DCHECK(!key_type->isVariableLength());
+DCHECK(!key_type->isNullable());
+key_sizes_.emplace_back(key_type->maximumByteLength());
+  }
+
+  for (const AggregationHandle *handle : handles) {
+const std::vector arg_types = handle->getArgumentTypes();
+DCHECK_LE(arg_types.size(), 1u);
+DCHECK(arg_types.empty() || !arg_types.front()->isNullable());
+
+// Figure out state size.
+std::size_t state_size = 0;
+switch (handle->getAggregationID()) {
+  case AggregationID::kCount: {
+state_size = sizeof(std::int64_t);
+break;
+  }
+  case AggregationID::kSum: {
+DCHECK_EQ(1u, arg_types.size());
+switch (arg_types.front()->getTypeID()) {
+  case TypeID::kInt:  // Fall through
+  case TypeID::kLong:
+state_size = sizeof(std::int64_t);
+break;
+  case TypeID::kFloat:  // Fall through
+  case TypeID::kDouble:
+state_size = sizeof(double);
+break;
+  default:
+LOG(FATAL) << "Unexpected argument type";
+}
+break;
+  }
+  default:
+LOG(FATAL) << "Unexpected AggregationID";
+}
+state_sizes_.emplace_back(state_size);
+total_state_size_ += state_size;
+  }
+
+  // Calculate required memory size for keys and states.
+  const std::size_t required_memory =
+  num_entries * (kKeyCodeSize + total_state_size_);
+  const std::size_t num_storage_slots =
+  storage_manager_->SlotsNeededForBytes(required_memory);
+
+  // Use storage manager to allocate memory.
+  const block_id blob_id = storage_manager->createBlob(num_storage_slots);
+  blob_ = storage_manager->getBlobMutable(blob_id);
+
   

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112882425
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "storage/ThreadPrivateCompactKeyHashTable.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "expressions/aggregation/AggregationHandle.hpp"
+#include "expressions/aggregation/AggregationID.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageBlockInfo.hpp"
+#include "storage/StorageManager.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "types/Type.hpp"
+#include "types/TypeID.hpp"
+#include "types/containers/ColumnVectorsValueAccessor.hpp"
+#include "utility/ScopedBuffer.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+namespace {
+
+#define CASE_KEY_SIZE(value) \
+  case value: return functor(std::integral_constant())
+
+template 
+auto InvokeOnKeySize(const std::size_t key_size, const FunctorT ) {
+  switch (key_size) {
+CASE_KEY_SIZE(1);
+CASE_KEY_SIZE(2);
+CASE_KEY_SIZE(3);
+CASE_KEY_SIZE(4);
+CASE_KEY_SIZE(5);
+CASE_KEY_SIZE(6);
+CASE_KEY_SIZE(7);
+CASE_KEY_SIZE(8);
+default:
+  break;
+  }
+  LOG(FATAL) << "Unexpected key size: " << key_size;
+}
+
+#undef CASE_KEY_SIZE
+
+}  // namespace
+
+constexpr std::size_t ThreadPrivateCompactKeyHashTable::kKeyCodeSize;
+
+ThreadPrivateCompactKeyHashTable::ThreadPrivateCompactKeyHashTable(
+const std::vector _types,
+const std::size_t num_entries,
+const std::vector ,
+StorageManager *storage_manager)
+: key_types_(key_types),
+  handles_(handles),
+  total_state_size_(0),
+  num_buckets_(0),
+  buckets_allocated_(0),
+  storage_manager_(storage_manager) {
+  // Cache key sizes.
+  for (const Type *key_type : key_types) {
+DCHECK(!key_type->isVariableLength());
+DCHECK(!key_type->isNullable());
+key_sizes_.emplace_back(key_type->maximumByteLength());
+  }
+
+  for (const AggregationHandle *handle : handles) {
+const std::vector arg_types = handle->getArgumentTypes();
+DCHECK_LE(arg_types.size(), 1u);
+DCHECK(arg_types.empty() || !arg_types.front()->isNullable());
+
+// Figure out state size.
+std::size_t state_size = 0;
+switch (handle->getAggregationID()) {
+  case AggregationID::kCount: {
+state_size = sizeof(std::int64_t);
+break;
+  }
+  case AggregationID::kSum: {
+DCHECK_EQ(1u, arg_types.size());
+switch (arg_types.front()->getTypeID()) {
+  case TypeID::kInt:  // Fall through
+  case TypeID::kLong:
+state_size = sizeof(std::int64_t);
+break;
+  case TypeID::kFloat:  // Fall through
+  case TypeID::kDouble:
+state_size = sizeof(double);
--- End diff --

FYI, for all four cases,  `state_size` is `8`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112881164
  
--- Diff: storage/HashTablePool.hpp ---
@@ -75,6 +75,10 @@ class HashTablePool {
 handles_(handles),
 storage_manager_(DCHECK_NOTNULL(storage_manager)) {}
 
+  HashTableImplType getHashTableImplType() const {
--- End diff --

Add `doxygen` comments for this public method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/239

Add ThreadPrivateCompactKeyHashTable for aggregation.

This PR implements a new specialized aggregation hash table that is 
preferable for two-phase (i.e. aggregate locally and then merge) aggregation 
with small-cardinality group-by keys. To use this hash table, it also requires 
that the group-by keys have fixed-length types with total byte size no greater 
than 8 (so that each composite key can be packed into a 64-bit QWORD).

This PR together with #237 can improve the performance TPC-H Q01 from 
~11.5s to ~5.3s, with scale factor 100 on a cloud lab machine.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jianqiao/incubator-quickstep 
thread-private-compact-ht

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit 99af4690ae4462e7b6843bc92c4cb3ef31a1d75a
Author: Jianqiao Zhu 
Date:   2017-04-22T04:23:13Z

Add ThreadPrivateCompactKeyHashTable as a fast path data structure for 
aggregation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---