[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/239


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread hbdeshmukh
Github user hbdeshmukh commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r113015464
  
--- Diff: storage/AggregationOperationState.cpp ---
@@ -715,7 +719,18 @@ void AggregationOperationState::finalizeHashTable(
     finalizeHashTableImplPartitioned(partition_id, output_destination);
   } else {
     DCHECK_EQ(0u, partition_id);
-    finalizeHashTableImplThreadPrivate(output_destination);
+    DCHECK(group_by_hashtable_pool_ != nullptr);
+    switch (group_by_hashtable_pool_->getHashTableImplType()) {
+      case HashTableImplType::kSeparateChaining:
--- End diff --

It would be better to name each HashTableImplType value and its corresponding
finalize-hash-table function consistently.
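
A minimal sketch of what consistent naming could look like. The enum below is a reduced stand-in for the PR's `HashTableImplType`; the value `kThreadPrivateCompactKey` and both finalize function names are assumptions made for illustration, not the names used in the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for HashTableImplType; only two values are modeled.
enum class HashTableImplType {
  kSeparateChaining,
  kThreadPrivateCompactKey  // assumed name, not taken from the PR
};

// Each finalize function carries the name of the enum value it serves, so the
// switch below reads as a one-to-one mapping.
inline std::string finalizeHashTableImplSeparateChaining() {
  return "separate-chaining";
}

inline std::string finalizeHashTableImplThreadPrivateCompactKey() {
  return "thread-private-compact-key";
}

inline std::string finalizeHashTable(const HashTableImplType impl_type) {
  switch (impl_type) {
    case HashTableImplType::kSeparateChaining:
      return finalizeHashTableImplSeparateChaining();
    case HashTableImplType::kThreadPrivateCompactKey:
      return finalizeHashTableImplThreadPrivateCompactKey();
  }
  assert(false && "Unexpected HashTableImplType");
  return "";
}
```

With this convention a reader can derive the function name from the enum value and vice versa.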




[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread jianqiao
Github user jianqiao commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112991169
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#include "storage/ThreadPrivateCompactKeyHashTable.hpp"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "expressions/aggregation/AggregationHandle.hpp"
+#include "expressions/aggregation/AggregationID.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageBlockInfo.hpp"
+#include "storage/StorageManager.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "types/Type.hpp"
+#include "types/TypeID.hpp"
+#include "types/containers/ColumnVectorsValueAccessor.hpp"
+#include "utility/ScopedBuffer.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+namespace {
+
+#define CASE_KEY_SIZE(value) \
+  case value: return functor(std::integral_constant<std::size_t, value>())
+
+template <typename FunctorT>
+auto InvokeOnKeySize(const std::size_t key_size, const FunctorT &functor) {
+  switch (key_size) {
+    CASE_KEY_SIZE(1);
+    CASE_KEY_SIZE(2);
+    CASE_KEY_SIZE(3);
+    CASE_KEY_SIZE(4);
+    CASE_KEY_SIZE(5);
+    CASE_KEY_SIZE(6);
+    CASE_KEY_SIZE(7);
+    CASE_KEY_SIZE(8);
+    default:
+      break;
+  }
+  LOG(FATAL) << "Unexpected key size: " << key_size;
+}
+
+#undef CASE_KEY_SIZE
+
+}  // namespace
+
+constexpr std::size_t ThreadPrivateCompactKeyHashTable::kKeyCodeSize;
+
+ThreadPrivateCompactKeyHashTable::ThreadPrivateCompactKeyHashTable(
+    const std::vector<const Type*> &key_types,
+    const std::size_t num_entries,
+    const std::vector<AggregationHandle*> &handles,
+    StorageManager *storage_manager)
+    : key_types_(key_types),
+      handles_(handles),
+      total_state_size_(0),
+      num_buckets_(0),
+      buckets_allocated_(0),
+      storage_manager_(storage_manager) {
+  // Cache key sizes.
+  for (const Type *key_type : key_types) {
+    DCHECK(!key_type->isVariableLength());
+    DCHECK(!key_type->isNullable());
+    key_sizes_.emplace_back(key_type->maximumByteLength());
+  }
+
+  for (const AggregationHandle *handle : handles) {
+    const std::vector<const Type*> arg_types = handle->getArgumentTypes();
+    DCHECK_LE(arg_types.size(), 1u);
+    DCHECK(arg_types.empty() || !arg_types.front()->isNullable());
+
+    // Figure out state size.
+    std::size_t state_size = 0;
+    switch (handle->getAggregationID()) {
+      case AggregationID::kCount: {
+        state_size = sizeof(std::int64_t);
+        break;
+      }
+      case AggregationID::kSum: {
+        DCHECK_EQ(1u, arg_types.size());
+        switch (arg_types.front()->getTypeID()) {
+          case TypeID::kInt:  // Fall through
+          case TypeID::kLong:
+            state_size = sizeof(std::int64_t);
+            break;
+          case TypeID::kFloat:  // Fall through
+          case TypeID::kDouble:
+            state_size = sizeof(double);
+            break;
+          default:
+            LOG(FATAL) << "Unexpected argument type";
+        }
+        break;
+      }
+      default:
+        LOG(FATAL) << "Unexpected AggregationID";
+    }
+    state_sizes_.emplace_back(state_size);
+    total_state_size_ += state_size;
+  }
+
+  // Calculate required memory size for keys and states.
+  const std::size_t required_memory =
+      num_entries * (kKeyCodeSize + total_state_size_);
+  const std::size_t num_storage_slots =
+      storage_manager_->SlotsNeededForBytes(required_memory);
+
+  // Use storage manager to allocate memory.
+  const block_id blob_id = storage_manager->createBlob(num_storage_slots);
+  blob_ = storage_manager->getBlobMutable(blob_id);
+
+  num_buckets_
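
The `InvokeOnKeySize` helper in the diff above turns a run-time key size into a compile-time constant. Below is a standalone sketch of that dispatch pattern, assuming C++14 or later. `KeyMask` is a hypothetical caller invented for illustration, and `std::abort()` stands in for glog's `LOG(FATAL)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

#define CASE_KEY_SIZE(value) \
  case value: return functor(std::integral_constant<std::size_t, value>())

// Maps a run-time key size in [1, 8] to a std::integral_constant and hands it
// to the functor, so the functor body sees the size as a compile-time value.
template <typename FunctorT>
auto InvokeOnKeySize(const std::size_t key_size, const FunctorT &functor) {
  switch (key_size) {
    CASE_KEY_SIZE(1);
    CASE_KEY_SIZE(2);
    CASE_KEY_SIZE(3);
    CASE_KEY_SIZE(4);
    CASE_KEY_SIZE(5);
    CASE_KEY_SIZE(6);
    CASE_KEY_SIZE(7);
    CASE_KEY_SIZE(8);
    default:
      break;
  }
  assert(false && "Unexpected key size");
  std::abort();  // stand-in for LOG(FATAL)
}

#undef CASE_KEY_SIZE

// Hypothetical caller: returns a mask covering the low `key_size` bytes.
inline std::uint64_t KeyMask(const std::size_t key_size) {
  return InvokeOnKeySize(key_size, [](auto size_constant) -> std::uint64_t {
    constexpr std::size_t kSize = decltype(size_constant)::value;
    // `kSize % 8` keeps the shift count below 64 in the branch that is not
    // taken when kSize == 8.
    return kSize == 8 ? ~std::uint64_t{0}
                      : (std::uint64_t{1} << (8 * (kSize % 8))) - 1;
  });
}
```

The generic lambda is instantiated once per key size, so each instantiation can be optimized for a fixed width.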

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112885190
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.hpp ---
@@ -0,0 +1,230 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_STORAGE_THREAD_PRIVATE_COMPACT_KEY_HASH_TABLE_HPP_
+#define QUICKSTEP_STORAGE_THREAD_PRIVATE_COMPACT_KEY_HASH_TABLE_HPP_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "catalog/CatalogTypedefs.hpp"
+#include "storage/HashTableBase.hpp"
+#include "storage/StorageBlob.hpp"
+#include "storage/StorageConstants.hpp"
+#include "storage/ValueAccessorMultiplexer.hpp"
+#include "storage/ValueAccessorUtil.hpp"
+#include "types/containers/ColumnVector.hpp"
+#include "utility/Macros.hpp"
+
+namespace quickstep {
+
+class AggregationHandle;
+class StorageManager;
+class Type;
+
+/**
+ * @brief Specialized aggregation hash table that is preferable for two-phase
+ *        aggregation with small-cardinality group-by keys. To use this hash
+ *        table, it also requires that the group-by keys have fixed-length types
+ *        with total byte size no greater than 8 (so that the keys can be packed
+ *        into a 64-bit QWORD).
+ */
+class ThreadPrivateCompactKeyHashTable : public AggregationStateHashTableBase {
--- End diff --

Could we mark the class `final`, so that calls to its `override` methods can
potentially be devirtualized for a performance benefit?
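
A small sketch of the suggestion, using hypothetical, heavily reduced classes (`HashTableBase` and `CompactKeyTable` are invented names, not the Quickstep types). Whether a call is actually devirtualized is up to the compiler; `final` only makes it permissible.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an abstract hash table base class.
class HashTableBase {
 public:
  virtual ~HashTableBase() = default;
  virtual std::size_t numEntries() const = 0;
};

// `final` guarantees that no class derives from CompactKeyTable, so a call
// made through a CompactKeyTable reference has exactly one possible target.
class CompactKeyTable final : public HashTableBase {
 public:
  explicit CompactKeyTable(const std::size_t num_entries)
      : num_entries_(num_entries) {}

  std::size_t numEntries() const override { return num_entries_; }

 private:
  const std::size_t num_entries_;
};

static_assert(std::is_final<CompactKeyTable>::value,
              "CompactKeyTable is expected to be final");

// The static type here is the final class, so the compiler may resolve the
// call without a vtable lookup.
inline std::size_t CountEntries(const CompactKeyTable &table) {
  return table.numEntries();
}
```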




[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112884563
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112881164
  
--- Diff: storage/HashTablePool.hpp ---
@@ -75,6 +75,10 @@ class HashTablePool {
         handles_(handles),
         storage_manager_(DCHECK_NOTNULL(storage_manager)) {}
 
+  HashTableImplType getHashTableImplType() const {
--- End diff --

Add `doxygen` comments for this public method.
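
For illustration, a documented getter might look like the sketch below. The class is a reduced, hypothetical version of `HashTablePool` with a single member, the enum values are placeholders, and the comment wording is only a suggestion.

```cpp
#include <cassert>

// Hypothetical reduced enum; the value names are placeholders.
enum class HashTableImplType { kSeparateChaining, kThreadPrivateCompactKey };

// Hypothetical reduced pool that only stores its implementation type.
class HashTablePool {
 public:
  explicit HashTablePool(const HashTableImplType hash_table_impl_type)
      : hash_table_impl_type_(hash_table_impl_type) {}

  /**
   * @brief Get the implementation type of the hash tables in this pool.
   *
   * @return The HashTableImplType this pool was constructed with.
   **/
  HashTableImplType getHashTableImplType() const {
    return hash_table_impl_type_;
  }

 private:
  const HashTableImplType hash_table_impl_type_;
};
```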




[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112882425
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@
+  for (const AggregationHandle *handle : handles) {
+    const std::vector<const Type*> arg_types = handle->getArgumentTypes();
+    DCHECK_LE(arg_types.size(), 1u);
+    DCHECK(arg_types.empty() || !arg_types.front()->isNullable());
+
+    // Figure out state size.
+    std::size_t state_size = 0;
+    switch (handle->getAggregationID()) {
+      case AggregationID::kCount: {
+        state_size = sizeof(std::int64_t);
+        break;
+      }
+      case AggregationID::kSum: {
+        DCHECK_EQ(1u, arg_types.size());
+        switch (arg_types.front()->getTypeID()) {
+          case TypeID::kInt:  // Fall through
+          case TypeID::kLong:
+            state_size = sizeof(std::int64_t);
+            break;
+          case TypeID::kFloat:  // Fall through
+          case TypeID::kDouble:
+            state_size = sizeof(double);

FYI, `state_size` is `8` in all four cases.
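
A short sketch that makes this observation checkable at compile time. The enums are hypothetical reduced versions of `AggregationID` and `TypeID`, and `StateSize` is an invented helper that mirrors the switch in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Both accumulator types are assumed to be 8 bytes wide; these checks fail
// the build on a platform where that does not hold.
static_assert(sizeof(std::int64_t) == 8, "int64_t state must be 8 bytes");
static_assert(sizeof(double) == 8, "double state must be 8 bytes");

// Hypothetical reduced enums covering only the cases in the diff.
enum class AggregationID { kCount, kSum };
enum class TypeID { kInt, kLong, kFloat, kDouble };

// Mirrors the state-size selection in the constructor. The argument type is
// ignored for kCount.
inline std::size_t StateSize(const AggregationID agg_id,
                             const TypeID arg_type) {
  if (agg_id == AggregationID::kCount) {
    return sizeof(std::int64_t);
  }
  switch (arg_type) {
    case TypeID::kInt:  // Fall through
    case TypeID::kLong:
      return sizeof(std::int64_t);
    case TypeID::kFloat:  // Fall through
    case TypeID::kDouble:
      return sizeof(double);
  }
  assert(false && "Unexpected argument type");
  return 0;
}
```

Keeping the switch still documents which accumulator type each argument type uses, even though the sizes coincide.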




[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112884381
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-24 Thread zuyu
Github user zuyu commented on a diff in the pull request:

https://github.com/apache/incubator-quickstep/pull/239#discussion_r112883975
  
--- Diff: storage/ThreadPrivateCompactKeyHashTable.cpp ---
@@ -0,0 +1,422 @@

[GitHub] incubator-quickstep pull request #239: Add ThreadPrivateCompactKeyHashTable ...

2017-04-23 Thread jianqiao
GitHub user jianqiao opened a pull request:

https://github.com/apache/incubator-quickstep/pull/239

Add ThreadPrivateCompactKeyHashTable for aggregation.

This PR implements a new specialized aggregation hash table that is
preferable for two-phase aggregation (i.e. aggregate locally, then merge)
with small-cardinality group-by keys. The hash table also requires that the
group-by keys have fixed-length types with a total byte size no greater than
8, so that each composite key can be packed into a 64-bit QWORD.
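
As a rough illustration of the packing requirement, the sketch below copies fixed-length key components back to back into one 64-bit code. `KeyComponent` and `PackKey` are hypothetical names; the PR's actual key-construction code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One fixed-length key component: a pointer to its bytes and its byte size.
struct KeyComponent {
  const void *data;
  std::size_t size;
};

// Packs the components contiguously into a zero-initialized 64-bit code.
// The total size must not exceed 8 bytes, matching the PR's requirement.
inline std::uint64_t PackKey(const std::vector<KeyComponent> &components) {
  std::uint64_t code = 0;
  std::size_t offset = 0;
  for (const KeyComponent &component : components) {
    assert(offset + component.size <= sizeof(code));
    std::memcpy(reinterpret_cast<char *>(&code) + offset,
                component.data, component.size);
    offset += component.size;
  }
  return code;
}
```

Two rows with equal group-by values then produce equal codes, so grouping reduces to comparing and hashing a single integer. The numeric value of a code depends on the machine's byte order, which does not matter as long as codes are only compared with each other.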

Together with #237, this PR can reduce the TPC-H Q01 running time from
~11.5s to ~5.3s, at scale factor 100 on a cloud lab machine.
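
The two-phase scheme (aggregate locally, then merge) can be sketched as follows for a simple COUNT. This is an assumption-laden illustration: it uses `std::unordered_map` instead of the PR's hash table, and it runs the workers one after another, whereas in a real engine each worker would run on its own thread with its private table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using KeyCode = std::uint64_t;
using LocalTable = std::unordered_map<KeyCode, std::int64_t>;

// Phase 1: a worker counts its slice of the input in a table that only it
// touches, so no synchronization is needed.
inline void AggregateLocally(const std::vector<KeyCode> &keys,
                             const std::size_t worker,
                             const std::size_t num_workers,
                             LocalTable *local) {
  for (std::size_t i = worker; i < keys.size(); i += num_workers) {
    ++(*local)[keys[i]];
  }
}

// Phase 2: the private tables are merged into the final result.
inline LocalTable TwoPhaseCount(const std::vector<KeyCode> &keys,
                                const std::size_t num_workers) {
  assert(num_workers > 0);
  std::vector<LocalTable> locals(num_workers);
  for (std::size_t worker = 0; worker < num_workers; ++worker) {
    AggregateLocally(keys, worker, num_workers, &locals[worker]);
  }
  LocalTable merged;
  for (const LocalTable &local : locals) {
    for (const auto &entry : local) {
      merged[entry.first] += entry.second;
    }
  }
  return merged;
}
```

With few distinct keys, every private table stays small and the merge touches only a handful of entries per worker.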

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jianqiao/incubator-quickstep thread-private-compact-ht

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-quickstep/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit 99af4690ae4462e7b6843bc92c4cb3ef31a1d75a
Author: Jianqiao Zhu 
Date:   2017-04-22T04:23:13Z

Add ThreadPrivateCompactKeyHashTable as a fast path data structure for aggregation.



