[GitHub] [incubator-doris] EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure
EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure
URL: https://github.com/apache/incubator-doris/issues/1650#issuecomment-521538908

The WHERE predicate cannot be pushed down when the predicate is on the outer-joined table. For example: `select * from a left join b on a.id=b.id where b.id=1;`

In some cases, though, the predicate can be pushed down. In Impala, for example:

Query: explain select * from customer_address a left join customer_address b on a.ca_address_sk=b.ca_address_sk where b.ca_address_sk=1

Explain String:
Estimated Per-Host Requirements: Memory=0B VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
tpcds_hive.customer_address

PLAN-ROOT SINK
|
04:EXCHANGE [UNPARTITIONED]
|
02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
|  hash predicates: a.ca_address_sk = b.ca_address_sk
|  other predicates: b.ca_address_sk = 1
|
|--03:EXCHANGE [BROADCAST]
|  |
|  01:SCAN HDFS [tpcds_hive.customer_address b]
|     partitions=1/1 files=0 size=0B
|     predicates: b.ca_address_sk = 1
|
00:SCAN HDFS [tpcds_hive.customer_address a]
   partitions=1/1 files=0 size=0B

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org
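In the Impala plan, `b.ca_address_sk = 1` appears twice: once at the scan of `b` and once more as an "other predicate" on the join. The following is a minimal sketch of why that combination is safe, assuming tiny in-memory tables of integer ids; all type and function names here are hypothetical and are not taken from Doris or Impala. Filtering `b` early can only turn matched rows into NULL-extended rows, and the predicate re-applied after the join rejects those anyway.

```cpp
#include <cstddef>
#include <vector>

// One output row of "a LEFT JOIN b ON a.id = b.id".
// b_is_null marks the NULL-extended case where no row of b matched.
struct JoinedRow {
    int a_id;
    bool b_is_null;
    int b_id;  // meaningful only when b_is_null is false
};

bool operator==(const JoinedRow& x, const JoinedRow& y) {
    return x.a_id == y.a_id && x.b_is_null == y.b_is_null &&
           (x.b_is_null || x.b_id == y.b_id);
}

// Nested-loop left outer join on id equality (illustrative, not efficient).
std::vector<JoinedRow> left_outer_join(const std::vector<int>& a,
                                       const std::vector<int>& b) {
    std::vector<JoinedRow> out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        bool matched = false;
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (a[i] == b[j]) {
                JoinedRow row = {a[i], false, b[j]};
                out.push_back(row);
                matched = true;
            }
        }
        if (!matched) {
            JoinedRow row = {a[i], true, 0};
            out.push_back(row);
        }
    }
    return out;
}

// WHERE b.id = value: a NULL b.id never satisfies the comparison.
std::vector<JoinedRow> where_b_id_equals(const std::vector<JoinedRow>& rows,
                                         int value) {
    std::vector<JoinedRow> out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (!rows[i].b_is_null && rows[i].b_id == value) out.push_back(rows[i]);
    }
    return out;
}

// Plan 1: join first, apply the WHERE predicate afterwards.
std::vector<JoinedRow> filter_after_join(const std::vector<int>& a,
                                         const std::vector<int>& b, int value) {
    return where_b_id_equals(left_outer_join(a, b), value);
}

// Plan 2: filter b at scan time, join, then re-apply the predicate.
std::vector<JoinedRow> filter_pushed_down(const std::vector<int>& a,
                                          const std::vector<int>& b, int value) {
    std::vector<int> b_filtered;
    for (std::size_t j = 0; j < b.size(); ++j) {
        if (b[j] == value) b_filtered.push_back(b[j]);
    }
    return where_b_id_equals(left_outer_join(a, b_filtered), value);
}
```

Both plans return the same rows for any inputs in this model, because the predicate rejects NULL; a predicate such as `b.id IS NULL` would not allow the same rewrite.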
[GitHub] [incubator-doris] imay opened a new issue #1651: Support checksum in BetaRowset
imay opened a new issue #1651: Support checksum in BetaRowset
URL: https://github.com/apache/incubator-doris/issues/1651

We should add checksum for BetaRowset.
[GitHub] [incubator-doris] imay opened a new issue #1652: support filter with predicate in BetaRowset
imay opened a new issue #1652: support filter with predicate in BetaRowset
URL: https://github.com/apache/incubator-doris/issues/1652

Support predicate filter in BetaRowset iterator. It's better if we can support lazy materialization.
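As a rough illustration of what lazy materialization means for a predicate filter, here is a minimal sketch. It assumes a hypothetical two-column batch and invented names (`ColumnBatch`, `select_rows`, `materialize`); it is not the BetaRowset iterator API. The predicate is evaluated on its own column first, and the wider column is read only for rows that survive.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical column-wise batch: one narrow predicate column, one wide column.
struct ColumnBatch {
    std::vector<int> key;              // column the predicate reads
    std::vector<std::string> payload;  // column materialized lazily
};

struct Row {
    int key;
    std::string payload;
};

// Step 1: evaluate the predicate on the key column only and
// remember the indexes of the rows that pass.
std::vector<std::size_t> select_rows(const std::vector<int>& key,
                                     const std::function<bool(int)>& pred) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < key.size(); ++i) {
        if (pred(key[i])) selected.push_back(i);
    }
    return selected;
}

// Step 2: touch the payload column only for the selected rows.
std::vector<Row> materialize(const ColumnBatch& batch,
                             const std::vector<std::size_t>& selected) {
    std::vector<Row> rows;
    rows.reserve(selected.size());
    for (std::size_t i = 0; i < selected.size(); ++i) {
        Row row = {batch.key[selected[i]], batch.payload[selected[i]]};
        rows.push_back(row);
    }
    return rows;
}
```

The saving comes from step 2: with a selective predicate, most payload values are never decoded or copied.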
[GitHub] [incubator-doris] imay closed issue #1391: Add column page cache
imay closed issue #1391: Add column page cache
URL: https://github.com/apache/incubator-doris/issues/1391
[GitHub] [incubator-doris] morningman commented on a change in pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start
morningman commented on a change in pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start
URL: https://github.com/apache/incubator-doris/pull/1642#discussion_r314215358

## File path: fe/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java ##
@@ -128,6 +132,9 @@ private void convertCustomProperties() throws DdlException {
                 convertedCustomProperties.put(entry.getKey(), entry.getValue());
             }
         }
+        if (convertedCustomProperties.containsKey(KAFKA_DEFAULT_OFFSETS)) {
+            kafkaDefaultOffSet = convertedCustomProperties.get(KAFKA_DEFAULT_OFFSETS);

Review comment:
```suggestion
            kafkaDefaultOffSet = convertedCustomProperties.remove(KAFKA_DEFAULT_OFFSETS);
```
convertedCustomProperties may only contain properties that librdkafka recognizes, so this default offset needs to be removed from it separately.
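The difference between `get` and `remove` in the suggestion is that `remove` also takes the entry out of the map, so it is not forwarded to the Kafka client. A minimal sketch of the same pattern follows, written in C++ only to keep one language across the examples in this digest; `take_property` and the sample values are hypothetical and are not part of Doris or librdkafka.

```cpp
#include <map>
#include <string>

// Remove `key` from `props` and return its value, or return `fallback`
// when the key is absent. This mirrors the effect of Java's Map.remove(key)
// followed by a null check.
std::string take_property(std::map<std::string, std::string>* props,
                          const std::string& key,
                          const std::string& fallback) {
    std::map<std::string, std::string>::iterator it = props->find(key);
    if (it == props->end()) return fallback;
    std::string value = it->second;
    props->erase(it);  // the entry must not reach the Kafka client config
    return value;
}
```

After the call, the map holds only the entries intended for the client library, which is the property the review comment asks for.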
[GitHub] [incubator-doris] EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure
EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure
URL: https://github.com/apache/incubator-doris/issues/1650#issuecomment-521565573

This article, https://www.ibm.com/developerworks/data/library/techarticle/purcell/0201purcell.html, provides theoretical support for handling a WHERE clause predicate applied to the NULL-supplying table.
[GitHub] [incubator-doris] EmmyMiao87 commented on issue #353: Support loading data from Kafka
EmmyMiao87 commented on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572294

#967
[GitHub] [incubator-doris] EmmyMiao87 commented on issue #353: Support loading data from Kafka
EmmyMiao87 commented on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572468

#1650 scheduler routine load job for stream load
[GitHub] [incubator-doris] EmmyMiao87 removed a comment on issue #353: Support loading data from Kafka
EmmyMiao87 removed a comment on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572468

#1650 scheduler routine load job for stream load
[GitHub] [incubator-doris] yiguolei opened a new pull request #1653: Use same dir during schema change
yiguolei opened a new pull request #1653: Use same dir during schema change
URL: https://github.com/apache/incubator-doris/pull/1653

1. Use same dir during schema change
2. Should start column id from base tablet's unique id
[GitHub] [incubator-doris] morningman merged pull request #1653: Use same dir during schema change
morningman merged pull request #1653: Use same dir during schema change
URL: https://github.com/apache/incubator-doris/pull/1653
[GitHub] [incubator-doris] morningman closed pull request #1613: Refactor alter job process
morningman closed pull request #1613: Refactor alter job process
URL: https://github.com/apache/incubator-doris/pull/1613
[GitHub] [incubator-doris] DDDDDDouble opened a new issue #1654: When we use streamload, we can't get the label in the log
DDDDDDouble opened a new issue #1654: When we use streamload, we can't get the label in the log
URL: https://github.com/apache/incubator-doris/issues/1654

@imay Hi, I will submit a PR later. Below is my log information:

less log/fe.log
2019-08-15 11:04:11,494 INFO 161 [LoadAction.executeWithoutPassword():139] redirect load action to destination=TNetworkAddress(hostname:10.26.43.136, port:8040), stream: true, db: midas_report, tbl: MCD_CampaignReportDate, label: null
[GitHub] [incubator-doris] DDDDDDouble opened a new pull request #1655: Fix get label when use StreamLoad
DDDDDDouble opened a new pull request #1655: Fix get label when use StreamLoad
URL: https://github.com/apache/incubator-doris/pull/1655

For https://github.com/apache/incubator-doris/issues/1654
[GitHub] [incubator-doris] xonze closed issue #1649: FE cannot elect a master normally after upgrading from 0.10.12 to 0.10.15
xonze closed issue #1649: FE cannot elect a master normally after upgrading from 0.10.12 to 0.10.15
URL: https://github.com/apache/incubator-doris/issues/1649
[GitHub] [incubator-doris] xonze commented on issue #1649: FE cannot elect a master normally after upgrading from 0.10.12 to 0.10.15
xonze commented on issue #1649: FE cannot elect a master normally after upgrading from 0.10.12 to 0.10.15
URL: https://github.com/apache/incubator-doris/issues/1649#issuecomment-521620570

Testing showed that the cause was my use of OpenJDK after the upgrade. After switching to the official Oracle JDK, version "1.8.0_211", everything works as before.
[GitHub] [incubator-doris] manannan2017 opened a new issue #1656: the sql-reference doc review,i have some doubts
manannan2017 opened a new issue #1656: the sql-reference doc review,i have some doubts
URL: https://github.com/apache/incubator-doris/issues/1656

I. sql-reference on GitHub

1. The description of dayofweek(DATETIME date) is unclear.
Original: the parameter is of type Date or Datetime.
Correction: a number that can be cast to a date is also accepted.
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/date-time-functions/dayofweek.md

2. ST_Polygon: suggest adding that the polygon's edges must not cross.
The keyword cannot be used:
![image](https://user-images.githubusercontent.com/33174388/63094043-d8d12f00-bf99-11e9-923c-69cf377e93a4.png)
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/spatial-functions/st_polygon.md

3. Suggest documenting the maximum radius of a circle [beyond 8 digits the result is no longer accurate].
SELECT ST_AsText(ST_Circle(111, -64, 123456789));
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/spatial-functions/st_circle.md

II. Doris official website

1. Import: the separator is inconsistent with the description.
![image](https://user-images.githubusercontent.com/33174388/63094093-03bb8300-bf9a-11e9-86d8-7cf539d4935f.png)
Link: http://doris.apache.org/Docs/cn/getting-started/basic-usage.html#id10

2. Incorrect description: the int type occupies 4 bytes.
![image](https://user-images.githubusercontent.com/33174388/63094111-103fdb80-bf9a-11e9-9275-c204d44a86c8.png)
Link: http://doris.apache.org/Docs/cn/getting-started/data-model-rollup.html

3. Typo.
![image](https://user-images.githubusercontent.com/33174388/63094137-2352ab80-bf9a-11e9-8d3a-42e9d977a5dd.png)
Link: http://doris.apache.org/Docs/cn/getting-started/data-model-rollup.html

4. Incorrect numbering.
![image](https://user-images.githubusercontent.com/33174388/63094165-32395e00-bf9a-11e9-94ee-40ad09caf061.png)
Link: http://doris.apache.org/Docs/cn/getting-started/best-practice.html

5. Data partitioning: typo.
![image](https://user-images.githubusercontent.com/33174388/63094180-3ebdb680-bf9a-11e9-8f93-27794cd8e61d.png)

III. Help documents in the command line

1. help create table:
![image](https://user-images.githubusercontent.com/33174388/63094206-4aa97880-bf9a-11e9-9636-5845574d8d6b.png)

2. Is the range correct?
![image](https://user-images.githubusercontent.com/33174388/63094222-572dd100-bf9a-11e9-9192-6618af764e58.png)

3. help mini load
![image](https://user-images.githubusercontent.com/33174388/63094243-6280fc80-bf9a-11e9-9e05-313e81740feb.png)

4. help largeint
![image](https://user-images.githubusercontent.com/33174388/63094251-69a80a80-bf9a-11e9-96f0-ab6a69576f5f.png)

5. help create file
![image](https://user-images.githubusercontent.com/33174388/63094266-7167af00-bf9a-11e9-8fd3-3dda7f5b80c7.png)
[GitHub] [incubator-doris] manannan2017 opened a new pull request #1657: Doc review
manannan2017 opened a new pull request #1657: Doc review
URL: https://github.com/apache/incubator-doris/pull/1657

A review of the sql-reference docs; I have some doubts. Please take a look. Thank you for your hard work.
[GitHub] [incubator-doris] imay commented on issue #1656: the sql-reference doc review,i have some doubts
imay commented on issue #1656: the sql-reference doc review,i have some doubts
URL: https://github.com/apache/incubator-doris/issues/1656#issuecomment-521637670

Could you make these changes yourself and submit a PR to fix them?
[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314308435

## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ##
@@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+    if (_data.size < 4) {
+        return Status::Corruption(
+            Substitute("Compressed page's size is too small, size=$0, needed=$1",
+                       _data.size, 4));
+    }
+    _uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+    return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+    Slice compressed_slice(_data.data + 4, _data.size - 4);
+    if (compressed_slice.size == _uncompressed_bytes) {
+        // If compressed_slice's size is equal with _uncompressed_bytes, it means
+        // compressor store this directly without compression. So we just copy
+        // this to buf and return.
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+        return Status::OK();
+    }
+
+    Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+    RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+    if (uncompressed_data.size != _uncompressed_bytes) {
+        // If size after decompress didn't match recorded size, we think this
+        // page is corrupt.
+        return Status::Corruption(
+            Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+                       _uncompressed_bytes, uncompressed_data.size));
+    }
+    return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+                                std::vector* compressed_data) {
+    size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+    size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+    _buf.resize(max_compressed_bytes + 4);
+    Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+    RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+    double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
+    if (compressed_slice.size >= uncompressed_bytes ||
+            compression_ratio > _min_compression_ratio) {
+        // If compression ration is not lower enough we just copy uncompressed
+        // data to avoid decompression CPU cost
+        encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes);

Review comment:
encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes) seems redundant with line 81
[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314298476

## File path: be/src/olap/rowset/segment_v2/column_writer.cpp ##
@@ -226,10 +228,14 @@ Status ColumnWriter::_write_data_page(Page* page) {
 Status ColumnWriter::_write_physical_page(std::vector* origin_data, PagePointer* pp) {
     std::vector* output_data = origin_data;
     std::vector compressed_data;
-    // TODO(zc): support compress
-    // if (_need_compress) {
-    //     output_data = &compressed_data;
-    // }
+
+    // Put compressor out of if block, because we should use compressor's
+    // content until this function finished.
+    PageCompressor compressor(_compress_codec);
+    if (_compress_codec != nullptr) {
+        RETURN_IF_ERROR(compressor.compress(*origin_data, &compressed_data));
+        output_data = &compressed_data;
+    }

     // checksum
     uint8_t checksum_buf[sizeof(uint32_t)];

Review comment:
if _opts.need_checksum is false, checksum_buf will be leaked.
[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314304332

## File path: be/src/olap/rowset/segment_v2/page_compression.h ##
@@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include
+#include
+
+#include "common/status.h"
+#include "util/slice.h"
+#include "util/faststring.h"
+
+namespace doris {
+
+class BlockCompressionCodec;
+
+namespace segment_v2 {
+
+// Helper to decompress a compressed page.
+// Compressed page is composed of
+// Header | CompressedData
+// Header : uncompressed data length(fixed32)
+// CompressedData: binary
+// Usage:
+//      Slice compressed_data;
+//      PageDecompressor decompressor(compressed_data, uncomrcodec);
+//      RETURN_IF_ERROR(decompressor.init());
+//      std::string buf;
+//      buf.resize(decompressor.uncompressed_bytes());
+//      RETURN_IF_ERROR(decompress_to(buf.data()));
+class PageDecompressor {
+public:
+    PageDecompressor(const Slice& data, const BlockCompressionCodec* codec)
+        : _data(data), _codec(codec) {
+    }
+
+    // Parse and validate compressed page's header.
+    // Only this funciton is executed successfully, uncompressed_bytes
+    // and decompress_to can be called.
+    // Return error if this page is corrupt.
+    Status init();
+
+    // Get uncompressed size in bytes of this page
+    size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+    // Decmopress compressed data into buf whose capacity must be greater than
+    // uncompressed_bytes()
+    Status decompress_to(void* buf);

Review comment:
Would `decompress` be a better name?
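The header comment in the quoted file describes the page layout as `Header | CompressedData`, with a fixed32 uncompressed length as the header. Below is a minimal sketch of just that framing, assuming a 4-byte little-endian header as the `_le` helper names in the PR suggest. The helpers here (`put_fixed32_le`, `get_fixed32_le`, `build_page`, `parse_page`) are hypothetical stand-ins and not the Doris functions, and no real codec is involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append v as 4 little-endian bytes (stand-in for a fixed32 encoder).
void put_fixed32_le(std::string* dst, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
}

// Read 4 little-endian bytes starting at p.
uint32_t get_fixed32_le(const char* p) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<uint32_t>(static_cast<unsigned char>(p[i])) << (8 * i);
    }
    return v;
}

// Build a page: header (uncompressed length) followed by the body.
// The body is either codec output or, when compression did not pay off,
// the raw bytes themselves.
std::string build_page(uint32_t uncompressed_bytes, const std::string& body) {
    std::string page;
    put_fixed32_le(&page, uncompressed_bytes);
    page.append(body);
    return page;
}

struct PageHeader {
    bool ok;                      // false when the page cannot hold a header
    uint32_t uncompressed_bytes;  // value recorded in the header
    bool stored_raw;              // body length equals the recorded length
};

// Parse and validate the header, as init() does in the quoted code, and
// detect the "stored without compression" case used by decompress_to().
PageHeader parse_page(const std::string& page) {
    PageHeader h = {false, 0, false};
    if (page.size() < 4) return h;
    h.ok = true;
    h.uncompressed_bytes = get_fixed32_le(page.data());
    h.stored_raw = (page.size() - 4 == h.uncompressed_bytes);
    return h;
}
```

One consequence of this convention is that the reader needs no extra flag byte: a body whose length equals the recorded uncompressed length is treated as raw data.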
[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314304011

## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ##
@@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+    if (_data.size < 4) {
+        return Status::Corruption(
+            Substitute("Compressed page's size is too small, size=$0, needed=$1",
+                       _data.size, 4));
+    }
+    _uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+    return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+    Slice compressed_slice(_data.data + 4, _data.size - 4);
+    if (compressed_slice.size == _uncompressed_bytes) {
+        // If compressed_slice's size is equal with _uncompressed_bytes, it means
+        // compressor store this directly without compression. So we just copy
+        // this to buf and return.
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+        return Status::OK();
+    }
+
+    Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+    RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+    if (uncompressed_data.size != _uncompressed_bytes) {
+        // If size after decompress didn't match recorded size, we think this
+        // page is corrupt.
+        return Status::Corruption(
+            Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+                       _uncompressed_bytes, uncompressed_data.size));
+    }
+    return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+                                std::vector* compressed_data) {
+    size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+    size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+    _buf.resize(max_compressed_bytes + 4);
+    Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+    RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+    double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;

Review comment:
Isn't `compressed_slice.size >= uncompressed_bytes` already covered by `compression_ratio > _min_compression_ratio`?
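The question in the review comment can be checked with a few numbers. This is a minimal sketch, assuming the ratio is computed as compressed size divided by uncompressed size, as in the quoted diff; the function names are hypothetical. It compares the two-clause condition with the ratio check alone.

```cpp
#include <cstddef>

// Condition in the form used by the quoted PageCompressor::compress:
// keep the raw bytes when compression did not shrink the data, or when
// the ratio is above the configured threshold.
// Callers are expected to pass uncompressed > 0.
bool keep_raw_both_checks(std::size_t compressed, std::size_t uncompressed,
                          double min_ratio) {
    double ratio = static_cast<double>(compressed) / uncompressed;
    return compressed >= uncompressed || ratio > min_ratio;
}

// The ratio check on its own, as the review comment proposes.
bool keep_raw_ratio_only(std::size_t compressed, std::size_t uncompressed,
                         double min_ratio) {
    double ratio = static_cast<double>(compressed) / uncompressed;
    return ratio > min_ratio;
}
```

When the threshold is below 1, any page with `compressed >= uncompressed` has a ratio of at least 1 and therefore passes the ratio check too, so the first clause adds nothing. The two forms differ only when the threshold is 1 or higher, for example a ratio of exactly 1.0 against a threshold of 1.0.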
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314311878

## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ##
@@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+    if (_data.size < 4) {
+        return Status::Corruption(
+            Substitute("Compressed page's size is too small, size=$0, needed=$1",
+                       _data.size, 4));
+    }
+    _uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+    return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+    Slice compressed_slice(_data.data + 4, _data.size - 4);
+    if (compressed_slice.size == _uncompressed_bytes) {
+        // If compressed_slice's size is equal with _uncompressed_bytes, it means
+        // compressor store this directly without compression. So we just copy
+        // this to buf and return.
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+        return Status::OK();
+    }
+
+    Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+    RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+    if (uncompressed_data.size != _uncompressed_bytes) {
+        // If size after decompress didn't match recorded size, we think this
+        // page is corrupt.
+        return Status::Corruption(
+            Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+                       _uncompressed_bytes, uncompressed_data.size));
+    }
+    return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+                                std::vector* compressed_data) {
+    size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+    size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+    _buf.resize(max_compressed_bytes + 4);
+    Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+    RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+    double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
+    if (compressed_slice.size >= uncompressed_bytes ||
+            compression_ratio > _min_compression_ratio) {
+        // If compression ration is not lower enough we just copy uncompressed
+        // data to avoid decompression CPU cost
+        encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes);

Review comment:
This can be merged.
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312114 ## File path: be/src/olap/rowset/segment_v2/page_compression.h ## @@ -0,0 +1,98 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "common/status.h" +#include "util/slice.h" +#include "util/faststring.h" + +namespace doris { + +class BlockCompressionCodec; + +namespace segment_v2 { + +// Helper to decompress a compressed page. +// Compressed page is composed of +// Header | CompressedData +// Header : uncompressed data length(fixed32) +// CompressedData: binary +// Usage: +// Slice compressed_data; +// PageDecompressor decompressor(compressed_data, uncomrcodec); +// RETURN_IF_ERROR(decompressor.init()); +// std::string buf; +// buf.resize(decompressor.uncompressed_bytes()); +// RETURN_IF_ERROR(decompress_to(buf.data())); +class PageDecompressor { +public: +PageDecompressor(const Slice& data, const BlockCompressionCodec* codec) +: _data(data), _codec(codec) { +} + +// Parse and validate compressed page's header. 
+    // Only this funciton is executed successfully, uncompressed_bytes
+    // and decompress_to can be called.
+    // Return error if this page is corrupt.
+    Status init();
+
+    // Get uncompressed size in bytes of this page
+    size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+    // Decmopress compressed data into buf whose capacity must be greater than
+    // uncompressed_bytes()
+    Status decompress_to(void* buf);

Review comment:
I don't think so.
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312234 ## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ## @@ -0,0 +1,88 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/page_compression.h" + +#include "gutil/strings/substitute.h" +#include "util/block_compression.h" +#include "util/coding.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +Status PageDecompressor::init() { +if (_data.size < 4) { +return Status::Corruption( +Substitute("Compressed page's size is too small, size=$0, needed=$1", + _data.size, 4)); +} +_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data); +return Status::OK(); +} + +Status PageDecompressor::decompress_to(void* buf) { +Slice compressed_slice(_data.data + 4, _data.size - 4); +if (compressed_slice.size == _uncompressed_bytes) { +// If compressed_slice's size is equal with _uncompressed_bytes, it means +// compressor store this directly without compression. So we just copy +// this to buf and return. 
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+        return Status::OK();
+    }
+
+    Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+    RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+    if (uncompressed_data.size != _uncompressed_bytes) {
+        // If size after decompress didn't match recorded size, we think this
+        // page is corrupt.
+        return Status::Corruption(
+            Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+                       _uncompressed_bytes, uncompressed_data.size));
+    }
+    return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+                                std::vector* compressed_data) {
+    size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+    size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+    _buf.resize(max_compressed_bytes + 4);
+    Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+    RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+    double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;

Review comment:
Use integer arithmetic here so that the result is exact.
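One possible reading of this suggestion is to replace the floating-point ratio with a cross-multiplied integer comparison. The following is only a sketch under that assumption: `should_store_compressed` and the percent-based `max_percent` threshold are hypothetical names that do not appear in the PR, and 90 is used to mirror the 0.9 ratio from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): decide whether the compressed
// form is worth keeping, using integer arithmetic only.
// max_percent is the largest accepted value of compressed/uncompressed,
// expressed in percent (90 corresponds to a 0.9 ratio).
bool should_store_compressed(size_t compressed_bytes, size_t uncompressed_bytes,
                             uint32_t max_percent) {
    assert(max_percent <= 100);
    if (compressed_bytes >= uncompressed_bytes) {
        return false;  // no gain at all, keep the raw bytes
    }
    // compressed / uncompressed <= max_percent / 100, cross-multiplied so that
    // no division or rounding is involved. Page sizes fit in 32 bits (the
    // header is a fixed32), so the 64-bit products cannot overflow.
    return static_cast<uint64_t>(compressed_bytes) * 100 <=
           static_cast<uint64_t>(uncompressed_bytes) * max_percent;
}
```

With this form the boundary case (exactly 90% of the original size) always gives the same answer, regardless of how a `double` would round the ratio.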
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312451

## File path: be/src/olap/rowset/segment_v2/column_writer.cpp ##
@@ -226,10 +228,14 @@ Status ColumnWriter::_write_data_page(Page* page) {
 Status ColumnWriter::_write_physical_page(std::vector* origin_data, PagePointer* pp) {
     std::vector* output_data = origin_data;
     std::vector compressed_data;
-    // TODO(zc): support compress
-    // if (_need_compress) {
-    //     output_data = &compressed_data;
-    // }
+
+    // Put compressor out of if block, because we should use compressor's
+    // content until this function finished.
+    PageCompressor compressor(_compress_codec);
+    if (_compress_codec != nullptr) {
+        RETURN_IF_ERROR(compressor.compress(*origin_data, &compressed_data));
+        output_data = &compressed_data;
+    }
     // checksum
     uint8_t checksum_buf[sizeof(uint32_t)];

Review comment:
This is a stack array, so it cannot leak.
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314300861

## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -140,8 +142,7 @@ class ColumnWriter {
     rowid_t _next_rowid = 0;
     const EncodingInfo* _encoding_info = nullptr;
-    // const CompressionCodec* _codec = nullptr;
-    // TODO(zc): compression type
+    const BlockCompressionCodec* _compress_codec = nullptr;

Review comment:
This field is not needed. Just pass `_opts.compression_type` into PageCompressor and let it retrieve the BlockCompressionCodec.
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314264398

## File path: be/src/util/block_compression.cpp ##
@@ -319,7 +319,8 @@ class ZlibBlockCompression : public BlockCompressionCodec {
         auto zres = deflateInit(&zstrm, Z_DEFAULT_COMPRESSION);
         if (zres != Z_OK) {
             return Status::InvalidArgument(
-                Substitute("Fail to do ZLib stream compress, error=$0", zError(zres)));
+                Substitute("Fail to do ZLib stream compress, error=$0, res=",

Review comment:
```suggestion
Substitute("Fail to do ZLib stream compress, error=$0, res=$1",
```
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314299730

## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -38,6 +39,7 @@ struct ColumnWriterOptions {
     CompressionTypePB compression_type = NO_COMPRESSION;
     bool need_checksum = false;
     size_t data_page_size = 64 * 1024;
+    double min_compress_ratio = 0.9;

Review comment:
The name is confusing, because compression ratio is conventionally calculated as `uncompressed_size / compressed_size`. I think `space saving` is more appropriate here; see the definition in https://en.wikipedia.org/wiki/Data_compression_ratio
```suggestion
// store compressed page only when space saving is above the threshold.
// space saving = 1 - compressed_size / uncompressed_size
double compression_min_space_saving = 0.1;
```
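To make the distinction between the two quantities concrete, here is a minimal sketch of both definitions as given in the comment. The function names are illustrative only and are not part of the PR.

```cpp
#include <cassert>
#include <cstddef>

// Compression ratio as conventionally defined: uncompressed / compressed.
// Larger is better; 1.0 means nothing was gained.
double compression_ratio(size_t uncompressed_size, size_t compressed_size) {
    assert(compressed_size > 0);
    return static_cast<double>(uncompressed_size) / compressed_size;
}

// Space saving: 1 - compressed / uncompressed.
// Larger is better; 0.0 means nothing was gained.
double space_saving(size_t uncompressed_size, size_t compressed_size) {
    assert(uncompressed_size > 0);
    return 1.0 - static_cast<double>(compressed_size) / uncompressed_size;
}
```

For a 1000-byte page compressed to 500 bytes, the compression ratio is 2.0 and the space saving is 0.5. The `0.9` in the diff is the inverse quantity (`compressed / uncompressed`), which is why the reviewer finds the name `min_compress_ratio` misleading.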
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314264051

## File path: be/src/olap/rowset/segment_v2/segment_writer.cpp ##
@@ -63,6 +63,7 @@ Status SegmentWriter::init(uint32_t write_mbytes_per_sec) {
         DCHECK(type_info != nullptr);
         ColumnWriterOptions opts;
+        opts.compression_type = segment_v2::CompressionTypePB::LZ4F;

Review comment:
It would be better to set the default in the ColumnWriterOptions struct.
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314272255 ## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ## @@ -0,0 +1,88 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/page_compression.h" + +#include "gutil/strings/substitute.h" +#include "util/block_compression.h" +#include "util/coding.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +Status PageDecompressor::init() { +if (_data.size < 4) { +return Status::Corruption( +Substitute("Compressed page's size is too small, size=$0, needed=$1", + _data.size, 4)); +} +_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data); +return Status::OK(); +} + +Status PageDecompressor::decompress_to(void* buf) { +Slice compressed_slice(_data.data + 4, _data.size - 4); +if (compressed_slice.size == _uncompressed_bytes) { Review comment: I think this is not safe. There exists rare cases where compressed size equals uncompressed size, but buffer content has been changed by the compression process. I'd prefer to store compression type (one byte) after page data. 
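The alternative proposed in this comment, an explicit one-byte marker stored after the page data, could look roughly like the sketch below. Everything here is an assumption for illustration: `PageEncoding`, `seal_page`, and `open_page` are hypothetical names, and `std::string` stands in for the page buffer types used in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical one-byte tags; the PR's CompressionTypePB enum is not
// reproduced here.
enum class PageEncoding : uint8_t { kRaw = 0, kCompressed = 1 };

// Append the tag after the page body, so the reader never has to infer
// the encoding from sizes.
std::string seal_page(const std::string& body, PageEncoding enc) {
    std::string page = body;
    page.push_back(static_cast<char>(enc));
    return page;
}

// Split a sealed page back into body and tag.
// Returns false if the page is empty or the tag is unknown.
bool open_page(const std::string& page, std::string* body, PageEncoding* enc) {
    assert(body != nullptr && enc != nullptr);
    if (page.empty()) {
        return false;
    }
    uint8_t tag = static_cast<uint8_t>(page.back());
    if (tag > static_cast<uint8_t>(PageEncoding::kCompressed)) {
        return false;
    }
    *enc = static_cast<PageEncoding>(tag);
    body->assign(page, 0, page.size() - 1);
    return true;
}
```

The cost is one extra byte per page; the benefit is that the reader's decision no longer depends on a size comparison.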
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314311623 ## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ## @@ -0,0 +1,88 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/page_compression.h" + +#include "gutil/strings/substitute.h" +#include "util/block_compression.h" +#include "util/coding.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +Status PageDecompressor::init() { +if (_data.size < 4) { +return Status::Corruption( +Substitute("Compressed page's size is too small, size=$0, needed=$1", + _data.size, 4)); +} +_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data); +return Status::OK(); +} + +Status PageDecompressor::decompress_to(void* buf) { +Slice compressed_slice(_data.data + 4, _data.size - 4); +if (compressed_slice.size == _uncompressed_bytes) { +// If compressed_slice's size is equal with _uncompressed_bytes, it means +// compressor store this directly without compression. So we just copy +// this to buf and return. 
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);

Review comment:
Can we avoid the memcpy in this case?
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314303135

## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -38,6 +39,7 @@ struct ColumnWriterOptions {
     CompressionTypePB compression_type = NO_COMPRESSION;
     bool need_checksum = false;

Review comment:
This is not related to this PR, but I think `need_checksum` should be removed. The reader can choose to ignore or verify the checksum, but the writer should always compute and store a checksum for each page.
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314301529

## File path: be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp ##
@@ -58,6 +58,7 @@ void test_nullable_data(uint8_t* src_data, uint8_t* src_is_null, int num_rows, s
         ColumnWriterOptions writer_opts;
         writer_opts.encoding_type = encoding;
+        writer_opts.compression_type = segment_v2::CompressionTypePB::LZ4F;

Review comment:
Make LZ4F the default value of ColumnWriterOptions.compression_type so that we don't need to set it everywhere.
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314262395 ## File path: be/src/olap/rowset/segment_v2/page_compression.h ## @@ -0,0 +1,98 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "common/status.h" +#include "util/slice.h" +#include "util/faststring.h" + +namespace doris { + +class BlockCompressionCodec; + +namespace segment_v2 { + +// Helper to decompress a compressed page. +// Compressed page is composed of +// Header | CompressedData +// Header : uncompressed data length(fixed32) +// CompressedData: binary +// Usage: +// Slice compressed_data; +// PageDecompressor decompressor(compressed_data, uncomrcodec); +// RETURN_IF_ERROR(decompressor.init()); +// std::string buf; +// buf.resize(decompressor.uncompressed_bytes()); +// RETURN_IF_ERROR(decompress_to(buf.data())); +class PageDecompressor { +public: +PageDecompressor(const Slice& data, const BlockCompressionCodec* codec) +: _data(data), _codec(codec) { +} + +// Parse and validate compressed page's header. 
+    // Only this funciton is executed successfully, uncompressed_bytes
+    // and decompress_to can be called.
+    // Return error if this page is corrupt.
+    Status init();
+
+    // Get uncompressed size in bytes of this page
+    size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+    // Decmopress compressed data into buf whose capacity must be greater than
+    // uncompressed_bytes()
+    Status decompress_to(void* buf);
+private:
+    Slice _data;
+    const BlockCompressionCodec* _codec;
+    size_t _uncompressed_bytes;
+};
+
+// Helper to build a compress page.
+// Usage:
+// std:: raw_data;

Review comment:
```suggestion
// std::vector raw_data;
```
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314316890 ## File path: be/src/olap/rowset/segment_v2/page_compression.h ## @@ -0,0 +1,98 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include + +#include "common/status.h" +#include "util/slice.h" +#include "util/faststring.h" + +namespace doris { + +class BlockCompressionCodec; + +namespace segment_v2 { + +// Helper to decompress a compressed page. +// Compressed page is composed of +// Header | CompressedData +// Header : uncompressed data length(fixed32) +// CompressedData: binary +// Usage: +// Slice compressed_data; +// PageDecompressor decompressor(compressed_data, uncomrcodec); +// RETURN_IF_ERROR(decompressor.init()); +// std::string buf; +// buf.resize(decompressor.uncompressed_bytes()); +// RETURN_IF_ERROR(decompress_to(buf.data())); +class PageDecompressor { +public: +PageDecompressor(const Slice& data, const BlockCompressionCodec* codec) +: _data(data), _codec(codec) { +} + +// Parse and validate compressed page's header. 
+    // Only this funciton is executed successfully, uncompressed_bytes
+    // and decompress_to can be called.
+    // Return error if this page is corrupt.
+    Status init();
+
+    // Get uncompressed size in bytes of this page
+    size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+    // Decmopress compressed data into buf whose capacity must be greater than
+    // uncompressed_bytes()
+    Status decompress_to(void* buf);
+private:
+    Slice _data;
+    const BlockCompressionCodec* _codec;
+    size_t _uncompressed_bytes;
+};
+
+// Helper to build a compress page.
+// Usage:
+// std:: raw_data;

Review comment:
OK
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314317409

## File path: be/src/util/block_compression.cpp ##
@@ -319,7 +319,8 @@ class ZlibBlockCompression : public BlockCompressionCodec {
         auto zres = deflateInit(&zstrm, Z_DEFAULT_COMPRESSION);
         if (zres != Z_OK) {
             return Status::InvalidArgument(
-                Substitute("Fail to do ZLib stream compress, error=$0", zError(zres)));
+                Substitute("Fail to do ZLib stream compress, error=$0, res=",

Review comment:
Done.
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314320623

## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -38,6 +39,7 @@ struct ColumnWriterOptions {
     CompressionTypePB compression_type = NO_COMPRESSION;
     bool need_checksum = false;
     size_t data_page_size = 64 * 1024;
+    double min_compress_ratio = 0.9;

Review comment:
OK, I will change it.
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314322444 ## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ## @@ -0,0 +1,88 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/page_compression.h" + +#include "gutil/strings/substitute.h" +#include "util/block_compression.h" +#include "util/coding.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +Status PageDecompressor::init() { +if (_data.size < 4) { +return Status::Corruption( +Substitute("Compressed page's size is too small, size=$0, needed=$1", + _data.size, 4)); +} +_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data); +return Status::OK(); +} + +Status PageDecompressor::decompress_to(void* buf) { +Slice compressed_slice(_data.data + 4, _data.size - 4); +if (compressed_slice.size == _uncompressed_bytes) { Review comment: Actually, for this case, we will only store uncompressed data. In PageCompressor, if compressed size >= uncompressed size we will store uncompressed data. I prefer this method, because this method is easy to parse and use less space. 
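The argument in this reply can be checked with sizes alone: if the writer falls back to the raw bytes whenever the compressed size is greater than or equal to the uncompressed size, then a payload whose length equals the recorded uncompressed length is always raw. The sketch below models only that rule; `payload_size` and `is_stored_raw` are hypothetical helpers, not functions from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Writer-side rule described in the reply: keep the compressed bytes only
// when they are strictly smaller than the input, otherwise store the input.
// Returns the size of the payload that follows the 4-byte length header.
size_t payload_size(size_t uncompressed_bytes, size_t compressed_bytes) {
    return compressed_bytes >= uncompressed_bytes ? uncompressed_bytes
                                                  : compressed_bytes;
}

// Reader-side rule: a payload exactly as long as the recorded uncompressed
// length was stored raw, because a kept compressed payload is always shorter.
bool is_stored_raw(size_t payload_bytes, size_t recorded_uncompressed_bytes) {
    assert(payload_bytes <= recorded_uncompressed_bytes);
    return payload_bytes == recorded_uncompressed_bytes;
}
```

Under this rule the case raised in the review (compressed size equal to uncompressed size, but with different bytes) never reaches the disk, since the writer discards the compressed bytes in exactly that situation.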
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314322807

## File path: be/src/olap/rowset/segment_v2/segment_writer.cpp ##
@@ -63,6 +63,7 @@ Status SegmentWriter::init(uint32_t write_mbytes_per_sec) {
         DCHECK(type_info != nullptr);
         ColumnWriterOptions opts;
+        opts.compression_type = segment_v2::CompressionTypePB::LZ4F;

Review comment:
OK
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324126

## File path: be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp ##
@@ -58,6 +58,7 @@ void test_nullable_data(uint8_t* src_data, uint8_t* src_is_null, int num_rows, s
         ColumnWriterOptions writer_opts;
         writer_opts.encoding_type = encoding;
+        writer_opts.compression_type = segment_v2::CompressionTypePB::LZ4F;

Review comment:
OK
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324063
## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -140,8 +142,7 @@ class ColumnWriter {
     rowid_t _next_rowid = 0;
     const EncodingInfo* _encoding_info = nullptr;
-    // const CompressionCodec* _codec = nullptr;
-    // TODO(zc): compression type
+    const BlockCompressionCodec* _compress_codec = nullptr;
Review comment: I prefer to keep this field here. If we keep it here, we look up the codec only once; otherwise we would retrieve it for every page.
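The trade-off described in this comment can be illustrated with a minimal sketch. None of the names below come from the Doris code base: `CodecRegistry`, `Codec`, and `PageWriter` are hypothetical stand-ins, and the registry merely counts lookups so the effect of caching the pointer is observable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the real Doris types; all names are illustrative.
enum class CompressionType { kNone, kLz4f };

struct Codec {
    CompressionType type;
};

// Toy registry that counts lookups so the cost of each strategy is visible.
struct CodecRegistry {
    int lookups = 0;

    const Codec* find(CompressionType type) {
        ++lookups;
        static const Codec none{CompressionType::kNone};
        static const Codec lz4f{CompressionType::kLz4f};
        return type == CompressionType::kLz4f ? &lz4f : &none;
    }
};

class PageWriter {
public:
    PageWriter(CodecRegistry* registry, CompressionType type)
            : _registry(registry), _type(type) {}

    // Resolve the codec once and keep the pointer as a member field.
    void init() { _codec = _registry->find(_type); }

    // Every page reuses the cached pointer; no registry lookup happens here.
    // init() must have been called first.
    // Returns the codec type that would be applied to this page.
    CompressionType write_page(const std::string& page) {
        _bytes_written += page.size();
        return _codec->type;
    }

    std::size_t bytes_written() const { return _bytes_written; }

private:
    CodecRegistry* _registry;
    CompressionType _type;
    const Codec* _codec = nullptr;
    std::size_t _bytes_written = 0;
};
```

With this layout the number of registry lookups stays at one no matter how many pages are written; resolving the codec inside `write_page` would instead cost one lookup per page.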
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324798
## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -38,6 +39,7 @@ struct ColumnWriterOptions {
     CompressionTypePB compression_type = NO_COMPRESSION;
     bool need_checksum = false;
Review comment: Yeah, in the next PR I will support checksums in pages, and then I will rethink how we support checksums.
[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset
imay commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314326190
## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ##
@@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+    if (_data.size < 4) {
+        return Status::Corruption(
+            Substitute("Compressed page's size is too small, size=$0, needed=$1",
+                       _data.size, 4));
+    }
+    _uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+    return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+    Slice compressed_slice(_data.data + 4, _data.size - 4);
+    if (compressed_slice.size == _uncompressed_bytes) {
+        // If compressed_slice's size is equal with _uncompressed_bytes, it means
+        // compressor store this directly without compression. So we just copy
+        // this to buf and return.
+        memcpy(buf, compressed_slice.data, _uncompressed_bytes);
Review comment: It will be a little tricky, so I will mark it as a TODO.
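The page layout visible in the quoted diff (a 4-byte little-endian uncompressed length followed by the body, with the body stored verbatim when its size equals that length) can be sketched in standalone form. This is only an illustration under assumptions: `decompress_page` and the `DecompressFn` callback are hypothetical names, the callback stands in for whatever block codec is configured, and error handling is reduced to a boolean.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Decode a little-endian 32-bit integer from four bytes.
inline uint32_t decode_fixed32_le(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical codec callback: decompress src[0, src_len) into dst, which has
// room for exactly dst_len bytes. Returns false on failure.
using DecompressFn = std::function<bool(const uint8_t* src, std::size_t src_len,
                                        uint8_t* dst, std::size_t dst_len)>;

// Returns false if the page is too small to hold the header or the codec fails.
bool decompress_page(const std::vector<uint8_t>& page, const DecompressFn& codec,
                     std::vector<uint8_t>* out) {
    constexpr std::size_t kHeaderSize = 4;
    if (page.size() < kHeaderSize) {
        return false;  // corresponds to the Corruption status in the diff
    }
    const uint32_t uncompressed_bytes = decode_fixed32_le(page.data());
    const uint8_t* body = page.data() + kHeaderSize;
    const std::size_t body_size = page.size() - kHeaderSize;

    out->resize(uncompressed_bytes);
    if (body_size == uncompressed_bytes) {
        // The writer stored the data without compression: copy it through.
        if (body_size > 0) {
            std::memcpy(out->data(), body, body_size);
        }
        return true;
    }
    return codec(body, body_size, out->data(), out->size());
}
```

The pass-through branch relies on the writer never emitting a compressed body whose size happens to equal the uncompressed size; the sketch takes that as given, as the quoted code does.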
[GitHub] [incubator-doris] imay merged pull request #1655: Fix get label when use StreamLoad
imay merged pull request #1655: Fix get label when use StreamLoad URL: https://github.com/apache/incubator-doris/pull/1655
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314565548
## File path: be/src/olap/rowset/segment_v2/page_compression.cpp ##
@@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+    if (_data.size < 4) {
+        return Status::Corruption(
+            Substitute("Compressed page's size is too small, size=$0, needed=$1",
+                       _data.size, 4));
+    }
+    _uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+    return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+    Slice compressed_slice(_data.data + 4, _data.size - 4);
+    if (compressed_slice.size == _uncompressed_bytes) {
Review comment: Yeah, you're right
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314565890
## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -140,8 +142,7 @@ class ColumnWriter {
     rowid_t _next_rowid = 0;
     const EncodingInfo* _encoding_info = nullptr;
-    // const CompressionCodec* _codec = nullptr;
-    // TODO(zc): compression type
+    const BlockCompressionCodec* _compress_codec = nullptr;
Review comment: OK
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset
gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314566101
## File path: be/src/olap/rowset/segment_v2/column_writer.h ##
@@ -38,6 +39,7 @@ struct ColumnWriterOptions {
     CompressionTypePB compression_type = NO_COMPRESSION;
     bool need_checksum = false;
Review comment: OK
[GitHub] [incubator-doris] morningman opened a new pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999
morningman opened a new pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999 URL: https://github.com/apache/incubator-doris/pull/1658 which is between 1900-01-01 00:00:00 ~ -12-31 23:59:59, otherwise, return null
[GitHub] [incubator-doris] imay commented on a change in pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999
imay commented on a change in pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999 URL: https://github.com/apache/incubator-doris/pull/1658#discussion_r314569259
## File path: be/src/runtime/datetime_value.cpp ##
@@ -1544,6 +1544,10 @@ bool DateTimeValue::unix_timestamp(int64_t* timestamp, const std::string& timezo
 }
 bool DateTimeValue::from_unixtime(int64_t timestamp, const std::string& timezone) {
+    // timestamp should between 1900-01-01 00:00:00 ~ -12-31 23:59:59
Review comment:
```suggestion
// timestamp should between 1970-01-01 00:00:00 ~ -12-31 23:59:59
```
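The bounds named in the pull request title can be written as a small range check. This is a sketch, not the code from the patch: `kMaxUnixTimestamp` and `is_convertible_unix_timestamp` are hypothetical names. Timestamp 0 is 1970-01-01 00:00:00 UTC, which is why the suggestion above changes the year in the comment from 1900 to 1970.

```cpp
#include <cstdint>

// Upper bound quoted in the pull request title.
// 253402271999 seconds after the epoch is 9999-12-31 15:59:59 UTC,
// that is, 9999-12-31 23:59:59 in a UTC+8 time zone.
constexpr int64_t kMaxUnixTimestamp = 253402271999LL;

// Returns true if `timestamp` lies inside [0, 253402271999], the range the
// pull request says FROM_UNIXTIME should convert. Per the PR description,
// values outside the range should produce NULL.
constexpr bool is_convertible_unix_timestamp(int64_t timestamp) {
    return timestamp >= 0 && timestamp <= kMaxUnixTimestamp;
}
```

A caller would test the value first and return NULL (or `false`, in the style of `DateTimeValue::from_unixtime`) before attempting any calendar conversion.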
[GitHub] [incubator-doris] morningman closed pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start
morningman closed pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start URL: https://github.com/apache/incubator-doris/pull/1642
[GitHub] [incubator-doris] wkhappy1 opened a new pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start
wkhappy1 opened a new pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start URL: https://github.com/apache/incubator-doris/pull/1642
Add kafka_default_offsets for the case where no partition is specified. Values: OFFSET_BEGINNING, OFFSET_END
[GitHub] [incubator-doris] imay opened a new pull request #1659: Remove tempory fail UT
imay opened a new pull request #1659: Remove tempory fail UT URL: https://github.com/apache/incubator-doris/pull/1659
[GitHub] [incubator-doris] imay merged pull request #1659: Remove tempory fail UT
imay merged pull request #1659: Remove tempory fail UT URL: https://github.com/apache/incubator-doris/pull/1659
[GitHub] [incubator-doris] yuanlihan opened a new pull request #1635: Enable parsing columns from file path for Broker Load (#1582)
yuanlihan opened a new pull request #1635: Enable parsing columns from file path for Broker Load (#1582) URL: https://github.com/apache/incubator-doris/pull/1635
Currently, we do not support parsing encoded/compressed columns in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv. This patch parses columns from the file path, as Spark does (Partition Discovery). It parses partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so that the broker reader of the BE can read the value of the specified partition column. (I'm sorry to create a new PR for this issue; I am not familiar with `git rebase`.)
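The path convention in the example (`/path/to/dir/k1=1/xxx.csv` yielding column k1 with value 1) can be sketched as follows. The patch itself does this in BrokerScanNode.java; the C++ function below, `parse_columns_from_path`, is a hypothetical illustration of the idea only and assumes that every directory component of the form `key=value` names a column.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Collect `key=value` directory components from a path such as
// /path/to/dir/k1=1/xxx.csv. The file name itself is not inspected.
std::map<std::string, std::string> parse_columns_from_path(const std::string& path) {
    std::map<std::string, std::string> columns;
    const std::size_t last_slash = path.rfind('/');
    const std::string dirs =
            (last_slash == std::string::npos) ? std::string() : path.substr(0, last_slash);

    std::size_t start = 0;
    while (start <= dirs.size()) {
        std::size_t end = dirs.find('/', start);
        if (end == std::string::npos) {
            end = dirs.size();
        }
        const std::string part = dirs.substr(start, end - start);
        const std::size_t eq = part.find('=');
        // Require a non-empty key before '='; components without '=' are skipped.
        if (eq != std::string::npos && eq > 0) {
            columns[part.substr(0, eq)] = part.substr(eq + 1);
        }
        start = end + 1;
    }
    return columns;
}
```

For the path in the description, the returned map holds a single entry mapping `k1` to `1`; a path with several `key=value` directories yields one entry per directory.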
[GitHub] [incubator-doris] yuanlihan closed pull request #1635: Enable parsing columns from file path for Broker Load (#1582)
yuanlihan closed pull request #1635: Enable parsing columns from file path for Broker Load (#1582) URL: https://github.com/apache/incubator-doris/pull/1635
[GitHub] [incubator-doris] morningman merged pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start
morningman merged pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start URL: https://github.com/apache/incubator-doris/pull/1642
[GitHub] [incubator-doris] imay commented on a change in pull request #1657: Doc review
imay commented on a change in pull request #1657: Doc review URL: https://github.com/apache/incubator-doris/pull/1657#discussion_r314600486
## File path: doc_review ##
@@ -0,0 +1 @@
+see issue:https://github.com/apache/incubator-doris/issues/1656
Review comment: I think this file is useless; can you please remove it?
[GitHub] [incubator-doris] imay merged pull request #1657: Doc review
imay merged pull request #1657: Doc review URL: https://github.com/apache/incubator-doris/pull/1657