[GitHub] [incubator-doris] EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure

2019-08-15 Thread GitBox
EmmyMiao87 commented on issue #1650: Where clause can't push down into the 
right table of JOIN clasure
URL: 
https://github.com/apache/incubator-doris/issues/1650#issuecomment-521538908
 
 
   A WHERE predicate is not pushed down when it references the outer-joined 
table of an outer join.
   For example: `select * from a left join b on a.id=b.id where b.id=1;` 
   In some cases, however, such a predicate can be pushed down.
   Impala does this, for example:  
   Query: explain select * from customer_address a left join customer_address b on a.ca_address_sk=b.ca_address_sk where b.ca_address_sk=1
   +------------------------------------------------------------------------------------+
   | Explain String                                                                     |
   +------------------------------------------------------------------------------------+
   | Estimated Per-Host Requirements: Memory=0B VCores=2                                |
   | WARNING: The following tables are missing relevant table and/or column statistics. |
   | tpcds_hive.customer_address                                                        |
   |                                                                                    |
   | PLAN-ROOT SINK                                                                     |
   | |                                                                                  |
   | 04:EXCHANGE [UNPARTITIONED]                                                        |
   | |                                                                                  |
   | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                          |
   | |  hash predicates: a.ca_address_sk = b.ca_address_sk                              |
   | |  other predicates: b.ca_address_sk = 1                                           |
   | |                                                                                  |
   | |--03:EXCHANGE [BROADCAST]                                                         |
   | |  |                                                                               |
   | |  01:SCAN HDFS [tpcds_hive.customer_address b]                                    |
   | |     partitions=1/1 files=0 size=0B                                               |
   | |     predicates: b.ca_address_sk = 1                                              |
   | |                                                                                  |
   | 00:SCAN HDFS [tpcds_hive.customer_address a]                                       |
   |    partitions=1/1 files=0 size=0B                                                  |
   +------------------------------------------------------------------------------------+
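
   In the Impala plan, `b.ca_address_sk = 1` appears both on the scan of `b` and as an "other predicate" on the join. A small C++ sketch of why that is safe follows. It is not Doris or Impala code; `Table`, `JoinedRow` and all function names are hypothetical, and each table is reduced to a single id column. The predicate rejects NULL-extended rows, so filtering `b` before the join gives the same result as filtering only afterwards, as long as the predicate is also kept above the join.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-column tables: each row is just its id.
using Table = std::vector<int>;
// A joined row: the left id plus an optional right id (nullopt = NULL-extended).
using JoinedRow = std::pair<int, std::optional<int>>;

// Naive left outer join on a.id = b.id.
std::vector<JoinedRow> left_outer_join(const Table& a, const Table& b) {
    std::vector<JoinedRow> out;
    for (int a_id : a) {
        bool matched = false;
        for (int b_id : b) {
            if (a_id == b_id) {
                out.emplace_back(a_id, b_id);
                matched = true;
            }
        }
        if (!matched) {
            out.emplace_back(a_id, std::nullopt);  // row from the NULL-supplying side
        }
    }
    return out;
}

// WHERE b.id = value, evaluated after the join. NULL never compares equal,
// so NULL-extended rows are dropped.
std::vector<JoinedRow> filter_right_id(const std::vector<JoinedRow>& rows, int value) {
    std::vector<JoinedRow> out;
    for (const auto& row : rows) {
        if (row.second.has_value() && *row.second == value) {
            out.push_back(row);
        }
    }
    return out;
}

// Plan 1: join everything, then filter.
std::vector<JoinedRow> plan_filter_after_join(const Table& a, const Table& b, int value) {
    return filter_right_id(left_outer_join(a, b), value);
}

// Plan 2: filter b at scan time, join, then apply the same predicate again.
std::vector<JoinedRow> plan_pushdown(const Table& a, const Table& b, int value) {
    Table b_filtered;
    for (int b_id : b) {
        if (b_id == value) {
            b_filtered.push_back(b_id);
        }
    }
    return filter_right_id(left_outer_join(a, b_filtered), value);
}

void check_pushdown_equivalence() {
    Table a{1, 2, 3};
    Table b{1, 1, 4};
    assert(plan_filter_after_join(a, b, 1) == plan_pushdown(a, b, 1));
    // Without the post-join filter, the pushed-down plan would still carry
    // NULL-extended rows for a.id = 2 and a.id = 3.
    assert(left_outer_join(a, Table{1, 1}).size() == 4);
}
```

   Calling `check_pushdown_equivalence()` exercises both plans on a tiny data set. The second assertion shows why the predicate cannot simply be moved to the scan: it has to be evaluated above the join as well.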


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay opened a new issue #1651: Support checksum in BetaRowset

2019-08-15 Thread GitBox
imay opened a new issue #1651: Support checksum in BetaRowset
URL: https://github.com/apache/incubator-doris/issues/1651
 
 
   We should add checksum for BetaRowset.





[GitHub] [incubator-doris] imay opened a new issue #1652: support filter with predicate in BetaRowset

2019-08-15 Thread GitBox
imay opened a new issue #1652: support filter with predicate in BetaRowset
URL: https://github.com/apache/incubator-doris/issues/1652
 
 
   Support predicate filter in BetaRowset iterator.
   
   It's better if we can support lazy materialization.





[GitHub] [incubator-doris] imay closed issue #1391: Add column page cache

2019-08-15 Thread GitBox
imay closed issue #1391: Add column page cache
URL: https://github.com/apache/incubator-doris/issues/1391
 
 
   





[GitHub] [incubator-doris] morningman commented on a change in pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start

2019-08-15 Thread GitBox
morningman commented on a change in pull request #1642: add 
kafka_default_offsets  when no partiotion specify .support read kafka partition 
from start
URL: https://github.com/apache/incubator-doris/pull/1642#discussion_r314215358
 
 

 ##
 File path: 
fe/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java
 ##
 @@ -128,6 +132,9 @@ private void convertCustomProperties() throws DdlException {
  convertedCustomProperties.put(entry.getKey(), entry.getValue());
  }
  }
+if (convertedCustomProperties.containsKey(KAFKA_DEFAULT_OFFSETS)) {
+kafkaDefaultOffSet = convertedCustomProperties.get(KAFKA_DEFAULT_OFFSETS);
 
 Review comment:
   ```suggestion
   kafkaDefaultOffSet = convertedCustomProperties.remove(KAFKA_DEFAULT_OFFSETS);
   ```
   
   convertedCustomProperties may only contain properties that librdkafka 
recognizes, so this default offset needs to be removed separately.
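
   The difference between reading and removing the entry can be sketched in C++ for illustration (the code under review is Java). Everything here is hypothetical apart from `group.id`, which is a standard librdkafka property: the key string is assumed from the PR title, and `take_property` and the sample value are made up for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using Properties = std::unordered_map<std::string, std::string>;

// Assumed key name, taken from the PR title; the real constant may differ.
const std::string kKafkaDefaultOffsets = "kafka_default_offsets";

// Remove `key` from `props` and return its value if it was present.
// This is the analogue of Java's Map.remove(key).
std::optional<std::string> take_property(Properties& props, const std::string& key) {
    auto it = props.find(key);
    if (it == props.end()) {
        return std::nullopt;
    }
    std::string value = std::move(it->second);
    props.erase(it);
    return value;
}

void check_take_property() {
    Properties props = {
        {"group.id", "doris"},                        // understood by librdkafka
        {kKafkaDefaultOffsets, "OFFSET_BEGINNING"},   // hypothetical value
    };
    std::optional<std::string> offsets = take_property(props, kKafkaDefaultOffsets);
    assert(offsets.has_value() && *offsets == "OFFSET_BEGINNING");
    // Only pass-through properties remain in the map.
    assert(props.count(kKafkaDefaultOffsets) == 0);
    assert(props.size() == 1);
}
```

   After `take_property` the map holds only entries meant for the Kafka client, which is the effect the suggested `remove` call has.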





[GitHub] [incubator-doris] EmmyMiao87 commented on issue #1650: Where clause can't push down into the right table of JOIN clasure

2019-08-15 Thread GitBox
EmmyMiao87 commented on issue #1650: Where clause can't push down into the 
right table of JOIN clasure
URL: 
https://github.com/apache/incubator-doris/issues/1650#issuecomment-521565573
 
 
   The article at 
https://www.ibm.com/developerworks/data/library/techarticle/purcell/0201purcell.html
 provides theoretical support for a WHERE clause predicate applied to the 
NULL-supplying table.





[GitHub] [incubator-doris] EmmyMiao87 commented on issue #353: Support loading data from Kafka

2019-08-15 Thread GitBox
EmmyMiao87 commented on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572294
 
 
   #967 





[GitHub] [incubator-doris] EmmyMiao87 commented on issue #353: Support loading data from Kafka

2019-08-15 Thread GitBox
EmmyMiao87 commented on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572468
 
 
   #1650 scheduler routine load job for stream load 





[GitHub] [incubator-doris] EmmyMiao87 removed a comment on issue #353: Support loading data from Kafka

2019-08-15 Thread GitBox
EmmyMiao87 removed a comment on issue #353: Support loading data from Kafka
URL: https://github.com/apache/incubator-doris/issues/353#issuecomment-521572468
 
 
   #1650 scheduler routine load job for stream load 





[GitHub] [incubator-doris] yiguolei opened a new pull request #1653: Use same dir during schema change

2019-08-15 Thread GitBox
yiguolei opened a new pull request #1653: Use same dir during schema change
URL: https://github.com/apache/incubator-doris/pull/1653
 
 
   1. Use same dir during schema change
   2. Should start column id from base tablet's unique id





[GitHub] [incubator-doris] morningman merged pull request #1653: Use same dir during schema change

2019-08-15 Thread GitBox
morningman merged pull request #1653: Use same dir during schema change
URL: https://github.com/apache/incubator-doris/pull/1653
 
 
   





[GitHub] [incubator-doris] morningman closed pull request #1613: Refactor alter job process

2019-08-15 Thread GitBox
morningman closed pull request #1613: Refactor alter job process
URL: https://github.com/apache/incubator-doris/pull/1613
 
 
   





[GitHub] [incubator-doris] DDDDDDouble opened a new issue #1654: When we use streamload, we can't get the label in the log

2019-08-15 Thread GitBox
DDouble opened a new issue #1654: When we use streamload, we can't get the 
label in the log
URL: https://github.com/apache/incubator-doris/issues/1654
 
 
   @imay Hi, I will submit a PR later. Below is my log output:
   
   less log/fe.log
   2019-08-15 11:04:11,494 INFO 161 [LoadAction.executeWithoutPassword():139] redirect load action to destination=TNetworkAddress(hostname:10.26.43.136, port:8040), stream: true, db: midas_report, tbl: MCD_CampaignReportDate, label: null





[GitHub] [incubator-doris] DDDDDDouble opened a new pull request #1655: Fix get label when use StreamLoad

2019-08-15 Thread GitBox
DDouble opened a new pull request #1655: Fix get label when use StreamLoad
URL: https://github.com/apache/incubator-doris/pull/1655
 
 
   For https://github.com/apache/incubator-doris/issues/1654





[GitHub] [incubator-doris] xonze closed issue #1649: After upgrading from 0.10.12 to 0.10.15, FE cannot elect a master properly

2019-08-15 Thread GitBox
xonze closed issue #1649: After upgrading from 0.10.12 to 0.10.15, FE cannot elect a master properly
URL: https://github.com/apache/incubator-doris/issues/1649
 
 
   





[GitHub] [incubator-doris] xonze commented on issue #1649: After upgrading from 0.10.12 to 0.10.15, FE cannot elect a master properly

2019-08-15 Thread GitBox
xonze commented on issue #1649: After upgrading from 0.10.12 to 0.10.15, FE cannot elect a master properly
URL: 
https://github.com/apache/incubator-doris/issues/1649#issuecomment-521620570
 
 
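   Testing showed the cause was that I had switched to OpenJDK after the upgrade; after changing to the official Oracle JDK version "1.8.0_211", everything works as before.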





[GitHub] [incubator-doris] manannan2017 opened a new issue #1656: the sql-reference doc review,i have some doubts

2019-08-15 Thread GitBox
manannan2017 opened a new issue #1656: the sql-reference doc review,i have some 
doubts
URL: https://github.com/apache/incubator-doris/issues/1656
 
 
   I. sql-reference on GitHub
   1. The description of dayofweek(DATETIME date) is unclear
   Current text: the parameter is of type Date or Datetime. Correction: a number that can be cast to date is also accepted.
   
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/date-time-functions/dayofweek.md
   
   2. ST_Polygon: suggest adding a note that the polygon's edges must not cross
   The keyword cannot be used:
   
![image](https://user-images.githubusercontent.com/33174388/63094043-d8d12f00-bf99-11e9-923c-69cf377e93a4.png)
   
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/spatial-functions/st_polygon.md
   
   3. Suggest documenting the maximum radius of a circle (beyond 8 digits the result is no longer accurate)
   SELECT ST_AsText(ST_Circle(111, -64, 123456789));
   
Link: https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/spatial-functions/st_circle.md
   
   II. Doris official website
   1. Import: the separator does not match the description
   
![image](https://user-images.githubusercontent.com/33174388/63094093-03bb8300-bf9a-11e9-86d8-7cf539d4935f.png)
   
   Link: http://doris.apache.org/Docs/cn/getting-started/basic-usage.html#id10
   
   2. Incorrect description: the int type occupies 4 bytes
   
![image](https://user-images.githubusercontent.com/33174388/63094111-103fdb80-bf9a-11e9-9275-c204d44a86c8.png)
   Link: http://doris.apache.org/Docs/cn/getting-started/data-model-rollup.html
   
   3. Typo
   
![image](https://user-images.githubusercontent.com/33174388/63094137-2352ab80-bf9a-11e9-8d3a-42e9d977a5dd.png)
   
   Link: http://doris.apache.org/Docs/cn/getting-started/data-model-rollup.html
   
   4. Incorrect numbering
   
![image](https://user-images.githubusercontent.com/33174388/63094165-32395e00-bf9a-11e9-94ee-40ad09caf061.png)
   
   Link: http://doris.apache.org/Docs/cn/getting-started/best-practice.html
   
   5. Data partitioning: typo
   
![image](https://user-images.githubusercontent.com/33174388/63094180-3ebdb680-bf9a-11e9-8f93-27794cd8e61d.png)
   
   
   III. Help documentation in the command line
   1. help create table:
   
![image](https://user-images.githubusercontent.com/33174388/63094206-4aa97880-bf9a-11e9-9636-5845574d8d6b.png)
   
   2. Is the range correct?
   
![image](https://user-images.githubusercontent.com/33174388/63094222-572dd100-bf9a-11e9-9192-6618af764e58.png)
   
   3. help mini load
   
![image](https://user-images.githubusercontent.com/33174388/63094243-6280fc80-bf9a-11e9-9e05-313e81740feb.png)
   
   4. help largeint
   
![image](https://user-images.githubusercontent.com/33174388/63094251-69a80a80-bf9a-11e9-96f0-ab6a69576f5f.png)
   
   5. help create file
   
![image](https://user-images.githubusercontent.com/33174388/63094266-7167af00-bf9a-11e9-8fd3-3dda7f5b80c7.png)
   
   





[GitHub] [incubator-doris] manannan2017 opened a new pull request #1657: Doc review

2019-08-15 Thread GitBox
manannan2017 opened a new pull request #1657: Doc review
URL: https://github.com/apache/incubator-doris/pull/1657
 
 
   A review of the sql-reference docs; I have some doubts. Please take a look, 
and thanks for your hard work.





[GitHub] [incubator-doris] imay commented on issue #1656: the sql-reference doc review,i have some doubts

2019-08-15 Thread GitBox
imay commented on issue #1656: the sql-reference doc review,i have some doubts
URL: 
https://github.com/apache/incubator-doris/issues/1656#issuecomment-521637670
 
 
   Could you make these changes and submit a PR to fix them?





[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
chaoyli commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314308435
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal to _uncompressed_bytes, the
+// compressor stored the data directly without compression, so we just
+// copy it to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+return Status::OK();
+}
+
+Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+if (uncompressed_data.size != _uncompressed_bytes) {
+// If size after decompress didn't match recorded size, we think this
+// page is corrupt.
+return Status::Corruption(
+Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+   _uncompressed_bytes, uncompressed_data.size));
+}
+return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+std::vector* compressed_data) {
+size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+_buf.resize(max_compressed_bytes + 4);
+Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
+if (compressed_slice.size >= uncompressed_bytes ||
+compression_ratio > _min_compression_ratio) {
+// If the compression ratio is not low enough, we just copy the uncompressed
+// data to avoid the decompression CPU cost
+encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes);
 
 Review comment:
   `encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes)` seems 
redundant with line 81.





[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
chaoyli commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314298476
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.cpp
 ##
 @@ -226,10 +228,14 @@ Status ColumnWriter::_write_data_page(Page* page) {
 Status ColumnWriter::_write_physical_page(std::vector* origin_data, PagePointer* pp) {
 std::vector* output_data = origin_data;
 std::vector compressed_data;
-// TODO(zc): support compress
-// if (_need_compress) {
-// output_data = &compressed_data;
-// }
+
+// Put compressor out of if block, because we should use compressor's
+// content until this function finished.
+PageCompressor compressor(_compress_codec);
+if (_compress_codec != nullptr) {
+RETURN_IF_ERROR(compressor.compress(*origin_data, &compressed_data));
+output_data = &compressed_data;
+}
 
 // checksum
 uint8_t checksum_buf[sizeof(uint32_t)];
 
 Review comment:
   If `_opts.need_checksum` is false, `checksum_buf` will be leaked.





[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
chaoyli commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314304332
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.h
 ##
 @@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/slice.h"
+#include "util/faststring.h"
+
+namespace doris {
+
+class BlockCompressionCodec;
+
+namespace segment_v2 {
+
+// Helper to decompress a compressed page.
+// Compressed page is composed of
+// Header | CompressedData
+// Header : uncompressed data length(fixed32)
+// CompressedData: binary
+// Usage: 
+//  Slice compressed_data;
+//  PageDecompressor decompressor(compressed_data, uncomrcodec);
+//  RETURN_IF_ERROR(decompressor.init());
+//  std::string buf;
+//  buf.resize(decompressor.uncompressed_bytes());
+//  RETURN_IF_ERROR(decompressor.decompress_to(buf.data()));
+class PageDecompressor {
+public:
+PageDecompressor(const Slice& data, const BlockCompressionCodec* codec)
+: _data(data), _codec(codec) {
+}
+
+// Parse and validate compressed page's header.
+// Only after this function has executed successfully can
+// uncompressed_bytes and decompress_to be called.
+// Return error if this page is corrupt.
+Status init();
+
+// Get uncompressed size in bytes of this page
+size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+// Decompress compressed data into buf, whose capacity must be at least
+// uncompressed_bytes()
+Status decompress_to(void* buf);
 
 Review comment:
   Would `decompress` be a better name?
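
   The page layout described in the header comment (a fixed32 length header followed by the payload) can be sketched independently of Doris. The helpers below are stand-ins written for this example, not the `encode_fixed32_le`/`decode_fixed32_le` functions from `util/coding.h`; they assume the little-endian byte order implied by the `_le` suffix, and `make_page` and `read_uncompressed_bytes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kHeaderSize = 4;  // fixed32 uncompressed length

// Write `value` as 4 little-endian bytes into `dst`.
void put_fixed32_le(std::uint8_t* dst, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Read 4 little-endian bytes from `src`.
std::uint32_t get_fixed32_le(const std::uint8_t* src) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    }
    return value;
}

// Build a page: Header (uncompressed length) | payload bytes.
std::vector<std::uint8_t> make_page(std::uint32_t uncompressed_bytes,
                                    const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> page(kHeaderSize + payload.size());
    put_fixed32_le(page.data(), uncompressed_bytes);
    std::copy(payload.begin(), payload.end(),
              page.begin() + static_cast<std::ptrdiff_t>(kHeaderSize));
    return page;
}

// Return the recorded uncompressed length, or nullopt if the page is too
// short to hold a header (the case init() reports as corruption).
std::optional<std::uint32_t> read_uncompressed_bytes(const std::vector<std::uint8_t>& page) {
    if (page.size() < kHeaderSize) {
        return std::nullopt;
    }
    return get_fixed32_le(page.data());
}

void check_page_header() {
    std::vector<std::uint8_t> page = make_page(300, {1, 2, 3});
    assert(page.size() == 7);
    assert(page[0] == 0x2C && page[1] == 0x01);  // 300 == 0x0000012C, lowest byte first
    assert(read_uncompressed_bytes(page).value() == 300);
    assert(!read_uncompressed_bytes({0, 0, 0}).has_value());
}
```

   A reader would call `read_uncompressed_bytes` first, size its output buffer from the result, and only then decompress the bytes after the header, which is the order the usage comment in the header file prescribes.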





[GitHub] [incubator-doris] chaoyli commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
chaoyli commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314304011
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal to _uncompressed_bytes, the
+// compressor stored the data directly without compression, so we just
+// copy it to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+return Status::OK();
+}
+
+Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+if (uncompressed_data.size != _uncompressed_bytes) {
+// If size after decompress didn't match recorded size, we think this
+// page is corrupt.
+return Status::Corruption(
+Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+   _uncompressed_bytes, uncompressed_data.size));
+}
+return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+std::vector* compressed_data) {
+size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+_buf.resize(max_compressed_bytes + 4);
+Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
 
 Review comment:
   Isn't `compressed_slice.size >= uncompressed_bytes` already covered by 
`compression_ratio > _min_compression_ratio`?

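   Whether the first condition is redundant depends on the threshold. A small sketch, with hypothetical function names and the assumption that the uncompressed size is non-zero: when the threshold is below 1.0 the ratio test alone already catches every page that failed to shrink, but with a threshold of 1.0 or more the two tests can disagree.

```cpp
#include <cassert>
#include <cstddef>

// The condition as written in the diff: fall back to storing the raw bytes
// if compression did not shrink the data or the ratio exceeds the threshold.
bool store_uncompressed(std::size_t compressed, std::size_t uncompressed,
                        double min_compression_ratio) {
    assert(uncompressed > 0);  // assumption of this sketch: no empty pages
    double ratio = static_cast<double>(compressed) / static_cast<double>(uncompressed);
    return compressed >= uncompressed || ratio > min_compression_ratio;
}

// The same decision using the ratio test alone.
bool store_uncompressed_ratio_only(std::size_t compressed, std::size_t uncompressed,
                                   double min_compression_ratio) {
    assert(uncompressed > 0);
    double ratio = static_cast<double>(compressed) / static_cast<double>(uncompressed);
    return ratio > min_compression_ratio;
}

void check_ratio_conditions() {
    // Threshold below 1.0: both forms agree, so the size check is redundant.
    assert(store_uncompressed(100, 100, 0.9) == store_uncompressed_ratio_only(100, 100, 0.9));
    assert(store_uncompressed(95, 100, 0.9) == store_uncompressed_ratio_only(95, 100, 0.9));
    assert(store_uncompressed(50, 100, 0.9) == store_uncompressed_ratio_only(50, 100, 0.9));
    // Threshold of exactly 1.0: a page that did not shrink has ratio 1.0,
    // which is not greater than 1.0, so only the size check catches it.
    assert(store_uncompressed(100, 100, 1.0));
    assert(!store_uncompressed_ratio_only(100, 100, 1.0));
}
```

   So the explicit size comparison only matters if `_min_compression_ratio` can be configured to 1.0 or higher; the diff does not show its value.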




[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314311878
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal with _uncompressed_bytes, it means
+// compressor store this directly without compression. So we just copy
+// this to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+return Status::OK();
+}
+
+Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+if (uncompressed_data.size != _uncompressed_bytes) {
+// If size after decompress didn't match recorded size, we think this
+// page is corrupt.
+return Status::Corruption(
+Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+   _uncompressed_bytes, uncompressed_data.size));
+}
+return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+std::vector* compressed_data) {
+size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+_buf.resize(max_compressed_bytes + 4);
+Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
+if (compressed_slice.size >= uncompressed_bytes ||
+compression_ratio > _min_compression_ratio) {
+// If compression ration is not lower enough we just copy uncompressed
+// data to avoid decompression CPU cost
+encode_fixed32_le((uint8_t*)_buf.data(), uncompressed_bytes);
 
 Review comment:
   This can be merged.





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312114
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.h
 ##
 @@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/slice.h"
+#include "util/faststring.h"
+
+namespace doris {
+
+class BlockCompressionCodec;
+
+namespace segment_v2 {
+
+// Helper to decompress a compressed page.
+// Compressed page is composed of
+// Header | CompressedData
+// Header : uncompressed data length(fixed32)
+// CompressedData: binary
+// Usage: 
+//  Slice compressed_data;
+//  PageDecompressor decompressor(compressed_data, uncomrcodec);
+//  RETURN_IF_ERROR(decompressor.init());
+//  std::string buf;
+//  buf.resize(decompressor.uncompressed_bytes());
+//  RETURN_IF_ERROR(decompress_to(buf.data()));
+class PageDecompressor {
+public:
+PageDecompressor(const Slice& data, const BlockCompressionCodec* codec)
+: _data(data), _codec(codec) {
+}
+
+// Parse and validate compressed page's header.
+// Only this funciton is executed successfully, uncompressed_bytes
+// and decompress_to can be called.
+// Return error if this page is corrupt.
+Status init();
+
+// Get uncompressed size in bytes of this page
+size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+// Decmopress compressed data into buf whose capacity must be greater than
+// uncompressed_bytes()
+Status decompress_to(void* buf);
 
 Review comment:
   I don't think so





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312234
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal with _uncompressed_bytes, it means
+// compressor store this directly without compression. So we just copy
+// this to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
+return Status::OK();
+}
+
+Slice uncompressed_data((char*)buf, _uncompressed_bytes);
+RETURN_IF_ERROR(_codec->decompress(compressed_slice, &uncompressed_data));
+if (uncompressed_data.size != _uncompressed_bytes) {
+// If size after decompress didn't match recorded size, we think this
+// page is corrupt.
+return Status::Corruption(
+Substitute("Uncompressed size not match, record=$0 vs decompress=$1",
+   _uncompressed_bytes, uncompressed_data.size));
+}
+return Status::OK();
+}
+
+Status PageCompressor::compress(const std::vector& raw_data,
+std::vector* compressed_data) {
+size_t uncompressed_bytes = Slice::compute_total_size(raw_data);
+size_t max_compressed_bytes = _codec->max_compressed_len(uncompressed_bytes);
+_buf.resize(max_compressed_bytes + 4);
+Slice compressed_slice(_buf.data() + 4, max_compressed_bytes);
+RETURN_IF_ERROR(_codec->compress(raw_data, &compressed_slice));
+double compression_ratio = (double)compressed_slice.size / uncompressed_bytes;
 
 Review comment:
   Use integer arithmetic here so the comparison is exact.
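
One way to read this suggestion is to compare cross-multiplied integers instead of dividing doubles. The sketch below assumes the threshold is expressed as a whole percentage; the function name `ratio_exceeds_percent` and the percentage parameter are illustrative and do not come from the PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when compressed / uncompressed > max_percent / 100,
// evaluated without floating point. The 64-bit products cannot overflow
// for 32-bit page sizes.
bool ratio_exceeds_percent(std::uint32_t compressed_bytes,
                           std::uint32_t uncompressed_bytes,
                           std::uint32_t max_percent) {
    assert(max_percent <= 100);
    std::uint64_t lhs = static_cast<std::uint64_t>(compressed_bytes) * 100;
    std::uint64_t rhs = static_cast<std::uint64_t>(uncompressed_bytes) * max_percent;
    return lhs > rhs;
}
```

With this form the boundary case (for example 90 of 100 bytes against a 90 percent limit) has a single well-defined answer that does not depend on floating-point rounding.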





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314312451
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.cpp
 ##
 @@ -226,10 +228,14 @@ Status ColumnWriter::_write_data_page(Page* page) {
 Status ColumnWriter::_write_physical_page(std::vector* origin_data, PagePointer* pp) {
 std::vector* output_data = origin_data;
 std::vector compressed_data;
-// TODO(zc): support compress
-// if (_need_compress) {
-// output_data = &compressed_data;
-// }
+
+// Put compressor out of if block, because we should use compressor's
+// content until this function finished.
+PageCompressor compressor(_compress_codec);
+if (_compress_codec != nullptr) {
+RETURN_IF_ERROR(compressor.compress(*origin_data, &compressed_data));
+output_data = &compressed_data;
+}
 
 // checksum
 uint8_t checksum_buf[sizeof(uint32_t)];
 
 Review comment:
   This is a stack array, so it cannot leak.





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314300861
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -140,8 +142,7 @@ class ColumnWriter {
 rowid_t _next_rowid = 0;
 
 const EncodingInfo* _encoding_info = nullptr;
-// const CompressionCodec* _codec = nullptr;
-// TODO(zc): compression type
+const BlockCompressionCodec* _compress_codec = nullptr;
 
 Review comment:
   This field is not needed. Just pass `_opts.compression_type` into 
PageCompressor and let it retrieve BlockCompressionCodec.





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314264398
 
 

 ##
 File path: be/src/util/block_compression.cpp
 ##
 @@ -319,7 +319,8 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 auto zres = deflateInit(&zstrm, Z_DEFAULT_COMPRESSION);
 if (zres != Z_OK) {
 return Status::InvalidArgument(
-Substitute("Fail to do ZLib stream compress, error=$0", zError(zres)));
+Substitute("Fail to do ZLib stream compress, error=$0, res=",
 
 Review comment:
   ```suggestion
   Substitute("Fail to do ZLib stream compress, error=$0, res=$1",
   ```





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314299730
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -38,6 +39,7 @@ struct ColumnWriterOptions {
 CompressionTypePB compression_type = NO_COMPRESSION;
 bool need_checksum = false;
 size_t data_page_size = 64 * 1024;
+double min_compress_ratio = 0.9;
 
 Review comment:
   The name is confusing because compression ratio is conventionally calculated 
as `uncompressed_size / compressed_size`. I think "space saving" is the more 
appropriate term here; see the definition at 
https://en.wikipedia.org/wiki/Data_compression_ratio 
   
   ```suggestion
   // store compressed page only when space saving is above the threshold.
   // space saving = 1 - compressed_size / uncompressed_size
   double compression_min_space_saving = 0.1;
   ```
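
As a worked illustration of the definition in the suggestion above, here is a minimal sketch. The helper names `space_saving` and `store_compressed` are assumptions made for this example and are not taken from the PR:

```cpp
#include <cassert>
#include <cstddef>

// Space saving as defined in the suggestion:
// 1 - compressed_size / uncompressed_size.
double space_saving(std::size_t compressed_bytes, std::size_t uncompressed_bytes) {
    assert(uncompressed_bytes > 0);
    return 1.0 - static_cast<double>(compressed_bytes) / uncompressed_bytes;
}

// Hypothetical helper: keep the compressed page only when the saving is
// above the threshold.
bool store_compressed(std::size_t compressed_bytes,
                      std::size_t uncompressed_bytes,
                      double min_space_saving) {
    return space_saving(compressed_bytes, uncompressed_bytes) > min_space_saving;
}
```

A page that shrinks from 100 to 50 bytes has a space saving of 0.5 and would be stored compressed under a 0.1 threshold; one that only shrinks to 95 bytes (saving 0.05) would be stored uncompressed.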





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314264051
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/segment_writer.cpp
 ##
 @@ -63,6 +63,7 @@ Status SegmentWriter::init(uint32_t write_mbytes_per_sec) {
 DCHECK(type_info != nullptr);
 
 ColumnWriterOptions opts;
+opts.compression_type = segment_v2::CompressionTypePB::LZ4F;
 
 Review comment:
   Better to set the default in the ColumnWriterOptions struct.
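
A minimal sketch of what this could look like, using simplified stand-in types. The enum and struct below are illustrative only; they are not the real `CompressionTypePB` or `ColumnWriterOptions` definitions:

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the protobuf enum used in the diff.
enum class CompressionType { NO_COMPRESSION, LZ4F };

// Simplified stand-in for the options struct. The default lives in the
// struct itself, so callers only set the field when they want something else.
struct WriterOptions {
    CompressionType compression_type = CompressionType::LZ4F;
    bool need_checksum = false;
    std::size_t data_page_size = 64 * 1024;
};
```

With the default member initializer in place, call sites such as `SegmentWriter::init` would not need to assign `compression_type` explicitly.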





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314272255
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
 
 Review comment:
   I don't think this is safe. There are rare cases where the compressed size 
equals the uncompressed size but the buffer content has been changed by the 
compression process. I'd prefer to store the compression type (one byte) after 
the page data.
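
The alternative proposed here could be sketched as follows: the page carries an explicit one-byte tag after its payload, so the reader never has to infer the encoding from sizes. The tag values and the function names `seal_page` and `open_page` are hypothetical and only illustrate the idea:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative codec tags; not the real CompressionTypePB values.
enum PageCodec : std::uint8_t { kRaw = 0, kCompressed = 1 };

// Append a one-byte codec tag after the page payload.
std::string seal_page(const std::string& payload, PageCodec codec) {
    std::string page = payload;
    page.push_back(static_cast<char>(codec));
    return page;
}

// Read the tag back and strip it from the payload.
// Returns false if the page is too short to contain a tag.
bool open_page(const std::string& page, std::string* payload, PageCodec* codec) {
    if (page.empty()) {
        return false;
    }
    *codec = static_cast<PageCodec>(static_cast<std::uint8_t>(page.back()));
    payload->assign(page, 0, page.size() - 1);
    return true;
}
```

The cost of this design is one extra byte per page; in exchange, the decision to decompress no longer depends on comparing sizes.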





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314311623
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal with _uncompressed_bytes, it means
+// compressor store this directly without compression. So we just copy
+// this to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
 
 Review comment:
   Can we avoid the memcpy in this case?
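
One possible way to avoid the copy, sketched under the assumption that the caller can work with a view into the original page buffer: when the payload was stored raw, return a (pointer, length) pair instead of copying. `ByteView` and `try_view_raw` are hypothetical names for this illustration and do not reflect the real `Slice` or `PageDecompressor` API:

```cpp
#include <cassert>
#include <cstddef>

// Minimal stand-in for a (pointer, length) slice; not the real doris::Slice.
struct ByteView {
    const char* data;
    std::size_t size;
};

// If the payload was stored raw (its size equals the recorded uncompressed
// size), point `out` at the existing bytes and return true. Otherwise
// return false, and the caller still has to run the decompressor.
// The view is only valid while the underlying page buffer is alive.
bool try_view_raw(ByteView payload, std::size_t uncompressed_bytes, ByteView* out) {
    assert(out != nullptr);
    if (payload.size != uncompressed_bytes) {
        return false;
    }
    *out = payload;
    return true;
}
```

The trade-off is lifetime management: the caller must keep the page buffer alive for as long as the view is in use.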





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314303135
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -38,6 +39,7 @@ struct ColumnWriterOptions {
 CompressionTypePB compression_type = NO_COMPRESSION;
 bool need_checksum = false;
 
 Review comment:
   It's not related to this PR, but I think `need_checksum` should be removed. 
The reader can choose to ignore or verify the checksum, but the writer should 
always compute and store a checksum for each page.
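
The split of responsibilities described above can be sketched as follows. Adler-32 is used here only as a simple stand-in; the checksum actually used by the writer is not shown in this thread, and `write_page` / `read_page` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Adler-32, used here only as a simple stand-in checksum.
std::uint32_t adler32(const std::string& data) {
    std::uint32_t a = 1, b = 0;
    for (unsigned char c : data) {
        a = (a + c) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// Writer side: always append the checksum (4 bytes, little-endian).
std::string write_page(const std::string& body) {
    std::uint32_t sum = adler32(body);
    std::string page = body;
    for (int i = 0; i < 4; ++i) {
        page.push_back(static_cast<char>((sum >> (8 * i)) & 0xff));
    }
    return page;
}

// Reader side: verification is optional and chosen per read.
// Returns false if the page is too short or verification fails.
bool read_page(const std::string& page, bool verify_checksum, std::string* body) {
    if (page.size() < 4) {
        return false;
    }
    body->assign(page, 0, page.size() - 4);
    if (!verify_checksum) {
        return true;
    }
    std::uint32_t stored = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned char byte = static_cast<unsigned char>(page[page.size() - 4 + i]);
        stored |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return stored == adler32(*body);
}
```

Because the checksum is always present on disk, a reader can turn verification on later without rewriting any data.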





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314301529
 
 

 ##
 File path: be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp
 ##
 @@ -58,6 +58,7 @@ void test_nullable_data(uint8_t* src_data, uint8_t* src_is_null, int num_rows, s
 
 ColumnWriterOptions writer_opts;
 writer_opts.encoding_type = encoding;
+writer_opts.compression_type = segment_v2::CompressionTypePB::LZ4F;
 
 Review comment:
   Make LZ4F the default value for ColumnWriterOptions.compression_type so that 
we don't need to set it everywhere.





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314262395
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.h
 ##
 @@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/slice.h"
+#include "util/faststring.h"
+
+namespace doris {
+
+class BlockCompressionCodec;
+
+namespace segment_v2 {
+
+// Helper to decompress a compressed page.
+// Compressed page is composed of
+// Header | CompressedData
+// Header : uncompressed data length(fixed32)
+// CompressedData: binary
+// Usage: 
+//  Slice compressed_data;
+//  PageDecompressor decompressor(compressed_data, uncomrcodec);
+//  RETURN_IF_ERROR(decompressor.init());
+//  std::string buf;
+//  buf.resize(decompressor.uncompressed_bytes());
+//  RETURN_IF_ERROR(decompress_to(buf.data()));
+class PageDecompressor {
+public:
+PageDecompressor(const Slice& data, const BlockCompressionCodec* codec)
+: _data(data), _codec(codec) {
+}
+
+// Parse and validate compressed page's header.
+// Only this funciton is executed successfully, uncompressed_bytes
+// and decompress_to can be called.
+// Return error if this page is corrupt.
+Status init();
+
+// Get uncompressed size in bytes of this page
+size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+// Decmopress compressed data into buf whose capacity must be greater than
+// uncompressed_bytes()
+Status decompress_to(void* buf);
+private:
+Slice _data;
+const BlockCompressionCodec* _codec;
+size_t _uncompressed_bytes;
+};
+
+// Helper to build a compress page.
+// Usage:
+//  std:: raw_data;
 
 Review comment:
   ```suggestion
   //  std::vector raw_data;
   ```





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314316890
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.h
 ##
 @@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/slice.h"
+#include "util/faststring.h"
+
+namespace doris {
+
+class BlockCompressionCodec;
+
+namespace segment_v2 {
+
+// Helper to decompress a compressed page.
+// Compressed page is composed of
+// Header | CompressedData
+// Header : uncompressed data length(fixed32)
+// CompressedData: binary
+// Usage: 
+//  Slice compressed_data;
+//  PageDecompressor decompressor(compressed_data, uncomrcodec);
+//  RETURN_IF_ERROR(decompressor.init());
+//  std::string buf;
+//  buf.resize(decompressor.uncompressed_bytes());
+//  RETURN_IF_ERROR(decompress_to(buf.data()));
+class PageDecompressor {
+public:
+PageDecompressor(const Slice& data, const BlockCompressionCodec* codec)
+: _data(data), _codec(codec) {
+}
+
+// Parse and validate compressed page's header.
+// Only this funciton is executed successfully, uncompressed_bytes
+// and decompress_to can be called.
+// Return error if this page is corrupt.
+Status init();
+
+// Get uncompressed size in bytes of this page
+size_t uncompressed_bytes() const { return _uncompressed_bytes; }
+
+// Decmopress compressed data into buf whose capacity must be greater than
+// uncompressed_bytes()
+Status decompress_to(void* buf);
+private:
+Slice _data;
+const BlockCompressionCodec* _codec;
+size_t _uncompressed_bytes;
+};
+
+// Helper to build a compress page.
+// Usage:
+//  std:: raw_data;
 
 Review comment:
   OK





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314317409
 
 

 ##
 File path: be/src/util/block_compression.cpp
 ##
 @@ -319,7 +319,8 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 auto zres = deflateInit(&zstrm, Z_DEFAULT_COMPRESSION);
 if (zres != Z_OK) {
 return Status::InvalidArgument(
-Substitute("Fail to do ZLib stream compress, error=$0", zError(zres)));
+Substitute("Fail to do ZLib stream compress, error=$0, res=",
 
 Review comment:
   done





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314320623
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -38,6 +39,7 @@ struct ColumnWriterOptions {
 CompressionTypePB compression_type = NO_COMPRESSION;
 bool need_checksum = false;
 size_t data_page_size = 64 * 1024;
+double min_compress_ratio = 0.9;
 
 Review comment:
   OK, I will change it





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314322444
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
 
 Review comment:
   Actually, in this case we store only the uncompressed data.
   In PageCompressor, if the compressed size >= the uncompressed size, we store 
the uncompressed data.
   I prefer this approach because it is easy to parse and uses less space.
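
The storage rule described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `compress_page`, `decompress_page`, the `Codec` alias, and the toy run-length codec are hypothetical names introduced here. Only the 4-byte little-endian size prefix and the "body size equals uncompressed size means stored raw" check are taken from the quoted diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical codec signature for this sketch: bytes in, bytes out.
using Codec = std::function<std::string(const std::string&)>;

// Toy run-length encoder standing in for a real codec: (count, byte) pairs.
std::string rle_encode(const std::string& in) {
    std::string out;
    for (size_t i = 0; i < in.size();) {
        size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Inverse of rle_encode.
std::string rle_decode(const std::string& in) {
    std::string out;
    for (size_t i = 0; i + 1 < in.size(); i += 2) {
        out.append(static_cast<size_t>(static_cast<uint8_t>(in[i])), in[i + 1]);
    }
    return out;
}

// Append a 32-bit value in little-endian byte order.
void put_fixed32_le(std::string* out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
}

// Read the 32-bit little-endian value at the start of `in`.
uint32_t get_fixed32_le(const std::string& in) {
    assert(in.size() >= 4);
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<uint32_t>(static_cast<uint8_t>(in[i])) << (8 * i);
    }
    return v;
}

// Page layout: [uncompressed size, 4 bytes LE][body].
// The body is the compressed bytes only when they are strictly smaller
// than the input; otherwise the raw bytes are stored unchanged.
std::string compress_page(const std::string& raw, const Codec& compress) {
    std::string page;
    put_fixed32_le(&page, static_cast<uint32_t>(raw.size()));
    const std::string packed = compress(raw);
    page += (packed.size() < raw.size()) ? packed : raw;
    return page;
}

// A body whose size equals the recorded uncompressed size was stored raw.
std::string decompress_page(const std::string& page, const Codec& decompress) {
    const uint32_t uncompressed_bytes = get_fixed32_le(page);
    const std::string body = page.substr(4);
    if (body.size() == uncompressed_bytes) {
        return body;
    }
    return decompress(body);
}
```

Because the compressed body is kept only when it is strictly smaller than the input, a body whose length equals the recorded uncompressed size can only be raw data, so the format needs no extra flag byte.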





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314322807
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/segment_writer.cpp
 ##
 @@ -63,6 +63,7 @@ Status SegmentWriter::init(uint32_t write_mbytes_per_sec) {
 DCHECK(type_info != nullptr);
 
 ColumnWriterOptions opts;
+opts.compression_type = segment_v2::CompressionTypePB::LZ4F;
 
 Review comment:
   OK





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324126
 
 

 ##
 File path: be/test/olap/rowset/segment_v2/column_reader_writer_test.cpp
 ##
 @@ -58,6 +58,7 @@ void test_nullable_data(uint8_t* src_data, uint8_t* src_is_null, int num_rows, s
 
 ColumnWriterOptions writer_opts;
 writer_opts.encoding_type = encoding;
+writer_opts.compression_type = segment_v2::CompressionTypePB::LZ4F;
 
 Review comment:
   OK





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324063
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -140,8 +142,7 @@ class ColumnWriter {
 rowid_t _next_rowid = 0;
 
 const EncodingInfo* _encoding_info = nullptr;
-// const CompressionCodec* _codec = nullptr;
-// TODO(zc): compression type
+const BlockCompressionCodec* _compress_codec = nullptr;
 
 Review comment:
   I prefer to keep this field here. If we keep it, we look up the codec only 
once; otherwise we would have to retrieve it for every page.
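
The trade-off mentioned here can be illustrated with a small sketch. All names below (`CompressionType`, `Codec`, `find_codec`, `ColumnWriterSketch`) are hypothetical stand-ins, not the Doris classes; the lookup counter exists only to make the number of lookups observable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for CompressionTypePB and BlockCompressionCodec.
enum class CompressionType { NONE, LZ4F };
struct Codec {
    CompressionType type;
};

// Counts registry lookups so the cost of repeated lookups is visible.
static int g_codec_lookups = 0;

// Hypothetical registry lookup; returns nullptr when no compression is requested.
const Codec* find_codec(CompressionType type) {
    static const Codec lz4f{CompressionType::LZ4F};
    ++g_codec_lookups;
    return type == CompressionType::LZ4F ? &lz4f : nullptr;
}

class ColumnWriterSketch {
public:
    explicit ColumnWriterSketch(CompressionType type) : _type(type) {}

    // Resolve the codec once and keep the pointer in a member field.
    void init() { _compress_codec = find_codec(_type); }

    // Every page reuses the cached pointer; no lookup happens here.
    void append_page(const std::string& page) {
        if (_compress_codec != nullptr) {
            _bytes_to_compress += page.size();
        }
        ++_num_pages;
    }

    size_t num_pages() const { return _num_pages; }

private:
    CompressionType _type;
    const Codec* _compress_codec = nullptr;
    size_t _bytes_to_compress = 0;
    size_t _num_pages = 0;
};
```

With the pointer cached in `init()`, writing any number of pages triggers a single lookup; calling `find_codec` inside `append_page` instead would repeat it once per page.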





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314324798
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -38,6 +39,7 @@ struct ColumnWriterOptions {
 CompressionTypePB compression_type = NO_COMPRESSION;
 bool need_checksum = false;
 
 Review comment:
   Yeah. In the next PR I will add checksum support for pages, and I will 
rethink how we support checksums then.





[GitHub] [incubator-doris] imay commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
imay commented on a change in pull request #1646: Support page compression in 
BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314326190
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
+// If compressed_slice's size is equal with _uncompressed_bytes, it means
+// compressor store this directly without compression. So we just copy
+// this to buf and return.
+memcpy(buf, compressed_slice.data, _uncompressed_bytes);
 
 Review comment:
   It will be a little tricky, so I will leave it as a TODO.





[GitHub] [incubator-doris] imay merged pull request #1655: Fix get label when use StreamLoad

2019-08-15 Thread GitBox
imay merged pull request #1655: Fix get label when use StreamLoad
URL: https://github.com/apache/incubator-doris/pull/1655
 
 
   





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314565548
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/page_compression.cpp
 ##
 @@ -0,0 +1,88 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/page_compression.h"
+
+#include "gutil/strings/substitute.h"
+#include "util/block_compression.h"
+#include "util/coding.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+Status PageDecompressor::init() {
+if (_data.size < 4) {
+return Status::Corruption(
+Substitute("Compressed page's size is too small, size=$0, needed=$1",
+   _data.size, 4));
+}
+_uncompressed_bytes = decode_fixed32_le((uint8_t*)_data.data);
+return Status::OK();
+}
+
+Status PageDecompressor::decompress_to(void* buf) {
+Slice compressed_slice(_data.data + 4, _data.size - 4);
+if (compressed_slice.size == _uncompressed_bytes) {
 
 Review comment:
   Yeah, you're right





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314565890
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -140,8 +142,7 @@ class ColumnWriter {
 rowid_t _next_rowid = 0;
 
 const EncodingInfo* _encoding_info = nullptr;
-// const CompressionCodec* _codec = nullptr;
-// TODO(zc): compression type
+const BlockCompressionCodec* _compress_codec = nullptr;
 
 Review comment:
   OK





[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1646: Support page compression in BetaRowset

2019-08-15 Thread GitBox
gaodayue commented on a change in pull request #1646: Support page compression 
in BetaRowset
URL: https://github.com/apache/incubator-doris/pull/1646#discussion_r314566101
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/column_writer.h
 ##
 @@ -38,6 +39,7 @@ struct ColumnWriterOptions {
 CompressionTypePB compression_type = NO_COMPRESSION;
 bool need_checksum = false;
 
 Review comment:
   OK





[GitHub] [incubator-doris] morningman opened a new pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999

2019-08-15 Thread GitBox
morningman opened a new pull request #1658: FROM_UNIXTIME should only convert 
timestamp from 0 to 253402271999
URL: https://github.com/apache/incubator-doris/pull/1658
 
 
   which is between 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59; otherwise, 
return null
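
A minimal sketch of the range check this PR describes, assuming only the bounds given in the title (0 to 253402271999). The function and constant names are hypothetical. The upper bound appears to correspond to 9999-12-31 23:59:59 in a UTC+8 time zone, since 9999-12-31 23:59:59 UTC is 253402300799 and the difference is 28800 seconds.

```cpp
#include <cstdint>

// Bounds taken from the PR title; the names are hypothetical.
constexpr int64_t kMinUnixTimestamp = 0;
constexpr int64_t kMaxUnixTimestamp = 253402271999LL;

// Returns true when the timestamp falls inside the convertible range.
// A caller would return NULL for any timestamp that fails this check.
constexpr bool is_convertible_unix_timestamp(int64_t timestamp) {
    return timestamp >= kMinUnixTimestamp && timestamp <= kMaxUnixTimestamp;
}
```

Negative timestamps and anything past the upper bound are rejected before any date arithmetic is attempted, which keeps the result within a four-digit year.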





[GitHub] [incubator-doris] imay commented on a change in pull request #1658: FROM_UNIXTIME should only convert timestamp from 0 to 253402271999

2019-08-15 Thread GitBox
imay commented on a change in pull request #1658: FROM_UNIXTIME should only 
convert timestamp from 0 to 253402271999
URL: https://github.com/apache/incubator-doris/pull/1658#discussion_r314569259
 
 

 ##
 File path: be/src/runtime/datetime_value.cpp
 ##
 @@ -1544,6 +1544,10 @@ bool DateTimeValue::unix_timestamp(int64_t* timestamp, const std::string& timezo
 }
 
 bool DateTimeValue::from_unixtime(int64_t timestamp, const std::string& timezone) {
+// timestamp should between 1900-01-01 00:00:00 ~ 9999-12-31 23:59:59
 
 Review comment:
   ```suggestion
   // timestamp should between 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59
   ```





[GitHub] [incubator-doris] morningman closed pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start

2019-08-15 Thread GitBox
morningman closed pull request #1642: add kafka_default_offsets  when no 
partiotion specify .support read kafka partition from start
URL: https://github.com/apache/incubator-doris/pull/1642
 
 
   





[GitHub] [incubator-doris] wkhappy1 opened a new pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start

2019-08-15 Thread GitBox
wkhappy1 opened a new pull request #1642: add kafka_default_offsets  when no 
partiotion specify .support read kafka partition from start
URL: https://github.com/apache/incubator-doris/pull/1642
 
 
   Add kafka_default_offsets, used when no partition is specified.
   Values: OFFSET_BEGINNING, OFFSET_END





[GitHub] [incubator-doris] imay opened a new pull request #1659: Remove tempory fail UT

2019-08-15 Thread GitBox
imay opened a new pull request #1659: Remove tempory fail UT
URL: https://github.com/apache/incubator-doris/pull/1659
 
 
   





[GitHub] [incubator-doris] imay merged pull request #1659: Remove tempory fail UT

2019-08-15 Thread GitBox
imay merged pull request #1659: Remove tempory fail UT
URL: https://github.com/apache/incubator-doris/pull/1659
 
 
   





[GitHub] [incubator-doris] yuanlihan opened a new pull request #1635: Enable parsing columns from file path for Broker Load (#1582)

2019-08-15 Thread GitBox
yuanlihan opened a new pull request #1635: Enable parsing columns from file 
path for Broker Load (#1582)
URL: https://github.com/apache/incubator-doris/pull/1635
 
 
   Currently, we do not support parsing columns that are encoded in the file 
path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.
   
   This patch parses columns from the file path, as Spark does (Partition 
Discovery).
   
   The patch parses the partition columns in BrokerScanNode.java and saves the 
parsing result for each file path as a property of TBrokerRangeDesc, so that 
the broker reader on the BE can read the value of the specified partition column.
   
   (Sorry for opening a new PR for this issue; I am not familiar with 
`git rebase`.)
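
The path-based column extraction described above can be sketched as follows. This is an illustrative assumption of how `key=value` directory segments might be parsed, written in C++ for consistency with the rest of this thread; it is not the BrokerScanNode.java code from the patch, and `parse_columns_from_path` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Extracts the values of the requested columns from `key=value` directory
// segments of `path`. The last segment is treated as the file name and is
// never parsed. Returns false if any requested column is missing.
bool parse_columns_from_path(const std::string& path,
                             const std::vector<std::string>& columns,
                             std::vector<std::string>* values) {
    values->assign(columns.size(), "");
    std::vector<bool> found(columns.size(), false);

    size_t start = 0;
    while (true) {
        const size_t end = path.find('/', start);
        if (end == std::string::npos) {
            break;  // Remaining text is the file name.
        }
        const std::string segment = path.substr(start, end - start);
        const size_t eq = segment.find('=');
        if (eq != std::string::npos) {
            const std::string key = segment.substr(0, eq);
            for (size_t i = 0; i < columns.size(); ++i) {
                if (columns[i] == key) {
                    (*values)[i] = segment.substr(eq + 1);
                    found[i] = true;
                }
            }
        }
        start = end + 1;
    }

    for (size_t i = 0; i < found.size(); ++i) {
        if (!found[i]) {
            return false;
        }
    }
    return true;
}
```

For the path from the description, /path/to/dir/k1=1/xxx.csv, requesting column k1 yields the single value "1"; requesting a column that does not appear in the path makes the function return false.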





[GitHub] [incubator-doris] yuanlihan closed pull request #1635: Enable parsing columns from file path for Broker Load (#1582)

2019-08-15 Thread GitBox
yuanlihan closed pull request #1635: Enable parsing columns from file path for 
Broker Load (#1582)
URL: https://github.com/apache/incubator-doris/pull/1635
 
 
   





[GitHub] [incubator-doris] morningman merged pull request #1642: add kafka_default_offsets when no partiotion specify .support read kafka partition from start

2019-08-15 Thread GitBox
morningman merged pull request #1642: add kafka_default_offsets  when no 
partiotion specify .support read kafka partition from start
URL: https://github.com/apache/incubator-doris/pull/1642
 
 
   





[GitHub] [incubator-doris] imay commented on a change in pull request #1657: Doc review

2019-08-15 Thread GitBox
imay commented on a change in pull request #1657: Doc review
URL: https://github.com/apache/incubator-doris/pull/1657#discussion_r314600486
 
 

 ##
 File path: doc_review
 ##
 @@ -0,0 +1 @@
+see issue:https://github.com/apache/incubator-doris/issues/1656
 
 Review comment:
   I think this file is useless. Could you please remove it?





[GitHub] [incubator-doris] imay merged pull request #1657: Doc review

2019-08-15 Thread GitBox
imay merged pull request #1657: Doc review
URL: https://github.com/apache/incubator-doris/pull/1657
 
 
   

