[GitHub] carbondata-site issue #46: [CARBONDATA-1336] add subscribe to issue list

2017-07-28 Thread xuchuanyin
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata-site/pull/46 hi, @jatin9896 I only changed this file (in the commit), but after running `carbonscript.sh`, I found other files changed as well. Should I just commit this file that changed

Re: [DISCUSSION] Update the function of show segments

2017-09-21 Thread xuchuanyin
If adding a new statement, I suggest learning from hive: desc formatted table_name VS desc table_name; Show segments... VS Show formatted segments... On 09/21/2017 14:02, Ravindra Pesala wrote: Hi, I agree with Jacky and David. But it is suggested to keep the current 'show segments' command
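
A minimal sketch of the contrast, assuming a CarbonSession named `carbon`; the FORMATTED variant is only the proposed syntax, not an existing command:

```scala
carbon.sql("SHOW SEGMENTS FOR TABLE sales").show()            // existing brief listing
carbon.sql("SHOW FORMATTED SEGMENTS FOR TABLE sales").show()  // proposed verbose variant (hypothetical)
```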

Re: [DISCUSSION] Unify the sort column and sort scope in create table command

2017-08-31 Thread xuchuanyin
Both options prefer to make the sort scope the same across all segments (loads). Since carbondata supports a different sort scope in each segment (load), I think there should be a third option. Option 3: The sort scope in the load data command takes higher priority than that specified in
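
A sketch of how Option 3 could look, assuming a CarbonSession named `carbon`; the load-level 'sort_scope' option name is an assumption for illustration:

```scala
// Table-level default sort scope.
carbon.sql(
  """CREATE TABLE t1 (id INT, name STRING)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('sort_columns'='id', 'sort_scope'='local_sort')""".stripMargin)

// Under Option 3, a load-level option would override the table default
// for this segment only (option name assumed).
carbon.sql(
  "LOAD DATA INPATH '/tmp/t1.csv' INTO TABLE t1 OPTIONS ('sort_scope'='no_sort')")
```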

[Discussion] Compression for sort temp files in Carbondata

2017-12-19 Thread xuchuanyin
Hi, dev: Recently I found a bug in compressing sort temp files and tried to fix it in PR#1632 (https://github.com/apache/carbondata/pull/1632). In this PR, Carbondata compresses the records in batches and writes the compressed content to file if this feature is turned on. However, I found

Re: [VOTE] Apache CarbonData 1.4.0(RC2) release

2018-05-23 Thread xuchuanyin
+1 FROM MOBILE EMAIL CLIENT On 2018-05-23 03:41, Ravindra Pesala wrote: Hi I submit the Apache CarbonData 1.4.0 (RC2) for your vote. 1.Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=1234100

Re: [Discussion] Carbon Local Dictionary Support

2018-06-06 Thread xuchuanyin
Hi, Kumar: Can you raise a JIRA and provide the document as an attachment? I cannot open the links since they are blocked.

Re: [Discussion] Carbon Local Dictionary Support

2018-06-07 Thread xuchuanyin
About query filtering 1. “during filter, actual filter values will be generated using column local dictionary values...then filter will be applied on the dictionary encode data” --- If the filter is not 'equal' but 'like' or 'greater than', can it also run on the encoded data? 2. "As dictionary data

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread xuchuanyin
Hi, Kumar: Local dictionary will be a nice feature, and other formats like parquet support this too. My concern is: how will you implement this feature? 1. What's the scope of the `local`? Page level (for all contained rows), Blocklet level (for all contained pages), Block level (for
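
Whatever scope is chosen, the core idea is the same: map the distinct values within one unit to small surrogate ids and store the ids instead of the raw values. A toy sketch (all names illustrative):

```scala
// Build a dictionary over one unit (page/blocklet/block) and encode it.
def encodeWithLocalDict(values: Seq[String]): (Map[String, Int], Seq[Int]) = {
  val dict = values.distinct.zipWithIndex.toMap  // value -> surrogate id
  (dict, values.map(dict))                       // encoded column chunk
}

val (dict, encoded) = encodeWithLocalDict(Seq("ab", "cd", "ab", "ab"))
// dict = Map(ab -> 0, cd -> 1); encoded = Seq(0, 1, 0, 0)
```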

Grammar about supporting strings longer than 32000 characters

2018-05-01 Thread xuchuanyin
Hi, community: I'm implementing support for strings longer than 32000 characters in carbondata and have a question about the grammar of this feature. Here I'd like to explain it and receive your feedback. DESCRIPTION: In the previous implementation, carbondata internally uses a short to
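
For context, the direction this eventually took in the project is a table property (the 'long_string_columns' name appears in later threads in this archive); a sketch assuming a CarbonSession named `carbon`:

```scala
carbon.sql(
  """CREATE TABLE long_str_tbl (id INT, description STRING)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('long_string_columns'='description')""".stripMargin)
```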

Re: can CarbonThriftServer configure the max number of submit task at the same time?

2018-05-03 Thread xuchuanyin
Maybe you can try `hive.server2.thrift.max.worker.threads` and set a smaller value for it. You can configure it in hive-site.xml or pass the configuration through --hiveconf when you start the thrift-server. Finally, you need to find out the root cause of the failed SQLs. 60 concurrent

Re: Grammar about supporting strings longer than 32000 characters

2018-05-03 Thread xuchuanyin
In a traditional RDBMS, varchar(N) means the value contains at most N characters, and the DBMS will truncate the value if its length is longer than N. Will we implement it like this too? Truncate the string value to N characters if its length is longer than N?

Re: Create Carbon data with complex data type (Array )

2018-05-03 Thread xuchuanyin
Yes, it is really a bug. You can raise a JIRA for this problem. I tried the following queries and they are OK. Hope it helps you bypass the bug. ``` create table IF NOT EXISTS test.Account(CAP_CHARGE Array,CAP_CR_INT Array) partitioned by (current_dt DATE) STORED BY 'carbondata' ``` ```
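
The element types of the Array columns above were lost in archiving; a reconstruction of the workaround with a hypothetical `Array<double>` element type, assuming a CarbonSession named `carbon`:

```scala
carbon.sql(
  """CREATE TABLE IF NOT EXISTS test.Account (
    |  CAP_CHARGE Array<double>,  -- element type is a stand-in, not from the original mail
    |  CAP_CR_INT Array<double>
    |) PARTITIONED BY (current_dt DATE)
    |STORED BY 'carbondata'""".stripMargin)
```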

Re: Refactored SegmentPropertiesFetcher to resolve pruning problem post the carbon schema restructure.

2018-05-03 Thread xuchuanyin
Will delete/update affect the schema? What's the meaning of 'schema' here? -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: New-bie JIRAs for new contributors

2018-08-02 Thread xuchuanyin
I think you can start by reviewing the project docs. If there are any problems, you can raise a JIRA to fix them. Or if you have problems understanding the docs, you can ask questions on the mailing list. If it is really a problem, you can raise a JIRA to fix it. -- Sent from:

Re: questions about multi thriftserver query same table

2018-08-07 Thread xuchuanyin
1. I think it's OK to query one table through two sessions. 2. You can refer to Carbon-SDK. The performance is undocumented for now. You can try it and give feedback. If it is below expectations, we may try to optimize it later. -- Sent from:

Re: Operation not allowed: STORED BY (from Spark Dataframe save)

2018-08-15 Thread xuchuanyin
Hi, did you create the dataframe through SparkSession? If so, you'd better create it with CarbonSession, which extends SparkSession. You may refer to https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md -- Sent from:

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

2018-08-27 Thread xuchuanyin
This means there is no need to keep the actual data along with the encoded data in the encoded column page. --- A problem is that, currently, the index datamap needs the actual data to generate the index. You may affect this procedure if you do not keep the actual data. -- Sent from:

Re: [DISCUSSION] Updates to CarbonData documentation and structure

2018-09-04 Thread xuchuanyin
I think even if we split the carbondata commands into DDL and DML, it is still too large for one document. For example, there are many TBLProperties for creating a table in DDL. Some descriptions of the TBLProperties are long, and we do not have a TOC for them now. It's difficult to locate one property in

Re: [DISCUSSION] Remove BTree related code

2018-09-04 Thread xuchuanyin
I found the PR on GitHub and left a comment. Here I copy the comment: I have a doubt about the scenario below: for sort_columns, the min/max is ordered across all the blocks/blocklets in one segment. Suppose we are filtering on sort_columns and the filter looks like Col1='bb'. If the

Re: Feature Proposal: CarbonCli tool

2018-09-04 Thread xuchuanyin
In the above example, you specify one directory and get two segments, but it only shows one schema. I thought the number of schemas would be the same as the number of data directories. Since you mentioned that we can support nested folders, what if the schemas in these files are not the same? Another

Re: Change the 'comment' content for a column when executing the command 'desc formatted table_name'

2018-07-06 Thread xuchuanyin
Then what does the final output look like? -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: CarbonStore Java & REST API proposal

2018-07-06 Thread xuchuanyin
Hi, jacky, please check the following comments: 1. Do we need to provide other interfaces, such as `listTable`, `renameTable`... 2. What's the difference between the functions of 'Carbon-SDK' and 'CarbonStore'? As for the CarbonStore API `createTable`: 3. Will it make use of the existing

Re: [Discussion] Carbon Local Dictionary Support

2018-07-12 Thread xuchuanyin
Hi, kumarvishal: As the local dictionary feature will be released in 1.4.1, is there any difference between the implementation and the previous design document? I'm trying to understand the implementation of the local dictionary. If there is any difference, please help to update the document in

Re: Index file cache will not work when the table has invalid segment.

2018-07-12 Thread xuchuanyin
Hi, liang, I think it may be a problem. A segment with LOAD_FAILED should not affect queries on the normal segments. In the previous mail, the second data load was successful, and queries on that segment should use the index file cache. Besides, if the data loading failed, will the failed

Re: [Discussion] About syntax of compaction on specified segments

2018-03-13 Thread xuchuanyin
Hi, all: Here I'd like to conclude my opinion and provide option 4. Option 4: 4) Extend the existing SQL syntax of Major and Minor compaction based on the syntax of delete segment: ALTER TABLE tablename COMPACT 'MAJOR' WHERE SEGMENT.ID IN (1,2,3,4) ALTER TABLE tablename COMPACT 'MINOR'
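
A usage sketch of Option 4 next to the delete-segment syntax it extends, assuming a CarbonSession named `carbon`; the COMPACT ... WHERE form was a proposal at this point, not merged syntax:

```scala
// Existing delete-segment syntax the proposal is modelled on.
carbon.sql("DELETE FROM TABLE sales WHERE SEGMENT.ID IN (1, 2)")

// Proposed Option 4 (illustrative only).
carbon.sql("ALTER TABLE sales COMPACT 'MAJOR' WHERE SEGMENT.ID IN (1, 2, 3, 4)")
```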

Re: query on string type return error

2018-04-16 Thread xuchuanyin
I think the problem may be metadata related. What's your thrift version? Have you updated the carbon version recently after the data was loaded? FROM MOBILE EMAIL CLIENT On 04/16/2018 15:51, Liang Chen wrote: Hi From the log message, it seems it can't find the data files. Can you provide more detail

Re: Proposal to integrate QATCodec into Carbondata

2018-10-11 Thread xuchuanyin
emm, if it only needs to extend another compressor for a software implementation, I think it will be quite easy to integrate. Actually, a PR was already raised weeks ago to support custom compressors in carbondata; you can refer to this link: https://github.com/apache/carbondata/pull/2715.

Re: [Issue] Long string columns config for big strings does not work

2018-10-12 Thread xuchuanyin
Hi, aaron: A PR has been raised for this issue https://github.com/apache/carbondata/pull/2812, please check. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

[Proposal] Proposal to change default value of two parameters for data loading

2018-10-15 Thread xuchuanyin
Hi, all: About a year ago, we introduced 'multiple dirs for temp data' to solve the disk hotspot problem in data loading. This feature lets carbon randomly pick one of the local directories configured in yarn-local-dirs when it writes any temp files to disk (for example: sort temp files and fact
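
A sketch of turning the feature on programmatically; the property names below (carbon.use.local.dir, carbon.use.multiple.temp.dir) are my best recollection of this feature, so verify them against the docs for your version:

```scala
import org.apache.carbondata.core.util.CarbonProperties

val props = CarbonProperties.getInstance()
// Write temp files to the yarn local dirs instead of a single temp path.
props.addProperty("carbon.use.local.dir", "true")
// Randomly spread temp files across all configured local dirs.
props.addProperty("carbon.use.multiple.temp.dir", "true")
```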

Re: [Proposal] Proposal to change default value of two parameters for data loading

2018-10-16 Thread xuchuanyin
Yes, it needs further modification to meet the requirement -- an additional property is needed to handle this, we can configure multiple directories there. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Propose configurable page size in MB (via carbon property)

2018-10-19 Thread xuchuanyin
Hi, ajantha. I just went through your PR and think we may need to rethink this feature, especially its impact. I left a comment under your PR and will paste it here for further discussion in the community. I'm afraid that in common scenarios, even if we do not face the page size problems and

Re: Propose configurable page size in MB (via carbon property)

2018-10-22 Thread xuchuanyin
OK, anyway, please take care of the loading performance. The validation only needs to be performed for those fields that may cross the boundary (e.g. varchar and complex); for the ordinary fields, just skip the validation. -- Sent from:

Re: [Issue] Long string columns config for big strings does not work

2018-10-12 Thread xuchuanyin
Hi, aaron, I went through the code and found the root cause. While writing a dataframe to a carbontable, we have to keep the order of the fields in the dataframe the same as that in the carbontable. The code lies in `NewCarbonDataLoadRDD.scala#486`. This is because we judge whether a field is a

[Discussion] How to configure the unsafe working memory for data loading

2018-10-23 Thread xuchuanyin
Hi all, I went through the code and derived another formula to estimate the unsafe working memory. It is inaccurate too, but we can use this thread to refine it. # Memory Required For Data Loading per Table ## version from Community (carbon.number.of.cores.while.loading) *

Re: [Discussion] Provide separate audit log

2018-10-31 Thread xuchuanyin
+1 I've a few questions about this: 1. Is it OK to call it 'tableId' or 'table'? 2. For what kinds of statements will you audit the operations? -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Refactor dynamic configuration

2018-10-31 Thread xuchuanyin
Do the annotations have any effects other than providing a literal coding contract? For example, we could use these annotations to 1. generate docs 2. restrict some operations (for example, some configurations should not support the SET command) 3. limit the scope of usage (for example, some

Re: [Discussion] Encryption support for carbondata files

2018-10-31 Thread xuchuanyin
Instead of supporting encryption, I think carbondata can provide another common feature: a framework that supports hooks while reading/writing column chunks. Users can specify the hooks while creating a table and implement the encryption feature as a special instance as needed. -- Sent

Re: [Discussion] CarbonReader performance improvement

2018-10-31 Thread xuchuanyin
A question here: """ 3. Add concurrent reading functionality to Carbon Reader. This can be enabled by passing the number of splits required by the user. If the user passes 2 as the split for reader then the user would be returned 2 CarbonReaders with equal number of RecordReaders in each. The

Re: [DISCUSSION] Refactory on spark related modules

2018-10-31 Thread xuchuanyin
+1 -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Enhancement on compaction performance

2018-11-08 Thread xuchuanyin
Hi, all: The previous experiment used 3 huawei ECS instances as workers, each with 16 cores and 32GB. The Spark executor used 12 cores and 24GB, with the 74GB LineItem table from 100GB TPCH. Today I ran another experiment using 1 huawei RH2288 machine with 32 cores and 128GB. The Spark executor used 30 cores and

Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

2018-11-05 Thread xuchuanyin
Hi, aaron. For the wrong pruning statistics in the query plan, did you execute the queries concurrently? I noticed that the pruning collector is single-threaded; if you ran queries concurrently, the pruning statistics will be incorrect. -- Sent from:

Re: [Discuss] Removing search mode

2018-11-06 Thread xuchuanyin
+1 Q1: When will we start and finish the optimization of the carbon-presto integration? Any plan for this? Another question: Q2: Is it possible to use the carbon reader to implement a function similar to search mode? -- Sent from:

Re: Enhancement on compaction performance

2018-11-08 Thread xuchuanyin
Oh, I didn't notice the memory consumption at that time. We all know that resource utilization is low during compaction. Using prefetch means that we are running queries in the background, and it will surely consume more resources. The current prefetch size is controlled by 'carbon.detail.batch.size'
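
For reference, a sketch of tuning that knob (the value shown is just the default mentioned in this thread):

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Smaller batches mean each prefetch holds less data in memory.
CarbonProperties.getInstance().addProperty("carbon.detail.batch.size", "100")
```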

Re: [DISCUSSION] refining usage of numberofcores in CarbonProperties

2018-11-08 Thread xuchuanyin
In addition to the last mail, for the numCoresOfAlterPartition, you can handle it similarly. Please remember to fix these in another PR, not in PR#2907. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Enhancement on compaction performance

2018-11-07 Thread xuchuanyin
Hi all: I am raising a PR to enhance the performance of compaction. The PR number is #2906. Based on my experiments using about 72GB LineItem data (in 100GB TPCH data), I got the following results. Code Branch | Prefetch | Batch Size (default 100) | Load1 (s) | Load2 (s)

Re: [Issue] Long string columns config for big strings not work

2018-10-11 Thread xuchuanyin
Yeah, aaron, the problem may lie in the dataframe and long_string_columns. Can you try the following statement? It is from the test code in 'VarcharDataTypesBasicTestCase', which suggests you specify 'long_string_columns' while writing the dataframe. ```scala test("write from dataframe
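
A sketch in the spirit of that test, assuming an existing DataFrame `df` with a long `description` column; the writer option names follow the referenced test case:

```scala
import org.apache.spark.sql.SaveMode

df.write
  .format("carbondata")
  .option("tableName", "long_str_tbl")
  .option("long_string_columns", "description")  // mark the long column
  .mode(SaveMode.Overwrite)
  .save()
```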

Re: error occur when I load data to s3

2018-09-03 Thread xuchuanyin
Did you build carbon with -Pbuild-with-format? It introduced the Map datatype and changed the thrift definition, so you need to add it. On 09/04/2018 09:10, aaron wrote: Compile failed. My env is, aaron:carbondata aaron$ java -version java version "1.8.0_144" Java(TM) SE Runtime Environment (build

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
Hi, aaron. Actually, your query will not use the time series datamap since the filter uses the field 'product_id', which is not contained in your preagg datamap. Even if I remove the preagg datamap, the query with the bloomfilter datamap still fails with the same error logs as in your post. Then I add

Re: [Issue] Enable/disable datamap not work on 1.5.0-SNAPSHOT

2018-09-25 Thread xuchuanyin
Enable/disable datamap only works for index datamaps; we do not support other types of datamap such as preagg/MV/default Block/Blocklet datamaps yet. If you find any confusing documentation on datamaps, you can help to revise it. -- Sent from:

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
More details about this issue. I've added some logs in `BloomCoarseGrainDataMap.createQueryModel` to print the input parameter 'expression'. # Before applying PR2665 ``` XU expression: org.apache.carbondata.core.scan.expression.logical.AndExpression@3b035d0c XU expression

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
Yeah, I am able to reproduce this problem using current master code. I'll look into it. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
hi, aaron, thanks for your feedback. Which version of carbondata are you using? -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
Did you use the query in the first post? I tested it and it's OK; we can see the bloomfilter in the explain output. 1. If the bloomfilter is not there, the reason may be that the main datamap has already pruned all the blocklets. In this case, the subsequent index datamaps are skipped as a shortcut.

Re: [Issue] Bloomfilter datamap

2018-09-25 Thread xuchuanyin
You can download the patch and apply it to master, then you can rebuild the jar and perform testing. On Tue, Sep 25, 2018 at 5:02 PM +0800, "aaron" <949835...@qq.com> wrote: Great! thanks for your so quick response! I will have a try. Do you mean that I merge

Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-04 Thread xuchuanyin
+1 It seems the committers only need to change the URL for the ASF repo; that's OK. On 5/1/2019 10:08, Liang Chen wrote: Hi all, Background : http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/NOTICE-Mandatory-migration-of-git-repositories-to-gitbox-apache-org-td72614.html

Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-16 Thread xuchuanyin
I think we can just rephrase the proposal. We want to make `sort_columns` empty by default; that is to say, if the user does not explicitly specify sort_columns, the corresponding property will be 'sort_columns'=''. And when sort_columns is empty, carbondata will use no_sort for it
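
Spelled out as DDL, the rephrased proposal means an unspecified sort_columns behaves like the explicit empty property below (a sketch assuming a CarbonSession named `carbon`):

```scala
carbon.sql(
  """CREATE TABLE t_nosort (id INT, name STRING)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('sort_columns'='')""".stripMargin)
// Empty sort_columns => carbondata loads this table with no_sort.
```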

RE: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-17 Thread xuchuanyin
I think no_sort is the default only in case the user does not specify the sort_columns explicitly, not for all scenarios, right? +1 for keeping 'sort_columns' unchanged because the fields in sort_columns have a different encoding strategy compared with others. @Ajantha, please make a

RE: [Discussion] Bloom memory and pruning optimisation using hierarchical pruning.

2018-11-29 Thread xuchuanyin
Hi, Ravindra. Using a hierarchical index was in our previous plan too. We wanted to build Block/task level indexes at the same time, but we postponed this feature due to the following reasons: 1. It requires different configurations (bloom_size, bloom_fpp) for different index levels, and it will
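
For context, bloom_size and bloom_fpp are per-datamap DMPROPERTIES of the bloomfilter datamap; a sketch with illustrative values, assuming a CarbonSession named `carbon`:

```scala
carbon.sql(
  """CREATE DATAMAP dm_bloom ON TABLE sales
    |USING 'bloomfilter'
    |DMPROPERTIES ('index_columns'='product_id',
    |  'bloom_size'='640000', 'bloom_fpp'='0.00001')""".stripMargin)
```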

Re: Re: [Discussion] How to configure the unsafe working memory for data loading

2018-12-04 Thread xuchuanyin
Hi, what's the number of cores in your executor? And is there only one load running while you encounter this failure? Besides, can you check whether the local dictionary is enabled for your table using 'desc formatted table_name'? If it is enabled, more memory will be needed, and the provided formula does

RE: [DISCUSSION] Support DataLoad using Json for CarbonSession

2018-12-05 Thread xuchuanyin
Each time we introduce a new feature, I'd like to know the final usage for the user. So what's the grammar to load a json file into carbon? Moreover, there may be more and more kinds of datasources in the future, so can we just keep the integration simple by 1. Reading the input files using spark

RE: [SUGGESTION]Support compaction no_sort

2018-12-05 Thread xuchuanyin
What's your proposal for the corresponding grammar to do that? Besides, if we only sort after compaction, will it be proper to keep the sort_scope at table level? It should be at segment level in this situation, and keeping it at table level will confuse the user. How do you view this? Sent

RE: [SUGGESTION]Support compaction no_sort

2018-12-05 Thread xuchuanyin
So what's your proposal for the grammar of this feature? Do you want carbon to do it silently without any configuration or choice from the user? What I am concerned about is the performance of compaction. If the user uses auto-compaction, the loading will be further delayed if we do compaction using

RE: [Discussion] Bloom memory and pruning optimisation using hierarchical pruning.

2018-12-03 Thread xuchuanyin
create task level bloom with the same configuration along with blocklet bloom. === Since the number of distinct values at the task level is much bigger than at the blocklet level, using the same configuration may cause the task level bloomfilter to work inefficiently. This is just what I'm

RE: [VOTE] Apache CarbonData 1.5.1(RC2) release

2018-12-01 Thread xuchuanyin
Hi, please consider this line of code: https://github.com/apache/carbondata/blob/master/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java#L78 It uses apache-common-log directly instead of carbondata log. I’m not sure about the impact of this. Please take care of this

Re: [proposal] Parallelize block pruning of default datamap in driver for filter query processing.

2018-11-22 Thread xuchuanyin
'Parallelized pruning' has been in my plan for a long time; nice to see your proposal here. While implementing this, I'd like you to make it common, that is to say, not only the default datamap but also the other index datamaps can use parallelized pruning. -- Sent from:

RE: [Proposal] Thoughts on general guidelines to follow in Apache CarbonData community

2018-11-18 Thread xuchuanyin
Hi ravin, Very nice to see this proposal in the community! The guidelines are better if they are easy to follow. Even though I care more about code quality, I also care about the convenience for developers to contribute. After going through the points, I think 1,3,5,8,9,10 : +1 2,4,6:

Re: [DISCUSS] Java files have different import order from Scala files

2019-01-08 Thread xuchuanyin
I think it's a good proposal, but it will introduce too many changes. In my opinion, the different import order between Java and Scala files is acceptable since it will not cause serious (or even minor) problems. Anyway, thanks for your investigation of this. But it's hard to tell whether we should

Re: [ANNOUNCE] Chuanyin Xu as new PMC for Apache CarbonData

2019-01-03 Thread xuchuanyin
Thanks ALL! -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Support Zstd as Column Compressor

2018-09-12 Thread xuchuanyin
Snappy and Zstd both know the decompressed size of the content since they store that size along with the compressed content. But LZ4 doesn't do this; you can refer to issue#26 on the lz4-java GitHub page. To work around this, you can store the original size in metadata for decompression.

Re: Support Zstd as Column Compressor

2018-09-12 Thread xuchuanyin
Yeah. Zstd and Snappy know the decompressed size from the compressed data, but LZ4 doesn't. I found a link describing this: https://github.com/lz4/lz4-java/issues/26 To work around this with LZ4, you can go with your proposal and save the decompressed size in the meta. But I'd like to wrap the LZ4
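
A minimal sketch of that workaround with lz4-java, prefixing the payload with the original length so decompression knows the destination size; the wrapping format (a 4-byte length header) is an assumption for illustration:

```scala
import java.nio.ByteBuffer
import net.jpountz.lz4.LZ4Factory

object Lz4WithLength {
  private val factory = LZ4Factory.fastestInstance()

  def compress(src: Array[Byte]): Array[Byte] = {
    val compressed = factory.fastCompressor().compress(src)
    ByteBuffer.allocate(4 + compressed.length)
      .putInt(src.length)  // store the decompressed size up front
      .put(compressed)
      .array()
  }

  def decompress(data: Array[Byte]): Array[Byte] = {
    val originalLen = ByteBuffer.wrap(data).getInt  // read the stored size
    factory.fastDecompressor().decompress(data, 4, originalLen)
  }
}
```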

Re: CarbonWriterBuild issue

2018-09-12 Thread xuchuanyin
Yeah, it actually belongs to the 'Builder Pattern'. We should simplify this before it is widely used. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Support for Float and Byte data types

2018-09-13 Thread xuchuanyin
The actual storage datatype for a column is stored at the ColumnPage level. In the previous implementation, columns with the literal datatypes 'float' and 'double' shared the same storage datatype 'double', and you want to distinguish them by adding support for the storage datatype 'float'. Is my understanding

Re: Support Zstd as Column Compressor

2018-09-12 Thread xuchuanyin
In the latest implementation, I store the compressor name in the thrift, and the old enum for compression_codec has been deprecated. This makes it easier to support other compressors. Take LZ4 for example; the following changes are required: 1 Implement Lz4Compressor 2 Add
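
A hypothetical skeleton of step 1. The real Compressor interface in carbondata-core has more methods (typed variants for the different page datatypes), so treat the shape below as an assumption, not the actual contract:

```scala
import net.jpountz.lz4.LZ4Factory

// Assumed shape only; align the signatures with the actual
// org.apache.carbondata.core.datastore.compression.Compressor interface.
class Lz4Compressor {
  private val factory = LZ4Factory.fastestInstance()

  def getName: String = "lz4"

  def compressByte(input: Array[Byte]): Array[Byte] =
    factory.fastCompressor().compress(input)

  // LZ4 needs the original length, stored elsewhere (see the thread above).
  def unCompressByte(input: Array[Byte], originalLen: Int): Array[Byte] =
    factory.fastDecompressor().decompress(input, originalLen)
}
```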

Re: [DISCUSSION] Support Compaction for Range Sort

2019-04-04 Thread xuchuanyin
Hi ManishNalla: """ merging the overlapping intervals and getting new intervals (ranges) out of them """ === What do you mean by this? Can you give an example? -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] DDLs to operate on CarbonLRUCache

2019-02-18 Thread xuchuanyin
+1 for the advice from manish -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

RE: Re:[DISCUSSION] Support Incremental load in datamap and other MV datamap enhancement

2019-02-18 Thread xuchuanyin
I think there is still a misunderstanding between us. Here I am only concerned about the lazy build for index datamaps. I think each segment should have its own datamap status, and based on this we can support pruning by index datamap for each segment. After this, even if the datamap is lazy, during query we

RE: Re:[DISCUSSION] Support Incremental load in datamap and other MV datamap enhancement

2019-02-19 Thread xuchuanyin
+1 for ravin's advice. We only support lazy/incremental load/rebuild for olap datamap (MV/preagg), not for index datamap currently. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Distributed Index Cache Server

2019-02-20 Thread xuchuanyin
Hi kunal, Finally, I'd suggest again that the code for the pruning procedure be moved to a separate module. The earlier we do this, the easier it will be if we want to implement other types of IndexServer later. -- Sent from:

Re: [DISCUSSION] Distributed Index Cache Server

2019-03-05 Thread xuchuanyin
+1, looking forward to the PRs for that -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Distributed Index Cache Server

2019-02-12 Thread xuchuanyin
Hi Kunal, IndexServer is quite an efficient method to solve the index cache problem, and it's great that someone is finally trying to implement this. However, after going through your design document, I have some questions, which I'll explain as follows: 1. For the 'background'

Re: [DISCUSSION] Distributed Index Cache Server

2019-02-12 Thread xuchuanyin
Hi kunal, can you attach the document directly to the jira? I cannot access the doc on google drive. Thanks. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: there are plans to support the spark 2.4?

2019-07-08 Thread xuchuanyin
This reply is just for testing the functionality of the mailing list. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Cache Pre Priming

2019-08-17 Thread xuchuanyin
Hi, I have two questions about the current index server implementation: 1. While doing a filter query, do we need to load the index data of all segments into the cache server, OR only the segments required by the query? 2. When do we trigger the cache loading action during the query? As

Re: [Discussion] Roadmap for Apache CarbonData 2

2019-08-16 Thread xuchuanyin
Hi, so glad to see Carbondata entering stage 2.x. I have the following suggestions for your consideration: 1. Evolution of the Carbondata file format. I always thought one of the key highlights of Carbondata was its file format; is there any evolution planned for it? While

Re: [Web Issues] show datamaps command should be show datamap

2019-09-11 Thread xuchuanyin
Yeah, please feel free to correct it; do not forget to correct all the 'show datamaps' occurrences (at least 5) in the project. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Support heterogeneous format segments in carbondata

2019-09-11 Thread xuchuanyin
Hi, ravipesala, previously I had a similar proposal; please check whether it can be of any help: https://gist.github.com/xuchuanyin/cb264f2d7e94d6e185a55ea962e91ce1 Besides, for the problem in your proposal, the user can create a `table_with_old_format_data` and create another

Re: [DISCUSSION] implement MERGE INTO statement

2019-09-11 Thread xuchuanyin
+1 with ravipesala; please use the corresponding hive grammar and take the delta grammar as a reference. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Support Time Series for MV datamap and auto datamap loading of timeseries datamaps

2019-09-28 Thread xuchuanyin
Hi akash, glad to see this feature proposed; I have some questions about it. Please note that some of the following descriptions are quotes from the design document attached to the corresponding jira, with my comments following '==='. 1. "Currently carbondata supports timeseries on preaggregate

Re: [DISCUSSION]Support for Geospatial indexing

2019-11-12 Thread xuchuanyin
Sorry that I cannot access the document in jira. In my opinion, both for SORT_COLUMNS in the current implementation and for LOCATION_COLUMNS in the proposal, carbondata tries to organize the data in some order. So the kernel of the proposal is that, for SORT_COLUMNS, we can specify a

Re: Concurrent data loading issues

2019-11-12 Thread xuchuanyin
Hi, concurrent loading will not cause the problem; I tried that months ago. From the log, it seems that the problem lies in the compaction that is automatically triggered after loading. To solve the problem, I think you can: 1. first turn off auto-compaction to increase loading performance,

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-11 Thread xuchuanyin
+1 for this feature. Additionally, your draft describes "specify the bloom columns using table properties"; I recommend that for the first phase, we should not use this information from table properties while querying. We can store the index information in the blocklet (or page)
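
A toy model of a per-page bloom filter, using Guava as a stand-in for whatever implementation the feature would use (sizes and fpp illustrative):

```scala
import java.nio.charset.StandardCharsets
import com.google.common.hash.{BloomFilter, Funnels}

// One filter per page, built at write time and stored with the page meta.
def buildPageBloom(pageValues: Seq[String]): BloomFilter[CharSequence] = {
  val bloom: BloomFilter[CharSequence] = BloomFilter.create(
    Funnels.stringFunnel(StandardCharsets.UTF_8), pageValues.size.toLong, 0.01)
  pageValues.foreach(v => bloom.put(v))
  bloom
}

// At query time, a negative answer lets the reader skip the page entirely.
def pageMightContain(bloom: BloomFilter[CharSequence], v: String): Boolean =
  bloom.mightContain(v)
```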

Re: Propose feature change in CarbonData 2.0

2019-12-01 Thread xuchuanyin
Glad to see you making this proposal! The features you mentioned are really not popular; even heavy users neither try them nor know their usage. For 1/2/3/4/5.1/5.2/7, we can remove these features along with their code. But if we consider compatibility, the query processing will still be complex. How

[jira] [Created] (CARBONDATA-1281) Disk hotspot found during data loading

2017-07-09 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-1281: -- Summary: Disk hotspot found during data loading Key: CARBONDATA-1281 URL: https://issues.apache.org/jira/browse/CARBONDATA-1281 Project: CarbonData

[jira] [Created] (CARBONDATA-1267) Failure in data loading due to bugs in delta-integer-codec

2017-07-05 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-1267: -- Summary: Failure in data loading due to bugs in delta-integer-codec Key: CARBONDATA-1267 URL: https://issues.apache.org/jira/browse/CARBONDATA-1267 Project

[jira] [Created] (CARBONDATA-1114) Failed to run tests in windows env

2017-05-31 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-1114: -- Summary: Failed to run tests in windows env Key: CARBONDATA-1114 URL: https://issues.apache.org/jira/browse/CARBONDATA-1114 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-1167) Mismatched between class name and logger class name

2017-06-13 Thread xuchuanyin (JIRA)
xuchuanyin created CARBONDATA-1167: -- Summary: Mismatched between class name and logger class name Key: CARBONDATA-1167 URL: https://issues.apache.org/jira/browse/CARBONDATA-1167 Project: CarbonData