Re: [Serious Issue] Rows disappeared

2018-09-27 Thread Ajantha Bhat
Hi Aaron, Thanks for reporting the issue. Can you help me narrow it down? I cannot reproduce it locally with the information given in your mail. a) First, can you disable local dictionary and try the same scenario? b) Can you drop the datamap and try the same scenario? -- If data is coming from datamap

Re: [Serious Issue] Rows disappeared

2018-09-27 Thread Ajantha Bhat
@Aaron: I was able to reproduce the issue with my own dataset. (total 350 KB data) The issue has nothing to do with local dictionary. I have narrowed down the scenario; it is with sort columns + compaction. I will fix it soon and update you Thanks, Ajantha On Thu, Sep 27, 2018 at 8:05 PM Kumar Vishal

Re: Propose configurable page size in MB (via carbon property)

2018-10-22 Thread Ajantha Bhat
Hi xuchuanyin, Thanks for your inputs. Please find some details below. 1. There was already a size-based validation in the code for each row processed, in the 'isVarCharColumnFul()' method. It checked only varchar columns. Now I am checking complex as well as string columns. 2. The logic is

[Discussion] Encryption support for carbondata files

2018-10-30 Thread Ajantha Bhat
*Background:* Currently carbondata files are not encrypted; anyone with a carbon reader can read them. If the data has sensitive information, that data can be encrypted with a crypto key, so that along with the carbon reader this key is required to decrypt and read the data.
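The effect described above can be illustrated with a toy sketch: without the key, the written bytes are unreadable. This is only an illustration of the idea, not the proposal's actual design; a real implementation would use a vetted cipher such as AES-GCM, and this XOR keystream must not be used for real security.

```python
import hashlib

def keystream(key, n):
    """Derive n pseudo-random bytes from the key (toy construction)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def crypt(key, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Round-tripping with the same key recovers the data; reading without it yields garbage, which is the behaviour the proposal wants for carbondata files.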

Propose configurable page size in MB (via carbon property)

2018-10-11 Thread Ajantha Bhat
Hi all, For better in-memory processing of carbondata pages, I am proposing a configurable page size in MB (via carbon property). The detailed background, problem and solution are described in the design document attached to the JIRA below.
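The core of the proposal can be sketched as a size-based page cut: rows accumulate into a page until the configured MB budget is reached, then a new page starts. This is a minimal Python illustration of the logic only; the function and property names are not the actual CarbonData code.

```python
def cut_pages(rows, page_size_mb=1):
    """Yield pages (lists of rows); start a new page once the size budget is hit."""
    limit = page_size_mb * 1024 * 1024
    page, used = [], 0
    for row in rows:
        # Approximate the row's encoded size from its values.
        row_size = sum(len(str(v).encode("utf-8")) for v in row)
        if page and used + row_size > limit:
            yield page
            page, used = [], 0
        page.append(row)
        used += row_size
    if page:
        yield page
```

With a 1 MB budget and 400 KB rows, each page holds two rows before a cut, which is the behaviour the size check is meant to enforce regardless of column type.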

Re: CarbonWriterBuild issue

2018-09-20 Thread Ajantha Bhat
Also, now that we support Hadoop conf, we don't require the below APIs and can remove them from CarbonWriterBuilder: *setAccessKey, setSecretKey, setEndPoint*. Thanks, AB On Thu, Sep 20, 2018 at 11:16 PM Ajantha Bhat wrote: > > @xuchuanyin: > >

Re: [Serious Issue] Rows disappeared

2018-09-28 Thread Ajantha Bhat
@Aaron: Please find the issue fix changes in the below PR. *https://github.com/apache/carbondata/pull/2784 * I also added a test case, and it passes after my fix. Thanks, Ajantha On Fri, Sep 28, 2018 at 4:57 AM aaron <949835...@qq.com> wrote:

Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-17 Thread Ajantha Bhat
@Liang: yes, your understanding of my proposal is correct. Why remove empty sort_columns? If the user specifies empty sort_columns, should I throw an exception saying the specified sort_columns are not present? I feel there is no need to remove empty sort_columns; by default we set sort_columns as empty sort_columns

[Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-11 Thread Ajantha Bhat
Hi all, Currently in carbondata, we have 'local_sort' as the default sort_scope and, by default, all the dimension columns are selected as sort_columns. This slows down data loading. *To give the user the best performance with default values,* we can change sort_scope to 'no_sort' and

Re: [DISCUSSION] Support DataLoad using Json for CarbonSession

2018-12-05 Thread Ajantha Bhat
Hi, +1 for the JSON proposal in loading. This can help with nested-level complex data type loading. Currently, CSV loading supports only a 2-level delimiter; JSON loading can solve this problem. While supporting JSON for SDK, I have already handled your points 1) and 3); you can refer to and use the

[carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-09 Thread Ajantha Bhat
Currently, carbon SDK output files (files without the metadata folder and its contents) are read by spark using an external table with carbon session. But the presto carbon integration doesn't support that; it can currently read only transactional table output files. Hence we can enhance presto to

Re: [proposal] Parallelize block pruning of default datamap in driver for filter query processing.

2018-11-22 Thread Ajantha Bhat
@xuchuanyin Yes, I will be handling this for all types of datamap pruning in the same flow when I am done with default datamap's implementation and testing. Thanks, Ajantha On Fri, Nov 23, 2018 at 6:36 AM xuchuanyin wrote: > 'Parallelize pruning' is in my plan long time ago, nice to see your

[proposal] Parallelize block pruning of default datamap in driver for filter query processing.

2018-11-20 Thread Ajantha Bhat
Hi all, I want to propose *"Parallelize block pruning of default datamap in driver for filter query processing"* *Background:* We do block pruning for filter queries at the driver side. In real-world big data scenarios, we can have millions of carbon files for one carbon table. It is currently
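The proposal above can be sketched as checking each block's min/max metadata against the filter range in a thread pool instead of serially. This is an illustrative Python sketch of the pruning idea only; the block metadata layout and function names are assumptions, not CarbonData's actual structures.

```python
from concurrent.futures import ThreadPoolExecutor

def prune_blocks(blocks, col, lo, hi, threads=4):
    """Keep blocks whose [min, max] range for `col` overlaps the filter [lo, hi]."""
    def survives(block):
        bmin, bmax = block["minmax"][col]
        return bmax >= lo and bmin <= hi
    # Evaluate the min/max check for all blocks in parallel.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        keep = list(pool.map(survives, blocks))
    return [b for b, k in zip(blocks, keep) if k]
```

With millions of blocks, the per-block check is independent, which is what makes this pruning embarrassingly parallel at the driver.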

Support Apache arrow vector filling from carbondata SDK

2019-05-02 Thread Ajantha Bhat
*Background:* As we know, Apache arrow is a cross-language development platform for in-memory data. It specifies a standardised language-independent columnar memory format for flat and hierarchical data, organised for efficient analytic operations on modern hardware. So, by integrating carbon to

Re: [Discussion] Migrate CarbonData to support PrestoSQL

2019-05-06 Thread Ajantha Bhat
+1 for carbondata to support prestoSQL, as the prestoSQL community is very active and prestoDB is mostly restricted to facebook-driven changes. However, we need to take a call now: does carbondata need to support both prestoDB and prestoSQL in future? I think supporting only prestoSQL is good

Re: Question about Presto integration

2019-07-15 Thread Ajantha Bhat
Hi Yuya Ebihara, As you can see in our documentation, the latest carbondata is currently integrated with presto 0.217. https://github.com/apache/carbondata/blob/master/docs/presto-guide.md So, presto 0.217 works fine with this query. Presto 0.219 is not yet supported in this version of carbondata.

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Ajantha Bhat
+1, I have some suggestions and questions. 1. In DMPROPERTIES, instead of 'timestamp_column' I suggest using 'timeseries_column', so that it won't give the impression that only the timestamp datatype is supported; also update the document with all the datatypes supported. 2. Querying on this datamap

Re: [DISCUSSION]Support for Geospatial indexing

2019-10-24 Thread Ajantha Bhat
Hi Jacky, we have looked into geomesa. [image: Screenshot from 2019-10-23 16-25-23.png] a. Geomesa is tightly coupled with key-value databases like Accumulo, HBase, Google Bigtable and Cassandra, and is used for OLTP queries. b. Geomesa's current spark integration is only in query

Re: [DISCUSSION] PyCarbon: provide python interface for users to use CarbonData by python code

2019-11-24 Thread Ajantha Bhat
+1. As we have already worked on it, we have to integrate it as cleanly as possible. I think this can be done in 2 layers. 1. *PySDK:* a generic python layer over the java SDK. Users who don't need AI support but just the python SDK layer can use this alone. a. This supports reading and writing carbondata

Re: [DISCUSSION]Support for Geospatial indexing

2019-11-27 Thread Ajantha Bhat
Hi Venu, 1. Please keep the default implementation independent of grid size and other parameters. I mean below parameters. 'INDEX_HANDLER.xxx.gridSize', 'INDEX_HANDLER.xxx.minLongitude', 'INDEX_HANDLER.xxx.maxLongitude', 'INDEX_HANDLER.xxx.minLatitude', 'INDEX_HANDLER.xxx.maxLatitude', *It

Re: [ANNOUNCE] Ajantha as new Apache CarbonData committer

2019-10-03 Thread Ajantha Bhat
Thank you all. On Thu, 3 Oct, 2019, 6:47 PM Kunal Kapoor, wrote: > Congratulations ajantha > > On Thu, Oct 3, 2019, 5:30 PM Liang Chen wrote: > > > Hi > > > > > > We are pleased to announce that the PMC has invited Ajantha as new Apache > > CarbonData committer and the invite has been

Optimize and refactor insert into command

2019-12-19 Thread Ajantha Bhat
Currently carbondata "insert into" uses CarbonLoadDataCommand itself. The load process has steps like parsing and a converter step with bad record support. Insert into doesn't require these steps, as the data is already validated and converted from the source table or dataframe. Some identified changes are
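The refactor idea above can be sketched as planning different step lists for the two flows: load runs the full validation pipeline, while insert-into (data already validated by the source) goes straight to writing. Step names here are placeholders loosely based on the steps named in the mail, not the actual CarbonData step classes.

```python
def plan_steps(is_insert):
    """Return the processing steps; insert skips the validation steps."""
    steps = []
    if not is_insert:
        # Load path: raw input must be parsed, converted, and bad-record checked.
        steps += ["input_parse", "converter", "bad_record_handling"]
    steps.append("sort_and_write")
    return steps
```

Skipping the three validation steps for insert is where the proposed performance gain comes from.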

Regarding presto carbondata integration

2020-02-11 Thread Ajantha Bhat
Hi all, Currently the master code of carbondata works with *prestodb 0.217*. We all know about the competing *presto-sql* also. Some users don't want to migrate to *presto-sql* as their cloud vendor doesn't support it (for example, AWS EMR, Huawei MRS, and AZURE services except HDInsights still

Re: Discussion: change default compressor to ZSTD

2020-02-06 Thread Ajantha Bhat
Hi, 33% is a huge reduction in store size. If there is a negligible difference in load and query time, we should definitely go for it. And does the user really need to know what compression is used? A change in file name may need compatibility handling. Already thrift *FileHeader,

Re: Propose to upgrade hive version to 3.1.0

2020-02-21 Thread Ajantha Bhat
+1. Will the current version still be supported, or will carbondata only support 3.1.0 after this? Thanks, Ajantha On Fri, 21 Feb, 2020, 4:39 pm Kunal Kapoor, wrote: > Hi All, > > The hive community has already released version 3.1.0 which has a lot of > bug fixes and new features. > Many of

Re: Improving show segment info

2020-02-16 Thread Ajantha Bhat
Hi Likun, I think this display command is hard to maintain if we provide all these options manually. *1. How about creating a "tableName.segmentInfo" child table for each main table?* The user can query this table, and it is easy to support filter and group by. We just have to finalize the schema of this

Re: Improving show segment info

2020-02-16 Thread Ajantha Bhat
3. And about event time: I don't think we need to keep it for every row; it is a waste of storage. Can we keep it in loadMetadetails or at file level? On Mon, Feb 17, 2020 at 11:10 AM Ajantha Bhat wrote: > Hi Likun, > > I think this display command is hard to maintain if we pr

Re: Discussion: change default compressor to ZSTD

2020-02-19 Thread Ajantha Bhat
I will handle the compatibility. > > > > > > Regards, > > Jacky > > > > > > -- Original Message -- > > From: "Ajantha Bhat" > Sent: Thursday, Feb 6, 2020, 11:51 PM > > To: "dev" > > > Subject: Re: Discussion: chan

Re: [Discussion] Support SegmentLevel MinMax for better Pruning and less driver memory usage

2020-01-14 Thread Ajantha Bhat
+1. Can you explain more about how you are encoding and storing min/max in the segment file? As min/max values represent user data, we cannot store them as plain values, and storing encrypted min/max adds the overhead of encrypting and decrypting. I suggest we convert the segment file to a thrift file to solve

Re: Optimize and refactor insert into command

2020-01-01 Thread Ajantha Bhat
e different. > So you may need a bad record support then , how you are going to handle > such scenarios? Correct me if I misinterpreted your points. > > Regards, > Sujith > > > On Fri, 20 Dec 2019 at 5:25 AM, Ajantha Bhat > wrote: > > > Currently carbondata "i

Re: Apply to open 'Issues' tab in Apache CarbonData github

2019-12-23 Thread Ajantha Bhat
If the plan for the issues tab is just to address mailing list problems, I would suggest we start using "*slack*". Many companies and open source communities use slack (I have used it in the presto sql community). It supports thread-based conversations and searching is easy. It also provides an option to

Re: Carbon over-use cluster resources

2020-04-15 Thread Ajantha Bhat
Hi Manhua, For no_sort and local_sort only, we don't follow spark's task launch logic; we have our own logic of one task per node. And inside that task we can control resources by configuration (carbon.number.of.cores.while.loading). As you pointed out in the above mail, *N * C is controlled by

Re: [VOTE] Apache CarbonData 2.0.0(RC1) release

2020-04-02 Thread Ajantha Bhat
Hi, For rc1 my comment is: -1. Similar points as Liang, but along with that, after #3661 many documentation links are broken for the MV, bloom and lucene datamaps from ReadMe.md. We need to fix this before the carbondata 2.0.0 release. Thanks, Ajantha On Thu, Apr 2, 2020 at 4:26 PM Liang Chen wrote:

Re: Disable Adaptive encoding for Double and Float by default

2020-03-26 Thread Ajantha Bhat
be exponential. Store size directly impacts > the query performance in object store world. It is better to find a way to > fix it rather than removing things. > > Regards, > Ravindra. > > On Wed, 25 Mar 2020 at 5:04 PM, Ajantha Bhat > wrote: > > > Hi Ravi, please fi

Disable Adaptive encoding for Double and Float by default

2020-03-24 Thread Ajantha Bhat
Hi all, I have profiled the insert into flow using JMC with the latest code [with the new optimized insert flow]. It seems that for a *2.5GB* carbon-to-carbon insert, the double and float stats collector has used *68.36 GB* [*25%* of TLAB (thread-local allocation buffer)] [image: Screenshot from 2020-03-25
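The work being profiled above can be sketched as follows: adaptive encoding for floating-point pages scans every value to find the maximum decimal-place count, so the page can be stored as scaled integers. This is a simplified Python illustration of that per-value stats cost; the real CarbonData stats collector differs in detail.

```python
def decimal_count(value):
    """Count significant decimal places of a float (simplified)."""
    text = repr(float(value))
    if "." not in text:
        return 0
    return len(text.split(".")[1].rstrip("0"))

def encode(values):
    """Scale a page of doubles to integers using the max decimal count."""
    # This per-value inspection is the allocation-heavy step in the profile.
    factor = 10 ** max(decimal_count(v) for v in values)
    return factor, [round(v * factor) for v in values]
```

Because the decimal count must be computed for every single value in every page, the cost grows with row count, which matches the 25%-of-TLAB observation for a large insert.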

Re: Disable Adaptive encoding for Double and Float by default

2020-03-25 Thread Ajantha Bhat
> On Wed, 25 Mar 2020 at 1:51 PM, Ajantha Bhat > wrote: > > > Hi all, > > > > I have done insert into flow profiling using JMC with the latest code > > [with new optimized insert flow] > > > > It seems for *2.5GB* carbon to carbon insert, double and

Re: [VOTE] Apache CarbonData 2.0.0(RC2) release

2020-05-03 Thread Ajantha Bhat
-1 to this RC. 1. I feel we need to document the interface changes from the previous version in the release notes; for example, PR #3583 changed the SDK 'Field' package name. 2. We need to list all the removed / deprecated

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-17 Thread Ajantha Bhat
+1 Regards, Ajantha On Sun, 17 May, 2020, 6:41 pm Jacky Li, wrote: > +1 > > Regards, > Jacky > > > > 2020年5月17日 下午4:50,Kunal Kapoor 写道: > > > > Hi All, > > > > I submit the Apache CarbonData 2.0.0(RC3) for your vote. > > > > > > *1.Release Notes:* > > >

Re: [Discussion] Optimize the Update Performance

2020-05-13 Thread Ajantha Bhat
Hi! Update still uses the converter step with bad record handling. In the update-by-dataframe scenario there is no need for bad record handling; we can keep it only for the update-by-value case. This can give a significant improvement, as we already observed in the insert flow. I tried once to send it to new

[Discussion] Support pagination in SDK reader

2020-05-20 Thread Ajantha Bhat
*Background:* Pagination is the task of dividing the query result into pages and retrieving the required pages one by one on demand. [An example is google search: it displays results in pages.] In the database domain, we use offset and limit to achieve it. Now if carbondata is used to create an image
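The offset/limit idea above can be sketched by mapping a requested row range onto the blocklets that must be read, using the per-blocklet row counts available from index metadata. This is an illustrative Python sketch under that assumption, not the actual SDK reader API.

```python
def blocklets_for_page(row_counts, from_row, to_row):
    """Return indexes of blocklets covering rows [from_row, to_row] (1-based)."""
    hits, start = [], 1
    for i, count in enumerate(row_counts):
        end = start + count - 1
        # Keep the blocklet if its row range overlaps the requested page.
        if end >= from_row and start <= to_row:
            hits.append(i)
        start = end + 1
    return hits
```

Only the overlapping blocklets need decoding, so fetching page N on demand avoids reading the whole result set.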

Re: [Discussion] Segment management enhance

2020-09-04 Thread Ajantha Bhat
Hi David, a) Recently we tested huge concurrent loads and compactions but never hit the issue of two loads using the same segment id (because of the table status lock in recordNewLoadMetadata), so I am not sure whether we really need to move to UUIDs. b) And about the other segment interfaces, we have to

Re: [Discussion] Update feature enhancement

2020-09-04 Thread Ajantha Bhat
Hi David, Thanks for proposing this. *+1 from my side.* I have seen users with a 200K-segment table stored in the cloud. It will be really slow to reload all the segments where an update happened for indexes like SI, min-max, MV. So, it is good to write updates as a new segment and just load the new segment

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-04 Thread Ajantha Bhat
Hi David, a) Compressing the table status is good, but we need to check the decompression overhead and how much overall benefit we get. b) I suggest we keep multiple 10MB files (or configurable), then read them in a distributed way. c) Once all the table status files are read, it is better to cache them at the driver

Re: Clean files enhancement

2020-09-15 Thread Ajantha Bhat
Hi Vikram, Thanks for proposing this. a) If the file system is HDFS, *HDFS already supports trash*: when data is deleted in HDFS, it is moved to trash instead of being permanently deleted (the trash interval can be configured via *fs.trash.interval*). b) If the file system is object storage like s3a or OBS,

Re: [ANN] Indhumathi as new Apache CarbonData committer

2020-10-06 Thread Ajantha Bhat
Congratulations indhumathi. On Wed, 7 Oct, 2020, 8:16 am Liang Chen, wrote: > Hi > > > We are pleased to announce that the PMC has invited Indhumathi as new > > Apache CarbonData committer, and the invite has been accepted! > > > Congrats to Indhumathi and welcome aboard. > > > Regards > > The

Re: Parallel Insert and Update

2020-10-14 Thread Ajantha Bhat
Hi Kejian Li, Thanks for working on this. I see that this design and requirement are similar to what Nihal discussed a few days ago: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Parallel-compaction-and-update-td100338.html So, probably as Ravindra

Re: [VOTE] Apache CarbonData 2.1.0(RC1) release

2020-10-04 Thread Ajantha Bhat
Hi, Thanks for preparing the release. *-1 from my side for this release package.* The reasons are: a. Many PRs are yet to be merged [for example, Presto write PRs #3875, #3916, code cleanup #3950, other PRs like #3934]. b. Please go through the key features again; we can add SI global sort support, presto

[Discussion] Taking the inputs for Segment Interface Refactoring

2020-10-14 Thread Ajantha Bhat
Hi Dev, We have discussed segment interface refactoring multiple times, but we are not moving ahead. The final goal of this activity is to *design a clean segment interface that can support time travel, concurrent operation and transaction management.* So, I am welcoming the problems,

Re: Regarding Carbondata Benchmarking & Feature presentation

2020-09-17 Thread Ajantha Bhat
Hi, Thanks for planning to propose carbon. Please join our slack to directly discuss with members also. https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H9DVvNl6H2dg we will get back to you on the presentations and benchmarks. Thanks, Ajantha On Thu, Sep 17,

Re: Regarding Carbondata Benchmarking & Feature presentation

2020-09-17 Thread Ajantha Bhat
better. Thanks, Ajantha On Thu, Sep 17, 2020 at 11:57 AM Ajantha Bhat wrote: > Hi, Thanks for planning to propose carbon. > > Please join our slack to directly discuss with members also. > > https://join.slack.com/t/carbondataworkspace/shared_invite/zt-g8sv1g92-pr3GTvjrW5H

Re: [Discussion] Presto read support for complex data types

2020-05-25 Thread Ajantha Bhat
+1. This is really required, as complex schemas are very common nowadays and most users have them. I see that the current design covers only a 1-level array. Multi-level arrays with complex children and other complex types also need to be supported. Now that you have an idea about array, it is better

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

2020-05-25 Thread Ajantha Bhat
+1 for the proposal. I didn't see the design doc in JIRA; please check. Also, once we provide write support, it is better to have carbondata as a separate plugin instead of extending hive, as presto-hive was not meant to have write support and is mainly meant for querying data where it exists. Also

Re: [DISCUSSION] About global sort in 2.0.0

2020-05-31 Thread Ajantha Bhat
+1 We can have a minor version patch release. Also, in the next version, I suggest we analyze the existing test cases and make them more organized and stronger! Thanks, Ajantha On Mon, 1 Jun, 2020, 9:12 am Kunal Kapoor, wrote: > +1 > We can have 2.0.1 as the patch release. > > Regards > Kunal

Re: [VOTE] Apache CarbonData 2.0.1(RC1) release

2020-06-01 Thread Ajantha Bhat
+ 1 Regards, Ajantha On Mon, 1 Jun, 2020, 4:33 pm Kunal Kapoor, wrote: > Hi All, > > I submit the Apache CarbonData 2.0.1(RC1) for your vote. > > > *1.Release Notes:* > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12347870 > > *2. The tag to be voted upon* : >

Re: Will carbon support MERGE INTO sql?

2020-10-25 Thread Ajantha Bhat
Hi Zhangshunyu, Yes, we are aware of this. As API support is already there, this was a lower priority compared to other pending work like IUD performance improvement and new features like time travel and segment interface refactoring. If you are interested in contributing the SQL syntax

Slack workspace launch !

2020-08-04 Thread Ajantha Bhat
Hi all, For the better discussion thread model and quick responses, we have created a free slack workspace of Carbondata. Feel free to join the workspace by using the below invite link and have active discussions.

Re: [Discussion] SI support Complex Array Type

2020-07-30 Thread Ajantha Bhat
Hi David & Indhumathi, Storing an Array of String as just a String column in SI by flattening [with a row-level position reference] can result in slow performance in the case of multiple array_contains() or multiple array[0] = 'x' predicates. The join solution mentioned can result in multiple scans (once for every
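The flattening being discussed can be sketched as follows: each element of an ARRAY&lt;STRING&gt; column becomes one SI row carrying a position reference back to the main-table row, so array_contains lookups become plain value lookups. This is an illustrative sketch of the idea, not the SI storage format.

```python
def flatten_for_si(rows):
    """rows: list of (row_id, [array values]) -> list of (value, row_id) SI rows."""
    return [(value, row_id) for row_id, arr in rows for value in arr]

def array_contains(si_rows, value):
    """Resolve matching main-table row ids from the flattened SI rows."""
    return sorted({row_id for v, row_id in si_rows if v == value})
```

The performance concern in the mail is visible here: each array_contains() predicate is a separate lookup over the flattened rows, so a query with many such predicates repeats the work.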

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread Ajantha Bhat
Hi, What if there are too many index files in a segment and the user wants to finish the load fast and doesn't want to wait for merge index? In that case, setting merge index = false can help save load time, and in off-peak time the user can create the merge index. So I still feel we need to fix the issue that exists when merge index =

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread Ajantha Bhat
Hi, I didn't reply about deprecation: *+1 for deprecating it*. *And +1 for the issue fix also.* By issue fix, I didn't mean when *carbon.merge.index.in.segment = false*, but when *carbon.merge.index.in.segment = true and merge index

Re: [Disscussion] Change Default TimeStampFormat to yyyy-mm-dd hh:mm:ss.SSS

2020-07-15 Thread Ajantha Bhat
Hi, I need to check the below points before concluding. If you already have information on this, please provide it. 1. About the hive and spark default format: some places mention up to 9 digits of decimal precision, while you mentioned 3. So, which file of hive and spark has this default

Re: [Disscuss] The precise of timestamp is limited to millisecond in carbondata, which is incompatiable with DB

2020-07-15 Thread Ajantha Bhat
+1, as SimpleDateFormat doesn't support nanoseconds and microseconds. Thanks, Ajantha On Tue, Jul 14, 2020 at 5:03 PM xubo245 <601450...@qq.com> wrote: > +1, please consider compatability for history data > > > > -- > Sent from: >

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-07-15 Thread Ajantha Bhat
Hi Justin, Thanks for pointing out the "Copyright (c) 2017-2018 Uber Technologies, Inc." I see that *two test files* in the *pycarbon* module of carbondata have it, as pycarbon depends on the open source, Apache-licensed *uber's petastorm* project. These two test case files were imported from that

Re: [DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

2020-07-27 Thread Ajantha Bhat
+1, I have some suggestions and questions. a) You mentioned that currently creating a table from presto and inserting data results in a non-transactional table; so, to create a transactional table, we still depend on spark? *I feel we should support transactional table creation with all table

Re: Size control of minor compaction

2020-11-23 Thread Ajantha Bhat
Hi Zhangshunyu, Thanks for providing more details on the problem. If it is just for skipping history segments during auto minor compaction, adding a size threshold for minor compaction should be fine. We can have a table-level, dynamically configurable threshold. If it is not configured, consider
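The threshold behaviour suggested above can be sketched as a filter over the candidate segments: if the (hypothetical) table-level threshold is configured, segments larger than it are skipped by auto minor compaction; if not, current behaviour is kept. Names and units here are illustrative.

```python
def minor_compaction_candidates(segments, threshold_mb=None):
    """Return ids of segments eligible for auto minor compaction.

    segments: list of {"id": str, "size": bytes}. When threshold_mb is None
    (property not configured), all segments stay eligible, preserving the
    current behaviour.
    """
    if threshold_mb is None:
        return [s["id"] for s in segments]
    limit = threshold_mb * 1024 * 1024
    return [s["id"] for s in segments if s["size"] <= limit]
```

With a 512 MB threshold, already-compacted multi-GB history segments fall out of the candidate list, which is exactly the "skip history segments" goal.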

Re: [DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-17 Thread Ajantha Bhat
Hi Shen Jiayu, It is an interesting feature, thanks for proposing this. +1 from my side for high-level design, I have few suggestions and questions. a) Better to separate new UDF, utility UDF PR from algorithm improvement PR for ease of review and maintainability. b) Union, intersection, and

[Discussion] Upgrade presto-sql to 333 version

2020-12-18 Thread Ajantha Bhat
Hi all, Currently carbondata is integrated with presto-sql 316, which is 1.5 years old. Many good features and optimizations have since come into presto, like dynamic filtering, the Rubix data cache and some performance improvements. It is always good to use the latest version; the latest version is

Re: [DISCUSSION]Join optimization with Carbondata's metadata

2020-11-10 Thread Ajantha Bhat
Hi Akash, *Just my opinion*: once spark supports it, we can handle it in carbon if something needs to be supported. *Doing this change independent of spark can make us lose the advantage once spark brings it in as default.* Qubole's dynamic filtering is already merged in prestosql and this will

Re: [Discussion] About carbon.si.segment.merge feature

2020-11-10 Thread Ajantha Bhat
@David: a) Yes, SI can use global sort by default. b) It is better if the SI original load itself launches tasks based on the SI segment size (need to figure out how to estimate it); else we have to go with the one-task-per-node logic (similar to main table local sort). But the current logic needs to be changed to avoid

Re: [Discussion] Taking the inputs for Segment Interface Refactoring

2020-11-13 Thread Ajantha Bhat
Hi Everyone. Please find the design of refactored segment interfaces in the document attached. Also can check the same V3 version attached in the JIRA [ https://issues.apache.org/jira/browse/CARBONDATA-2827] It is based on some recent discussions and the previous discussions of 2018 [

Re: [ANNOUNCE] Ajantha as new PMC for Apache CarbonData

2020-11-20 Thread Ajantha Bhat
Thank you all !! On Fri, 20 Nov, 2020, 1:45 pm manish gupta, wrote: > Congratulations Ajantha  > > On Fri, 20 Nov 2020 at 1:21 PM, BrooksLi wrote: > > > Congratulations to Ajantha! > > > > > > > > -- > > Sent from: > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ >

Re: [DISCUSSION]Merge index property and operations improvement.

2020-11-22 Thread Ajantha Bhat
Hi Akash, In point 3 you mentioned there is no need to fail the load if merge index fails. So, how do we create the merge index again (as the first-time query is slow without merge index) if you block it for new tables (as per point 2)? It is contradictory, I guess. Here are my inputs: *For Transactional

Re: Size control of minor compaction

2020-11-23 Thread Ajantha Bhat
Hi Zhangshunyu, For such scenario-specific cases, the user can use custom compaction by mentioning the segment ids to be considered for compaction. Also, if you just want size-based compaction, major compaction can be used. So, why do you want to support size-based minor compaction?

Re: [Discussion] Partition Optimization

2020-10-29 Thread Ajantha Bhat
+1. Not keeping the partition value as a column (as the folder name already has it) is a great way to reduce store size. We might have to handle compatibility and support refresh table also. Apache Iceberg has a more mature concept called *hidden partitioning*, where they also maintain the
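The idea above relies on the reader recovering the partition value from the Hive-style folder name (e.g. 'country=US') instead of a stored column. A minimal sketch of that recovery, assuming the usual `col=value` path layout:

```python
def partition_values(file_path):
    """Extract {column: value} from Hive-style partition folders in a path."""
    parts = {}
    for segment in file_path.split("/"):
        if "=" in segment:
            col, _, val = segment.partition("=")
            parts[col] = val
    return parts
```

Since every row in a file shares the same partition value, dropping the column saves one value per row at the cost of this small path-parsing step at read time.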

Re: [DISCUSSION] Support MERGE INTO SQL API

2020-11-04 Thread Ajantha Bhat
+1, Thanks for planning to implement this. Please define the limitations or scope in more detail for WHEN MATCHED and WHEN NOT MATCHED. For example, when NOT MATCHED, can UPDATE also be supported? (I guess only insert is supported.) Thanks, Ajantha On Thu, Nov 5, 2020 at 8:10 AM BrooksLi wrote:

Re: [Discussion] About carbon.si.segment.merge feature

2020-11-06 Thread Ajantha Bhat
, Ajantha On Fri, Nov 6, 2020 at 4:41 PM Ajantha Bhat wrote: > Hi, > > when a carbon property *carbon.si.segment.merge = true*, > > *a) local_sort SI segment loading (default) [All the SI columns are > involved]* > > SI load will load with default local_sort. There will be

Re: [VOTE] Apache CarbonData 2.1.0(RC2) release

2020-11-04 Thread Ajantha Bhat
+1, Thanks, Ajantha On Wed, 4 Nov, 2020, 2:17 pm akashrn5, wrote: > +1 for release. > > Thanks. > > Regards, > Akash R Nilugal > > > > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ >

Re: [Discussion] Taking the inputs for Segment Interface Refactoring

2021-01-05 Thread Ajantha Bhat
ail (also present in JIRA) and provide your opinion (+1) to go ahead.* Thanks, Ajantha On Fri, Nov 13, 2020 at 2:43 PM Ajantha Bhat wrote: > Hi Everyone. > Please find the design of refactored segment interfaces in the document > attached. Also can check the same V3 version attached

Re: [Discussion]Presto Queries leveraging Secondary Index

2021-01-05 Thread Ajantha Bhat
Hi Venu, a. *Presto carbondata supports reading the bloom index*, so I want to correct your initial statement "Presto engine does not make use of indexes (SI, Bloom etc) in query processing". b. Between option1 and option2, the main difference is that *option1 is multi-threaded and option2 is distributed.* The

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

2021-01-17 Thread Ajantha Bhat
Hi Nihal, In a concurrent scenario we cannot map which load command was loaded as which segment id, so it is good to show the summary at the end of the command. I agree with David's suggestion. Along with load and insert, if possible we should give a summary for update, delete and merge also (which we

Re: [DISCUSSION] Improve Simple insert performance in carbondata

2021-02-02 Thread Ajantha Bhat
Hi, By simple insert do you mean "insert by values"? I don't think this will be used frequently in a real data pipeline; ideally insert is used for inserting from another table or an external table. Just for a one-row insert (or insert by values), I don't think we need to avoid the spark RDD flow. Also

Re: [DISCUSSION] Support JOIN query with spatial index

2021-04-27 Thread Ajantha Bhat
OK, +1 from my side. If the polygon join query still has a performance bottleneck, we can optimize it later. Thanks, Ajantha On Tue, Apr 27, 2021 at 3:59 PM Indhumathi wrote: > Thanks Ajantha for your inputs. > > I have modified the design, by adding ToRangeList Udf filter as a implicit > column

Re: [DISCUSSION] Support alter schema for complex types

2021-03-30 Thread Ajantha Bhat
Hi Akshay, The mail description and the document content do not match; even for a single-level struct, the document says it cannot be supported. So, please list all the work that needs to be done in points, and then clearly divide in the summary what is supported in phase 1 and what in phase 2

Re: [DISCUSSION] Support JOIN query with spatial index

2021-03-30 Thread Ajantha Bhat
Hi, I have some doubts and suggestions. Currently we support these UDFs --> IN_POLYGON, IN_POLYGON_LIST, IN_POLYLINE_LIST, IN_POLYGON_RANGE_LIST, but the user needs to give the polygon input manually, and as a polygon can have many points, this is hard to do by hand. So, your requirement

Re: [DISCUSSION] Describe complex columns

2021-03-30 Thread Ajantha Bhat
Hi, +1 for this improvement. a) You can also print one line of short information about the parent column when describe column is executed, to avoid running it again just to learn the parent column's type. Example: Describe column decimalcolumn on complexcarbontable; *You can mention that

Re: [VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-29 Thread Ajantha Bhat
Hi all, PMC vote has passed for Apache Carbondata 2.1.1 release, the result is as below: +1(binding): 5(Kunal Kapoor, David CaiQiang, Kumar Vishal, Ravindra Pesala, Liang Chen) +1(non-binding) : 2 (Akash, Indhumathi) Thanks all for your vote. On Mon, Mar 29, 2021 at 12:57 PM Liang Chen

Re: [Discussion]Presto Queries leveraging Secondary Index

2021-03-29 Thread Ajantha Bhat
+1 Thanks, Ajantha On Mon, Mar 29, 2021 at 5:58 PM Indhumathi wrote: > +1 for design. > > Please find my comments. > > 1. About updating IndexStatus.ENABLED property, Need to consider > compatibility scenarios as well. > 2. Can update the query behavior when carbon.enable.distributed.index >

Re: Support SI at Segment level

2021-03-30 Thread Ajantha Bhat
+1 for this proposal. But the other ongoing requirement ( http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Presto-Queries-leveraging-Secondary-Index-td105291.html) depends on *isSITableEnabled*, so it is better to wait for it to finish and redesign on top of it.

Re: Improve carbondata CDC performance

2021-03-30 Thread Ajantha Bhat
+1 for this improvement. But this optimization depends on the data: there may be scenarios where, even after pruning with min/max, your dataset size remains almost the same as the original, which brings the extra overhead of the new operations added. Do you plan to add some intelligence or

[VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-26 Thread Ajantha Bhat
tes are cast. [ ] +1 Release this package as Apache CarbonData 2.1.1 [ ] 0 I don't feel strongly about it, but I'm okay with the release [ ] -1 Do not release this package because... Regards, Ajantha Bhat

Re: DISCUSSION: propose to activate "Issues" of https://github.com/apache/carbondata

2021-03-18 Thread Ajantha Bhat
Hi, After opening the github issues tab, are we going to stop using JIRA? If we keep both, when do we use JIRA and when do we use issues? Also, as we have a slack channel now, users facing issues can directly discuss them in slack for quick support. Thanks, Ajantha On Thu, 18 Mar, 2021, 5:29 pm Liang

[VOTE] Apache CarbonData 2.1.1(RC1) release

2021-03-17 Thread Ajantha Bhat
tes are cast. [ ] +1 Release this package as Apache CarbonData 2.1.1 [ ] 0 I don't feel strongly about it, but I'm okay with the release [ ] -1 Do not release this package because... Regards, Ajantha Bhat

Re: [DISCUSSION] Support JOIN query with spatial index

2021-04-19 Thread Ajantha Bhat
Hi, I think the latest document has now addressed my previous comments and questions. The polygon list query and polyline list query designs look OK. But about the design of the polygon query with join, I have a performance concern: in this approach, we are using a union polygon filter on the spatial_table to prune

[Design Discussion] Transaction manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat
Hi All, In this thread, I am continuing the below discussion along with the Transaction Manager and Time Travel feature design. http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Taking-the-inputs-for-Segment-Interface-Refactoring-td101950.html The goal of this

Re: [VOTE] Apache CarbonData 2.2.0(RC2) release

2021-08-02 Thread Ajantha Bhat
+1 Regards, Ajantha On Mon, Aug 2, 2021 at 9:03 PM Venkata Gollamudi wrote: > +1 > > Regards, > Venkata Ramana > > On Mon, 2 Aug, 2021, 20:18 Kunal Kapoor, wrote: > > > +1 > > > > Regards > > Kunal Kapoor > > > > On Mon, 2 Aug 2021, 4:53 pm Kumar Vishal, > > wrote: > > > > > +1 > > > Regards