[DISCUSSION]Carbondata Streamer tool and Schema change capture in CDC merge

2021-08-31 Thread Akash Nilugal
Hi Community, OLTP systems like MySQL are used heavily for storing transactional data in real-time, and the same data is later used for fraud detection and for taking various data-driven business decisions. Since OLTP systems are not suited for analytical queries due to their row-based storage,

TEST mailing list

2021-07-06 Thread Akash Nilugal
Hi, This is to just test the mailing list. Regards, Akash R

[VOTE] Apache CarbonData 2.2.0(RC1) release

2021-07-06 Thread Akash Nilugal
Hi All, I submit the *Apache CarbonData 2.2.0(RC1) *for your vote. *1. Release Notes:* https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12347869=Html=12320220=Create_token=A5KQ-2QAV-T4JA-FDED_386c7cf69a9d53cc8715137e7dba91958dabef9b_lin *Some key features and improvements in

[DISCUSSION]Improve Simple updates and delete performance in carbondata

2020-11-19 Thread Akash Nilugal
Hi Community, Carbondata supports update and delete using Spark. Basically, update is delete + insert, and delete is just delete. But we use Spark APIs or actions on collections that use Spark jobs to do them, like map, partition, etc. So Spark adds the overhead of task serialization cost, total job
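The update-as-delete-plus-insert semantics described above can be illustrated with a minimal, self-contained sketch. This is plain Python for illustration only, not CarbonData's actual Spark-based implementation; all names are hypothetical:

```python
# Conceptual sketch: on immutable columnar storage, an UPDATE is
# performed as a DELETE of the matching rows plus an INSERT of the
# rewritten rows, rather than an in-place mutation.

def update_rows(rows, predicate, apply_update):
    """Return a new row set: drop rows matching the predicate,
    then append their updated versions."""
    matched = [r for r in rows if predicate(r)]          # rows to "delete"
    untouched = [r for r in rows if not predicate(r)]    # rows kept as-is
    inserted = [apply_update(dict(r)) for r in matched]  # rewritten rows
    return untouched + inserted

table = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Bangalore"},
]
result = update_rows(table, lambda r: r["id"] == 2,
                     lambda r: {**r, "city": "Mysore"})
```

Because every update rewrites rows through a delete-plus-insert pass, each such pass maps to Spark jobs in the real system, which is where the serialization overhead mentioned above comes from.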

[DISCUSSION]Merge index property and operations improvement.

2020-11-09 Thread Akash Nilugal
Hi All, Currently, we have the merge index feature, which can be enabled or disabled and is enabled by default. Now during load or compaction, we first create index files and then create the merge index; if merge index generation fails we don't fail the load. We have the alter compact command to do for

Update on CI issue

2020-10-21 Thread Akash Nilugal
Hi Community, Due to some maintenance issues, CI has some problems. We are working to fix it ASAP. We will notify once it's OK. Thanks, Regards, Akash R

CI Maintenance Activity Update

2020-10-01 Thread Akash Nilugal
Hi Community, On account of CI maintenance activity, the carbondata CI machines will be down till October 5th, starting from tonight. PR builder will start as usual from Oct 5th. Please make note of this. Thanks, Regards, Akash R

[DISCUSSION]Remove the call to update the serde properties in case of alter scenarios

2020-07-24 Thread Akash Nilugal
Hi Community, Currently, we have an API called alterTable() in CarbonSessionCatalogUtil.scala which basically fires a SQL statement that changes the serde properties. This API was added when carbon needed to support alter features (spark-2.1 and 2.2 back then), but spark wasn't

[DISCUSSION] Presto+Carbon transactional and Non-transactional Write Support

2020-07-14 Thread Akash Nilugal
Hi Community, As we know, CarbonData is an indexed columnar data format for fast analytics on big data platforms. We have already integrated with query engines like Spark and even Presto. Currently with Presto we only support the querying of carbon data files. But we don't yet support the

Re: [DISCUSSION]: Changes to SHOW METACACHE command

2020-01-02 Thread Akash Nilugal
Hi, +1, looks good. Can you please update the document with a clearer explanation? Regards, Akash On 2019/12/17 11:55:30, Vikram Ahuja wrote: > Hi All, > Please find the attached design document for the same. > >

Re: Apply to open 'Issues' tab in Apache CarbonData github

2019-12-30 Thread Akash Nilugal
Hi, +1 for Ajantha's suggestion. Many open source communities use Slack for discussions; it has a good UI and supports chat. It will be helpful for any new developer who is interested in carbondata. Currently many people cannot follow the mail chain. Regards, Akash R

Re: [Discussion]Gson version problem

2019-12-08 Thread Akash Nilugal
Hi Likun, Actually in the datamap schema I haven't seen the class name; we always write the provider name, so I think it's not failing. As for the table schema, I'm not sure; it basically depends on which class loads first. Regards, Akash On 2019/12/09 04:20:07, Jacky Li wrote: > Hi Akash, > > > I check

Re: [DISCUSSION] Cache Pre Priming

2019-12-06 Thread Akash Nilugal
+1 Regards, Akash R Nilugal On 2019/11/26 13:47:11, vikramahuja1001 wrote: > Hi Community! > The support for prepriming in the case of Bloom and Lucene have to be > removed from the design document as those datamaps are only created during > query time and not the load time. Since they are not

Re: Propose feature change in CarbonData 2.0

2019-12-06 Thread Akash Nilugal
Hi, 1. Global Dict - 0 2. Bucket and 3. Custom Partition +1 4. Batch sort +1 5 page level 0 6. preaggregate and old timeseries +1 7. stored by +1 Store optimization +1 I also suggest the refactoring below: [DISCUSSION] Segment file improvement for Update and delete case. you can find the

[Discussion]Gson version problem

2019-12-06 Thread Akash Nilugal
Hi Community, Recently we developed code to reduce the table status file size. Now the table status file contains short forms of names, which reduces the file size. For the compatibility case, we have added a @Serialized annotation provided by gson (version 2.4) which allows to mention the

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-21 Thread Akash Nilugal
/09/23 13:42:48, Akash Nilugal wrote: > Hi Community, > > Timeseries data are simply measurements or events that are > tracked,monitored, downsampled, and aggregated over time. > Basicallytimeseries data analysis helps in analyzing or monitoring > theaggregated data over peri

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Akash Nilugal
nd starts querying the low level data if he needs it. > I think better get some real uses how user wants this time series data. > > Regards, > Ravindra. > > > On 4 Oct 2019, at 9:39 PM, Akash Nilugal wrote: > > > > Hi Ravi, > > > > 1. I forgot to m

Re: [ANNOUNCE] Ajantha as new Apache CarbonData committer

2019-10-05 Thread Akash Nilugal
Congratulations Ajantha. Regards, Akash On 2019/10/03 12:00:22, Liang Chen wrote: > Hi > > > We are pleased to announce that the PMC has invited Ajantha as new Apache > CarbonData committer and the invite has been accepted! > > Congrats to Ajantha and welcome aboard. > > Regards > >

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-04 Thread Akash Nilugal
scenario > > 4. Why to store min/max at segment level? We can get from datamap also right? > > 4. Union with high granularity tables to low granularity tables are really > needed? Any other time series DB is doing it? Or any known use case we have? > > Regards, > R

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-01 Thread Akash Nilugal
old records if timeseries device stopped working for some time). > > On Tue, Oct 1, 2019 at 10:41 AM Akash Nilugal > wrote: > > > Hi vishal, > > > > In the design document, in the impacted analysis section, there is a topic > > compatibility/legacy stores, so b

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-01 Thread Akash Nilugal
ries datamap for older segments[Existing table]. > If the customer's main table data is also stored based on time[increasing > time] in different segments,he can use this feature as well. > > We can discuss and finalize the solution. > > -Regards > Kumar Vishal > > On Mon

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Akash Nilugal
is also stored based on time[increasing > time] in different segments,he can use this feature as well. > > We can discuss and finalize the solution. > > -Regards > Kumar Vishal > > On Mon, Sep 30, 2019 at 2:42 PM Akash Nilugal > wrote: > > > Hi Ajantha, >

Re: [DISCUSSION] Support heterogeneous format segments in carbondata

2019-09-30 Thread Akash Nilugal
Hi, +1. One question: are add segment and load data to the main table supported? If yes, how is the segment locking handled, as we are going to add an entry inside table status with a segment id for the added segment? Regards, Akash On 2019/09/10 14:41:22, Ravindra Pesala wrote: > Hi All, >

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Akash Nilugal
data will be fetched > form hour granularity datamap and aggregated ? or data is fetched from main > table ? > > Thanks, > Ajantha > > On Mon, Sep 30, 2019 at 11:46 AM Akash Nilugal > wrote: > > > Hi xuchuanyin, > > > > Thanks for the comments/Su

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Akash Nilugal
Hi xuchuanyin, Thanks for the comments/suggestions. 1. Preaggregate is productized, but not timeseries with preaggregate; I think you got confused with that, if I'm right. 2. Limitations like auto sampling or rollup, which we will be supporting now; retention policies, etc. 3.

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Akash Nilugal
datamap be supported ? If not > supported it can be updated in the design doc. > > Regards > Chetan > > On 2019/09/23 13:42:48, Akash Nilugal wrote: > > Hi Community, > > > > Timeseries data are simply measurements or events that are > > tracked,monitored, downsampled,

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Akash Nilugal
, > TIMESTAMP, BIGINT(Unix timestamp) > > On 2019/09/23 13:42:48, Akash Nilugal wrote: > > Hi Community, > > > > Timeseries data are simply measurements or events that are > > tracked,monitored, downsampled, and aggregated over time. > > Basicallytim

[DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-23 Thread Akash Nilugal
Hi Community, Timeseries data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. Basically timeseries data analysis helps in analyzing or monitoring the aggregated data over a period of time to take better decisions for business. So since carbondata
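The downsampling/rollup idea described above — aggregating raw events into a coarser time granularity — can be sketched conceptually. This is illustrative Python only, not the MV datamap implementation:

```python
from collections import defaultdict

def rollup(events, granularity_seconds):
    """Aggregate (timestamp, value) events into fixed-width time
    buckets, keeping sum and count so averages can be derived later."""
    buckets = defaultdict(lambda: {"sum": 0, "count": 0})
    for ts, value in events:
        bucket = ts - (ts % granularity_seconds)  # floor to bucket start
        buckets[bucket]["sum"] += value
        buckets[bucket]["count"] += 1
    return dict(buckets)

# Raw per-second events rolled up to 1-hour (3600 s) granularity.
events = [(0, 10), (1800, 20), (3600, 5)]
hourly = rollup(events, 3600)
```

Chaining such rollups (second → minute → hour → day) is the essence of a timeseries hierarchy: each coarser granularity can be built from the next finer one instead of rescanning the raw data.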

[DISCUSSION] Segment file improvement for Update and delete case

2019-09-12 Thread Akash Nilugal
Hi Community, Consider a scenario where one update operation is done on a segment, so one index and data file are generated. Now one more update operation happens, which will load the segments of the old update to cache, and the actual merge index of that segment to cache. But since we have horizontal

Re: [DISCUSSION] implement MERGE INTO statement

2019-08-31 Thread Akash Nilugal
Hi David, Can you please create a JIRA and upload a design document. Regards, Akash On 2019/08/31 06:14:45, David Cai wrote: > hi all, > CarbonData has supported the insert/update/delete operations. > Now we can start to discuss the MERGE INTO statement. > It should combine
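The MERGE INTO statement under discussion combines insert/update semantics into a single upsert: source rows update matching target rows and are inserted when no match exists. A minimal conceptual sketch in plain Python (not the actual SQL syntax or Spark implementation):

```python
def merge_into(target, source, key):
    """Upsert: each source row updates the target row with the same
    key, or is inserted when no such target row exists."""
    by_key = {row[key]: dict(row) for row in target}
    for row in source:
        by_key[row[key]] = dict(row)  # matched -> update, else insert
    return list(by_key.values())

target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}]
merged = merge_into(target, source, "id")
```

A full MERGE INTO additionally supports "when matched delete" and conditional clauses, but the match-by-key dispatch above is the core of the semantics.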

Re: Time travel/versioning on carbondata.

2019-08-26 Thread Akash Nilugal
Hi Ravindra, I have some doubts and suggestions. 1. Since for compaction you are suggesting to keep the compacted segments as they are, will this be applicable for the delete segment by id or date operations as well? 2. Since there is a proposal for moving delete delta file data to the segment file,

Re: [DISCUSSION] Cache Pre Priming

2019-08-26 Thread Akash Nilugal
Hi David, Thanks for the input. Here anyway at some point a query is going to happen on the table; if we give one more table property, it simply becomes complex: handling the property, compatibility, set and unset support. Let's not make this more cumbersome. Anyway LRU will take care to

Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread Akash Nilugal
dataload operation be impacted if the > segment datamap is loaded to cache once the load is finished. > 2. Will there be a notification in logs stating that the loading of datamap > cache is completed. > > Regards > > On 2019/08/15 12:03:09, Akash Nilugal wrote: > >

Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread Akash Nilugal
w much of the performance difference between the first and second > queries is affected by caching index and how much is affected by Hadoop > caching. > We should open it up and take a look at the time-consuming analysis on > the driver side. > > On 2019/08/21 09:42:

Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread Akash Nilugal
It can analyze the performance improvement that can be brought by > caching part of the index in advance. > > On 2019/08/15 12:03:09, Akash Nilugal wrote: > > Hi Community, > > > > Currently, we have an index server which basically helps in distributed > > cach

Re: [DISCUSSION] Cache Pre Priming

2019-08-21 Thread Akash Nilugal
nd, first clear the cache > data, then loading the cache again? does this command can be executed many > times. > 5. About Compaction > Does like the rebuild before,we need to decide which cache should be > clear and another segments's cache need be loaded? > On 2019/08/15

Re: [DISCUSSION] Cache Pre Priming

2019-08-19 Thread Akash Nilugal
e to use, > such as : > *.* for all dbs and tables > test.* for all tables in test db > test.day_table_201908* for table has targeted prefix > > 3. yes, you are right, fire a count(*) can do that. > > > On 2019/08/19 09:23:06, Akash Nilugal wrote: > > Hi man

Re: [DISCUSSION] Cache Pre Priming

2019-08-19 Thread Akash Nilugal
get back for any clarifications or inputs. Thanks and Regards Akash R Nilugal On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal wrote: > Hi Community, > > Currently, we have an index server which basically helps in distributed > caching of the datamaps in a separate spark

Re: [DISCUSSION] Cache Pre Priming

2019-08-16 Thread Akash Nilugal
Hi All, I have raised a JIRA and attached the design doc there. Please refer to CARBONDATA-3492. Regards, Akash On Thu, Aug 15, 2019, 5:33 PM Akash Nilugal wrote: > Hi Community, > > Currently, we have an index server which basically helps in distributed > caching of the datamaps i

[DISCUSSION] Cache Pre Priming

2019-08-15 Thread Akash Nilugal
Hi Community, Currently, we have an index server which basically helps in distributed caching of the datamaps in a separate spark application. The caching of the datamaps in index server will start once the query is fired on the table for the first time, all the datamaps will be loaded if the
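The pre-priming idea above — warming the cache when a load finishes, instead of paying the cost on the first query — can be sketched with a toy cache. This is illustrative Python, not the index server's datamap cache; all names are hypothetical:

```python
class IndexCache:
    """Toy cache: with pre-priming, segment indexes are loaded eagerly
    after a data load, so the first query finds them already cached."""

    def __init__(self, pre_prime=False):
        self.pre_prime = pre_prime
        self.cache = {}
        self.loads_at_query_time = 0  # cold loads paid by queries

    def on_load_finished(self, segment_id, index):
        if self.pre_prime:
            self.cache[segment_id] = index  # warm the cache eagerly

    def query(self, segment_id, load_index):
        if segment_id not in self.cache:    # cold: pay the cost now
            self.loads_at_query_time += 1
            self.cache[segment_id] = load_index()
        return self.cache[segment_id]

warm = IndexCache(pre_prime=True)
warm.on_load_finished("seg_0", {"minmax": (1, 9)})
warm.query("seg_0", lambda: {"minmax": (1, 9)})
```

Without pre-priming, the same `query` call would have incremented `loads_at_query_time`, which is exactly the first-query latency the proposal wants to move off the query path.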

[DISCUSSION] Support Incremental load in datamap and other MV datamap enhancement

2019-02-15 Thread Akash Nilugal
Hi, Currently in carbondata we have datamaps like preaggregate, lucene, bloom, and mv, and we have lazy and non-lazy methods to load data to datamaps. But lazy load is not allowed for datamaps like preagg, lucene, and bloom; it is allowed for the mv datamap. In lazy load of the mv datamap, for every

[Discussion]Alter table column rename feature

2018-12-05 Thread Akash Nilugal
Hi community, Currently carbon supports the alter table rename, add column, drop column and change datatype commands. I want to propose an alter table column rename feature in carbondata. If a user has an old table and wants to change some column names because those columns do not meet his current

[SUGGESTION]Support compaction no_sort

2018-11-19 Thread Akash Nilugal
Hi all, Currently when a data load is done with sort_scope as NO_SORT, and those segments are later compacted, the data is still not sorted, which hurts query performance. This can be solved by sorting the data during compaction, which helps query performance. During busy
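Sorting while compacting is conceptually a k-way merge: each NO_SORT segment is sorted on the sort columns, then the sorted runs are merged into one sorted compacted segment. A minimal sketch (illustrative Python, not carbondata's compaction code):

```python
import heapq

def compact_with_sort(segments, key):
    """Compact NO_SORT segments into one sorted segment: sort each
    segment on the sort column, then k-way merge the sorted runs."""
    sorted_runs = [sorted(seg, key=key) for seg in segments]
    return list(heapq.merge(*sorted_runs, key=key))

seg1 = [{"k": 5}, {"k": 1}]   # loaded with sort_scope=NO_SORT
seg2 = [{"k": 4}, {"k": 2}]
compacted = compact_with_sort([seg1, seg2], key=lambda r: r["k"])
```

This captures the trade-off in the proposal: the load stays fast (no sort at ingest time), and the sorting cost is deferred to compaction, which can run during off-peak hours.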

Add more metadata to footer

2018-10-11 Thread Akash Nilugal
Hi Community, With the current metadata in the carbondata file footer, we can add more metadata to the footer to improve carbon maintainability. We can add info like who has written the file and the carbon version in which the file is written, which will help to identify or fix any

Re: [SUGGESTION]Support Decoder based fallback mechanism in local dictionary

2018-08-31 Thread Akash Nilugal
as mentioned above 4. memory requirement reduced to higher level Regards, Akash R Nilugal On Mon, Aug 27, 2018 at 11:51 AM Akash Nilugal wrote: > Hi all, > > Currently, when the fallback is initiated for a column page in case of > local dictionary, we are keeping both

[SUGGESTION]Support Decoder based fallback mechanism in local dictionary

2018-08-27 Thread Akash Nilugal
Hi all, Currently, when the fallback is initiated for a column page in the case of local dictionary, we keep both the encoded data and the actual data in memory, then form the new column page without dictionary encoding, and at last free the encoded column page. Because of this offheap
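A decoder-based fallback, as the subject suggests, would keep only the dictionary-encoded page and rebuild the plain page by decoding through the local dictionary, instead of holding both copies in memory until fallback completes. A minimal conceptual sketch (illustrative Python, not the actual columnar page code):

```python
def fallback_decode(encoded_page, local_dictionary):
    """Rebuild the plain (non-dictionary) column page by decoding the
    surrogate keys through the local dictionary, so the actual values
    need not be kept alongside the encoded page until fallback."""
    return [local_dictionary[key] for key in encoded_page]

local_dictionary = {0: "apple", 1: "banana"}
encoded_page = [0, 1, 1, 0]  # only surrogate keys held in memory
plain_page = fallback_decode(encoded_page, local_dictionary)
```

The memory saving is that between encode time and fallback, only `encoded_page` plus the small local dictionary live in memory; the expanded values exist only transiently while the replacement page is built.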