Re: CarbonData propose major version number increment for next version (to 1.0.0)
+1

Regards
Bill

Venkata Gollamudi wrote
> Hi All,
>
> CarbonData 0.2.0 has been a good, stable release, with many defects fixed
> and a number of performance improvements.
> https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>
> Many major, value-added new features are planned for the next version,
> taking CarbonData's capability to the next level, such as:
> - IUD (Insert-Update-Delete) support,
> - a complete rewrite of the data load flow without Kettle,
> - Spark 2.x support,
> - standardized CarbonInputFormat and CarbonOutputFormat,
> - Alluxio (Tachyon) file system support,
> - Carbon Thrift format optimization for fast queries,
> - data loading performance improvements and in-memory off-heap sorting,
> - query performance improvements using off-heap memory,
> - support for a vectorized batch reader.
>
> https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>
> I think it makes sense to bump the CarbonData major version to 1.0.0 in
> the next release.
> Please comment and vote on this.
>
> Thanks,
> Ramana

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3219.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: CarbonData propose major version number increment for next version (to 1.0.0)
+1. I think it's natural given the big changes being made. Another reason is that Carbon is stable enough after many bugs have been fixed.

Regards
Jay

-- Original --
From: "Lion.X";;
Date: Sat, Nov 26, 2016 09:47 AM
To: "dev" ;
Subject: Re: CarbonData propose major version number increment for next version (to 1.0.0)

+1
I think it is a good choice, because the new features added in the next version are disruptive changes to the Carbon architecture.

Regards,
Lionx

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3217.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: CarbonData propose major version number increment for next version (to 1.0.0)
+1
I think it is a good choice, because the new features added in the next version are disruptive changes to the Carbon architecture.

Regards,
Lionx

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3217.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary
Hi Liang, Kumar Vishal,

I have done a standard benchmark of multiple data structures for the Dictionary, following your suggestions. Based on the test results, I think DAT may be the best choice for CarbonData.

*1. Here are the 2 test results:*
---
Benchmark of {HashMap, DAT, RadixTree, TrieDict} structures for Dictionary
HashMap: java.util.HashMap
DAT (Double Array Trie): https://github.com/komiya-atsushi/darts-java
RadixTree: https://github.com/npgall/concurrent-trees
TrieDict (Dictionary in Kylin): http://kylin.apache.org/blog/2015/08/13/kylin-dictionary
Dictionary source (Traditional Chinese): https://raw.githubusercontent.com/fxsjy/jieba/master/extra_dict/dict.txt.big

Test Result 1
a. Dictionary size: 584429
b. Build time (ms): DAT: 5714, HashMap: 110, RadixTree: 22044, TrieDict: 855
c. Memory footprint in 64-bit JVM (bytes): DAT: 16779752, HashMap: 32196592, RadixTree: 46130584, TrieDict: 10443608
d. Retrieval time for 9935293 queries (ms): DAT: 585, HashMap: 1010, RadixTree: 417639, TrieDict: 8664

Test Result 2
a. Dictionary size: 584429
b. Build time (ms): DAT: 5867, HashMap: 100, RadixTree: 22082, TrieDict: 840
c. Memory footprint in 64-bit JVM (bytes): DAT: 16779752, HashMap: 32196592, RadixTree: 46130584, TrieDict: 10443608
d. Retrieval time for 9935293 queries (ms): DAT: 593, HashMap: 821, RadixTree: 422297, TrieDict: 8752

*2. Conclusion:*
a. TrieDict builds quickly and has the smallest memory footprint, but the worst retrieval performance.
b. DAT is a good tradeoff between memory footprint and retrieval performance.
c. RadixTree performs worst across the board.

*3. Result analysis:*
a. With a Trie, the memory footprint of the TrieDict mapping is minimized compared to HashMap; to improve performance, a cache layer is overlaid on top of the Trie.
b. For HashMap, because the data contains a large number of duplicate prefixes, the total memory footprint is larger than a trie's; meanwhile, I think computing hash codes of Traditional Chinese strings takes considerable time, so its performance is not the best.
c. DAT is a better tradeoff.
d. I have no idea why RadixTree performs worst in terms of memory, retrieval, and tree building.

On Fri, Nov 25, 2016 at 11:28 AM, Liang Chen wrote:
> Hi xiaoqiao
>
> OK, looking forward to seeing your test result.
> Can you take on this task for this improvement? Please let me know if you
> need any support :)
>
> Regards
> Liang
>
> hexiaoqiao wrote
> > Hi Kumar Vishal,
> >
> > Thanks for your suggestions. As you said, by choosing a Trie to replace
> > HashMap we can get a better memory footprint and also good performance.
> > Of course, DAT is not the only choice, and I will run a test of DAT vs
> > Radix Trie and release the test result as soon as possible. Thanks for
> > your suggestions again.
> >
> > Regards,
> > Xiaoqiao
> >
> > On Thu, Nov 24, 2016 at 4:48 PM, Kumar Vishal kumarvishal1802@ wrote:
> >
> >> Hi Xiaoqiao He,
> >> +1.
> >> For the forward dictionary case it will be a very good optimization, as
> >> our case is very specific: storing a byte-array-to-int mapping [data to
> >> surrogate key mapping]. I think we will get a much better memory
> >> footprint, and performance will also be good (2x). We can also try a
> >> radix tree (radix trie); it is more optimized for storage.
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Thu, Nov 24, 2016 at 12:12 PM, Liang Chen chenliang6136@ wrote:
> >>
> >> > Hi xiaoqiao
> >> >
> >> > For the below example, 600K dictionary entries:
> >> > Is it to say that using "DAT" can save 36MB of memory against
> >> > "ConcurrentHashMap", whereas the performance loss is small (1718 ms)?
> >> >
> >> > One more question: if the dictionary data size increases, what are
> >> > the comparison results for "ConcurrentHashMap" vs "DAT"?
> >> >
> >> > Regards
> >> > Liang
> >> >
> >> > --
> >> > a. memory footprint (approximate) in 64-bit JVM:
> >> > ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
> >> >
> >> > b. retrieval performance: total time (ms) of 500 million queries:
> >> > 12825 ms (*ConcurrentHashMap*) vs 14543 ms (*DAT*)
> >> >
> >> > Regards
> >> > Liang
> >> >
> >> > hexiaoqiao wrote
> >> > > Hi Liang,
> >> > >
> >> > > Thanks for your reply. I need to correct the experiment result
> >> > > because the NO.1 column of the result data table was in the wrong
> >> > > order.
> >> > >
> >> > > In order to compare performance
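The memory/speed tradeoff discussed above can be illustrated with a minimal character trie mapping strings to surrogate keys. This is only a simplified sketch, not the darts-java DAT (a real DAT flattens the node structure into two int arrays, base/check, which is where most of its memory savings come from), but it shows why shared prefixes make a trie more compact than a HashMap of full strings:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal String -> surrogate-key trie, a simplified stand-in for the
// Double Array Trie benchmarked above. Shared prefixes are stored once.
public class TrieDictionaryDemo {
    static final class Node {
        Map<Character, Node> children = new HashMap<>();
        int surrogate = -1; // -1 means no dictionary entry terminates here
    }

    private final Node root = new Node();

    void put(String key, int surrogate) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
        }
        cur.surrogate = surrogate;
    }

    int get(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) return -1; // miss
        }
        return cur.surrogate;
    }

    public static void main(String[] args) {
        TrieDictionaryDemo trie = new TrieDictionaryDemo();
        Map<String, Integer> reference = new HashMap<>();
        String[] words = {"shanghai", "shandong", "shanxi", "beijing"};
        for (int i = 0; i < words.length; i++) {
            trie.put(words[i], i);
            reference.put(words[i], i);
        }
        // Both structures return the same surrogate keys; the prefix
        // "shan" is stored once in the trie but three times in the map.
        for (String w : words) {
            if (trie.get(w) != reference.get(w)) throw new AssertionError(w);
        }
        if (trie.get("shanhai") != -1) throw new AssertionError("miss expected");
        System.out.println("ok");
    }
}
```

The HashMap, by contrast, keeps every full key string (plus per-entry hash buckets), which matches the larger footprint measured above for prefix-heavy dictionary data.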
Re: Using DataFrame to write carbondata file cause no table found error
Hi,

In Append mode, the carbon table is supposed to have been created beforehand; otherwise the load fails because the table does not exist. In Overwrite mode, the carbon table is created (dropped first if it already exists) and the data is loaded. But in your case, with Overwrite mode, it creates the table and then reports that the table is not found while loading. Can you provide a script to reproduce this issue, along with the CarbonData and Spark versions you are using?

Regards,
Ravindra.

On 25 November 2016 at 17:58, ZhuWilliam wrote:
> When I change SaveMode.Append to SaveMode.Overwrite, the error is even
> weirder:
>
> INFO 25-11 20:19:46,572 - streaming-job-executor-0 Query [
> CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
> (A STRING, B STRING)
> STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
> ]
> INFO 25-11 20:19:46,656 - Parsing command:
> CREATE TABLE IF NOT EXISTS default.carbon2
> (a STRING, b STRING)
> STORED BY 'org.apache.carbondata.format'
>
> INFO 25-11 20:19:46,663 - Parse Completed
> AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][Thread-100]Creating
> Table with Database name [default] and Table name [carbon2]
> INFO 25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
> INFO 25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=get_tables: db=default pat=.*
> INFO 25-11 20:19:46,889 - 1: Opening raw store with implemenation
> class:org.apache.hadoop.hive.metastore.ObjectStore
> INFO 25-11 20:19:46,891 - ObjectStore, initialize called
> INFO 25-11 20:19:46,897 - Reading in results for query
> "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is
> closing
> INFO 25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
> INFO 25-11 20:19:46,898 - Initialized ObjectStore
> INFO 25-11 20:19:46,954 - streaming-job-executor-0 Table block size not
> specified for default_carbon2. Therefore considering the default value 1024
> MB
> INFO 25-11 20:19:46,978 - Table carbon2 for Database default created
> successfully.
> INFO 25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for
> Database default created successfully.
> AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][Thread-100]Creating
> timestamp file for default.carbon2
> INFO 25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE
> DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2",
> TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
> INFO 25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
> INFO 25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=get_table : db=default tbl=carbon2
> WARN 25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data
> source provider carbondata. Persisting data source relation
> `default`.`carbon2` into Hive metastore in Spark SQL specific format, which
> is NOT compatible with Hive.
> INFO 25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2,
> dbName:default, owner:allwefantasy, createTime:1480076387,
> lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col,
> type:array, comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.
> MetadataTypedColumnsetSerDe,
> parameters:{tableName=default.carbon2, serialization.format=1,
> tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
> sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE,
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null,
> rolePrivileges:null))
> INFO 25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=create_table: Table(tableName:carbon2, dbName:default,
> owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array,
> comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.
> MetadataTypedColumnsetSerDe,
> parameters:{tableName=default.carbon2, serialization.format=1,
> tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
> sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
> viewOriginalText:null,
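The Append/Overwrite semantics described in the reply above can be sketched as a small decision function. The names here are hypothetical, for illustration only, and are not CarbonData's actual classes; the point is just the control flow: Append requires the table to already exist, while Overwrite drops and recreates it before loading.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the SaveMode behaviour described above
// (not CarbonData's real API): Append fails if the table is missing;
// Overwrite drops any existing table, recreates it, and loads.
public class SaveModeDemo {
    enum SaveMode { APPEND, OVERWRITE }

    static String write(Set<String> catalog, String table, SaveMode mode) {
        boolean exists = catalog.contains(table);
        switch (mode) {
            case APPEND:
                if (!exists) {
                    return "FAIL: table not found: " + table;
                }
                return "LOAD into existing " + table;
            case OVERWRITE:
                if (exists) {
                    catalog.remove(table); // drop the old table first
                }
                catalog.add(table);        // create a fresh table
                return "CREATE + LOAD " + table;
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        Set<String> catalog = new HashSet<>();
        // Append to a table that was never created fails, matching the
        // "table not found: default.carbon1" error reported in this thread.
        System.out.println(write(catalog, "default.carbon1", SaveMode.APPEND));
        // Overwrite creates the table and loads in one step.
        System.out.println(write(catalog, "default.carbon2", SaveMode.OVERWRITE));
    }
}
```

The bug discussed in this thread is precisely a violation of the Overwrite branch: the create step succeeds, yet the subsequent load cannot find the table it just created.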
Re: Using DataFrame to write carbondata file cause no table found error
When I change SaveMode.Append to SaveMode.Overwrite, the error is even weirder:

INFO 25-11 20:19:46,572 - streaming-job-executor-0 Query [
CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
(A STRING, B STRING)
STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
]
INFO 25-11 20:19:46,656 - Parsing command:
CREATE TABLE IF NOT EXISTS default.carbon2
(a STRING, b STRING)
STORED BY 'org.apache.carbondata.format'

INFO 25-11 20:19:46,663 - Parse Completed
AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][Thread-100]Creating Table with Database name [default] and Table name [carbon2]
INFO 25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
INFO 25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
INFO 25-11 20:19:46,889 - 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
INFO 25-11 20:19:46,891 - ObjectStore, initialize called
INFO 25-11 20:19:46,897 - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
INFO 25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
INFO 25-11 20:19:46,898 - Initialized ObjectStore
INFO 25-11 20:19:46,954 - streaming-job-executor-0 Table block size not specified for default_carbon2. Therefore considering the default value 1024 MB
INFO 25-11 20:19:46,978 - Table carbon2 for Database default created successfully.
INFO 25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for Database default created successfully.
AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][Thread-100]Creating timestamp file for default.carbon2
INFO 25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2", TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
INFO 25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
INFO 25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr cmd=get_table : db=default tbl=carbon2
WARN 25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data source provider carbondata. Persisting data source relation `default`.`carbon2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
INFO 25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
INFO 25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr cmd=create_table: Table(tableName:carbon2, dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{tableName=default.carbon2, serialization.format=1, tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null))
INFO 25-11 20:19:47,257 - Creating directory if it doesn't exist: file:/tmp/user/hive/warehouse/carbon2
AUDIT 25-11 20:19:47,564 - [allwefantasy][allwefantasy][Thread-100]Table created with Database name [default] and Table name [carbon2]
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
    at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:141)
    at org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:127)
[jira] [Created] (CARBONDATA-450) Increase Test Coverage for Core.reader module
SWATI RAO created CARBONDATA-450:

Summary: Increase Test Coverage for Core.reader module
Key: CARBONDATA-450
URL: https://issues.apache.org/jira/browse/CARBONDATA-450
Project: CarbonData
Issue Type: Test
Reporter: SWATI RAO

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Using DataFrame to write carbondata file cause no table found error
Here is the error:

ERROR 25-11 18:13:40,116 - Data loading failed. table not found: default.carbon1
AUDIT 25-11 18:13:40,118 - [allwefantasy][allwefantasy][Thread-98]Data loading failed. table not found: default.carbon1
INFO 25-11 18:13:40,119 - Finished job streaming job 148006882 ms.0 from job set of time 148006882 ms
INFO 25-11 18:13:40,119 - Total delay: 0.119 s for time 148006882 ms (execution: 0.106 s)
INFO 25-11 18:13:40,120 - Removing RDD 4 from persistence list
java.lang.RuntimeException: Data loading failed. table not found: default.carbon1
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1040)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:132)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:52)
    at org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:43)
    at org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:61)
    at streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:53)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Using-DataFrame-to-write-carbondata-file-cause-no-table-found-error-tp3203p3207.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: Hive create table error
It works. Thanks.

On Fri, Nov 25, 2016 at 6:03 PM, Sea <261810...@qq.com> wrote:
> Hi, William
>
> Please set your MySQL charset to latin1.
>
> -- Original Message --
> From: "william";;
> Date: Fri, Nov 25, 2016 6:00 PM
> To: "dev" ;
>
> Subject: Hive create table error
>
> When I start StreamingPro with CarbonData support, Hive fails to create
> the metastore table `TABLE_PARAMS`.

--
Best Regards

WilliamZhu (祝海林)
zh...@csdn.net
Product Division - Infrastructure - Search & Data Mining
Mobile: 18601315052
MSN: zhuhailin...@hotmail.com
Weibo: @PrinceCharmingJ http://weibo.com/PrinceCharmingJ
Address: 12/F, Tower B, Fuma Plaza, Building 1, Yard 33, Guangshun North Street, Chaoyang District, Beijing
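The suggested fix (switching the Hive metastore database on MySQL to latin1) might be applied as follows. This is only a sketch: the metastore database name `hive` is an assumption and should be adjusted to your setup.

```sql
-- Hedged example: set the Hive metastore database charset to latin1,
-- as suggested above. The database name `hive` is an assumption.
ALTER DATABASE hive CHARACTER SET latin1;

-- If some metastore tables were already created with another charset,
-- they can be converted individually, e.g.:
ALTER TABLE hive.TABLE_PARAMS CONVERT TO CHARACTER SET latin1;
```

The underlying issue is that with a utf8 charset, MySQL's per-row and index-key size limits can be exceeded by the wide VARCHAR columns in the metastore schema, which is why latin1 makes table creation succeed.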
Re: Re: CarbonData propose major version number increment for next version (to 1.0.0)
Hi,

Thank you for participating in the discussion. CarbonData has now been formally deployed by many users. As I mentioned in the previous post, moving to 1.0.0 would help reduce maintenance cost by clearly distinguishing the major versions.

Regards
Liang

cenyuhai wrote
> -1
> I think 1.0.0 should be a production-ready version; when we think Carbon
> is ready, we can change it to 1.0.0.
>
> Regards
> yuhai
>
> -- Original Message --
> From: "sujith chacko"; sujithchacko.2010@ ;
> Date: Fri, Nov 25, 2016 5:30 PM
> To: "dev" dev@.apache ;
>
> Subject: Re: CarbonData propose major version number increment for next
> version (to 1.0.0)
>
> +1
>
> Thanks,
> Sujith
>
> On Nov 24, 2016 10:37 PM, "manish gupta" tomanishgupta18@ wrote:
>
>> +1
>>
>> Regards
>> Manish Gupta
>>
>> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal kumarvishal1802@ wrote:
>>
>> > +1
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
>> > carbondatacontributions@> wrote:
>> >
>> > > +1
>> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen chenliang6136@ wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > Ya, good proposal.
>> > > > CarbonData 0.x versions integrate with Spark 1.x, and the data
>> > > > load solution of the 0.x versions uses Kettle.
>> > > > CarbonData 1.x versions will integrate with Spark 2.x, and the
>> > > > data load solution of 1.x will not use Kettle.
>> > > >
>> > > > That would help reduce maintenance cost by distinguishing the
>> > > > major versions.
>> > > >
>> > > > +1 for the proposal.
>> > > >
>> > > > Regards
>> > > > Liang
>> > > >
>> > > > Venkata Gollamudi wrote
>> > > > > Hi All,
>> > > > >
>> > > > > CarbonData 0.2.0 has been a good, stable release, with many
>> > > > > defects fixed and a number of performance improvements.
>> > > > > https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > Many major, value-added new features are planned for the next
>> > > > > version, taking CarbonData's capability to the next level,
>> > > > > such as:
>> > > > > - IUD (Insert-Update-Delete) support,
>> > > > > - a complete rewrite of the data load flow without Kettle,
>> > > > > - Spark 2.x support,
>> > > > > - standardized CarbonInputFormat and CarbonOutputFormat,
>> > > > > - Alluxio (Tachyon) file system support,
>> > > > > - Carbon Thrift format optimization for fast queries,
>> > > > > - data loading performance improvements and in-memory off-heap
>> > > > > sorting,
>> > > > > - query performance improvements using off-heap memory,
>> > > > > - support for a vectorized batch reader.
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > I think it makes sense to bump the CarbonData major version to
>> > > > > 1.0.0 in the next release.
>> > > > > Please comment and vote on this.
>> > > > >
>> > > > > Thanks,
>> > > > > Ramana
>> > > >
>> > > > --
>> > > > View this message in context:
>> > > > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3157.html
>> > > > Sent from the Apache CarbonData Mailing List archive mailing list
>> > > > archive at Nabble.com.
>> > > >
>> > >
>> >
>>

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3199.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: CarbonData propose major version number increment for next version (to 1.0.0)
+1, good idea. Generally speaking, a minor version is for bug fixes; a major version is for breaking API and command changes.

Regards
JB

On Nov 25, 2016, at 10:00, sujith chacko wrote:
> +1
>
> Thanks,
> Sujith
>
> On Nov 24, 2016 10:37 PM, "manish gupta" wrote:
>
>> +1
>>
>> Regards
>> Manish Gupta
>>
>> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal wrote:
>>
>> > +1
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
>> > carbondatacontributi...@gmail.com> wrote:
>> >
>> > > +1
>> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > Ya, good proposal.
>> > > > CarbonData 0.x versions integrate with Spark 1.x, and the data
>> > > > load solution of the 0.x versions uses Kettle.
>> > > > CarbonData 1.x versions will integrate with Spark 2.x, and the
>> > > > data load solution of 1.x will not use Kettle.
>> > > >
>> > > > That would help reduce maintenance cost by distinguishing the
>> > > > major versions.
>> > > >
>> > > > +1 for the proposal.
>> > > >
>> > > > Regards
>> > > > Liang
>> > > >
>> > > > Venkata Gollamudi wrote
>> > > > > Hi All,
>> > > > >
>> > > > > CarbonData 0.2.0 has been a good, stable release, with many
>> > > > > defects fixed and a number of performance improvements.
>> > > > > https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > Many major, value-added new features are planned for the next
>> > > > > version, taking CarbonData's capability to the next level,
>> > > > > such as:
>> > > > > - IUD (Insert-Update-Delete) support,
>> > > > > - a complete rewrite of the data load flow without Kettle,
>> > > > > - Spark 2.x support,
>> > > > > - standardized CarbonInputFormat and CarbonOutputFormat,
>> > > > > - Alluxio (Tachyon) file system support,
>> > > > > - Carbon Thrift format optimization for fast queries,
>> > > > > - data loading performance improvements and in-memory off-heap
>> > > > > sorting,
>> > > > > - query performance improvements using off-heap memory,
>> > > > > - support for a vectorized batch reader.
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > I think it makes sense to bump the CarbonData major version to
>> > > > > 1.0.0 in the next release.
>> > > > > Please comment and vote on this.
>> > > > >
>> > > > > Thanks,
>> > > > > Ramana
>> > > >
>> > > > --
>> > > > View this message in context:
>> > > > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3157.html
>> > > > Sent from the Apache CarbonData Mailing List archive mailing list
>> > > > archive at Nabble.com.
>> > > >
>> > >
>> >
>>
Re: CarbonData propose major version number increment for next version (to 1.0.0)
+1

Thanks,
Sujith

On Nov 24, 2016 10:37 PM, "manish gupta" wrote:

> +1
>
> Regards
> Manish Gupta
>
> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal wrote:
>
> > +1
> >
> > -Regards
> > Kumar Vishal
> >
> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
> > carbondatacontributi...@gmail.com> wrote:
> >
> > > +1
> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen wrote:
> > >
> > > > Hi
> > > >
> > > > Ya, good proposal.
> > > > CarbonData 0.x versions integrate with Spark 1.x, and the data load
> > > > solution of the 0.x versions uses Kettle.
> > > > CarbonData 1.x versions will integrate with Spark 2.x, and the data
> > > > load solution of 1.x will not use Kettle.
> > > >
> > > > That would help reduce maintenance cost by distinguishing the major
> > > > versions.
> > > >
> > > > +1 for the proposal.
> > > >
> > > > Regards
> > > > Liang
> > > >
> > > > Venkata Gollamudi wrote
> > > > > Hi All,
> > > > >
> > > > > CarbonData 0.2.0 has been a good, stable release, with many
> > > > > defects fixed and a number of performance improvements.
> > > > > https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >
> > > > > Many major, value-added new features are planned for the next
> > > > > version, taking CarbonData's capability to the next level,
> > > > > such as:
> > > > > - IUD (Insert-Update-Delete) support,
> > > > > - a complete rewrite of the data load flow without Kettle,
> > > > > - Spark 2.x support,
> > > > > - standardized CarbonInputFormat and CarbonOutputFormat,
> > > > > - Alluxio (Tachyon) file system support,
> > > > > - Carbon Thrift format optimization for fast queries,
> > > > > - data loading performance improvements and in-memory off-heap
> > > > > sorting,
> > > > > - query performance improvements using off-heap memory,
> > > > > - support for a vectorized batch reader.
> > > > >
> > > > > https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >
> > > > > I think it makes sense to bump the CarbonData major version to
> > > > > 1.0.0 in the next release.
> > > > > Please comment and vote on this.
> > > > >
> > > > > Thanks,
> > > > > Ramana
> > > >
> > > > --
> > > > View this message in context:
> > > > http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3157.html
> > > > Sent from the Apache CarbonData Mailing List archive mailing list
> > > > archive at Nabble.com.
> > > >
> > >
> >
>