Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread bill.zhou
+1 
Regards
Bill

Venkata Gollamudi wrote
> Hi All,
> 
> CarbonData 0.2.0 has been good work and a stable release, with many
> defects fixed and a number of performance improvements.
> https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> 
> The next version has many major, value-added new features planned,
> taking CarbonData's capability to the next level.
> Like
> - IUD(Insert-Update-Delete) support,
> - complete rewrite of data load flow without Kettle,
> - Spark 2.x support,
> - Standardize CarbonInputFormat and CarbonOutputFormat,
> - alluxio(tachyon) file system support,
> - Carbon thrift format optimization for fast query,
> - Data loading performance improvement and In memory off heap sorting,
> - Query performance improvement using off heap,
> - Support Vectorized batch reader.
> 
> https://issues.apache.org/jira/browse/CARBONDATA-301?jql=project%20%3D%20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> 
> I think it makes sense to change the CarbonData major version to 1.0.0 in
> the next release.
> Please comment and vote on this.
> 
> Thanks,
> Ramana





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3219.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread Jay
+1. I think it's natural given the big changes that will be made.
Another reason is that Carbon is stable enough after many bugs have been fixed.


Regards
Jay
------ Original Message ------
From:  "Lion.X";
Date:  Sat, Nov 26, 2016 09:47 AM
To:  "dev";

Subject:  Re: CarbonData propose major version number increment for next version
(to 1.0.0)



+1
I think it is a good choice, because the new features added in the next
version are disruptive changes to the Carbon architecture.

Regards,
Lionx



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3217.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread Lion.X
+1
I think it is a good choice, because the new features added in the next
version are disruptive changes to the Carbon architecture.

Regards,
Lionx



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3217.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-25 Thread Xiaoqiao He
Hi Liang, Kumar Vishal,

I have done a standard benchmark of multiple data structures for the
Dictionary, following your suggestions. Based on the test results, I think
DAT may be the best choice for CarbonData.

*1. Here are 2 test results:*
---
Benchmark about {HashMap,DAT,RadixTree,TrieDict} Structures for Dictionary
  HashMap :   java.util.HashMap
  DAT (Double Array Trie):
https://github.com/komiya-atsushi/darts-java
  RadixTree:
https://github.com/npgall/concurrent-trees
  TrieDict (Dictionary in Kylin):
http://kylin.apache.org/blog/2015/08/13/kylin-dictionary
Dictionary Source (Traditional Chinese):
https://raw.githubusercontent.com/fxsjy/jieba/master/extra_dict/dict.txt.big
Test Result (run 1):
a. Dictionary Size: 584429

b. Build Time (ms) :
   DAT   : 5714
   HashMap   : 110
   RadixTree : 22044
   TrieDict  : 855

c. Memory footprint in 64-bit JVM (bytes) :
   DAT   : 16779752
   HashMap   : 32196592
   RadixTree : 46130584
   TrieDict  : 10443608

d. Retrieval Performance for 9935293 query times (ms) :
   DAT   : 585
   HashMap   : 1010
   RadixTree : 417639
   TrieDict  : 8664

Test Result (run 2):
a. Dictionary Size: 584429

b. Build Time (ms) :
   DAT   : 5867
   HashMap   : 100
   RadixTree : 22082
   TrieDict  : 840

c. Memory footprint in 64-bit JVM (bytes) :
   DAT   : 16779752
   HashMap   : 32196592
   RadixTree : 46130584
   TrieDict  : 10443608

d. Retrieval Performance for 9935293 query times (ms) :
   DAT   : 593
   HashMap   : 821
   RadixTree : 422297
   TrieDict  : 8752

*2. Conclusion:*
a. TrieDict builds quickly and has the smallest memory footprint, but the
worst retrieval performance;
b. DAT is a good tradeoff between memory footprint and retrieval
performance;
c. RadixTree has the worst performance in every aspect.

*3. Result Analysis:*
a. With a trie, the memory footprint of the TrieDict mapping is minimized
compared to HashMap; to improve retrieval performance, a cache layer is
overlaid on top of the trie.
b. Because of the large amount of duplicate prefix data, HashMap's total
memory footprint is larger than the tries'; meanwhile, I think computing the
hash codes of Traditional Chinese strings consumes considerable time, so its
performance is not the best.
c. DAT is a better tradeoff.
d. I have no idea why RadixTree performs worst in terms of memory,
retrieval, and build time.
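For reference, the benchmark methodology above can be sketched as follows. This is a hypothetical, self-contained harness (not the code actually used): build a dictionary mapping each distinct value to a surrogate key, then time bulk retrieval. The real runs plugged darts-java (DAT), concurrent-trees (RadixTree), and Kylin's TrieDict into the same shape of harness in place of HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryBenchmark {

    // Build a value -> surrogate-key dictionary (surrogate key = insertion order).
    static Map<String, Integer> build(String[] values) {
        Map<String, Integer> dict = new HashMap<>(values.length * 2);
        for (int i = 0; i < values.length; i++) {
            dict.put(values[i], i);
        }
        return dict;
    }

    // Sum all looked-up surrogate keys so the JIT cannot eliminate the lookups.
    static long lookupAll(Map<String, Integer> dict, String[] values, int rounds) {
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            for (String v : values) {
                checksum += dict.get(v);
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        String[] values = new String[100_000];
        for (int i = 0; i < values.length; i++) values[i] = "value-" + i;

        long t0 = System.nanoTime();
        Map<String, Integer> dict = build(values);
        long buildMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        long checksum = lookupAll(dict, values, 10);
        long lookupMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("build(ms)=" + buildMs
                + " lookup(ms)=" + lookupMs + " checksum=" + checksum);
    }
}
```

Swapping the `Map` implementation (or an equivalent trie lookup API) while keeping the same value set and round count gives comparable build/retrieval numbers.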


On Fri, Nov 25, 2016 at 11:28 AM, Liang Chen 
wrote:

> Hi xiaoqiao
>
> ok, look forward to seeing your test result.
> Can you take this task for this improvement? Please let me know if you need
> any support :)
>
> Regards
> Liang
>
>
> hexiaoqiao wrote
> > Hi Kumar Vishal,
> >
> > Thanks for your suggestions. As you said, choose Trie replace HashMap we
> > can get better memory footprint and also good performance. Of course, DAT
> > is not only choice, and I will do test about DAT vs Radix Trie and
> release
> > the test result as soon as possible. Thanks your suggestions again.
> >
> > Regards,
> > Xiaoqiao
> >
> > On Thu, Nov 24, 2016 at 4:48 PM, Kumar Vishal 
>
> > kumarvishal1802@
>
> > 
> > wrote:
> >
> >> Hi XIaoqiao He,
> >> +1,
> >> For forward dictionary case it will be very good optimisation, as our
> >> case
> >> is very specific storing byte array to int mapping[data to surrogate key
> >> mapping], I think we will get much better memory footprint and
> >> performance
> >> will be also good(2x). We can also try radix tree(radix trie), it is
> more
> >> optimise for storage.
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Thu, Nov 24, 2016 at 12:12 PM, Liang Chen 
>
> > chenliang6136@
>
> > 
> >> wrote:
> >>
> >> > Hi xiaoqiao
> >> >
> >> > For the below example, 600K dictionary data:
> >> > It is to say that using "DAT" can save 36M memory against
> >> > "ConcurrentHashMap", whereas the performance just lost less (1718ms) ?
> >> >
> >> > One more question:if increases the dictionary data size, what's the
> >> > comparison results "ConcurrentHashMap" VS "DAT"
> >> >
> >> > Regards
> >> > Liang
> >> > 
> >> > --
> >> > a. memory footprint (approximate quantity) in 64-bit JVM:
> >> > ~104MB (*ConcurrentHashMap*) vs ~68MB (*DAT*)
> >> >
> >> > b. retrieval performance: total time(ms) of 500 million query:
> >> > 12825 ms(*ConcurrentHashMap*) vs 14543 ms(*DAT*)
> >> >
> >> > Regards
> >> > Liang
> >> >
> >> > hexiaoqiao wrote
> >> > > hi Liang,
> >> > >
> >> > > Thanks for your reply, i need to correct the experiment result
> >> because
> >> > > it's
> >> > > wrong order NO.1 column of result data table.
> >> > >
> >> > > In order to compare performance 

Re: Using DataFrame to write carbondata file cause no table found error

2016-11-25 Thread Ravindra Pesala
Hi,

In Append mode, the carbon table is supposed to be created beforehand;
otherwise the load fails because the table does not exist.
In Overwrite mode, the carbon table is created (dropped first if it already
exists) and the data is loaded. But in your case, Overwrite mode creates the
table yet reports "table not found" while loading. Can you provide a script
to reproduce this issue, along with the CarbonData and Spark versions you
are using?

Regards,
Ravindra.
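The intended semantics can be sketched as follows. This is a hypothetical model for illustration (the enum, `resolve`, and its return strings are mine, not CarbonData's actual code): Append requires the table to already exist, while Overwrite (re)creates it before loading.

```java
public class SaveModeSemantics {

    enum SaveMode { APPEND, OVERWRITE }

    // Returns the action the writer should take, or throws if Append
    // targets a missing table.
    static String resolve(SaveMode mode, boolean tableExists) {
        switch (mode) {
            case APPEND:
                if (!tableExists) {
                    throw new IllegalStateException(
                            "Data loading failed. table not found");
                }
                return "load";                    // table exists: just load
            case OVERWRITE:
                return tableExists ? "drop+create+load" : "create+load";
            default:
                throw new IllegalArgumentException("unknown mode");
        }
    }

    public static void main(String[] args) {
        // Overwrite on a missing table should create it and then load.
        System.out.println(resolve(SaveMode.OVERWRITE, false));
    }
}
```

The bug reported in this thread is that the Overwrite path creates the table but the subsequent load step still fails to find it.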

On 25 November 2016 at 17:58, ZhuWilliam  wrote:

> When I change SaveMode.Append to Overwrite, the error is even weirder:
>
>
>
> INFO  25-11 20:19:46,572 - streaming-job-executor-0 Query [
>   CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
>   (A STRING, B STRING)
>   STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
>   ]
> INFO  25-11 20:19:46,656 - Parsing command:
>   CREATE TABLE IF NOT EXISTS default.carbon2
>   (a STRING, b STRING)
>   STORED BY 'org.apache.carbondata.format'
>
> INFO  25-11 20:19:46,663 - Parse Completed
> AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][
> Thread-100]Creating
> Table with Database name [default] and Table name [carbon2]
> INFO  25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
> INFO  25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=get_tables: db=default pat=.*
> INFO  25-11 20:19:46,889 - 1: Opening raw store with implemenation
> class:org.apache.hadoop.hive.metastore.ObjectStore
> INFO  25-11 20:19:46,891 - ObjectStore, initialize called
> INFO  25-11 20:19:46,897 - Reading in results for query
> "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used
> is
> closing
> INFO  25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
> INFO  25-11 20:19:46,898 - Initialized ObjectStore
> INFO  25-11 20:19:46,954 - streaming-job-executor-0 Table block size not
> specified for default_carbon2. Therefore considering the default value 1024
> MB
> INFO  25-11 20:19:46,978 - Table carbon2 for Database default created
> successfully.
> INFO  25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for
> Database default created successfully.
> AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][
> Thread-100]Creating
> timestamp file for default.carbon2
> INFO  25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE
> DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2",
> TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
> INFO  25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
> INFO  25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=get_table
> : db=default tbl=carbon2
> WARN  25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data
> source provider carbondata. Persisting data source relation
> `default`.`carbon2` into Hive metastore in Spark SQL specific format, which
> is NOT compatible with Hive.
> INFO  25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2,
> dbName:default, owner:allwefantasy, createTime:1480076387,
> lastAccessTime:0,
> retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col,
> type:array, comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.
> MetadataTypedColumnsetSerDe,
> parameters:{tableName=default.carbon2, serialization.format=1,
> tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
> sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
> viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE,
> privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null,
> rolePrivileges:null))
> INFO  25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr
> cmd=create_table: Table(tableName:carbon2, dbName:default,
> owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array,
> comment:from deserializer)], location:null,
> inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
> outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.serde2.
> MetadataTypedColumnsetSerDe,
> parameters:{tableName=default.carbon2, serialization.format=1,
> tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
> sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
> skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
> parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
> viewOriginalText:null, 

Re: Using DataFrame to write carbondata file cause no table found error

2016-11-25 Thread ZhuWilliam
When I change SaveMode.Append to Overwrite, the error is even weirder:



INFO  25-11 20:19:46,572 - streaming-job-executor-0 Query [
  CREATE TABLE IF NOT EXISTS DEFAULT.CARBON2
  (A STRING, B STRING)
  STORED BY 'ORG.APACHE.CARBONDATA.FORMAT'
  ]
INFO  25-11 20:19:46,656 - Parsing command: 
  CREATE TABLE IF NOT EXISTS default.carbon2
  (a STRING, b STRING)
  STORED BY 'org.apache.carbondata.format'
  
INFO  25-11 20:19:46,663 - Parse Completed
AUDIT 25-11 20:19:46,860 - [allwefantasy][allwefantasy][Thread-100]Creating
Table with Database name [default] and Table name [carbon2]
INFO  25-11 20:19:46,889 - 1: get_tables: db=default pat=.*
INFO  25-11 20:19:46,889 - ugi=allwefantasy ip=unknown-ip-addr
cmd=get_tables: db=default pat=.*   
INFO  25-11 20:19:46,889 - 1: Opening raw store with implemenation
class:org.apache.hadoop.hive.metastore.ObjectStore
INFO  25-11 20:19:46,891 - ObjectStore, initialize called
INFO  25-11 20:19:46,897 - Reading in results for query
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is
closing
INFO  25-11 20:19:46,898 - Using direct SQL, underlying DB is MYSQL
INFO  25-11 20:19:46,898 - Initialized ObjectStore
INFO  25-11 20:19:46,954 - streaming-job-executor-0 Table block size not
specified for default_carbon2. Therefore considering the default value 1024
MB
INFO  25-11 20:19:46,978 - Table carbon2 for Database default created
successfully.
INFO  25-11 20:19:46,978 - streaming-job-executor-0 Table carbon2 for
Database default created successfully.
AUDIT 25-11 20:19:46,978 - [allwefantasy][allwefantasy][Thread-100]Creating
timestamp file for default.carbon2
INFO  25-11 20:19:46,979 - streaming-job-executor-0 Query [CREATE TABLE
DEFAULT.CARBON2 USING CARBONDATA OPTIONS (TABLENAME "DEFAULT.CARBON2",
TABLEPATH "FILE:///TMP/CARBONDATA/STORE/DEFAULT/CARBON2") ]
INFO  25-11 20:19:47,033 - 1: get_table : db=default tbl=carbon2
INFO  25-11 20:19:47,034 - ugi=allwefantasy ip=unknown-ip-addr  
cmd=get_table
: db=default tbl=carbon2
WARN  25-11 20:19:47,062 - Couldn't find corresponding Hive SerDe for data
source provider carbondata. Persisting data source relation
`default`.`carbon2` into Hive metastore in Spark SQL specific format, which
is NOT compatible with Hive.
INFO  25-11 20:19:47,247 - 1: create_table: Table(tableName:carbon2,
dbName:default, owner:allwefantasy, createTime:1480076387, lastAccessTime:0,
retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col,
type:array, comment:from deserializer)], location:null,
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
parameters:{tableName=default.carbon2, serialization.format=1,
tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE,
privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null,
rolePrivileges:null))
INFO  25-11 20:19:47,247 - ugi=allwefantasy ip=unknown-ip-addr
cmd=create_table: Table(tableName:carbon2, dbName:default,
owner:allwefantasy, createTime:1480076387, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array,
comment:from deserializer)], location:null,
inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,
parameters:{tableName=default.carbon2, serialization.format=1,
tablePath=file:///tmp/carbondata/store/default/carbon2}), bucketCols:[],
sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[],
parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=carbondata},
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE,
privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null,
rolePrivileges:null))   
INFO  25-11 20:19:47,257 - Creating directory if it doesn't exist:
file:/tmp/user/hive/warehouse/carbon2
AUDIT 25-11 20:19:47,564 - [allwefantasy][allwefantasy][Thread-100]Table
created with Database name [default] and Table name [carbon2]
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
at
org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:141)
at
org.apache.spark.sql.hive.CarbonMetastoreCatalog.lookupRelation1(CarbonMetastoreCatalog.scala:127)

[jira] [Created] (CARBONDATA-450) Increase Test Coverage for Core.reader module

2016-11-25 Thread SWATI RAO (JIRA)
SWATI RAO created CARBONDATA-450:


 Summary: Increase Test Coverage for Core.reader module
 Key: CARBONDATA-450
 URL: https://issues.apache.org/jira/browse/CARBONDATA-450
 Project: CarbonData
  Issue Type: Test
Reporter: SWATI RAO






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Using DataFrame to write carbondata file cause no table found error

2016-11-25 Thread ZhuWilliam
Here is the error: 

ERROR 25-11 18:13:40,116 - Data loading failed. table not found:
default.carbon1
AUDIT 25-11 18:13:40,118 - [allwefantasy][allwefantasy][Thread-98]Data
loading failed. table not found: default.carbon1
INFO  25-11 18:13:40,119 - Finished job streaming job 148006882 ms.0
from job set of time 148006882 ms
INFO  25-11 18:13:40,119 - Total delay: 0.119 s for time 148006882 ms
(execution: 0.106 s)
INFO  25-11 18:13:40,120 - Removing RDD 4 from persistence list
java.lang.RuntimeException: Data loading failed. table not found:
default.carbon1
at scala.sys.package$.error(package.scala:27)
at
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:1040)
at
org.apache.carbondata.spark.CarbonDataFrameWriter.loadDataFrame(CarbonDataFrameWriter.scala:132)
at
org.apache.carbondata.spark.CarbonDataFrameWriter.writeToCarbonFile(CarbonDataFrameWriter.scala:52)
at
org.apache.carbondata.spark.CarbonDataFrameWriter.appendToCarbonFile(CarbonDataFrameWriter.scala:43)
at
org.apache.spark.sql.CarbonSource.createRelation(CarbonDatasourceRelation.scala:112)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at
streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:61)
at
streaming.core.compositor.spark.streaming.output.SQLOutputCompositor$$anonfun$result$1.apply(SQLOutputCompositor.scala:53)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at
org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Using-DataFrame-to-write-carbondata-file-cause-no-table-found-error-tp3203p3207.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: Hive create table error

2016-11-25 Thread william
It works. Thanks

On Fri, Nov 25, 2016 at 6:03 PM, Sea <261810...@qq.com> wrote:

> Hi, william
>
>
>   please set your MySQL charset to latin1
>
>
>
>
> ------ Original Message ------
> From: "william";
> Date: Friday, Nov 25, 2016, 6:00 PM
> To: "dev";
>
> Subject: Hive create table error
>
>
>
>
>
>
>
>
>
>
>
> When I start StreamingPro with CarbonData support, creation of the Hive
> metastore table `TABLE_PARAMS` fails.
>
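For reference, the charset change Sea suggests would look something like the following. This is illustrative only; the metastore database name `hive` and the need to convert existing tables are assumptions, not from the thread:

```sql
-- Switch the Hive metastore database (assumed here to be named `hive`) to
-- latin1, so the metastore can create tables such as TABLE_PARAMS without
-- hitting MySQL's utf8 index-length limit.
ALTER DATABASE hive CHARACTER SET latin1;

-- Existing metastore tables may need converting as well, e.g.:
-- ALTER TABLE TABLE_PARAMS CONVERT TO CHARACTER SET latin1;
```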



-- 
Best Regards
___
WilliamZhu  祝海林  zh...@csdn.net
Product Division - Infrastructure Platform - Search & Data Mining
Mobile: 18601315052
MSN: zhuhailin...@hotmail.com
Weibo: @PrinceCharmingJ  http://weibo.com/PrinceCharmingJ
Address: 12/F, Tower B, Fuma Building, 33 Guangshun North Street, Chaoyang District, Beijing
___


Re: Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread Liang Chen
Hi

Thank you for participating in the discussion.

CarbonData has now been formally deployed by many users.

As I mentioned in the previous post, moving to 1.0.0 would help reduce
maintenance cost by clearly distinguishing the major versions.

Regards
Liang

cenyuhai wrote
> -1
> I think 1.0.0 should be a production-ready version, when we think carbon
> is ready, we can change it to 1.0.0.
> 
> 
> Regards
> yuhai
> 
> 
> 
> 
> ------ Original Message ------
> From: "sujith chacko";

> sujithchacko.2010@

> ;
> Date: Friday, Nov 25, 2016, 5:30 PM
> To: "dev"

> dev@.apache

> ; 
> 
> Subject: Re: CarbonData propose major version number increment for next version
> (to 1.0.0)
> 
> 
> 
> +1
> 
> Thanks,
> Sujith
> 
> On Nov 24, 2016 10:37 PM, "manish gupta" 

> tomanishgupta18@

>  wrote:
> 
>> +1
>>
>> Regards
>> Manish Gupta
>>
>> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal 

> kumarvishal1802@

> 
>> wrote:
>>
>> > +1
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
>> > 

> carbondatacontributions@

>> wrote:
>> >
>> > > +1
>> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen 

> chenliang6136@

> 
>> > > wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > Ya, good proposal.
>> > > > CarbonData 0.x version integrate with spark 1.x,  and the load data
>> > > > solution
>> > > > of 0.x version is using kettle.
>> > > > CarbonData 1.x version integrate with spark 2.x, the load data
>> solution
>> > > of
>> > > > 1.x version will not use kettle .
>> > > >
>> > > > That would be helpful to reduce maintenance cost through
>> distinguishing
>> > > the
>> > > > major different version.
>> > > >
>> > > > +1 for the proposal.
>> > > >
>> > > > Regards
>> > > > Liang
>> > > >
>> > > >
>> > > > Venkata Gollamudi wrote
>> > > > > Hi All,
>> > > > >
>> > > > > CarbonData 0.2.0 has been a good work and stable release with lot
>> of
>> > > > > defects fixed and with number of performance improvements.
>> > > > >
>> > > > https://issues.apache.org/jira/browse/CARBONDATA-320?
>> > jql=project%20%3D%
>> > >
>> 20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%
>> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > Next version has many major and new value added features are
>> planned,
>> > > > > taking CarbonData capability to next level.
>> > > > > Like
>> > > > > - IUD(Insert-Update-Delete) support,
>> > > > > - complete rewrite of data load flow with out Kettle,
>> > > > > - Spark 2.x support,
>> > > > > - Standardize CarbonInputFormat and CarbonOutputFormat,
>> > > > > - alluxio(tachyon) file system support,
>> > > > > - Carbon thrift format optimization for fast query,
>> > > > > - Data loading performance improvement and In memory off heap
>> > sorting,
>> > > > > - Query performance improvement using off heap,
>> > > > > - Support Vectorized batch reader.
>> > > > >
>> > > > >
>> > > > https://issues.apache.org/jira/browse/CARBONDATA-301?
>> > jql=project%20%3D%
>> > >
>> 20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%
>> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > I think it makes sense to change CarbonData Major version in next
>> > > version
>> > > > > to 1.0.0.
>> > > > > Please comment and vote on this.
>> > > > >
>> > > > > Thanks,
>> > > > > Ramana
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > View this message in context:
>> > > > http://apache-carbondata-mailing-list-archive.1130556.
>> > > n5.nabble.com/CarbonData-propose-major-version-number-
>> > > increment-for-next-version-to-1-0-0-tp3131p3157.html
>> > > > Sent from the Apache CarbonData Mailing List archive mailing list
>> > archive
>> > > > at Nabble.com.
>> > > >
>> > >
>> >
>>





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonData-propose-major-version-number-increment-for-next-version-to-1-0-0-tp3131p3199.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread Jean-Baptiste Onofré
+1

Good idea.

Generally speaking, a minor version is for bug fixes, while a major version
signals breaking API and command changes.

Regards
JB
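JB's rule of thumb matches semantic versioning; a tiny illustrative sketch follows (the class and method names are mine, not project code):

```java
// Illustrative semantic-versioning sketch: patch = bug fix,
// minor = new feature, major = breaking API/command change.
public class SemVer {
    final int major, minor, patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // An upgrade may break clients only when the major number increases.
    boolean breakingUpgradeTo(SemVer next) {
        return next.major > this.major;
    }

    public static void main(String[] args) {
        SemVer v020 = new SemVer(0, 2, 0);
        SemVer v100 = new SemVer(1, 0, 0);
        System.out.println(v020.breakingUpgradeTo(v100)); // 0.2.0 -> 1.0.0
    }
}
```

Under this convention, moving from 0.x to 1.0.0 is exactly the signal that the Kettle removal and Spark 2.x integration are breaking changes.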

On Nov 25, 2016, at 10:00, sujith chacko
wrote:
>+1
>
>Thanks,
>Sujith
>
>On Nov 24, 2016 10:37 PM, "manish gupta" 
>wrote:
>
>> +1
>>
>> Regards
>> Manish Gupta
>>
>> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal
>
>> wrote:
>>
>> > +1
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
>> > carbondatacontributi...@gmail.com> wrote:
>> >
>> > > +1
>> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen
>
>> > > wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > Ya, good proposal.
>> > > > CarbonData 0.x version integrate with spark 1.x,  and the load
>data
>> > > > solution
>> > > > of 0.x version is using kettle.
>> > > > CarbonData 1.x version integrate with spark 2.x, the load data
>> solution
>> > > of
>> > > > 1.x version will not use kettle .
>> > > >
>> > > > That would be helpful to reduce maintenance cost through
>> distinguishing
>> > > the
>> > > > major different version.
>> > > >
>> > > > +1 for the proposal.
>> > > >
>> > > > Regards
>> > > > Liang
>> > > >
>> > > >
>> > > > Venkata Gollamudi wrote
>> > > > > Hi All,
>> > > > >
>> > > > > CarbonData 0.2.0 has been a good work and stable release with
>lot
>> of
>> > > > > defects fixed and with number of performance improvements.
>> > > > >
>> > > > https://issues.apache.org/jira/browse/CARBONDATA-320?
>> > jql=project%20%3D%
>> > >
>20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%
>> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > Next version has many major and new value added features are
>> planned,
>> > > > > taking CarbonData capability to next level.
>> > > > > Like
>> > > > > - IUD(Insert-Update-Delete) support,
>> > > > > - complete rewrite of data load flow with out Kettle,
>> > > > > - Spark 2.x support,
>> > > > > - Standardize CarbonInputFormat and CarbonOutputFormat,
>> > > > > - alluxio(tachyon) file system support,
>> > > > > - Carbon thrift format optimization for fast query,
>> > > > > - Data loading performance improvement and In memory off heap
>> > sorting,
>> > > > > - Query performance improvement using off heap,
>> > > > > - Support Vectorized batch reader.
>> > > > >
>> > > > >
>> > > > https://issues.apache.org/jira/browse/CARBONDATA-301?
>> > jql=project%20%3D%
>> > >
>20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%
>> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>> > > > >
>> > > > > I think it makes sense to change CarbonData Major version in
>next
>> > > version
>> > > > > to 1.0.0.
>> > > > > Please comment and vote on this.
>> > > > >
>> > > > > Thanks,
>> > > > > Ramana
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > View this message in context:
>> > > > http://apache-carbondata-mailing-list-archive.1130556.
>> > > n5.nabble.com/CarbonData-propose-major-version-number-
>> > > increment-for-next-version-to-1-0-0-tp3131p3157.html
>> > > > Sent from the Apache CarbonData Mailing List archive mailing
>list
>> > archive
>> > > > at Nabble.com.
>> > > >
>> > >
>> >
>>


Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-25 Thread sujith chacko
+1

Thanks,
Sujith

On Nov 24, 2016 10:37 PM, "manish gupta"  wrote:

> +1
>
> Regards
> Manish Gupta
>
> On Thu, Nov 24, 2016 at 7:30 PM, Kumar Vishal 
> wrote:
>
> > +1
> >
> > -Regards
> > Kumar Vishal
> >
> > On Thu, Nov 24, 2016 at 2:41 PM, Raghunandan S <
> > carbondatacontributi...@gmail.com> wrote:
> >
> > > +1
> > > On Thu, 24 Nov 2016 at 2:30 PM, Liang Chen 
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > Ya, good proposal.
> > > > CarbonData 0.x version integrate with spark 1.x,  and the load data
> > > > solution
> > > > of 0.x version is using kettle.
> > > > CarbonData 1.x version integrate with spark 2.x, the load data
> solution
> > > of
> > > > 1.x version will not use kettle .
> > > >
> > > > That would be helpful to reduce maintenance cost through
> distinguishing
> > > the
> > > > major different version.
> > > >
> > > > +1 for the proposal.
> > > >
> > > > Regards
> > > > Liang
> > > >
> > > >
> > > > Venkata Gollamudi wrote
> > > > > Hi All,
> > > > >
> > > > > CarbonData 0.2.0 has been a good work and stable release with lot
> of
> > > > > defects fixed and with number of performance improvements.
> > > > >
> > > > https://issues.apache.org/jira/browse/CARBONDATA-320?
> > jql=project%20%3D%
> > > 20CARBONDATA%20AND%20fixVersion%20%3D%200.2.0-incubating%20ORDER%20BY%
> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >
> > > > > Next version has many major and new value added features are
> planned,
> > > > > taking CarbonData capability to next level.
> > > > > Like
> > > > > - IUD(Insert-Update-Delete) support,
> > > > > - complete rewrite of data load flow with out Kettle,
> > > > > - Spark 2.x support,
> > > > > - Standardize CarbonInputFormat and CarbonOutputFormat,
> > > > > - alluxio(tachyon) file system support,
> > > > > - Carbon thrift format optimization for fast query,
> > > > > - Data loading performance improvement and In memory off heap
> > sorting,
> > > > > - Query performance improvement using off heap,
> > > > > - Support Vectorized batch reader.
> > > > >
> > > > >
> > > > https://issues.apache.org/jira/browse/CARBONDATA-301?
> > jql=project%20%3D%
> > > 20CARBONDATA%20AND%20fixVersion%20%3D%200.3.0-incubating%20ORDER%20BY%
> > > 20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >
> > > > > I think it makes sense to change CarbonData Major version in next
> > > version
> > > > > to 1.0.0.
> > > > > Please comment and vote on this.
> > > > >
> > > > > Thanks,
> > > > > Ramana
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://apache-carbondata-mailing-list-archive.1130556.
> > > n5.nabble.com/CarbonData-propose-major-version-number-
> > > increment-for-next-version-to-1-0-0-tp3131p3157.html
> > > > Sent from the Apache CarbonData Mailing List archive mailing list
> > archive
> > > > at Nabble.com.
> > > >
> > >
> >
>