from:"carbondata-newuser"

How carbondata handle greater than with global dict column?

2018-11-06 Thread carbondata-newuser

For example, version is a dictionary column: explain select A from test_carbondata.table where date='2018-09-05' and version >= "1.8.5"; == Physical Plan == *(1) CarbonDictionaryDecoder [test_carbondata_m_device_distinct_for_bdindex], ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
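To make the question above concrete, here is a minimal sketch of one generic way a range predicate can be evaluated on a dictionary-encoded column: apply the predicate to the dictionary values once, collect the matching surrogate keys, then filter the encoded rows by key membership. This is an illustration only, not a description of CarbonData's implementation; the dictionary contents, the encoded rows, and the helper names `keys_matching` and `filter_rows` are all hypothetical.

```python
# Hypothetical dictionary: value -> surrogate key. Keys are assumed to be
# assigned in arrival order, so key order does not follow value order.
dictionary = {"1.9.0": 1, "1.8.4": 2, "1.8.5": 3, "2.0.1": 4}

# Hypothetical encoded column: each row stores only the surrogate key.
encoded_rows = [2, 3, 1, 4, 2, 3]


def keys_matching(dictionary, predicate):
    """Evaluate the predicate once per distinct value and return matching keys."""
    return {key for value, key in dictionary.items() if predicate(value)}


def filter_rows(encoded_rows, matching_keys):
    """Return the indices of rows whose surrogate key is in the matching set."""
    return [i for i, key in enumerate(encoded_rows) if key in matching_keys]


# version >= "1.8.5", compared as strings (lexicographic order).
matching = keys_matching(dictionary, lambda v: v >= "1.8.5")
selected = filter_rows(encoded_rows, matching)
print(matching, selected)
```

Because the comparison is on strings, a value such as "1.10.0" would sort before "1.8.5"; the sketch does not attempt version-aware ordering.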

Is carbondata replace global dictionary in driver site?

2018-10-26 Thread carbondata-newuser

For example, assume column A has global dictionary encoding and the dictionary is { "A": 1, "B": 2, "C": 3 }. Executor 1 returns the result [1,2,3]. Does the driver then replace 1 with "A" and 2 with "B"? Does the replacement occur in the driver rather than the executor? If so, I wa
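The decode step described in the question can be sketched as a reverse lookup over the dictionary given above. This is only an illustration of the mapping itself; it makes no claim about whether CarbonData performs it in the driver or in the executors, and the helper name `decode` is hypothetical.

```python
# Dictionary from the example above: value -> surrogate key.
dictionary = {"A": 1, "B": 2, "C": 3}

# Reverse mapping used for decoding: surrogate key -> value.
reverse = {key: value for value, key in dictionary.items()}


def decode(encoded, reverse):
    """Replace each surrogate key with its original value."""
    return [reverse[key] for key in encoded]


# The encoded result [1, 2, 3] from the example above.
decoded = decode([1, 2, 3], reverse)
print(decoded)
```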

Why not support global sort in partition table?

2018-07-19 Thread carbondata-newuser

Such a table can be created, but inserting data into it throws an error like: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Don't support use global sort on partitioned table. -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabb

How to collect carbondata profile info?

2018-07-18 Thread carbondata-newuser

I have noticed that CarbonData already provides a profiler in version 1.4. It can collect a lot of information, such as partitions.length, startTime, endTime, getSplitsStartTime, getSplitsEndTime, numSegments, numStreamSegments, numBlocks, distributeStartTime, distributeEndTime. How can I get this information?

How to look up how many date in carbon without partition.

2018-07-18 Thread carbondata-newuser

It seems CarbonData does not recommend using partitionby, and partitionby is not supported with the global sort scope. In Hive (saved as Parquet) it is very convenient to look up how many date partitions already exist, along with the size of each day's partition. In CarbonData I add the date column to first sort col

Index file cache will not work when the table has invalid segment.

2018-07-10 Thread carbondata-newuser

Carbon version is 1.4 rc2. create table( col1 string, col2 int, col3 string, date string ) *First step:* insert into table carbonTest select col1,col2,col3,"20180707" from hiveTable2 where date="20180707"; col3 is a Hive map type, so this insert will fail. And it will create invalid segm

Re: Carbon file size is so big than parquet.

2018-07-03 Thread carbondata-newuser

I finally found the reason: for Parquet we used the gzip compressor, while CarbonData used snappy. Gzip has a better compression ratio than snappy.

Re: Carbon file size is so big than parquet.

2018-07-02 Thread carbondata-newuser

'SORT_COLUMNS'='app_name,app_id,is_AAA,os,platform,activation_channel,app_version,channel,ut,language,is_F,is_BBB,version_code,os_version,timezone,display_density'

Carbon file size is so big than parquet.

2018-07-02 Thread carbondata-newuser

I have disabled the inverted list on all columns, but the data is still about 50% larger than Parquet: 31G (Parquet) vs 48G (CarbonData) with 424,000,000 records. CarbonData version is 1.3. CREATE TABLE growth.carbondata_m_device_distinct ( A_id bigint, app_name string, app_id int, platform string, is_F smal

9 matches
