For example, the version column is a dictionary column:
explain select A from test_carbondata.table where date='2018-09-05' and
version >= "1.8.5" ;
| == Physical Plan ==
*(1) CarbonDictionaryDecoder
[test_carbondata_m_device_distinct_for_bdindex],
ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation()
For example:
Assume column A has a global dictionary encoding, and the dictionary is
{
"A": 1,
"B": 2,
"C": 3
}
Executor 1 returns the result [1,2,3].
Does the driver then replace 1 with "A", 2 with "B", and so on?
Does the replacement occur in the driver rather than the executor?
If so, I wa
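To make the question concrete, here is a minimal Python sketch of the lookup being asked about, using the dictionary and the result [1,2,3] from the example above. The function names are hypothetical and this is not CarbonData's code. It only shows what replacing surrogate keys with their values means, and it says nothing about whether that step runs on the driver or on an executor.

```python
# Illustration only: hypothetical helpers, not CarbonData's implementation.
from typing import Dict, List


def build_reverse_dictionary(dictionary: Dict[str, int]) -> Dict[int, str]:
    """Invert a value -> surrogate key mapping into surrogate key -> value."""
    reverse = {key: value for value, key in dictionary.items()}
    if len(reverse) != len(dictionary):
        # Two values sharing one key would make decoding ambiguous.
        raise ValueError("surrogate keys must be unique")
    return reverse


def decode(encoded: List[int], reverse: Dict[int, str]) -> List[str]:
    """Replace each surrogate key with its original value."""
    return [reverse[key] for key in encoded]


# Dictionary and encoded result taken from the example in the message.
dictionary = {"A": 1, "B": 2, "C": 3}
reverse = build_reverse_dictionary(dictionary)
decoded = decode([1, 2, 3], reverse)
print(decoded)  # ['A', 'B', 'C']
```

The scan result stays as small integers until decode is called, which is why the location of that call (driver or executor) matters for how much data is moved.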
Such a table can be created, but inserting data into it throws an error like:
org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Don't
support use global sort on partitioned table.
--
Sent from:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabb
I have noticed that CarbonData already provides a profiler in version 1.4.
It can collect a lot of information, such as
partitions.length,startTime,endTime,getSplitsStartTime,getSplitsEndTime,numSegments,numStreamSegments,numBlocks,distributeStartTime,distributeEndTime.
How can I get this information?
-
It seems CarbonData does not recommend using partitionby, and partitionby is
not supported in the global sort scope.
In Hive (with the data saved as Parquet), it is very convenient to look up how
many date partitions already exist, along with the size of each day's partition.
In carbondata I add the date column to first sort col
The CarbonData version is 1.4 rc2.
create table carbonTest(
col1 string,
col2 int,
col3 string,
date string
)
*First step:*
insert into table carbonTest select col1,col2,col3,"20180707" from
hiveTable2 where date="20180707";
col3 is a Hive map type, so this insert fails.
And it will create invalid segm
I finally found the reason: for Parquet we used the gzip compressor, while
CarbonData used snappy.
Gzip has a better compression ratio than snappy.
'SORT_COLUMNS'='app_name,app_id,is_AAA,os,platform,activation_channel,app_version,channel,ut,language,is_F,is_BBB,version_code,os_version,timezone,display_density'
I have disabled the inverted index on all columns, but the data is still about
50% larger than Parquet.
31G (Parquet) vs 48G (CarbonData) with 424,000,000 records.
The CarbonData version is 1.3.
CREATE TABLE growth.carbondata_m_device_distinct (
A_id bigint,
app_name string,
app_id int,
platform string,
is_F smal