Hi,

I am using Kylin 2.6.2 with Hadoop 2.7 (Hive 2.1, HBase 1.1.8) and 
encountered the following problem in the third step of the build (Extract Distinct Columns):
"2019-06-17 17:22:30,021 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Sample output: TEST.RECORD_AGGREG.TS '1558681200' => reducer 0
2019-06-17 17:22:30,025 ERROR [Thread-8] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-8,5,main] threw an Exception.
java.lang.NullPointerException
        at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.putRowKeyToHLLNew(FactDistinctColumnsMapper.java:385)
"

I have a large input (521,836,260 rows), and I want to build one cube with one 
dimension and two SUM measures (plus the default COUNT measure).
At first, I thought it might be failing because of a null value in the dimension 
column, but after checking the code, that case seems to be handled:
                String colValue = row[rowkeyColIndex[i]];
                // a null cell value is replaced with "0" before hashing
                if (colValue == null)
                    colValue = "0";
                byte[] bytes = hc.putString(colValue).hash().asBytes();
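To make the quoted null handling concrete, here is a minimal standalone sketch, not Kylin code. It assumes MD5 from `java.security.MessageDigest` as a stand-in for the Guava hash function Kylin actually uses, and the class `RowKeyHashSketch` and its methods are hypothetical. It shows that a null cell hashes exactly like the string "0", and that the guard covers individual cells only: if the `row` array itself were null, `row[rowkeyColIndex[i]]` would throw a `NullPointerException` before the null check runs. It illustrates the snippet in isolation and does not show what actually happens at line 385.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the null handling quoted above.
// Assumption: MD5 stands in for Kylin's Guava hash function; names are illustrative.
public class RowKeyHashSketch {

    // Hash one cell like the quoted snippet: a null value becomes "0" first.
    public static long hashCell(String colValue) {
        if (colValue == null) {
            colValue = "0";
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] bytes = md.digest(colValue.getBytes(StandardCharsets.UTF_8));
            // Use the first 8 bytes of the digest as a long hash value.
            return ByteBuffer.wrap(bytes, 0, 8).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hash every row key column of one row. The null guard only protects
    // individual cells: a null `row` throws NullPointerException on row[...].
    public static long[] hashRow(String[] row, int[] rowkeyColIndex) {
        long[] hashes = new long[rowkeyColIndex.length];
        for (int i = 0; i < rowkeyColIndex.length; i++) {
            hashes[i] = hashCell(row[rowkeyColIndex[i]]);
        }
        return hashes;
    }

    public static void main(String[] args) {
        int[] rowkeyColIndex = {0}; // one dimension, like TS in this cube
        long[] withNull = hashRow(new String[] {null, "5", "7"}, rowkeyColIndex);
        long[] withZero = hashRow(new String[] {"0", "5", "7"}, rowkeyColIndex);
        System.out.println(withNull[0] == withZero[0]); // null cell hashes like "0"
        try {
            hashRow(null, rowkeyColIndex);
        } catch (NullPointerException e) {
            System.out.println("null row -> NullPointerException");
        }
    }
}
```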

Could you please help me find the root cause of this failure?

Below are the logs from one container, followed by the cube configuration:
"
2019-06-17 17:22:28,323 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-06-17 17:22:28,373 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2019-06-17 17:22:28,373 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2019-06-17 17:22:28,414 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2019-06-17 17:22:28,414 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1560186768967_16121, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@4e096385)
2019-06-17 17:22:28,488 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2019-06-17 17:22:28,670 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121
2019-06-17 17:22:28,854 INFO [main] org.apache.hadoop.mapred.Task: mapOutputFile class: org.apache.hadoop.mapred.MapRFsOutputFile
2019-06-17 17:22:28,854 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2019-06-17 17:22:28,865 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2019-06-17 17:22:28,865 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-06-17 17:22:28,874 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2019-06-17 17:22:28,949 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hive.hcatalog.mapreduce.HCatSplit@1b8a29df
2019-06-17 17:22:29,154 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: mapreduce.task.io.sort.mb: 480
2019-06-17 17:22:29,154 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: soft limit at 413575168
2019-06-17 17:22:29,155 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: bufstart = 0; bufvoid = 417752688
2019-06-17 17:22:29,155 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: kvstart = 0; length = 26109543
2019-06-17 17:22:29,163 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapRFsOutputBuffer
2019-06-17 17:22:29,168 INFO [main] org.apache.kylin.engine.mr.common.AbstractHadoopJob: The absolute path for meta dir is /local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,181 INFO [main] org.apache.kylin.common.KylinConfig: Loading kylin-defaults.properties from file:/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/filecache/10/job.jar/job.jar!/kylin-defaults.properties
2019-06-17 17:22:29,185 INFO [main] org.apache.kylin.common.KylinConfig: Use KYLIN_CONF=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,187 INFO [main] org.apache.kylin.common.KylinConfig: Initialized a new KylinConfig from getInstanceFromEnv : 1097619701
2019-06-17 17:22:29,203 INFO [main] org.apache.kylin.common.KylinConfigBase: Kylin Config was updated with kylin.metadata.url : kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,342 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.CubeManager
2019-06-17 17:22:29,363 INFO [main] org.apache.kylin.cube.CubeManager: Initializing CubeManager with config kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,364 INFO [main] org.apache.kylin.common.persistence.ResourceStore: Using metadata url kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta for resource store
2019-06-17 17:22:29,673 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.CubeDescManager
2019-06-17 17:22:29,674 INFO [main] org.apache.kylin.cube.CubeDescManager: Initializing CubeDescManager with config kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,715 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.project.ProjectManager
2019-06-17 17:22:29,716 INFO [main] org.apache.kylin.metadata.project.ProjectManager: Initializing ProjectManager with metadata url kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,726 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.cachesync.Broadcaster
2019-06-17 17:22:29,733 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.model.DataModelManager
2019-06-17 17:22:29,738 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.TableMetadataManager
2019-06-17 17:22:29,754 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: Checking custom measure types from kylin config
2019-06-17 17:22:29,755 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(hllc), class org.apache.kylin.measure.hllc.HLLCMeasureType$Factory
2019-06-17 17:22:29,760 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(bitmap), class org.apache.kylin.measure.bitmap.BitmapMeasureType$Factory
2019-06-17 17:22:29,767 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering TOP_N(topn), class org.apache.kylin.measure.topn.TopNMeasureType$Factory
2019-06-17 17:22:29,769 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering RAW(raw), class org.apache.kylin.measure.raw.RawMeasureType$Factory
2019-06-17 17:22:29,771 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering EXTENDED_COLUMN(extendedcolumn), class org.apache.kylin.measure.extendedcolumn.ExtendedColumnMeasureType$Factory
2019-06-17 17:22:29,772 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering PERCENTILE_APPROX(percentile), class org.apache.kylin.measure.percentile.PercentileMeasureType$Factory
2019-06-17 17:22:29,774 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(dim_dc), class org.apache.kylin.measure.dim.DimCountDistinctMeasureType$Factory
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model flat_single is missing or unloaded yet
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model record_aggr is missing or unloaded yet
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model tester is missing or unloaded yet
2019-06-17 17:22:29,836 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2019-06-17 17:22:29,837 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2019-06-17 17:22:29,849 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {name=default.kylin_intermediate_record_aggr_cube_9c444e0a_98c7_2646_52cc_3d74b6058d18, numFiles=70, columns.types=bigint,bigint,bigint, auto.purge=true, serialization.format=1, columns=record_aggreg_ts,record_aggreg_page_visit_sum,record_aggreg_image_load_sum, rawDataSize=10549866066, columns.comments=nullnullnull, last_modified_time=1560812812, numRows=521836260, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, EXTERNAL=TRUE, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, totalSize=21571825655, last_modified_by=root, serialization.null.format=\N, transient_lastDdlTime=1560817330}
2019-06-17 17:22:29,972 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do setup, available memory: 5712m
2019-06-17 17:22:29,972 INFO [main] org.apache.kylin.engine.mr.KylinMapper: The conf for current mapper will be 2047526627
2019-06-17 17:22:29,981 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.source.SourceManager
2019-06-17 17:22:29,993 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.cuboid.CuboidManager
2019-06-17 17:22:30,012 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Found KylinVersion : 2.6.2.0. Use new algorithm for cuboid sampling. About the details of the new algorithm, please refer to KYLIN-2518
2019-06-17 17:22:30,013 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: cuboid stats calculator:0 started, handle cuboids number:187
2019-06-17 17:22:30,017 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Accepting Mapper Key with ordinal: 1
2019-06-17 17:22:30,017 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do map, available memory: 5701m
2019-06-17 17:22:30,021 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Sample output: TEST.RECORD_AGGREG.TS '1558681200' => reducer 0
2019-06-17 17:22:30,025 ERROR [Thread-8] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-8,5,main] threw an Exception.
java.lang.NullPointerException
        at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.putRowKeyToHLLNew(FactDistinctColumnsMapper.java:385)
        at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.run(FactDistinctColumnsMapper.java:411)
        at java.lang.Thread.run(Thread.java:748)"

Cube Config:
{
  "uuid": "c532d208-cd50-4aaf-06a6-6023f61a3050",
  "last_modified": 1560810508988,
  "version": "2.6.2.0",
  "name": "record_aggr_cube",
  "is_draft": false,
  "model_name": "record_aggr",
  "description": "",
  "null_string": null,
  "dimensions": [
    { "name": "TS", "table": "record_AGGREG", "column": "TS", "derived": null }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": { "type": "constant", "value": "1" },
        "returntype": "bigint"
      }
    },
    {
      "name": "SUM_PAGE_VISIT",
      "function": {
        "expression": "SUM",
        "parameter": { "type": "column", "value": "record_AGGREG.PAGE_VISIT_SUM" },
        "returntype": "bigint"
      }
    },
    {
      "name": "SUM_IMAGE_LOAD",
      "function": {
        "expression": "SUM",
        "parameter": { "type": "column", "value": "record_AGGREG.IMAGE_LOAD_SUM" },
        "returntype": "bigint"
      }
    }
  ],
  "dictionaries": [],
  "rowkey": {
    "rowkey_columns": [
      { "column": "record_AGGREG.TS", "encoding": "dict", "encoding_version": 1, "isShardBy": false }
    ]
  },
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          { "qualifier": "M", "measure_refs": [ "_COUNT_", "SUM_PAGE_VISIT", "SUM_IMAGE_LOAD" ] }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [ "record_AGGREG.TS" ],
      "select_rule": { "hierarchy_dims": [], "mandatory_dims": [], "joint_dims": [] }
    }
  ],
  "signature": "+8tNqJZYWGtkbx7AAZhJCg==",
  "notify_list": [],
  "status_need_notify": [ "ERROR", "DISCARDED", "SUCCEED" ],
  "partition_date_start": 0,
  "partition_date_end": 3153600000000,
  "auto_merge_time_ranges": [ 604800000, 2419200000 ],
  "volatile_range": 0,
  "retention_range": 0,
  "engine_type": 4,
  "storage_type": 2,
  "override_kylin_properties": {
    "kylin.engine.mr.config-override.mapreduce.map.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.reduce.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.map.cpu.vcores": "4",
    "kylin.engine.mr.config-override.mapreduce.map.reduce.vcores": "4",
    "kylin.source.hive.config-override.mapreduce.reduce.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.reduce.cpu.vcores": "4",
    "kylin.engine.mr.config-override.mapreduce.map.java.opts": "-Xmx7g",
    "kylin.engine.mr.config-override.mapreduce.reduce.java.opts": "-Xmx7g",
    "kylin.source.hive.config-override.mapreduce.reduce.cpu.vcores": "2",
    "kylin.source.hive.config-override.mapreduce.map.cpu.vcores": "2",
    "kylin.source.hive.config-override.mapreduce.map.memory.mb": "20480"
  },
  "cuboid_black_list": [],
  "parent_forward": 3,
  "mandatory_dimension_set_list": [],
  "snapshot_table_desc_list": []
}

Thanks,
David
