Hi,
I am using Kylin 2.6.2 with Hadoop 2.7 (Hive 2.1, HBase 1.1.8) and encountered the following problem in the third step (Extract Distinct Columns):
"2019-06-17 17:22:30,021 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Sample output: TEST.RECORD_AGGREG.TS '1558681200' => reducer 0
2019-06-17 17:22:30,025 ERROR [Thread-8] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-8,5,main] threw an Exception.
java.lang.NullPointerException
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.putRowKeyToHLLNew(FactDistinctColumnsMapper.java:385)
"
I have a large input (521,836,260 rows) and want to build a single cube with 1 dimension and 2 SUM measures (plus the default COUNT).
At first, I thought the step might fail because of a null value in the dimension column, but after checking the code, that case seems to be handled:
    String colValue = row[rowkeyColIndex[i]];
    if (colValue == null)
        colValue = "0";
    byte[] bytes = hc.putString(colValue).hash().asBytes();
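The guard above only covers a null column value. A minimal, hypothetical Java sketch makes that explicit. RowKeyHashSketch, probe, and the MD5 digest standing in for the hasher `hc` are illustrative assumptions, not Kylin code. A null value is replaced by "0" and hashed normally. The array lookup `row[rowkeyColIndex[i]]` itself still throws a NullPointerException when `row` or `rowkeyColIndex` is null.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the quoted excerpt (not Kylin code).
final class RowKeyHashSketch {

    // Mirrors the quoted null guard: a null column value is replaced by "0".
    static byte[] hashColumn(String[] row, int[] rowkeyColIndex, int i) throws NoSuchAlgorithmException {
        // This lookup throws NullPointerException if row or rowkeyColIndex is null;
        // the null check below cannot help in that case.
        String colValue = row[rowkeyColIndex[i]];
        if (colValue == null) {
            colValue = "0";
        }
        // Assumption: MD5 from java.security stands in for whatever hasher hc is.
        MessageDigest md = MessageDigest.getInstance("MD5");
        return md.digest(colValue.getBytes(StandardCharsets.UTF_8));
    }

    // Hashes column 0 and reports which exception, if any, was thrown.
    static String probe(String[] row, int[] rowkeyColIndex) {
        try {
            hashColumn(row, rowkeyColIndex, 0);
            return "none";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (NoSuchAlgorithmException e) {
            return "NoSuchAlgorithmException";
        }
    }
}
```

For example, `probe(new String[] {null}, new int[] {0})` returns "none`". By contrast, `probe(null, new int[] {0})` returns "NullPointerException". This only illustrates Java's null semantics for the quoted lines. It does not establish what is actually null at FactDistinctColumnsMapper.java:385.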
Could you please help me find the root cause of this failure?
Below are the logs for one container, followed by the cube configuration:
"
2019-06-17 17:22:28,323 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-06-17 17:22:28,373 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2019-06-17 17:22:28,373 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2019-06-17 17:22:28,414 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2019-06-17 17:22:28,414 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1560186768967_16121, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@4e096385)
2019-06-17 17:22:28,488 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2019-06-17 17:22:28,670 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121
2019-06-17 17:22:28,854 INFO [main] org.apache.hadoop.mapred.Task: mapOutputFile class: org.apache.hadoop.mapred.MapRFsOutputFile
2019-06-17 17:22:28,854 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2019-06-17 17:22:28,865 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2019-06-17 17:22:28,865 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-06-17 17:22:28,874 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-06-17 17:22:28,949 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.hive.hcatalog.mapreduce.HCatSplit@1b8a29df
2019-06-17 17:22:29,154 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: mapreduce.task.io.sort.mb: 480
2019-06-17 17:22:29,154 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: soft limit at 413575168
2019-06-17 17:22:29,155 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: bufstart = 0; bufvoid = 417752688
2019-06-17 17:22:29,155 INFO [main] org.apache.hadoop.mapred.MapRFsOutputBuffer: kvstart = 0; length = 26109543
2019-06-17 17:22:29,163 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapRFsOutputBuffer
2019-06-17 17:22:29,168 INFO [main] org.apache.kylin.engine.mr.common.AbstractHadoopJob: The absolute path for meta dir is /local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,181 INFO [main] org.apache.kylin.common.KylinConfig: Loading kylin-defaults.properties from file:/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/filecache/10/job.jar/job.jar!/kylin-defaults.properties
2019-06-17 17:22:29,185 INFO [main] org.apache.kylin.common.KylinConfig: Use KYLIN_CONF=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,187 INFO [main] org.apache.kylin.common.KylinConfig: Initialized a new KylinConfig from getInstanceFromEnv : 1097619701
2019-06-17 17:22:29,203 INFO [main] org.apache.kylin.common.KylinConfigBase: Kylin Config was updated with kylin.metadata.url : kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,342 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.CubeManager
2019-06-17 17:22:29,363 INFO [main] org.apache.kylin.cube.CubeManager: Initializing CubeManager with config kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,364 INFO [main] org.apache.kylin.common.persistence.ResourceStore: Using metadata url kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta for resource store
2019-06-17 17:22:29,673 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.CubeDescManager
2019-06-17 17:22:29,674 INFO [main] org.apache.kylin.cube.CubeDescManager: Initializing CubeDescManager with config kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,715 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.project.ProjectManager
2019-06-17 17:22:29,716 INFO [main] org.apache.kylin.metadata.project.ProjectManager: Initializing ProjectManager with metadata url kylin_metadata@ifile,path=/local/yarn/hadoop-mapr/nm-local-dir/usercache/root/appcache/application_1560186768967_16121/container_e02_1560186768967_16121_01_000016/meta
2019-06-17 17:22:29,726 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.cachesync.Broadcaster
2019-06-17 17:22:29,733 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.model.DataModelManager
2019-06-17 17:22:29,738 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.metadata.TableMetadataManager
2019-06-17 17:22:29,754 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: Checking custom measure types from kylin config
2019-06-17 17:22:29,755 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(hllc), class org.apache.kylin.measure.hllc.HLLCMeasureType$Factory
2019-06-17 17:22:29,760 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(bitmap), class org.apache.kylin.measure.bitmap.BitmapMeasureType$Factory
2019-06-17 17:22:29,767 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering TOP_N(topn), class org.apache.kylin.measure.topn.TopNMeasureType$Factory
2019-06-17 17:22:29,769 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering RAW(raw), class org.apache.kylin.measure.raw.RawMeasureType$Factory
2019-06-17 17:22:29,771 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering EXTENDED_COLUMN(extendedcolumn), class org.apache.kylin.measure.extendedcolumn.ExtendedColumnMeasureType$Factory
2019-06-17 17:22:29,772 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering PERCENTILE_APPROX(percentile), class org.apache.kylin.measure.percentile.PercentileMeasureType$Factory
2019-06-17 17:22:29,774 INFO [main] org.apache.kylin.measure.MeasureTypeFactory: registering COUNT_DISTINCT(dim_dc), class org.apache.kylin.measure.dim.DimCountDistinctMeasureType$Factory
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model flat_single is missing or unloaded yet
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model record_aggr is missing or unloaded yet
2019-06-17 17:22:29,789 INFO [main] org.apache.kylin.metadata.model.DataModelManager: Model tester is missing or unloaded yet
2019-06-17 17:22:29,836 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2019-06-17 17:22:29,837 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2019-06-17 17:22:29,849 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {name=default.kylin_intermediate_record_aggr_cube_9c444e0a_98c7_2646_52cc_3d74b6058d18, numFiles=70, columns.types=bigint,bigint,bigint, auto.purge=true, serialization.format=1, columns=record_aggreg_ts,record_aggreg_page_visit_sum,record_aggreg_image_load_sum, rawDataSize=10549866066, columns.comments=nullnullnull, last_modified_time=1560812812, numRows=521836260, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, EXTERNAL=TRUE, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, totalSize=21571825655, last_modified_by=root, serialization.null.format=\N, transient_lastDdlTime=1560817330}
2019-06-17 17:22:29,972 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do setup, available memory: 5712m
2019-06-17 17:22:29,972 INFO [main] org.apache.kylin.engine.mr.KylinMapper: The conf for current mapper will be 2047526627
2019-06-17 17:22:29,981 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.source.SourceManager
2019-06-17 17:22:29,993 INFO [main] org.apache.kylin.common.KylinConfig: Creating new manager instance of class org.apache.kylin.cube.cuboid.CuboidManager
2019-06-17 17:22:30,012 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Found KylinVersion : 2.6.2.0. Use new algorithm for cuboid sampling. About the details of the new algorithm, please refer to KYLIN-2518
2019-06-17 17:22:30,013 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: cuboid stats calculator:0 started, handle cuboids number:187
2019-06-17 17:22:30,017 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Accepting Mapper Key with ordinal: 1
2019-06-17 17:22:30,017 INFO [main] org.apache.kylin.engine.mr.KylinMapper: Do map, available memory: 5701m
2019-06-17 17:22:30,021 INFO [main] org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper: Sample output: TEST.RECORD_AGGREG.TS '1558681200' => reducer 0
2019-06-17 17:22:30,025 ERROR [Thread-8] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-8,5,main] threw an Exception.
java.lang.NullPointerException
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.putRowKeyToHLLNew(FactDistinctColumnsMapper.java:385)
    at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$CuboidStatCalculator.run(FactDistinctColumnsMapper.java:411)
    at java.lang.Thread.run(Thread.java:748)"
Cube Config:
{
  "uuid": "c532d208-cd50-4aaf-06a6-6023f61a3050",
  "last_modified": 1560810508988,
  "version": "2.6.2.0",
  "name": "record_aggr_cube",
  "is_draft": false,
  "model_name": "record_aggr",
  "description": "",
  "null_string": null,
  "dimensions": [
    { "name": "TS", "table": "record_AGGREG", "column": "TS", "derived": null }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": { "type": "constant", "value": "1" },
        "returntype": "bigint"
      }
    },
    {
      "name": "SUM_PAGE_VISIT",
      "function": {
        "expression": "SUM",
        "parameter": { "type": "column", "value": "record_AGGREG.PAGE_VISIT_SUM" },
        "returntype": "bigint"
      }
    },
    {
      "name": "SUM_IMAGE_LOAD",
      "function": {
        "expression": "SUM",
        "parameter": { "type": "column", "value": "record_AGGREG.IMAGE_LOAD_SUM" },
        "returntype": "bigint"
      }
    }
  ],
  "dictionaries": [],
  "rowkey": {
    "rowkey_columns": [
      { "column": "record_AGGREG.TS", "encoding": "dict", "encoding_version": 1, "isShardBy": false }
    ]
  },
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          { "qualifier": "M", "measure_refs": [ "_COUNT_", "SUM_PAGE_VISIT", "SUM_IMAGE_LOAD" ] }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [ "record_AGGREG.TS" ],
      "select_rule": { "hierarchy_dims": [], "mandatory_dims": [], "joint_dims": [] }
    }
  ],
  "signature": "+8tNqJZYWGtkbx7AAZhJCg==",
  "notify_list": [],
  "status_need_notify": [ "ERROR", "DISCARDED", "SUCCEED" ],
  "partition_date_start": 0,
  "partition_date_end": 3153600000000,
  "auto_merge_time_ranges": [ 604800000, 2419200000 ],
  "volatile_range": 0,
  "retention_range": 0,
  "engine_type": 4,
  "storage_type": 2,
  "override_kylin_properties": {
    "kylin.engine.mr.config-override.mapreduce.map.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.reduce.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.map.cpu.vcores": "4",
    "kylin.engine.mr.config-override.mapreduce.map.reduce.vcores": "4",
    "kylin.source.hive.config-override.mapreduce.reduce.memory.mb": "20480",
    "kylin.engine.mr.config-override.mapreduce.reduce.cpu.vcores": "4",
    "kylin.engine.mr.config-override.mapreduce.map.java.opts": "-Xmx7g",
    "kylin.engine.mr.config-override.mapreduce.reduce.java.opts": "-Xmx7g",
    "kylin.source.hive.config-override.mapreduce.reduce.cpu.vcores": "2",
    "kylin.source.hive.config-override.mapreduce.map.cpu.vcores": "2",
    "kylin.source.hive.config-override.mapreduce.map.memory.mb": "20480"
  },
  "cuboid_black_list": [],
  "parent_forward": 3,
  "mandatory_dimension_set_list": [],
  "snapshot_table_desc_list": []
}
Thanks,
David