Re: Facing Issues with Kylin 2.3.1
Hi Ketan,

The cleanup logic is straightforward; you can check StorageCleanupJob.java and MetadataCleanupJob.java.

- The metadata cleanup only deletes entries in the 'kylin_metadata' HBase table.
- The storage cleanup deletes unreferenced HBase tables, Hive intermediate tables and HDFS intermediate folders.
- "metastore.sh reset" deletes all metadata records.

Please don't run the storage cleanup while the metadata table is empty, as it may treat all HBase tables and HDFS folders as unnecessary.

2018-04-07 13:27 GMT+08:00 kdcool6932 <kdcool6...@yahoo.com.invalid>:

> Thanks ShaoFeng, that explains when an HBase table can be dropped.
>
> There is one more thing I would like to understand. In one of our Kylin 2.1
> test setups we faced an issue. Could you point out what might have caused it
> and how?
>
> I am referring to the Kylin metadata utility, which we have been using since
> we started with Kylin. The commands are described here:
> http://kylin.apache.org/docs23/howto/howto_backup_metadata.html
>
> We had some empty segments in Kylin, because of which we were not able to
> query. So we took a metadata backup using the backup command, edited the
> .json files in the cubes folder of the backup directory to delete the empty
> segments, and then ran the metadata reset command and also the metadata clean
> command with --delete true. Then we restored the modified metadata backup.
>
> After that, all the segment tables for all the cubes were deleted from HBase.
>
> So, to improve my understanding, I would like to know how the reset and
> clean steps actually work. This would help us understand the tool
> functionally and use it better going forward.
>
> Thanks,
> Ketan@Exponential
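The warning about running the storage cleanup against an empty metadata table can be illustrated with a small sketch. This is a simplified model and not the actual StorageCleanupJob code: the function name `find_unreferenced_tables`, the table names, and the idea of matching on the `KYLIN_` prefix alone are assumptions made for illustration.

```python
from typing import Iterable, Set


def find_unreferenced_tables(
    hbase_tables: Iterable[str],
    referenced_tables: Iterable[str],
    prefix: str = "KYLIN_",
) -> Set[str]:
    """Return prefixed tables that no segment in the metadata refers to.

    Simplified model: a table is a cleanup candidate when it carries the
    Kylin prefix and is absent from the set of referenced table names.
    """
    referenced = set(referenced_tables)
    return {
        name
        for name in hbase_tables
        if name.startswith(prefix) and name not in referenced
    }


# Hypothetical table names, for illustration only.
tables_in_hbase = ["KYLIN_AAA", "KYLIN_BBB", "KYLIN_CCC", "other_app_table"]

# Normal case: the metadata still references two of the three Kylin tables.
print(sorted(find_unreferenced_tables(tables_in_hbase, ["KYLIN_AAA", "KYLIN_BBB"])))

# After a metadata reset nothing is referenced, so every Kylin table qualifies.
print(sorted(find_unreferenced_tables(tables_in_hbase, [])))
```

In this model, a reset followed by a cleanup with deletion enabled, run before the restore, would leave every segment table as a candidate for removal, which matches the outcome reported in the thread.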
Re: Facing Issues with Kylin 2.3.1
Thanks ShaoFeng, that explains when an HBase table can be dropped.

There is one more thing I would like to understand. In one of our Kylin 2.1 test setups we faced an issue. Could you point out what might have caused it and how?

I am referring to the Kylin metadata utility, which we have been using since we started with Kylin. The commands are described here:
http://kylin.apache.org/docs23/howto/howto_backup_metadata.html

We had some empty segments in Kylin, because of which we were not able to query. So we took a metadata backup using the backup command, edited the .json files in the cubes folder of the backup directory to delete the empty segments, and then ran the metadata reset command and also the metadata clean command with --delete true. Then we restored the modified metadata backup.

After that, all the segment tables for all the cubes were deleted from HBase.

So, to improve my understanding, I would like to know how the reset and clean steps actually work. This would help us understand the tool functionally and use it better going forward.

Thanks,
Ketan@Exponential

Original message
From: ShaoFeng Shi <shaofeng...@apache.org>
Date: 06/04/2018 8:29 pm (GMT+05:30)
To: dev <dev@kylin.apache.org>
Subject: Re: Facing Issues with Kylin 2.3.1

Hello Ketan,

Thanks for reporting this.

Firstly, the HBase table name is generated at the moment a segment is created; seeing the table name does not mean the table already exists in HBase. Only when the build job reaches the "Create HTable" step does Kylin create the table in HBase. So for a "NEW" segment, it is expected that the table does not exist.

As for the problem "some of the existing segment tables get deleted from Hbase": as far as I know, Kylin normally won't delete an HTable (unless segments are merged or the StorageCleanupJob is run). Could you please check the Kylin and HBase logs to see when and how the table was dropped? If a table is deleted, something should be logged. The logs will be helpful for analyzing the issue.

--
Best regards,

Shaofeng Shi 史少锋
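The manual edit described in this message, removing empty segments from a cube's .json file in the backup, could be sketched as follows. The sketch assumes that the cube file keeps its segments in a top-level "segments" list and that an empty segment is one with status "NEW" and size_kb 0, as in the segment JSON posted in this thread; the helper name `drop_empty_segments`, the cube name, and the second segment are hypothetical.

```python
import json
from typing import Any, Dict


def drop_empty_segments(cube: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the cube metadata without empty NEW segments."""

    def is_empty(segment: Dict[str, Any]) -> bool:
        # Assumption: "empty" means never built (status NEW, zero size).
        return segment.get("status") == "NEW" and segment.get("size_kb", 0) == 0

    kept = [s for s in cube.get("segments", []) if not is_empty(s)]
    # Build a new dict so the input metadata is left untouched.
    return {**cube, "segments": kept}


# Hypothetical cube metadata; only the NEW segment name comes from the thread.
cube_json = """
{
  "name": "example_cube",
  "segments": [
    {"name": "2018040210_2018040211", "status": "READY", "size_kb": 2048},
    {"name": "2018040211_2018040212", "status": "NEW", "size_kb": 0}
  ]
}
"""

cleaned = drop_empty_segments(json.loads(cube_json))
print([s["name"] for s in cleaned["segments"]])
```

Editing a copy of the backup in this way leaves the original files available for a restore if anything goes wrong later.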
Re: Facing Issues with Kylin 2.3.1
Hello Ketan,

Thanks for reporting this.

Firstly, the HBase table name is generated at the moment a segment is created; seeing the table name does not mean the table already exists in HBase. Only when the build job reaches the "Create HTable" step does Kylin create the table in HBase. So for a "NEW" segment, it is expected that the table does not exist.

As for the problem "some of the existing segment tables get deleted from Hbase": as far as I know, Kylin normally won't delete an HTable (unless segments are merged or the StorageCleanupJob is run). Could you please check the Kylin and HBase logs to see when and how the table was dropped? If a table is deleted, something should be logged. The logs will be helpful for analyzing the issue.

2018-04-06 16:18 GMT+08:00 ketan dikshit:

> Hi Team,
> We recently upgraded from Kylin 2.1 to Kylin 2.3.1.
>
> Since then we have been facing some issues with our cube building pipeline.
>
> While building new segments, sometimes (not always) some of the existing
> segment tables get deleted from HBase.
> For example, here is the segment JSON for an empty segment; it shows the
> table name, but this table gets dropped from HBase.
>
> {
>   "uuid":"90972f20-d64d-4224-a1f1-cdb6a0ddb69c",
>   "name":"2018040211_2018040212",
>   "storage_location_identifier":"KYLIN_Z3C9IU2QNY",
>   "date_range_start":152266680,
>   "date_range_end":152267040,
>   "source_offset_start":0,
>   "source_offset_end":0,
>   "status":"NEW",
>   "size_kb":0,
>   "input_records":0,
>   "input_records_size":0,
>   "last_build_time":0,
>   "last_build_job_id":null,
>   "create_time_utc":1522700641299,
>   "cuboid_shard_nums":{},
>   "total_shards":0,
>   "blackout_cuboids":[],
>   "binary_signature":null,
>   "dictionaries":{},
>   "snapshots":null,
>   "rowkey_stats":[]
> }
>
> This data loss is having a very heavy business impact, as we keep going
> back, restoring the previous day's snapshots and building the new segments
> again, hoping it doesn't fail.
>
> Here are my Kylin properties:
>
> kylin.web.timezone=US/Pacific
> kylin.metadata.url=kylin2.1MetadataProduction@hbase
> kylin.storage.url=hbase
> kylin.env.hdfs-working-dir=/tmp/kylin-2.1-prod
> kylin.engine.mr.reduce-input-mb=300
> kylin.server.mode=all
> kylin.job.use-remote-cli=false
> kylin.job.remote-cli-working-dir=/tmp/kylin-2.1
> kylin.job.max-concurrent-jobs=10
> kylin.engine.mr.yarn-check-interval-seconds=10
> kylin.source.hive.database-for-flat-table=tmp_kylin
> kylin.storage.hbase.table-name-prefix=KYLIN_
> kylin.storage.hbase.compression-codec=lz4
> kylin.storage.hbase.region-cut-gb=3
> kylin.storage.hbase.min-region-count=1
> kylin.storage.hbase.max-region-count=500
> kylin.storage.partition.max-scan-bytes=16106127360
> kylin.storage.hbase.coprocessor-mem-gb=6
> kylin.security.profile=testing
>
> kylin.query.cache-enabled=true
> kylin.query.cache-threshold-duration=500
> kylin.query.cache-threshold-scan-count=10240
> kylin.storage.hbase.scan-cache-rows=4096
>
> Any idea how and why this corruption might happen? How can data even get
> dropped while building other segments?
>
> Thanks,
> Ketan@Exponential

--
Best regards,

Shaofeng Shi 史少锋
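As a small illustration of the point about "NEW" segments, the sketch below reads a few fields from the segment JSON quoted above and reports whether a missing HBase table would be expected. The helper names `created_at` and `htable_may_be_missing` are hypothetical, and treating create_time_utc as milliseconds since the Unix epoch is an assumption based on the size of the value.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# A subset of the segment fields posted in this thread.
segment = json.loads(
    """
    {
      "name": "2018040211_2018040212",
      "storage_location_identifier": "KYLIN_Z3C9IU2QNY",
      "status": "NEW",
      "size_kb": 0,
      "create_time_utc": 1522700641299
    }
    """
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def created_at(seg: Dict[str, Any]) -> datetime:
    """Convert create_time_utc (assumed to be epoch milliseconds) to UTC."""
    return EPOCH + timedelta(milliseconds=seg["create_time_utc"])


def htable_may_be_missing(seg: Dict[str, Any]) -> bool:
    """A NEW segment may not have reached the "Create HTable" step yet."""
    return seg["status"] == "NEW"


print(segment["storage_location_identifier"], "created", created_at(segment).isoformat())
if htable_may_be_missing(segment):
    print("Segment is NEW: a missing HBase table is expected, not data loss.")
```

A check like this only separates segments that were never built from those that should have a table; for the latter, the Kylin and HBase logs remain the place to look for when a table was dropped.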