Re: Facing Issues with Kylin 2.3.1

2018-05-21 Thread ShaoFeng Shi
Hi Ketan,

The cleanup logic is straightforward. You can check
StorageCleanupJob.java and MetadataCleanupJob.java.

The metadata cleanup only deletes entries in the 'kylin_metadata' HBase
table. The storage cleanup deletes unreferenced HBase tables, Hive
intermediate tables and HDFS intermediate folders.

"metastore.sh reset" deletes all metadata records. Please don't run the
storage cleanup when the metadata table is empty, as it may treat all
HBase tables and HDFS folders as unnecessary.
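
To illustrate why an empty metadata table is dangerous here, the following is a minimal sketch of a reference-based cleanup, not the actual code in StorageCleanupJob.java. The function name, the sample table names and the prefix check are assumptions made for illustration; the real job may apply additional checks.

```python
def find_unreferenced_tables(existing_tables, referenced_tables, prefix="KYLIN_"):
    """Return the tables a reference-based cleanup would treat as garbage.

    Hypothetical helper: a table is a candidate for deletion when it carries
    the Kylin prefix and no metadata record refers to it.
    """
    referenced = set(referenced_tables)
    return sorted(
        t for t in existing_tables
        if t.startswith(prefix) and t not in referenced
    )


# Made-up table names, for illustration only.
hbase_tables = ["KYLIN_AAA", "KYLIN_BBB", "OTHER_TABLE"]

# Normal case: the metadata still references KYLIN_AAA.
print(find_unreferenced_tables(hbase_tables, ["KYLIN_AAA"]))  # ['KYLIN_BBB']

# After a metadata reset nothing is referenced any more,
# so every Kylin table looks unnecessary.
print(find_unreferenced_tables(hbase_tables, []))  # ['KYLIN_AAA', 'KYLIN_BBB']
```

With an empty reference set, every table carrying the prefix ends up in the deletion list, which is the situation to avoid.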


2018-04-07 13:27 GMT+08:00 kdcool6932 <kdcool6...@yahoo.com.invalid>:

> Thanks ShaoFeng, that explains when an HBase table can be dropped.
> One more thing I would like to understand: in one of our Kylin 2.1 test
> setups we faced an issue. Could you point out what might have caused it
> and how?
> I am referring to the Kylin metadata utility described here, which we have
> been using since we started with Kylin:
> http://kylin.apache.org/docs23/howto/howto_backup_metadata.html
> We had some empty segments in Kylin, because of which we were not able to
> query. So we took a metadata backup using the backup command, edited the
> .json file inside the cubes folder of the metadata backup directory to
> delete the empty segments, and then ran the metadata reset command and also
> the metadata clean with --delete true.
> Then we restored the modified metadata backup.
> All the segment tables for all the cubes got deleted from HBase.
> So I would like to understand how the reset and clean steps actually work.
> This would help us understand the tool functionally and use it better
> going forward.
>
> Thanks,
> Ketan@Exponential

Re: Facing Issues with Kylin 2.3.1

2018-04-06 Thread kdcool6932
Thanks ShaoFeng, that explains when an HBase table can be dropped.
One more thing I would like to understand: in one of our Kylin 2.1 test
setups we faced an issue. Could you point out what might have caused it
and how?
I am referring to the Kylin metadata utility described here, which we have
been using since we started with Kylin:
http://kylin.apache.org/docs23/howto/howto_backup_metadata.html
We had some empty segments in Kylin, because of which we were not able to
query. So we took a metadata backup using the backup command, edited the
.json file inside the cubes folder of the metadata backup directory to
delete the empty segments, and then ran the metadata reset command and also
the metadata clean with --delete true.
Then we restored the modified metadata backup.
All the segment tables for all the cubes got deleted from HBase.
So I would like to understand how the reset and clean steps actually work.
This would help us understand the tool functionally and use it better
going forward.

Thanks,
Ketan@Exponential
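
For reference, the JSON edit described above could be sketched roughly as follows. This is only an illustration under assumptions: that the cube file keeps its segments in a top-level "segments" list, and that "empty" means status "NEW" with size_kb equal to 0. The function name and the sample data are hypothetical.

```python
import copy


def drop_empty_segments(cube):
    """Return a copy of a cube dict without segments that look empty.

    Assumption: a segment counts as empty when its status is "NEW"
    and its size_kb is 0. The input dict is left untouched.
    """
    cleaned = copy.deepcopy(cube)
    cleaned["segments"] = [
        seg for seg in cleaned.get("segments", [])
        if not (seg.get("status") == "NEW" and seg.get("size_kb", 0) == 0)
    ]
    return cleaned


# Made-up cube content, shaped like the segment JSON in this thread.
cube = {
    "name": "example_cube",
    "segments": [
        {"name": "2018040210_2018040211", "status": "READY", "size_kb": 512},
        {"name": "2018040211_2018040212", "status": "NEW", "size_kb": 0},
    ],
}

cleaned = drop_empty_segments(cube)
print([seg["name"] for seg in cleaned["segments"]])  # ['2018040210_2018040211']
```

Comparing the segment names before and after the edit is a cheap way to confirm that only the intended segments were removed.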




Re: Facing Issues with Kylin 2.3.1

2018-04-06 Thread ShaoFeng Shi
Hello Ketan,

Thanks for the report.

First, the HBase table name is generated at the moment a segment is
created; seeing the table name does not mean the table already exists in
HBase. Only when the build job reaches the "Create HTable" step does Kylin
create the table in HBase. So for a "NEW" segment, it is expected that the
table does not exist.

As for the problem "some of the existing segment tables get deleted from
Hbase": as far as I know, Kylin normally won't delete an HTable (unless
segments are merged or the StorageCleanupJob is run). Could you please check
the Kylin and HBase logs to see when and how the table was dropped? If a
table is deleted, something should be logged. The logs will be helpful for
analyzing the issue.
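
To narrow down which segments are worth searching the logs for, a small check along these lines may help. It is a sketch only: the function name and sample data are hypothetical, "READY" is used as an example of a non-NEW status, and the list of existing HBase tables is assumed to have been collected separately. The field names come from the segment JSON quoted below.

```python
def segments_missing_tables(segments, existing_tables):
    """List (segment name, table name) pairs whose HBase table is absent.

    Segments with status "NEW" are skipped, because their table may simply
    not have been created yet.
    """
    existing = set(existing_tables)
    missing = []
    for seg in segments:
        if seg.get("status") == "NEW":
            continue
        table = seg.get("storage_location_identifier")
        if table not in existing:
            missing.append((seg.get("name"), table))
    return missing


# Made-up segments and table list, for illustration only.
segments = [
    {"name": "seg_a", "status": "READY", "storage_location_identifier": "KYLIN_AAA"},
    {"name": "seg_b", "status": "READY", "storage_location_identifier": "KYLIN_BBB"},
    {"name": "seg_c", "status": "NEW", "storage_location_identifier": "KYLIN_CCC"},
]

print(segments_missing_tables(segments, ["KYLIN_AAA"]))  # [('seg_b', 'KYLIN_BBB')]
```

Only the table names reported here would then need to be looked up in the Kylin and HBase logs.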


2018-04-06 16:18 GMT+08:00 ketan dikshit :

> Hi Team,
> We recently upgraded from Kylin 2.1 to Kylin 2.3.1.
>
> Since then we have been facing some issues with our cube building pipeline.
>
> While building new segments, sometimes (not always) some of the existing
> segment tables get deleted from HBase.
> For example, here is the segment JSON for an empty segment. It shows the
> table name, but this table gets dropped from HBase.
> {
>  "uuid":"90972f20-d64d-4224-a1f1-cdb6a0ddb69c",
>  "name":"2018040211_2018040212",
>  "storage_location_identifier":"KYLIN_Z3C9IU2QNY",
>  "date_range_start":152266680,
>  "date_range_end":152267040,
>  "source_offset_start":0,
>  "source_offset_end":0,
>  "status":"NEW",
>  "size_kb":0,
>  "input_records":0,
>  "input_records_size":0,
>  "last_build_time":0,
>  "last_build_job_id":null,
>  "create_time_utc":1522700641299,
>  "cuboid_shard_nums":{},
>  "total_shards":0,
>  "blackout_cuboids":[],
>  "binary_signature":null,
>  "dictionaries":{},
>  "snapshots":null,
>  "rowkey_stats":[]
> }
>
> This data loss is having a very heavy business impact, as we keep going
> back to restore the previous day's snapshots and rebuild the new segments,
> hoping the build doesn't fail.
> Here are my kylin.props:
> kylin.web.timezone=US/Pacific
> kylin.metadata.url=kylin2.1MetadataProduction@hbase
> kylin.storage.url=hbase
> kylin.env.hdfs-working-dir=/tmp/kylin-2.1-prod
> kylin.engine.mr.reduce-input-mb=300
> kylin.server.mode=all
> kylin.job.use-remote-cli=false
> kylin.job.remote-cli-working-dir=/tmp/kylin-2.1
> kylin.job.max-concurrent-jobs=10
> kylin.engine.mr.yarn-check-interval-seconds=10
> kylin.source.hive.database-for-flat-table=tmp_kylin
> kylin.storage.hbase.table-name-prefix=KYLIN_
> kylin.storage.hbase.compression-codec=lz4
> kylin.storage.hbase.region-cut-gb=3
> kylin.storage.hbase.min-region-count=1
> kylin.storage.hbase.max-region-count=500
> kylin.storage.partition.max-scan-bytes=16106127360
> kylin.storage.hbase.coprocessor-mem-gb=6
> kylin.security.profile=testing
>
> kylin.query.cache-enabled=true
> kylin.query.cache-threshold-duration=500
> kylin.query.cache-threshold-scan-count=10240
> kylin.storage.hbase.scan-cache-rows=4096
>
>
> Any idea how and why this corruption might happen? How can data even get
> dropped while other segments are being built?
>
> Thanks,
> Ketan@Exponential




-- 
Best regards,

Shaofeng Shi 史少锋