Hi Donald,

Thank you for the advice! And of course I will contribute to the FAQ later.
Best regards,
Amy

On Mon, Mar 13, 2017 at 8:23 AM, Donald Szeto <[email protected]> wrote:

> Hi Amy,
>
> Since the event server keeps adding events to the backend, the storage will
> grow indefinitely unless you implement some sort of data retention policy
> that periodically removes old events.
>
> In 0.11, there are two options for this situation:
> - You may use SelfCleaningDataSource. Backing up your existing data is
>   highly recommended before you try it.
> - If your use case allows you to overwrite events
>   (https://github.com/apache/incubator-predictionio/pull/356), you may
>   overwrite them instead of continually adding new ones.
>
> Your experience would be very helpful to others as well. Would you like to
> contribute how you fixed your problem to the FAQ?
>
> https://github.com/apache/incubator-predictionio/blob/livedoc/docs/manual/source/resources/faq.html.md
>
> Regards,
> Donald
>
> On Fri, Mar 10, 2017 at 11:32 PM, Lin Amy <[email protected]> wrote:
>
> > Hello everyone,
> >
> > Mission completed!
> >
> > The issue was solved after I fixed the following errors reported by `hbase hbck`:
> >
> > ERROR: Region { meta => pio_event:events_1,,1488109005690.f2fe88521bdf946650842f74bb4c978d., hdfs => file:/home/crs/hbase/hbase/data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d, deployed => } not deployed on any region server.
> > ERROR: (region pio_event:events_1,\x80#X,1489209095682.97a91816f25aa71ce2e2a0342776ddbe.) First region should start with an empty key. You need to create a new region and regioninfo in HDFS to plug the hole.
> >
> > `hbase hbck -repair` and `hbase hbck -repairHoles` didn't solve the problem at all...
> >
> > But after trying these:
> > 1. stopping HBase
> > 2. deleting the recovered.edits folders of the failing regions
> > 3. running `hbase hbck -repairHoles`
> > (ref: https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies)
> >
> > Problem solved!!!
> > Hope it saves others some time if this occurs again (hopefully not... Orz)
> >
> > Best regards,
> > Amy
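(For anyone who hits the same hbck errors later, here is a rough shell sketch of the three steps above. It assumes HBase data lives under /home/crs/hbase/hbase on the local filesystem, as the hbck output shows; the region hash is just the one from the error above, so substitute whatever regions `hbase hbck` flags on your cluster, and back the data directory up before deleting anything.)

```
# 1. Stop HBase so no region server is holding the affected files.
$HBASE_HOME/bin/stop-hbase.sh

# 2. Remove the recovered.edits folder of each failing region
#    (the region directory is the one hbck complained about).
rm -rf /home/crs/hbase/hbase/data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d/recovered.edits

# 3. Bring HBase back up and let hbck plug the hole
#    (hbck checks and repairs a running cluster).
$HBASE_HOME/bin/start-hbase.sh
hbase hbck -repairHoles

# Finally, check that no inconsistencies remain.
hbase hbck
```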
> >
> > On Sat, Mar 11, 2017 at 2:41 PM, Lin Amy <[email protected]> wrote:
> >
> > Hello again,
> >
> > I have solved the problem with reference to
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1621, and `pio status`
> > now returns a normal result, which seems great.
> > However, the problem now is that I receive a 500 (internal server error)
> > with the message "The server was not able to produce a timely response
> > to your request.".
> > Also, when I do `pio train`, it fails with the following message:
> >
> > Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> > Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
> > Sat Mar 11 14:00:11 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
> > Sat Mar 11 14:00:12 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
> > Sat Mar 11 14:00:14 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:28 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:38 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:48 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:00:58 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:01:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:01:38 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:01:58 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:02:18 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:02:39 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> > Sat Mar 11 14:02:59 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
> >
> > I have tried deleting everything inside /hbase/zookeeper following some
> > online advice, but the issue remained.
> >
> > Has anyone met this failure and solved it?
> > Thank you, and I appreciate any help!
> >
> > Best regards,
> > Amy
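(A few quick checks that may help narrow down this kind of RetriesExhaustedException; the port and the ZooKeeper address below are the ones from the log above and from a default HBase-managed ZooKeeper, so adjust them to your own cluster.)

```
# Are the HBase daemons actually up after the restart?
jps | grep -E 'HMaster|HRegionServer|HQuorumPeer'

# Is anything listening on the port the client keeps retrying?
netstat -ltn | grep 37708

# Is ZooKeeper answering, and which region servers has it registered?
echo ruok | nc localhost 2181
hbase zkcli ls /hbase/rs
```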
> >
> > On Sat, Mar 11, 2017 at 10:28 AM, Lin Amy <[email protected]> wrote:
> >
> > Hello,
> >
> > Yesterday I found the disk was full, which led to an HBase failure:
> >
> > stopping hbase
> > /home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/stop-hbase.sh: line 50: echo: write error: No space left on device
> > Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:
> >    853
> > Try using the -Djava.io.tmpdir= option to select an alternate temp location.
> >
> > So I freed up a lot of disk space and tried `pio-stop-all` and `pio-start-all`.
> > Then `pio status` gave me this error:
> > -----------------------------------------------------
> > [INFO] [Console$] Inspecting PredictionIO...
> > [INFO] [Console$] PredictionIO 0.10.0-incubating is installed at /home/crs/PredictionIO-0.10.0-incubating
> > [INFO] [Console$] Inspecting Apache Spark...
> > [INFO] [Console$] Apache Spark is installed at /home/crs/PredictionIO-0.10.0-incubating/vendors/spark-1.6.2-bin-hadoop2.6
> > [INFO] [Console$] Apache Spark 1.6.2 detected (meets minimum requirement of 1.3.0)
> > [INFO] [Console$] Inspecting storage backend connections...
> > [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> > [INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
> > [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
> > [ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
> > [ERROR] [ZooKeeperWatcher] hconnection-0x3fc05ea2, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
> > [WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
> > [ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble: localhost). Please make sure that the configuration is pointing at the correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so if you have not configured HBase to use an external ZooKeeper, that means your HBase is not started or configured properly.
> > [ERROR] [Storage$] Error initializing storage client for source HBASE
> > [ERROR] [Console$] Unable to connect to all storage backends successfully. The following shows the error message from the storage backend.
> > [ERROR] [Console$] Data source HBASE was not properly initialized. (org.apache.predictionio.data.storage.StorageClientException)
> > [ERROR] [Console$] Dumping configuration of initialized storage backend sources. Please make sure they are correct.
> > [ERROR] [Console$] Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOME -> /home/crs/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.7.5, HOSTS -> Slave2,PredictIO3, PORTS -> 9300,9320, CLUSTERNAME -> CRS, TYPE -> elasticsearch
> > [ERROR] [Console$] Source Name: LOCALFS; Type: localfs; Configuration: PATH -> /home/crs/.pio_store/models, TYPE -> localfs
> > [ERROR] [Console$] Source Name: HBASE; Type: (error); Configuration: (error)
> > -----------------------------------------------------
> >
> > My guess is that it fails whenever it tries to restart ZooKeeper.
> >
> > My pio-env.sh and some errors from `hbase-crs-master-PredictIO3.log` are also attached.
> >
> > Thank you!!!!
> >
> > Best regards,
> > Amy
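(A rough sketch of a recovery sequence for this disk-full case, using the pio-start-all/pio-stop-all scripts bundled with the 0.10.0 binary distribution and the install paths shown above; adjust the paths and the mount points to your own machine.)

```
# 1. Confirm the partitions that HBase and /tmp live on have free space again.
df -h /home/crs /tmp

# 2. Restart the whole stack cleanly (Elasticsearch, HBase and the event server).
/home/crs/PredictionIO-0.10.0-incubating/bin/pio-stop-all
/home/crs/PredictionIO-0.10.0-incubating/bin/pio-start-all

# 3. Check that HBase's embedded ZooKeeper answers on localhost:2181,
#    then re-verify the storage backends.
echo ruok | nc localhost 2181
pio status
```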
