We will also release a template that trims, compacts, and optionally
de-duplicates the DB using SelfCleaningDataSource. Because it is a template,
you can schedule it separately from `pio train`. The SelfCleaningDataSource
method is fairly slow, so for some clients we run it daily to maintain a
moving time window of data.

Here is the template; we'll put it in the PIO Gallery after release:
https://github.com/actionml/db-cleaner
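Since the cleaner is itself a PIO template driven by its own `pio train`, one way to schedule it on a separate cadence is a pair of cron entries, cleanup first and the real engine afterwards. A minimal sketch; the checkout paths (`/opt/db-cleaner`, `/opt/my-engine`) and times are assumptions, not part of the template:

```shell
#!/bin/sh
# Sketch: run the cleaner template nightly on its own schedule, separate
# from the engine's training run. Paths and times are assumptions.
CLEANER_ENTRY='0 3 * * * cd /opt/db-cleaner && pio train'
ENGINE_ENTRY='0 5 * * * cd /opt/my-engine && pio train'

# Print the proposed crontab lines (install them by hand with `crontab -e`).
printf '%s\n%s\n' "$CLEANER_ENTRY" "$ENGINE_ENTRY"
```

Running the cleanup a couple of hours before the engine's `pio train` means training always sees the already-trimmed event window.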


On Mar 12, 2017, at 5:16 PM, Donald Szeto <[email protected]> wrote:

Hi Amy,

Since the event server keeps adding events to the backend, the storage will
grow indefinitely unless you implement some sort of data retention policy
that periodically removes old events.

In 0.11, there are two options for this situation:
- You may use SelfCleaningDataSource. Backing up your existing data before
you try it is highly recommended.
- If your use case allows events to be overwritten
(https://github.com/apache/incubator-predictionio/pull/356), you may
overwrite them instead of keep appending new ones.

Your experience would be very helpful to others as well. Would you like to
contribute a note on how you fixed the problem to the FAQ?

https://github.com/apache/incubator-predictionio/blob/livedoc/docs/manual/source/resources/faq.html.md

Regards,
Donald

On Fri, Mar 10, 2017 at 11:32 PM, Lin Amy <[email protected]> wrote:
Hello everyone,

Mission completed!

The issue was solved after I fixed the following errors from `hbase hbck`:
ERROR: Region { meta => 
pio_event:events_1,,1488109005690.f2fe88521bdf946650842f74bb4c978d., hdfs => 
file:/home/crs/hbase/hbase/data/pio_event/events_1/f2fe88521bdf946650842f74bb4c978d,
 deployed =>  } not deployed on any region server.
ERROR: (region 
pio_event:events_1,\x80#X,1489209095682.97a91816f25aa71ce2e2a0342776ddbe.) 
First region should start with an empty key.  You need to  create a new region 
and regioninfo in HDFS to plug the hole.

`hbase hbck -repair` and `hbase hbck -repairHoles` didn't solve the problem
at all...

But after trying these steps:
1. Stop HBase.
2. Delete the recovered.edits folders for the failing regions.
3. Run `hbase hbck -repairHoles`.
(ref: https://serverfault.com/questions/510290/hbase-hbck-cant-fix-region-inconsistencies)
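The three steps above can be sketched as a small script. The paths are assumptions based on the `file:/...` locations reported in the hbck output (local-filesystem HBase root); verify them before invoking the function:

```shell
#!/bin/sh
# Sketch of the recovery sequence: stop HBase, drop recovered.edits for the
# failing regions of pio_event:events_1, restart, then let hbck plug the
# hole. HBASE_HOME and HBASE_ROOT are assumptions; adjust to your layout.
HBASE_HOME=/home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0
HBASE_ROOT=/home/crs/hbase/hbase

repair_event_regions() {
  "$HBASE_HOME/bin/stop-hbase.sh"                    # 1. stop HBase
  # 2. delete recovered.edits under every region of the damaged table
  rm -rf "$HBASE_ROOT"/data/pio_event/events_1/*/recovered.edits
  "$HBASE_HOME/bin/start-hbase.sh"                   # bring HBase back up
  "$HBASE_HOME/bin/hbase" hbck -repairHoles          # 3. plug the hole
}

# Invoke manually once the paths have been verified:
# repair_event_regions
```

Deleting recovered.edits discards unflushed edits for those regions, which is why backing up first (as Donald suggested) matters.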

Problem solved!!!
Hope it saves others some time if this occurs again (hopefully not... Orz)

Best regards,
Amy


Lin Amy <[email protected]> wrote on Sat, Mar 11, 2017 at 2:41 PM:
Hello again,

I have solved the problem with reference to
https://issues.apache.org/jira/browse/ZOOKEEPER-1621, and `pio status` now
returns a normal result, which seems great.
However, the problem now is that I receive a 500 (internal server error)
with the message "The server was not able to produce a timely response to
your request."
Also, when I do `pio train`, it fails with the following message:
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, java.net.ConnectException: Connection refused
Sat Mar 11 14:00:10 CST 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@7dfeb08d, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: PredictIO3.ucf.com/10.1.3.153:37708
[... the same FailedServerException and ConnectException entries repeat with increasing backoff from 14:00:11 to 14:02:59 until the retries are exhausted]

Following some online advice, I have tried deleting everything inside
/hbase/zookeeper, but the issue remained.

Has anyone met this failure and solved it?
Thank you, and I appreciate any help!

Best regards,
Amy

Lin Amy <[email protected]> wrote on Sat, Mar 11, 2017 at 10:28 AM:
Hello,

Yesterday I found that the disk was full, which led to an HBase failure:

stopping hbase
/home/crs/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/bin/stop-hbase.sh: line 50: echo: write error: No space left on device
Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:
   853
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
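A quick pre-flight check can prevent restarting PIO while the disk is still nearly full. A minimal sketch; the mount point (`/`) and the 95% threshold are assumptions, so point it at whatever filesystem actually holds your HBase data:

```shell
#!/bin/sh
# Sketch: warn before `pio-start-all` if the disk is still nearly full.
# Mount point and threshold are assumptions; adjust for your cluster.

pct_used() {
  # parse the Use% column from POSIX `df -P` output for the given path
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

if [ "$(pct_used /)" -ge 95 ]; then
  echo "disk nearly full - free more space before pio-start-all" >&2
fi
```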

So I freed up a lot of disk space, and tried `pio-stop-all` and
`pio-start-all`. Then `pio status` gave me this error:
-----------------------------------------------------
[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.10.0-incubating is installed at 
/home/crs/PredictionIO-0.10.0-incubating
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at 
/home/crs/PredictionIO-0.10.0-incubating/vendors/spark-1.6.2-bin-hadoop2.6
[INFO] [Console$] Apache Spark 1.6.2 detected (meets minimum requirement of 
1.3.0)
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
[ERROR] [ZooKeeperWatcher] hconnection-0x3fc05ea2, quorum=localhost:2181, 
baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble: 
localhost). Please make sure that the configuration is pointing at the correct 
ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so if you have 
not configured HBase to use an external ZooKeeper, that means your HBase is not 
started or configured properly.
[ERROR] [Storage$] Error initializing storage client for source HBASE
[ERROR] [Console$] Unable to connect to all storage backends successfully. The 
following shows the error message from the storage backend.
[ERROR] [Console$] Data source HBASE was not properly initialized. 
(org.apache.predictionio.data.storage.StorageClientException)
[ERROR] [Console$] Dumping configuration of initialized storage backend 
sources. Please make sure they are correct.
[ERROR] [Console$] Source Name: ELASTICSEARCH; Type: elasticsearch; 
Configuration: HOME -> 
/home/crs/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.7.5, HOSTS -> 
Slave2,PredictIO3, PORTS -> 9300,9320, CLUSTERNAME -> CRS, TYPE -> elasticsearch
[ERROR] [Console$] Source Name: LOCALFS; Type: localfs; Configuration: PATH -> 
/home/crs/.pio_store/models, TYPE -> localfs
[ERROR] [Console$] Source Name: HBASE; Type: (error); Configuration: (error)

------------------------------------------------------
My guess is that it fails whenever it tries to restart ZooKeeper.
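One quick way to test that guess is to probe ZooKeeper directly with its standard `ruok` four-letter health command against the quorum address from the error (`localhost:2181`). A sketch; it assumes `nc` is installed:

```shell
#!/bin/sh
# Sketch: check whether HBase's embedded ZooKeeper is answering at all
# before digging into pio itself. Assumes `nc` (netcat) is available.

zk_ok() {
  # a healthy ZooKeeper replies "imok" to the ruok probe
  [ "$1" = "imok" ]
}

reply=$(echo ruok | nc -w 2 localhost 2181 2>/dev/null)
if zk_ok "$reply"; then
  echo "ZooKeeper is answering on 2181"
else
  echo "ZooKeeper not reachable - check the HBase master log" >&2
fi
```

If the probe fails, the StorageClient error above is just a symptom: HBase never came up cleanly after the disk filled, so its embedded ZooKeeper is not listening.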

My pio-env.sh and some errors from `hbase-crs-master-PredictIO3.log` are
also attached.

Thank you!!!!

Best regards,
Amy

