Try importing it in batches: split the 20 GB file into roughly 1 GB chunks and import
them one after the other.

The file can be split, provided each line holds exactly one event. Here is how you can
split it with simple commands.
Let's say your original 20 GB file has 12k lines in total (use wc -l
bigfile.json to count them); then the first chunk would be
head -1000 bigfile.json > chunk1.json
the second chunk would be
head -2000 bigfile.json | tail -1000 > chunk2.json
and the third chunk would be
head -3000 bigfile.json | tail -1000 > chunk3.json
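
(As an aside, if GNU coreutils is on the box, the standard split tool can do all of
this in one go; a sketch, assuming bigfile.json as the input and 1000 lines per chunk:

split -l 1000 bigfile.json chunk_

which writes chunk_aa, chunk_ab, ... of 1000 lines each, with the leftover lines in
the last one.)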

I am sure you can write a tiny script to do this; then you can easily
import chunk by chunk with the same script. A rough sketch is below.
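
A minimal, untested sketch of such a script (it assumes the input is called
bigfile.json, 1000 events per chunk, and app id 4 as in your command, so adjust
those to your setup):

#!/bin/bash
# Chunk the events file and import one chunk at a time with pio.
lines=$(wc -l < bigfile.json)   # total number of events (one per line)
size=1000                       # events per chunk; tune this for your data
i=1
for start in $(seq 1 "$size" "$lines"); do
  # Extract lines start .. start+size-1 into their own chunk file.
  tail -n +"$start" bigfile.json | head -n "$size" > "chunk$i.json"
  pio import --appid 4 --input "chunk$i.json"
  i=$((i + 1))
done

Note that tail -n + re-reads the big file from the beginning for every chunk, so for
a 20 GB file the split approach above will be noticeably faster; the loop is just the
simplest thing to show.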

-Mahesh

On Wed, Aug 2, 2017 at 5:22 PM, Carlos Vidal <carlos.vi...@beeva.com> wrote:

> Hello,
>
> I have installed the pio + ur AMI in AWS, on an m4.2xlarge instance with
> 32 GB of RAM and 8 vCPUs.
>
> When I try to import a 20GB events file for my application, the system
> crashes. The command I have used is:
>
>
> pio import --appid 4 --input my_events.json
>
> This command launches a Spark job that needs to perform 800 tasks. When the
> process reaches task 211, it crashes. This is what I can see in my
> pio.log file:
>
> 2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.clien
> t.HConnectionManager$HConnectionImplementation [htable-pool230-t1] -
> Encountered problems when prefetch hbase:meta table:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=35, exceptions:
> Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, 
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, 
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, 
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:08 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, 
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:14 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:24 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:34 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:44 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:54 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:15 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:35 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:55 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:15 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:35 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:55 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:15 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:35 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:55 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:15 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:35 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:55 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
>
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRet
> ries(RpcRetryingCaller.java:129)
> at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScan
> ner.java:144)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectio
> nImplementation.prefetchRegionCache(HConnectionManager.java:1153)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectio
> nImplementation.locateRegionInMeta(HConnectionManager.java:1217)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectio
> nImplementation.locateRegion(HConnectionManager.java:1105)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectio
> nImplementation.locateRegion(HConnectionManager.java:1062)
> at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation
> (AsyncProcess.java:365)
> at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProc
> ess.java:507)
> at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(
> AsyncProcess.java:717)
> at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFai
> lure(AsyncProcess.java:664)
> at org.apache.hadoop.hbase.client.AsyncProcess.access$100(
> AsyncProcess.java:93)
> at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProce
> ss.java:547)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> Executor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
> lExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWi
> thTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnec
> tion(RpcClient.java:578)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstre
> ams(RpcClient.java:868)
> at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClien
> t.java:1543)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(Rpc
> Client.java:1661)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImpl
> ementation.callBlockingMethod(RpcClient.java:1719)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> ClientService$BlockingStub.get(ClientProtos.java:29966)
> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore
> (ProtobufUtil.java:1508)
> at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
> at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRet
> ries(RpcRetryingCaller.java:114)
> ... 17 more
> 2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus
> [Thread-3] - SparkListenerBus has already stopped! Dropping event
> SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
> 2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus
> [Thread-3] - SparkListenerBus has already stopped! Dropping event
> SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException:
> Job 0 cancelled because SparkContext was shut down))
> 2017-08-02 11:28:47,129 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - Inspecting PredictionIO...
> 2017-08-02 11:28:47,132 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - PredictionIO 0.11.0-incubating is installed at
> /opt/data/PredictionIO-0.11.0-incubating
> 2017-08-02 11:28:47,132 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - Inspecting Apache Spark...
> 2017-08-02 11:28:47,142 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - Apache Spark is installed at /usr/local/spark
> 2017-08-02 11:28:47,175 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
> 2017-08-02 11:28:47,175 INFO  
> org.apache.predictionio.tools.commands.Management$
> [main] - Inspecting storage backend connections...
> 2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$
> [main] - Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> 2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$
> [main] - Verifying Model Data Backend (Source: HDFS)...
> 2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$
> [main] - Verifying Event Data Backend (Source: HBASE)...
> 2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$
> [main] - Test writing to Event Store (App Id 0)...
> 2017-08-02 11:29:49,026 ERROR 
> org.apache.predictionio.tools.commands.Management$
> [main] - Unable to connect to all storage backends successfully.
>
>
>
>
>
>
> On the other hand, once this happens, if I run pio status this is what I
> obtain:
>
> aml@ip-10-41-11-227:~$ pio status
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/data/Prediction
> IO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-
> 0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/data/Prediction
> IO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.
> jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> [INFO] [Management$] Inspecting PredictionIO...
> [INFO] [Management$] PredictionIO 0.11.0-incubating is installed at
> /opt/data/PredictionIO-0.11.0-incubating
> [INFO] [Management$] Inspecting Apache Spark...
> [INFO] [Management$] Apache Spark is installed at /usr/local/spark
> [INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum
> requirement of 1.3.0)
> [INFO] [Management$] Inspecting storage backend connections...
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> [INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
> [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
> [INFO] [Storage$] Test writing to Event Store (App Id 0)...
> [ERROR] [Management$] Unable to connect to all storage backends
> successfully.
> The following shows the error message from the storage backend.
>
> Failed after attempts=1, exceptions:
> Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to
> localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/127.0.0.1:51462 remote=
> localhost/127.0.0.1:39562]
>  (org.apache.hadoop.hbase.client.RetriesExhaustedException)
>
> Dumping configuration of initialized storage backend sources.
> Please make sure they are correct.
>
> Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS ->
> 127.0.0.1, TYPE -> elasticsearch, CLUSTERNAME -> elasticsearch
> Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
> Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models
>
> Do you know what the problem is? How can I restart the services once the
> system fails?
>
> Thanks.
>
> Carlos Vidal.
>

