Hi Abhimanyu,

In that case, you should probably wait for someone else to help. I'd done the same thing with one of the old recommendation templates, but that was with 0.10.0.
Thanks,
Vaghawan

On Thu, Oct 26, 2017 at 12:56 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:

> Hi Vaghawan,
>
> I have made that template compatible with the version mentioned above. I changed the versions in engine.json and changed the package names.
>
> Regards,
> Abhimanyu
>
> On Thu, Oct 26, 2017 at 12:39 PM, Vaghawan Ojha <vaghawan...@gmail.com> wrote:
>
>> Hi Abhimanyu,
>>
>> I don't think this template works with version 0.11.0. As per the template:
>>
>> "update for PredictionIO 0.9.2, including:"
>>
>> I don't think it supports the latest pio. You should rather switch to 0.9.2 if you want to experiment with it.
>>
>> On Thu, Oct 26, 2017 at 12:52 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>>
>>> Hi Vaghawan,
>>>
>>> I am using v0.11.0-incubating with (ES - v5.2.1, HBase - 1.2.6, Spark - 2.1.0).
>>>
>>> Regards,
>>> Abhimanyu
>>>
>>> On Thu, Oct 26, 2017 at 12:31 PM, Vaghawan Ojha <vaghawan...@gmail.com> wrote:
>>>
>>>> Hi Abhimanyu,
>>>>
>>>> Ok, which version of pio is this? Because the template looks old to me.
>>>>
>>>> On Thu, Oct 26, 2017 at 12:44 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>>>>
>>>>> Hi Vaghawan,
>>>>>
>>>>> Yes, the Spark master connection string is correct. I am getting "executor fails to connect to spark master" after 4-5 hrs.
>>>>>
>>>>> Regards,
>>>>> Abhimanyu
>>>>>
>>>>> On Thu, Oct 26, 2017 at 12:17 PM, Sachin Kamkar <sachinkam...@gmail.com> wrote:
>>>>>
>>>>>> It should be correct, as the user got the exception after 3-4 hours of starting. So it looks like something else broke. OOM?
>>>>>>
>>>>>> With Regards,
>>>>>> Sachin
>>>>>> ⚜KTBFFH⚜
>>>>>>
>>>>>> On Thu, Oct 26, 2017 at 12:15 PM, Vaghawan Ojha <vaghawan...@gmail.com> wrote:
>>>>>>
>>>>>>> "Executor failed to connect with master" - are you sure the --master spark://*.*.*.*:7077 is correct? Like the one you copied from the Spark master's web UI? Sometimes having that wrong makes it fail to connect with the Spark master.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Oct 26, 2017 at 12:02 PM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am new to PredictionIO. I am using the template https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs.
>>>>>>>>
>>>>>>>> My training dataset count is 1184603, with approx. 6500 features. I am using an EC2 r4.8xlarge system (240 GB RAM, 32 cores, 200 GB swap).
>>>>>>>>
>>>>>>>> I tried two ways of training.
>>>>>>>>
>>>>>>>> 1. The command
>>>>>>>>
>>>>>>>> pio train -- --driver-memory 120G --executor-memory 100G --conf spark.network.timeout=10000000
>>>>>>>>
>>>>>>>> It throws this exception after 3-4 hours:
>>>>>>>>
>>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 15, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 181529 ms
>>>>>>>> Driver stacktrace:
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>>>>>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>>>>>>>> at scala.Option.foreach(Option.scala:257)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>>>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>>>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>>>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>>>>>>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>>>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>>>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>>>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>>>>>>>> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1353)
>>>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>>>>>>>> at org.apache.spark.rdd.RDD.take(RDD.scala:1326)
>>>>>>>> at org.example.classification.LogisticRegressionWithLBFGSAlgorithm.train(LogisticRegressionWithLBFGSAlgorithm.scala:28)
>>>>>>>> at org.example.classification.LogisticRegressionWithLBFGSAlgorithm.train(LogisticRegressionWithLBFGSAlgorithm.scala:21)
>>>>>>>> at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
>>>>>>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
>>>>>>>> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
>>>>>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>>>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>>>>>> at scala.collection.immutable.List.foreach(List.scala:381)
>>>>>>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>>>>>>> at scala.collection.immutable.List.map(List.scala:285)
>>>>>>>> at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
>>>>>>>> at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
>>>>>>>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>>>>>>> at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
>>>>>>>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>>>>>>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>>
>>>>>>>> 2. I started a Spark standalone cluster with 1 master and 3 workers and executed the command
>>>>>>>>
>>>>>>>> pio train -- --master spark://*.*.*.*:7077 --driver-memory 50G --executor-memory 50G
>>>>>>>>
>>>>>>>> After some time I get the error "Executor failed to connect with master" and training stops.
>>>>>>>>
>>>>>>>> I have changed the feature count from 6500 to 500 and the behavior is the same. Can anyone suggest what I am missing?
>>>>>>>>
>>>>>>>> In between, the training emits continuous warnings like:
>>>>>>>>
>>>>>>>> [WARN] [ScannerCallable] Ignore, probably already closed
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Abhimanyu
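[Editor's note, not part of the thread] The "Executor heartbeat timed out after 181529 ms" failure in the first run suggests the driver JVM was stalling (a 120 GB heap can produce very long GC pauses) rather than that the timeout was simply too short. A minimal sketch of a revised invocation, assuming the thread's setup: `spark.network.timeout` and `spark.executor.heartbeatInterval` are real Spark configuration properties, but the specific memory sizes and timeout values below are illustrative assumptions, not tested recommendations.

```shell
# Sketch of an adjusted single-machine `pio train` invocation (assumed values).
# - Leave heap headroom for HBase/Elasticsearch on the 240 GB box instead of
#   giving the driver nearly all of it.
# - Keep spark.network.timeout finite; heartbeat loss is judged against it.
# - spark.executor.heartbeatInterval must stay well below spark.network.timeout.
DRIVER_MEM="100G"
NETWORK_TIMEOUT="800s"
HEARTBEAT_INTERVAL="60s"

TRAIN_CMD="pio train -- \
  --driver-memory $DRIVER_MEM \
  --conf spark.network.timeout=$NETWORK_TIMEOUT \
  --conf spark.executor.heartbeatInterval=$HEARTBEAT_INTERVAL"

# Print the command rather than running it, since pio/Spark are not assumed
# to be installed here.
echo "$TRAIN_CMD"
```

Note that when `pio train` runs Spark in local mode (no `--master` given), the executor lives inside the driver JVM, so `--executor-memory` has no separate effect there; the driver heap is the one that matters.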