Hi Abhimanyu,

In that case, you should probably wait for someone else to help. I'd done
the same thing with one of the old recommendation templates, but that was
with 0.10.0.

Thanks
Vaghawan


On Thu, Oct 26, 2017 at 12:56 PM, Abhimanyu Nagrath <
abhimanyunagr...@gmail.com> wrote:

> Hi Vaghawan,
>
> I have made that template compatible with the version mentioned above:
> I changed the versions in engine.json and renamed the packages.
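>
> For anyone curious, the changes were roughly along these lines (a sketch
> of the build.sbt part; module names are from memory, not the exact file):
>
>     libraryDependencies ++= Seq(
>       // groupId moved from io.prediction to org.apache.predictionio in 0.11.0
>       "org.apache.predictionio" %% "apache-predictionio-core" % "0.11.0-incubating" % "provided",
>       "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
>       "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided")
>
> plus renaming the io.prediction.* imports to org.apache.predictionio.* in
> the Scala sources.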
>
>
> Regards,
> Abhimanyu
>
> On Thu, Oct 26, 2017 at 12:39 PM, Vaghawan Ojha <vaghawan...@gmail.com>
> wrote:
>
>> Hi Abhimanyu,
>>
>> I don't think this template works with version 0.11.0. As per the
>> template:
>>
>> update for PredictionIO 0.9.2, including:
>>
>> so I don't think it supports the latest pio. You should switch to 0.9.2
>> if you want to experiment with it.
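>>
>> (If you want to double-check, the supported version is usually declared
>> in the template's template.json, something like:
>>
>>     {"pio": {"version": {"min": "0.9.2"}}}
>>
>> though the exact shape varies between templates.)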
>>
>> On Thu, Oct 26, 2017 at 12:52 PM, Abhimanyu Nagrath <
>> abhimanyunagr...@gmail.com> wrote:
>>
>>> Hi Vaghawan,
>>>
>>> I am using v0.11.0-incubating with (ES v5.2.1, HBase 1.2.6, Spark 2.1.0).
>>>
>>> Regards,
>>> Abhimanyu
>>>
>>> On Thu, Oct 26, 2017 at 12:31 PM, Vaghawan Ojha <vaghawan...@gmail.com>
>>> wrote:
>>>
>>>> Hi Abhimanyu,
>>>>
>>>> Ok, which version of pio is this? Because the template looks old to me.
>>>>
>>>> On Thu, Oct 26, 2017 at 12:44 PM, Abhimanyu Nagrath <
>>>> abhimanyunagr...@gmail.com> wrote:
>>>>
>>>>> Hi Vaghawan,
>>>>>
>>>>> Yes, the Spark master connection string is correct. The executor
>>>>> fails to connect to the Spark master after 4-5 hours.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Abhimanyu
>>>>>
>>>>> On Thu, Oct 26, 2017 at 12:17 PM, Sachin Kamkar <
>>>>> sachinkam...@gmail.com> wrote:
>>>>>
>>>>>> It should be correct, as the user got the exception only after 3-4
>>>>>> hours of running. So it looks like something else broke. OOM?
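>>>>>>
>>>>>> If it is only the heartbeat timing out under load (rather than a real
>>>>>> crash), bumping the timeouts might help; a sketch, with guessed values:
>>>>>>
>>>>>>     pio train -- --driver-memory 120G --executor-memory 100G \
>>>>>>       --conf spark.executor.heartbeatInterval=60s \
>>>>>>       --conf spark.network.timeout=800s
>>>>>>
>>>>>> If the driver is genuinely OOMing, though, no timeout will save it.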
>>>>>>
>>>>>> With Regards,
>>>>>>
>>>>>>      Sachin
>>>>>> ⚜KTBFFH⚜
>>>>>>
>>>>>> On Thu, Oct 26, 2017 at 12:15 PM, Vaghawan Ojha <
>>>>>> vaghawan...@gmail.com> wrote:
>>>>>>
>>>>>>> "Executor failed to connect with master ", are you sure the --master
>>>>>>> spark://*.*.*.*:7077 is correct?
>>>>>>>
>>>>>>> Like the one you copied from the spark master's web ui? sometimes
>>>>>>> having that wrong fails to connect with the spark master.
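>>>>>>>
>>>>>>> A quick sanity check (a sketch, assuming a stock Spark 2.1 install;
>>>>>>> the examples jar name depends on your build) is to run SparkPi
>>>>>>> against the same master URL:
>>>>>>>
>>>>>>>     spark-submit --master spark://<master-host>:7077 \
>>>>>>>       --class org.apache.spark.examples.SparkPi \
>>>>>>>       $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.0.jar 100
>>>>>>>
>>>>>>> If that also fails to get executors, the problem is the URL or the
>>>>>>> cluster, not the template.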
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Oct 26, 2017 at 12:02 PM, Abhimanyu Nagrath <
>>>>>>> abhimanyunagr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am new to PredictionIO. I am using the template
>>>>>>>> https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs
>>>>>>>>
>>>>>>>> My training dataset has 1,184,603 rows with approx. 6,500 features.
>>>>>>>> I am using an EC2 r4.8xlarge instance (240 GB RAM, 32 cores, 200 GB
>>>>>>>> swap).
>>>>>>>>
>>>>>>>>
>>>>>>>> I tried two ways of training.
>>>>>>>>
>>>>>>>> 1. The command
>>>>>>>>
>>>>>>>> > pio train -- --driver-memory 120G --executor-memory 100G --conf
>>>>>>>> > spark.network.timeout=10000000
>>>>>>>>
>>>>>>>> throws an exception after 3-4 hours:
>>>>>>>>
>>>>>>>>
>>>>>>>>     Exception in thread "main" org.apache.spark.SparkException:
>>>>>>>>     Job aborted due to stage failure: Task 0 in stage 1.0 failed 1
>>>>>>>>     times, most recent failure: Lost task 0.0 in stage 1.0 (TID 15,
>>>>>>>>     localhost, executor driver): ExecutorLostFailure (executor
>>>>>>>>     driver exited caused by one of the running tasks) Reason:
>>>>>>>>     Executor heartbeat timed out after 181529 ms
>>>>>>>>     Driver stacktrace:
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>>>>>>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>>>>>>>>         at scala.Option.foreach(Option.scala:257)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>>>>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>>>>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>>>>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
>>>>>>>>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>>>>>         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
>>>>>>>>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
>>>>>>>>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
>>>>>>>>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
>>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1353)
>>>>>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>>>>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>>>>>>>>         at org.apache.spark.rdd.RDD.take(RDD.scala:1326)
>>>>>>>>         at org.example.classification.LogisticRegressionWithLBFGSAlgorithm.train(LogisticRegressionWithLBFGSAlgorithm.scala:28)
>>>>>>>>         at org.example.classification.LogisticRegressionWithLBFGSAlgorithm.train(LogisticRegressionWithLBFGSAlgorithm.scala:21)
>>>>>>>>         at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
>>>>>>>>         at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
>>>>>>>>         at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
>>>>>>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>>>>>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>>>>>>         at scala.collection.immutable.List.foreach(List.scala:381)
>>>>>>>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>>>>>>>         at scala.collection.immutable.List.map(List.scala:285)
>>>>>>>>         at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
>>>>>>>>         at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
>>>>>>>>         at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>>>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
>>>>>>>>         at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>>>>>>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>>>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>>
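>>>>>>>> For reference, the failing line is the first Spark action the
>>>>>>>> template's train() runs on the prepared RDD, so the whole upstream
>>>>>>>> pipeline executes (and dies) there. Roughly, as a sketch rather
>>>>>>>> than the exact template code:
>>>>>>>>
>>>>>>>>     import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
>>>>>>>>     import org.apache.spark.mllib.regression.LabeledPoint
>>>>>>>>     import org.apache.spark.rdd.RDD
>>>>>>>>
>>>>>>>>     def train(data: RDD[LabeledPoint]) = {
>>>>>>>>       // take(1) is the first job Spark actually runs on the RDD; the
>>>>>>>>       // heartbeat timeout above surfaces here even though the slow
>>>>>>>>       // work (reading and assembling the data) happens upstream
>>>>>>>>       val numFeatures = data.take(1).head.features.size
>>>>>>>>       new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)
>>>>>>>>     }
>>>>>>>>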
>>>>>>>> 2. I started a Spark standalone cluster with 1 master and 3 workers
>>>>>>>> and executed the command
>>>>>>>>
>>>>>>>> > pio train -- --master spark://*.*.*.*:7077 --driver-memory 50G
>>>>>>>> > --executor-memory 50G
>>>>>>>>
>>>>>>>> After some time I get the error "Executor failed to connect with
>>>>>>>> master" and training stops.
>>>>>>>>
>>>>>>>> I have changed the feature count from 6,500 to 500 and the result
>>>>>>>> is the same. Can anyone suggest what I am missing?
>>>>>>>>
>>>>>>>> In between, training emits continuous warnings like:
>>>>>>>>
>>>>>>>> > [WARN] [ScannerCallable] Ignore, probably already closed
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Abhimanyu
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
