Re: Error while training : NegativeArraySizeException

2017-06-07 Thread Pat Ferrel
Changing the first / primary / conversion event in eventNames changes what the
algorithm will predict. CCO can predict anything in the data by changing the
conversion event to the one you want.

However, that means you must have good data for the primary/conversion
event. Removing it will not do what you want.

A NegativeArraySizeException always means that you have no events for one of the
event names, in this case the primary “facet”. You may have misconfigured the
spelling of the “appName” and so are looking into the wrong dataset, or misspelled
an “eventName” in engine.json or the “event” field of the input; these must match
exactly.
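
For illustration only, here is a minimal sketch of the two places these settings
usually live in a UR engine.json (the appName and event names below are
placeholders, and the real file has more sections such as engineFactory and
sparkConf):

    {
      "datasource": {
        "params": {
          "appName": "MyApp",
          "eventNames": ["facet", "view", "search"]
        }
      },
      "algorithm": {
        "params": {
          "appName": "MyApp",
          "indexName": "urindex",
          "typeName": "items",
          "eventNames": ["facet", "view", "search"]
        }
      }
    }

The first entry of eventNames ("facet" here) is the primary/conversion event, and
each name must match the "event" field of the input events character for character.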
 

On Jun 7, 2017, at 2:43 AM, Bruno LEBON  wrote:

I have a model deployed for this app, and it works if I keep only (facet, search)
as event types. When I ask my deployed model for a prediction, I get an answer
consistent with my data (about cars). I checked that the data is sent with the
right accessKey to the right ES index, so that part is fine, I think.

I read the code of the integration-test and I am confident my setup works,
as I have already checked pio status, created a new app, inserted some
values into it, and trained and deployed it.

Yes, sorry, I forgot to specify the version: it is UR v0.5.0.


2017-06-07 11:29 GMT+02:00 Vaghawan Ojha:
Also, what version of UR are you on? Is it the latest one? I've only worked
with UR 0.5.0.

On Wed, Jun 7, 2017 at 3:12 PM, Vaghawan Ojha wrote:
Yes, you need to build the app again when you change something in
engine.json; that is, every time you change something in engine.json.

Make sure the data corresponds to the same app that you have specified in the
engine.json.

Yes, you can run the UR integration test with the ./examples/integration-test
command.

You can find more here: http://actionml.com/docs/ur_quickstart
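
As a quick reference (a sketch assuming a standard PIO install, run from the UR
engine's directory), the cycle after any engine.json change is:

    pio build
    pio train
    pio deploy

Any Spark options, such as the --master and memory flags discussed further down
in this thread, go after the -- separator of pio train.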

On Wed, Jun 7, 2017 at 3:07 PM, Bruno LEBON wrote:
Yes, the three event types that I defined in engine.json exist in my
dataset; facet is my primary, and I checked that it exists.

I thought it was not necessary to build again when changing something in the
engine.json, since the file is read during the process, but I built it and tried
again and I still get the same error.

What is this example-intrigration? I don't know about it. Where can I find
this script?

2017-06-07 11:11 GMT+02:00 Vaghawan Ojha:
Hi,

For me this problem happened when I had mistaken my primary event. The first
eventName in the "eventNames" array, e.g. "eventNames": ["facet","view","search"],
is the primary one. Make sure that event exists in your data.
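
For example, a primary "facet" event sent to the EventServer would look roughly
like this (host, accessKey, and entity ids are placeholders; the "event" value is
what must match the first entry of eventNames exactly):

    curl -i -X POST "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "event": "facet",
        "entityType": "user",
        "entityId": "u-1",
        "targetEntityType": "item",
        "targetEntityId": "i-42"
      }'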

Did you make sure you built the app again when you changed the eventNames in
engine.json?

Also, you could verify that everything is fine with UR with ./example-intrigration.

Thanks

On Wed, Jun 7, 2017 at 2:49 PM, Bruno LEBON wrote:
Thanks for your answer.

> You could explicitly do
>
> pio train -- --master spark://localhost:7077 --driver-memory 16G --executor-memory 24G
>
> and change the Spark master URL and the memory configuration, and see if that
> works.

Yes, that is the command I use to launch the training, except I am on a cluster,
so Spark is not local. Here is mine:

    pio train -- --master spark://master:7077 --driver-memory 4g --executor-memory 10g

The training works with other datasets, and it also works with this dataset when I
skip the event type view. So my guess is that there is something about this event
type: either in the data, although the data looks fine to me, or maybe there is a
problem when I use more than two types of event (this is the first time I have
more than two, though I can't believe the problem is related to the number of
event types).

The spelling is the same in the events sent to the event server (view) and in
engine.json (view).
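
One way to double-check this on the data side (accessKey and host are placeholders,
and this debug endpoint is meant for development use only) is to list a few recent
events from the EventServer and look at their "event" fields:

    curl "http://localhost:7070/events.json?accessKey=YOUR_ACCESS_KEY&limit=10"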

I am reading the code to figure out where this error comes from.



2017-06-07 10:17 GMT+02:00 Vaghawan Ojha:
You could explicitly do

    pio train -- --master spark://localhost:7077 --driver-memory 16G --executor-memory 24G

and change the Spark master URL and the memory configuration, and see if that
works.

Thanks

On Wed, Jun 7, 2017 at 1:55 PM, Bruno LEBON wrote:
Hi,

Using UR with PIO 0.10 I am trying to train my dataset. When I do, I get the
following error:

...
[INFO] [DataSource] Received events List(facet, view, search)
[INFO] [DataSource] Number of events List(5, 4, 6)
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.template.PreparedData does not support data sanity check. Skipping check.


Error while training : NegativeArraySizeException

2017-06-07 Thread Bruno LEBON
Hi,

Using UR with PIO 0.10 I am trying to train my dataset. When I do, I get the
following error:

...
[INFO] [DataSource] Received events List(facet, view, search)
[INFO] [DataSource] Number of events List(5, 4, 6)
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.template.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[WARN] [TaskSetManager] Lost task 0.0 in stage 56.0 (TID 50, ip-172-31-40-139.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

[ERROR] [TaskSetManager] Task 0 in stage 56.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 4 times, most recent failure: Lost task 0.3 in stage 56.0 (TID 56, ip-172-1-1-1.eu-west-1.compute.internal): java.lang.NegativeArraySizeException
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:73)
        at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:72)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
        at 