Pat, this worked!!!!! Thank you very much!!!!
The only odd thing is that all the scores I get now are 0. For example:
Using the dataset:
"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"
echo "Recommendations for user: u1"
echo ""
curl -H "Content-Type: application/json" -d '
{
"user": "u1"
}' http://localhost:8000/queries.json
echo ""
What I get is:
{"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"","score":0.0}]}
If user u1 has viewed i1 and user u2 has viewed i1 and i2, then I think the
algorithm should return a non-zero score for i2 (and possibly i1, too).
Even using the bigger dataset with 100 items I still get all-zero scores.
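Out of curiosity I also tried to check by hand whether the tiny dataset even
contains a significant cooccurrence. Here is a rough Python sketch of Dunning's
log-likelihood ratio test, which the Mahout CCO page listed below describes as
the test used to keep or drop cooccurrences (the UR's exact thresholds may
differ, so this is only my guess at an explanation):

import math

def x_log_x(x):
    # x * ln(x), with 0 * ln(0) taken as 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # k11: users who viewed both items; k12/k21: only one of the two; k22: neither
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * max(0.0, row + col - mat)

# i1 was viewed by {u1, u2}, i2 by {u2, u3}, and there are 4 users in total
print llr(1, 1, 1, 1)   # -> 0.0

With every cell equal to 1 the LLR is exactly 0 (seeing i1 tells you nothing
about i2 when there are only 4 users), so if the UR filters cooccurrences with
something like this, all-zero scores might simply mean my dataset is too small
to show any significant cross-occurrence yet. But I may well be reading it wrong.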
So now I'm going to spend some time reading the following documentation,
unless there is some other documentation you recommend I read first!
- [The Universal Recommender](http://actionml.com/docs/ur)
- [The Correlated Cross-Occurrence Algorithm](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html)
- [The Universal Recommender Slide Deck](http://www.slideshare.net/pferrel/unified-recommender-39986309)
- [Multi-domain predictive AI or how to make one thing predict another](https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/)
Thank you very much for all your patience and help getting me to this
point!!!
Best regards,
Noelia
On 18 October 2017 at 18:33, Pat Ferrel <[email protected]> wrote:
> It is the UR so Events are taken from the EventStore and converted into a
> Mahout DistributedRowMatrix of RandomAccessSparseVectors, which are both
> serializable. This path works fine and has for several years.
>
> This must be a config problem, like not using the MahoutKryoRegistrator,
> which registers the serializers for these.
>
> @Noelia, you have left out the sparkConf section of the engine.json. The
> one used in the integration test should work:
>
> {
>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "com.actionml.RecommendationEngine",
>   "datasource": {
>     "params" : {
>       "name": "tiny_app_data.csv",
>       "appName": "TinyApp",
>       "eventNames": ["view"]
>     }
>   },
>   "sparkConf": {  <================= THIS WAS LEFT OUT IN YOUR ENGINE.JSON BELOW IN THIS THREAD
>     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
>     "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
>     "spark.kryo.referenceTracking": "false",
>     "spark.kryoserializer.buffer": "300m",
>     "es.index.auto.create": "true"
>   },
>   "algorithms": [
>     {
>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>       "name": "ur",
>       "params": {
>         "appName": "TinyApp",
>         "indexName": "urindex",
>         "typeName": "items",
>         "comment": "must have data for the first event or the model will not build, other events are optional",
>         "eventNames": ["view"]
>       }
>     }
>   ]
> }
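>
> A quick way to catch this kind of omission before `pio build` is a small check,
> something like this sketch (the file name and key list are just illustrative):
>
> import json
>
> # load the engine config and warn if the Kryo-related sparkConf keys are missing
> with open("engine.json") as f:
>     conf = json.load(f)
>
> spark_conf = conf.get("sparkConf", {})
> for key in ("spark.serializer", "spark.kryo.registrator"):
>     if key not in spark_conf:
>         print "engine.json is missing sparkConf setting: " + key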
>
>
> On Oct 18, 2017, at 8:49 AM, Donald Szeto <[email protected]> wrote:
>
> Chiming in a bit. Looking at the serialization error, it looks like we are
> just one little step away from getting this to work.
>
> Noelia, what does your synthesized data look like? All data that is
> processed by Spark needs to be serializable. At some point, the
> non-serializable vector object showing in the stack is created out of your
> synthesized data. It would be great to know what your input events look
> like and to see where in the code path this happens.
>
> Regards,
> Donald
>
> On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández <
> [email protected]> wrote:
>
>> Pat, you mentioned the problem could be that the data I was using was too
>> small. So now I'm using the attached data file (4 users and 100 items), but
>> I'm still getting the same error. I'm sorry, I forgot to mention that I had
>> already increased the dataset.
>>
>> The reason I want to make it work with a very small dataset is that I want
>> to be able to follow the calculations. I want to understand what the UR is
>> doing and the impact of changing this or that, here or there... I find that
>> easier to achieve with a small example in which I know exactly what's
>> happening. I want to build trust in my understanding of the UR before I
>> move on to applying it to a real problem. If I'm not confident that I know
>> how to use it, how can I tell my client that the results I'm getting are
>> good with any degree of confidence?
>>
>>
>>
>>
>>
>> On 16 October 2017 at 20:44, Pat Ferrel <[email protected]> wrote:
>>
>>> So all setup is the same for the integration-test and your modified test
>>> *except the data*?
>>>
>>> The error looks like a setup problem because the serialization should
>>> happen with either test. But if the only difference really is the data,
>>> then toss it and use either real data or the integration test data. Why
>>> synthesize fake data if it causes the error?
>>>
>>> BTW the data you include below in this thread would never create
>>> internal IDs as high as 94 in the vector. You must have switched to a new
>>> dataset???
>>>
>>> I would get a dump of your data using `pio export` and make sure it’s what
>>> you thought it was. You claim to have only 4 user ids and 4 item ids, but
>>> the serialized vector thinks you have at least 94 user or item ids.
>>> Something doesn’t add up.
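>>>
>>> For example, a quick count over the export would show how many distinct ids
>>> are really in the event store (a rough sketch; it assumes the usual
>>> one-JSON-event-per-line export format, and the file name is illustrative):
>>>
>>> import json
>>>
>>> users, items = set(), set()
>>> with open("exported_events.json") as f:
>>>     for line in f:
>>>         event = json.loads(line)
>>>         if event.get("entityType") == "user":
>>>             users.add(event["entityId"])
>>>         if event.get("targetEntityType") == "item":
>>>             items.add(event["targetEntityId"])
>>>
>>> print "distinct users: %d, distinct items: %d" % (len(users), len(items))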
>>>
>>>
>>> On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <[email protected]>
>>> wrote:
>>>
>>> Pat, you are absolutely right! I increased the sleep time and now the
>>> integration test for handmade works perfectly.
>>>
>>> However, the integration test adapted to run with my tiny app runs into
>>> the same problem I've been having with this app:
>>>
>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not
>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>> Serialization stack:
>>> - object not serializable (class:
>>> org.apache.mahout.math.RandomAccessSparseVector,
>>> value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:
>>> 1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,
>>> 72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
>>> - field (class: scala.Tuple2, name: _2, type: class
>>> java.lang.Object)
>>> - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.
>>> 0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,
>>> 20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:
>>> 1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying
>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not
>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>> Serialization stack:
>>>
>>> ...
>>>
>>> Any ideas?
>>>
>>> On 15 October 2017 at 19:09, Pat Ferrel <[email protected]> wrote:
>>>
>>>> This is probably a timing issue in the integration test, which has to
>>>> wait for `pio deploy` to finish before the queries can be made. If it
>>>> doesn’t finish the queries will fail. By the time the rest of the test
>>>> quits the model has been deployed so you can run queries. In the
>>>> integration-test script increase the delay after `pio deploy…` and see if
>>>> it passes then.
>>>>
>>>> This is probably an integration-test script problem, not a problem in the
>>>> system.
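>>>>
>>>> Instead of a fixed sleep you could also poll until the deployed server
>>>> answers, something like this sketch (not part of the shipped script; the
>>>> query body is just the one used in the test):
>>>>
>>>> import json
>>>> import time
>>>> import urllib2
>>>>
>>>> # keep retrying the queries endpoint until `pio deploy` has finished starting
>>>> for attempt in range(60):
>>>>     try:
>>>>         req = urllib2.Request("http://localhost:8000/queries.json",
>>>>                               json.dumps({"user": "u1"}),
>>>>                               {"Content-Type": "application/json"})
>>>>         print urllib2.urlopen(req).read()
>>>>         break
>>>>     except Exception:
>>>>         time.sleep(1)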
>>>>
>>>>
>>>>
>>>> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <[email protected]>
>>>> wrote:
>>>>
>>>> Pat,
>>>>
>>>> I have run the integration test for the handmade example out of
>>>> curiosity. Strangely enough, things go more or less as expected, apart from
>>>> the fact that I get this message:
>>>>
>>>> ...
>>>> [INFO] [CoreWorkflow$] Updating engine instance
>>>> [INFO] [CoreWorkflow$] Training completed successfully.
>>>> Model will remain deployed after this test
>>>> Waiting 30 seconds for the server to start
>>>> nohup: redirecting stderr to stdout
>>>> curl: (7) Failed to connect to localhost port 8000: Connection refused
>>>> So the integration test does not manage to get the recommendations even
>>>> though the model trained and deployed successfully. However, as soon as the
>>>> integration test finishes, on the same terminal, I can get the
>>>> recommendations by doing the following:
>>>>
>>>> $ curl -H "Content-Type: application/json" -d '
>>>> > {
>>>> > "user": "u1"
>>>> > }' http://localhost:8000/queries.json
>>>> {"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}
>>>>
>>>> Isn't this odd? Can you guess what's going on?
>>>>
>>>> Thank you very much for all your support!
>>>> noelia
>>>>
>>>>
>>>>
>>>> On 5 October 2017 at 19:22, Pat Ferrel <[email protected]> wrote:
>>>>
>>>>> Ok, that config should work. Does the integration test pass?
>>>>>
>>>>> The data you are using is extremely small and though it does look like
>>>>> it has cooccurrences, they may not meet minimum “big-data” thresholds used
>>>>> by default. Try adding more data or use the handmade example data, rename
>>>>> purchase to view and discard the existing view data if you wish.
>>>>>
>>>>> The error is very odd and I’ve never seen it. If the integration test
>>>>> works I can only surmise it's your data.
>>>>>
>>>>>
>>>>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <
>>>>> [email protected]> wrote:
>>>>>
>>>>> SPARK: spark-1.6.3-bin-hadoop2.6
>>>>>
>>>>> PIO: 0.11.0-incubating
>>>>>
>>>>> Scala: whatever gets installed when installing PIO 0.11.0-incubating;
>>>>> I haven't installed Scala separately.
>>>>>
>>>>> UR: ActionML's UR v0.6.0, I suppose, as that's the last version
>>>>> mentioned in the readme file. I have attached the UR zip file I downloaded
>>>>> from the actionml GitHub account.
>>>>>
>>>>> Thank you for your help!!
>>>>>
>>>>> On 4 October 2017 at 17:20, Pat Ferrel <[email protected]> wrote:
>>>>>
>>>>>> What versions of Scala, Spark, PIO, and UR are you using?
>>>>>>
>>>>>>
>>>>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm still trying to create a very simple app to learn to use
>>>>>> PredictionIO, and I'm still having trouble. pio build runs with no
>>>>>> problem, but when I do pio train I get a very long error message related
>>>>>> to serialisation (copied below).
>>>>>>
>>>>>> pio status reports the system is all ready to go.
>>>>>>
>>>>>> The app I'm trying to build is very simple; it only has 'view'
>>>>>> events. Here's the engine.json:
>>>>>>
>>>>>> *===========================================================*
>>>>>> {
>>>>>>   "comment":" This config file uses default settings for all but the required values see README.md for docs",
>>>>>>   "id": "default",
>>>>>>   "description": "Default settings",
>>>>>>   "engineFactory": "com.actionml.RecommendationEngine",
>>>>>>   "datasource": {
>>>>>>     "params" : {
>>>>>>       "name": "tiny_app_data.csv",
>>>>>>       "appName": "TinyApp",
>>>>>>       "eventNames": ["view"]
>>>>>>     }
>>>>>>   },
>>>>>>   "algorithms": [
>>>>>>     {
>>>>>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
>>>>>>       "name": "ur",
>>>>>>       "params": {
>>>>>>         "appName": "TinyApp",
>>>>>>         "indexName": "urindex",
>>>>>>         "typeName": "items",
>>>>>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>>>>>         "eventNames": ["view"]
>>>>>>       }
>>>>>>     }
>>>>>>   ]
>>>>>> }
>>>>>> *===========================================================*
>>>>>>
>>>>>> The data I'm using is:
>>>>>>
>>>>>> "u1","i1"
>>>>>> "u2","i1"
>>>>>> "u2","i2"
>>>>>> "u3","i2"
>>>>>> "u3","i3"
>>>>>> "u4","i4"
>>>>>>
>>>>>> meaning user u viewed item i.
>>>>>>
>>>>>> The data has been added to the database with the following Python
>>>>>> code:
>>>>>>
>>>>>> *===========================================================*
>>>>>> """
>>>>>> Import sample data for recommendation engine
>>>>>> """
>>>>>>
>>>>>> import predictionio
>>>>>> import argparse
>>>>>> import random
>>>>>>
>>>>>> RATE_ACTIONS_DELIMITER = ","
>>>>>> SEED = 1
>>>>>>
>>>>>>
>>>>>> def import_events(client, file):
>>>>>>     random.seed(SEED)
>>>>>>     count = 0
>>>>>>     print "Importing data..."
>>>>>>
>>>>>>     items = []
>>>>>>     users = []
>>>>>>     f = open(file, 'r')
>>>>>>     # one "view" event per line of "user","item"
>>>>>>     for line in f:
>>>>>>         data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
>>>>>>         users.append(data[0])
>>>>>>         items.append(data[1])
>>>>>>         client.create_event(
>>>>>>             event="view",
>>>>>>             entity_type="user",
>>>>>>             entity_id=data[0],
>>>>>>             target_entity_type="item",
>>>>>>             target_entity_id=data[1]
>>>>>>         )
>>>>>>         print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1]
>>>>>>         count += 1
>>>>>>     f.close()
>>>>>>
>>>>>>     # one $set event per distinct item
>>>>>>     users = set(users)
>>>>>>     items = set(items)
>>>>>>     print "All users: " + str(users)
>>>>>>     print "All items: " + str(items)
>>>>>>     for item in items:
>>>>>>         client.create_event(
>>>>>>             event="$set",
>>>>>>             entity_type="item",
>>>>>>             entity_id=item
>>>>>>         )
>>>>>>         count += 1
>>>>>>
>>>>>>     print "%s events are imported." % count
>>>>>>
>>>>>>
>>>>>> if __name__ == '__main__':
>>>>>>     parser = argparse.ArgumentParser(
>>>>>>         description="Import sample data for recommendation engine")
>>>>>>     parser.add_argument('--access_key', default='invald_access_key')
>>>>>>     parser.add_argument('--url', default="http://localhost:7070")
>>>>>>     parser.add_argument('--file', default="./data/tiny_app_data.csv")
>>>>>>
>>>>>>     args = parser.parse_args()
>>>>>>     print args
>>>>>>
>>>>>>     client = predictionio.EventClient(
>>>>>>         access_key=args.access_key,
>>>>>>         url=args.url,
>>>>>>         threads=5,
>>>>>>         qsize=500)
>>>>>>     import_events(client, args.file)
>>>>>> *===========================================================*
>>>>>>
>>>>>> My pio-env.sh is the following:
>>>>>>
>>>>>> *===========================================================*
>>>>>> #!/usr/bin/env bash
>>>>>> #
>>>>>> # Copy this file as pio-env.sh and edit it for your site's
>>>>>> configuration.
>>>>>> #
>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
>>>>>> # contributor license agreements. See the NOTICE file distributed
>>>>>> with
>>>>>> # this work for additional information regarding copyright ownership.
>>>>>> # The ASF licenses this file to You under the Apache License, Version
>>>>>> 2.0
>>>>>> # (the "License"); you may not use this file except in compliance with
>>>>>> # the License. You may obtain a copy of the License at
>>>>>> #
>>>>>> # http://www.apache.org/licenses/LICENSE-2.0
>>>>>> #
>>>>>> # Unless required by applicable law or agreed to in writing, software
>>>>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>>>>>> implied.
>>>>>> # See the License for the specific language governing permissions and
>>>>>> # limitations under the License.
>>>>>> #
>>>>>>
>>>>>> # PredictionIO Main Configuration
>>>>>> #
>>>>>> # This section controls core behavior of PredictionIO. It is very
>>>>>> likely that
>>>>>> # you need to change these to fit your site.
>>>>>>
>>>>>> # SPARK_HOME: Apache Spark is a hard dependency and must be
>>>>>> configured.
>>>>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
>>>>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6
>>>>>>
>>>>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
>>>>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
>>>>>>
>>>>>> # ES_CONF_DIR: You must configure this if you have advanced
>>>>>> configuration for
>>>>>> # your Elasticsearch setup.
>>>>>> # ES_CONF_DIR=/opt/elasticsearch
>>>>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>
>>>>>> # HADOOP_CONF_DIR: You must configure this if you intend to run
>>>>>> PredictionIO
>>>>>> # with Hadoop 2.
>>>>>> # HADOOP_CONF_DIR=/opt/hadoop
>>>>>>
>>>>>> # HBASE_CONF_DIR: You must configure this if you intend to run
>>>>>> PredictionIO
>>>>>> # with HBase on a remote cluster.
>>>>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
>>>>>>
>>>>>> # Filesystem paths where PredictionIO uses as block storage.
>>>>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>>>>>
>>>>>> # PredictionIO Storage Configuration
>>>>>> #
>>>>>> # This section controls programs that make use of PredictionIO's
>>>>>> built-in
>>>>>> # storage facilities. Default values are shown below.
>>>>>> #
>>>>>> # For more information on storage configuration please refer to
>>>>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/
>>>>>>
>>>>>> # Storage Repositories
>>>>>>
>>>>>> # Default is to use PostgreSQL
>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>>>>>
>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>>>>>
>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>>>>>
>>>>>> # Storage Data Sources
>>>>>>
>>>>>> # PostgreSQL Default Settings
>>>>>> # Please change "pio" to your database name in
>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL
>>>>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>>>>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>>>>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>>>>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>>>>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>>>>>
>>>>>> # MySQL Example
>>>>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>>>>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>>>>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
>>>>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
>>>>>>
>>>>>> # Elasticsearch Example
>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
>>>>>> # Elasticsearch 1.x Example
>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>>>
>>>>>> # Local File System Example
>>>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>>>>>
>>>>>> # HBase Example
>>>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
>>>>>>
>>>>>>
>>>>>> *===========================================================*
>>>>>> *Error message:*
>>>>>> *===========================================================*
>>>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not
>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>> Serialization stack:
>>>>>> - object not serializable (class:
>>>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>>>> value: {3:1.0,2:1.0})
>>>>>> - field (class: scala.Tuple2, name: _2, type: class
>>>>>> java.lang.Object)
>>>>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
>>>>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not
>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>> Serialization stack:
>>>>>> - object not serializable (class:
>>>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>>>> value: {0:1.0,3:1.0})
>>>>>> - field (class: scala.Tuple2, name: _2, type: class
>>>>>> java.lang.Object)
>>>>>> - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
>>>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not
>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>> Serialization stack:
>>>>>> - object not serializable (class:
>>>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>>>> value: {1:1.0})
>>>>>> - field (class: scala.Tuple2, name: _2, type: class
>>>>>> java.lang.Object)
>>>>>> - object (class scala.Tuple2, (1,{1:1.0})); not retrying
>>>>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not
>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>> Serialization stack:
>>>>>> - object not serializable (class:
>>>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>>>> value: {0:1.0})
>>>>>> - field (class: scala.Tuple2, name: _2, type: class
>>>>>> java.lang.Object)
>>>>>> - object (class scala.Tuple2, (0,{0:1.0})); not retrying
>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>>>>>> aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not
>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>>>> Serialization stack:
>>>>>> - object not serializable (class:
>>>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>>>> value: {3:1.0,2:1.0})
>>>>>> - field (class: scala.Tuple2, name: _2, type: class
>>>>>> java.lang.Object)
>>>>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
>>>>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
>>>>>> abortStage$1.apply(DAGScheduler.scala:1419)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
>>>>>> abortStage$1.apply(DAGScheduler.scala:1418)
>>>>>> at scala.collection.mutable.ResizableArray$class.foreach(
>>>>>> ResizableArray.scala:59)
>>>>>> at scala.collection.mutable.ArrayBuffer.foreach(
>>>>>> ArrayBuffer.scala:47)
>>>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(
>>>>>> DAGScheduler.scala:1418)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
>>>>>> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
>>>>>> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>>>>> at scala.Option.foreach(Option.scala:236)
>>>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
>>>>>> DAGScheduler.scala:799)
>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
>>>>>> doOnReceive(DAGScheduler.scala:1640)
>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
>>>>>> onReceive(DAGScheduler.scala:1599)
>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
>>>>>> onReceive(DAGScheduler.scala:1588)
>>>>>> at org.apache.spark.util.EventLoop$$anon$1.run(
>>>>>> EventLoop.scala:48)
>>>>>> at org.apache.spark.scheduler.DAGScheduler.runJob(
>>>>>> DAGScheduler.scala:620)
>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>>> at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>>>>>> RDDOperationScope.scala:150)
>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(
>>>>>> RDDOperationScope.scala:111)
>>>>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>>> at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
>>>>>>         at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$
>>>>>> lzycompute(CheckpointedDrmSpark.scala:55)
>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(
>>>>>> CheckpointedDrmSpark.scala:55)
>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.
>>>>>> newRowCardinality(CheckpointedDrmSpark.scala:219)
>>>>>> at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
>>>>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
>>>>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
>>>>>> at scala.collection.TraversableLike$$anonfun$map$
>>>>>> 1.apply(TraversableLike.scala:244)
>>>>>> at scala.collection.TraversableLike$$anonfun$map$
>>>>>> 1.apply(TraversableLike.scala:244)
>>>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>> at scala.collection.TraversableLike$class.map(
>>>>>> TraversableLike.scala:244)
>>>>>> at scala.collection.AbstractTraversable.map(
>>>>>> Traversable.scala:105)
>>>>>> at com.actionml.Preparator.prepare(Preparator.scala:49)
>>>>>> at com.actionml.Preparator.prepare(Preparator.scala:32)
>>>>>> at org.apache.predictionio.controller.PPreparator.
>>>>>> prepareBase(PPreparator.scala:37)
>>>>>> at org.apache.predictionio.controller.Engine$.train(
>>>>>> Engine.scala:671)
>>>>>> at org.apache.predictionio.controller.Engine.train(
>>>>>> Engine.scala:177)
>>>>>> at org.apache.predictionio.workflow.CoreWorkflow$.
>>>>>> runTrain(CoreWorkflow.scala:67)
>>>>>> at org.apache.predictionio.workflow.CreateWorkflow$.main(
>>>>>> CreateWorkflow.scala:250)
>>>>>> at org.apache.predictionio.workflow.CreateWorkflow.main(
>>>>>> CreateWorkflow.scala)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(
>>>>>> NativeMethodAccessorImpl.java:62)
>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>>>>> DelegatingMethodAccessorImpl.java:43)
>>>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
>>>>>> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>>>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
>>>>>> SparkSubmit.scala:181)
>>>>>> at org.apache.spark.deploy.SparkSubmit$.submit(
>>>>>> SparkSubmit.scala:206)
>>>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
>>>>>> scala:121)
>>>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>
>>>>>> *===========================================================*
>>>>>> Thank you all for your help.
>>>>>>
>>>>>> Best regards,
>>>>>> noelia
>>>>>>
>>>>>>
>>>>>