Thanks Pat!

On 20 October 2017 at 16:53, Pat Ferrel <[email protected]> wrote:
There are several algorithm resources.

A math-heavy one here: https://www.slideshare.net/pferrel/unified-recommender-39986309
A more results-oriented one here: https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

The benefit of the CCO algorithm in the UR comes into play when you have more than just conversions (buy for ecom, view for you). For just about all other recommenders you can really only use one indicator of user preference. Several experiments, including ones I've done, show you cannot mix buys with detail-views in ecom or your results will be worse, at least with single-event recommenders like the Spark MLlib recommenders. The UR uses multi-modal input, so you can indeed improve results when using buys with detail-views. The second post actually shows how dislikes can improve results when you want to predict likes.

In order to do this the CCO algorithm finds events that are correlated, but it uses a statistical method that is suspicious of 100% correlation, since this is likely anomalous in the real world (caused by promotions, give-aways, or other anomalous outside influences). This statistical method is called the log likelihood ratio (LLR).
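To make the statistic concrete, here is a minimal Python sketch of Dunning's log-likelihood ratio applied to cooccurrence counts from the tiny dataset that appears later in this thread. It is only an illustration of the idea, not the UR's actual code path (the UR uses Mahout's CCO implementation on Spark).

===========================================================
import math


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy, the same form used by Mahout's LogLikelihood helper
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Dunning's log-likelihood ratio for a 2x2 table:
    # k11 = users who did A and B, k12 = A only, k21 = B only, k22 = neither
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)


# The tiny dataset discussed below: user -> set of viewed items
views = {"u1": {"i1"}, "u2": {"i1", "i2"}, "u3": {"i2", "i3"}, "u4": {"i4"}}


def cooccurrence_llr(item_a, item_b):
    k11 = sum(1 for s in views.values() if item_a in s and item_b in s)
    k12 = sum(1 for s in views.values() if item_a in s and item_b not in s)
    k21 = sum(1 for s in views.values() if item_b in s and item_a not in s)
    k22 = len(views) - k11 - k12 - k21
    return llr(k11, k12, k21, k22)


# i1 and i2 cooccur for one user (u2) out of four, which is exactly what
# independence would predict (4 * 2/4 * 2/4 = 1), so the LLR is 0.0 and the
# pair is filtered out; that is consistent with the all-zero scores seen on
# data this small.
print(cooccurrence_llr("i1", "i2"))  # 0.0
===========================================================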
On Oct 20, 2017, at 12:17 AM, Noelia Osés Fernández <[email protected]> wrote:

Thanks for the explanation, Pat!

I think the best course of action is for me to read the documentation and understand how the algorithm works. Then, try again with a slightly larger dataset.

Thank you very much!

On 19 October 2017 at 17:15, Pat Ferrel <[email protected]> wrote:

This sample dataset is too small, with too few cooccurrences. U1 will never get i1 due to the blacklist (u1 has already viewed i1 so will not be recommended that again). The blacklist can be disabled if you want to recommend viewed items again, but beware that they may dominate every recommendation set if you do turn it off, since it is self-fulfilling. Why not i2? Not sure without running the math; the UR looks at things statistically, and with this small a dataset anomalies can be seen since the data is not statistically significant. I1 will show up in internal intermediate results (A'A for instance) but these are then filtered by a statistical test called LLR, which requires a certain amount of data to work.

Notice the handmade dataset has many more cooccurrences and produces understandable results. Also notice that in your dataset i3 and i4 can only be recommended by "popularity" since they have no cooccurrence.

On Oct 19, 2017, at 1:28 AM, Noelia Osés Fernández <[email protected]> wrote:

Pat, this worked!!!!! Thank you very much!!!!

The only odd thing now is that all the results I get are 0s. For example, using the dataset:

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"

and running:

echo "Recommendations for user: u1"
echo ""
curl -H "Content-Type: application/json" -d '
{
    "user": "u1"
}' http://localhost:8000/queries.json
echo ""

what I get is:

{"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"","score":0.0}]}

If user u1 has viewed i1, and user u2 has viewed i1 and i2, then I think the algorithm should return a non-zero score for i2 (and possibly i1, too).

Even using the bigger dataset with 100 items I still get all scores as 0s.

So now I'm going to spend some time reading the following documentation, unless there is some other documentation you recommend I read first!

- The Universal Recommender: http://actionml.com/docs/ur
- The Correlated Cross-Occurrence Algorithm: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html
- The Universal Recommender Slide Deck: http://www.slideshare.net/pferrel/unified-recommender-39986309
- Multi-domain predictive AI or how to make one thing predict another: https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

Thank you very much for all your patience and help getting me to this point!!!

Best regards,
Noelia

On 18 October 2017 at 18:33, Pat Ferrel <[email protected]> wrote:

It is the UR, so events are taken from the EventStore and converted into a Mahout DistributedRowMatrix of RandomAccessSparseVectors, which are both serializable. This path works fine and has for several years.

This must be a config problem, like not using the MahoutKryoRegistrator, which registers the serializers for these.

@Noelia, you have left out the sparkConf section of the engine.json. The one used in the integration test should work:

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "tiny_app_data.csv",
      "appName": "TinyApp",
      "eventNames": ["view"]
    }
  },
  "sparkConf": {   <================= THIS WAS LEFT OUT IN YOUR ENGINE.JSON BELOW IN THIS THREAD
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "TinyApp",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["view"]
      }
    }
  ]
}

On Oct 18, 2017, at 8:49 AM, Donald Szeto <[email protected]> wrote:

Chiming in a bit. Looking at the serialization error, it looks like we are just one little step away from getting this to work.

Noelia, what does your synthesized data look like? All data that is processed by Spark needs to be serializable. At some point, a non-serializable vector object showing up in the stack is created out of your synthesized data. It would be great to know what your input event looks like and to see where in the code path this is caused.

Regards,
Donald

On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández <[email protected]> wrote:

Pat, you mentioned the problem could be that the data I was using was too small. So now I'm using the attached data file as the data (4 users and 100 items). But I'm still getting the same error. I'm sorry I forgot to mention I had increased the dataset.
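(For context, a synthetic file in that shape, 4 users by 100 items in the same quoted "user","item" format, could be produced with something like the sketch below. The file actually attached to the original message is not reproduced in this thread, so the names, the output filename, and the 25-views-per-user sampling are placeholders.)

===========================================================
import random

# Hypothetical sketch only: writes a 4-user / 100-item view file in the same
# quoted CSV format as tiny_app_data.csv. The attached file from the original
# message is not shown in this thread, so the sampling choices here are
# placeholders.
random.seed(1)
users = ["u%d" % (n + 1) for n in range(4)]
items = ["i%d" % (n + 1) for n in range(100)]

with open("tiny_app_data_100.csv", "w") as out:
    for user in users:
        for item in sorted(random.sample(items, 25)):
            out.write('"%s","%s"\n' % (user, item))
===========================================================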
The reason why I want to make it work with a very small dataset is that I want to be able to follow the calculations. I want to understand what the UR is doing and understand the impact of changing this or that, here or there... I find that easier to achieve with a small example in which I know exactly what's happening. I want to build trust in my understanding of the UR before I move on to applying it to a real problem. If I'm not confident that I know how to use it, how can I tell my client that the results I'm getting are good with any degree of confidence?

On 16 October 2017 at 20:44, Pat Ferrel <[email protected]> wrote:

So all setup is the same for the integration-test and your modified test *except the data*?

The error looks like a setup problem, because the serialization should happen with either test. But if the only difference really is the data, then toss it and use either real data or the integration test data. Why are you trying to synthesize fake data if it causes the error?

BTW the data you include below in this thread would never create internal IDs as high as 94 in the vector. You must have switched to a new dataset???

I would get a dump of your data using `pio export` and make sure it's what you thought it was. You claim to have only 4 user ids and 4 item ids, but the serialized vector thinks you have at least 94 user or item ids. Something doesn't add up.

On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <[email protected]> wrote:

Pat, you are absolutely right! I increased the sleep time and now the integration test for handmade works perfectly.

However, the integration test adapted to run with my tiny app runs into the same problem I've been having with this app:

[ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying
[ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
...

Any ideas?

On 15 October 2017 at 19:09, Pat Ferrel <[email protected]> wrote:

This is probably a timing issue in the integration test, which has to wait for `pio deploy` to finish before the queries can be made. If it doesn't finish, the queries will fail. By the time the rest of the test quits, the model has been deployed, so you can run queries. In the integration-test script, increase the delay after `pio deploy…` and see if it passes then.

This is probably an integration-test script problem, not a problem in the system.
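(If a fixed sleep keeps being fragile, one alternative is to poll the deployed engine until it answers rather than waiting a fixed number of seconds. A rough Python sketch follows; the URL, the query body, and the timeouts are placeholder values, and this is not part of the shipped integration-test script.)

===========================================================
import time
import urllib2  # Python 2, matching the import script later in this thread


def wait_for_engine(url="http://localhost:8000/queries.json", timeout=120):
    # Poll the deployed engine with a trivial query until it responds,
    # instead of relying on a fixed sleep after `pio deploy`.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            req = urllib2.Request(url, data='{"user": "u1"}',
                                  headers={"Content-Type": "application/json"})
            urllib2.urlopen(req, timeout=5)
            return True
        except urllib2.HTTPError:
            return True  # the server answered, even if with an error status
        except Exception:
            time.sleep(2)  # not up yet, try again shortly
    return False
===========================================================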
On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <[email protected]> wrote:

Pat,

I have run the integration test for the handmade example out of curiosity. Strangely enough, things go more or less as expected apart from the fact that I get a message saying:

...
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
Model will remain deployed after this test
Waiting 30 seconds for the server to start
nohup: redirecting stderr to stdout
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8000: Connection refused

So the integration test does not manage to get the recommendations even though the model trained and deployed successfully. However, as soon as the integration test finishes, on the same terminal, I can get the recommendations by doing the following:

$ curl -H "Content-Type: application/json" -d '
> {
>     "user": "u1"
> }' http://localhost:8000/queries.json
{"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}

Isn't this odd? Can you guess what's going on?

Thank you very much for all your support!
noelia

On 5 October 2017 at 19:22, Pat Ferrel <[email protected]> wrote:

Ok, that config should work. Does the integration test pass?

The data you are using is extremely small, and though it does look like it has cooccurrences, they may not meet the minimum "big-data" thresholds used by default. Try adding more data, or use the handmade example data: rename purchase to view and discard the existing view data if you wish.

The error is very odd and I've never seen it. If the integration test works I can only surmise it's your data.

On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <[email protected]> wrote:

SPARK: spark-1.6.3-bin-hadoop2.6

PIO: 0.11.0-incubating

Scala: whatever gets installed when installing PIO 0.11.0-incubating; I haven't installed Scala separately.

UR: ActionML's UR v0.6.0, I suppose, as that's the last version mentioned in the readme file. I have attached the UR zip file I downloaded from the actionml github account.

Thank you for your help!!

On 4 October 2017 at 17:20, Pat Ferrel <[email protected]> wrote:

What version of Scala, Spark, PIO, and UR are you using?

On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <[email protected]> wrote:

Hi all,

I'm still trying to create a very simple app to learn to use PredictionIO and still having trouble. I have done pio build with no problem, but when I do pio train I get a very long error message related to serialisation (error message copied below).
pio status reports the system is all ready to go.

The app I'm trying to build is very simple; it only has 'view' events. Here's the engine.json:

===========================================================
{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "tiny_app_data.csv",
      "appName": "TinyApp",
      "eventNames": ["view"]
    }
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "TinyApp",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["view"]
      }
    }
  ]
}
===========================================================

The data I'm using is:

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"

meaning user u viewed item i.

The data has been added to the database with the following python code:

===========================================================
"""
Import sample data for recommendation engine
"""

import predictionio
import argparse
import random

RATE_ACTIONS_DELIMITER = ","
SEED = 1


def import_events(client, file):
    random.seed(SEED)
    count = 0
    print "Importing data..."

    items = []
    users = []
    f = open(file, 'r')
    for line in f:
        data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
        users.append(data[0])
        items.append(data[1])
        client.create_event(
            event="view",
            entity_type="user",
            entity_id=data[0],
            target_entity_type="item",
            target_entity_id=data[1]
        )
        print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1]
        count += 1
    f.close()

    users = set(users)
    items = set(items)
    print "All users: " + str(users)
    print "All items: " + str(items)
    for item in items:
        client.create_event(
            event="$set",
            entity_type="item",
            entity_id=item
        )
        count += 1
    print "%s events are imported." % count


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Import sample data for recommendation engine")
    parser.add_argument('--access_key', default='invald_access_key')
    parser.add_argument('--url', default="http://localhost:7070")
    parser.add_argument('--file', default="./data/tiny_app_data.csv")

    args = parser.parse_args()
    print args

    client = predictionio.EventClient(
        access_key=args.access_key,
        url=args.url,
        threads=5,
        qsize=500)
    import_events(client, args.file)
===========================================================
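As an aside, each client.create_event(...) call above posts an event shaped roughly like the sketch below (standard PredictionIO event API field names; the actual requests are not captured here). Note that splitting the CSV line on "," keeps the surrounding double quotes in the IDs, which would match the escaped quotes (\"i1\") visible in the query results earlier in this thread.

===========================================================
# Sketch of the events the script above sends to the EventServer, using the
# standard PredictionIO event API field names. Because line.split(",") keeps
# the CSV's surrounding double quotes, the stored IDs are literally '"u1"'
# and '"i1"'.
view_event = {
    "event": "view",
    "entityType": "user",
    "entityId": '"u1"',
    "targetEntityType": "item",
    "targetEntityId": '"i1"',
}

set_item_event = {
    "event": "$set",
    "entityType": "item",
    "entityId": '"i1"',
}
===========================================================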
My pio-env.sh is the following:

===========================================================
#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
#              your Elasticsearch setup.
# ES_CONF_DIR=/opt/elasticsearch
#ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
#                  with Hadoop 2.
# HADOOP_CONF_DIR=/opt/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
#                 with HBase on a remote cluster.
# HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio

# Elasticsearch Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
# PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
# Elasticsearch 1.x Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
===========================================================

Error message:

===========================================================
[ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
[ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
[ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (1,{1:1.0})); not retrying
[ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (0,{0:1.0})); not retrying
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
    at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
    at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at com.actionml.Preparator.prepare(Preparator.scala:49)
    at com.actionml.Preparator.prepare(Preparator.scala:32)
    at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
===========================================================

Thank you all for your help.
Best regards,
noelia
