Re: Dynamically change parameter list

2018-02-15 Thread Tihomir Lolić
Hi Pat,

just wanted to follow up on this. I've modified CoreWorkflow to be able to
store algorithmsParams in the engineInstance.

val engineInstances = Storage.getMetaDataEngineInstances
engineInstances.update(engineInstance.copy(
  status = "COMPLETED",
  endTime = DateTime.now,
  algorithmsParams =
    if (models(0).isInstanceOf[CustomCrossValidatorModel])
      JsonExtractor.paramsToJson(workflowConfig.jsonExtractor, algorithmParamsList)
    else
      engineInstance.algorithmsParams
))

Because I am using CrossValidator, I had to extend it with one additional
parameter that I wanted to save while training the model.

I don't need this saved data during retraining, only during prediction.
If I did need it during retraining, I would modify TrainApp to fetch the
data before training starts, which would also cover the reinforcement
case.
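
As a rough illustration of the pattern described above (overwrite the stored
algorithm parameters only when the trained model actually carries learned
values), here is a minimal sketch. It assumes stand-in types: Instance,
TunedModel and PlainModel are hypothetical and are not PredictionIO classes,
and the JSON is treated as an opaque string.

```scala
object PersistLearnedParams {
  // Hypothetical stand-in for the stored engine instance record.
  final case class Instance(status: String, algorithmsParams: String)

  // Hypothetical model hierarchy: only TunedModel carries learned parameters.
  sealed trait Model
  final case class TunedModel(learnedParamsJson: String) extends Model
  case object PlainModel extends Model

  /** Marks the instance as completed and replaces its stored parameters
    * only if the first model carries learned ones. */
  def complete(instance: Instance, models: Seq[Model]): Instance = {
    val params = models.headOption match {
      case Some(TunedModel(json)) => json                      // learned during training
      case _                      => instance.algorithmsParams // keep what was configured
    }
    instance.copy(status = "COMPLETED", algorithmsParams = params)
  }
}
```

Pattern matching on the first model replaces the isInstanceOf check and also
handles an empty Seq of models without throwing.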

I hope this helps anyone who runs into a similar scenario.

Best,
Tihomir


On Tue, Feb 13, 2018 at 12:35 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> That would be fine since the model can contain anything. But the real
> question is where you want to use those params. If you need to use them the
> next time you train, you’ll have to persist them to a place read during
> training. That is usually only the metadata store (obviously input events
> too), which has the contents of engine.json. So to get them into the
> metadata store you may have to alter engine.json.
>
> Unless someone else knows how to alter the metadata directly after `pio
> train`
>
> One problem is that you will never know what the new params are without
> putting them in a file or logging them. We keep them in a separate place
> and merge them with engine.json explicitly so we can see what is happening.
> They are calculated parameters, not hand made tunings. It seems important
> to me to keep those separate unless you are talking about some type of
> expected reinforcement learning, not really params but an evolving model.

Re: Dynamically change parameter list

2018-02-12 Thread Pat Ferrel
That would be fine since the model can contain anything. But the real question 
is where you want to use those params. If you need to use them the next time 
you train, you’ll have to persist them to a place read during training. That is 
usually only the metadata store (obviously input events too), which has the 
contents of engine.json. So to get them into the metadata store you may have to 
alter engine.json. 

Unless someone else knows how to alter the metadata directly after `pio train`.

One problem is that you will never know what the new params are without putting 
them in a file or logging them. We keep them in a separate place and merge them 
with engine.json explicitly so we can see what is happening. They are 
calculated parameters, not hand made tunings. It seems important to me to keep 
those separate unless you are talking about some type of expected reinforcement 
learning, not really params but an evolving model.
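
To make the "keep them separate and merge explicitly" idea concrete, here is
a minimal sketch. It assumes the parameters can be treated as flat
string-to-string maps; the parameter names, the values and the merge helper
are hypothetical and only illustrate the approach, not how any particular
engine does it.

```scala
object MergeParams {
  // Hand-made tunings, as they might appear in engine.json (hypothetical values).
  val handTuned: Map[String, String] =
    Map("maxIter" -> "100", "regParam" -> "0.01")

  // Calculated parameters, kept in a separate place (hypothetical values).
  val calculated: Map[String, String] =
    Map("regParam" -> "0.05", "threshold" -> "0.4")

  /** Merges overrides on top of base and reports every value that was
    * added or changed, so the effect of the merge is visible. */
  def merge(
      base: Map[String, String],
      overrides: Map[String, String]
  ): (Map[String, String], Seq[String]) = {
    val changes = overrides.toSeq.sortBy(_._1).collect {
      case (key, value) if !base.get(key).contains(value) =>
        s"$key: ${base.getOrElse(key, "<unset>")} -> $value"
    }
    (base ++ overrides, changes)
  }
}
```

Logging the returned change list on every merge is one way to always know
which calculated values ended up in the effective configuration.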
 


Re: Dynamically change parameter list

2018-02-12 Thread Tihomir Lolić
Thank you very much for the answer. I'll try customizing the workflow.
There is a step where a Seq of models is returned. My idea is to return the
model and the model parameters in this step. I'll let you know if it works.

Thanks,
Tihomir


Re: Dynamically change parameter list

2018-02-12 Thread Pat Ferrel
This is an interesting question. As we build more mature, full-featured
engines, they will begin to employ hyperparameter search techniques or
reinforcement params. This means that there is a new stage in the workflow,
or a feedback loop, not already accounted for.

The short answer is no, unless you want to rewrite your engine.json after
every train and probably keep the old one for safety. You must retrain to
get the new params put into the metastore and therefore available to your
engine.

What we do for the Universal Recommender is add a special new workflow
phase, call it a self-tuning phase, where we search for the right tuning of
parameters. This is done with code that runs outside of pio and creates
parameters that go into the engine.json. This can be done periodically to
make sure the tuning is still optimal.
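
A self-tuning phase of this kind can be pictured as a small search that runs
outside of pio. The sketch below is only an illustration under stated
assumptions: the candidate values and the score function are made up, and a
real setup would score each candidate by training and evaluating on held-out
data before writing the winner into engine.json.

```scala
object SelfTuning {
  // Hypothetical candidate values for a single tuning parameter.
  val candidates: Seq[Double] = Seq(0.001, 0.01, 0.1, 1.0)

  // Stand-in for a real evaluation; it simply peaks at 0.01.
  def score(regParam: Double): Double =
    -math.abs(math.log10(regParam) + 2.0)

  /** Returns the candidate with the highest score. */
  def best(values: Seq[Double], evaluate: Double => Double): Double =
    values.maxBy(evaluate)
}
```

Running the search periodically and comparing the result with the current
engine.json value shows whether the tuning has drifted.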

I'm not sure whether feedback or hyperparameter search is the better
architecture for you.




Dynamically change parameter list

2018-02-12 Thread Tihomir Lolić
Hi,

I am trying to figure out how to dynamically update the algorithm parameter
list. After training finishes, only the model is updated. I need the
parameters to be updated as well because I am creating a data mapping based
on the training data. Is there a way to update this data after training is
done?

Here is the code that I am using. The variable that should be updated
after training is marked with a comment.

import io.prediction.controller.{EmptyParams, EngineParams}
import io.prediction.data.storage.EngineInstance
import io.prediction.workflow.CreateWorkflow.WorkflowConfig
import io.prediction.workflow._
import org.apache.spark.ml.linalg.SparseVector
import org.joda.time.DateTime
import org.json4s.JsonAST._

import scala.collection.mutable

object TrainApp extends App {

  val envs = Map("FOO" -> "BAR")

  val sparkEnv = Map("spark.master" -> "local")

  val sparkConf = Map("spark.executor.extraClassPath" -> ".")

  val engineFactoryName = "LogisticRegressionEngine"

  val workflowConfig = WorkflowConfig(
    engineId = EngineConfig.engineId,
    engineVersion = EngineConfig.engineVersion,
    engineVariant = EngineConfig.engineVariantId,
    engineFactory = engineFactoryName
  )

  val workflowParams = WorkflowParams(
    verbose = workflowConfig.verbosity,
    skipSanityCheck = workflowConfig.skipSanityCheck,
    stopAfterRead = workflowConfig.stopAfterRead,
    stopAfterPrepare = workflowConfig.stopAfterPrepare,
    sparkEnv = WorkflowParams().sparkEnv ++ sparkEnv
  )

  WorkflowUtils.modifyLogging(workflowConfig.verbose)

  val dataSourceParams = DataSourceParams(sys.env.get("APP_NAME").get)
  val preparatorParams = EmptyParams()

  // This is the value that should be updated after training.
  val algorithmParamsList = Seq("Logistic" -> LogisticParams(
    columns = Array[String](),
    dataMapping = Map[String, Map[String, SparseVector]]()))
  val servingParams = EmptyParams()

  val engineInstance = EngineInstance(
    id = "",
    status = "INIT",
    startTime = DateTime.now,
    endTime = DateTime.now,
    engineId = workflowConfig.engineId,
    engineVersion = workflowConfig.engineVersion,
    engineVariant = workflowConfig.engineVariant,
    engineFactory = workflowConfig.engineFactory,
    batch = workflowConfig.batch,
    env = envs,
    sparkConf = sparkConf,
    dataSourceParams = JsonExtractor.paramToJson(
      workflowConfig.jsonExtractor,
      workflowConfig.engineParamsKey -> dataSourceParams),
    preparatorParams = JsonExtractor.paramToJson(
      workflowConfig.jsonExtractor,
      workflowConfig.engineParamsKey -> preparatorParams),
    algorithmsParams = JsonExtractor.paramsToJson(
      workflowConfig.jsonExtractor,
      algorithmParamsList),
    servingParams = JsonExtractor.paramToJson(
      workflowConfig.jsonExtractor,
      workflowConfig.engineParamsKey -> servingParams)
  )

  val (engineLanguage, engineFactory) =
    WorkflowUtils.getEngine(engineInstance.engineFactory, getClass.getClassLoader)

  val engine = engineFactory()

  val engineParams = EngineParams(
    dataSourceParams = dataSourceParams,
    preparatorParams = preparatorParams,
    algorithmParamsList = algorithmParamsList,
    servingParams = servingParams
  )

  val engineInstanceId = CreateServer.engineInstances.insert(engineInstance)

  CoreWorkflow.runTrain(
    env = envs,
    params = workflowParams,
    engine = engine,
    engineParams = engineParams,
    engineInstance = engineInstance.copy(id = engineInstanceId)
  )

  CreateServer.actorSystem.shutdown()
}


Thank you,
Tihomir
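
For readers unfamiliar with the kind of data mapping mentioned in the
question, the following is a minimal sketch of how one might be derived from
training rows. It is an assumption-laden illustration: the thread's
dataMapping holds SparseVector values, while this sketch uses a plain
(vector size, active index) pair as a stand-in so it needs no Spark
dependency, and the build and encode helpers are hypothetical.

```scala
object DataMapping {
  /** Builds column -> (distinct value -> index), with indices assigned
    * in sorted order of the values seen in the training rows. */
  def build(
      rows: Seq[Map[String, String]],
      columns: Seq[String]
  ): Map[String, Map[String, Int]] =
    columns.map { col =>
      val values = rows.flatMap(_.get(col)).distinct.sorted
      col -> values.zipWithIndex.toMap
    }.toMap

  /** One-hot encodes a value as (vector size, index of the single 1.0),
    * or None if the column or value was not seen during training. */
  def encode(
      mapping: Map[String, Map[String, Int]],
      col: String,
      value: String
  ): Option[(Int, Int)] =
    for {
      byValue <- mapping.get(col)
      idx     <- byValue.get(value)
    } yield (byValue.size, idx)
}
```

Because the mapping depends on the values present at training time, it has
to be stored somewhere the prediction code can read it, which is the problem
this thread discusses.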