Best range of parameters for grid search?

2016-08-24 Thread Adamantios Corais
I would like to run a naive implementation of grid search with MLlib but I am a bit confused about choosing the 'best' range of parameters. Apparently, I do not want to waste too many resources on a combination of parameters that will probably not give a better model. Any suggestions from your
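A coarse, log-spaced grid is a common way to cover several orders of magnitude cheaply before refining around the best region. The sketch below is a minimal pure-Python illustration under assumptions: the parameter names (`reg_params`, `step_sizes`) are illustrative and not taken from the MLlib API.

```python
# Hypothetical sketch: build a coarse, log-spaced parameter grid so a first
# grid-search pass spans orders of magnitude with few combinations.
import itertools

def log_spaced(lo_exp, hi_exp, base=10.0):
    """Return base**lo_exp .. base**hi_exp, one value per integer exponent."""
    return [base ** e for e in range(lo_exp, hi_exp + 1)]

reg_params = log_spaced(-4, 0)   # 1e-4 .. 1 (5 values)
step_sizes = log_spaced(-2, 1)   # 0.01 .. 10 (4 values)

# Cartesian product of all parameter values: 5 * 4 = 20 candidate settings.
grid = list(itertools.product(reg_params, step_sizes))
print(len(grid))  # 20
```

A second, finer pass can then use a narrow linear grid centered on the best coarse setting, which keeps the total number of fitted models small.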

Re: Grid Search using Spark MLLib Pipelines

2016-08-12 Thread Adamantios Corais
e of this cvModel.save("/my/path") On Fri, Aug 12, 2016 at 9:17 AM, Adamantios Corais <adamantios.cor...@gmail.com> wrote: Hi, Assuming that I have run the following pipeline and have got the best logistic regression model. How can I then save that mo

Grid Search using Spark MLLib Pipelines

2016-08-12 Thread Adamantios Corais
Hi, Assuming that I have run the following pipeline and have got the best logistic regression model. How can I then save that model for later use? The following command throws an error: cvModel.bestModel.save("/my/path") Also, is it possible to get the error (a collection of) for each combina

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Adamantios Corais
brzPi + brzTheta * testData.toBreeze -- apply exp(x). > > I have forgotten whether the probabilities are normalized already > though. If not you'll have to normalize to get them to sum to 1 and be > real class probabilities. This is better done in log space though. > > On Thu, Sep 1
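The normalization mentioned above can be done stably in log space with the log-sum-exp trick: subtract the maximum log score before exponentiating, so no term overflows or underflows. A minimal pure-Python sketch of that step (not the Spark internals; the example scores are made up):

```python
import math

def normalize_log_probs(log_scores):
    """Turn per-class log scores (e.g. log-prior plus log-likelihood terms)
    into probabilities summing to 1, working in log space for stability."""
    m = max(log_scores)                       # shift so the largest term is 0
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

# Naive Bayes log scores are often large negative numbers; exponentiating
# them directly would underflow to 0.0, but the shifted version is safe.
probs = normalize_log_probs([-120.0, -123.0, -119.0])
print(probs)  # largest probability for the class with the highest log score
```

The same shift-then-exponentiate pattern works for any vector of log scores, which is why the reply above recommends doing the normalization in log space.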

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Adamantios Corais
> > Breeze Vector. > > Pay attention: the index of this Vector needs to map to the corresponding > > label index. > > > > 2015-08-28 20:38 GMT+08:00 Adamantios Corais <adamantios.cor...@gmail.com>: > >> > >> Hi, > >> > >>

How to compute the probability of each class in Naive Bayes

2015-08-28 Thread Adamantios Corais
Hi, I am trying to change the following code so as to get the probabilities of the input Vector on each class (instead of the class itself with the highest probability). I know that this is already available as part of the most recent release of Spark but I have to use Spark 1.1.0. Any help is ap

How to determine a good set of parameters for a ML grid search task?

2015-08-28 Thread Adamantios Corais
I have a sparse dataset of size 775946 x 845372. I would like to perform a grid search in order to tune the parameters of my LogisticRegressionWithSGD model. I have noticed that the building of each model takes about 300 to 400 seconds. That means that in order to try all possible combinations of p
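For back-of-envelope planning, multiplying the per-model fit time by the number of grid combinations shows the total cost immediately. In the sketch below, only the 300-400 s per fit comes from the message above; the grid sizes and parameter names are hypothetical.

```python
# Rough cost estimate for an exhaustive grid search, assuming ~350 s per
# model fit (midpoint of the 300-400 s quoted above). Grid sizes are
# illustrative, not taken from the original question.
seconds_per_fit = 350
grid_sizes = {"regParam": 5, "stepSize": 4, "numIterations": 3}

n_combos = 1
for size in grid_sizes.values():
    n_combos *= size          # exhaustive search is the product of all sizes

total_hours = n_combos * seconds_per_fit / 3600
print(n_combos, round(total_hours, 1))  # 60 combinations, ~5.8 hours
```

This kind of estimate motivates either pruning the grid (coarse-to-fine search) or fitting the candidate models in parallel.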

Re: How to binarize data in spark

2015-08-07 Thread Adamantios Corais
>> Use StringIndexer in MLlib 1.4: >> >> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html >> >> On Thu, Aug 6, 2015 at 8:49 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote: >> >>> I have a set

How to binarize data in spark

2015-08-06 Thread Adamantios Corais
I have a set of data based on which I want to create a classification model. Each row has the following form: user1,class1,product1 > user1,class1,product2 > user1,class1,product5 > user2,class1,product2 > user2,class1,product5 > user3,class2,product1 > etc There are about 1M users, 2 classes, a
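One way to binarize triples like those above is to collapse them to one row per user: a binary product-indicator vector plus the user's class label. The pure-Python sketch below uses the sample rows from the question; the dense list representation is for illustration only, since with ~1M users a sparse representation would be needed.

```python
# Sketch: binarize (user, class, product) triples into one row per user.
# Columns are the sorted distinct products; a 1 means the user has that product.
rows = [
    ("user1", "class1", "product1"),
    ("user1", "class1", "product2"),
    ("user1", "class1", "product5"),
    ("user2", "class1", "product2"),
    ("user2", "class1", "product5"),
    ("user3", "class2", "product1"),
]

products = sorted({p for _, _, p in rows})        # fixed column order
index = {p: i for i, p in enumerate(products)}    # product -> column index

users = {}
for user, label, product in rows:
    # One (feature_vector, label) pair per user, created on first sight.
    features, _ = users.setdefault(user, ([0] * len(products), label))
    features[index[product]] = 1

print(users["user1"])  # ([1, 1, 1], 'class1')
```

Each user's vector can then feed a classifier directly, with the class column as the label.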

Cannot build "learning spark" project

2015-04-06 Thread Adamantios Corais
Hi, I am trying to build this project https://github.com/databricks/learning-spark with mvn package. This should work out of the box but unfortunately it doesn't. In fact, I get the following error: mvn pachage -X > Apache Maven 3.0.5 > Maven home: /usr/share/maven > Java version: 1.7.0_76, vendor

Re: How do I alter the combination of keys that exit the Spark shell?

2015-03-13 Thread Adamantios Corais
ation at all! Any ideas? *// Adamantios* On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin wrote: > You can type ":quit". > > On Fri, Mar 13, 2015 at 10:29 AM, Adamantios Corais > wrote: > > Hi, > > > > I want change the default combination of keys tha

How do I alter the combination of keys that exit the Spark shell?

2015-03-13 Thread Adamantios Corais
Hi, I want to change the default combination of keys that exits the Spark shell (i.e. CTRL + C) to something else, such as CTRL + H. Thank you in advance. *// Adamantios*

Spark (SQL) as OLAP engine

2015-02-03 Thread Adamantios Corais
Hi, After some research I have decided that Spark (SQL) would be ideal for building an OLAP engine. My goal is to push aggregated data (to Cassandra or other low-latency data storage) and then be able to project the results on a web page (web service). New data will be added (aggregated) once a da

Supported Notebooks (and other viz tools) for Spark 0.9.1?

2015-02-03 Thread Adamantios Corais
Hi, I am using Spark 0.9.1 and I am looking for a proper viz tool that supports that specific version. As far as I have seen, all relevant tools (e.g. spark-notebook, zeppelin-project etc.) only support 1.1 or 1.2; no mention of older versions of Spark. Any ideas or suggestions? *// Adamantio

Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Adamantios Corais
use a java event as the workflow element. I am interested in anyone's > experience with Luigi and/or any other tools. > > On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais <adamantios.cor...@gmail.com> wrote: >> I have some previous experience with Apache Oozie

which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Adamantios Corais
I have some previous experience with Apache Oozie while I was developing in Apache Pig. Now, I am working explicitly with Apache Spark and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use \ recommend?

Re: return probability \ confidence instead of actual class

2014-10-11 Thread Adamantios Corais
expose it (do I hear a PR?) but it's not hard to do externally. You'll > have to do this anyway if you're on anything earlier than 1.2. > > On Wed, Oct 8, 2014 at 10:17 AM, Adamantios Corais > wrote: > > ok let me rephrase my question once again. python-wise I am pr

Re: return probability \ confidence instead of actual class

2014-10-08 Thread Adamantios Corais
OK, let me rephrase my question once again. Python-wise I prefer .predict_proba(X) over .decision_function(X) since it is easier for me to interpret the results. As far as I can see, the latter functionality is already implemented in Spark (well, in version 0.9.2 for example I have to c
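If the classifier's threshold is cleared so predictions return the raw margin (the signed distance from the decision boundary) rather than a 0/1 label, a logistic squashing turns that margin into a predict_proba-style score in (0, 1). The sketch below is an uncalibrated stand-in for proper Platt scaling, not the MLlib API; the margin values are made up.

```python
import math

def margin_to_confidence(margin):
    """Map a raw SVM margin to a pseudo-probability in (0, 1).
    Monotone in the margin; 0.5 exactly on the decision boundary.
    Uncalibrated: for calibrated probabilities, fit Platt scaling instead."""
    return 1.0 / (1.0 + math.exp(-margin))

margins = [-2.5, -0.1, 0.0, 0.1, 3.0]          # hypothetical raw outputs
confidences = [margin_to_confidence(m) for m in margins]
print([round(c, 3) for c in confidences])       # larger margin -> higher score
```

Filtering on this score (e.g. keeping only predictions above 0.9 or below 0.1) separates high-confidence instances from borderline ones, which is the use case described in the original question.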

Re: return probability \ confidence instead of actual class

2014-10-07 Thread Adamantios Corais
and the rest of the classes >>> as negative, and then use the same method provided by Aris as a measure >>> of how far Class i is from the decision boundary. >>> >>> On Wed, Sep 24, 2014 at 4:06 PM, Aris wrote: >>> >>>> Greetings, Adamantios Korais..

Re: return probability \ confidence instead of actual class

2014-10-06 Thread Adamantios Corais
>> >>> model.setThreshold(your threshold here) >>> >>> to set the threshold that separates positive predictions from negative >>> predictions. >>> >>> For more info, please take a look at >>> http://spark.apache.org/docs/latest/api/scala/index.

Re: return probability \ confidence instead of actual class

2014-09-21 Thread Adamantios Corais
Nobody? If that's not supported already, can you please at least give me a few hints on how to implement it? Thanks! On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote: > Hi, > > I am working with the SVMWithSGD classification algorithm

return probability \ confidence instead of actual class

2014-09-19 Thread Adamantios Corais
Hi, I am working with the SVMWithSGD classification algorithm on Spark. It works fine for me, however, I would like to recognize the instances that are classified with a high confidence from those with a low one. How do we define the threshold here? Ultimately, I want to keep only those for which