I would like to run a naive implementation of grid search with MLlib but
I am a bit confused about choosing the 'best' range of parameters.
Obviously, I do not want to waste too many resources on a combination
of parameters that will probably not give a better model. Any suggestions
from your experience?
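For reference, a minimal sketch of an explicit grid with spark.ml; the parameter values and the training DataFrame here are assumptions for illustration, not anything from the thread:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
// start with a small, coarse grid; refine around the best region in a second pass
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.001, 0.01, 0.1))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
val cvModel = cv.fit(training) // training: an assumed DataFrame of label/features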
...of this:
cvModel.save("/my/path")
On Fri, Aug 12, 2016 at 9:17 AM, Adamantios Corais
<adamantios.cor...@gmail.com> wrote:
Hi,
Assuming that I have run the following pipeline and have got the
best logistic regression model, how can I then save that model for later use?
Hi,
Assuming that I have run the following pipeline and have got the best logistic
regression model, how can I then save that model for later use? The following
command throws an error:
cvModel.bestModel.save("/my/path")
Also, is it possible to get the errors (as a collection) for each combination
of parameters?
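For what it's worth, a sketch of both pieces under Spark 2.0, assuming cvModel is a fitted CrossValidatorModel built from a Pipeline estimator (the cast to PipelineModel is the usual stumbling block):

import org.apache.spark.ml.PipelineModel

// save either the whole cross-validated model or just the best one
cvModel.write.overwrite().save("/my/path/cv")
cvModel.bestModel.asInstanceOf[PipelineModel].write.overwrite().save("/my/path/best")

// average cross-validation metric for each parameter combination
cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics).foreach {
  case (params, metric) => println(s"$params -> $metric")
}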
...brzPi + brzTheta * testData.toBreeze -- apply exp(x).
>
> I have forgotten whether the probabilities are already normalized,
> though. If not, you'll have to normalize them to sum to 1 to get real
> class probabilities; this is better done in log space.
>
> On Thu, Sep 1
> > Breeze Vector.
> > Pay attention: the index of this Vector needs to map to the corresponding
> > label index.
> >
> > 2015-08-28 20:38 GMT+08:00 Adamantios Corais <
> adamantios.cor...@gmail.com>:
> >> Hi,
Hi,
I am trying to change the following code so as to get the probability of
each class for a given input Vector (instead of just the class with the
highest probability). I know that this is already available as part of the
most recent release of Spark, but I have to use Spark 1.1.0.
Any help is appreciated.
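Putting the hints above together, a minimal sketch, assuming the 1.1.0 NaiveBayesModel's pi and theta arrays have already been wrapped into Breeze structures brzPi and brzTheta, and brzTestData is the input as a Breeze vector (the log-sum-exp step is the normalization mentioned above):

import breeze.linalg.{DenseVector => BDV, max, sum}
import breeze.numerics.exp

// posterior log-scores: log P(c) + log P(x|c) for every class c
val logProbs: BDV[Double] = brzPi + brzTheta * brzTestData

// normalize in log space (log-sum-exp) so the probabilities sum to 1
val maxLog = max(logProbs)
val logNorm = maxLog + math.log(sum(exp(logProbs - maxLog)))
val classProbs = exp(logProbs - logNorm) // real class probabilities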
I have a sparse dataset of size 775946 x 845372. I would like to perform a
grid search in order to tune the parameters of my LogisticRegressionWithSGD
model. I have noticed that building each model takes about 300 to
400 seconds. That means that in order to try all possible combinations of
parameters...
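As a back-of-the-envelope check: even a modest 3 x 3 x 3 grid is 27 models, and at ~350 seconds each that is already ~2.6 hours run sequentially. A minimal sketch of such a naive loop over the RDD-based API (the grid values are hypothetical; caching the training set is the one cheap win):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def gridSearch(training: RDD[LabeledPoint]) = {
  training.cache() // reused by every model, so keep it in memory
  for {
    step <- Seq(0.1, 1.0, 10.0)
    reg  <- Seq(0.0, 0.01, 0.1)
    iter <- Seq(50, 100, 200)
  } yield {
    val algo = new LogisticRegressionWithSGD()
    algo.optimizer.setStepSize(step).setRegParam(reg).setNumIterations(iter)
    ((step, reg, iter), algo.run(training))
  }
}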
>> Use StringIndexer in MLlib 1.4:
>>
>> https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html
>>
>> On Thu, Aug 6, 2015 at 8:49 PM, Adamantios Corais <
>> adamantios.cor...@gmail.com> wrote:
>>
>>> I have a set
I have a set of data based on which I want to create a classification
model. Each row has the following form:
user1,class1,product1
user1,class1,product2
user1,class1,product5
user2,class1,product2
user2,class1,product5
user3,class2,product1
etc.
There are about 1M users, 2 classes, and...
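Following the StringIndexer suggestion, a minimal sketch against the 1.4 DataFrame API; the toy rows mirror the sample above, and sqlContext is assumed to be in scope:

import org.apache.spark.ml.feature.StringIndexer

// toy DataFrame with the same shape as the rows above
val df = sqlContext.createDataFrame(Seq(
  ("user1", "class1", "product1"),
  ("user1", "class1", "product2"),
  ("user3", "class2", "product1")
)).toDF("user", "class", "product")

// map a categorical string column to numeric indices, most frequent value first
val indexer = new StringIndexer().setInputCol("product").setOutputCol("productIndex")
indexer.fit(df).transform(df).show()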
Hi,
I am trying to build this project
https://github.com/databricks/learning-spark with mvn package. This should
work out of the box but unfortunately it doesn't. In fact, I get the
following error:
mvn package -X
> Apache Maven 3.0.5
> Maven home: /usr/share/maven
> Java version: 1.7.0_76, vendor
...ation at all! Any ideas?
*// Adamantios*
On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin wrote:
> You can type ":quit".
>
> On Fri, Mar 13, 2015 at 10:29 AM, Adamantios Corais
> wrote:
> > Hi,
> >
> > I want to change the default key combination that exits the Spark shell
Hi,
I want to change the default key combination that exits the Spark shell
(i.e. CTRL + C) to something else, such as CTRL + H.
Thank you in advance.
*// Adamantios*
Hi,
After some research I have decided that Spark (SQL) would be ideal for
building an OLAP engine. My goal is to push aggregated data (to Cassandra
or other low-latency data storage) and then be able to project the results
on a web page (web service). New data will be added (aggregated) once a
day.
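A rough sketch of the daily batch step, just to make the idea concrete; the events table, columns, and output path are hypothetical, and pushing the result to Cassandra would go through the DataStax spark-cassandra-connector instead of Parquet:

// pre-aggregate once a day, then serve the small result from low-latency storage
val daily = sqlContext.sql(
  """SELECT day, product, COUNT(*) AS views
    |FROM events
    |GROUP BY day, product""".stripMargin)
daily.saveAsParquetFile("/olap/daily_aggregates")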
Hi,
I am using Spark 0.9.1 and I am looking for a proper viz tool that
supports that specific version. As far as I have seen, all relevant tools
(e.g. spark-notebook, zeppelin-project, etc.) only support 1.1 or 1.2; no
mention of older versions of Spark. Any ideas or suggestions?
*// Adamantios*
...use a java event as the workflow element. I am interested in anyone's
> experience with Luigi and/or any other tools.
>
>
> On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais <
> adamantios.cor...@gmail.com> wrote:
>
>> I have some previous experience with Apache Oozie
I have some previous experience with Apache Oozie while I was developing in
Apache Pig. Now, I am working explicitly with Apache Spark and I am looking
for a tool with similar functionality. Is Oozie recommended? What about
Luigi? What do you use / recommend?
...expose it (do I hear a PR?) but it's not hard to do externally. You'll
> have to do this anyway if you're on anything earlier than 1.2.
>
> On Wed, Oct 8, 2014 at 10:17 AM, Adamantios Corais
> wrote:
> > OK, let me rephrase my question once again. Python-wise I prefer
OK, let me rephrase my question once again. Python-wise I prefer
.predict_proba(X) to .decision_function(X), since it is easier for
me to interpret the results. As far as I can see, the latter functionality
is already implemented in Spark (well, in version 0.9.2 for example I have
to c...
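A sketch of the "do it externally" route, assuming Spark 1.0+, where clearThreshold() makes predict return the raw margin; note the sigmoid only gives a monotone predict_proba-style score, not a calibrated probability (that would need something like Platt scaling):

import org.apache.spark.mllib.classification.SVMModel
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def predictProba(model: SVMModel, data: RDD[LabeledPoint]): RDD[(Double, Double)] = {
  model.clearThreshold() // predict now returns the raw margin w.x + b
  data.map { p =>
    val margin = model.predict(p.features)
    (p.label, 1.0 / (1.0 + math.exp(-margin))) // sigmoid-squashed score
  }
}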
...and the rest of the classes
>>> as negative, and then use the same method provided by Aris as a measure
>>> of how far Class i is from the decision boundary.
>>>
>>> On Wed, Sep 24, 2014 at 4:06 PM, Aris wrote:
>>>
>>>> Greetings, Adamantios Korais..
>>> model.setThreshold(your threshold here)
>>>
>>> to set the threshold that separates positive predictions from negative
>>> predictions.
>>>
>>> For more info, please take a look at
>>> http://spark.apache.org/docs/latest/api/scala/index.
Nobody?
If that's not supported already, can someone please at least give me a few
hints on how to implement it?
Thanks!
On Fri, Sep 19, 2014 at 7:43 PM, Adamantios Corais <
adamantios.cor...@gmail.com> wrote:
> Hi,
>
> I am working with the SVMWithSGD classification algorithm
Hi,
I am working with the SVMWithSGD classification algorithm on Spark. It
works fine for me; however, I would like to distinguish the instances that
are classified with high confidence from those with low confidence. How do we
define the threshold here? Ultimately, I want to keep only those for which...
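Pulling the replies together, a minimal sketch; training and test are assumed RDD[LabeledPoint], and the margin cutoff of 1.0 is an arbitrary placeholder to tune:

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def confidentPredictions(training: RDD[LabeledPoint], test: RDD[LabeledPoint],
                         cutoff: Double = 1.0): RDD[(Double, LabeledPoint)] = {
  val model = SVMWithSGD.train(training, 100) // 100 iterations
  model.clearThreshold() // predict now returns the raw margin, not 0/1
  test.map(p => (model.predict(p.features), p))
      .filter { case (margin, _) => math.abs(margin) > cutoff } // keep confident ones
}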