Re: New features (Discretization) for v1.x in xiangrui.pdf

2014-09-03 Thread filipus
How do I install it? Do I just clone the code from
https://github.com/apache/spark/pull/216 with git clone and then run sbt package?

Is it the same as https://github.com/LIDIAgroup/SparkFeatureSelection,

or something different?

filip



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/New-features-Discretization-for-v1-x-in-xiangrui-pdf-tp13256p13338.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



New features (Discretization) for v1.x in xiangrui.pdf

2014-09-02 Thread filipus
Is there any news about discretization in Spark?

Is there anything on GitHub? I didn't find it yet.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/New-features-Discretization-for-v1-x-in-xiangrui-pdf-tp13256.html



Re: New features (Discretization) for v1.x in xiangrui.pdf

2014-09-02 Thread filipus
I guess I found it:

https://github.com/LIDIAgroup/SparkFeatureSelection



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/New-features-Discretization-for-v1-x-in-xiangrui-pdf-tp13256p13261.html



Re: Time series forecasting

2014-09-01 Thread filipus
I guess this is not a question about Spark but about the dataset you need
to set up.

Think about what you want to model and how you can shape the data in such a
way that Spark can use it.

An autoregressive model is one technique I know:

a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}

Spark can find the coefficients C1..C6 by regression, I guess.
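To make the data shaping concrete, here is a minimal sketch in plain Python (no Spark; the helper name is my own invention) that turns a time series into (label, features) rows, from which any linear-regression routine could fit the coefficients C1..C6:

```python
def make_lag_rows(series, order=6):
    """Turn a time series into (label, features) rows: the label is
    a_{t+1} and the features are the previous `order` values, so a
    linear regression over these rows recovers C1..C6."""
    rows = []
    for t in range(order, len(series)):
        label = series[t]
        # features: a_t, a_{t-1}, ..., a_{t-order+1}, newest first
        features = series[t - order:t][::-1]
        rows.append((label, features))
    return rows
```

In Spark these rows would become LabeledPoint instances for MLlib's linear regression.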



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Time-series-forecasting-tp13236p13239.html



numpy digitize

2014-08-31 Thread filipus
hi Folks,

is there a function in Spark, like numpy's digitize, to discretize a
numerical variable?

Or even better: is there a way to use the decision-tree builder of
Spark MLlib, which splits data into bins in such a way that the split
variable best predicts the target value (label)?

This could be useful for logistic regression, because the linearization
makes models more stable. Some people would refer to this as
weight-of-evidence modelling.
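The binning plus weight-of-evidence idea can be sketched in a few lines of plain Python (function names and the smoothing constant are my own choices, not anything from MLlib):

```python
import bisect
import math

def digitize(x, bins):
    """Bin index for x over sorted bin edges, like numpy.digitize."""
    return bisect.bisect_right(bins, x)

def woe_table(values, labels, bins):
    """Weight of evidence per bin: log(%good / %bad), with a small
    smoothing constant eps to avoid division by zero in empty bins."""
    n_bins = len(bins) + 1
    good = [0.0] * n_bins
    bad = [0.0] * n_bins
    for x, y in zip(values, labels):
        b = digitize(x, bins)
        if y == 1:
            good[b] += 1
        else:
            bad[b] += 1
    total_good, total_bad = sum(good), sum(bad)
    eps = 0.5
    return [math.log(((g + eps) / total_good) / ((b + eps) / total_bad))
            for g, b in zip(good, bad)]
```

The interesting part, as the post says, would be getting the bin edges from MLlib's decision-tree splits instead of choosing them by hand.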



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/numpy-digitize-tp13212.html



Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
I am trying to get used to sbt in order to build standalone applications by
myself.

I managed to run the SimpleApp example.

Then I tried to copy one of the example Scala programs, LinearRegression,
into a local directory:

.
./build.sbt
./src
./src/main
./src/main/scala
./src/main/scala/LinearRegression.scala

My build.sbt (even though I don't really know what I am doing) looks like:

name := "Linear Regression"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.0.2"

libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

By the way, first I tried scalaVersion := "2.11.2", which is my installed
Scala version, but that failed (Spark 1.0.2 artifacts were only published
for Scala 2.10).

...

sbt package builds a jar file in target, but the command

spark-submit --class LinearRegression --master local[2]
target/scala-2.10/linear-regression_2.10-1.0.jar
~/git/spark/data/mllib/sample_linear_regression_data.txt

didn't work. It tells me:

Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Exception in thread "main" java.lang.NoClassDefFoundError:
scopt/OptionParser
	at java.lang.Class.getDeclaredMethods0(Native Method)

AHHH: I commented out /*package org.apache.spark.examples.mllib*/ in
LinearRegression.scala, because otherwise it doesn't find the main class:
Exception in thread "main" java.lang.ClassNotFoundException:
LinearRegression

When I do the same with the pre-built examples jar, everything works fine:

spark-submit --class org.apache.spark.examples.mllib.LinearRegression
--master local[2] lib/spark-examples-1.0.2-hadoop2.2.0.jar
~/git/spark/data/mllib/sample_linear_regression_data.txt

works!
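The scopt/OptionParser error above is the usual symptom of a dependency that sbt sees at compile time but that is not inside the jar spark-submit ships: `sbt package` builds a thin jar containing only your own classes. One common fix (a sketch; the plugin version here is an assumption from the Spark 1.x era) is to build a fat jar with the sbt-assembly plugin:

```scala
// project/assembly.sbt -- hypothetical sbt-assembly setup
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```

Then `sbt assembly` produces a jar under target/ that bundles scopt, and that jar can be given to spark-submit instead of the thin one. Alternatively, the scopt jar can be put on the classpath explicitly with spark-submit's `--jars` option.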



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-examples-xxx-tp13052p13178.html



Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
Compilation works, but execution does not, at least with spark-submit, as I
described above.

When I make a local copy of the training set, I can execute sbt run, which
works:

sbt run sample_linear_regression_data.txt

When I do

sbt run ~/git/spark/data/mllib/sample_linear_regression_data.txt

the program fails because it doesn't find any training set:

[error] (run-main-0) org.apache.hadoop.mapred.InvalidInputException: Input
path does not exist:
file:/home/filip/spark-ex-regression/~/git/spark/data/mllib/sample_linear_regression_data.txt

PS: does anybody know where in the LinearRegression.scala program the path
is specified, or does it have to do with sbt?
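The `~` in that path is expanded by the shell, not by the JVM or Hadoop; when the argument reaches the program literally (for example, from inside the sbt console), `~` is just a relative directory name, which is why Hadoop prefixes it with the project directory. A small Python illustration of the difference:

```python
import os.path

literal = "~/git/spark/data/mllib/sample_linear_regression_data.txt"
# os.path.expanduser does what an interactive shell would do with ~
expanded = os.path.expanduser(literal)

print(literal)   # still starts with "~" -> treated as a relative path
print(expanded)  # starts with the real home directory
```

So passing an absolute path, or letting the shell expand `~` before sbt sees it, avoids the error.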



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-examples-xxx-tp13052p13180.html



Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
OK, I see :-)

Using .. instead of ~ works fine.

Do you know the reason why

sbt run [options]

works (after sbt package), but

spark-submit --class ClassName --master local[2]
target/scala/JarPackage.jar [options]

doesn't? Somehow it cannot resolve all the classes.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-examples-xxx-tp13052p13182.html



sbt package assembly run spark examples

2014-08-28 Thread filipus
hi guys,

Can someone explain, or give a stupid user like me a link to, the right
usage of sbt and Spark in order to run the examples as a stand-alone app?

I got to the point of running the app with sbt run path-to-the-data, but I
still get an error, probably because I didn't tell the app that the master
is local (--master local) in the SparkContext setup.

In the BinaryClassification.scala program it is set by

val conf = new SparkConf().setAppName(s"BinaryClassification with $params")

So... how do I adapt the code?

In the docs it is written:

val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
  List("target/scala-2.10/simple-project_2.10-1.0.jar"))

I got the following error:

 sbt run ~/git/spark/data/mllib/sample_binary_classification_data.txt
[info] Set current project to Simple Project (in build
file:/home/filip/spark-sample/)
[info] Running BinaryClassification
~/git/spark/data/mllib/sample_binary_classification_data.txt
[error] (run-main-0) org.apache.spark.SparkException: A master URL must be
set in your configuration
org.apache.spark.SparkException: A master URL must be set in your
configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
at BinaryClassification$.run(BinaryClassification.scala:107)
at
BinaryClassification$$anonfun$main$1.apply(BinaryClassification.scala:99)
at
BinaryClassification$$anonfun$main$1.apply(BinaryClassification.scala:98)
at scala.Option.map(Option.scala:145)
at BinaryClassification$.main(BinaryClassification.scala:98)
at BinaryClassification.main(BinaryClassification.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 7 s, completed Aug 28, 2014 11:04:44 AM




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sbt-package-assembly-run-spark-examples-tp13000.html



Re: sbt package assembly run spark examples

2014-08-28 Thread filipus
I got it when I read the class reference:

https://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.SparkConf

conf.setMaster("local[2]")

sets the master to local with 2 threads.

But I still get some warnings, and the result (see below) is also not
right, I think.

PS: by the way, first I ran it like this:

sbt run ~/git/spark/data/mllib/sample_binary_classification_data.txt

but the app didn't find the file, because it started at the local directory
and pointed to

/home/filip/spark-sample/~/git/spark/data/mllib/sample_binary_classification_data.txt

Any explanation?

PS2: I guess many of us on the user side have problems with Scala, sbt and
the class libraries. Does any of you have a suggestion for how I can
overcome this? It is pretty time-consuming trial and error :-(



14/08/28 11:47:37 INFO ui.SparkUI: Started SparkUI at
http://filip-VirtualBox.localdomain:4040
14/08/28 11:47:41 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/08/28 11:47:41 WARN snappy.LoadSnappy: Snappy native library not loaded
Training: 84, test: 16.
14/08/28 11:47:47 WARN netlib.BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
14/08/28 11:47:47 WARN netlib.BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
Test areaUnderPR = 1.0.
Test areaUnderROC = 1.0.






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sbt-package-assembly-run-spark-examples-tp13000p13001.html



org.apache.spark.examples.xxx

2014-08-28 Thread filipus
hey guys,

I am still trying to get used to compiling and running the example code.

Why does the run-example script submit the class with an
org.apache.spark.examples prefix in front of the class name itself?

Probably a stupid question, but I would be glad if someone explained it.

By the way, how was the spark-examples jar file built?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-spark-examples-xxx-tp13052.html



feature space search

2014-08-09 Thread filipus
I am wondering if I can use Spark to search for interesting
features/attributes for modelling. In fact, I have just come from some
introductory pages about Vowpal Wabbit, and I somehow like the idea of
out-of-core modelling.

Well, I have transactional data where customers purchased products with
unique article numbers, and a huge table of customer treatments like
coupons, prospects and so on. Each article has some properties or
attributes, which are written in another table.

In fact, I would like to join the tables on the fly as input for a machine
learning platform like Spark or Vowpal Wabbit, just to get a ranking of
good attributes for modelling.

Does somebody know how to do it?
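As a minimal sketch of the join-then-rank idea (pure Python, all names and data made up; in Spark this would be a join of two RDDs followed by a per-attribute aggregate):

```python
from collections import defaultdict

# transactions: (customer, article); attributes: article -> {attr: value}
# labels: customer -> modelling target (e.g. responded to a coupon)
transactions = [("c1", "a1"), ("c2", "a1"), ("c3", "a2")]
attributes = {"a1": {"color": "red"}, "a2": {"color": "blue"}}
labels = {"c1": 1, "c2": 0, "c3": 1}

# join on the article number, then count the target rate
# per (attribute, value) pair
counts = defaultdict(lambda: [0, 0])  # (attr, value) -> [positives, total]
for customer, article in transactions:
    for attr, value in attributes[article].items():
        c = counts[(attr, value)]
        c[0] += labels[customer]
        c[1] += 1

# rank attribute values by their target rate, best first
ranking = sorted(counts.items(),
                 key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
```

A real ranking would use something like information gain rather than the raw rate, but the join-then-aggregate shape is the same.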

all the best :-)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/feature-space-search-tp11838.html



Re: pmml with augustus

2014-06-12 Thread filipus
@villu: thank you for your help. I promise I am going to try it! That's
cool :-) Do you also know the other way around, from PMML to a model object
in Spark?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pmml-with-augustus-tp7313p7473.html


pmml with augustus

2014-06-10 Thread filipus
hello guys,

does anybody have experience with the Augustus library as a serializer for
scoring models?

It looks very promising, and I even found a hint about a connection between
Augustus and Spark.

all the best



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pmml-with-augustus-tp7313.html