Re: building spark1.2 meet error

2015-01-04 Thread xhudik
Hi J_soft

mvn does not produce a tar package by default. You get many jar files - each
project has its own jar (e.g. mllib has
mllib/target/spark-mllib_2.10-1.2.0.jar).
However, if you want one big tar package with all dependencies, look here:
https://github.com/apache/spark/tree/master/assembly
and add the -Pbigtop-dist parameter to your mvn command.
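For reference, a hedged sketch of what the full command could look like (the Hadoop profile and version are examples only - adjust them to your environment; with -Pbigtop-dist the assembly module should produce a .tar.gz under assembly/target):

```shell
# Build Spark and produce a binary distribution tarball.
# -Pyarn/-Phadoop-2.4/-Dhadoop.version are illustrative; pick the ones
# matching your cluster. -DskipTests keeps the build fast.
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Pbigtop-dist -DskipTests clean package
```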



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/building-spark1-2-meet-error-tp20853p20960.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Problem with building spark-1.2.0

2015-01-04 Thread xhudik
The error you provided only says that the build was unsuccessful. If you
write what you did (which command you used) and include the whole error
trace, someone might be able to help you ...



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-building-spark-1-2-0-tp20961p20964.html




MLLIB and Openblas library in non-default dir

2015-01-02 Thread xhudik
Hi
I have compiled the OpenBLAS library into a nonstandard directory and I want
to inform the Spark app about it via:
-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so
which is a standard option in netlib-java
(https://github.com/fommil/netlib-java).

I tried 2 ways:

1. via the --conf parameter:
bin/spark-submit -v --class org.apache.spark.examples.mllib.LinearRegression --conf -Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar data/mllib/sample_libsvm_data.txt

2. via the --driver-java-options parameter:
bin/spark-submit -v --driver-java-options -Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so --class org.apache.spark.examples.mllib.LinearRegression examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar data/mllib/sample_libsvm_data.txt
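[Editorial note, hedged: per Spark's configuration documentation, --conf expects a key=value pair, so a bare -D flag there is not forwarded to the driver JVM. A sketch of the forms that should carry the property through, untested against this setup:]

```shell
# Pass the JVM system property via the driver's extra Java options
# (spark.driver.extraJavaOptions is a documented Spark configuration key).
bin/spark-submit -v \
  --conf "spark.driver.extraJavaOptions=-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so" \
  --class org.apache.spark.examples.mllib.LinearRegression \
  examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar \
  data/mllib/sample_libsvm_data.txt

# Equivalent form with --driver-java-options; quoting keeps the whole
# -D... flag as a single argument to spark-submit.
bin/spark-submit -v \
  --driver-java-options "-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so" \
  --class org.apache.spark.examples.mllib.LinearRegression \
  examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar \
  data/mllib/sample_libsvm_data.txt
```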

How can I force spark-submit to propagate the information about the
non-standard placement of the OpenBLAS library to netlib-java?

thanks, Tomas



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-and-Openblas-library-in-non-default-dir-tp20943.html




Re: building spark1.2 meet error

2014-12-31 Thread xhudik
Hi J_soft,

for me it is working; I didn't use the -Dscala-2.10 -X parameters. I got only
one warning: since I don't have Hadoop 2.5, the hadoop-2.5 profile was not activated:
larix@kovral:~/sources/spark-1.2.0 mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0 -DskipTests clean package


Found 0 infos
Finished in 3 ms
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 14.177 s]
[INFO] Spark Project Networking ... SUCCESS [ 14.670 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  9.030 s]
[INFO] Spark Project Core . SUCCESS [04:42 min]
[INFO] Spark Project Bagel  SUCCESS [ 26.184 s]
[INFO] Spark Project GraphX ... SUCCESS [01:07 min]
[INFO] Spark Project Streaming  SUCCESS [01:35 min]
[INFO] Spark Project Catalyst . SUCCESS [01:48 min]
[INFO] Spark Project SQL .. SUCCESS [01:55 min]
[INFO] Spark Project ML Library ... SUCCESS [02:17 min]
[INFO] Spark Project Tools  SUCCESS [ 15.527 s]
[INFO] Spark Project Hive . SUCCESS [01:43 min]
[INFO] Spark Project REPL . SUCCESS [ 45.154 s]
[INFO] Spark Project YARN Parent POM .. SUCCESS [  3.885 s]
[INFO] Spark Project YARN Stable API .. SUCCESS [01:00 min]
[INFO] Spark Project Assembly . SUCCESS [ 50.812 s]
[INFO] Spark Project External Twitter . SUCCESS [ 21.401 s]
[INFO] Spark Project External Flume Sink .. SUCCESS [ 25.207 s]
[INFO] Spark Project External Flume ... SUCCESS [ 34.734 s]
[INFO] Spark Project External MQTT  SUCCESS [ 22.617 s]
[INFO] Spark Project External ZeroMQ .. SUCCESS [ 22.444 s]
[INFO] Spark Project External Kafka ... SUCCESS [ 33.566 s]
[INFO] Spark Project Examples . SUCCESS [01:23 min]
[INFO] Spark Project YARN Shuffle Service . SUCCESS [  4.873 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23:20 min
[INFO] Finished at: 2014-12-31T12:02:32+01:00
[INFO] Final Memory: 76M/855M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile hadoop-2.5 could not be activated because it does not exist.


If it doesn't work for you, I'd try deleting all sources, downloading the
source code once more, and trying again ...

good luck, Tomas




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/building-spark1-2-meet-error-tp20853p20927.html




Re: Clustering text data with MLlib

2014-12-30 Thread xhudik
K-means really needs the number of clusters to be specified in advance. There
are multiple algorithms (X-means, ART, ...) which do not need this
information. Unfortunately, none of them is implemented in MLlib at the
moment (you can give a hand and help the community).

Anyway, it seems to me you will not be satisfied with those algorithms
(X-means, ART, ...) either. As I understand it, what you want to achieve is
the precise number of clusters. Note that whenever you change the input
parameters (random seed, ...), the number of clusters might differ.
Clustering is a great tool, but it won't give you one true answer (one number).


regards, Tomas



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Clustering-text-data-with-MLlib-tp20883p20899.html




Re: building spark1.2 meet error

2014-12-30 Thread xhudik
Hi,
well, Spark 1.2 was prepared for Scala 2.10. If you want a stable and fully
functional tool, I'd compile it with this default compiler.

I was able to compile Spark 1.2 with Java 7 and Scala 2.10 seamlessly.

I also tried Java 8 and Scala 2.11 (no -Dscala.usejavacp=true), but the build
failed for another reason:
mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0 -Dscala-2.11 -X -DskipTests clean package
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 14.453 s]
[INFO] Spark Project Core . SUCCESS [ 47.508 s]
[INFO] Spark Project Bagel  SUCCESS [  3.646 s]
[INFO] Spark Project GraphX ... SUCCESS [  5.533 s]
[INFO] Spark Project ML Library ... SUCCESS [ 12.715 s]
[INFO] Spark Project Tools  SUCCESS [  1.854 s]
[INFO] Spark Project Networking ... SUCCESS [  6.580 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.290 s]
[INFO] Spark Project Streaming  SUCCESS [ 10.846 s]
[INFO] Spark Project Catalyst . SUCCESS [  8.296 s]
[INFO] Spark Project SQL .. SUCCESS [ 12.921 s]
[INFO] Spark Project Hive . SUCCESS [ 28.931 s]
[INFO] Spark Project Assembly . FAILURE [01:09 min]
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project YARN Parent POM .. SKIPPED
[INFO] Spark Project YARN Stable API .. SKIPPED
[INFO] Spark Project YARN Shuffle Service . SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:49 min
[INFO] Finished at: 2014-12-30T12:41:59+01:00
[INFO] Final Memory: 59M/417M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile hadoop-2.5 could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-assembly_2.10: Could not resolve dependencies for project org.apache.spark:spark-assembly_2.10:pom:1.2.0: The following artifacts could not be resolved: org.apache.spark:spark-repl_2.11:jar:1.2.0, org.apache.spark:spark-yarn_2.11:jar:1.2.0: Could not find artifact org.apache.spark:spark-repl_2.11:jar:1.2.0 in central (https://repo1.maven.org/maven2) - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project spark-assembly_2.10: Could not resolve dependencies for project org.apache.spark:spark-assembly_2.10:pom:1.2.0: The following artifacts could not be resolved: org.apache.spark:spark-repl_2.11:jar:1.2.0, org.apache.spark:spark-yarn_2.11:jar:1.2.0: Could not find artifact org.apache.spark:spark-repl_2.11:jar:1.2.0 in central (https://repo1.maven.org/maven2)
    at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies(LifecycleDependencyResolver.java:220)
    at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies(LifecycleDependencyResolver.java:127)
    at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved(MojoExecutor.java:257)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:200)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:347)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:154)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:213)
    at 

Re: Mllib native netlib-java/OpenBLAS

2014-12-30 Thread xhudik
I'm half-way there. What I did:
1. compiled and installed the OpenBLAS library
2. ln -s libopenblas_sandybridgep-r0.2.13.so /usr/lib/libblas.so.3
3. compiled and built Spark:
mvn -Pnetlib-lgpl -DskipTests clean compile package
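The steps above can be sketched as a shell session (hedged: the OpenBLAS checkout path and install prefix are assumptions, and the library file name depends on the CPU target OpenBLAS detects on your machine):

```shell
# 1. Build OpenBLAS and install it under a chosen prefix (path is hypothetical)
make -C ~/src/OpenBLAS
sudo make -C ~/src/OpenBLAS PREFIX=/usr/local install

# 2. Make the freshly built library the system BLAS; the sandybridge name
#    below is an example of what OpenBLAS produces for that CPU target
sudo ln -s /usr/local/lib/libopenblas_sandybridgep-r0.2.13.so /usr/lib/libblas.so.3

# 3. Rebuild Spark with the netlib-lgpl profile so netlib-java's native
#    glue libraries are bundled into the assembly
mvn -Pnetlib-lgpl -DskipTests clean compile package
```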

So far so good. Then I ran into problems when testing the solution:
bin/run-example mllib.LinearRegression data/mllib/sample_libsvm_data.txt

14/12/30 18:39:57 INFO BlockManagerMaster: Registered BlockManager
14/12/30 18:39:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/30 18:39:58 WARN LoadSnappy: Snappy native library not loaded
Training: 80, test: 20.
/usr/local/lib/jdk1.8.0//bin/java: symbol lookup error: /tmp/jniloader1826801168744171087netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dscal


I created an issue report:
https://issues.apache.org/jira/browse/SPARK-5010

any help is deeply appreciated, Tomas




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-native-netlib-java-OpenBLAS-tp19662p20912.html
