Re: building spark1.2 meet error
Hi J_soft,
mvn does not provide tar packages by default. You got many jar files because each project has its own jar (e.g. mllib has mllib/target/spark-mllib_2.10-1.2.0.jar). However, if you want one big tar package with all dependencies, look here: https://github.com/apache/spark/tree/master/assembly and add the -Pbigtop-dist parameter to your mvn command.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/building-spark1-2-meet-error-tp20853p20960.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
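For example (a sketch only -- the exact profiles and versions depend on your environment; this mirrors the hadoop 2.5 command used elsewhere in this thread):

```shell
# Build Spark and additionally produce a distribution tarball via the
# bigtop-dist profile (assumed to be run from the source tree root).
mvn -Pyarn -Dhadoop.version=2.5.0 -Pbigtop-dist -DskipTests clean package
```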
Re: Problem with building spark-1.2.0
The error you provided only says that the build was unsuccessful. If you post what you did (which command you used) and the whole error trace, someone might be able to help you ...
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-building-spark-1-2-0-tp20961p20964.html
MLLIB and Openblas library in non-default dir
Hi,
I have compiled the OpenBLAS library into a non-standard directory and I want to tell my Spark app about it via:

-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so

which is a standard option in netlib-java (https://github.com/fommil/netlib-java). I tried two ways:

1. via the --conf parameter:

bin/spark-submit -v --class org.apache.spark.examples.mllib.LinearRegression --conf -Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar data/mllib/sample_libsvm_data.txt

2. via the --driver-java-options parameter:

bin/spark-submit -v --driver-java-options -Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so --class org.apache.spark.examples.mllib.LinearRegression examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar data/mllib/sample_libsvm_data.txt

How can I force spark-submit to propagate the non-standard location of the OpenBLAS library to netlib-java?
thanks, Tomas
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLIB-and-Openblas-library-in-non-default-dir-tp20943.html
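One conventional route (a sketch, not verified against this particular setup) is to pass the JVM system property through Spark's extraJavaOptions configuration keys rather than as a bare -D flag, since --conf expects a key=value pair:

```shell
# Sketch: propagate the netlib-java property to driver and executors.
# The property name and jar/data paths are taken from the message above.
bin/spark-submit -v \
  --class org.apache.spark.examples.mllib.LinearRegression \
  --conf "spark.driver.extraJavaOptions=-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so" \
  --conf "spark.executor.extraJavaOptions=-Dcom.github.fommil.netlib.NativeSystemBLAS.natives=/usr/local/lib/libopenblas.so" \
  examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar \
  data/mllib/sample_libsvm_data.txt
```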
Re: building spark1.2 meet error
Hi J_soft,
for me it works; I didn't pass the -Dscala-2.10 and -X parameters. I got only one warning: since I don't have hadoop 2.5, that profile was not activated:

larix@kovral:~/sources/spark-1.2.0$ mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0 -DskipTests clean package
Found 0 infos
Finished in 3 ms
[INFO] Reactor Summary:
[INFO] Spark Project Parent POM ... SUCCESS [ 14.177 s]
[INFO] Spark Project Networking ... SUCCESS [ 14.670 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  9.030 s]
[INFO] Spark Project Core . SUCCESS [04:42 min]
[INFO] Spark Project Bagel  SUCCESS [ 26.184 s]
[INFO] Spark Project GraphX ... SUCCESS [01:07 min]
[INFO] Spark Project Streaming  SUCCESS [01:35 min]
[INFO] Spark Project Catalyst . SUCCESS [01:48 min]
[INFO] Spark Project SQL .. SUCCESS [01:55 min]
[INFO] Spark Project ML Library ... SUCCESS [02:17 min]
[INFO] Spark Project Tools  SUCCESS [ 15.527 s]
[INFO] Spark Project Hive . SUCCESS [01:43 min]
[INFO] Spark Project REPL . SUCCESS [ 45.154 s]
[INFO] Spark Project YARN Parent POM .. SUCCESS [  3.885 s]
[INFO] Spark Project YARN Stable API .. SUCCESS [01:00 min]
[INFO] Spark Project Assembly . SUCCESS [ 50.812 s]
[INFO] Spark Project External Twitter . SUCCESS [ 21.401 s]
[INFO] Spark Project External Flume Sink .. SUCCESS [ 25.207 s]
[INFO] Spark Project External Flume ... SUCCESS [ 34.734 s]
[INFO] Spark Project External MQTT  SUCCESS [ 22.617 s]
[INFO] Spark Project External ZeroMQ .. SUCCESS [ 22.444 s]
[INFO] Spark Project External Kafka ... SUCCESS [ 33.566 s]
[INFO] Spark Project Examples . SUCCESS [01:23 min]
[INFO] Spark Project YARN Shuffle Service . SUCCESS [  4.873 s]
[INFO] BUILD SUCCESS
[INFO] Total time: 23:20 min
[INFO] Finished at: 2014-12-31T12:02:32+01:00
[INFO] Final Memory: 76M/855M
[WARNING] The requested profile "hadoop-2.5" could not be activated because it does not exist.

If it doesn't work for you, I'd delete all sources, download the source code once more, and try again ...
good luck, Tomas
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/building-spark1-2-meet-error-tp20853p20927.html
Re: Clustering text data with MLlib
K-means really needs the number of clusters identified in advance. There are algorithms (X-means, ART, ...) which do not need this information; unfortunately, none of them is implemented in MLlib at the moment (you can lend a hand and help the community). Anyway, it seems to me you will not be satisfied with those algorithms either. I understand that what you want to achieve is the precise number of clusters. Note that whenever you change the input parameters (random seed, ...), the number of clusters might come out different. Clustering is a great tool, but it won't give you one truth (one number).
regards, Tomas
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Clustering-text-data-with-MLlib-tp20883p20899.html
Re: building spark1.2 meet error
Hi,
well, Spark 1.2 was prepared for Scala 2.10. If you want a stable and fully functional tool, I'd compile it with this default compiler. I was able to compile Spark 1.2 with Java 7 and Scala 2.10 seamlessly. I also tried Java 8 and Scala 2.11 (no -Dscala.usejavacp=true), but I failed with another problem:

mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0 -Dscala-2.11 -X -DskipTests clean package
[INFO] Reactor Summary:
[INFO] Spark Project Parent POM ... SUCCESS [ 14.453 s]
[INFO] Spark Project Core . SUCCESS [ 47.508 s]
[INFO] Spark Project Bagel  SUCCESS [  3.646 s]
[INFO] Spark Project GraphX ... SUCCESS [  5.533 s]
[INFO] Spark Project ML Library ... SUCCESS [ 12.715 s]
[INFO] Spark Project Tools  SUCCESS [  1.854 s]
[INFO] Spark Project Networking ... SUCCESS [  6.580 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.290 s]
[INFO] Spark Project Streaming  SUCCESS [ 10.846 s]
[INFO] Spark Project Catalyst . SUCCESS [  8.296 s]
[INFO] Spark Project SQL .. SUCCESS [ 12.921 s]
[INFO] Spark Project Hive . SUCCESS [ 28.931 s]
[INFO] Spark Project Assembly . FAILURE [01:09 min]
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project YARN Parent POM .. SKIPPED
[INFO] Spark Project YARN Stable API .. SKIPPED
[INFO] Spark Project YARN Shuffle Service . SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 03:49 min
[INFO] Finished at: 2014-12-30T12:41:59+01:00
[INFO] Final Memory: 59M/417M
[WARNING] The requested profile "hadoop-2.5" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-assembly_2.10: Could not resolve dependencies for project org.apache.spark:spark-assembly_2.10:pom:1.2.0: The following artifacts could not be resolved: org.apache.spark:spark-repl_2.11:jar:1.2.0, org.apache.spark:spark-yarn_2.11:jar:1.2.0: Could not find artifact org.apache.spark:spark-repl_2.11:jar:1.2.0 in central (https://repo1.maven.org/maven2) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal on project spark-assembly_2.10: Could not resolve dependencies for project org.apache.spark:spark-assembly_2.10:pom:1.2.0: The following artifacts could not be resolved: org.apache.spark:spark-repl_2.11:jar:1.2.0, org.apache.spark:spark-yarn_2.11:jar:1.2.0: Could not find artifact org.apache.spark:spark-repl_2.11:jar:1.2.0 in central (https://repo1.maven.org/maven2)
    at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies(LifecycleDependencyResolver.java:220)
    at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies(LifecycleDependencyResolver.java:127)
    at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved(MojoExecutor.java:257)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:200)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:347)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:154)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:213)
    at
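For reference, the Spark 1.2 build documentation describes an extra step before building for Scala 2.11, and the unresolved _2.11 artifacts above are consistent with skipping it. A sketch (assuming a clean source tree; check your version's building-spark docs):

```shell
# Rewrite the POMs to the _2.11 artifact names first,
# then build with the scala-2.11 profile property.
./dev/change-version-to-2.11.sh
mvn -Pyarn -Dhadoop.version=2.5.0 -Dscala-2.11 -DskipTests clean package
```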
Re: Mllib native netlib-java/OpenBLAS
I'm half-way there. I followed these steps:

1. compiled and installed the OpenBLAS library
2. ln -s libopenblas_sandybridgep-r0.2.13.so /usr/lib/libblas.so.3
3. compiled and built Spark: mvn -Pnetlib-lgpl -DskipTests clean compile package

So far so good. Then I ran into problems while testing the solution:

bin/run-example mllib.LinearRegression data/mllib/sample_libsvm_data.txt
14/12/30 18:39:57 INFO BlockManagerMaster: Registered BlockManager
14/12/30 18:39:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/30 18:39:58 WARN LoadSnappy: Snappy native library not loaded
Training: 80, test: 20.
/usr/local/lib/jdk1.8.0//bin/java: symbol lookup error: /tmp/jniloader1826801168744171087netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dscal

I created an issue report: https://issues.apache.org/jira/browse/SPARK-5010
any help is deeply appreciated, Tomas
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-native-netlib-java-OpenBLAS-tp19662p20912.html
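The undefined cblas_dscal symbol suggests the library behind the libblas.so.3 symlink does not export the CBLAS interface that netlib-java's native loader expects. A hypothetical first check (paths are examples; point at wherever your symlink resolves):

```shell
# Does the linked BLAS export the CBLAS entry point?
# An empty result means this build lacks the CBLAS interface.
nm -D /usr/lib/libblas.so.3 | grep cblas_dscal
```

If the symbol is absent, rebuilding OpenBLAS without disabling CBLAS (i.e. without NO_CBLAS=1 among the make flags) and re-pointing the symlink would be the next thing to try.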