Re: Spark performance in cluster mode using yarn
Hi Ayan,

I am asking about the general scenario for the given information/configuration, not a specific one. The Java code does nothing beyond getting a HiveContext and running a select query; there is no serialization or anything else complex, just about 10 straightforward lines of code. Group, please suggest if you have any ideas.

Regards,
Sachin

On Fri, May 15, 2015 at 6:57 AM, ayan guha guha.a...@gmail.com wrote:

With this information it is hard to predict. What performance are you getting? What is your desired performance? Maybe you can post your code so experts can suggest improvements?

On 14 May 2015 15:02, sachin Singh sachin.sha...@gmail.com wrote:

Hi Friends, can someone please give me an idea of the expected total execution time for my Spark job? I have data in a Hive table, about 1 GB (roughly 2 lakh rows) for a whole month, and I want to do monthly aggregation using SQL queries with GROUP BY. I have a single-node cluster and run the job with the following configuration:

--num-executors 2 --driver-memory 3g --driver-java-options -XX:MaxPermSize=1G --executor-memory 2g --executor-cores 2

Approximately how much time should the job need to finish, or can someone suggest the best way to get results quickly?

Thanks in advance,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-in-cluster-mode-using-yarn-tp22877.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Spark performance in cluster mode using yarn
Hi Friends,

Can someone please give me an idea of the expected total execution time for a Spark job? I have data in a Hive table, about 1 GB (roughly 2 lakh rows) for a whole month, and I want to do monthly aggregation using SQL queries with GROUP BY. I have a single-node cluster and run the job with the following configuration:

--num-executors 2 --driver-memory 3g --driver-java-options -XX:MaxPermSize=1G --executor-memory 2g --executor-cores 2

Approximately how much time should the job need to finish, or can someone suggest the best way to get results quickly?

Thanks in advance,

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-in-cluster-mode-using-yarn-tp22877.html
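A hedged sketch of a submit command for a workload like this (the class and jar names are made up; only the resource flags come from the thread). For roughly 1 GB of input, a plain GROUP BY aggregation normally finishes in minutes rather than hours; with so little data, lowering Spark SQL's default of 200 shuffle partitions usually matters more than adding memory:

```shell
# Sketch only -- com.example.MonthlyAggregationJob and monthly-agg.jar
# are hypothetical names; the resource settings mirror the question.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-memory 2g \
  --executor-cores 2 \
  --driver-memory 3g \
  --driver-java-options -XX:MaxPermSize=1G \
  --conf spark.sql.shuffle.partitions=8 \
  --class com.example.MonthlyAggregationJob \
  monthly-agg.jar
```

On a single node, fewer shuffle partitions means fewer tiny tasks and less scheduling overhead for a 1 GB dataset.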
spark yarn-cluster job failing in batch processing
Hi All,

I am trying to execute batch processing in yarn-cluster mode, i.e. I have many SQL insert queries; based on the argument provided, the job fetches the queries, creates a context and a SchemaRDD, and inserts into Hive tables.

Please note: in standalone mode this works, and in cluster mode it works when I configure only one query. I have also configured yarn.nodemanager.delete.debug-sec = 600.

I am using the command below:

spark-submit --jars ./analiticlibs/utils-common-1.0.0.jar,./analiticlibs/mysql-connector-java-5.1.17.jar,./analiticlibs/log4j-1.2.17.jar --files datasource.properties,log4j.properties,hive-site.xml --deploy-mode cluster --master yarn --num-executors 1 --driver-memory 2g --driver-java-options -XX:MaxPermSize=1G --executor-memory 1g --executor-cores 1 --class com.java.analitics.jobs.StandaloneAggregationJob sparkanalitics-1.0.0.jar daily_agg 2015-04-21

Exception from the container log:

Exception in thread "Driver" java.lang.ArrayIndexOutOfBoundsException: 2
    at com.java.analitics.jobs.StandaloneAggregationJob.main(StandaloneAggregationJob.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)

Exception from our exception log file:

diagnostics: Application application_1429800386537_0001 failed 2 times due to AM Container for appattempt_1429800386537_0001_02 exited with exitCode: 15 due to: Exception from container-launch.
Container id: container_1429800386537_0001_02_01
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15. Failing this attempt. Failing the application.
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: root.hdfs
    start time: 1429800525569
    final status: FAILED
    tracking URL: http://tejas.alcatel.com:8088/cluster/app/application_1429800386537_0001
    user: hdfs
2015-04-23 20:19:27 DEBUG Client - stopping client from cache: org.apache.hadoop.ipc.Client@12f5f40b
2015-04-23 20:19:27 DEBUG Utils - Shutdown hook called

I need urgent support.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-yarn-cluster-job-failing-in-batch-processing-tp22626.html
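A sketch of a defensive wrapper for this failure mode (the script name, usage text, and third argument below are hypothetical; the thread only shows that StandaloneAggregationJob.java:62 throws ArrayIndexOutOfBoundsException: 2, i.e. it reads args[2], while the submit command passes just two program arguments). Exit code 15 here is only the AM reporting that the user class threw, so validating the argument count before calling spark-submit fails fast on the client instead of inside YARN:

```shell
# Validate program arguments before handing off to spark-submit.
# AIOOBE: 2 in main() means a third argument was expected but not supplied.
validate_job_args() {
  # expected: <job_type> <date YYYY-MM-DD> <batch_id>  (batch_id is a guess
  # at what args[2] holds -- check StandaloneAggregationJob.java:62)
  if [ "$#" -lt 3 ]; then
    echo "usage: run_agg.sh <job_type> <date> <batch_id>" >&2
    return 1
  fi
  return 0
}
```

If the third argument is genuinely optional, the equivalent fix on the Java side is to guard the access with `args.length > 2`.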
Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database
Hi Linlin,

Have you found the solution for this issue? If yes, what needed to be corrected? I am getting the same error when submitting a Spark job in cluster mode:

2015-04-14 18:16:43 DEBUG Transaction - Transaction rolled back in 0 ms
2015-04-14 18:16:43 ERROR DDLTask - org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: my_database
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:4054)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:269)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java ...

Please suggest. I have copied hive-site.xml into spark/conf, and in standalone mode it works fine.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-sql-failed-in-yarn-cluster-mode-when-connecting-to-non-default-hive-database-tp11811p22486.html
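The fix usually suggested for this symptom, sketched below (paths and class names are illustrative). In yarn-cluster mode the driver runs inside the YARN ApplicationMaster on a cluster node, so a hive-site.xml that only lives in the client's spark/conf directory is never seen there and Spark falls back to a local metastore that has only the default database. Shipping the file with --files places it in each container's working directory, where HiveContext can pick it up:

```shell
# Sketch: ship the Hive client config to the cluster-side driver.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.MyHiveJob \
  myjob.jar
```

This is also why the same job works in standalone/client mode: there the driver runs on the machine where spark/conf/hive-site.xml exists.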
ExceptionDriver-Memory while running Spark job on Yarn-cluster
Hi,

When I submit a Spark job with --master yarn-cluster using the command/options below, I get a driver memory error:

spark-submit --jars ./libs/mysql-connector-java-5.1.17.jar,./libs/log4j-1.2.17.jar --files datasource.properties,log4j.properties --master yarn-cluster --num-executors 1 --driver-memory 2g --executor-memory 512m --class com.test.spark.jobs.AggregationJob sparkagg.jar

Exceptions from the YARN application log:

Container: container_1428938273236_0006_01_01 on mycom.hostname.com_8041
LogType: stderr
LogLength: 128
Log Contents:
Exception in thread "Driver" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Driver"
LogType: stdout
LogLength: 40

Container: container_1428938273236_0006_02_01 on mycom.hostname.com_8041
LogType: stderr
LogLength: 1365
Log Contents:
java.io.IOException: Log directory hdfs://mycom.hostname.com:8020/user/spark/applicationHistory/application_1428938273236_0006 already exists!
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.init(SparkContext.scala:353)
    at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61)
LogType: stdout
LogLength: 40

Please help; this is urgent for me.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Exception-Driver-Memory-while-running-Spark-job-on-Yarn-cluster-tp22475.html
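A hedged sketch of the two adjustments this log points at (the memory values are illustrative, not tuned). The first attempt dies with an OutOfMemoryError in the driver; on Spark 1.x with Java 7 this is often PermGen, raised via --driver-java-options rather than --driver-memory. The second stack trace is just the automatic AM retry colliding with the event-log directory the first attempt already created, which spark.eventLog.overwrite works around:

```shell
# Sketch: raise driver PermGen and let retries overwrite the event log.
spark-submit \
  --master yarn-cluster \
  --driver-memory 2g \
  --driver-java-options "-XX:MaxPermSize=512m" \
  --conf spark.eventLog.overwrite=true \
  --class com.test.spark.jobs.AggregationJob \
  sparkagg.jar
```

The "already exists" IOException is a secondary symptom; the OOM on the first attempt is the error actually worth fixing.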
need info on Spark submit on yarn-cluster mode
Hi,

I noticed that we have only a single-node cluster installed, and when I submit the job in yarn-cluster mode I get the error below. Is the single-node installation the cause? If not, then why am I unable to run in cluster mode? The spark-submit command is:

spark-submit --jars some dependent jars... --master yarn --class com.java.jobs.sparkAggregation mytest-1.0.0.jar

2015-04-08 19:16:50 INFO Client - Application report for application_1427895906171_0087 (state: FAILED)
2015-04-08 19:16:50 DEBUG Client - client token: N/A
diagnostics: Application application_1427895906171_0087 failed 2 times due to AM Container for appattempt_1427895906171_0087_02 exited with exitCode: 15 due to: Exception from container-launch.
Container id: container_1427895906171_0087_02_01
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15. Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1428500770818
final status: FAILED

Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:509)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/need-info-on-Spark-submit-on-yarn-cluster-mode-tp22420.html
issue while submitting Spark Job as --master yarn-cluster
Hi,

When I submit a Spark job in cluster mode I get the error below in the hadoop-yarn log. If anyone has an idea, please suggest.

2015-03-25 23:35:22,467 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1427124496008_0028 State change from FINAL_SAVING to FAILED
2015-03-25 23:35:22,467 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1427124496008_0028 failed 2 times due to AM Container for appattempt_1427124496008_0028_02 exited with exitCode: 13 due to: Exception from container-launch.
Container id: container_1427124496008_0028_02_01
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 13. Failing this attempt. Failing the application. APPID=application_1427124496008_0028

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-while-submitting-Spark-Job-as-master-yarn-cluster-tp0.html
Re: issue while submitting Spark Job as --master yarn-cluster
I am using Linux as the OS. When I run simply with --master yarn, it runs fine.

Regards,
Sachin

On Wed, Mar 25, 2015 at 4:25 PM, Xi Shen davidshe...@gmail.com wrote:

What is your environment? I remember I had a similar error when running spark-shell --master yarn-client in a Windows environment.

On Wed, Mar 25, 2015 at 9:07 PM sachin Singh sachin.sha...@gmail.com wrote:

Hi, when I submit a Spark job in cluster mode I get the error below in the hadoop-yarn log. If anyone has an idea, please suggest.

2015-03-25 23:35:22,467 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1427124496008_0028 State change from FINAL_SAVING to FAILED
2015-03-25 23:35:22,467 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1427124496008_0028 failed 2 times due to AM Container for appattempt_1427124496008_0028_02 exited with exitCode: 13 due to: Exception from container-launch.
Container id: container_1427124496008_0028_02_01
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 13. Failing this attempt. Failing the application. APPID=application_1427124496008_0028

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-while-submitting-Spark-Job-as-master-yarn-cluster-tp0.html
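A small helper sketching how the AM exit codes seen across these threads are usually read (the mapping reflects Spark 1.x ApplicationMaster conventions and common answers on this list; it is a rule of thumb, not an official table, so always check the container stderr via `yarn logs -applicationId <appId>` first):

```shell
# Heuristic lookup for YARN ApplicationMaster exit codes in Spark 1.x.
yarn_exit_hint() {
  case "$1" in
    13) echo "SparkContext was never initialized in the AM; often a master mismatch, e.g. setMaster(\"local\") hard-coded in the job while submitting with --master yarn-cluster" ;;
    15) echo "an exception thrown by the user class itself; look for the driver stack trace at the top of the AM container log" ;;
    *)  echo "unrecognised code; inspect the container logs" ;;
  esac
}
```

Note that the ExitCodeException stack trace in the NodeManager log is just the launcher reporting the code; the real cause is in the AM/driver container's own stderr.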
Re: issue while creating spark context
Thanks Sean. Can you please suggest which file or configuration I need to modify with the proper path? Please elaborate; that would help a lot. Thanks.

Regards,
Sachin

On Tue, Mar 24, 2015 at 7:15 PM, Sean Owen so...@cloudera.com wrote:

That's probably the problem; the intended path is on HDFS but the configuration specifies a local path. See the exception message.

On Tue, Mar 24, 2015 at 1:08 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's in your local file system, not in HDFS.

Thanks
Best Regards

On Tue, Mar 24, 2015 at 6:25 PM, Sachin Singh sachin.sha...@gmail.com wrote:

Hi, I can see that the required permission is granted for this directory:

hadoop dfs -ls /user/spark
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
Found 1 items
drwxrwxrwt - spark spark 0 2015-03-20 01:04 /user/spark/applicationHistory

Regards,
Sachin
Re: issue while creating spark context
Hi Akhil,

Thanks for your quick reply. Could you please elaborate on what kind of permission is required? Thanks in advance.

Regards,
Sachin

On Tue, Mar 24, 2015 at 5:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's an IOException; just make sure you have the correct permission on the /user/spark directory.

Thanks
Best Regards

On Tue, Mar 24, 2015 at 5:21 PM, sachin Singh sachin.sha...@gmail.com wrote:

Hi all,

All of a sudden I am getting the error below when I submit a Spark job with master yarn; it is not able to create the Spark context, although it was working previously. I am using CDH 5.3.1 and creating a JavaHiveContext.

spark-submit --jars ./analiticlibs/mysql-connector-java-5.1.17.jar,./analiticlibs/log4j-1.2.17.jar --master yarn --class myproject.com.java.jobs.Aggregationtask sparkjob-1.0.jar

Error message:

java.io.IOException: Error in creating log directory: file:/user/spark/applicationHistory/application_1427194309307_0005
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:133)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.init(SparkContext.scala:353)
    at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61)
    at myproject.com.java.core.SparkAnaliticEngine.getJavaSparkContext(SparkAnaliticEngine.java:77)
    at myproject.com.java.core.SparkAnaliticTable.evmyprojectate(SparkAnaliticTable.java:108)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:55)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:65)
    at myproject.com.java.jobs.CustomAggregationJob.main(CustomAggregationJob.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-while-creating-spark-context-tp22196.html
Re: issue while creating spark context
Hi, I can see that the required permission is granted for this directory:

hadoop dfs -ls /user/spark
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
Found 1 items
drwxrwxrwt - spark spark 0 2015-03-20 01:04 /user/spark/applicationHistory

Regards,
Sachin

On Tue, Mar 24, 2015 at 6:13 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Write permission, as it is clearly saying:

java.io.IOException: Error in creating log directory: file:/user/spark/applicationHistory/application_1427194309307_0005

Thanks
Best Regards

On Tue, Mar 24, 2015 at 6:08 PM, Sachin Singh sachin.sha...@gmail.com wrote:

Hi Akhil, thanks for your quick reply. Could you please elaborate on what kind of permission is required? Thanks in advance.

Regards,
Sachin

On Tue, Mar 24, 2015 at 5:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's an IOException; just make sure you have the correct permission on the /user/spark directory.

Thanks
Best Regards

On Tue, Mar 24, 2015 at 5:21 PM, sachin Singh sachin.sha...@gmail.com wrote:

Hi all,

All of a sudden I am getting the error below when I submit a Spark job with master yarn; it is not able to create the Spark context, although it was working previously. I am using CDH 5.3.1 and creating a JavaHiveContext.

spark-submit --jars ./analiticlibs/mysql-connector-java-5.1.17.jar,./analiticlibs/log4j-1.2.17.jar --master yarn --class myproject.com.java.jobs.Aggregationtask sparkjob-1.0.jar

Error message:

java.io.IOException: Error in creating log directory: file:/user/spark/applicationHistory/application_1427194309307_0005
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:133)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.init(SparkContext.scala:353)
    at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61)
    at myproject.com.java.core.SparkAnaliticEngine.getJavaSparkContext(SparkAnaliticEngine.java:77)
    at myproject.com.java.core.SparkAnaliticTable.evmyprojectate(SparkAnaliticTable.java:108)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:55)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:65)
    at myproject.com.java.jobs.CustomAggregationJob.main(CustomAggregationJob.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-while-creating-spark-context-tp22196.html
issue while creating spark context
Hi all,

All of a sudden I am getting the error below when I submit a Spark job with master yarn; it is not able to create the Spark context, although it was working previously. I am using CDH 5.3.1 and creating a JavaHiveContext.

spark-submit --jars ./analiticlibs/mysql-connector-java-5.1.17.jar,./analiticlibs/log4j-1.2.17.jar --master yarn --class myproject.com.java.jobs.Aggregationtask sparkjob-1.0.jar

Error message:

java.io.IOException: Error in creating log directory: file:/user/spark/applicationHistory/application_1427194309307_0005
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:133)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.init(SparkContext.scala:353)
    at org.apache.spark.api.java.JavaSparkContext.init(JavaSparkContext.scala:61)
    at myproject.com.java.core.SparkAnaliticEngine.getJavaSparkContext(SparkAnaliticEngine.java:77)
    at myproject.com.java.core.SparkAnaliticTable.evmyprojectate(SparkAnaliticTable.java:108)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:55)
    at myproject.com.java.core.SparkAnaliticEngine.evmyprojectateAnaliticTable(SparkAnaliticEngine.java:65)
    at myproject.com.java.jobs.CustomAggregationJob.main(CustomAggregationJob.java:184)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-while-creating-spark-context-tp22196.html
Re: issue while creating spark context
Thanks Sean and Akhil. I changed the permission of /user/spark/applicationHistory, and now it works.

On Tue, Mar 24, 2015 at 7:35 PM, Sachin Singh sachin.sha...@gmail.com wrote:

Thanks Sean. Can you please suggest which file or configuration I need to modify with the proper path? Please elaborate; that would help a lot. Thanks.

Regards,
Sachin

On Tue, Mar 24, 2015 at 7:15 PM, Sean Owen so...@cloudera.com wrote:

That's probably the problem; the intended path is on HDFS but the configuration specifies a local path. See the exception message.

On Tue, Mar 24, 2015 at 1:08 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

It's in your local file system, not in HDFS.

Thanks
Best Regards

On Tue, Mar 24, 2015 at 6:25 PM, Sachin Singh sachin.sha...@gmail.com wrote:

Hi, I can see that the required permission is granted for this directory:

hadoop dfs -ls /user/spark
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
Found 1 items
drwxrwxrwt - spark spark 0 2015-03-20 01:04 /user/spark/applicationHistory

Regards,
Sachin
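The resolution above, sketched as commands (the namenode host and CDH file paths are assumptions, not from the thread). There are two complementary fixes: make the shared event-log directory writable by the submitting user, and pin spark.eventLog.dir to an explicit hdfs:// URI in spark-defaults.conf so it can never resolve to a local file: path as it did in the exception:

```shell
# Open up the shared event-log directory (sticky bit keeps users from
# deleting each other's logs), running as the HDFS superuser.
sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory

# And/or pin the scheme in /etc/spark/conf/spark-defaults.conf:
#   spark.eventLog.enabled  true
#   spark.eventLog.dir      hdfs://namenode:8020/user/spark/applicationHistory
```

With the scheme spelled out, a client whose fs.defaultFS is misconfigured still writes event logs to HDFS instead of failing on a local path.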
issue creating spark context with CDH 5.3.1
Hi,

I am using CDH 5.3.1 and getting the error below; the Spark context is not even getting created. I am submitting my job like this:

spark-submit --jars ./analiticlibs/utils-common-1.0.0.jar,./analiticlibs/mysql-connector-java-5.1.17.jar,./analiticlibs/log4j-1.2.17.jar,./analiticlibs/ant-launcher-1.9.1.jar,./analiticlibs/antlr-2.7.7.jar,./analiticlibs/antlr-runtime-3.4.jar,./analiticlibs/avro-1.7.6-cdh5.3.1.jar,./analiticlibs/datanucleus-api-jdo-3.2.6.jar,./analiticlibs/datanucleus-core-3.2.10.jar,./analiticlibs/datanucleus-rdbms-3.2.9.jar,./analiticlibs/derby-10.10.1.1.jar,./analiticlibs/hive-ant-0.13.1-cdh5.3.1.jar,./analiticlibs/hive-contrib-0.13.1-cdh5.3.1.jar,./analiticlibs/hive-exec-0.13.1-cdh5.3.1.jar,./analiticlibs/hive-jdbc-0.13.1-cdh5.3.1.jar,./analiticlibs/hive-metastore-0.13.1-cdh5.3.1.jar,./analiticlibs/hive-service-0.13.1-cdh5.3.1.jar,./analiticlibs/libfb303-0.9.0.jar,./analiticlibs/libthrift-0.9.0-cdh5-2.jar,./analiticlibs/tachyon-0.5.0.jar,./analiticlibs/zookeeper.jar --master yarn --class mycom.java.analitics.SparkEngineTest sparkanalitics-1.0.0.jar

Even if I do not specify the jars explicitly, I get the same exception.

Exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at org.apache.spark.sql.hive.api.java.JavaHiveContext.init(JavaHiveContext.scala:30)
    at mycom.java.analitics.core.SparkAnaliticEngine.getJavaHiveContext(SparkAnaliticEngine.java:103)
    at mycom.java.analitics.core.SparkAnaliticTable.evmycomate(SparkAnaliticTable.java:106)
    at mycom.java.analitics.core.SparkAnaliticEngine.evmycomateAnaliticTable(SparkAnaliticEngine.java:55)
    at mycom.java.analitics.core.SparkAnaliticEngine.evmycomateAnaliticTable(SparkAnaliticEngine.java:65)
    at mycom.java.analitics.SparkEngineTest.main(SparkEngineTest.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-creating-spark-context-with-CDH-5-3-1-tp21968.html
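A hedged workaround sketch (the Hive lib path below assumes a package-based CDH install and is not from the thread). A NoClassDefFoundError for org/apache/hadoop/hive/conf/HiveConf at JavaHiveContext creation means the Hive classes are missing from the driver's launch classpath; in Spark 1.x, jars listed with --jars are not always on the classpath early enough for this, so the suggestion commonly seen on CDH is to put the Hive lib directory on the driver classpath explicitly (and to make sure the Spark build itself includes Hive support):

```shell
# Sketch: expose the cluster's Hive jars to the driver at launch time.
# /usr/lib/hive/lib is an assumed path; on parcel installs it is typically
# /opt/cloudera/parcels/CDH/lib/hive/lib instead.
spark-submit \
  --master yarn \
  --driver-class-path "/usr/lib/hive/lib/*" \
  --class mycom.java.analitics.SparkEngineTest \
  sparkanalitics-1.0.0.jar
```

The `/*` classpath wildcard is expanded by the JVM itself (Java 6+), so the whole directory of Hive jars is picked up without listing each one.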
Re: issue creating spark context with CDH 5.3.1
I have copied hive-site.xml to the Spark conf folder:

cp /etc/hive/conf/hive-site.xml /usr/lib/spark/conf

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-creating-spark-context-with-CDH-5-3-1-tp21968p21969.html
Re: issue Running Spark Job on Yarn Cluster
Not yet. Please let me know if you find a solution.

Regards,
Sachin

On 4 Mar 2015 21:45, mael2210 [via Apache Spark User List] ml-node+s1001560n21909...@n3.nabble.com wrote:

Hello, I am facing the exact same issue. Could you solve the problem?

Kind regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-Running-Spark-Job-on-Yarn-Cluster-tp21697p21912.html
Re: issue Running Spark Job on Yarn Cluster
Yes.

On 19 Feb 2015 23:40, Harshvardhan Chauhan ha...@gumgum.com wrote: Is this the full stack trace?

On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh sachin.sha...@gmail.com wrote:

Hi, I want to run my Spark job in Hadoop YARN cluster mode. I am using the below command:

    spark-submit --master yarn-cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 --class com.dc.analysis.jobs.AggregationJob sparkanalitic.jar param1 param2 param3

I am getting the error below. Kindly suggest what is going wrong, and whether the command is proper or not. Thanks in advance.

Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:509)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-Running-Spark-Job-on-Yarn-Cluster-tp21697.html

-- Harshvardhan Chauhan | Software Engineer
GumGum http://www.gumgum.com/ | Ads that stick
310-260-9666 | ha...@gumgum.com
issue Running Spark Job on Yarn Cluster
Hi, I want to run my Spark job in Hadoop YARN cluster mode. I am using the below command:

    spark-submit --master yarn-cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 --class com.dc.analysis.jobs.AggregationJob sparkanalitic.jar param1 param2 param3

I am getting the error below. Kindly suggest what is going wrong, and whether the command is proper or not. Thanks in advance.

Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:509)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/issue-Running-Spark-Job-on-Yarn-Cluster-tp21697.html
how to get SchemaRDD SQL exceptions i.e. table not found exception
Hi, can someone guide me on how to trap the SQL exception for a query executed using SchemaRDD? I mean, for example, when the table is not found. Thanks in advance.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-SchemaRDD-SQL-exceptions-i-e-table-not-found-exception-tp21645.html
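A minimal sketch of one way to trap such a failure, assuming (as the Spark 1.x analyzer did) that an unknown table surfaces as a RuntimeException whose message contains "Table Not Found". `runQuery` below is a stand-in for `sqlContext.sql(...)`, not the real API:

```java
// Sketch: trapping a "table not found" failure around a SQL call.
public class QueryErrorDemo {

    // Stand-in for sqlContext.sql(sql); here it always fails the way the
    // Spark 1.x analyzer reports an unknown table.
    static String runQuery(String sql) {
        throw new RuntimeException("Table Not Found: testTable");
    }

    // Wrap the call and translate the analysis failure into a handled result.
    static String safeQuery(String sql) {
        try {
            return runQuery(sql);
        } catch (RuntimeException e) {
            String msg = e.getMessage();
            if (msg != null && msg.contains("Table Not Found")) {
                return "TABLE_MISSING";
            }
            throw e; // unrelated failures still propagate
        }
    }

    public static void main(String[] args) {
        System.out.println(safeQuery("SELECT * FROM missingTable"));
    }
}
```

In later Spark versions the analyzer throws a typed org.apache.spark.sql.AnalysisException, which can be caught directly instead of matching on the message text.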
how to avoid Spark and Hive log from Application log
Hi, can somebody please help me keep Spark and Hive logs out of my application log? Both Spark and Hive use a log4j property file. I have configured my log4j.properties for my application as below, but it is printing Spark and Hive console logging as well. Please suggest; it is urgent for me. I am running the application in an HDFS environment.

    log4j.rootLogger=DEBUG, debugLog, SplLog
    log4j.appender.debugLog=org.apache.log4j.RollingFileAppender
    log4j.appender.debugLog.File=logs/Debug.log
    log4j.appender.debugLog.MaxFileSize=10MB
    log4j.appender.debugLog.MaxBackupIndex=10
    log4j.appender.debugLog.layout=org.apache.log4j.PatternLayout
    log4j.appender.debugLog.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n
    log4j.appender.debugLog.filter.f1=org.apache.log4j.varia.LevelRangeFilter
    log4j.appender.debugLog.filter.f1.LevelMax=DEBUG
    log4j.appender.debugLog.filter.f1.LevelMin=DEBUG
    log4j.appender.SplLog=org.apache.log4j.RollingFileAppender
    log4j.appender.SplLog.File=logs/AppSplCmd.log
    log4j.appender.SplLog.MaxFileSize=10MB
    log4j.appender.SplLog.MaxBackupIndex=10
    log4j.appender.SplLog.layout=org.apache.log4j.PatternLayout
    log4j.appender.SplLog.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n
    log4j.appender.SplLog.filter.f1=org.apache.log4j.varia.LevelRangeFilter
    log4j.appender.SplLog.filter.f1.LevelMax=FATAL
    log4j.appender.SplLog.filter.f1.LevelMin=INFO
    log4j.logger.debugLogger=DEBUG, debugLog
    log4j.additivity.debugLogger=false
    log4j.logger.AppSplLogger=INFO, SplLog
    log4j.additivity.AppSplLogger=false

Thanks in advance.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-avoid-Spark-and-Hive-log-from-Application-log-tp21615.html
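One common approach (a sketch, not from the thread): because the root logger is set to DEBUG, the Spark and Hive packages inherit that level, so their output lands in every appender. Raising those packages' logger levels explicitly usually quiets them. The logger names below are the standard Spark/Hadoop/Hive package names; adjust to your versions.

```properties
# Keep application loggers verbose, but raise third-party packages
# so Spark and Hive internals stop flooding the application log.
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.hadoop.hive=WARN
log4j.logger.hive=WARN
```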
getting error when submit spark with master as yarn
Hi, when I am trying to execute my program as:

    spark-submit --master yarn --class com.mytestpack.analysis.SparkTest sparktest-1.jar

I am getting the error below:

java.lang.IllegalArgumentException: Required executor memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster!
    at org.apache.spark.deploy.yarn.ClientBase$class.verifyClusterResources(ClientBase.scala:71)
    at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:77)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:335)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)

I am new to the Hadoop environment. Please help: how/where do I need to set the memory or any other configuration? Thanks in advance.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/getting-error-when-submit-spark-with-master-as-yarn-tp21542.html
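The error says YARN's per-container cap (1024 MB) is smaller than the executor memory plus overhead (1024+384 MB). A sketch of the usual fix, raising the cap in yarn-site.xml on the cluster; the property names are the standard YARN ones, but the values below are illustrative only:

```xml
<!-- yarn-site.xml sketch: raise the per-container maximum so that
     executor memory + overhead (1024+384 MB) fits. Restart YARN after editing. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
```

Alternatively, request a smaller executor (e.g. `--executor-memory 512m`) so the total stays under the existing 1024 MB cap.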
how to send JavaDStream RDD using foreachRDD using Java
Hi, I want to send streaming data to a Kafka topic. I have RDD data which I converted into a JavaDStream, and now I want to send it to a Kafka topic. I don't want the Kafka sending code, just the foreachRDD implementation. My code looks like this:

    public void publishtoKafka(ITblStream t) {
        MyTopicProducer MTP = ProducerFactory.createProducer(hostname + ":" + port);
        JavaDStream<String> rdd = (JavaDStream<String>) t.getRDD();
        rdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) throws Exception {
                KafkaUtils.sendDataAsString(MTP, topicName, RDDData); /* RDDData is the String message I need from the rdd */
                return null;
            }
        });
        log.debug("sent to kafka: --");
    }

Here MyTopicProducer creates the producer, which is working fine, and KafkaUtils.sendDataAsString is the method that publishes data to the Kafka topic, which is also working fine. I have only one problem: I am not able to convert the JavaDStream rdd to a String using foreach or foreachRDD. Finally I need a String message from the RDDs. Kindly suggest Java code only, and I don't want to use anonymous classes. Please send me only the part that sends the JavaDStream RDD using foreachRDD with a Function call. Thanks in advance.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-send-JavaDStream-RDD-using-foreachRDD-using-Java-tp21456.html
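The missing step boils down to collecting each micro-batch and joining its records into one String. Sketched below without the Spark types (which are stubbed out): inside `call(JavaRDD<String> rdd)` one would pass `rdd.collect()` to a helper like this. Collecting pulls the batch to the driver, so this assumes small micro-batches; `batchToString` is an illustrative name, not a Spark API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the batch-to-String step with Spark stubbed out: inside
// call(JavaRDD<String> rdd), pass rdd.collect() to batchToString, then
// hand the result to the Kafka send method.
public class BatchToString {

    // Join one micro-batch's records into a single message payload.
    static String batchToString(List<String> collected) {
        return String.join("\n", collected);
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("rec1", "rec2", "rec3");
        System.out.println(batchToString(batch));
    }
}
```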
Spark SQL implementation error
I have a table (CSV file) and loaded its data by creating a POJO as per the table structure, and created a SchemaRDD as under:

    JavaRDD<Test1> testSchema = sc.textFile("D:/testTable.csv").map(GetTableData); /* GetTableData will transform all the table data into Test objects */
    JavaSchemaRDD schemaTest = sqlContext.applySchema(testSchema, Test.class);
    schemaTest.registerTempTable("testTable");
    JavaSchemaRDD sqlQuery = sqlContext.sql("SELECT * FROM testTable");
    List<String> totDuration = sqlQuery.map(new Function<Row, String>() {
        public String call(Row row) {
            return "Field1 is: " + row.getInt(0);
        }
    }).collect();

It is working fine, but if I change the query (the rest of the code is the same) to:

    JavaSchemaRDD sqlQuery = sqlContext.sql("SELECT sum(field1) FROM testTable GROUP BY field2");

I get the error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.ShuffledRDD.<init>(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/Partitioner;)V

Please help and suggest.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-implementation-error-tp20901.html
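A NoSuchMethodError on a Spark internal such as ShuffledRDD.&lt;init&gt; typically means the Spark version the job was compiled against differs from the one on the cluster; the group-by is simply the first query that exercises a shuffle and hits the mismatch. A sketch of the usual remedy in the build file, with illustrative version numbers (use whatever your cluster actually runs):

```xml
<!-- pom.xml sketch: compile against the same Spark version the cluster runs,
     and mark it provided so the cluster's own jars are used at runtime. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.2.0</version>
  <scope>provided</scope>
</dependency>
```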
JavaRDD (Data Aggregation) based on key
Hi, I have a CSV file having fields a,b,c. I want to do aggregation (sum, average, ...) based on any field (a, b, or c) as per user input, using the Apache Spark Java API. Please help, urgent! Thanks in advance. Regards, Sachin

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JavaRDD-Data-Aggregation-based-on-key-tp20828.html
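In Spark's Java API this shape is a mapToPair on the user-chosen key field followed by reduceByKey. The aggregation logic itself, sketched in plain Java so it runs stand-alone (class and method names are illustrative, not a Spark API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of key-based aggregation over CSV rows "a,b,c". In Spark the same
// shape is mapToPair(row -> (fields[keyIdx], fields[valueIdx])) followed by
// reduceByKey(Double::sum).
public class CsvAggregate {

    // Sum the numeric field at valueIdx, grouped by the field at keyIdx.
    static Map<String, Double> sumBy(List<String> rows, int keyIdx, int valueIdx) {
        return rows.stream()
                .map(r -> r.split(","))
                .collect(Collectors.groupingBy(
                        f -> f[keyIdx],
                        Collectors.summingDouble(f -> Double.parseDouble(f[valueIdx]))));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("x,1,10", "y,2,20", "x,3,30");
        // Group by field a (index 0), sum field c (index 2).
        System.out.println(sumBy(rows, 0, 2));
    }
}
```

Swapping `Collectors.summingDouble` for `Collectors.averagingDouble` gives the average; the key/value indices come from the user input, so any of a, b, or c can be the grouping field.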