[jira] [Comment Edited] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253760#comment-15253760 ]

Yuri Saito edited comment on SPARK-11227 at 4/22/16 11:17 AM:
--------------------------------------------------------------

[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class MAIN_CLASS \
  JAR_PATH

was (Author: x1):
[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}

> Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11227
>                 URL: https://issues.apache.org/jira/browse/SPARK-11227
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: OS: CentOS 6.6
> Memory: 28G
> CPU: 8
> Mesos: 0.22.0
> HDFS: Hadoop 2.6.0-CDH5.4.0 (built by Cloudera Manager)
>            Reporter: Yuri Saito
>
> When running a jar containing a Spark job on an HDFS HA cluster with Mesos and
> Spark 1.5.1, the job throws "java.net.UnknownHostException: nameservice1"
> and fails.
> I ran the following in a terminal:
> {code}
> /opt/spark/bin/spark-submit \
>   --class com.example.Job /jobs/job-assembly-1.0.0.jar
> {code}
> The job then logged the following message:
> {code}
> 15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark003.example.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
>     at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
>     at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
>     at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
>     at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
>     at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
>     at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>     at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
>     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
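
The suggestion in the comment above, shipping the Hadoop client configuration files to the executors with `--files`, can be sketched as a small helper that assembles the `spark-submit` argument list. This is only an illustration: the function name `build_spark_submit`, the `conf` directory, and the class and jar values passed in are assumptions taken from the examples in this thread, not a confirmed fix for the issue.

```python
import posixpath
import shlex


def build_spark_submit(main_class, jar_path, conf_dir="conf",
                       conf_files=("hive-site.xml", "core-site.xml", "hdfs-site.xml")):
    """Assemble a spark-submit argument list that ships Hadoop client configs.

    conf_dir and conf_files are illustrative defaults taken from the comment;
    adjust them to wherever the cluster's *-site.xml files actually live.
    """
    # --files takes a single comma-separated list of paths.
    files = ",".join(posixpath.join(conf_dir, name) for name in conf_files)
    return ["spark-submit", "--files", files, "--class", main_class, jar_path]


# Hypothetical values borrowed from the issue description.
cmd = build_spark_submit("com.example.Job", "/jobs/job-assembly-1.0.0.jar")

# Print a shell-safe rendering of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string keeps each argument intact if it is later passed to `subprocess.run`.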
[jira] [Commented] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253760#comment-15253760 ]

Yuri Saito commented on SPARK-11227:
------------------------------------

[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
{{
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}
}}
[jira] [Comment Edited] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253760#comment-15253760 ]

Yuri Saito edited comment on SPARK-11227 at 4/22/16 11:15 AM:
--------------------------------------------------------------

[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}

was (Author: x1):
[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
{{
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}
}}
[jira] [Comment Edited] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253760#comment-15253760 ]

Yuri Saito edited comment on SPARK-11227 at 4/22/16 11:16 AM:
--------------------------------------------------------------

[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}

was (Author: x1):
[~valgrind_girl]:
Have you run spark-submit with your jar and hive-site.xml?

For example:
spark-submit \
  --files "conf/hive-site.xml,conf/core-site.xml,conf/hdfs-site.xml" \
  --class ${MAIN_CLASS} \
  ${JAR_PATH}
[jira] [Commented] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084457#comment-15084457 ]

Yuri Saito commented on SPARK-11227:
------------------------------------

[~ansonism] Even if you use HiveContext, it doesn't work with Spark 1.5.x?
[jira] [Commented] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969002#comment-14969002 ]

Yuri Saito commented on SPARK-11227:
------------------------------------

[~ste...@apache.org] But in the same environment, Spark 1.4.0 runs successfully.
[jira] [Updated] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuri Saito updated SPARK-11227:
-------------------------------
    Description: 
When running a jar containing a Spark job on an HDFS HA cluster with Mesos and Spark 1.5.1, the job throws "java.net.UnknownHostException: nameservice1" and fails.

I ran the following in a terminal:
{code}
/opt/spark/bin/spark-submit \
  --class com.example.Job /jobs/job-assembly-1.0.0.jar
{code}

The job then logged the following message:
{code}
15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark003.example.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: nameservice1
    ... 41 more
{code}

However, when I switched the Spark cluster from 1.5.1 to 1.4.0 and ran the job again, it completed successfully.
In addition, when I disabled High Availability on HDFS and ran the job, it completed successfully.
So I think Spark 1.5 and higher have a bug here.

Note: I tried these packages on my cluster, but both failed:
* spark-1.5.1-bin-hadoop2.6.tgz
* spark-1.5.1-bin-without-hadoop.tgz

Only *spark-1.4.0-bin-hadoop2.6.tgz* succeeded.

  was:
When running a jar containing a Spark job on an HDFS HA cluster with Mesos and Spark 1.5.1, the job throws "java.net.UnknownHostException: nameservice1" and fails.

I ran the following in a terminal:
{code}
/opt/spark/bin/spark-submit \
  --class com.example.Job /jobs/job-assembly-1.0.0.jar
{code}

The job then logged the following message:
{code}
15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark003.example.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
    at
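
One reading of the trace above: the `NameNodeProxies.createNonHAProxy` frame suggests that the HDFS client on the executor did not see any HA settings for `nameservice1` and therefore tried to resolve it as an ordinary hostname. A quick way to check the configuration that a process would load is to parse `hdfs-site.xml` and look for the standard HA client keys. The sketch below is a minimal illustration under that assumption; the helper names, the namenode ids `nn1`/`nn2`, and the `*.example.com` hosts are hypothetical and not taken from the reporter's cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content for an HA nameservice called nameservice1.
SAMPLE_HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>nn1.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>nn2.example.com:8020</value></property>
  <property>
    <name>dfs.client.failover.proxy.provider.nameservice1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
"""


def load_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    conf = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def missing_ha_keys(conf, nameservice):
    """Return the HA client keys that are absent for the given nameservice."""
    missing = []
    declared = [s.strip() for s in conf.get("dfs.nameservices", "").split(",")]
    if nameservice not in declared:
        missing.append("dfs.nameservices")
    for key in (f"dfs.ha.namenodes.{nameservice}",
                f"dfs.client.failover.proxy.provider.{nameservice}"):
        if not conf.get(key):
            missing.append(key)
    # Every namenode id listed for the nameservice needs an RPC address.
    for nn in conf.get(f"dfs.ha.namenodes.{nameservice}", "").split(","):
        nn = nn.strip()
        key = f"dfs.namenode.rpc-address.{nameservice}.{nn}"
        if nn and not conf.get(key):
            missing.append(key)
    return missing


if __name__ == "__main__":
    conf = load_hadoop_conf(SAMPLE_HDFS_SITE)
    print(missing_ha_keys(conf, "nameservice1"))
```

Running the same check against the `hdfs-site.xml` that is actually visible on an executor host (rather than on the submitting machine) would show whether the HA keys reach the executors at all.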
[jira] [Created] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
Yuri Saito created SPARK-11227: -- Summary: Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1 Key: SPARK-11227 URL: https://issues.apache.org/jira/browse/SPARK-11227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.1, 1.5.0 Environment: OS: CentOS 6.6 Memory: 28G CPU: 8 Mesos: 0.22.0 HDFS: Hadoop 2.6.0-CDH5.4.0 (build by Cloudera Manager) Reporter: Yuri Saito When running jar including Spark Job at HDFS HA Cluster, Mesos and Spark1.5.1, the job throw Exception as "java.net.UnknownHostException: nameservice1" and fail. I do below in Terminal. {code} /opt/spark/bin/spark-submit \ --class com.example.Job /jobs/job-assembly-1.0.0.jar {code} So, job throw below message. {code} 15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark003.example.com): java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:665) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:601) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656) at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436) at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409) at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016) at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016) at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176) at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176) at scala.Option.map(Option.scala:145) at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.UnknownHostException: nameservice1 ...
41 more {code} However, when I switched the cluster from Spark 1.5.1 to Spark 1.4.0 and ran the same job, it completed successfully. So I think Spark 1.5 and higher have a bug at this point. Note: I tried the following packages on my cluster, and both of them failed. * spark-1.5.1-bin-hadoop2.6.tgz * spark-1.5.1-bin-without-hadoop.tgz Only *spark-1.4.0-bin-hadoop2.6.tgz* succeeded. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
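The trace above goes through {{NameNodeProxies.createNonHAProxy}}, which suggests (as an inference from the trace, not a confirmed diagnosis) that the executor treated nameservice1 as a plain host name because the HA settings from hdfs-site.xml were not visible to it. A minimal Python sketch for checking whether a given hdfs-site.xml declares a nameservice follows. The property names {{dfs.nameservices}} and {{dfs.ha.namenodes.<nameservice>}} are standard Hadoop settings; the helper names and the sample XML (including the NameNode IDs nn1 and nn2) are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def hadoop_properties(xml_text):
    """Return the <property> name/value pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def ha_namenodes(props, nameservice):
    """List the NameNode IDs configured for a nameservice, or [] if it is not declared."""
    declared = [s.strip() for s in props.get("dfs.nameservices", "").split(",") if s.strip()]
    if nameservice not in declared:
        return []
    ids = props.get("dfs.ha.namenodes." + nameservice, "")
    return [i.strip() for i in ids.split(",") if i.strip()]


# Hypothetical hdfs-site.xml fragment; the NameNode IDs are made up for illustration.
SAMPLE_HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>nameservice1</value></property>
  <property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
</configuration>"""

props = hadoop_properties(SAMPLE_HDFS_SITE)
print(ha_namenodes(props, "nameservice1"))  # ['nn1', 'nn2']

# An empty configuration, as seen by a process that never loaded hdfs-site.xml.
print(ha_namenodes(hadoop_properties("<configuration/>"), "nameservice1"))  # []
```

If the second case matches what an executor sees, the HDFS client has no way to map nameservice1 to real NameNode addresses and would fall back to resolving it as a host name.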
[jira] [Commented] (SPARK-8535) PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name
[ https://issues.apache.org/jira/browse/SPARK-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609552#comment-14609552 ] Yuri Saito commented on SPARK-8535: --- Could you change the assignee from unassigned to me? PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name --- Key: SPARK-8535 URL: https://issues.apache.org/jira/browse/SPARK-8535 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0 Reporter: Christophe Bourguignat Fix For: 1.5.0 Trying to create a Spark DataFrame from a pandas dataframe with no explicit column name : pandasDF = pd.DataFrame([[1, 2], [5, 6]]) sparkDF = sqlContext.createDataFrame(pandasDF) *** 1 sparkDF = sqlContext.createDataFrame(pandasDF) /usr/local/Cellar/apache-spark/1.4.0/libexec/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio) 344 345 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd()) -- 346 df = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json()) 347 return DataFrame(df, self) 348 /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args) 536 answer = self.gateway_client.send_command(command) 537 return_value = get_return_value(answer, self.gateway_client, -- 538 self.target_id, self.name) 539 540 for temp_arg in temp_args: /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 298 raise Py4JJavaError( 299 'An error occurred while calling {0}{1}{2}.\n'. -- 300 format(target_id, '.', name), value) 301 else: 302 raise Py4JError( Py4JJavaError: An error occurred while calling o87.applySchemaToPythonRDD.
[jira] [Commented] (SPARK-8535) PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name
[ https://issues.apache.org/jira/browse/SPARK-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608133#comment-14608133 ] Yuri Saito commented on SPARK-8535: --- This happens because the implicit names in {{pandas.columns}} are Int, but the {{StructField}} JSON expects {{String}}. So I think the names in {{pandas.columns}} should be converted to {{String}}. I created the PR below. https://github.com/apache/spark/pull/7124 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name --- Key: SPARK-8535 URL: https://issues.apache.org/jira/browse/SPARK-8535 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.4.0 Reporter: Christophe Bourguignat Trying to create a Spark DataFrame from a pandas dataframe with no explicit column name : pandasDF = pd.DataFrame([[1, 2], [5, 6]]) sparkDF = sqlContext.createDataFrame(pandasDF) *** 1 sparkDF = sqlContext.createDataFrame(pandasDF) /usr/local/Cellar/apache-spark/1.4.0/libexec/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio) 344 345 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd()) -- 346 df = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json()) 347 return DataFrame(df, self) 348 /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args) 536 answer = self.gateway_client.send_command(command) 537 return_value = get_return_value(answer, self.gateway_client, -- 538 self.target_id, self.name) 539 540 for temp_arg in temp_args: /usr/local/Cellar/apache-spark/1.4.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 298 raise Py4JJavaError( 299 'An error occurred while calling {0}{1}{2}.\n'. -- 300 format(target_id, '.', name), value) 301 else: 302 raise Py4JError( Py4JJavaError: An error occurred while calling o87.applySchemaToPythonRDD.
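The conversion proposed in the comment above can be sketched on the caller side as follows. This is only an illustration of the idea of turning integer column labels into strings; it is not necessarily what the linked PR does, and the helper name stringify_column_names is hypothetical.

```python
def stringify_column_names(columns):
    """Return the column labels as strings, so that integer labels become '0', '1', ..."""
    return [str(c) for c in columns]


# pd.DataFrame([[1, 2], [5, 6]]) gets the implicit integer labels 0 and 1.
implicit_names = [0, 1]
print(stringify_column_names(implicit_names))  # ['0', '1']
```

As a workaround on an affected version, one could presumably assign the result back with pandasDF.columns = stringify_column_names(pandasDF.columns) before calling sqlContext.createDataFrame(pandasDF).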
[jira] [Commented] (SPARK-8450) PySpark write.parquet raises Unsupported datatype DecimalType()
[ https://issues.apache.org/jira/browse/SPARK-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606812#comment-14606812 ] Yuri Saito commented on SPARK-8450: --- When {{createDataFrame}} is called (via *PySpark*), {{CatalystTypeConverters}} converts Decimal to java.math.BigDecimal. See: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L71 But when {{write.parquet}} is called, {{MutableRowWriteSupport}} forces a cast to Decimal, so an exception occurs. I created the PR below. https://github.com/apache/spark/pull/7106 PySpark write.parquet raises Unsupported datatype DecimalType() --- Key: SPARK-8450 URL: https://issues.apache.org/jira/browse/SPARK-8450 Project: Spark Issue Type: Bug Components: PySpark, SQL Environment: Spark 1.4.0 on Debian Reporter: Peter Hoffmann I'm getting an Exception when I try to save a DataFrame with a DecimalType as a parquet file Minimal Example: from decimal import Decimal from pyspark.sql import SQLContext from pyspark.sql.types import * sqlContext = SQLContext(sc) schema = StructType([ StructField('id', LongType()), StructField('value', DecimalType())]) rdd = sc.parallelize([[1, Decimal(0.5)],[2, Decimal(2.9)]]) df = sqlContext.createDataFrame(rdd, schema) df.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite') Stack Trace --- Py4JJavaError Traceback (most recent call last) ipython-input-19-a77dac8de5f3 in module() 1 sr.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite') /home/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.pyc in parquet(self, path, mode) 367 :param mode: one of `append`, `overwrite`, `error`, `ignore` (default: error) 368 -- 369 return self._jwrite.mode(mode).parquet(path) 370 371 @since(1.4) /home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args) 536 answer =
self.gateway_client.send_command(command) 537 return_value = get_return_value(answer, self.gateway_client, -- 538 self.target_id, self.name) 539 540 for temp_arg in temp_args: /home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 298 raise Py4JJavaError( 299 'An error occurred while calling {0}{1}{2}.\n'. -- 300 format(target_id, '.', name), value) 301 else: 302 raise Py4JError( Py4JJavaError: An error occurred while calling o361.parquet. : org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:138) at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939) at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135) at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:259) at
[jira] [Commented] (SPARK-8498) Fix NullPointerException in error-handling path in UnsafeShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602170#comment-14602170 ] Yuri Saito commented on SPARK-8498: --- I think {{sorter.cleanupAfterError()}} throws {{SparkException}}, not {{IOException}}, but the {{write}} method only declares that it throws {{IOException}}. {code} public void write(scala.collection.Iterator<Product2<K, V>> records) throws IOException {code} So the {{UnsafeShuffleWriter}} class may fail to compile. Fix NullPointerException in error-handling path in UnsafeShuffleWriter -- Key: SPARK-8498 URL: https://issues.apache.org/jira/browse/SPARK-8498 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 1.4.0 Reporter: Josh Rosen Assignee: holdenk Fix For: 1.5.0 This bug was reported by [~prudenko] on the dev list. When the {{tungsten-sort}} shuffle manager was enabled, an executor died with the following exception: {code} 15/06/19 17:53:35 WARN TaskSetManager: Lost task 38.0 in stage 41.0 (TID 3176, ip-10-50-225-214.ec2.internal): java.lang.NullPointerException at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:151) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} I think that this is actually due to an error-handling issue.
In the stack trace, the NPE is being thrown from an error-handling branch of a `finally` block: {code} public void write(scala.collection.Iterator<Product2<K, V>> records) throws IOException { boolean success = false; try { while (records.hasNext()) { insertRecordIntoSorter(records.next()); } closeAndWriteOutput(); success = true; } finally { if (!success) { sorter.cleanupAfterError(); // this is the line throwing the error } } } {code} I suspect that what's happening is that an exception is being thrown from user / upstream code in the initial call to records.next(), but the error-handling block is failing because sorter == null since we haven't initialized it yet. We should fix this bug with a {{sorter != null}} check and should also add a regression test to ShuffleSuite to ensure that exceptions thrown by user code at this step of the shuffle write path don't get masked by error-handling bugs inside of the shuffle code.
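The proposed {{sorter != null}} guard can be illustrated with a small Python analogue of the Java method quoted above. This is a sketch of the pattern only, not the actual Spark fix: the classes Sorter and ShuffleWriter, their methods, and the generator failing_records are hypothetical stand-ins. As in Java, an exception raised inside a Python finally block replaces the one that was already propagating, so the unguarded version would hide the user's error.

```python
class Sorter:
    """Stand-in for the sorter that the writer creates lazily."""

    def __init__(self):
        self.records = []
        self.cleaned_up = False

    def insert(self, record):
        self.records.append(record)

    def cleanup_after_error(self):
        self.cleaned_up = True


class ShuffleWriter:
    def __init__(self):
        self.sorter = None  # not created until the first record is inserted

    def insert_record_into_sorter(self, record):
        if self.sorter is None:
            self.sorter = Sorter()
        self.sorter.insert(record)

    def write(self, records):
        success = False
        try:
            for record in records:
                self.insert_record_into_sorter(record)
            success = True
        finally:
            # Guarded cleanup: without the None check, a failure before the
            # first insert would raise AttributeError here and mask the
            # exception coming from the records iterator.
            if not success and self.sorter is not None:
                self.sorter.cleanup_after_error()


def failing_records():
    """Simulates user / upstream code that fails on the very first record."""
    raise ValueError("error in user code")
    yield  # unreachable; only makes this function a generator


writer = ShuffleWriter()
try:
    writer.write(failing_records())
except ValueError as exc:
    print("propagated:", exc)  # propagated: error in user code
```

A regression test along the lines requested in the issue would assert exactly this: the exception seen by the caller is the one raised by the user code, not one raised by the cleanup path.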
[jira] [Commented] (SPARK-5320) Joins on simple table created using select gives error
[ https://issues.apache.org/jira/browse/SPARK-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14374832#comment-14374832 ] Yuri Saito commented on SPARK-5320: --- Thank you very much, Michael Armbrust. Could you change the assignee from unassigned to me (x1 - Yuri Saito)? Joins on simple table created using select gives error -- Key: SPARK-5320 URL: https://issues.apache.org/jira/browse/SPARK-5320 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.1 Reporter: Kuldeep Fix For: 1.3.1, 1.4.0 Register "select 0 as a, 1 as b" as table zeroone. Register "select 0 as x, 1 as y" as table zeroone2. The following SQL, "select * from zeroone ta join zeroone2 tb on ta.a = tb.x", gives the error java.lang.UnsupportedOperationException: LeafNode NoRelation$ must implement statistics.