[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281354#comment-15281354 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- I think this issue was introduced around SPARK-13664, but there have been many underlying changes since then. If you need any more debugging info, let me know.
> Problem Reading partitioned ORC or Parquet files
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280032#comment-15280032 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- [~sowen], The partitioned data exists and is readable by Spark, I can read it if I manually specify the partition:
{code}
scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet/id=0").show
+-----+
| text|
+-----+
|hello|
|world|
+-----+

scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet/id=1").show
+-----+
| text|
+-----+
|hello|
|there|
+-----+
{code}
> Problem Reading partitioned ORC or Parquet files
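Since loading each partition directory individually works, one workaround worth trying — offered here as an untested sketch on my part, not a confirmed fix for this regression — is to load the leaf directories explicitly while declaring the table root via the {{basePath}} option, so partition discovery still infers the {{id}} column:

{code:title=Possible workaround (untested sketch)}
// Assumes the same paths as the reproduction above. The "basePath" option
// tells Spark where the partitioned layout starts, so reading the leaf
// directories directly still yields an `id` partition column.
val df = spark.read
  .option("basePath", "hdfs://master:8020/user/spark/test.parquet")
  .parquet(
    "hdfs://master:8020/user/spark/test.parquet/id=0",
    "hdfs://master:8020/user/spark/test.parquet/id=1")
df.show()
{code}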
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279996#comment-15279996 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- Hello [~sowen], using the full URL I still get the same error:
{code}
scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet").show(1)
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
{code}
> Problem Reading partitioned ORC or Parquet files
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261623#comment-15261623 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- [~bomeng] I have just retested it with the latest master commit be317d4a90b3ca906fefeb438f89a09b1c7da5a8 and I am still getting the same error. Have you tested this with HDFS?
{code:title=spark-shell}
scala> ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
  ...
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
  ...
78 more
{code}
> java.io.FileNotFoundException: Path is not a file: > /user/spark/test.parquet/id=0 > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAcces
[jira] [Created] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
Sebastian YEPES FERNANDEZ created SPARK-14959:
-
Summary: Problem Reading partitioned ORC or Parquet files
Key: SPARK-14959
URL: https://issues.apache.org/jira/browse/SPARK-14959
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.0.0
Environment: Hadoop 2.7.1.2.4.0.0-169 (HDP 2.4)
Reporter: Sebastian YEPES FERNANDEZ
Priority: Critical

Hello,
I have noticed that in the past days there has been an issue when trying to read partitioned files from HDFS.
I am running on Spark master branch #c544356
The write actually works but the read fails.
{code:title=Issue Reproduction}
case class Data(id: Int, text: String)
val ds = spark.createDataset( Seq(Data(0, "hello"), Data(1, "hello"), Data(0, "world"), Data(1, "there")) )
scala> ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
  at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285)
  at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:221)
  at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:217)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:228)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:209)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:372)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:360)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9.apply(fileSourceInter
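{code}
A note on what the error means: partitionBy("id") lays the dataset out as one sub-directory per partition value, so the root that the failing read scans contains directories rather than files, roughly like this (a sketch; the actual part-file names will differ):

{code}
# hdfs dfs -ls /user/spark/test.parquet
/user/spark/test.parquet/_SUCCESS
/user/spark/test.parquet/id=0    <- directory holding part-*.parquet files
/user/spark/test.parquet/id=1
{code}

The "Path is not a file" exception is the NameNode refusing getBlockLocations on the id=0 directory, which suggests the new file listing code is passing partition directories where it should pass only leaf data files.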
[jira] [Commented] (SPARK-12239) SparkR - Not distributing SparkR module in YARN
[ https://issues.apache.org/jira/browse/SPARK-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050270#comment-15050270 ] Sebastian YEPES FERNANDEZ commented on SPARK-12239: --- [~sunrui] Thanks for the workaround, it works! Actually, our real use case is to use SparkR through RStudio Server; I just used plain R to simplify reproducing the problem.
> SparkR - Not distributing SparkR module in YARN
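For readers hitting the same thing before the fix lands: YARN's distributed cache supports an alias suffix on archives, which gives the extracted folder the name the workers expect. I am assuming this is the workaround referred to above; the setting below is a sketch, not verified against 1.5.2:

{code:title=spark-defaults.conf (sketch)}
# The "#sparkr" fragment names the link YARN creates in the container
# working directory, so executors see "sparkr/" rather than "sparkr.zip"
spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip#sparkr
{code}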
[jira] [Created] (SPARK-12239) SparkR - Not distributing SparkR module in YARN
Sebastian YEPES FERNANDEZ created SPARK-12239:
-
Summary: SparkR - Not distributing SparkR module in YARN
Key: SPARK-12239
URL: https://issues.apache.org/jira/browse/SPARK-12239
Project: Spark
Issue Type: Bug
Components: SparkR, YARN
Affects Versions: 1.5.2, 1.5.3
Reporter: Sebastian YEPES FERNANDEZ
Priority: Critical

Hello,
I am trying to use SparkR in a YARN environment and I have encountered the following problem:
Everything works correctly when using bin/sparkR, but if I try running the same jobs using sparkR directly through R it does not work.
I have managed to track down what is causing the problem: when sparkR is launched through R, the "SparkR" module is not distributed to the worker nodes.
I have tried working around this issue using the setting "spark.yarn.dist.archives", but it does not work as it deploys the file/extracted folder with the extension ".zip", and the workers are actually looking for a folder with the name "sparkr".
Is there currently any way to make this work?
{code}
# spark-defaults.conf
spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip

# R
library(SparkR, lib.loc="/opt/apps/spark/R/lib/")
sc <- sparkR.init(appName="SparkR", master="yarn-client", sparkEnvir=list(spark.executor.instances="1"))
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)
15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
{code}
Container stderr:
{code}
15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 530.0 MB)
15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file
'/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02/sparkr/SparkR/worker/daemon.R': No such file or directory
15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
  at java.net.ServerSocket.implAccept(ServerSocket.java:545)
  at java.net.ServerSocket.accept(ServerSocket.java:513)
  at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
{code}
Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02
total 71M
drwx--x--- 3 yarn hadoop 4.0K Dec 9 09:04 .
drwx--x--- 7 yarn hadoop 4.0K Dec 9 09:04 ..
-rw-r--r-- 1 yarn hadoop  110 Dec 9 09:03 container_tokens
-rw-r--r-- 1 yarn hadoop   12 Dec 9 09:03 .container_tokens.crc
-rwx------ 1 yarn hadoop  736 Dec 9 09:03 default_container_executor_session.sh
-rw-r--r-- 1 yarn hadoop   16 Dec 9 09:03 .default_container_executor_session.sh.crc
-rwx------ 1 yarn hadoop  790 Dec 9 09:03 default_container_executor.sh
-rw-r--r-- 1 yarn hadoop   16 Dec 9 09:03 .default_container_executor.sh.crc
-rwxr-xr-x 1 yarn hadoop  61K Dec 9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
-rwxr-xr-x 1 yarn hadoop 317K Dec 9 09:04 kafka-clients-0.8.2.2.jar
-rwx------ 1 yarn hadoop 6.0K Dec 9 09:03 launch_container.sh
-rw-r--r-- 1 yarn hadoop   56 Dec 9 09:03 .launch_container.sh.crc
-rwxr-xr-x 1 yarn hadoop 2.2M Dec 9 09:04 spark-cassandra-connector_2.10-1.5.0-M3.jar
-rwxr-xr-x 1 yarn hadoop 7.1M Dec 9 09:04 spark-csv-assembly-1.3.0.jar
lrwxrwxrwx 1 yarn hadoop  119 Dec 9 09:03 __spark__.jar -> /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
lrwxrwxrwx 1 yarn hadoop   84 Dec 9 09:03 sparkr.zip ->
/hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
-rwxr-xr-x 1 yarn hadoop 1.8M Dec 9 09:04 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  11M Dec 9 09:04 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  48M Dec 9 09:04 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
drwx--x--- 2 yarn hadoop   46 Dec 9 09:04 tmp
{code}
*Working case:*
{code}
# sparkR --master yarn-client --num-executors 1
df <- createDataFrame(sqlContext, faithful)
head(df)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
{code}
Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/appcache/application_1445706872
[jira] [Comment Edited] (SPARK-11170) EOFException on History server reading in progress lz4
[ https://issues.apache.org/jira/browse/SPARK-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962107#comment-14962107 ] Sebastian YEPES FERNANDEZ edited comment on SPARK-11170 at 10/17/15 11:06 PM: -- I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. Update: This workaround also works was (Author: syepes): I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. > EOFException on History server reading in progress lz4 > > > Key: SPARK-11170 > URL: https://issues.apache.org/jira/browse/SPARK-11170 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > The Spark History server is not able to read/save the jobs history if Spark > is configured to use > "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it > continuously generated the following error: > {code} > ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Exception encountered when attempting to load application log > hdfs://DATA/user/spark/his > tory/application_1444297190346_0073_1.lz4.inprogress > java.io.EOFException: Stream ended prematurely > at > net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) > at > net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) > at > net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at 
java.io.BufferedReader.readLine(BufferedReader.java:324) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Replaying log path: > hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i > nprogress > {code} > As a workaround setting 
> "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" > makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
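The two codec settings discussed in this thread can be captured in `spark-defaults.conf`. This is a sketch of the reported workaround, assuming the stock Spark 1.5.x property names; only the Snappy switch was confirmed to avoid the replay error, while the event-log setting was initially reported not to take effect:

```properties
# SPARK-11170 workaround: the history server cannot replay .lz4.inprogress
# logs, but replays Snappy-compressed logs correctly.
spark.io.compression.codec org.apache.spark.io.SnappyCompressionCodec

# Per the comment above, disabling event-log compression was first reported
# as still writing the file with the IO codec; the edited comment later
# states this workaround also works.
spark.eventLog.compress false
```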
[jira] [Commented] (SPARK-11170) EOFException on History server reading in progress lz4
[ https://issues.apache.org/jira/browse/SPARK-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962107#comment-14962107 ] Sebastian YEPES FERNANDEZ commented on SPARK-11170: --- I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. > EOFException on History server reading in progress lz4 > > > Key: SPARK-11170 > URL: https://issues.apache.org/jira/browse/SPARK-11170 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > The Spark History server is not able to read/save the jobs history if Spark > is configured to use > "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it > continuously generated the following error: > {code} > ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Exception encountered when attempting to load application log > hdfs://DATA/user/spark/his > tory/application_1444297190346_0073_1.lz4.inprogress > java.io.EOFException: Stream ended prematurely > at > net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) > at > net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) > at > net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) > at > 
org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Replaying log path: > hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i > nprogress > {code} > As a workaround setting > "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" > makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11170) EOFException on History server reading in progress lz4
Sebastian YEPES FERNANDEZ created SPARK-11170: - Summary: EOFException on History server reading in progress lz4 Key: SPARK-11170 URL: https://issues.apache.org/jira/browse/SPARK-11170 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.5.1 Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) Spark: 1.5.x (c27e1904) Reporter: Sebastian YEPES FERNANDEZ The Spark History server is not able to read/save the jobs history if Spark is configured to use "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it continuously generated the following error: {code} ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://DATA/user/spark/his tory/application_1444297190346_0073_1.lz4.inprogress java.io.EOFException: Stream ended prematurely at net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) at java.io.BufferedReader.readLine(BufferedReader.java:389) at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) at 
org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: Replaying log path: hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i nprogress {code} As a workaround setting "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian YEPES FERNANDEZ updated SPARK-11147: -- Attachment: SparkUI.png > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > Attachments: SparkUI.png, SparkUI.png > > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? 
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962043#comment-14962043 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- OK, I have found the source of my problem. For a bit of background, all of our nodes are multihomed: one public 1GB NIC only for admin access and a second internal 10GB NIC dedicated to all the cluster traffic (yarn, hdfs, spark...). Last night, after looking at the source code, I thought it could actually be a networking issue, so I tried several settings and found a solution. Solution:
{code}
# Globally export this variable on all the nodes with their corresponding internal IP
echo "export SPARK_LOCAL_IP=192.168.1.x" >>/etc/profile
# Restart all the YARN services
{code}
After making these changes, when I submit a job in cluster mode I can access the SparkUI. Everything now works, but there is still a strange thing in the UI: as Steve mentioned, the "webproxy" settings are showing the incorrect app IDs (see attachment) > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961088#comment-14961088 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- I don't think its a networking issue as until now we have not had any issue like this, we are regularly submitting jobs in client mode and all worker nodes communicate correctly. What part of the logs (yarn or spark) would be the most useful so we can pinpoint this problem. Note: Between all the servers there are no firewalls nor OS filtering. > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960801#comment-14960801 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- We can reproduce this issue with the SparkPi example: The SparkUI Works: {code} spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /opt/spark/lib/spark-examples-1.5.2-SNAPSHOT-hadoop2.7.1.jar 1 {code} The SparkUI does not Work: {code} spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /opt/spark/lib/spark-examples-1.5.2-SNAPSHOT-hadoop2.7.1.jar 1 {code} > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
Sebastian YEPES FERNANDEZ created SPARK-11147: - Summary: HTTP 500 if try to access Spark UI in yarn-cluster Key: SPARK-11147 URL: https://issues.apache.org/jira/browse/SPARK-11147 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.5.1 Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) Spark: 1.5.x (c27e1904) Reporter: Sebastian YEPES FERNANDEZ Hello, I am facing a similar issue as described in SPARK-5837, but is my case the SparkUI only work in "yarn-client" mode. If a run the same job using "yarn-cluster" I get the HTTP 500 error: {code} HTTP ERROR 500 Problem accessing /proxy/application_1444297190346_0085/. Reason: Connection to http://XX.XX.XX.XX:55827 refused Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://XX.XX.XX.XX:55827 refused at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) {code} I have verified that the UI port "55827" is actually Listening on the worker node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 The strange thing is the its redirecting me to the app "_0082" and not the actually running job "_0085" Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
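As a comment earlier in this thread reports, this failure was eventually traced to multihoming, fixed by exporting SPARK_LOCAL_IP. The mechanism can be sketched in plain Java: on a multihomed node, resolving the local hostname picks whichever NIC `/etc/hosts` or DNS happens to list, not necessarily the cluster-facing one. The helper below is hypothetical (it only mimics the documented override order of SPARK_LOCAL_IP over hostname resolution, not Spark's actual `Utils.localHostName` code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of why SPARK_LOCAL_IP matters on a multihomed host:
// without an explicit override, the bind address is whatever the hostname
// resolves to, which may be the admin NIC instead of the cluster NIC.
public class LocalIpSketch {
    static String pickLocalIp(String sparkLocalIp) throws UnknownHostException {
        if (sparkLocalIp != null && !sparkLocalIp.isEmpty()) {
            return sparkLocalIp;  // explicit override wins, as with SPARK_LOCAL_IP
        }
        // Fallback: hostname resolution, which is host-configuration dependent
        return InetAddress.getLocalHost().getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Honour the environment variable if set, otherwise resolve the hostname
        System.out.println(pickLocalIp(System.getenv("SPARK_LOCAL_IP")));
        System.out.println(pickLocalIp("192.168.1.10"));  // forced internal address
    }
}
```

With the override in place, every Spark daemon on the node binds to the internal 10GB interface, which is why the proxy connection stops being refused.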
[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory
[ https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907747#comment-14907747 ] Sebastian YEPES FERNANDEZ commented on SPARK-10309: --- Is there currently any workaround for this issue? I am also facing it with the latest 1.5.1:
{code:title=Error|borderStyle=solid}
Caused by: java.io.IOException: Unable to acquire 33554432 bytes of memory
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
	at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
	at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
	... 
8 more {code} > Some tasks failed with Unable to acquire memory > --- > > Key: SPARK-10309 > URL: https://issues.apache.org/jira/browse/SPARK-10309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Davies Liu > > While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on > executor): > {code} > java.io.IOException: Unable to acquire 33554432 bytes of memory > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68) > at > org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146) > at > org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169) > at > org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169) > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > The task could finished after retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
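The workaround question above is not answered in this excerpt. For context, the mitigations commonly suggested for this class of allocation failure on 1.5.x were to shrink the Tungsten page size (so each sorter acquisition requests less memory at once) and to lower per-executor task concurrency. These are unconfirmed suggestions, not fixes from this thread:

```properties
# Hypothetical mitigation sketch for "Unable to acquire N bytes of memory"
# on Spark 1.5.x; suggestions circulating at the time, not confirmed fixes.

# Smaller Tungsten pages mean each sorter acquisition asks for less at once.
spark.buffer.pageSize 2m

# Fewer concurrent tasks per executor compete for the same execution memory.
spark.executor.cores 2
```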
[jira] [Updated] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian YEPES FERNANDEZ updated SPARK-9503: - Description: Hello, I have just started using start-mesos-dispatcher and have been noticing that some random crashes NPE's By looking at the exception it looks like in certain situations the "queuedDrivers" is empty and causes the NPE "submission.cores" https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516 {code:title=log|borderStyle=solid} 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077 Exception in thread "Thread-1647" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512) I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-' 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED {code} A side effect of this NPE is that after the crash the dispatcher will not start because its already registered #SPARK-7831 {code:title=log|borderStyle=solid} 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081 I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0 I0731 
09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050 I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. Attempting to register without authentication I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register' I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038' 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED 15/07/31 09:55:47 INFO Utils: Shutdown hook called {code} I can get around this by removing the zk data: {code:title=zkCli.sh|borderStyle=solid} rmr /spark_mesos_dispatcher {code}
> Mesos dispatcher NullPointerException (MesosClusterScheduler) > - > > Key: SPARK-9503 > URL:
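The empty-queue scenario described above can be sketched with a defensive guard. This is a hypothetical illustration (the `Submission` case class and `QueueGuard` helper are invented for clarity, and this is not the actual Spark patch); it only shows the kind of check that avoids dereferencing entries of an empty or null `queuedDrivers` collection:

```scala
// Hypothetical sketch, not the Spark fix: avoid an NPE when the queue of
// queued drivers is empty or null before reading fields such as `cores`.
case class Submission(id: String, cores: Int)

object QueueGuard {
  // Total cores requested by queued drivers; 0 for an empty or null queue.
  def requestedCores(queuedDrivers: Seq[Submission]): Int =
    Option(queuedDrivers).getOrElse(Seq.empty[Submission]).map(_.cores).sum
}
```

The point of the `Option(...)`/`getOrElse` pattern is simply that every code path yields a concrete (possibly empty) sequence before any per-element field is read.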
[jira] [Created] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
Sebastian YEPES FERNANDEZ created SPARK-9503: Summary: Mesos dispatcher NullPointerException (MesosClusterScheduler) Key: SPARK-9503 URL: https://issues.apache.org/jira/browse/SPARK-9503 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.4.1 Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83 Reporter: Sebastian YEPES FERNANDEZ Hello, I have just started using start-mesos-dispatcher and have been noticing some random NPE crashes. Looking at the exception, it appears that in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores": https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516 {code:title=log|borderStyle=solid} 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077 Exception in thread "Thread-1647" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512) I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-' 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED {code} A side effect of this NPE is that after the crash the dispatcher will not start, because it is already registered (see SPARK-7831). I can get
around this by removing the zk data: {code:title=zkCli.sh|borderStyle=solid} rmr /spark_mesos_dispatcher {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6921) Spark SQL API "saveAsParquetFile" will output tachyon file with different block size
[ https://issues.apache.org/jira/browse/SPARK-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503921#comment-14503921 ] Sebastian YEPES FERNANDEZ commented on SPARK-6921: -- I can also validate this with v1.3.1 > Spark SQL API "saveAsParquetFile" will output tachyon file with different > block size > > > Key: SPARK-6921 > URL: https://issues.apache.org/jira/browse/SPARK-6921 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: zhangxiongfei >Priority: Blocker > > I ran the code below in the Spark Shell to access parquet files in Tachyon. > 1. First, created a DataFrame by loading a bunch of Parquet files in Tachyon > val ta3 > =sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); > 2. Second, set "fs.local.block.size" to 256M to make sure that the block > size of output files in Tachyon is 256M. > sc.hadoopConfiguration.setLong("fs.local.block.size",268435456) > 3. Third, saved the above DataFrame into Parquet files stored in Tachyon > > ta3.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test"); > After the above code ran successfully, the output parquet files were stored in > Tachyon, but these files have different block sizes; below is the information on > those files in the path > "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test": > File Name Size Block Size > In-Memory Pin Creation Time >_SUCCESS 0.00 B 256.00 MB 100% > NO 04-13-2015 17:48:23:519 > _common_metadata 1088.00 B 256.00 MB 100% NO > 04-13-2015 17:48:23:741 > _metadata 22.71 KB 256.00 MB 100% NO > 04-13-2015 17:48:23:646 > part-r-1.parquet 177.19 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:626 > part-r-2.parquet 177.21 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:636 > part-r-3.parquet 177.02 MB 32.00 MB 100% NO > 04-13-2015 17:46:45:439 > part-r-4.parquet 177.21 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:845 > 
part-r-5.parquet 177.40 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:638 > part-r-6.parquet 177.33 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:648 > It seems that the API saveAsParquetFile does not distribute/broadcast the > Hadoop configuration to executors like other APIs such as > saveAsTextFile do. The configuration "fs.local.block.size" only takes effect on the > Driver. > If I set that configuration before loading the parquet files, the problem is gone.
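The reporter's own workaround (apply the setting before the first read) can be sketched as follows. This is a sketch under assumptions, not a confirmed fix: `sc` and `sqlContext` come from a Spark 1.3 shell, and the `BlockSize.mb` helper is hypothetical, added only so the byte value is not a magic number:

```scala
// Hypothetical helper: megabytes to bytes (256 MB == 268435456 bytes).
object BlockSize {
  def mb(n: Long): Long = n * 1024L * 1024L
}

// In a Spark 1.3 shell (sc and sqlContext provided), set the block size on
// the driver's Hadoop configuration *before* the Parquet files are first
// loaded, so the value is already in place when the input is planned:
//   sc.hadoopConfiguration.setLong("fs.local.block.size", BlockSize.mb(256))
//   val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m")
//   ta3.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test")
```

The ordering is the whole point: per the report, setting the configuration after the load only affects the driver-side metadata files, not the part files written by executors.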
[jira] [Commented] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329628#comment-14329628 ] Sebastian YEPES FERNANDEZ commented on SPARK-5281: -- Also having this issue with 1.2.1 with the standard context (sc) > Registering table on RDD is giving MissingRequirementError > -- > > Key: SPARK-5281 > URL: https://issues.apache.org/jira/browse/SPARK-5281 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: sarsol >Priority: Critical > > Application crashes on this line rdd.registerTempTable("temp") in 1.2 > version when using sbt or Eclipse SCALA IDE > Stacktrace > Exception in thread "main" scala.reflect.internal.MissingRequirementError: > class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with > primordial classloader with boot classpath > [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program > Files\Java\jre7\lib\resources.jar;C:\Program > Files\Java\jre7\lib\rt.jar;C:\Program > Files\Java\jre7\lib\sunrsasign.jar;C:\Program > Files\Java\jre7\lib\jsse.jar;C:\Program > Files\Java\jre7\lib\jce.jar;C:\Program > Files\Java\jre7\lib\charsets.jar;C:\Program > Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found. 
> at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > at > scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) > at > scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) > at > scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) > at > org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) > at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335) > at scala.reflect.api.Universe.typeOf(Universe.scala:59) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115) > at > org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100) > at > org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94) > at > org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33) > at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111) > at > com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43) > at scala.Function0$class.apply$mcV$sp(Function0.scala:40) > at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) > at scala.App$$anonfun$main$1.apply(App.scala:71) > at scala.App$$anonfun$main$1.apply(App.scala:71) > at 
scala.collection.immutable.List.foreach(List.scala:318) > at > scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32) > at scala.App$class.main(App.scala:71)
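For what it's worth, a workaround commonly suggested for this kind of `MissingRequirementError` when running Spark SQL code from sbt is to run the application in a forked JVM, so that runtime reflection sees a plain classpath instead of sbt's layered classloaders. This is an assumption on my part; it is not confirmed anywhere in this thread:

```scala
// build.sbt fragment (hypothetical workaround, not from this ticket):
// fork the `run` task so scala-reflect resolves classes from a normal
// JVM classpath rather than sbt's own classloader hierarchy.
fork in run := true
```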
[jira] [Comment Edited] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329628#comment-14329628 ] Sebastian YEPES FERNANDEZ edited comment on SPARK-5281 at 2/20/15 9:54 PM: --- Also having this issue with 1.2.1 the standard context (sc) was (Author: syepes): Also having this issue with 1.2.1 with the standard context (sc)