[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281354#comment-15281354 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- I think this issue was introduced around SPARK-13664, but there have been many underlying changes since then. If you need any more debugging info, let me know.
> Problem Reading partitioned ORC or Parquet files
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280032#comment-15280032 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- [~sowen], The partitioned data exists and is readable by Spark, I can read it if I manually specify the partition:
{code}
scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet/id=0").show
+-----+
| text|
+-----+
|hello|
|world|
+-----+

scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet/id=1").show
+-----+
| text|
+-----+
|hello|
|there|
+-----+
{code}
> Problem Reading partitioned ORC or Parquet files
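Since loading each partition directory individually works, one workaround worth trying — offered here as an untested sketch on my part, not a confirmed fix for this regression — is to load the leaf directories explicitly while declaring the table root via the {{basePath}} option, so partition discovery still infers the {{id}} column:

{code:title=Possible workaround (untested sketch)}
// Assumes the same paths as the reproduction above. The "basePath" option
// tells Spark where the partitioned layout starts, so reading the leaf
// directories directly still yields an `id` partition column.
val df = spark.read
  .option("basePath", "hdfs://master:8020/user/spark/test.parquet")
  .parquet(
    "hdfs://master:8020/user/spark/test.parquet/id=0",
    "hdfs://master:8020/user/spark/test.parquet/id=1")
df.show()
{code}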
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279996#comment-15279996 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- Hello [~sowen], using the full URL I still get the same error:
{code}
scala> spark.read.format("parquet").load("hdfs://master:8020/user/spark/test.parquet").show(1)
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
{code}
> Problem Reading partitioned ORC or Parquet files
[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261623#comment-15261623 ] Sebastian YEPES FERNANDEZ commented on SPARK-14959: --- [~bomeng] I have just retested it with the latest master commit be317d4a90b3ca906fefeb438f89a09b1c7da5a8 and I am still getting the same error. Have you tested this with HDFS?
{code:title=spark-shell}
scala> ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
  ...
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
  ...
78 more
{code}
> java.io.FileNotFoundException: Path is not a file: > /user/spark/test.parquet/id=0 > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAcces
[jira] [Created] (SPARK-14959) Problem Reading partitioned ORC or Parquet files
Sebastian YEPES FERNANDEZ created SPARK-14959:
-
Summary: Problem Reading partitioned ORC or Parquet files
Key: SPARK-14959
URL: https://issues.apache.org/jira/browse/SPARK-14959
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.0.0
Environment: Hadoop 2.7.1.2.4.0.0-169 (HDP 2.4)
Reporter: Sebastian YEPES FERNANDEZ
Priority: Critical

Hello,
I have noticed that in the past days there has been an issue when trying to read partitioned files from HDFS.
I am running on Spark master branch #c544356
The write actually works but the read fails.
{code:title=Issue Reproduction}
case class Data(id: Int, text: String)
val ds = spark.createDataset( Seq(Data(0, "hello"), Data(1, "hello"), Data(0, "world"), Data(1, "there")) )
scala> ds.write.mode(org.apache.spark.sql.SaveMode.Overwrite).format("parquet").partitionBy("id").save("/user/spark/test.parquet")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
java.io.FileNotFoundException: Path is not a file: /user/spark/test.parquet/id=0
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
  at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285)
  at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:221)
  at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:217)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:228)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:209)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:372)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9$$anonfun$apply$4.apply(fileSourceInterfaces.scala:360)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.datasources.HDFSFileCatalog$$anonfun$9.apply(fileSourceInter
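{code}
A note on what the error means: partitionBy("id") lays the dataset out as one sub-directory per partition value, so the root that the failing read scans contains directories rather than files, roughly like this (a sketch; the actual part-file names will differ):

{code}
# hdfs dfs -ls /user/spark/test.parquet
/user/spark/test.parquet/_SUCCESS
/user/spark/test.parquet/id=0    <- directory holding part-*.parquet files
/user/spark/test.parquet/id=1
{code}

The "Path is not a file" exception is the NameNode refusing getBlockLocations on the id=0 directory, which suggests the new file listing code is passing partition directories where it should pass only leaf data files.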
[jira] [Commented] (SPARK-12239) SparkR - Not distributing SparkR module in YARN
[ https://issues.apache.org/jira/browse/SPARK-12239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050270#comment-15050270 ] Sebastian YEPES FERNANDEZ commented on SPARK-12239: --- [~sunrui] Thanks for the workaround, it works! Actually, our real use case is to use SparkR through RStudio Server; I just used plain R to simplify reproducing the problem.
> SparkR - Not distributing SparkR module in YARN
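For readers hitting the same thing before the fix lands: YARN's distributed cache supports an alias suffix on archives, which gives the extracted folder the name the workers expect. I am assuming this is the workaround referred to above; the setting below is a sketch, not verified against 1.5.2:

{code:title=spark-defaults.conf (sketch)}
# The "#sparkr" fragment names the link YARN creates in the container
# working directory, so executors see "sparkr/" rather than "sparkr.zip"
spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip#sparkr
{code}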
[jira] [Created] (SPARK-12239) SparkR - Not distributing SparkR module in YARN
Sebastian YEPES FERNANDEZ created SPARK-12239:
-
Summary: SparkR - Not distributing SparkR module in YARN
Key: SPARK-12239
URL: https://issues.apache.org/jira/browse/SPARK-12239
Project: Spark
Issue Type: Bug
Components: SparkR, YARN
Affects Versions: 1.5.2, 1.5.3
Reporter: Sebastian YEPES FERNANDEZ
Priority: Critical

Hello,
I am trying to use SparkR in a YARN environment and I have encountered the following problem:
Everything works correctly when using bin/sparkR, but if I try running the same jobs using sparkR directly through R it does not work.
I have managed to track down what is causing the problem: when sparkR is launched through R, the "SparkR" module is not distributed to the worker nodes.
I have tried working around this issue using the setting "spark.yarn.dist.archives", but it does not work as it deploys the file/extracted folder with the extension ".zip", and the workers are actually looking for a folder with the name "sparkr".
Is there currently any way to make this work?
{code}
# spark-defaults.conf
spark.yarn.dist.archives /opt/apps/spark/R/lib/sparkr.zip

# R
library(SparkR, lib.loc="/opt/apps/spark/R/lib/")
sc <- sparkR.init(appName="SparkR", master="yarn-client", sparkEnvir=list(spark.executor.instances="1"))
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)
15/12/09 09:04:24 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, fr-s-cour-wrk3.alidaho.com): java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
{code}
Container stderr:
{code}
15/12/09 09:04:14 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 530.0 MB)
15/12/09 09:04:14 INFO r.BufferedStreamThread: Fatal error: cannot open file
'/hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02/sparkr/SparkR/worker/daemon.R': No such file or directory
15/12/09 09:04:24 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.net.SocketTimeoutException: Accept timed out
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
  at java.net.ServerSocket.implAccept(ServerSocket.java:545)
  at java.net.ServerSocket.accept(ServerSocket.java:513)
  at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
{code}
Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk02/hadoop/yarn/local/usercache/spark/appcache/application_1445706872927_1168/container_e44_1445706872927_1168_01_02
total 71M
drwx--x--- 3 yarn hadoop 4.0K Dec 9 09:04 .
drwx--x--- 7 yarn hadoop 4.0K Dec 9 09:04 ..
-rw-r--r-- 1 yarn hadoop  110 Dec 9 09:03 container_tokens
-rw-r--r-- 1 yarn hadoop   12 Dec 9 09:03 .container_tokens.crc
-rwx------ 1 yarn hadoop  736 Dec 9 09:03 default_container_executor_session.sh
-rw-r--r-- 1 yarn hadoop   16 Dec 9 09:03 .default_container_executor_session.sh.crc
-rwx------ 1 yarn hadoop  790 Dec 9 09:03 default_container_executor.sh
-rw-r--r-- 1 yarn hadoop   16 Dec 9 09:03 .default_container_executor.sh.crc
-rwxr-xr-x 1 yarn hadoop  61K Dec 9 09:04 hadoop-lzo-0.6.0.2.3.2.0-2950.jar
-rwxr-xr-x 1 yarn hadoop 317K Dec 9 09:04 kafka-clients-0.8.2.2.jar
-rwx------ 1 yarn hadoop 6.0K Dec 9 09:03 launch_container.sh
-rw-r--r-- 1 yarn hadoop   56 Dec 9 09:03 .launch_container.sh.crc
-rwxr-xr-x 1 yarn hadoop 2.2M Dec 9 09:04 spark-cassandra-connector_2.10-1.5.0-M3.jar
-rwxr-xr-x 1 yarn hadoop 7.1M Dec 9 09:04 spark-csv-assembly-1.3.0.jar
lrwxrwxrwx 1 yarn hadoop  119 Dec 9 09:03 __spark__.jar -> /hadoop/hdfs/disk03/hadoop/yarn/local/usercache/spark/filecache/361/spark-assembly-1.5.3-SNAPSHOT-hadoop2.7.1.jar
lrwxrwxrwx 1 yarn hadoop   84 Dec 9 09:03 sparkr.zip ->
/hadoop/hdfs/disk01/hadoop/yarn/local/usercache/spark/filecache/359/sparkr.zip
-rwxr-xr-x 1 yarn hadoop 1.8M Dec 9 09:04 spark-streaming_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  11M Dec 9 09:04 spark-streaming-kafka-assembly_2.10-1.5.3-SNAPSHOT.jar
-rwxr-xr-x 1 yarn hadoop  48M Dec 9 09:04 sparkts-0.1.0-SNAPSHOT-jar-with-dependencies.jar
drwx--x--- 2 yarn hadoop   46 Dec 9 09:04 tmp
{code}
*Working case:*
{code}
# sparkR --master yarn-client --num-executors 1
df <- createDataFrame(sqlContext, faithful)
head(df)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
{code}
Worker node that ran the container:
{code}
# ls -la /hadoop/hdfs/disk04/hadoop/yarn/local/usercache/spark/appcache/application_1445706872
[jira] [Comment Edited] (SPARK-11170) EOFException on History server reading in progress lz4
[ https://issues.apache.org/jira/browse/SPARK-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962107#comment-14962107 ] Sebastian YEPES FERNANDEZ edited comment on SPARK-11170 at 10/17/15 11:06 PM: -- I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. Update: This workaround also works was (Author: syepes): I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. > EOFException on History server reading in progress lz4 > > > Key: SPARK-11170 > URL: https://issues.apache.org/jira/browse/SPARK-11170 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > The Spark History server is not able to read/save the jobs history if Spark > is configured to use > "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it > continuously generated the following error: > {code} > ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Exception encountered when attempting to load application log > hdfs://DATA/user/spark/his > tory/application_1444297190346_0073_1.lz4.inprogress > java.io.EOFException: Stream ended prematurely > at > net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) > at > net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) > at > net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at 
java.io.BufferedReader.readLine(BufferedReader.java:324) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Replaying log path: > hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i > nprogress > {code} > As a workaround setting 
> "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" > makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
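The two codec settings discussed in this thread can be captured in `spark-defaults.conf`. This is a sketch of the reported workaround, assuming the stock Spark 1.5.x property names; only the Snappy switch was confirmed to avoid the replay error, while the event-log setting was initially reported not to take effect:

```properties
# SPARK-11170 workaround: the history server cannot replay .lz4.inprogress
# logs, but replays Snappy-compressed logs correctly.
spark.io.compression.codec org.apache.spark.io.SnappyCompressionCodec

# Per the comment above, disabling event-log compression was first reported
# as still writing the file with the IO codec; the edited comment later
# states this workaround also works.
spark.eventLog.compress false
```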
[jira] [Commented] (SPARK-11170) EOFException on History server reading in progress lz4
[ https://issues.apache.org/jira/browse/SPARK-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962107#comment-14962107 ] Sebastian YEPES FERNANDEZ commented on SPARK-11170: --- I also tried 'spark.eventLog.compress = false' as a workaround, but it was still saving the file with the IO codec. > EOFException on History server reading in progress lz4 > > > Key: SPARK-11170 > URL: https://issues.apache.org/jira/browse/SPARK-11170 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > The Spark History server is not able to read/save the jobs history if Spark > is configured to use > "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it > continuously generated the following error: > {code} > ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Exception encountered when attempting to load application log > hdfs://DATA/user/spark/his > tory/application_1444297190346_0073_1.lz4.inprogress > java.io.EOFException: Stream ended prematurely > at > net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) > at > net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) > at > net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.readLine(BufferedReader.java:324) > at java.io.BufferedReader.readLine(BufferedReader.java:389) > at > scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) > at > 
org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: > Replaying log path: > hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i > nprogress > {code} > As a workaround setting > "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" > makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11170) EOFException on History server reading in progress lz4
Sebastian YEPES FERNANDEZ created SPARK-11170: - Summary: EOFException on History server reading in progress lz4 Key: SPARK-11170 URL: https://issues.apache.org/jira/browse/SPARK-11170 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.5.1 Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) Spark: 1.5.x (c27e1904) Reporter: Sebastian YEPES FERNANDEZ The Spark History server is not able to read/save the jobs history if Spark is configured to use "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec", it continuously generated the following error: {code} ERROR 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://DATA/user/spark/his tory/application_1444297190346_0073_1.lz4.inprogress java.io.EOFException: Stream ended prematurely at net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:218) at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:150) at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:117) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.readLine(BufferedReader.java:324) at java.io.BufferedReader.readLine(BufferedReader.java:389) at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67) at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55) at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) at 
org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) INFO 2015-10-16 16:21:39 org.apache.spark.deploy.history.FsHistoryProvider: Replaying log path: hdfs://DATA/user/spark/history/application_1444297190346_0072_1.lz4.i nprogress {code} As a workaround setting "spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec" makes the History server work correctly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian YEPES FERNANDEZ updated SPARK-11147: -- Attachment: SparkUI.png > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > Attachments: SparkUI.png, SparkUI.png > > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? 
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962043#comment-14962043 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- OK, I have found the source of my problem. For a bit of background, all of our nodes are multihomed: one public 1GB NIC only for admin access and a second internal 10GB NIC dedicated to all the cluster traffic (yarn, hdfs, spark...). Last night, after looking at the source code, I thought it could actually be a networking issue, so I tried several settings and found a solution. Solution:
{code}
# Globally export this variable on all the nodes with their corresponding internal IP
echo "export SPARK_LOCAL_IP=192.168.1.x" >>/etc/profile
# Restart all the YARN services
{code}
After making these changes, when I submit a job in cluster mode I can access the SparkUI. Everything now works, but there is still a strange thing in the UI: as Steve mentioned, the "webproxy" settings are showing the incorrect app IDs (see attachment) > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961088#comment-14961088 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- I don't think its a networking issue as until now we have not had any issue like this, we are regularly submitting jobs in client mode and all worker nodes communicate correctly. What part of the logs (yarn or spark) would be the most useful so we can pinpoint this problem. Note: Between all the servers there are no firewalls nor OS filtering. > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
[ https://issues.apache.org/jira/browse/SPARK-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960801#comment-14960801 ] Sebastian YEPES FERNANDEZ commented on SPARK-11147: --- We can reproduce this issue with the SparkPi example: The SparkUI Works: {code} spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /opt/spark/lib/spark-examples-1.5.2-SNAPSHOT-hadoop2.7.1.jar 1 {code} The SparkUI does not Work: {code} spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /opt/spark/lib/spark-examples-1.5.2-SNAPSHOT-hadoop2.7.1.jar 1 {code} > HTTP 500 if try to access Spark UI in yarn-cluster > -- > > Key: SPARK-11147 > URL: https://issues.apache.org/jira/browse/SPARK-11147 > Project: Spark > Issue Type: Bug > Components: Web UI, YARN >Affects Versions: 1.5.1 > Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) > Spark: 1.5.x (c27e1904) >Reporter: Sebastian YEPES FERNANDEZ > > Hello, > I am facing a similar issue as described in SPARK-5837, but is my case the > SparkUI only work in "yarn-client" mode. If a run the same job using > "yarn-cluster" I get the HTTP 500 error: > {code} > HTTP ERROR 500 > Problem accessing /proxy/application_1444297190346_0085/. 
Reason: > Connection to http://XX.XX.XX.XX:55827 refused > Caused by: > org.apache.http.conn.HttpHostConnectException: Connection to > http://XX.XX.XX.XX:55827 refused > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) > at > org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) > {code} > I have verified that the UI port "55827" is actually Listening on the worker > node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to > another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 > The strange thing is the its redirecting me to the app "_0082" and not the > actually running job "_0085" > Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11147) HTTP 500 if try to access Spark UI in yarn-cluster
Sebastian YEPES FERNANDEZ created SPARK-11147: - Summary: HTTP 500 if try to access Spark UI in yarn-cluster Key: SPARK-11147 URL: https://issues.apache.org/jira/browse/SPARK-11147 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.5.1 Environment: HDP: 2.3.2.0-2950 (Hadoop 2.7.1.2.3.2.0-2950) Spark: 1.5.x (c27e1904) Reporter: Sebastian YEPES FERNANDEZ Hello, I am facing a similar issue as described in SPARK-5837, but is my case the SparkUI only work in "yarn-client" mode. If a run the same job using "yarn-cluster" I get the HTTP 500 error: {code} HTTP ERROR 500 Problem accessing /proxy/application_1444297190346_0085/. Reason: Connection to http://XX.XX.XX.XX:55827 refused Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://XX.XX.XX.XX:55827 refused at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) {code} I have verified that the UI port "55827" is actually Listening on the worker node, I can even run a "curl http://XX.XX.XX.XX:55827"; and it redirects me to another URL: http://YY.YY.YY.YY:8088/proxy/application_1444297190346_0082 The strange thing is the its redirecting me to the app "_0082" and not the actually running job "_0085" Does anyone have any suggestions on what could be causing this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
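As a comment earlier in this thread reports, this failure was eventually traced to multihoming, fixed by exporting SPARK_LOCAL_IP. The mechanism can be sketched in plain Java: on a multihomed node, resolving the local hostname picks whichever NIC `/etc/hosts` or DNS happens to list, not necessarily the cluster-facing one. The helper below is hypothetical (it only mimics the documented override order of SPARK_LOCAL_IP over hostname resolution, not Spark's actual `Utils.localHostName` code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of why SPARK_LOCAL_IP matters on a multihomed host:
// without an explicit override, the bind address is whatever the hostname
// resolves to, which may be the admin NIC instead of the cluster NIC.
public class LocalIpSketch {
    static String pickLocalIp(String sparkLocalIp) throws UnknownHostException {
        if (sparkLocalIp != null && !sparkLocalIp.isEmpty()) {
            return sparkLocalIp;  // explicit override wins, as with SPARK_LOCAL_IP
        }
        // Fallback: hostname resolution, which is host-configuration dependent
        return InetAddress.getLocalHost().getHostAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Honour the environment variable if set, otherwise resolve the hostname
        System.out.println(pickLocalIp(System.getenv("SPARK_LOCAL_IP")));
        System.out.println(pickLocalIp("192.168.1.10"));  // forced internal address
    }
}
```

With the override in place, every Spark daemon on the node binds to the internal 10GB interface, which is why the proxy connection stops being refused.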
[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory
[ https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907747#comment-14907747 ] Sebastian YEPES FERNANDEZ commented on SPARK-10309: --- Is there currently any workaround for this issue? I am also facing it with the latest 1.5.1:
{code:title=Error|borderStyle=solid}
Caused by: java.io.IOException: Unable to acquire 33554432 bytes of memory
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
	at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:74)
	at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:56)
	at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
	... 
8 more {code} > Some tasks failed with Unable to acquire memory > --- > > Key: SPARK-10309 > URL: https://issues.apache.org/jira/browse/SPARK-10309 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Davies Liu > > While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on > executor): > {code} > java.io.IOException: Unable to acquire 33554432 bytes of memory > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.(UnsafeExternalRowSorter.java:68) > at > org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146) > at > org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169) > at > org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169) > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > The task could finished after retry. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
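The workaround question above is not answered in this excerpt. For context, the mitigations commonly suggested for this class of allocation failure on 1.5.x were to shrink the Tungsten page size (so each sorter acquisition requests less memory at once) and to lower per-executor task concurrency. These are unconfirmed suggestions, not fixes from this thread:

```properties
# Hypothetical mitigation sketch for "Unable to acquire N bytes of memory"
# on Spark 1.5.x; suggestions circulating at the time, not confirmed fixes.

# Smaller Tungsten pages mean each sorter acquisition asks for less at once.
spark.buffer.pageSize 2m

# Fewer concurrent tasks per executor compete for the same execution memory.
spark.executor.cores 2
```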
[jira] [Updated] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian YEPES FERNANDEZ updated SPARK-9503: - Description: Hello, I have just started using start-mesos-dispatcher and have been noticing that some random crashes NPE's By looking at the exception it looks like in certain situations the "queuedDrivers" is empty and causes the NPE "submission.cores" https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516 {code:title=log|borderStyle=solid} 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077 Exception in thread "Thread-1647" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512) I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-' 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED {code} A side effect of this NPE is that after the crash the dispatcher will not start because its already registered #SPARK-7831 {code:title=log|borderStyle=solid} 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081 I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0 I0731 
09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050 I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. Attempting to register without authentication I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register' I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038' 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED 15/07/31 09:55:47 INFO Utils: Shutdown hook called {code} I can get around this by removing the zk data: {code:title=zkCli.sh|borderStyle=solid} rmr /spark_mesos_dispatcher {code}
> Mesos dispatcher NullPointerException (MesosClusterScheduler) > - > > Key: SPARK-9503 > URL:
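The empty-queue scenario described above can be sketched with a defensive guard. This is a hypothetical illustration (the `Submission` case class and `QueueGuard` helper are invented for clarity, and this is not the actual Spark patch); it only shows the kind of check that avoids dereferencing entries of an empty or null `queuedDrivers` collection:

```scala
// Hypothetical sketch, not the Spark fix: avoid an NPE when the queue of
// queued drivers is empty or null before reading fields such as `cores`.
case class Submission(id: String, cores: Int)

object QueueGuard {
  // Total cores requested by queued drivers; 0 for an empty or null queue.
  def requestedCores(queuedDrivers: Seq[Submission]): Int =
    Option(queuedDrivers).getOrElse(Seq.empty[Submission]).map(_.cores).sum
}
```

The point of the `Option(...)`/`getOrElse` pattern is simply that every code path yields a concrete (possibly empty) sequence before any per-element field is read.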
[jira] [Created] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
Sebastian YEPES FERNANDEZ created SPARK-9503: Summary: Mesos dispatcher NullPointerException (MesosClusterScheduler) Key: SPARK-9503 URL: https://issues.apache.org/jira/browse/SPARK-9503 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.4.1 Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83 Reporter: Sebastian YEPES FERNANDEZ Hello, I have just started using start-mesos-dispatcher and have been noticing some random NPE crashes. Looking at the exception, it appears that in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores": https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516 {code:title=log|borderStyle=solid} 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077 Exception in thread "Thread-1647" java.lang.NullPointerException at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512) I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-' 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED {code} A side effect of this NPE is that after the crash the dispatcher will not start, because it is already registered (see SPARK-7831). I can get
around this by removing the zk data: {code:title=zkCli.sh|borderStyle=solid} rmr /spark_mesos_dispatcher {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6921) Spark SQL API "saveAsParquetFile" will output tachyon file with different block size
[ https://issues.apache.org/jira/browse/SPARK-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503921#comment-14503921 ] Sebastian YEPES FERNANDEZ commented on SPARK-6921: -- I can also validate this with v1.3.1 > Spark SQL API "saveAsParquetFile" will output tachyon file with different > block size > > > Key: SPARK-6921 > URL: https://issues.apache.org/jira/browse/SPARK-6921 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 >Reporter: zhangxiongfei >Priority: Blocker > > I ran the code below in the Spark Shell to access parquet files in Tachyon. > 1. First, created a DataFrame by loading a bunch of Parquet files in Tachyon > val ta3 > =sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); > 2. Second, set "fs.local.block.size" to 256M to make sure that the block > size of output files in Tachyon is 256M. > sc.hadoopConfiguration.setLong("fs.local.block.size",268435456) > 3. Third, saved the above DataFrame into Parquet files stored in Tachyon > > ta3.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test"); > After the above code ran successfully, the output parquet files were stored in > Tachyon, but these files have different block sizes; below is the information on > those files in the path > "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test": > File Name Size Block Size > In-Memory Pin Creation Time >_SUCCESS 0.00 B 256.00 MB 100% > NO 04-13-2015 17:48:23:519 > _common_metadata 1088.00 B 256.00 MB 100% NO > 04-13-2015 17:48:23:741 > _metadata 22.71 KB 256.00 MB 100% NO > 04-13-2015 17:48:23:646 > part-r-1.parquet 177.19 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:626 > part-r-2.parquet 177.21 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:636 > part-r-3.parquet 177.02 MB 32.00 MB 100% NO > 04-13-2015 17:46:45:439 > part-r-4.parquet 177.21 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:845 > 
part-r-5.parquet 177.40 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:638 > part-r-6.parquet 177.33 MB 32.00 MB 100% NO > 04-13-2015 17:46:44:648 > It seems that the API saveAsParquetFile does not distribute/broadcast the > Hadoop configuration to executors like other APIs such as > saveAsTextFile do. The configuration "fs.local.block.size" only takes effect on the > Driver. > If I set that configuration before loading the parquet files, the problem is gone.
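The reporter's own workaround (apply the setting before the first read) can be sketched as follows. This is a sketch under assumptions, not a confirmed fix: `sc` and `sqlContext` come from a Spark 1.3 shell, and the `BlockSize.mb` helper is hypothetical, added only so the byte value is not a magic number:

```scala
// Hypothetical helper: megabytes to bytes (256 MB == 268435456 bytes).
object BlockSize {
  def mb(n: Long): Long = n * 1024L * 1024L
}

// In a Spark 1.3 shell (sc and sqlContext provided), set the block size on
// the driver's Hadoop configuration *before* the Parquet files are first
// loaded, so the value is already in place when the input is planned:
//   sc.hadoopConfiguration.setLong("fs.local.block.size", BlockSize.mb(256))
//   val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m")
//   ta3.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m-test")
```

The ordering is the whole point: per the report, setting the configuration after the load only affects the driver-side metadata files, not the part files written by executors.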
[jira] [Commented] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329628#comment-14329628 ] Sebastian YEPES FERNANDEZ commented on SPARK-5281: -- Also having this issue with 1.2.1 with the standard context (sc) > Registering table on RDD is giving MissingRequirementError > -- > > Key: SPARK-5281 > URL: https://issues.apache.org/jira/browse/SPARK-5281 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: sarsol >Priority: Critical > > Application crashes on this line rdd.registerTempTable("temp") in 1.2 > version when using sbt or Eclipse SCALA IDE > Stacktrace > Exception in thread "main" scala.reflect.internal.MissingRequirementError: > class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with > primordial classloader with boot classpath > [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program > Files\Java\jre7\lib\resources.jar;C:\Program > Files\Java\jre7\lib\rt.jar;C:\Program > Files\Java\jre7\lib\sunrsasign.jar;C:\Program > Files\Java\jre7\lib\jsse.jar;C:\Program > Files\Java\jre7\lib\jce.jar;C:\Program > Files\Java\jre7\lib\charsets.jar;C:\Program > Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found. 
> at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > at > scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) > at > scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) > at > scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) > at > org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) > at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335) > at scala.reflect.api.Universe.typeOf(Universe.scala:59) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115) > at > org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100) > at > org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) > at > org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94) > at > org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33) > at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111) > at > com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43) > at scala.Function0$class.apply$mcV$sp(Function0.scala:40) > at > scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) > at scala.App$$anonfun$main$1.apply(App.scala:71) > at scala.App$$anonfun$main$1.apply(App.scala:71) > at 
scala.collection.immutable.List.foreach(List.scala:318) > at > scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32) > at scala.App$class.main(App.scala:71)
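For what it's worth, a workaround commonly suggested for this kind of `MissingRequirementError` when running Spark SQL code from sbt is to run the application in a forked JVM, so that runtime reflection sees a plain classpath instead of sbt's layered classloaders. This is an assumption on my part; it is not confirmed anywhere in this thread:

```scala
// build.sbt fragment (hypothetical workaround, not from this ticket):
// fork the `run` task so scala-reflect resolves classes from a normal
// JVM classpath rather than sbt's own classloader hierarchy.
fork in run := true
```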
[jira] [Comment Edited] (SPARK-5281) Registering table on RDD is giving MissingRequirementError
[ https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329628#comment-14329628 ] Sebastian YEPES FERNANDEZ edited comment on SPARK-5281 at 2/20/15 9:54 PM: --- Also having this issue with 1.2.1 the standard context (sc) was (Author: syepes): Also having this issue with 1.2.1 with the standard context (sc)