[jira] [Commented] (SPARK-19207) LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating new object via constructor
[ https://issues.apache.org/jira/browse/SPARK-19207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821417#comment-15821417 ]

Tsuyoshi Ozawa commented on SPARK-19207:
----------------------------------------

I will send a PR to fix this problem.
[jira] [Created] (SPARK-19207) LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating new object via constructor
Tsuyoshi Ozawa created SPARK-19207:
-----------------------------------

Summary: LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating new object via constructor
Key: SPARK-19207
URL: https://issues.apache.org/jira/browse/SPARK-19207
Project: Spark
Issue Type: Improvement
Reporter: Tsuyoshi Ozawa

Creating a Slf4JLoggerFactory instance via its constructor is deprecated. A warning is generated:

{code}
[warn] /Users/ozawa/workspace/spark/sql/core/src/test/scala/org/apache/spark/sql/LocalSparkSession.scala:32: constructor Slf4JLoggerFactory in class Slf4JLoggerFactory is deprecated: see corresponding Javadoc for more information.
[warn] InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())
{code}
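For reference, a minimal sketch of the change being proposed, assuming the Netty version on the test classpath already exposes the Slf4JLoggerFactory.INSTANCE singleton:

{code}
import io.netty.util.internal.logging.{InternalLoggerFactory, Slf4JLoggerFactory}

// Before: calling the deprecated constructor triggers the warning above.
// InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())

// After: reuse the shared singleton instead of allocating a new factory.
InternalLoggerFactory.setDefaultFactory(Slf4JLoggerFactory.INSTANCE)
{code}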
[jira] [Closed] (SPARK-18345) Structured Streaming quick examples fails with default configuration
[ https://issues.apache.org/jira/browse/SPARK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa closed SPARK-18345.
----------------------------------
Resolution: Not A Bug

Oh, my bad. This was caused by a mistake in my configuration. After clearing the HADOOP_* environment variables, it works well. Closing this as Not a Bug.
[jira] [Closed] (SPARK-18399) Examples in SparkSQL/DataFrame guide fails with default configuration
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa closed SPARK-18399.
----------------------------------
Resolution: Not A Problem

It's completely my bad! I had a Hadoop configuration on my local machine. Closing this as Not a Problem.
[jira] [Updated] (SPARK-18399) Examples in SparkSQL/DataFrame guide fails with default configuration
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated SPARK-18399:
-----------------------------------
Summary: Examples in SparkSQL/DataFrame guide fails with default configuration (was: Examples in SparkSQL/DataFrame fails with default configuration)
[jira] [Updated] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configuration
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated SPARK-18399:
-----------------------------------
Summary: Examples in SparkSQL/DataFrame fails with default configuration (was: Examples in SparkSQL/DataFrame fails with default configurations)
[jira] [Commented] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653729#comment-15653729 ]

Tsuyoshi Ozawa commented on SPARK-18399:
----------------------------------------

We should use an absolute path with the "file:///" prefix instead of a relative path. I will send a PR soon.
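For illustration, a hedged sketch of the proposed doc change; /path/to/spark below is a placeholder for wherever the Spark source tree lives:

{code}
// Relative path: resolved against fs.defaultFS, so a leftover HDFS
// configuration sends the read to hdfs://localhost:9000 and it fails.
// val df = spark.read.json("examples/src/main/resources/people.json")

// Absolute path with the "file:///" prefix: always targets the local filesystem.
val df = spark.read.json("file:///path/to/spark/examples/src/main/resources/people.json")
df.show()
{code}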
[jira] [Comment Edited] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653722#comment-15653722 ]

Tsuyoshi Ozawa edited comment on SPARK-18399 at 11/10/16 10:46 AM:
-------------------------------------------------------------------

This is the log when the failure happens, without any configuration:

{code}
scala> val df = spark.read.json("examples/src/main/resources/people.json")
16/11/10 19:23:56 WARN datasources.DataSource: Error while looking for metadata directory.
java.net.ConnectException: Call From ozamac-2.local/10.129.56.104 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:292)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:282)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:282)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:297)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:250)
{code}
[jira] [Created] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations
Tsuyoshi Ozawa created SPARK-18399:
-----------------------------------

Summary: Examples in SparkSQL/DataFrame fails with default configurations
Key: SPARK-18399
URL: https://issues.apache.org/jira/browse/SPARK-18399
Project: Spark
Issue Type: Bug
Components: Documentation, SQL
Reporter: Tsuyoshi Ozawa

http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes

With the default configuration, the examples fail because they try to access HDFS, while the paths to people.json/people.txt are assumed to be on the local filesystem.
[jira] [Commented] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations
[ https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653722#comment-15653722 ]

Tsuyoshi Ozawa commented on SPARK-18399 with the failure log shown, in its edited form, in the "Comment Edited" entry above.
[jira] [Commented] (SPARK-18345) Structured Streaming quick examples fails with default configuration
[ https://issues.apache.org/jira/browse/SPARK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646551#comment-15646551 ]

Tsuyoshi Ozawa commented on SPARK-18345:
----------------------------------------

I would like to tackle this problem. I have fixed it locally and will send a PR soon.
[jira] [Created] (SPARK-18345) Structured Streaming quick examples fails with default configuration
Tsuyoshi Ozawa created SPARK-18345:
-----------------------------------

Summary: Structured Streaming quick examples fails with default configuration
Key: SPARK-18345
URL: https://issues.apache.org/jira/browse/SPARK-18345
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.0.1
Reporter: Tsuyoshi Ozawa

StructuredNetworkWordCount fails because it requires an HDFS configuration. It should use the local filesystem instead of HDFS by default.

{quote}
Exception in thread "main" java.net.ConnectException: Call From ozamac-2.local/192.168.33.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
    at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:225)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:260)
    at org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount$.main(StructuredNetworkWordCount.scala:71)
    at org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount.main(StructuredNetworkWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{quote}
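Although the report was later closed as a local configuration issue, the following sketch illustrates one way the example's streaming metadata could be pinned to the local filesystem; the checkpoint path is a placeholder, not the example's actual code:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("StructuredNetworkWordCount")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Read lines from a local socket, split them into words, and count the words.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
val wordCounts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  // Pin the query's checkpoint/metadata directory to the local filesystem so a
  // stray HADOOP_* configuration cannot redirect it to hdfs://localhost:9000.
  .option("checkpointLocation", "file:///tmp/structured-wordcount-checkpoint")
  .start()
query.awaitTermination()
{code}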
[jira] [Commented] (SPARK-2546) Configuration object thread safety issue
[ https://issues.apache.org/jira/browse/SPARK-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627909#comment-14627909 ]

Tsuyoshi Ozawa commented on SPARK-2546:
---------------------------------------

[~anknai] cc: [~joshrosen] The problem is fixed in Hadoop 2.7. Could you build Spark with hadoop.version=2.7.1? I'll also backport the patch to 2.6.x, but it will take a bit of time to release.

Configuration object thread safety issue
----------------------------------------
Key: SPARK-2546
URL: https://issues.apache.org/jira/browse/SPARK-2546
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.1, 1.0.2, 1.1.0, 1.2.0
Reporter: Andrew Ash
Assignee: Josh Rosen
Priority: Critical
Fix For: 1.0.3, 1.1.1, 1.2.0

// observed in 0.9.1 but expected to exist in 1.0.1 as well

This ticket is copy-pasted from a thread on the dev@ list:

{quote}
We discovered a very interesting bug in Spark at work last week in Spark 0.9.1: the way Spark uses the Hadoop Configuration object is prone to thread safety issues. I believe it still applies in Spark 1.0.1 as well. Let me explain:

Observations
- Was running a relatively simple job (read from Avro files, do a map, do another map, write back to Avro files)
- 412 of 413 tasks completed, but the last task was hung in RUNNING state
- The 412 successful tasks completed in median time 3.4s
- The last hung task didn't finish even in 20 hours
- The executor with the hung task was responsible for 100% of one core of CPU usage
- Jstack of the executor attached (relevant thread pasted below)

Diagnosis
After doing some code spelunking, we determined the issue was concurrent use of a Configuration object for each task on an executor. In Hadoop each task runs in its own JVM, but in Spark multiple tasks can run in the same JVM, so the single-threaded access assumptions of the Configuration object no longer hold in Spark.

The specific issue is that the AvroRecordReader actually _modifies_ the JobConf it's given when it's instantiated! It adds a key for the RPC protocol engine in the process of connecting to the Hadoop FileSystem. When many tasks start at the same time (like at the start of a job), many tasks are adding this configuration item to the one Configuration object at once. Internally Configuration uses a java.util.HashMap, which isn't threadsafe. The below post is an excellent explanation of what happens when multiple threads insert into a HashMap at the same time.

http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html

The gist is that you have a thread following a cycle of linked list nodes indefinitely. This exactly matches our observations of the 100% CPU core and also the final location in the stack trace.

So it seems the way Spark shares a Configuration object between task threads in an executor is incorrect. We need some way to prevent concurrent access to a single Configuration object.

Proposed fix
We can clone the JobConf object in HadoopRDD.getJobConf() so each task gets its own JobConf object (and thus Configuration object). The optimization of broadcasting the Configuration object across the cluster can remain, but on the other side I think it needs to be cloned for each task to allow for concurrent access. I'm not sure of the performance implications, but the comments suggest that the Configuration object is ~10KB, so I would expect a clone of the object to be relatively speedy.

Has this been observed before? Does my suggested fix make sense? I'd be happy to file a Jira ticket and continue discussion there for the right way to fix. Thanks!
Andrew

P.S. For others seeing this issue, our temporary workaround is to enable spark.speculation, which retries failed (or hung) tasks on other machines.

{noformat}
"Executor task launch worker-6" daemon prio=10 tid=0x7f91f01fe000 nid=0x54b1 runnable [0x7f92d74f1000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.transfer(HashMap.java:601)
    at java.util.HashMap.resize(HashMap.java:581)
    at java.util.HashMap.addEntry(HashMap.java:879)
    at java.util.HashMap.put(HashMap.java:505)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:803)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:783)
    at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1662)
    at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:343)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
{noformat}
{quote}
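A sketch of the proposed fix from the description, not Spark's actual HadoopRDD code; broadcastedConf stands in for the Configuration that Spark broadcasts to each executor:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Give every task its own JobConf so record readers that mutate their JobConf
// (like AvroRecordReader) no longer race on the non-thread-safe HashMap inside
// a shared Configuration.
def jobConfForTask(broadcastedConf: Configuration): JobConf =
  broadcastedConf.synchronized {
    // The copy constructor clones the properties, so mutations stay task-local.
    new JobConf(broadcastedConf)
  }
{code}

Synchronizing on the shared object while cloning keeps the clone itself from racing with a concurrent writer; afterwards each task mutates only its private copy.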
[jira] [Commented] (SPARK-2546) Configuration object thread safety issue
[ https://issues.apache.org/jira/browse/SPARK-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286971#comment-14286971 ]

Tsuyoshi OZAWA commented on SPARK-2546:
---------------------------------------

Now HADOOP-11209, the problem reported by [~joshrosen], has been resolved by [~varun_saxena]'s contribution. Thanks for the report.
[jira] [Updated] (SPARK-4915) Wrong class name of external shuffle service in the dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4915:
----------------------------------
Description:
docs/job-scheduling.md says the following:

{quote}
To enable this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager` in your cluster.
{quote}

The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; org.apache.spark.network.yarn.YarnShuffleService is the correct class name to specify.
[jira] [Created] (SPARK-4915) Wrong class name of external shuffle service in the dynamic allocation
Tsuyoshi OZAWA created SPARK-4915:
----------------------------------

Summary: Wrong class name of external shuffle service in the dynamic allocation
Key: SPARK-4915
URL: https://issues.apache.org/jira/browse/SPARK-4915
Project: Spark
Issue Type: Documentation
Components: Documentation, YARN
Reporter: Tsuyoshi OZAWA

docs/job-scheduling.md says the following:

{quote}
To enable this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager` in your cluster.
{quote}

The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; org.apache.spark.network.yarn.YarnShuffleService is the correct class name to specify.
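For context, a hedged sketch of the application-side settings that this documentation describes; the corrected server-side class name is configured in yarn-site.xml rather than in application code:

{code}
import org.apache.spark.SparkConf

// Application-side switches from docs/job-scheduling.md. On the server side,
// yarn.nodemanager.aux-services in yarn-site.xml must register a service backed
// by org.apache.spark.network.yarn.YarnShuffleService (the correct class name),
// not org.apache.spark.yarn.network.YarnShuffleService as the doc says.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
{code}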
[jira] [Updated] (SPARK-4915) Wrong classname of external shuffle service in the doc for dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4915:
----------------------------------
Summary: Wrong classname of external shuffle service in the doc for dynamic allocation (was: Wrong classname of external shuffle service in the dynamic allocation)
[jira] [Commented] (SPARK-4140) Document the dynamic allocation feature
[ https://issues.apache.org/jira/browse/SPARK-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249499#comment-14249499 ]

Tsuyoshi OZAWA commented on SPARK-4140:
---------------------------------------

How about converting this issue into one of the subtasks of SPARK-3174? It would be easier to track.

Document the dynamic allocation feature
---------------------------------------
Key: SPARK-4140
URL: https://issues.apache.org/jira/browse/SPARK-4140
Project: Spark
Issue Type: Improvement
Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Andrew Or
Assignee: Andrew Or

This blocks on SPARK-3795 and SPARK-3822.
[jira] [Updated] (SPARK-4839) Adding documentations about dynamicAllocation
[ https://issues.apache.org/jira/browse/SPARK-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4839:
----------------------------------
Component/s: Documentation

Adding documentations about dynamicAllocation
---------------------------------------------
Key: SPARK-4839
URL: https://issues.apache.org/jira/browse/SPARK-4839
Project: Spark
Issue Type: Sub-task
Components: Documentation, Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Tsuyoshi OZAWA
Fix For: 1.2.0

There are no docs about dynamicAllocation yet. We should add them.
[jira] [Updated] (SPARK-4839) Adding documentations about dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4839:
----------------------------------
Summary: Adding documentations about dynamic resource allocation (was: Adding documentations about dynamicAllocation)
[jira] [Created] (SPARK-4678) A SQL query with subquery fails
Tsuyoshi OZAWA created SPARK-4678:
----------------------------------

Summary: A SQL query with subquery fails
Key: SPARK-4678
URL: https://issues.apache.org/jira/browse/SPARK-4678
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.1.1
Reporter: Tsuyoshi OZAWA

{code}
spark-sql> create external table if NOT EXISTS randomText100GB(text string) location 'hdfs:///user/ozawa/randomText100GB';
spark-sql> CREATE TABLE wordcount AS SELECT word, count(1) AS count FROM (SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' ')) AS word FROM randomText100GB) words GROUP BY word;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 25, hadoop-slave2.c.gcp-samples.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: word#5
    org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
    org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
    org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
    org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
    org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
    org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
    org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
    org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    scala.collection.AbstractTraversable.map(Traversable.scala:105)
    org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:52)
    org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
    org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
    org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
    org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
    org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
    org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Updated] (SPARK-4678) A SQL query with subquery fails with TreeNodeException
[ https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4678:
----------------------------------
Summary: A SQL query with subquery fails with TreeNodeException (was: A SQL query with subquery fails)
[jira] [Created] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop
Tsuyoshi OZAWA created SPARK-4651:
----------------------------------

Summary: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop
Key: SPARK-4651
URL: https://issues.apache.org/jira/browse/SPARK-4651
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Tsuyoshi OZAWA

Currently, we don't have newer profiles to compile Spark with newer versions of Hadoop. We should have them.
[jira] [Commented] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop
[ https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228571#comment-14228571 ]

Tsuyoshi OZAWA commented on SPARK-4651:
---------------------------------------

I'll send a PR via GitHub soon.
[jira] [Updated] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop
[ https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated SPARK-4651:
----------------------------------
Summary: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop (was: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop)
[jira] [Commented] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop
[ https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228576#comment-14228576 ]

Tsuyoshi OZAWA commented on SPARK-4651:
---------------------------------------

[~srowen], oops, I thought it had already been released... anyway, I'll add a 2.4+ profile. Thanks for your review!
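A hedged sketch of the build the review points toward (the exact profile names in the pom of that era may differ): reuse the existing hadoop-2.4 profile and pin hadoop.version explicitly instead of adding hadoop-2.5/hadoop-2.6 profiles:

{code}
# Build a distribution against Hadoop 2.5.1 with the existing hadoop-2.4
# profile; pinning hadoop.version makes separate 2.5/2.6 profiles unnecessary.
./make-distribution.sh --name spark-hadoop-2.5.1 --tgz \
  -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1
{code}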
[jira] [Updated] (SPARK-4651) Adding -Phadoop-2.4+ to compile Spark with newer versions of Hadoop
[ https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated SPARK-4651: -- Summary: Adding -Phadoop-2.4+ to compile Spark with newer versions of Hadoop (was: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop) Adding -Phadoop-2.4+ to compile Spark with newer versions of Hadoop --- Key: SPARK-4651 URL: https://issues.apache.org/jira/browse/SPARK-4651 Project: Spark Issue Type: Improvement Components: Build Reporter: Tsuyoshi OZAWA Currently, we don't have newer profiles to compile Spark with newer versions of Hadoop. We should have them.
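The renamed 2.4+ summary suggests one profile covering Hadoop 2.4 and later, with the exact release selected through hadoop.version. A minimal sketch under that assumption, reusing the existing -Phadoop-2.4 profile; the version values are examples only:
{code}
# one profile for Hadoop 2.4 and later; pick the release with -Dhadoop.version
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.1
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0
{code}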
[jira] [Commented] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201959#comment-14201959 ] Tsuyoshi OZAWA commented on SPARK-4267: --- [~sandyr] [~pwendell] do you have any workarounds to deal with this problem?

Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
------------------------------------------------------------------
Key: SPARK-4267
URL: https://issues.apache.org/jira/browse/SPARK-4267
Project: Spark
Issue Type: Bug
Reporter: Tsuyoshi OZAWA

Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses protobuf 2.5.0, so I compiled Spark with protobuf 2.5.0 like this:
{code}
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
{code}
Then Spark on YARN fails to launch jobs with an NPE.
{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
        at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
        at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
        at $iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC.<init>(<console>:20)
        at $iwC.<init>(<console>:22)
        at <init>(<console>:24)
        at .<init>(<console>:28)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
[jira] [Updated] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated SPARK-4267: -- Description: Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses protobuf 2.5.0, so I compiled Spark with protobuf 2.5.0 like this:
{code}
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
{code}
Then Spark on YARN fails to run jobs, throwing an NPE.
{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
        at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
        at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
        at $iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC.<init>(<console>:20)
        at $iwC.<init>(<console>:22)
        at <init>(<console>:24)
        at .<init>(<console>:28)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:780)
        at
[jira] [Created] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
Tsuyoshi OZAWA created SPARK-4267: - Summary: Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later Key: SPARK-4267 URL: https://issues.apache.org/jira/browse/SPARK-4267 Project: Spark Issue Type: Bug Reporter: Tsuyoshi OZAWA

Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses protobuf 2.5.0, so I compiled Spark with protobuf 2.5.0 like this:
{code}
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
{code}
Then Spark on YARN fails to run jobs, throwing an NPE.
{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
        at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
        at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
        at $iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC.<init>(<console>:20)
        at $iwC.<init>(<console>:22)
        at <init>(<console>:24)
        at .<init>(<console>:28)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)
[jira] [Updated] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated SPARK-4267: -- Description: Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses protobuf 2.5.0, so I compiled Spark with protobuf 2.5.0 like this:
{code}
./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
{code}
Then Spark on YARN fails to launch jobs with an NPE.
{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
        at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
        at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
        at $iwC$$iwC$$iwC.<init>(<console>:18)
        at $iwC$$iwC.<init>(<console>:20)
        at $iwC.<init>(<console>:22)
        at <init>(<console>:24)
        at .<init>(<console>:28)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:780)
        at
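The NPE above is consistent with a protobuf or Hadoop client mismatch in the assembly: the YARN scheduler backend fails to initialize, leaving SparkContext.defaultParallelism with nothing to delegate to. One way to check which protobuf-java the build actually resolves is Maven's dependency tree. A diagnostic sketch, assuming it is run with the same profiles and properties as the build:
{code}
# show which protobuf-java version the build resolves
mvn -Pyarn -Dhadoop.version=2.5.1 dependency:tree -Dincludes=com.google.protobuf:protobuf-java
{code}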
[jira] [Commented] (SPARK-1097) ConcurrentModificationException
[ https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020336#comment-14020336 ] Tsuyoshi OZAWA commented on SPARK-1097: --- [~jblomo], thank you for reporting. This issue is fixed in the next minor Hadoop release, 2.4.1. Note that 2.4.0 doesn't include the fix.

ConcurrentModificationException
-------------------------------
Key: SPARK-1097
URL: https://issues.apache.org/jira/browse/SPARK-1097
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
Attachments: nravi_Conf_Spark-1388.patch

{noformat}
14/02/16 08:18:45 WARN TaskSetManager: Loss was due to java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
        at java.util.HashMap$KeyIterator.next(HashMap.java:960)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
        at java.util.HashSet.<init>(HashSet.java:117)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}
[jira] [Commented] (SPARK-1097) ConcurrentModificationException
[ https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958671#comment-13958671 ] Tsuyoshi OZAWA commented on SPARK-1097: --- A patch by Nishkam on HADOOP-10456 has already been reviewed and will be committed to Hadoop's trunk in a few days.

ConcurrentModificationException
-------------------------------
Key: SPARK-1097
URL: https://issues.apache.org/jira/browse/SPARK-1097
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
Attachments: nravi_Conf_Spark-1388.patch

{noformat}
14/02/16 08:18:45 WARN TaskSetManager: Loss was due to java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
        at java.util.HashMap$KeyIterator.next(HashMap.java:960)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
        at java.util.HashSet.<init>(HashSet.java:117)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}
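Until a cluster is on Hadoop 2.4.1 with the HADOOP-10456 fix, the usual mitigation on the caller's side is to serialize Configuration cloning, because the copy constructor iterates the source conf's internal HashMap while another thread may still be mutating it. A minimal Scala sketch of that idea; ConfCloneGuard is a hypothetical helper, not Spark API (Spark later guarded the equivalent path inside HadoopRDD with an internal lock):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Hypothetical helper: funnel every Configuration copy through one lock so
// the copy constructor never iterates a HashMap that another thread is
// mutating concurrently (the root cause of the CME above, fixed upstream
// by HADOOP-10456 in Hadoop 2.4.1).
object ConfCloneGuard {
  private val lock = new Object

  def cloneConf(conf: Configuration): JobConf = lock.synchronized {
    new JobConf(conf)
  }
}
{code}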