[jira] [Commented] (SPARK-19207) LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating new object via constructor

2017-01-13 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821417#comment-15821417
 ] 

Tsuyoshi Ozawa commented on SPARK-19207:


I will send a PR to fix this problem.
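
For reference, a minimal sketch of what the change could look like (the actual PR may differ):
{code}
import io.netty.util.internal.logging.{InternalLoggerFactory, Slf4JLoggerFactory}

// Deprecated: constructing a new factory instance
// InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())

// Preferred: reuse the shared singleton
InternalLoggerFactory.setDefaultFactory(Slf4JLoggerFactory.INSTANCE)
{code}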

> LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating 
> new object via constructor
> ---
>
> Key: SPARK-19207
> URL: https://issues.apache.org/jira/browse/SPARK-19207
> Project: Spark
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>
> Creating a Slf4JLoggerFactory instance via its constructor is deprecated. A 
> warning is generated:
> {code}
> [warn] 
> /Users/ozawa/workspace/spark/sql/core/src/test/scala/org/apache/spark/sql/LocalSparkSession.scala:32:
>  constructor Slf4JLoggerFactory in class Slf4JLoggerFactory is deprecated: 
> see corresponding Javadoc for more information.
> [warn] InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())
> {code}






[jira] [Created] (SPARK-19207) LocalSparkSession should use Slf4JLoggerFactory.INSTANCE instead of creating new object via constructor

2017-01-13 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created SPARK-19207:
--

 Summary: LocalSparkSession should use Slf4JLoggerFactory.INSTANCE 
instead of creating new object via constructor
 Key: SPARK-19207
 URL: https://issues.apache.org/jira/browse/SPARK-19207
 Project: Spark
  Issue Type: Improvement
Reporter: Tsuyoshi Ozawa


Creating a Slf4JLoggerFactory instance via its constructor is deprecated. A 
warning is generated:
{code}
[warn] 
/Users/ozawa/workspace/spark/sql/core/src/test/scala/org/apache/spark/sql/LocalSparkSession.scala:32:
 constructor Slf4JLoggerFactory in class Slf4JLoggerFactory is deprecated: see 
corresponding Javadoc for more information.
[warn] InternalLoggerFactory.setDefaultFactory(new Slf4JLoggerFactory())
{code}






[jira] [Closed] (SPARK-18345) Structured Streaming quick examples fails with default configuration

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa closed SPARK-18345.
--
Resolution: Not A Bug

Oh, my bad. This was due to a mistake in my configuration. After clearing the 
HADOOP_* environment variables, it works well. Closing this as Not a Bug.

> Structured Streaming quick examples fails with default configuration
> 
>
> Key: SPARK-18345
> URL: https://issues.apache.org/jira/browse/SPARK-18345
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.1
>Reporter: Tsuyoshi Ozawa
>
> StructuredNetworkWordCount fails because it requires an HDFS configuration. It 
> should use the local filesystem instead of HDFS by default.
> {quote}
> Exception in thread "main" java.net.ConnectException: Call From 
> ozamac-2.local/192.168.33.1 to localhost:9000 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:225)
>   at 
> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:260)
>   at 
> org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount$.main(StructuredNetworkWordCount.scala:71)
>   at 
> org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount.main(StructuredNetworkWordCount.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {quote}
> .






[jira] [Closed] (SPARK-18399) Examples in SparkSQL/DataFrame guide fails with default configuration

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa closed SPARK-18399.
--
Resolution: Not A Problem

It's completely my bad! I had a Hadoop configuration on my local machine. 
Closing this as Not a Problem.

> Examples in SparkSQL/DataFrame guide fails with default configuration
> -
>
> Key: SPARK-18399
> URL: https://issues.apache.org/jira/browse/SPARK-18399
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Tsuyoshi Ozawa
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
> With the default configuration, it fails because it tries to access HDFS, 
> while the paths to people.json/txt are assumed to be on the local file system.






[jira] [Updated] (SPARK-18399) Examples in SparkSQL/DataFrame guide fails with default configuration

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated SPARK-18399:
---
Summary: Examples in SparkSQL/DataFrame guide fails with default 
configuration  (was: Examples in SparkSQL/DataFrame fails with default 
configuration)

> Examples in SparkSQL/DataFrame guide fails with default configuration
> -
>
> Key: SPARK-18399
> URL: https://issues.apache.org/jira/browse/SPARK-18399
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Tsuyoshi Ozawa
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
> With the default configuration, it fails because it tries to access HDFS, 
> while the paths to people.json/txt are assumed to be on the local file system.






[jira] [Updated] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configuration

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated SPARK-18399:
---
Summary: Examples in SparkSQL/DataFrame fails with default configuration  
(was: Examples in SparkSQL/DataFrame fails with default configurations)

> Examples in SparkSQL/DataFrame fails with default configuration
> ---
>
> Key: SPARK-18399
> URL: https://issues.apache.org/jira/browse/SPARK-18399
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Tsuyoshi Ozawa
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
> With the default configuration, it fails because it tries to access HDFS, 
> while the paths to people.json/txt are assumed to be on the local file system.






[jira] [Commented] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653729#comment-15653729
 ] 

Tsuyoshi Ozawa commented on SPARK-18399:


We should use an absolute path with the "file:///" prefix instead of a relative 
path. I will send a PR soon.
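
For illustration, a minimal sketch of the intended change (the "/path/to/spark" prefix below is only a placeholder, not the exact text of the upcoming PR):
{code}
// Relative path: resolved against the default filesystem (HDFS here), so it fails
// when fs.defaultFS points at an unreachable HDFS namenode.
// val df = spark.read.json("examples/src/main/resources/people.json")

// Absolute path with an explicit file:/// scheme always reads from the local filesystem.
val df = spark.read.json("file:///path/to/spark/examples/src/main/resources/people.json")
{code}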

> Examples in SparkSQL/DataFrame fails with default configurations
> 
>
> Key: SPARK-18399
> URL: https://issues.apache.org/jira/browse/SPARK-18399
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Tsuyoshi Ozawa
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
> With the default configuration, it fails because it tries to access HDFS, 
> while the paths to people.json/txt are assumed to be on the local file system.






[jira] [Comment Edited] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653722#comment-15653722
 ] 

Tsuyoshi Ozawa edited comment on SPARK-18399 at 11/10/16 10:46 AM:
---

This is the log when the failure happens, without any custom configuration.
{code}
scala> val df = spark.read.json("examples/src/main/resources/people.json")
16/11/10 19:23:56 WARN datasources.DataSource: Error while looking for metadata 
directory.
java.net.ConnectException: Call From ozamac-2.local/10.129.56.104 to 
localhost:9000 failed on connection exception: java.net.ConnectException: 
Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
  at org.apache.hadoop.ipc.Client.call(Client.java:1351)
  at org.apache.hadoop.ipc.Client.call(Client.java:1300)
  at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
  at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
  at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
  at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:292)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:282)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:282)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:250)
{code}



was (Author: ozawa):
{code}
scala> val df = spark.read.json("examples/src/main/resources/people.json")
16/11/10 19:23:56 WARN datasources.DataSource: Error while looking for metadata 
directory.
java.net.ConnectException: Call From ozamac-2.local/10.129.56.104 to 
localhost:9000 failed on connection exception: java.net.ConnectException: 
Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
  at org.apache.hadoop.ipc.Client.call(Client.java:1351)
  at org.apache.hadoop.ipc.Client.call(Client.java:1300)
  at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 

[jira] [Created] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created SPARK-18399:
--

 Summary: Examples in SparkSQL/DataFrame fails with default 
configurations
 Key: SPARK-18399
 URL: https://issues.apache.org/jira/browse/SPARK-18399
 Project: Spark
  Issue Type: Bug
  Components: Documentation, SQL
Reporter: Tsuyoshi Ozawa


http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes

With the default configuration, it fails because it tries to access HDFS, while 
the paths to people.json/txt are assumed to be on the local file system.








[jira] [Commented] (SPARK-18399) Examples in SparkSQL/DataFrame fails with default configurations

2016-11-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653722#comment-15653722
 ] 

Tsuyoshi Ozawa commented on SPARK-18399:


{code}
scala> val df = spark.read.json("examples/src/main/resources/people.json")
16/11/10 19:23:56 WARN datasources.DataSource: Error while looking for metadata 
directory.
java.net.ConnectException: Call From ozamac-2.local/10.129.56.104 to 
localhost:9000 failed on connection exception: java.net.ConnectException: 
Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
  at org.apache.hadoop.ipc.Client.call(Client.java:1351)
  at org.apache.hadoop.ipc.Client.call(Client.java:1300)
  at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:483)
  at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
  at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
  at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
  at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:292)
  at 
org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:282)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:282)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:250)
{code}


> Examples in SparkSQL/DataFrame fails with default configurations
> 
>
> Key: SPARK-18399
> URL: https://issues.apache.org/jira/browse/SPARK-18399
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Reporter: Tsuyoshi Ozawa
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes
> With the default configuration, it fails because it tries to access HDFS, 
> while the paths to people.json/txt are assumed to be on the local file system.






[jira] [Commented] (SPARK-18345) Structured Streaming quick examples fails with default configuration

2016-11-07 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646551#comment-15646551
 ] 

Tsuyoshi Ozawa commented on SPARK-18345:


I would like to tackle this problem. I fixed it locally, so I will send a PR soon.
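
As one possible illustration (a sketch only; the actual PR may take a different approach), the query in the example could be started with an explicit local checkpoint location so that nothing touches the default HDFS filesystem:
{code}
// wordCounts is the aggregated DataFrame from StructuredNetworkWordCount;
// the checkpoint path below is just an illustrative local directory.
val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "file:///tmp/structured-wordcount-checkpoint")
  .start()

query.awaitTermination()
{code}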

> Structured Streaming quick examples fails with default configuration
> 
>
> Key: SPARK-18345
> URL: https://issues.apache.org/jira/browse/SPARK-18345
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.1
>Reporter: Tsuyoshi Ozawa
>
> StructuredNetworkWordCount fails because it requires an HDFS configuration. It 
> should use the local filesystem instead of HDFS by default.
> {quote}
> Exception in thread "main" java.net.ConnectException: Call From 
> ozamac-2.local/192.168.33.1 to localhost:9000 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:225)
>   at 
> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:260)
>   at 
> org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount$.main(StructuredNetworkWordCount.scala:71)
>   at 
> org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount.main(StructuredNetworkWordCount.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {quote}
> .






[jira] [Created] (SPARK-18345) Structured Streaming quick examples fails with default configuration

2016-11-07 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created SPARK-18345:
--

 Summary: Structured Streaming quick examples fails with default 
configuration
 Key: SPARK-18345
 URL: https://issues.apache.org/jira/browse/SPARK-18345
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.0.1
Reporter: Tsuyoshi Ozawa


StructuredNetworkWordCount fails because it requires an HDFS configuration. It 
should use the local filesystem instead of HDFS by default.

{quote}
Exception in thread "main" java.net.ConnectException: Call From 
ozamac-2.local/192.168.33.1 to localhost:9000 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at 
org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:225)
at 
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:260)
at 
org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount$.main(StructuredNetworkWordCount.scala:71)
at 
org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount.main(StructuredNetworkWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{quote}

.






[jira] [Commented] (SPARK-2546) Configuration object thread safety issue

2015-07-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627909#comment-14627909
 ] 

Tsuyoshi Ozawa commented on SPARK-2546:
---

[~anknai] cc: [~joshrosen] The problem is fixed in Hadoop 2.7. Could you build 
Spark with hadoop.version=2.7.1? I'll also backport the patch to 2.6.x, but it 
will take a bit of time to release.

 Configuration object thread safety issue
 

 Key: SPARK-2546
 URL: https://issues.apache.org/jira/browse/SPARK-2546
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1, 1.0.2, 1.1.0, 1.2.0
Reporter: Andrew Ash
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.0.3, 1.1.1, 1.2.0


 // observed in 0.9.1 but expected to exist in 1.0.1 as well
 This ticket is copy-pasted from a thread on the dev@ list:
 {quote}
 We discovered a very interesting bug in Spark at work last week in Spark 
 0.9.1 — that the way Spark uses the Hadoop Configuration object is prone to 
 thread safety issues.  I believe it still applies in Spark 1.0.1 as well.  
 Let me explain:
 Observations
  - Was running a relatively simple job (read from Avro files, do a map, do 
 another map, write back to Avro files)
  - 412 of 413 tasks completed, but the last task was hung in RUNNING state
  - The 412 successful tasks completed in median time 3.4s
  - The last hung task didn't finish even in 20 hours
  - The executor with the hung task was responsible for 100% of one core of 
 CPU usage
  - Jstack of the executor attached (relevant thread pasted below)
 Diagnosis
 After doing some code spelunking, we determined the issue was concurrent use 
 of a Configuration object for each task on an executor.  In Hadoop each task 
 runs in its own JVM, but in Spark multiple tasks can run in the same JVM, so 
 the single-threaded access assumptions of the Configuration object no longer 
 hold in Spark.
 The specific issue is that the AvroRecordReader actually _modifies_ the 
 JobConf it's given when it's instantiated!  It adds a key for the RPC 
 protocol engine in the process of connecting to the Hadoop FileSystem.  When 
 many tasks start at the same time (like at the start of a job), many tasks 
 are adding this configuration item to the one Configuration object at once.  
 Internally Configuration uses a java.util.HashMap, which isn't threadsafe… 
 The below post is an excellent explanation of what happens in the situation 
 where multiple threads insert into a HashMap at the same time.
 http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
 The gist is that you have a thread following a cycle of linked list nodes 
 indefinitely.  This exactly matches our observations of the 100% CPU core and 
 also the final location in the stack trace.
 So it seems the way Spark shares a Configuration object between task threads 
 in an executor is incorrect.  We need some way to prevent concurrent access 
 to a single Configuration object.
 Proposed fix
 We can clone the JobConf object in HadoopRDD.getJobConf() so each task gets 
 its own JobConf object (and thus Configuration object).  The optimization of 
 broadcasting the Configuration object across the cluster can remain, but on 
 the other side I think it needs to be cloned for each task to allow for 
 concurrent access.  I'm not sure the performance implications, but the 
 comments suggest that the Configuration object is ~10KB so I would expect a 
 clone on the object to be relatively speedy.
 Has this been observed before?  Does my suggested fix make sense?  I'd be 
 happy to file a Jira ticket and continue discussion there for the right way 
 to fix.
 Thanks!
 Andrew
 P.S.  For others seeing this issue, our temporary workaround is to enable 
 spark.speculation, which retries failed (or hung) tasks on other machines.
 {noformat}
 Executor task launch worker-6 daemon prio=10 tid=0x7f91f01fe000 
 nid=0x54b1 runnable [0x7f92d74f1000]
java.lang.Thread.State: RUNNABLE
 at java.util.HashMap.transfer(HashMap.java:601)
 at java.util.HashMap.resize(HashMap.java:581)
 at java.util.HashMap.addEntry(HashMap.java:879)
 at java.util.HashMap.put(HashMap.java:505)
 at org.apache.hadoop.conf.Configuration.set(Configuration.java:803)
 at org.apache.hadoop.conf.Configuration.set(Configuration.java:783)
 at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1662)
 at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:343)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
 at 

[jira] [Commented] (SPARK-2546) Configuration object thread safety issue

2015-01-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286971#comment-14286971
 ] 

Tsuyoshi OZAWA commented on SPARK-2546:
---

Now HADOOP-11209, the problem reported by [~joshrosen], has been resolved by 
[~varun_saxena]'s contribution. Thanks for your report.

 Configuration object thread safety issue
 

 Key: SPARK-2546
 URL: https://issues.apache.org/jira/browse/SPARK-2546
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.1, 1.0.2, 1.1.0, 1.2.0
Reporter: Andrew Ash
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.1.1, 1.2.0, 1.0.3


 // observed in 0.9.1 but expected to exist in 1.0.1 as well
 This ticket is copy-pasted from a thread on the dev@ list:
 {quote}
 We discovered a very interesting bug in Spark at work last week in Spark 
 0.9.1 — that the way Spark uses the Hadoop Configuration object is prone to 
 thread safety issues.  I believe it still applies in Spark 1.0.1 as well.  
 Let me explain:
 Observations
  - Was running a relatively simple job (read from Avro files, do a map, do 
 another map, write back to Avro files)
  - 412 of 413 tasks completed, but the last task was hung in RUNNING state
  - The 412 successful tasks completed in median time 3.4s
  - The last hung task didn't finish even in 20 hours
  - The executor with the hung task was responsible for 100% of one core of 
 CPU usage
  - Jstack of the executor attached (relevant thread pasted below)
 Diagnosis
 After doing some code spelunking, we determined the issue was concurrent use 
 of a Configuration object for each task on an executor.  In Hadoop each task 
 runs in its own JVM, but in Spark multiple tasks can run in the same JVM, so 
 the single-threaded access assumptions of the Configuration object no longer 
 hold in Spark.
 The specific issue is that the AvroRecordReader actually _modifies_ the 
 JobConf it's given when it's instantiated!  It adds a key for the RPC 
 protocol engine in the process of connecting to the Hadoop FileSystem.  When 
 many tasks start at the same time (like at the start of a job), many tasks 
 are adding this configuration item to the one Configuration object at once.  
 Internally Configuration uses a java.util.HashMap, which isn't threadsafe… 
 The below post is an excellent explanation of what happens in the situation 
 where multiple threads insert into a HashMap at the same time.
 http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
 The gist is that you have a thread following a cycle of linked list nodes 
 indefinitely.  This exactly matches our observations of the 100% CPU core and 
 also the final location in the stack trace.
 So it seems the way Spark shares a Configuration object between task threads 
 in an executor is incorrect.  We need some way to prevent concurrent access 
 to a single Configuration object.
 Proposed fix
 We can clone the JobConf object in HadoopRDD.getJobConf() so each task gets 
 its own JobConf object (and thus Configuration object).  The optimization of 
 broadcasting the Configuration object across the cluster can remain, but on 
 the other side I think it needs to be cloned for each task to allow for 
 concurrent access.  I'm not sure the performance implications, but the 
 comments suggest that the Configuration object is ~10KB so I would expect a 
 clone on the object to be relatively speedy.
 Has this been observed before?  Does my suggested fix make sense?  I'd be 
 happy to file a Jira ticket and continue discussion there for the right way 
 to fix.
 Thanks!
 Andrew
 P.S.  For others seeing this issue, our temporary workaround is to enable 
 spark.speculation, which retries failed (or hung) tasks on other machines.
 {noformat}
 Executor task launch worker-6 daemon prio=10 tid=0x7f91f01fe000 
 nid=0x54b1 runnable [0x7f92d74f1000]
java.lang.Thread.State: RUNNABLE
 at java.util.HashMap.transfer(HashMap.java:601)
 at java.util.HashMap.resize(HashMap.java:581)
 at java.util.HashMap.addEntry(HashMap.java:879)
 at java.util.HashMap.put(HashMap.java:505)
 at org.apache.hadoop.conf.Configuration.set(Configuration.java:803)
 at org.apache.hadoop.conf.Configuration.set(Configuration.java:783)
 at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1662)
 at org.apache.hadoop.ipc.RPC.setProtocolEngine(RPC.java:193)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:343)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
 at 
 org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
 at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:436)
 at 

[jira] [Updated] (SPARK-4915) Wrong class name of external shuffle service in the dynamic allocation

2014-12-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4915:
--
Description: 
docs/job-scheduling.md says as follows:
{quote}
To enable this service, set `spark.shuffle.service.enabled` to `true`. In YARN, 
this external shuffle service is implemented in 
`org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
`NodeManager` in your cluster. 
{quote}

The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
specify.

  was:
docs/job-scheduling.md says the following:
{quote}
To enable this service, set `spark.shuffle.service.enabled` to `true`. In YARN, 
this external shuffle service is implemented in 
`org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
`NodeManager` in your cluster. 
{quote}

The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
specify.


 Wrong class name of external shuffle service in the dynamic allocation
 --

 Key: SPARK-4915
 URL: https://issues.apache.org/jira/browse/SPARK-4915
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, YARN
Reporter: Tsuyoshi OZAWA

 docs/job-scheduling.md says as follows:
 {quote}
 To enable this service, set `spark.shuffle.service.enabled` to `true`. In 
 YARN, this external shuffle service is implemented in 
 `org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
 `NodeManager` in your cluster. 
 {quote}
 The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
 org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
 specify.






[jira] [Created] (SPARK-4915) Wrong class name of external shuffle service in the dynamic allocation

2014-12-21 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created SPARK-4915:
-

 Summary: Wrong class name of external shuffle service in the 
dynamic allocation
 Key: SPARK-4915
 URL: https://issues.apache.org/jira/browse/SPARK-4915
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, YARN
Reporter: Tsuyoshi OZAWA


docs/job-scheduling.md says the following:
{quote}
To enable this service, set `spark.shuffle.service.enabled` to `true`. In YARN, 
this external shuffle service is implemented in 
`org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
`NodeManager` in your cluster. 
{quote}

The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
specify.
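
For context, a minimal sketch (not taken from the issue itself) of the Spark-side settings that this doc passage describes; on the YARN side, each NodeManager's aux service must be registered with the corrected class name, org.apache.spark.network.yarn.YarnShuffleService:
{code}
import org.apache.spark.SparkConf

// Enable dynamic allocation together with the external shuffle service.
// The NodeManager aux-service class must be org.apache.spark.network.yarn.YarnShuffleService.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
{code}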






[jira] [Updated] (SPARK-4915) Wrong classname of external shuffle service in the doc for dynamic allocation

2014-12-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4915:
--
Summary: Wrong classname of external shuffle service in the doc for dynamic 
allocation  (was: Wrong classname of external shuffle service in the dynamic 
allocation)

 Wrong classname of external shuffle service in the doc for dynamic allocation
 -

 Key: SPARK-4915
 URL: https://issues.apache.org/jira/browse/SPARK-4915
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, YARN
Reporter: Tsuyoshi OZAWA

 docs/job-scheduling.md says as follows:
 {quote}
 To enable this service, set `spark.shuffle.service.enabled` to `true`. In 
 YARN, this external shuffle service is implemented in 
 `org.apache.spark.yarn.network.YarnShuffleService` that runs in each 
 `NodeManager` in your cluster. 
 {quote}
 The class name org.apache.spark.yarn.network.YarnShuffleService is wrong; 
 org.apache.spark.network.yarn.YarnShuffleService is the correct class name to 
 specify.






[jira] [Commented] (SPARK-4140) Document the dynamic allocation feature

2014-12-16 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249499#comment-14249499
 ] 

Tsuyoshi OZAWA commented on SPARK-4140:
---

How about converting this issue into one of the subtasks of SPARK-3174? It 
would be easier to track.

 Document the dynamic allocation feature
 ---

 Key: SPARK-4140
 URL: https://issues.apache.org/jira/browse/SPARK-4140
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Andrew Or
Assignee: Andrew Or

 This blocks on SPARK-3795 and SPARK-3822.






[jira] [Updated] (SPARK-4839) Adding documentations about dynamicAllocation

2014-12-12 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4839:
--
Component/s: Documentation

 Adding documentations about dynamicAllocation
 -

 Key: SPARK-4839
 URL: https://issues.apache.org/jira/browse/SPARK-4839
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Tsuyoshi OZAWA
 Fix For: 1.2.0


 There are no docs about dynamicAllocation. We should add them.






[jira] [Updated] (SPARK-4839) Adding documentations about dynamic resource allocation

2014-12-12 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4839:
--
Summary: Adding documentations about dynamic resource allocation  (was: 
Adding documentations about dynamicAllocation)

 Adding documentations about dynamic resource allocation
 ---

 Key: SPARK-4839
 URL: https://issues.apache.org/jira/browse/SPARK-4839
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Tsuyoshi OZAWA
 Fix For: 1.2.0


 There are no docs about dynamicAllocation. We should add them.






[jira] [Created] (SPARK-4678) A SQL query with subquery fails

2014-12-01 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created SPARK-4678:
-

 Summary: A SQL query with subquery fails
 Key: SPARK-4678
 URL: https://issues.apache.org/jira/browse/SPARK-4678
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.1
Reporter: Tsuyoshi OZAWA


{code}
spark-sql> create external table if NOT EXISTS randomText100GB(text string) 
location 'hdfs:///user/ozawa/randomText100GB'; 

spark-sql> CREATE TABLE wordcount AS
  SELECT word, count(1) AS count
  FROM (SELECT 
EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
  AS word FROM randomText100GB) words
  GROUP BY word;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 
25, hadoop-slave2.c.gcp-s
amples.internal): 
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
attribute, tree: word#5

org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)

org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)

org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)

org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)

org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)

org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)

org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)

org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)

scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
scala.collection.AbstractTraversable.map(Traversable.scala:105)

org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.init(Projection.scala:52)

org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)

org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)

org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)

org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Updated] (SPARK-4678) A SQL query with subquery fails with TreeNodeException

2014-12-01 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4678:
--
Summary: A SQL query with subquery fails with TreeNodeException  (was: A 
SQL query with subquery fails)

 A SQL query with subquery fails with TreeNodeException
 --

 Key: SPARK-4678
 URL: https://issues.apache.org/jira/browse/SPARK-4678
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.1
Reporter: Tsuyoshi OZAWA

 {code}
 spark-sql> create external table if NOT EXISTS randomText100GB(text string) 
 location 'hdfs:///user/ozawa/randomText100GB'; 
 spark-sql> CREATE TABLE wordcount AS
   SELECT word, count(1) AS count
   FROM (SELECT 
 EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
   AS word FROM randomText100GB) words
   GROUP BY word;
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
 stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 
 (TID 25, hadoop-slave2.c.gcp-s
 amples.internal): 
 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
 attribute, tree: word#5
 
 org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
 
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
 
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
 
 org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 scala.collection.AbstractTraversable.map(Traversable.scala:105)
 
 org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.init(Projection.scala:52)
 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
 
 org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
 
 org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:745)
 {code}






[jira] [Created] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop

2014-11-28 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created SPARK-4651:
-

 Summary: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with 
newer versions of Hadoop
 Key: SPARK-4651
 URL: https://issues.apache.org/jira/browse/SPARK-4651
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Tsuyoshi OZAWA


Currently, we don't have newer profiles to compile Spark with newer versions of 
Hadoop. We should have them.






[jira] [Commented] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop

2014-11-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228571#comment-14228571
 ] 

Tsuyoshi OZAWA commented on SPARK-4651:
---

I'll send a PR via GitHub soon.

 Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with newer versions of Hadoop
 -

 Key: SPARK-4651
 URL: https://issues.apache.org/jira/browse/SPARK-4651
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Tsuyoshi OZAWA

 Currently, we don't have newer profiles to compile Spark with newer versions 
 of Hadoop. We should have them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop

2014-11-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4651:
--
Summary: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer 
versions of Hadoop  (was: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile with 
newer versions of Hadoop)

 Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of 
 Hadoop
 ---

 Key: SPARK-4651
 URL: https://issues.apache.org/jira/browse/SPARK-4651
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Tsuyoshi OZAWA

 Currently, we don't have newer profiles to compile Spark with newer versions 
 of Hadoop. We should have them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4651) Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of Hadoop

2014-11-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228576#comment-14228576
 ] 

Tsuyoshi OZAWA commented on SPARK-4651:
---

[~srowen], oops, I thought it had already been released... anyway, I'll add a 2.4+ 
profile. Thanks for your review!

 Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer versions of 
 Hadoop
 ---

 Key: SPARK-4651
 URL: https://issues.apache.org/jira/browse/SPARK-4651
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Tsuyoshi OZAWA

 Currently, we don't have newer profiles to compile Spark with newer versions 
 of Hadoop. We should have them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4651) Adding -Phadoop-2.4+ to compile Spark with newer versions of Hadoop

2014-11-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4651:
--
Summary: Adding -Phadoop-2.4+ to compile Spark with newer versions of 
Hadoop  (was: Adding -Phadoop-2.5 and -Phadoop-2.6 to compile Spark with newer 
versions of Hadoop)

 Adding -Phadoop-2.4+ to compile Spark with newer versions of Hadoop
 ---

 Key: SPARK-4651
 URL: https://issues.apache.org/jira/browse/SPARK-4651
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Tsuyoshi OZAWA

 Currently, we don't have newer profiles to compile Spark with newer versions 
 of Hadoop. We should have them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201959#comment-14201959
 ] 

Tsuyoshi OZAWA commented on SPARK-4267:
---

[~sandyr] [~pwendell] do you have any workarounds to deal with this problem?

 Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
 --

 Key: SPARK-4267
 URL: https://issues.apache.org/jira/browse/SPARK-4267
 Project: Spark
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA

 Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 
 uses protobuf 2.5.0, so I compiled with protobuf 2.5.0 like this:
 {code}
  ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn 
 -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0
 {code}
 Then Spark on YARN fails to launch jobs with an NPE.
 {code}
 $ bin/spark-shell --master yarn-client
 scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
 java.lang.NullPointerException
 at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
 at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
 at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
 at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
 at $iwC$$iwC$$iwC.<init>(<console>:18)
 at $iwC$$iwC.<init>(<console>:20)
 at $iwC.<init>(<console>:22)
 at <init>(<console>:24)
 at .<init>(<console>:28)
 at .<clinit>(<console>)
 at .<init>(<console>:7)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
 at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
 at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
 at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)

[jira] [Updated] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4267:
--
Description: 
Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses 
protobuf 2.5.0, so I compiled with protobuf 2.5.0 like this:

{code}
 ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 
-Dprotobuf.version=2.5.0
{code}

Then Spark on YARN fails to launch jobs with an NPE.

{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
at $iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC.<init>(<console>:20)
at $iwC.<init>(<console>:22)
at <init>(<console>:24)
at .<init>(<console>:28)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:780)
at

[jira] [Created] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-06 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created SPARK-4267:
-

 Summary: Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 
or later
 Key: SPARK-4267
 URL: https://issues.apache.org/jira/browse/SPARK-4267
 Project: Spark
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA


Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses 
protobuf 2.5.0, so I compiled with protobuf 2.5.0 like this:

{code}
 ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 
-Dprotobuf.version=2.5.0
{code}

Then Spark on YARN fails to launch jobs with an NPE.

{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
at $iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC.<init>(<console>:20)
at $iwC.<init>(<console>:22)
at <init>(<console>:24)
at .<init>(<console>:28)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)

[jira] [Updated] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

2014-11-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated SPARK-4267:
--
Description: 
Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses 
protobuf 2.5.0, so I compiled with protobuf 2.5.0 like this:

{code}
 ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 
-Dprotobuf.version=2.5.0
{code}

Then Spark on YARN fails to launch jobs with an NPE.

{code}
$ bin/spark-shell --master yarn-client
scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2");
java.lang.NullPointerException
at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284)
at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291)
at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
at $iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC.<init>(<console>:20)
at $iwC.<init>(<console>:22)
at <init>(<console>:24)
at .<init>(<console>:28)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:823)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:868)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:780)
at

[jira] [Commented] (SPARK-1097) ConcurrentModificationException

2014-06-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020336#comment-14020336
 ] 

Tsuyoshi OZAWA commented on SPARK-1097:
---

[~jblomo], thank you for reporting this. The issue is fixed in the next minor Hadoop 
release, 2.4.1. Note that 2.4.0 does not include the fix.
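
For anyone who wants to reproduce the race outside of Spark: the reported ConcurrentModificationException comes from Hadoop's Configuration copy path (HadoopRDD.getJobConf builds a new JobConf from a shared Configuration), which walks the source object's internal collections without synchronization in releases before 2.4.1. Below is a minimal sketch of that pattern, not Spark's code; the object name and loop counts are invented, and whether the exception actually fires in a given run is timing-dependent.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Hypothetical standalone reproducer (class name and loop counts invented for illustration).
object ConfCopyRace {
  def main(args: Array[String]): Unit = {
    val shared = new Configuration()

    // One thread keeps mutating the shared Configuration...
    val writer = new Thread(new Runnable {
      def run(): Unit = {
        var i = 0
        while (i < 100000) { shared.set("test.key." + i, i.toString); i += 1 }
      }
    })

    // ...while other threads copy it, which is what HadoopRDD tasks do via new JobConf(conf).
    val copiers = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          var i = 0
          while (i < 10000) { new JobConf(shared); i += 1 }
        }
      })
    }

    (writer +: copiers).foreach(_.start())
    (writer +: copiers).foreach(_.join())
  }
}
{code}

On Hadoop releases before 2.4.1 the copy constructor iterates the source's internal maps and sets unsynchronized, so a run of this sketch can fail with the same ConcurrentModificationException; from 2.4.1 on (HADOOP-10456) that copy path is protected.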

 ConcurrentModificationException
 ---

 Key: SPARK-1097
 URL: https://issues.apache.org/jira/browse/SPARK-1097
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
 Attachments: nravi_Conf_Spark-1388.patch


 {noformat}
 14/02/16 08:18:45 WARN TaskSetManager: Loss was due to 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
   at java.util.HashMap$KeyIterator.next(HashMap.java:960)
   at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
   at java.util.HashSet.<init>(HashSet.java:117)
   at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
   at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
   at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
   at org.apache.spark.scheduler.Task.run(Task.scala:53)
   at 
 org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1097) ConcurrentModificationException

2014-04-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958671#comment-13958671
 ] 

Tsuyoshi OZAWA commented on SPARK-1097:
---

A patch by Nishkam on HADOOP-10456 has already been reviewed and will be 
committed to Hadoop's trunk in a few days.
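
Until a release carrying the HADOOP-10456 fix is available, an application can work around the race by serializing the copies itself. The sketch below is only the general idea, not the exact change that went into Spark, and the helper name is invented; it funnels every copy of the shared Configuration through one process-wide lock.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

object ConfCopyLock {
  // Single process-wide lock guarding Configuration copy-construction.
  private val lock = new Object

  // Every place that needs a per-task JobConf goes through this helper
  // instead of calling new JobConf(shared) directly.
  def copyToJobConf(shared: Configuration): JobConf = lock.synchronized {
    new JobConf(shared)
  }
}
{code}

This only helps if every code path that clones the shared Configuration uses the same lock; copies made elsewhere (or by libraries you don't control) can still race, which is why the real fix belongs in Hadoop itself.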

 ConcurrentModificationException
 ---

 Key: SPARK-1097
 URL: https://issues.apache.org/jira/browse/SPARK-1097
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 0.9.0
Reporter: Fabrizio Milo
 Attachments: nravi_Conf_Spark-1388.patch


 {noformat}
 14/02/16 08:18:45 WARN TaskSetManager: Loss was due to 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926)
   at java.util.HashMap$KeyIterator.next(HashMap.java:960)
   at java.util.AbstractCollection.addAll(AbstractCollection.java:341)
   at java.util.HashSet.<init>(HashSet.java:117)
   at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:554)
   at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:439)
   at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:110)
   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:154)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32)
   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
   at org.apache.spark.scheduler.Task.run(Task.scala:53)
   at 
 org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)