[jira] [Resolved] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`
[ https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9010. -- Resolution: Fixed Fix Version/s: 1.5.0 1.4.2 Issue resolved by pull request 7393 [https://github.com/apache/spark/pull/7393] Improve the Spark Configuration document about `spark.kryoserializer.buffer` Key: SPARK-9010 URL: https://issues.apache.org/jira/browse/SPARK-9010 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.4.0 Reporter: StanZhai Priority: Trivial Labels: documentation Fix For: 1.4.2, 1.5.0 The meaning of `spark.kryoserializer.buffer` should be "Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed." The `spark.kryoserializer.buffer.max.mb` setting is out of date in Spark 1.4.
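A minimal sketch of how the two settings above fit together, assuming the Spark 1.4+ key names described in the issue; the sizes shown are the documented defaults, used here purely for illustration:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// The initial buffer is allocated per core and grows up to
// spark.kryoserializer.buffer.max when large objects are serialized.
val conf = new SparkConf()
  .setAppName("kryo-buffer-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer", "64k")     // initial size
  .set("spark.kryoserializer.buffer.max", "64m") // growth ceiling

val sc = new SparkContext(conf)
{code}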
[jira] [Commented] (SPARK-8851) in Yarn client mode, Client.scala does not login even when credentials are specified
[ https://issues.apache.org/jira/browse/SPARK-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625999#comment-14625999 ] Apache Spark commented on SPARK-8851: - User 'harishreedharan' has created a pull request for this issue: https://github.com/apache/spark/pull/7394 in Yarn client mode, Client.scala does not login even when credentials are specified Key: SPARK-8851 URL: https://issues.apache.org/jira/browse/SPARK-8851 Project: Spark Issue Type: Bug Components: YARN Reporter: Hari Shreedharan [#6051|https://github.com/apache/spark/pull/6051] added support for passing the credentials configuration from SparkConf, so the client mode works fine. This, however, created an issue where the Client.scala class does not log in to the KDC, thus requiring a kinit before running in Client mode.
[jira] [Assigned] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
[ https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9031: --- Assignee: Apache Spark (was: Josh Rosen) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class --- Key: SPARK-9031 URL: https://issues.apache.org/jira/browse/SPARK-9031 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Reporter: Josh Rosen Assignee: Apache Spark BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter.
[jira] [Assigned] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
[ https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9031: --- Assignee: Josh Rosen (was: Apache Spark) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class --- Key: SPARK-9031 URL: https://issues.apache.org/jira/browse/SPARK-9031 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Reporter: Josh Rosen Assignee: Josh Rosen BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter.
[jira] [Commented] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
[ https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625920#comment-14625920 ] Apache Spark commented on SPARK-9031: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/7391 Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class --- Key: SPARK-9031 URL: https://issues.apache.org/jira/browse/SPARK-9031 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Reporter: Josh Rosen Assignee: Josh Rosen BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter.
[jira] [Commented] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`
[ https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625994#comment-14625994 ] Apache Spark commented on SPARK-9010: - User 'stanzhai' has created a pull request for this issue: https://github.com/apache/spark/pull/7393 Improve the Spark Configuration document about `spark.kryoserializer.buffer` Key: SPARK-9010 URL: https://issues.apache.org/jira/browse/SPARK-9010 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.4.0 Reporter: StanZhai Priority: Trivial Labels: documentation The meaning of `spark.kryoserializer.buffer` should be "Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed." The `spark.kryoserializer.buffer.max.mb` setting is out of date in Spark 1.4.
[jira] [Commented] (SPARK-9003) Add map/update function to MLlib/Vector
[ https://issues.apache.org/jira/browse/SPARK-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625939#comment-14625939 ] Sean Owen commented on SPARK-9003: -- [~josephkb] Please not another one! The world has too many. Add map/update function to MLlib/Vector --- Key: SPARK-9003 URL: https://issues.apache.org/jira/browse/SPARK-9003 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Yanbo Liang Priority: Minor MLlib/Vector only supports the foreachActive function and lacks map/update, which is inconvenient for some Vector operations. For example: val a = Vectors.dense(...) If we want to compute math.log for each element of a and get a Vector as the return value, we can only write: val b = Vectors.dense(a.toArray.map(math.log)) or we can use toBreeze and fromBreeze to do the transformation with the Breeze API. Neither snippet is elegant; we want to be able to write: val c = a.map(math.log) Also, MLlib/Matrix already implements the map/update/foreachActive functions. I think Vector should also have map/update.
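A compact sketch of the workaround the reporter describes, next to the API being requested; the {{a.map(...)}} call is the proposed addition and does not exist in MLlib as of this issue:

{code:scala}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val a: Vector = Vectors.dense(1.0, 2.0, 4.0)

// Today's workaround: round-trip through a plain Array.
val b: Vector = Vectors.dense(a.toArray.map(math.log))

// Proposed API (not yet in MLlib as of this issue):
// val c: Vector = a.map(math.log)
{code}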
[jira] [Commented] (SPARK-9001) sbt doc fails due to javadoc errors
[ https://issues.apache.org/jira/browse/SPARK-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625970#comment-14625970 ] Joseph E. Gonzalez commented on SPARK-9001: --- While the issue is generally minor it does block `build/sbt publish-local`. sbt doc fails due to javadoc errors --- Key: SPARK-9001 URL: https://issues.apache.org/jira/browse/SPARK-9001 Project: Spark Issue Type: Bug Components: Documentation Reporter: Joseph E. Gonzalez Priority: Minor Running `build/sbt doc` on master fails due to errors in the javadocs. This is an issue since `build/sbt publish-local` depends on building the docs. Example error: [info] Generating /spark/unsafe/target/scala-2.10/api/org/apache/spark/unsafe/bitset/BitSet.html... [error] /spark/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java:93: error: bad use of '>' [error] * for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) { [error] ^
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625980#comment-14625980 ] Bolke de Bruin commented on SPARK-9019: --- Tracing this down it seems that the tokens are not being set on the container in yarn.Client, which is required according to http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html. something like this: ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength()); amContainer.setTokens(fsTokens); in createContainerLaunchContext of yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
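The commenter's suggestion, expanded into a self-contained sketch; {{credentials}} stands in for the delegation tokens the client has collected, and the method name here is hypothetical (the real change would live in Client.createContainerLaunchContext):

{code:scala}
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Sketch only: serialize the collected tokens and attach them to the
// AM container, as the YARN application-writing guide describes.
def attachTokens(credentials: Credentials, amContainer: ContainerLaunchContext): Unit = {
  val dob = new DataOutputBuffer()
  credentials.writeTokenStorageToStream(dob)
  val fsTokens = ByteBuffer.wrap(dob.getData, 0, dob.getLength)
  amContainer.setTokens(fsTokens)
}
{code}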
[jira] [Created] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
Josh Rosen created SPARK-9031: - Summary: Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class Key: SPARK-9031 URL: https://issues.apache.org/jira/browse/SPARK-9031 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Reporter: Josh Rosen Assignee: Josh Rosen BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter.
[jira] [Commented] (SPARK-8975) Implement a mechanism to send a new rate from the driver to the block generator
[ https://issues.apache.org/jira/browse/SPARK-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625930#comment-14625930 ] François Garillot commented on SPARK-8975: -- Typesafe PR: https://github.com/typesafehub/spark/pull/15/files Implement a mechanism to send a new rate from the driver to the block generator --- Key: SPARK-8975 URL: https://issues.apache.org/jira/browse/SPARK-8975 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Full design doc [here|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
- Add a new message, {{RateUpdate(newRate: Long)}}, that ReceiverSupervisor handles in its endpoint
- Add a new method to ReceiverTracker, {{def sendRateUpdate(streamId: Int, newRate: Long): Unit}}; this method sends an asynchronous RateUpdate message to the receiver supervisor corresponding to streamId
- Update the rate in the corresponding block generator.
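The pieces from the issue description, sketched as Scala stubs; the names follow the description, while the endpoint lookup is hypothetical and the send is left as a comment:

{code:scala}
// Message the driver sends to a receiver's supervisor endpoint.
case class RateUpdate(newRate: Long)

// Driver-side entry point described in the issue; sketch only.
class ReceiverTrackerSketch {
  def sendRateUpdate(streamId: Int, newRate: Long): Unit = {
    // Asynchronous fire-and-forget send to the supervisor for streamId:
    // supervisorEndpointFor(streamId).send(RateUpdate(newRate))
  }
}
{code}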
[jira] [Assigned] (SPARK-9020) Support mutable state in code gen expressions
[ https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9020: --- Assignee: Apache Spark (was: Wenchen Fan) Support mutable state in code gen expressions - Key: SPARK-9020 URL: https://issues.apache.org/jira/browse/SPARK-9020 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). We currently don't support code-gen for any expressions that have mutable state.
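To illustrate what "mutable state" means here, a hypothetical stand-in rather than Spark's Expression API: MonotonicallyIncreasingID needs a per-partition counter that survives across rows, and generated code currently has nowhere to keep such a field.

{code:scala}
// Stand-in for a stateful expression; the shift-by-33 layout mirrors
// how a partition id and a row counter can be packed into one Long.
class MonotonicIdSketch(partitionId: Int) {
  private var count: Long = 0L // the mutable state in question
  def eval(): Long = {
    val id = (partitionId.toLong << 33) + count
    count += 1
    id
  }
}
{code}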
[jira] [Updated] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`
[ https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9010: - Assignee: StanZhai Improve the Spark Configuration document about `spark.kryoserializer.buffer` Key: SPARK-9010 URL: https://issues.apache.org/jira/browse/SPARK-9010 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.4.0 Reporter: StanZhai Assignee: StanZhai Priority: Trivial Labels: documentation Fix For: 1.4.2, 1.5.0 The meaning of `spark.kryoserializer.buffer` should be "Initial size of Kryo's serialization buffer. Note that there will be one buffer per core on each worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed." The `spark.kryoserializer.buffer.max.mb` setting is out of date in Spark 1.4.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626005#comment-14626005 ] Sean Owen commented on SPARK-9019: -- Same as SPARK-8851? spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at
[jira] [Assigned] (SPARK-9020) Support mutable state in code gen expressions
[ https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9020: --- Assignee: Wenchen Fan (was: Apache Spark) Support mutable state in code gen expressions - Key: SPARK-9020 URL: https://issues.apache.org/jira/browse/SPARK-9020 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Wenchen Fan Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). We currently don't support code-gen for any expressions that have mutable state.
[jira] [Commented] (SPARK-9020) Support mutable state in code gen expressions
[ https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625929#comment-14625929 ] Apache Spark commented on SPARK-9020: - User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/7392 Support mutable state in code gen expressions - Key: SPARK-9020 URL: https://issues.apache.org/jira/browse/SPARK-9020 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Wenchen Fan Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). We currently don't support code-gen for any expressions that have mutable state.
[jira] [Commented] (SPARK-9025) Storage tab shows no blocks for cached RDDs
[ https://issues.apache.org/jira/browse/SPARK-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625976#comment-14625976 ] Sean Owen commented on SPARK-9025: -- I can't reproduce this on master. In Storage I get:

RDD Name              | Storage Level                     | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
ParallelCollectionRDD | Memory Deserialized 1x Replicated | 8                 | 100%            | 352.0 B        | 0.0 B                      | 0.0 B

Storage tab shows no blocks for cached RDDs --- Key: SPARK-9025 URL: https://issues.apache.org/jira/browse/SPARK-9025 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.5.0 Reporter: Andrew Or Simple repro: sc.parallelize(1 to 10).cache().count(), go to storage tab.
[jira] [Assigned] (SPARK-8808) Fix assignments in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8808: --- Assignee: Apache Spark Fix assignments in SparkR - Key: SPARK-8808 URL: https://issues.apache.org/jira/browse/SPARK-8808 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Assignee: Apache Spark {noformat} inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment. mockFile = c("Spark is pretty.", "Spark is awesome.") {noformat}
[jira] [Commented] (SPARK-8979) Implement a PIDRateEstimator
[ https://issues.apache.org/jira/browse/SPARK-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626118#comment-14626118 ] François Garillot commented on SPARK-8979: -- Parameter derivation available here: https://www.dropbox.com/s/dwgl7wa1z5wbkg6/PIDderivation.pdf?dl=0 Implement a PIDRateEstimator Key: SPARK-8979 URL: https://issues.apache.org/jira/browse/SPARK-8979 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Fix For: 1.5.0 Based on this [design doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
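For readers without the PDF, the general shape of a PID update applied to an ingestion rate; the gains and the error definition here are textbook placeholders, not the derivation from the linked document:

{code:scala}
// Illustrative PID rate estimator: derive a new ingestion rate from the
// gap between the current rate and the measured processing rate.
class PidSketch(kp: Double, ki: Double, kd: Double) {
  private var lastError = 0.0
  private var integral = 0.0

  def compute(latestRate: Double, processingRate: Double, dtSeconds: Double): Double = {
    val error = latestRate - processingRate   // positive when we ingest too fast
    integral += error * dtSeconds             // accumulated past error
    val derivative = (error - lastError) / dtSeconds
    lastError = error
    math.max(0.0, latestRate - kp * error - ki * integral - kd * derivative)
  }
}
{code}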
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626043#comment-14626043 ] Bolke de Bruin commented on SPARK-9019: --- Will try in a few minutes; however, it did not happen only when using keytabs, but also when using the user's own credentials. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at
[jira] [Updated] (SPARK-8974) There is a bug in The spark-dynamic-executor-allocation may be not supported
[ https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-8974: - Summary: There is a bug in The spark-dynamic-executor-allocation may be not supported (was: The spark-dynamic-executor-allocation may be not supported) There is a bug in The spark-dynamic-executor-allocation may be not supported Key: SPARK-8974 URL: https://issues.apache.org/jira/browse/SPARK-8974 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: KaiXinXIaoLei Fix For: 1.5.0 In yarn-client mode with the config option spark.dynamicAllocation.enabled set to true, if tasks are submitted while the ApplicationMaster is dead or disconnected (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation thread throws an exception. Afterwards, even when the ApplicationMaster is running and no tasks are running, the number of executors is not zero. So the dynamicAllocation feature does not work.
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626183#comment-14626183 ] Bolke de Bruin commented on SPARK-9019: --- [~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. Trace remains the same. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at
[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626183#comment-14626183 ] Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 10:41 AM: - [~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. Trace remains the same. With the patch a user without a key tab cannot use spark-submit anymore with --master yarn-cluster (failed renewal of token) was (Author: bolke): [~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. Trace remains the same. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
[jira] [Comment Edited] (SPARK-8975) Implement a mechanism to send a new rate from the driver to the block generator
[ https://issues.apache.org/jira/browse/SPARK-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625930#comment-14625930 ] François Garillot edited comment on SPARK-8975 at 7/14/15 11:14 AM: Typesafe PR: https://github.com/typesafehub/spark/pull/15/ was (Author: huitseeker): Typesafe PR: https://github.com/typesafehub/spark/pull/15/files Implement a mechanism to send a new rate from the driver to the block generator --- Key: SPARK-8975 URL: https://issues.apache.org/jira/browse/SPARK-8975 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Full design doc [here|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
- Add a new message, {{RateUpdate(newRate: Long)}}, that ReceiverSupervisor handles in its endpoint
- Add a new method to ReceiverTracker, {{def sendRateUpdate(streamId: Int, newRate: Long): Unit}}; this method sends an asynchronous RateUpdate message to the receiver supervisor corresponding to streamId
- Update the rate in the corresponding block generator.
[jira] [Created] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)
Philipp Poetter created SPARK-9032: -- Summary: scala.MatchError in DataFrameReader.json(String path) Key: SPARK-9032 URL: https://issues.apache.org/jira/browse/SPARK-9032 Project: Spark Issue Type: Bug Components: Java API, SQL Affects Versions: 1.4.0 Environment: Ubuntu 15.04 Reporter: Philipp Poetter Executing read().json() of SQLContext (i.e. via DataFrameReader) raises a MatchError with the following stacktrace while trying to read JSON data:

15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
 at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
 at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
 at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
 at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
 at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
 at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
 at com.hp.sparkdemo.Example.main(Example.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!

Offending code snippet (around line 23):

...
JavaSparkContext sctx = new JavaSparkContext(sparkConf);
SQLContext ctx = new SQLContext(sctx);
DataFrame frame = ctx.read().json(facebookJSON);
frame.printSchema();
...
The exception is reproducible using the following JSON:

{
  "data": [
    {
      "id": "X999_Y999",
      "from": { "name": "Tom Brady", "id": "X12" },
      "message": "Looking forward to 2010!",
      "actions": [
        { "name": "Comment", "link": "http://www.facebook.com/X999/posts/Y999" },
        { "name": "Like", "link": "http://www.facebook.com/X999/posts/Y999" }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+0000",
      "updated_time": "2010-08-02T21:27:44+0000"
    },
    {
      "id": "X998_Y998",
      "from": { "name": "Peyton Manning", "id": "X18" },
      "message": "Where's my contract?",
      "actions": [
        { "name": "Comment", "link": "http://www.facebook.com/X998/posts/Y998" },
        { "name": "Like", "link": "http://www.facebook.com/X998/posts/Y998" }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+0000",
      "updated_time": "2010-08-02T21:27:44+0000"
    }
  ]
}
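A Scala equivalent of the reporter's Java snippet, for reproducing the error; the file path is hypothetical:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("json-matcherror-repro"))
val sqlContext = new SQLContext(sc)

// Reading the JSON above triggers schema inference; on Spark 1.4.0 this
// path hits the scala.MatchError in InferSchema reported here.
val frame = sqlContext.read.json("/tmp/facebook.json") // hypothetical path
frame.printSchema()
{code}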
[jira] [Updated] (SPARK-8974) There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation may be not supported
[ https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-8974: - Summary: There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation may be not supported (was: There is a bug in The spark-dynamic-executor-allocation may be not supported) There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation may be not supported --- Key: SPARK-8974 URL: https://issues.apache.org/jira/browse/SPARK-8974 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: KaiXinXIaoLei Fix For: 1.5.0 In yarn-client mode with the config option spark.dynamicAllocation.enabled set to true, if tasks are submitted while the ApplicationMaster is dead or disconnected (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation thread throws an exception. Afterwards, even when the ApplicationMaster is running and no tasks are running, the number of executors is not zero. So the dynamicAllocation feature does not work.
[jira] [Updated] (SPARK-8974) There is a bug in dynamicAllocation. When there is no running tasks, the number of executor is not zero.
[ https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-8974: - Summary: There is a bug in dynamicAllocation. When there is no running tasks, the number of executor is not zero. (was: There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation may be not supported) There is a bug in dynamicAllocation. When there is no running tasks, the number of executor is not zero. Key: SPARK-8974 URL: https://issues.apache.org/jira/browse/SPARK-8974 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: KaiXinXIaoLei Fix For: 1.5.0 In yarn-client mode with the config option spark.dynamicAllocation.enabled set to true, if tasks are submitted while the ApplicationMaster is dead or disconnected (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation thread throws an exception. Afterwards, even when the ApplicationMaster is running and no tasks are running, the number of executors is not zero. So the dynamicAllocation feature does not work.
[jira] [Commented] (SPARK-8808) Fix assignments in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626095#comment-14626095 ] Apache Spark commented on SPARK-8808: - User 'sun-rui' has created a pull request for this issue: https://github.com/apache/spark/pull/7395 Fix assignments in SparkR - Key: SPARK-8808 URL: https://issues.apache.org/jira/browse/SPARK-8808 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa {noformat} inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment. mockFile = c("Spark is pretty.", "Spark is awesome.") {noformat}
[jira] [Assigned] (SPARK-8808) Fix assignments in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8808: --- Assignee: (was: Apache Spark) Fix assignments in SparkR - Key: SPARK-8808 URL: https://issues.apache.org/jira/browse/SPARK-8808 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa {noformat} inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment. mockFile = c("Spark is pretty.", "Spark is awesome.") {noformat}
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626113#comment-14626113 ] Bolke de Bruin commented on SPARK-9019: --- Now with debug info (not yet with patch):

15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
15/07/14 11:03:49 DEBUG SaslRpcClient: Sending sasl message state: NEGOTIATE
15/07/14 11:03:49 DEBUG SaslRpcClient: Received SASL message state: NEGOTIATE auths { method: "TOKEN" mechanism: "DIGEST-MD5" protocol: "" serverId: "default" challenge: "realm=\"default\",nonce=\"XXX\",qop=\"auth\",charset=utf-8,algorithm=md5-sess" } auths { method: "KERBEROS" mechanism: "GSSAPI" protocol: "rm" serverId: "lxhnl002.ad.ing.net" }
15/07/14 11:03:49 DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@5c53714b
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HIVE_DELEGATION_TOKEN and the token's service name is
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is TIMELINE_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8188
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.17:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is HDFS_DELEGATION_TOKEN and the token's service name is ha-hdfs:hdpnlcb
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException as:yx66jx (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
15/07/14 11:03:49 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException as:yx66jx (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

auth:SIMPLE is what worries me. spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. 
Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot
[jira] [Updated] (SPARK-9034) Reflect field names defined in GenericUDTF
[ https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-9034: Description: GenericUDTF#initialize() in Hive defines field names in a returned schema; however, the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. was: GenericUDTF#initialize() defines field names in a returned schema; however, the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. Reflect field names defined in GenericUDTF -- Key: SPARK-9034 URL: https://issues.apache.org/jira/browse/SPARK-9034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Takeshi Yamamuro GenericUDTF#initialize() in Hive defines field names in a returned schema; however, the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree.
[jira] [Updated] (SPARK-9034) Reflect field names defined in GenericUDTF
[ https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-9034: Description: Hive GenericUDTF#initialize() defines field names in a returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. was: GenericUDTF#initialize() in Hive defines field names in a returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. Reflect field names defined in GenericUDTF -- Key: SPARK-9034 URL: https://issues.apache.org/jira/browse/SPARK-9034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Takeshi Yamamuro Hive GenericUDTF#initialize() defines field names in a returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9034) Reflect field names defined in GenericUDTF
Takeshi Yamamuro created SPARK-9034: --- Summary: Reflect field names defined in GenericUDTF Key: SPARK-9034 URL: https://issues.apache.org/jira/browse/SPARK-9034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Takeshi Yamamuro GenericUDTF#initialize() defines field names in a returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
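For orientation, a minimal hypothetical UDTF showing where these field names come from: initialize() returns a StructObjectInspector whose field names (here "word") are exactly what SPARK-9034 proposes to carry into Spark's logical plan. This is an illustration of the Hive API, not code from the eventual PR:
{code}
import java.util.Arrays
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory, StructObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

// Hypothetical UDTF: explodes a string into one row per word.
class ExplodeWordsUDTF extends GenericUDTF {
  override def initialize(args: Array[ObjectInspector]): StructObjectInspector =
    ObjectInspectorFactory.getStandardStructObjectInspector(
      Arrays.asList("word"),  // the field name defined by the UDTF's schema
      Arrays.asList(PrimitiveObjectInspectorFactory.javaStringObjectInspector: ObjectInspector))

  override def process(args: Array[AnyRef]): Unit =
    args(0).toString.split(" ").foreach(w => forward(Array[AnyRef](w)))

  override def close(): Unit = ()
}
{code}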
[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626300#comment-14626300 ] Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 1:00 PM:
{code}
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
{code}
I think that should match.
was (Author: bolke): It might be that we have a configuration issue (but I'm not sure):
{code}
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
{code}
I think that should match.
spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with:
{code}
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
{code}
[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel updated SPARK-9033: - Description: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable containing a field of type java.util.Map<String, String>. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
also this occurs for fields of custom POJO classes:
{code}
scala.MatchError: class com.test.MyClass (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
was: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable containing a field of type java.util.Map<String, String>. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream =
{code}
[jira] [Updated] (SPARK-8974) There is a bug in dynamic allocation: when there are no running tasks for a long time, the number of executors does not reduce to the value of spark.dynamicAllocation.minExecutors
[ https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-8974: - Summary: There is a bug in dynamic allocation: when there are no running tasks for a long time, the number of executors does not reduce to the value of spark.dynamicAllocation.minExecutors. (was: There is a bug in dynamicAllocation. When there is no running tasks, the number of executor is not zero.) There is a bug in dynamic allocation: when there are no running tasks for a long time, the number of executors does not reduce to the value of spark.dynamicAllocation.minExecutors. - Key: SPARK-8974 URL: https://issues.apache.org/jira/browse/SPARK-8974 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: KaiXinXIaoLei Fix For: 1.5.0 In yarn-client mode with the config option spark.dynamicAllocation.enabled set to true, if tasks are submitted while the ApplicationMaster is dead or disconnected, before the new ApplicationMaster starts, the spark-dynamic-executor-allocation thread will throw an exception. Then, when the ApplicationMaster is running and no tasks are running, the number of executors is not zero, so the dynamic allocation feature does not work as intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
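For context, a typical dynamic-allocation setup affected by this bug (the values here are illustrative; the keys are standard Spark configuration):
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("dynamic-allocation-example")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "0")    // idle executors should decay to this floor
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.shuffle.service.enabled", "true")        // required so executors can be removed safely
{code}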
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626256#comment-14626256 ] Bolke de Bruin commented on SPARK-9019: --- And some more debugging information. Please note the selected auth:SIMPLE method.
{code}
15/07/14 11:03:45 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/14 11:03:45 DEBUG Shell: setsid exited with exit code 0
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of successful kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed kerberos logins and latency (milliseconds)], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
15/07/14 11:03:45 DEBUG Groups: Creating new Groups object
15/07/14 11:03:45 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
15/07/14 11:03:45 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
15/07/14 11:03:45 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
15/07/14 11:03:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/14 11:03:45 DEBUG PerformanceAdvisory: Falling back to shell based
15/07/14 11:03:45 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
15/07/14 11:03:45 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=30; warningDeltaMs=5000
15/07/14 11:03:45 DEBUG YarnSparkHadoopUtil: running as user: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login commit
15/07/14 11:03:45 DEBUG UserGroupInformation: using kerberos user:null
15/07/14 11:03:45 DEBUG UserGroupInformation: using local user:UnixPrincipal: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: Using user: UnixPrincipal: yx66jx with name yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: User entry: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: UGI loginUser:yx66jx (auth:KERBEROS)
15/07/14 11:03:45 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
15/07/14 11:03:46 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1436783220608_0085_01
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = true
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
{code}
spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with:
{code}
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler:
{code}
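Worth noting in the log above: the UGI login user is reported as auth:KERBEROS, while the PrivilegedAction immediately after runs as auth:SIMPLE. A small diagnostic sketch (illustrative only) that makes such a mismatch visible by comparing the login user with the current user:
{code}
import org.apache.hadoop.security.UserGroupInformation

// Compare the login user with the current (possibly doAs/proxy) user and
// their authentication methods; a SIMPLE current user on a Kerberized
// cluster points at the problem seen in this ticket.
val login = UserGroupInformation.getLoginUser
val current = UserGroupInformation.getCurrentUser
println(s"login user:   $login (auth: ${login.getAuthenticationMethod})")
println(s"current user: $current (auth: ${current.getAuthenticationMethod})")
println(s"login is keytab-based: ${UserGroupInformation.isLoginKeytabBased}")
{code}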
[jira] [Created] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
Pavel created SPARK-9033: Summary: scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL Key: SPARK-9033 URL: https://issues.apache.org/jira/browse/SPARK-9033 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1, 1.2.2 Reporter: Pavel I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
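One possible workaround for users hitting this (illustrative only, not the fix tracked by this JIRA): bypass the bean-based schema inference that throws the MatchError by declaring the schema explicitly, mapping the java.util.Map field to Spark SQL's MapType. Here Event, fromString, and the getters stand in for the reporter's classes, and sc / sqlContext are assumed to be in scope, as in a Spark shell:
{code}
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Declare the schema by hand instead of inferring it from the Java bean.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("attributes", MapType(StringType, StringType), nullable = true)))

val rowRDD = sc.textFile("/path").map { line =>
  val e = Event.fromString(line)                 // the reporter's class (hypothetical here)
  Row(e.getName, e.getAttributes.asScala.toMap)  // convert the Java map to a Scala map
}
val df = sqlContext.createDataFrame(rowRDD, schema)
df.registerTempTable("events")
{code}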
[jira] [Commented] (SPARK-9034) Reflect field names defined in GenericUDTF
[ https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626279#comment-14626279 ] Takeshi Yamamuro commented on SPARK-9034: - I'll make a PR for this after SPARK-8955 and SPARK-8930 are resolved. Reflect field names defined in GenericUDTF -- Key: SPARK-9034 URL: https://issues.apache.org/jira/browse/SPARK-9034 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Takeshi Yamamuro Hive GenericUDTF#initialize() defines field names in a returned schema, but the current HiveGenericUDTF drops these names. We might need to reflect these in a logical plan tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5210) Support log rolling in EventLogger
[ https://issues.apache.org/jira/browse/SPARK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626325#comment-14626325 ] Tao Wang commented on SPARK-5210: - Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The HistoryServer usually crashes with an OOM exception when it reads the very large event logs written by long-running applications. We can tune its memory settings, but that is not an elegant fix, as the logs generated by Streaming/Thrift Server can grow indefinitely. We now plan to write the event log to separate files according to job id, say 50 jobs per file. The HistoryServer could then read relatively small files, which are much less likely to cause an OOM. Support log rolling in EventLogger -- Key: SPARK-5210 URL: https://issues.apache.org/jira/browse/SPARK-5210 Project: Spark Issue Type: New Feature Components: Spark Core, Web UI Reporter: Josh Rosen For long-running Spark applications (e.g. running for days / weeks), the Spark event log may grow to be very large. As a result, it would be useful if EventLoggingListener supported log file rolling / rotation. Adding this feature will involve changes to the HistoryServer in order to be able to load event logs from a sequence of files instead of a single file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5210) Support log rolling in EventLogger
[ https://issues.apache.org/jira/browse/SPARK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626325#comment-14626325 ] Tao Wang edited comment on SPARK-5210 at 7/14/15 1:21 PM: -- Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The HistoryServer usually crashes with an OOM exception when it reads the very large event logs written by long-running applications. We can tune its memory settings, but that is not an elegant fix, as the logs generated by Streaming/Thrift Server can grow indefinitely. We now plan to write the event log to separate files according to job id, say 50 jobs per file. The HistoryServer could then read relatively small files, which are much less likely to cause an OOM. What do you think? was (Author: wangtaothetonic): Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The HistoryServer usually crashes with an OOM exception when it reads the very large event logs written by long-running applications. We can tune its memory settings, but that is not an elegant fix, as the logs generated by Streaming/Thrift Server can grow indefinitely. We now plan to write the event log to separate files according to job id, say 50 jobs per file. The HistoryServer could then read relatively small files, which are much less likely to cause an OOM. Support log rolling in EventLogger -- Key: SPARK-5210 URL: https://issues.apache.org/jira/browse/SPARK-5210 Project: Spark Issue Type: New Feature Components: Spark Core, Web UI Reporter: Josh Rosen For long-running Spark applications (e.g. running for days / weeks), the Spark event log may grow to be very large. As a result, it would be useful if EventLoggingListener supported log file rolling / rotation. Adding this feature will involve changes to the HistoryServer in order to be able to load event logs from a sequence of files instead of a single file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
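To make the rolling proposal concrete, an illustrative-only sketch of the idea in the comment above: rotate the event log to a fresh file every N jobs so that no single file grows without bound. The names here are hypothetical; this is not EventLoggingListener's actual API:
{code}
import java.io.{File, PrintWriter}

class RollingEventLogWriter(dir: File, jobsPerFile: Int = 50) {
  private var jobsInCurrentFile = 0
  private var fileIndex = 0
  private var out = open()

  private def open(): PrintWriter =
    new PrintWriter(new File(dir, s"eventlog-part-$fileIndex"))

  /** Append one job's events; roll to a new file once jobsPerFile jobs are written. */
  def writeJobEvents(jobId: Int, events: Seq[String]): Unit = {
    out.println(s"# job $jobId")
    events.foreach(out.println)
    jobsInCurrentFile += 1
    if (jobsInCurrentFile >= jobsPerFile) {
      out.close()
      fileIndex += 1
      jobsInCurrentFile = 0
      out = open()
    }
  }

  def close(): Unit = out.close()
}
{code}
The HistoryServer side would then load the parts in index order, which keeps each read small.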
[jira] [Commented] (SPARK-8977) Define the RateEstimator interface, and implement the ReceiverRateController
[ https://issues.apache.org/jira/browse/SPARK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626202#comment-14626202 ] François Garillot commented on SPARK-8977: -- Typesafe PR: https://github.com/typesafehub/spark/pull/16 Define the RateEstimator interface, and implement the ReceiverRateController Key: SPARK-8977 URL: https://issues.apache.org/jira/browse/SPARK-8977 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Fix For: 1.5.0 Full [design doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing] Implement a rate controller for receiver-based InputDStreams that estimates a maximum rate and sends it to each receiver supervisor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
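For readers without access to the PR, a sketch of what such an interface could look like; this is an approximation of the design doc, not the committed signature:
{code}
// Given the latest batch statistics, optionally produce a new rate bound
// (records/sec) to send to each receiver supervisor.
trait RateEstimator extends Serializable {
  def compute(
      time: Long,             // batch completion time, in ms
      elements: Long,         // number of records processed in the batch
      processingDelay: Long,  // ms spent processing the batch
      schedulingDelay: Long   // ms the batch waited before being scheduled
  ): Option[Double]
}
{code}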
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626300#comment-14626300 ] Bolke de Bruin commented on SPARK-9019: --- It might be that we have a configuration issue (but I'm not sure):
{code}
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is YARN_AM_RM_TOKEN and the token's service name is
{code}
I think that should match.
spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with:
{code}
15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
at
{code}
[jira] [Commented] (SPARK-8844) head/collect is broken in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626273#comment-14626273 ] Sun Rui commented on SPARK-8844: This is a bug in reading an empty DataFrame. I will submit a PR. head/collect is broken in SparkR - Key: SPARK-8844 URL: https://issues.apache.org/jira/browse/SPARK-8844 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.5.0 Reporter: Davies Liu Priority: Blocker
{code}
t = tables(sqlContext)
showDF(T)
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘showDF’ for signature ‘logical’
showDF(t)
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
15/07/06 09:59:10 WARN Executor: Told to re-register on heartbeat
head(t)
Error in readTypedObject(con, type) : Unsupported type for deserialization
collect(t)
Error in readTypedObject(con, type) : Unsupported type for deserialization
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel updated SPARK-9033: - Description: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
was: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437)
{code}
[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel updated SPARK-9033: - Description: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable containing a field of type java.util.Map<String, String>. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
was: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at
{code}
[jira] [Created] (SPARK-9035) Spark on Mesos Thread Context Class Loader issues
John Omernik created SPARK-9035: --- Summary: Spark on Mesos Thread Context Class Loader issues Key: SPARK-9035 URL: https://issues.apache.org/jira/browse/SPARK-9035 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0, 1.3.1, 1.3.0, 1.2.2 Environment: Mesos on MapRFS. Reporter: John Omernik Priority: Critical
There is an issue trying to run Spark on Mesos (using MapRFS). I am able to run this on YARN (using Myriad on Mesos) on the same cluster, just not directly on Mesos. I've corresponded with MapR, and the issue appears to be the class loader being NULL. They will look at trying to address it in their code as well, but the issue exists here too, as the desired behavior shouldn't be to pass NULL (see https://issues.apache.org/jira/browse/SPARK-1403). Note: I did try to reopen SPARK-1403, and Patrick Wendell asked me to open a new issue (that is this JIRA).
Environment: MapR 4.1.0 (using MapRFS), Mesos 0.22.1, Spark 1.4 (the issue occurs on Spark 1.3.1, 1.3.0, and 1.2.2, but not on 1.2.0).
Some comments from Kannan at MapR (he is no longer with MapR; these comments are from before he left): Here is the corresponding ShimLoader code; cl.getParent() is hitting an NPE. If you look at the Spark code base, you can see that setContextClassLoader is invoked in a few places, but not necessarily in the context of this stack trace.
{code}
private static ClassLoader getRootClassLoader() {
  ClassLoader cl = Thread.currentThread().getContextClassLoader();
  trace("getRootClassLoader: thread classLoader is '%s'", cl.getClass().getCanonicalName());
  while (cl.getParent() != null) {
    cl = cl.getParent();
  }
  trace("getRootClassLoader: root classLoader is '%s'", cl.getClass().getCanonicalName());
  return cl;
}
{code}
MapR cannot handle NULL in this case. Basically, it is trying to get a root classloader to use for loading a bunch of classes. It uses the thread's context class loader (TCCL) and keeps going up the parent chain. We could fall back to using the current class's classloader whenever the TCCL is NULL. I need to check with some folks what the impact will be. I don't know the specific reason for choosing the TCCL here. I have raised an internal bug to fall back to using the current class loader if the TCCL is not set. Let us also figure out if there is a way for Spark to address this - if it is really a change in behavior from their side. I think we should still fix our code to not make this assumption. But since this is a core change, it may not get out soon.
Command attempted in bin/pyspark:
{code}
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row, HiveContext
sparkhc = HiveContext(sc)
test = sparkhc.sql("show tables")
for r in test.collect():
    print r
{code}
Stack trace from the CLI:
{code}
15/07/14 09:16:40 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkexecu...@hadoopvm5.mydomain.com:58221] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:16:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkexecu...@hadoopmapr3.mydomain.com:53763] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:48 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:16:53 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkexecu...@hadoopvm5.mydomain.com:52102] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:53 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:17:01 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkexecu...@hadoopmapr3.mydomain.com:58600] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:17:01 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:17:01 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/pyspark/sql/dataframe.py", line 314, in collect
    port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
  File
{code}
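The null-safe variant Kannan describes could look roughly like this; a sketch, not MapR's actual ShimLoader code:
{code}
// Fall back to this class's own loader when the thread context class loader
// (TCCL) is unset, instead of hitting an NPE on cl.getParent.
def rootClassLoader(): ClassLoader = {
  var cl = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getClass.getClassLoader)  // fallback when the TCCL is null
  while (cl.getParent != null) {
    cl = cl.getParent
  }
  cl
}
{code}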
[jira] [Commented] (SPARK-8996) Add Python API for Kolmogorov-Smirnov Test
[ https://issues.apache.org/jira/browse/SPARK-8996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626409#comment-14626409 ] Manoj Kumar commented on SPARK-8996: Hi, Can I work on this? Add Python API for Kolmogorov-Smirnov Test -- Key: SPARK-8996 URL: https://issues.apache.org/jira/browse/SPARK-8996 Project: Spark Issue Type: New Feature Components: MLlib, PySpark Reporter: Xiangrui Meng Add Python API for the Kolmogorov-Smirnov test implemented in SPARK-8598. It should be similar to ChiSqTest in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
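For orientation, the Scala API from SPARK-8598 that the requested Python binding would wrap; the Python signature itself is still to be designed, so the shape below is only a reference point:
{code}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Test a sample against a standard normal distribution (mean 0.0, stddev 1.0).
val data: RDD[Double] = sc.parallelize(Seq(0.1, 0.15, 0.2, 0.3, 0.25))
val result = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0)
println(result)  // prints the test statistic, p-value, and null-hypothesis summary
{code}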
[jira] [Assigned] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery
[ https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8125: --- Assignee: Apache Spark (was: Cheng Lian) Accelerate ParquetRelation2 metadata discovery -- Key: SPARK-8125 URL: https://issues.apache.org/jira/browse/SPARK-8125 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.4.0 Reporter: Cheng Lian Assignee: Apache Spark Priority: Blocker For large Parquet tables (e.g., with thousands of partitions), it can be very slow to discover Parquet metadata for schema merging and generating splits for Spark jobs. We need to accelerate this process. One possible solution is to do the discovery via a distributed Spark job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery
[ https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626456#comment-14626456 ] Apache Spark commented on SPARK-8125: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/7396 Accelerate ParquetRelation2 metadata discovery -- Key: SPARK-8125 URL: https://issues.apache.org/jira/browse/SPARK-8125 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.4.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Blocker For large Parquet tables (e.g., with thousands of partitions), it can be very slow to discover Parquet metadata for schema merging and generating splits for Spark jobs. We need to accelerate this process. One possible solution is to do the discovery via a distributed Spark job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery
[ https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8125: --- Assignee: Cheng Lian (was: Apache Spark) Accelerate ParquetRelation2 metadata discovery -- Key: SPARK-8125 URL: https://issues.apache.org/jira/browse/SPARK-8125 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.4.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Blocker For large Parquet tables (e.g., with thousands of partitions), it can be very slow to discover Parquet metadata for schema merging and generating splits for Spark jobs. We need to accelerate this process. One possible solution is to do the discovery via a distributed Spark job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
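A sketch of the distributed-discovery idea; the helper names here are hypothetical stubs (the real change is in the PR above), but the shape is the point: read Parquet footers in parallel as a Spark job and merge the resulting schemas, instead of looping over files on the driver:
{code}
// Hypothetical helpers, stubbed out so the sketch is self-contained.
def listLeafFiles(tablePath: String): Seq[String] = ???            // enumerate the table's part files
def readFooterSchema(path: String): Seq[String] = ???              // parse one footer into a schema
def mergeSchemas(a: Seq[String], b: Seq[String]): Seq[String] =
  (a ++ b).distinct                                                // placeholder schema merge

val mergedSchema = sc
  .parallelize(listLeafFiles("/path/to/table"), 64)  // spread footer reads across the cluster
  .map(readFooterSchema)
  .reduce(mergeSchemas)
{code}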
[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel updated SPARK-9033: - Description: I've a java.util.Map<String, String> field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema (1.2.2) with the SQLContext, and I'm getting the following error in both the 1.2.2 and 1.3.1 versions of Spark SQL:
*sample code:*
{code}
SQLContext sqlCtx = new SQLContext(sc.sc());
// the text line is split and assigned to the respective fields of the Event class here
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line));
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); // <-- error thrown here
schemaRDD.registerTempTable("events");
{code}
Event class is a Serializable containing a field of type java.util.Map<String, String>. This issue occurs also with Spark streaming when used with SQL.
{code}
JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");
windowDStream.foreachRDD(evRDD -> {
    if (evRDD.count() == 0) return null;
    DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
    schemaRDD.registerTempTable("events");
    ...
}
{code}
*error:*
{code}
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
also this occurs for fields of custom POJO classes:
{code}
scala.MatchError: class com.test.MyClass (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
{code}
also occurs for Calendar type:
{code}
scala.MatchError: class java.util.Calendar (of class java.lang.Class)
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
{code}
[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel updated SPARK-9033: - Description: I've a java.util.MapString, String field in a POJO class and I'm trying to use it to createDataFrame (1.3.1) / applySchema(1.2.2) with the SQLContext and getting following error in both 1.2.2 1.3.1 versions of the Spark SQL: *sample code: SQLContext sqlCtx = new SQLContext(sc.sc()); JavaRDDEvent rdd = sc.textFile(/path).map(line- Event.fromString(line)); //text line is splitted and assigned to respective field of the event class here DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); -- error thrown here schemaRDD.registerTempTable(events); Event class is a Serializable containing a field of type java.util.MapString, String. This issue occurs also with Spark streaming when used with SQL. JavaDStreamString receiverStream = jssc.receiverStream(new StreamingReceiver()); JavaDStreamString windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL); jssc.checkpoint(event-streaming); windowDStream.foreachRDD(evRDD - { if(evRDD.count() == 0) return null; DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class); schemaRDD.registerTempTable(events); ... } *error: scala.MatchError: interface java.util.Map (of class java.lang.Class) at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na] at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na] at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na] at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1] **also this occurs for fields of custom POJO classes: scala.MatchError: class com.test.MyClass (of class java.lang.Class) at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na] at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na] at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na] at 
org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) ~[spark-sql_2.10-1.3.1.jar:1.3.1] **also occurs for Calendar type: scala.MatchError: class java.util.Calendar (of class java.lang.Class) at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192) ~[spark-sql_2.10-1.3.1.jar:1.3.1] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na] at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.10.5.jar:na] at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.5.jar:na]
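One way around the bean-reflection limitation, sketched here as an assumption rather than anything from the ticket, is to declare the schema explicitly so createDataFrame never inspects the POJO; the field names and parsing below are purely illustrative:
{code}
// Hedged workaround sketch (Scala): declare the schema, MapType included,
// and build Rows directly so no bean reflection runs.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("attributes", MapType(StringType, StringType), nullable = true)))

val rows = sc.textFile("/path").map { line =>
  val parts = line.split(",", 2)
  Row(parts(0), Map("raw" -> parts.lift(1).getOrElse("")))  // illustrative parsing
}
val df = sqlCtx.createDataFrame(rows, schema)
df.registerTempTable("events")
{code}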
[jira] [Commented] (SPARK-8978) Implement the DirectKafkaController
[ https://issues.apache.org/jira/browse/SPARK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626542#comment-14626542 ] François Garillot commented on SPARK-8978: -- Typesafe PR: https://github.com/typesafehub/spark/pull/18 Implement the DirectKafkaController --- Key: SPARK-8978 URL: https://issues.apache.org/jira/browse/SPARK-8978 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Fix For: 1.5.0 Based on this [design doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]. The DirectKafkaInputDStream should use the rate estimate to control how many records/partition to put in the next batch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
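For orientation, one plausible shape of that control, sketched under stated assumptions (my own illustration, not the Typesafe PR): convert the estimated ingestion rate into a per-partition record cap for the next batch.
{code}
// Sketch only: the function name and parameters are assumptions, not Spark's API.
def maxMessagesPerPartition(ratePerSec: Double,
                            batchIntervalMs: Long,
                            numPartitions: Int): Long = {
  // Total records the batch can absorb at the estimated rate, split evenly.
  val totalForBatch = ratePerSec * batchIntervalMs / 1000.0
  math.max(1L, (totalForBatch / numPartitions).toLong)
}
{code}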
[jira] [Commented] (SPARK-8979) Implement a PIDRateEstimator
[ https://issues.apache.org/jira/browse/SPARK-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626543#comment-14626543 ] François Garillot commented on SPARK-8979: -- Typesafe PR: https://github.com/typesafehub/spark/pull/17 Implement a PIDRateEstimator Key: SPARK-8979 URL: https://issues.apache.org/jira/browse/SPARK-8979 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Iulian Dragos Fix For: 1.5.0 Based on this [design doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
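A classical PID loop over batch statistics might look roughly like this; a sketch of the idea in the design doc, not the eventual Spark implementation, and the integral term below is a crude stand-in:
{code}
// Sketch: estimate a new ingestion rate (records/sec) from the last batch.
class PIDRateEstimator(proportional: Double, integral: Double, derivative: Double) {
  private var firstRun = true
  private var latestTime = -1L
  private var latestRate = -1.0
  private var latestError = 0.0

  def compute(timeMs: Long, elements: Long, processingDelayMs: Long): Option[Double] =
    this.synchronized {
      if (timeMs <= latestTime || elements <= 0 || processingDelayMs <= 0) {
        None
      } else if (firstRun) {
        firstRun = false
        latestTime = timeMs
        latestRate = elements.toDouble * 1000 / processingDelayMs  // measured rate
        None
      } else {
        val delaySecs = (timeMs - latestTime).toDouble / 1000
        val processingRate = elements.toDouble * 1000 / processingDelayMs
        val error = latestRate - processingRate            // P: how far off we were
        val accumulated = error * delaySecs                // I: crude accumulation
        val dError = (error - latestError) / delaySecs     // D: error trend
        val newRate = latestRate -
          proportional * error - integral * accumulated - derivative * dError
        latestTime = timeMs
        latestError = error
        latestRate = newRate
        if (newRate > 0) Some(newRate) else None
      }
    }
}
{code}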
[jira] [Commented] (SPARK-7751) Add @since to stable and experimental methods in MLlib
[ https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626533#comment-14626533 ] Xiangrui Meng commented on SPARK-7751: -- This is great! Thanks for providing the script! Add @since to stable and experimental methods in MLlib -- Key: SPARK-7751 URL: https://issues.apache.org/jira/browse/SPARK-7751 Project: Spark Issue Type: Umbrella Components: Documentation, MLlib Affects Versions: 1.4.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Labels: starter This is useful to check whether a feature exists in some version of Spark. This is an umbrella JIRA to track the progress. We want to have an @since tag for both stable (those without any Experimental/DeveloperApi/AlphaComponent annotations) and experimental methods in MLlib: * an example PR for Scala: https://github.com/apache/spark/pull/6101 * an example PR for Python: https://github.com/apache/spark/pull/6295 We need to dig through the git history to figure out the Spark version in which a method was first introduced. Take `NaiveBayes.setModelType` as an example. We can grep for `def setModelType` at different version git tags. {code} meng@xm:~/src/spark $ git show v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala | grep "def setModelType" meng@xm:~/src/spark $ git show v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala | grep "def setModelType" def setModelType(modelType: String): NaiveBayes = { {code} If there are better ways, please let us know. We cannot add all @since tags in a single PR, which would be hard to review. So we made some subtasks for each package, for example `org.apache.spark.classification`. Feel free to add more sub-tasks for Python and the `spark.ml` package. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9036) SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol
Ryan Williams created SPARK-9036: Summary: SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol Key: SPARK-9036 URL: https://issues.apache.org/jira/browse/SPARK-9036 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.4.0, 1.4.1 Reporter: Ryan Williams Priority: Minor The JsonProtocol added in SPARK-3454 [doesn't include|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L95-L96] code for ser/de of [{{SparkListenerExecutorMetricsUpdate}}|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L107-L110] messages. The comment notes that they are not used, which presumably refers to the fact that the [{{EventLoggingListener}} doesn't write these events|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L200-L201]. However, individual listeners can and should make that determination for themselves; I have recently written custom listeners that would like to consume metrics-update messages as JSON, so it would be nice to round out the JsonProtocol implementation by supporting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
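The kind of custom listener the reporter describes is straightforward once JsonProtocol supports the event. The sketch below assumes the Spark 1.4 field names on SparkListenerExecutorMetricsUpdate; the JSON step is left as a comment, since the serializer is exactly what is missing:
{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

// Sketch of a custom listener that wants metrics updates as JSON.
class MetricsUpdateListener extends SparkListener {
  override def onExecutorMetricsUpdate(
      update: SparkListenerExecutorMetricsUpdate): Unit = {
    // With JsonProtocol support, the update could be serialized right here;
    // until then we can only inspect it in-process.
    println(s"metrics update from executor ${update.execId} " +
      s"covering ${update.taskMetrics.size} task(s)")
  }
}
// Registration: sc.addSparkListener(new MetricsUpdateListener())
{code}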
[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626574#comment-14626574 ] Bolke de Bruin commented on SPARK-9019: --- Can this be related to YARN-3103? spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at
[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+
[ https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626482#comment-14626482 ] Vincent Warmerdam commented on SPARK-8724: -- So a tutorial just went live for people who are on Spark 1.4: http://blog.rstudio.org/2015/07/14/spark-1-4-for-rstudio/ I suppose if people link to this for now it'd be just fine. For Spark 1.5, the EC2 provisioning script will come with RStudio. Need documentation on how to deploy or use SparkR in Spark 1.4.0+ - Key: SPARK-8724 URL: https://issues.apache.org/jira/browse/SPARK-8724 Project: Spark Issue Type: Bug Components: R Affects Versions: 1.4.0 Reporter: Felix Cheung Priority: Minor As of now there doesn't seem to be any official documentation on how to deploy SparkR with Spark 1.4.0+. Also, cluster-manager-specific documentation (like http://spark.apache.org/docs/latest/spark-standalone.html) does not call out which modes are supported for SparkR or give details on the deployment steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin updated SPARK-9019: -- Comment: was deleted (was: - this was incorrect - ) spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92) at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled
[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625980#comment-14625980 ] Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 7:33 AM: - this was incorrect - was (Author: bolke): Tracing this down it seems that the tokens are not being set on the container in yarn.Client, which is required according to http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html. something like this: ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength()); amContainer.setTokens(fsTokens); in createContainerLaunchContext of yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala spark-submit fails on yarn with kerberos enabled Key: SPARK-9019 URL: https://issues.apache.org/jira/browse/SPARK-9019 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Environment: Hadoop 2.6 with YARN and kerberos enabled Reporter: Bolke de Bruin Labels: kerberos, spark-submit, yarn It is not possible to run jobs using spark-submit on yarn with a kerberized cluster. Commandline: /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py Fails with: 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380. 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470. 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470) 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/ 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms. 
java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731) at org.apache.hadoop.ipc.Client.call(Client.java:1472) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at
[jira] [Resolved] (SPARK-9001) sbt doc fails due to javadoc errors
[ https://issues.apache.org/jira/browse/SPARK-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9001. Resolution: Fixed Assignee: Joseph E. Gonzalez Fix Version/s: 1.5.0 sbt doc fails due to javadoc errors --- Key: SPARK-9001 URL: https://issues.apache.org/jira/browse/SPARK-9001 Project: Spark Issue Type: Bug Components: Documentation Reporter: Joseph E. Gonzalez Assignee: Joseph E. Gonzalez Priority: Minor Fix For: 1.5.0 Running `build/sbt doc` on master fails due to errors in the javadocs. This is an issue since `build/sbt publish-local` depends on building the docs. Example error: [info] Generating /spark/unsafe/target/scala-2.10/api/org/apache/spark/unsafe/bitset/BitSet.html... [error] /spark/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java:93: error: bad use of '>' [error]* for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) { [error] ^ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8624) DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet
[ https://issues.apache.org/jira/browse/SPARK-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626416#comment-14626416 ] Liang-Chi Hsieh commented on SPARK-8624: I think you can use DataFrameReader.option to set up the needed parameters before calling DataFrameReader.parquet. It should solve your problem. DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet Key: SPARK-8624 URL: https://issues.apache.org/jira/browse/SPARK-8624 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Rex Xiong Labels: parquet In 1.4.0, Parquet is read by DataFrameReader.parquet; when the ParquetRelation2 object is created, `parameters` is hard-coded as Map.empty[String, String], so ParquetRelation2.shouldMergeSchemas is always true (the default value). In previous versions, the spark.sql.hive.convertMetastoreParquet.mergeSchema config was respected. This bug degrades performance a lot for a folder with hundreds of Parquet files when we don't want a schema merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
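Concretely, the suggestion amounts to something like the following (assuming the Parquet source honors a "mergeSchema" option in this version; hedged, not verified against 1.4.0):
{code}
// Disable schema merging per-read instead of relying on the global config.
val df = sqlContext.read
  .option("mergeSchema", "false")
  .parquet("/path/to/parquet-folder")
{code}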
[jira] [Commented] (SPARK-8967) Implement @since as an annotation
[ https://issues.apache.org/jira/browse/SPARK-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626662#comment-14626662 ] Xiangrui Meng commented on SPARK-8967: -- The issue with a Java annotation is that it doesn't show up correctly in the generated Scala docs; in particular, the version value disappears. I don't know of a solution. We could switch to a Scala annotation in MLlib, but this is not ideal. Implement @since as an annotation - Key: SPARK-8967 URL: https://issues.apache.org/jira/browse/SPARK-8967 Project: Spark Issue Type: New Feature Components: Documentation, Spark Core Reporter: Xiangrui Meng Assignee: Xiangrui Meng Original Estimate: 1h Remaining Estimate: 1h We use the @since tag in JavaDoc. There is one issue: an overloaded method inherits the doc from its parent if no JavaDoc is provided. However, if we want to add @since, we have to add JavaDoc, and then we need to copy the JavaDoc from the parent, which makes it hard to keep docs in sync. A better solution would be implementing @since as an annotation, which is not part of the JavaDoc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
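For comparison, a Scala-side annotation is a one-liner; this is only a sketch of the shape such an annotation could take, not MLlib's eventual implementation:
{code}
// Illustrative only: a Scala annotation carrying the version string.
private[spark] class Since(version: String) extends scala.annotation.StaticAnnotation

// Hypothetical usage:
//   @Since("1.4.0")
//   def setModelType(modelType: String): NaiveBayes = { ... }
{code}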
[jira] [Assigned] (SPARK-8945) Add and Subtract expression should support IntervalType
[ https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8945: --- Assignee: Apache Spark Add and Subtract expression should support IntervalType --- Key: SPARK-8945 URL: https://issues.apache.org/jira/browse/SPARK-8945 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8945) Add and Subtract expression should support IntervalType
[ https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626683#comment-14626683 ] Apache Spark commented on SPARK-8945: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/7398 Add and Subtract expression should support IntervalType --- Key: SPARK-8945 URL: https://issues.apache.org/jira/browse/SPARK-8945 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9038) Missing TaskEnd event when task attempt is superseded by another (speculative) attempt
Ryan Williams created SPARK-9038: Summary: Missing TaskEnd event when task attempt is superseded by another (speculative) attempt Key: SPARK-9038 URL: https://issues.apache.org/jira/browse/SPARK-9038 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.1 Reporter: Ryan Williams Yesterday I ran a job that produced [this event log|https://www.dropbox.com/s/y90rz0gxao5w9z9/application_1432740718700_3010?dl=0]. There are 17314 {{TaskStart}}'s and 17313 {{TaskEnd}}'s; task ID 15820 (aka 13.0.526.0) is missing a {{TaskEnd}} event. A speculative second attempt, ID 16295 (13.0.526.1), finished before it; 15820 was the last task attempt running in stage-attempt 13.0 and job 3, and when it finished the latter two were each marked as succeeded. At the conclusion of stage 13 / job 3, I observed a few things to be in conflicting/inconsistent states: *Reflecting 15820 as having finished successfully:* * The stage page for 13.0 [showed SUCCESS in the Status column of the per-task-attempt table|http://cl.ly/image/2O0O42382p2W?_ga=1.265890767.118106744.1401937910]. * The driver stdout reported 15820's successful finish, and that it was being ignored due to another attempt of the same task (16295, per above) having already succeeded: {code} 15/07/13 23:30:40 INFO scheduler.TaskSetManager: Ignoring task-finished event for 526.0 in stage 13.0 because task 526 has already completed successfully 15/07/13 23:30:40 INFO cluster.YarnScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool 15/07/13 23:30:40 INFO scheduler.DAGScheduler: Job 3 finished: collect at JointHistogram.scala:107, took 579.659523 s {code} *Not reflecting 15820 as having finished at all:* * As I mentioned before, [the event log|https://www.dropbox.com/s/y90rz0gxao5w9z9/application_1432740718700_3010?dl=0] is missing a {{TaskEnd}} for 15820. * The {{AllJobsPage}} shows 11258 tasks finished in job 3; it would have been 11259 with 15820. ** Additionally, inspecting the page in the DOM revealed a 1-task-wide sliver of light-blue (i.e. running task(s)) in the progress bar. ** [This screenshot|http://cl.ly/image/3O201z0e0G2C?_ga=1.265890767.118106744.1401937910] shows both of these on the {{AllJobsPage}}. * A history server, pointed at the event log, consistently shows 15820 as not having finished. ** This is somewhat unsurprising given that the event log powering the history server doesn't contain a {{TaskEnd}} for 15820, but seems notable nonetheless since the live UI seemingly *did* partially record the task as having ended (cf. stage page showing SUCCESS). ** Stage page shows 15820 as RUNNING. ** AllJobsPage shows 11258 tasks succeeded, 1 running. I've gone over the relevant task-success code paths and can't understand how the stage page would show me SUCCESS in the live UI, without anything having been written to the event log or the AllJobsPage's counters having been updated. [Here is a bunch of my driver stdout|https://www.dropbox.com/s/pr7rswt4o2umm20/3010.stdout?dl=0], which shows nothing abnormal afaict; and [the dreaded message about events being dropped|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L40] did not appear anywhere while the app was running, which was one of my only guesses about how this could have happened (but which wouldn't fully explain all of the above anyway). Interested in hearing anyone's thoughts about how I might have arrived at this inconsistent state.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
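The start/end mismatch above is easy to reproduce from any event log; a rough tally script, assuming the standard one-JSON-object-per-line format and a local, uncompressed log:
{code}
// Sketch: count event types in a Spark event log file.
val eventRe = """"Event"\s*:\s*"([^"]+)"""".r
val counts = scala.io.Source.fromFile("/path/to/eventlog").getLines()
  .flatMap(line => eventRe.findFirstMatchIn(line).map(_.group(1)))
  .toSeq.groupBy(identity).mapValues(_.size)
println(counts.getOrElse("SparkListenerTaskStart", 0))  // 17314 in the log above
println(counts.getOrElse("SparkListenerTaskEnd", 0))    // 17313 in the log above
{code}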
[jira] [Assigned] (SPARK-8945) Add and Subtract expression should support IntervalType
[ https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8945: --- Assignee: (was: Apache Spark) Add and Subtract expression should support IntervalType --- Key: SPARK-8945 URL: https://issues.apache.org/jira/browse/SPARK-8945 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9037) Task table pagination for the Stage page
[ https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626686#comment-14626686 ] Apache Spark commented on SPARK-9037: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/7399 Task table pagination for the Stage page Key: SPARK-9037 URL: https://issues.apache.org/jira/browse/SPARK-9037 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Shixiong Zhu Implement task table pagination for the Stage page to resolve the UI scalability issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9037) Task table pagination for the Stage page
[ https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9037: --- Assignee: Apache Spark Task table pagination for the Stage page Key: SPARK-9037 URL: https://issues.apache.org/jira/browse/SPARK-9037 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Shixiong Zhu Assignee: Apache Spark Implement task table pagination for the Stage page to resolve the UI scalability issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9029) shortcut CaseKeyWhen if key is null
[ https://issues.apache.org/jira/browse/SPARK-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-9029: Assignee: Wenchen Fan shortcut CaseKeyWhen if key is null --- Key: SPARK-9029 URL: https://issues.apache.org/jira/browse/SPARK-9029 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan Priority: Minor Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9037) Task table pagination for the Stage page
Shixiong Zhu created SPARK-9037: --- Summary: Task table pagination for the Stage page Key: SPARK-9037 URL: https://issues.apache.org/jira/browse/SPARK-9037 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Shixiong Zhu Implement task table pagination for the Stage page to resolve the UI scalability issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9037) Task table pagination for the Stage page
[ https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-9037. -- Resolution: Duplicate [~zsxwing] Please search JIRA first; this has been filed a few times now. You know the drill. Task table pagination for the Stage page Key: SPARK-9037 URL: https://issues.apache.org/jira/browse/SPARK-9037 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Shixiong Zhu Implement task table pagination for the Stage page to resolve the UI scalability issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9029) shortcut CaseKeyWhen if key is null
[ https://issues.apache.org/jira/browse/SPARK-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-9029. - Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7389 [https://github.com/apache/spark/pull/7389] shortcut CaseKeyWhen if key is null --- Key: SPARK-9029 URL: https://issues.apache.org/jira/browse/SPARK-9029 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Minor Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8965) Add ml-guide Python Example: Estimator, Transformer, and Param
[ https://issues.apache.org/jira/browse/SPARK-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625831#comment-14625831 ] Arijit Saha edited comment on SPARK-8965 at 7/14/15 6:17 PM: - Hi Joseph, I would like to take up this task. Thanks, Arijit. was (Author: arijit saha): Hi Joseph, I would like to take up this task. Being a starter, will help me, to understand flow. Thanks, Arijit. Add ml-guide Python Example: Estimator, Transformer, and Param -- Key: SPARK-8965 URL: https://issues.apache.org/jira/browse/SPARK-8965 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Joseph K. Bradley Priority: Minor Labels: starter Look at: [http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param] We need a Python example doing exactly the same thing, but in Python. It should be tested using the PySpark shell. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9027) Generalize predicate pushdown into the metastore
[ https://issues.apache.org/jira/browse/SPARK-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-9027. - Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7386 [https://github.com/apache/spark/pull/7386] Generalize predicate pushdown into the metastore Key: SPARK-9027 URL: https://issues.apache.org/jira/browse/SPARK-9027 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9005) RegressionMetrics computing incorrect explainedVariance and r2
[ https://issues.apache.org/jira/browse/SPARK-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626815#comment-14626815 ] Ayman Farahat commented on SPARK-9005: -- I compared the R2 and RMSE after fitting an ALS model. Here are the results: rank 40: r2 = 0.993274964231, explained var = 0.993566133802, count = 94652197, meanres = -0.0606718131255, meanres2 = 0.085020285731; rank 50: r2 = 0.993547408858, explained var = 0.993826795105, count = 94652197, meanres = -0.0594314727572, meanres2 = 0.081575944201 RegressionMetrics computing incorrect explainedVariance and r2 -- Key: SPARK-9005 URL: https://issues.apache.org/jira/browse/SPARK-9005 Project: Spark Issue Type: Bug Components: MLlib Reporter: Feynman Liang Assignee: Feynman Liang {{RegressionMetrics}} currently computes explainedVariance using {{summary.variance(1)}} (variance of the residuals) where the [Wikipedia definition|https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained] uses the residual sum of squares {{math.pow(summary.normL2(1), 2)}}. The two coincide only when the predictor is unbiased (e.g. an intercept term is included in a linear model), but this is not always the case. We should change to be consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
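For reference, the definitions at issue, following the cited Wikipedia article:
{code}
\mathrm{SS_{res}} = \sum_i (y_i - \hat{y}_i)^2, \qquad
\mathrm{SS_{tot}} = \sum_i (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SS_{res}}}{\mathrm{SS_{tot}}}
% Var(residuals) = SS_res / n holds only when the residuals have zero mean,
% i.e. for an unbiased predictor (e.g. a linear model with an intercept),
% which is exactly why summary.variance(1) and normL2(1)^2 can disagree.
{code}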
[jira] [Assigned] (SPARK-9022) UnsafeProject
[ https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-9022: - Assignee: Davies Liu UnsafeProject - Key: SPARK-9022 URL: https://issues.apache.org/jira/browse/SPARK-9022 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Assignee: Davies Liu Create a version of Project that projects output out directly into serialized UnsafeRow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions
[ https://issues.apache.org/jira/browse/SPARK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave resolved SPARK-8718. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7104 [https://github.com/apache/spark/pull/7104] Improve EdgePartition2D for non perfect square number of partitions --- Key: SPARK-8718 URL: https://issues.apache.org/jira/browse/SPARK-8718 Project: Spark Issue Type: Improvement Components: GraphX Reporter: Andrew Ray Priority: Minor Fix For: 1.5.0 The current implementation of EdgePartition2D has a major limitation: bq. One of the limitations of this approach is that the number of machines must either be a perfect square. We partially address this limitation by computing the machine assignment to the next largest perfect square and then mapping back down to the actual number of machines. Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square is used. To remove this limitation I'm proposing the following code change. It allows us to partition into any number of evenly sized bins while maintaining the property that any vertex will only need to be replicated at most 2 * sqrt(numParts) times. To maintain current behavior for perfect squares we use the old algorithm in that case, although this could be removed if we don't care about producing the exact same result. See this IPython notebook for a visualization of what is being proposed [https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download it to interactively change the number of partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
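The scheme being generalized looks roughly like this; a sketch following GraphX's existing grid assignment, including the "map back down" step that causes the imbalance the PR removes:
{code}
// Sketch of the pre-PR behavior: round up to the next perfect-square grid,
// then fold overflow cells back into the real partition count.
def edgePartition2D(src: Long, dst: Long, numParts: Int): Int = {
  val ceilSqrt = math.ceil(math.sqrt(numParts)).toInt
  val mixingPrime = 1125899906842597L  // spreads clustered vertex IDs
  val col = (math.abs(src * mixingPrime) % ceilSqrt).toInt
  val row = (math.abs(dst * mixingPrime) % ceilSqrt).toInt
  (col * ceilSqrt + row) % numParts    // uneven bins when numParts isn't square
}
{code}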
[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides
[ https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry updated SPARK-8343: --- Labels: spark.tc (was: ) Improve the Spark Streaming Guides -- Key: SPARK-8343 URL: https://issues.apache.org/jira/browse/SPARK-8343 Project: Spark Issue Type: Improvement Components: Documentation, Streaming Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Labels: spark.tc Fix For: 1.4.1, 1.5.0 Improve the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
[ https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry updated SPARK-6485: --- Labels: spark.tc (was: ) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark -- Key: SPARK-6485 URL: https://issues.apache.org/jira/browse/SPARK-6485 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng Labels: spark.tc We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark. Internally, we can use DataFrames for serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
[ https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627120#comment-14627120 ] Mike Dusenberry commented on SPARK-6485: Hey [~mengxr]. This is still coming, sorry about the delay! I've been creating wrappers around the Scala/Java API, so it sounds like I'm on the right track. I plan to have it completed by the end of the week. Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark -- Key: SPARK-6485 URL: https://issues.apache.org/jira/browse/SPARK-6485 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng Labels: spark.tc We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark. Internally, we can use DataFrames for serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9022) UnsafeProject
[ https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627035#comment-14627035 ] Apache Spark commented on SPARK-9022: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/7402 UnsafeProject - Key: SPARK-9022 URL: https://issues.apache.org/jira/browse/SPARK-9022 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Assignee: Davies Liu Create a version of Project that projects output out directly into serialized UnsafeRow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9022) UnsafeProject
[ https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9022: --- Assignee: Apache Spark (was: Davies Liu) UnsafeProject - Key: SPARK-9022 URL: https://issues.apache.org/jira/browse/SPARK-9022 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Assignee: Apache Spark Create a version of Project that projects output out directly into serialized UnsafeRow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9022) UnsafeProject
[ https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9022: --- Assignee: Davies Liu (was: Apache Spark) UnsafeProject - Key: SPARK-9022 URL: https://issues.apache.org/jira/browse/SPARK-9022 Project: Spark Issue Type: New Feature Components: SQL Reporter: Reynold Xin Assignee: Davies Liu Create a version of Project that projects output out directly into serialized UnsafeRow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency
[ https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9043: --- Assignee: Apache Spark Serialize key, value and combiner classes in ShuffleDependency -- Key: SPARK-9043 URL: https://issues.apache.org/jira/browse/SPARK-9043 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matt Massie Assignee: Apache Spark ShuffleManager implementations are currently not given type information regarding the key, value and combiner classes. Serialization of shuffle objects relies on them being JavaSerializable, with methods defined for reading/writing the object or, alternatively, serialization via Kryo which uses reflection. Serialization systems like Avro, Thrift and Protobuf generate classes with zero argument constructors and explicit schema information (e.g. IndexedRecords in Avro have get, put and getSchema methods). By serializing the key, value and combiner class names in ShuffleDependency, shuffle implementations will have access to schema information when registerShuffle() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency
[ https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627051#comment-14627051 ] Apache Spark commented on SPARK-9043: - User 'massie' has created a pull request for this issue: https://github.com/apache/spark/pull/7403 Serialize key, value and combiner classes in ShuffleDependency -- Key: SPARK-9043 URL: https://issues.apache.org/jira/browse/SPARK-9043 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matt Massie ShuffleManager implementations are currently not given type information regarding the key, value and combiner classes. Serialization of shuffle objects relies on them being JavaSerializable, with methods defined for reading/writing the object or, alternatively, serialization via Kryo which uses reflection. Serialization systems like Avro, Thrift and Protobuf generate classes with zero argument constructors and explicit schema information (e.g. IndexedRecords in Avro have get, put and getSchema methods). By serializing the key, value and combiner class names in ShuffleDependency, shuffle implementations will have access to schema information when registerShuffle() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency
[ https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9043: --- Assignee: (was: Apache Spark) Serialize key, value and combiner classes in ShuffleDependency -- Key: SPARK-9043 URL: https://issues.apache.org/jira/browse/SPARK-9043 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matt Massie ShuffleManager implementations are currently not given type information regarding the key, value and combiner classes. Serialization of shuffle objects relies on them being JavaSerializable, with methods defined for reading/writing the object or, alternatively, serialization via Kryo which uses reflection. Serialization systems like Avro, Thrift and Protobuf generate classes with zero argument constructors and explicit schema information (e.g. IndexedRecords in Avro have get, put and getSchema methods). By serializing the key, value and combiner class names in ShuffleDependency, shuffle implementations will have access to schema information when registerShuffle() is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
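Capturing the names is cheap wherever the dependency is constructed, since ClassTags are already in scope there; a sketch of the idea (names are illustrative, not the actual patch):
{code}
import scala.reflect.ClassTag

// Sketch: record key/value/combiner class names from the available ClassTags,
// so a ShuffleManager can recover schema information in registerShuffle().
def shuffleClassNames[K: ClassTag, V: ClassTag, C: ClassTag]: (String, String, String) =
  (implicitly[ClassTag[K]].runtimeClass.getName,
   implicitly[ClassTag[V]].runtimeClass.getName,
   implicitly[ClassTag[C]].runtimeClass.getName)
{code}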
[jira] [Updated] (SPARK-9045) Fix Scala 2.11 build break due in UnsafeExternalRowSorter
[ https://issues.apache.org/jira/browse/SPARK-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-9045: -- Affects Version/s: 1.5.0 Target Version/s: 1.5.0 Fix Scala 2.11 build break due in UnsafeExternalRowSorter - Key: SPARK-9045 URL: https://issues.apache.org/jira/browse/SPARK-9045 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Blocker {code} [error] /home/jenkins/workspace/Spark-Master-Scala211-Compile/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java:135: error: anonymous org.apache.spark.sql.execution.UnsafeExternalRowSorter$1 is not abstract and does not override abstract method <B>minBy(Function1<InternalRow,B>,Ordering<B>) in TraversableOnce [error] return new AbstractScalaRowIterator() { [error] ^ [error] where B,A are type-variables: [error] B extends Object declared in method <B>minBy(Function1<A,B>,Ordering<B>) [error] A extends Object declared in interface TraversableOnce [error] 1 error [error] Compile failed at Jul 14, 2015 2:26:25 PM [26.443s] {code} It turns out that this can be fixed by making AbstractScalaRowIterator into a concrete class instead of an abstract class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
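The described fix, in miniature (a simplified sketch, not the exact class in the patch):
{code}
import org.apache.spark.sql.catalyst.InternalRow

// Concrete (non-abstract) base class: the anonymous Java subclass then only
// overrides what it needs, instead of every TraversableOnce member that the
// Scala 2.11 compiler now insists an abstract subclass account for.
class AbstractScalaRowIterator extends Iterator[InternalRow] {
  override def hasNext: Boolean = throw new NotImplementedError
  override def next(): InternalRow = throw new NotImplementedError
}
{code}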
[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support
[ https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-7265: Labels: spark.tc (was: ) Improving documentation for Spark SQL Hive support --- Key: SPARK-7265 URL: https://issues.apache.org/jira/browse/SPARK-7265 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Trivial Labels: spark.tc Fix For: 1.5.0 miscellaneous documentation improvement for Spark SQL Hive support, Yarn cluster deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs
[ https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-2859: Labels: spark.tc (was: ) Update url of Kryo project in related docs -- Key: SPARK-2859 URL: https://issues.apache.org/jira/browse/SPARK-2859 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Guancheng Chen Assignee: Guancheng Chen Priority: Trivial Labels: spark.tc Fix For: 1.0.3, 1.1.0 Kryo project has been migrated from googlecode to github, hence we need to update its URL in related docs such as tuning.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md
[ https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-8639: Labels: spark.tc (was: ) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md - Key: SPARK-8639 URL: https://issues.apache.org/jira/browse/SPARK-8639 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Rosstin Murphy Assignee: Rosstin Murphy Priority: Trivial Labels: spark.tc Fix For: 1.4.1, 1.5.0 In docs/README.md, the text states around line 31 Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. It might be more clear if we said Execute 'jekyll build' from the 'docs/' directory to compile the site. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files. In docs/api.md: Here you can API docs for Spark and its submodules. should be something like: Here you can read API docs for Spark and its submodules. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5562) LDA should handle empty documents
[ https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-5562: Labels: spark.tc (was: starter) LDA should handle empty documents - Key: SPARK-5562 URL: https://issues.apache.org/jira/browse/SPARK-5562 Project: Spark Issue Type: Test Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Alok Singh Priority: Minor Labels: spark.tc, starter Fix For: 1.5.0 Original Estimate: 96h Remaining Estimate: 96h Latent Dirichlet Allocation (LDA) could easily be given empty documents when people select a small vocabulary. We should check to make sure it is robust to empty documents. This will hopefully take the form of a unit test, but may require modifying the LDA implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
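A unit-test-style robustness check could be as small as this (hedged sketch against the MLlib 1.3+ API; k and the corpus are arbitrary):
{code}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Corpus of (docId, termCounts) pairs in which document 1 is empty.
val corpus = sc.parallelize(Seq(
  (0L, Vectors.sparse(4, Seq((0, 1.0), (2, 2.0)))),
  (1L, Vectors.sparse(4, Seq.empty[(Int, Double)])),  // the empty document
  (2L, Vectors.sparse(4, Seq((1, 3.0), (3, 1.0))))))

// The robustness requirement: this should train without throwing.
val model = new LDA().setK(2).setMaxIterations(5).run(corpus)
{code}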
[jira] [Updated] (SPARK-7357) Improving HBaseTest example
[ https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Kadner updated SPARK-7357: Labels: spark.tc (was: ) Improving HBaseTest example --- Key: SPARK-7357 URL: https://issues.apache.org/jira/browse/SPARK-7357 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Minor Labels: spark.tc Fix For: 1.5.0 Original Estimate: 2m Remaining Estimate: 2m Minor improvement to the HBaseTest example: when HBase-related configurations (e.g. zookeeper quorum, zookeeper client port, or zookeeper.znode.parent) are not set to the default (localhost:2181), the connection to zookeeper might hang, as shown in the following stack: 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx:2181 sessionTimeout=9 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown error) 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx/9.30.94.121:2181, initiating session 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, negotiated timeout = 4 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null This happens because hbase-site.xml is not placed on the Spark classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (Fix Related Documentation Example).
[ https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7920: --- Labels: (was: spark.tc) Make MLlib ChiSqSelector Serializable (Fix Related Documentation Example). Key: SPARK-7920 URL: https://issues.apache.org/jira/browse/SPARK-7920 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.1, 1.4.0 Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.0 The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector. ChiSqSelector should just extend Serializable. Steps: 1. Locate the MLlib ChiSqSelector documentation example. 2. Fix the example by adding an import statement for ChiSqSelector. 3. Attempt to run it; notice that it will fail because ChiSqSelector is not serializable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
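For reference, a minimal sketch of the documentation example with the missing import added; discretizedData is an assumed RDD[LabeledPoint] of categorical features:

    import org.apache.spark.mllib.feature.ChiSqSelector
    import org.apache.spark.mllib.regression.LabeledPoint

    val selector = new ChiSqSelector(50) // keep the 50 most predictive features
    val transformer = selector.fit(discretizedData)
    // Using the fitted model inside this closure is what requires
    // serializability:
    val filteredData = discretizedData.map { lp =>
      LabeledPoint(lp.label, transformer.transform(lp.features))
    }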
[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions
[ https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8927: --- Labels: (was: spark.tc) Doc format wrong for some config descriptions - Key: SPARK-8927 URL: https://issues.apache.org/jira/browse/SPARK-8927 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.4.0 Reporter: Jon Alter Assignee: Jon Alter Priority: Trivial Fix For: 1.4.2, 1.5.0 In the docs, a couple of configuration descriptions (under Network) are not wrapped in <td></td> tags, so they are displayed immediately under the section title instead of in their table row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.
[ https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7985: --- Labels: (was: spark.tc) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples. Key: SPARK-7985 URL: https://issues.apache.org/jira/browse/SPARK-7985 Project: Spark Issue Type: Bug Components: Documentation, ML Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.0 Update the ML Doc's Estimator, Transformer, and Param Scala and Java examples to use model.extractParamMap instead of model.fittingParamMap, which no longer exists. Remove all other references to fittingParamMap throughout Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
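A minimal sketch of the corrected pattern; the LogisticRegression estimator and the training DataFrame are assumptions for illustration:

    import org.apache.spark.ml.classification.LogisticRegression

    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val model = lr.fit(training)
    // model.fittingParamMap no longer exists; extractParamMap is the
    // replacement and returns the params the model was fit with:
    println("Model was fit using parameters: " + model.extractParamMap())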
[jira] [Updated] (SPARK-7969) Drop method on DataFrames should handle Column
[ https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7969: --- Labels: (was: spark.tc) Drop method on DataFrames should handle Column -- Key: SPARK-7969 URL: https://issues.apache.org/jira/browse/SPARK-7969 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 1.4.0 Reporter: Olivier Girardot Assignee: Mike Dusenberry Priority: Minor Fix For: 1.4.1, 1.5.0 For now, the drop method available on DataFrame since Spark 1.4.0 only accepts a column name (as a string); it should also accept a Column as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7969) Drop method on DataFrames should handle Column
[ https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry updated SPARK-7969: --- Labels: spark.tc (was: ) Drop method on DataFrames should handle Column -- Key: SPARK-7969 URL: https://issues.apache.org/jira/browse/SPARK-7969 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 1.4.0 Reporter: Olivier Girardot Assignee: Mike Dusenberry Priority: Minor Labels: spark.tc Fix For: 1.4.1, 1.5.0 For now, the drop method available on DataFrame since Spark 1.4.0 only accepts a column name (as a string); it should also accept a Column as input. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
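A minimal sketch of the existing behavior next to the proposed overload; df is an assumed DataFrame with an age column:

    import org.apache.spark.sql.functions.col

    val byName = df.drop("age")        // existing: drop by column name (String)
    val byColumn = df.drop(col("age")) // proposed: drop by Column reference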
[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.
[ https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry updated SPARK-7883: --- Target Version/s: 1.4.0, 1.0.3, 1.1.2, 1.2.3, 1.3.2 (was: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0) Labels: spark.tc (was: ) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation. --- Key: SPARK-7883 URL: https://issues.apache.org/jira/browse/SPARK-7883 Project: Spark Issue Type: Bug Components: Documentation, MLlib Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0 Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Trivial Labels: spark.tc Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0 The trainImplicit Scala example near the end of the MLlib Collaborative Filtering documentation refers to an ALS.trainImplicit function signature that does not exist. Rather than add an extra function, let's just fix the example. Currently, the example refers to a function that would have the following signature: def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: Double) : MatrixFactorizationModel Instead, let's change the example to refer to this function, which does exist (notice the addition of the lambda parameter): def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double, alpha: Double) : MatrixFactorizationModel -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
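For illustration, a call that matches the existing five-parameter signature quoted above; ratings is an assumed RDD[Rating]:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Note the lambda (regularization) argument, which the broken example omitted:
    val model = ALS.trainImplicit(ratings, rank = 10, iterations = 10,
      lambda = 0.01, alpha = 1.0)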