[jira] [Resolved] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`

2015-07-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9010.
--
   Resolution: Fixed
Fix Version/s: 1.5.0
   1.4.2

Issue resolved by pull request 7393
[https://github.com/apache/spark/pull/7393]

 Improve the Spark Configuration document about `spark.kryoserializer.buffer`
 

 Key: SPARK-9010
 URL: https://issues.apache.org/jira/browse/SPARK-9010
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0
Reporter: StanZhai
Priority: Trivial
  Labels: documentation
 Fix For: 1.4.2, 1.5.0


 The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's 
 serialization buffer. Note that there will be one buffer per core on each 
 worker. This buffer will grow up to spark.kryoserializer.buffer.max if 
 needed."
 The spark.kryoserializer.buffer.max.mb setting is out-of-date in Spark 1.4.
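
 For reference, a minimal sketch of how these Kryo settings are typically applied via SparkConf; the values shown are illustrative, not recommendations:

 {code}
 import org.apache.spark.SparkConf

 // Enable Kryo and size its buffers: spark.kryoserializer.buffer is the initial
 // per-core buffer, which may grow up to spark.kryoserializer.buffer.max.
 val conf = new SparkConf()
   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
   .set("spark.kryoserializer.buffer", "64k")
   .set("spark.kryoserializer.buffer.max", "64m")
 {code}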



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8851) in Yarn client mode, Client.scala does not login even when credentials are specified

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625999#comment-14625999
 ] 

Apache Spark commented on SPARK-8851:
-

User 'harishreedharan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7394

 in Yarn client mode, Client.scala does not login even when credentials are 
 specified
 

 Key: SPARK-8851
 URL: https://issues.apache.org/jira/browse/SPARK-8851
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Hari Shreedharan

 [#6051|https://github.com/apache/spark/pull/6051] added support for passing 
 the credentials configuration from SparkConf, so client mode works fine. 
 This, though, created an issue where the Client.scala class does not log in to 
 the KDC, thus requiring a kinit before running in client mode.
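
 A hedged sketch of the kind of explicit KDC login the report describes, using Hadoop's UserGroupInformation API; reading the principal and keytab from spark.yarn.principal / spark.yarn.keytab (the keys populated by --principal / --keytab) is an assumption about where Client.scala would find them:

 {code}
 import org.apache.hadoop.security.UserGroupInformation
 import org.apache.spark.SparkConf

 // Log in to the KDC from the supplied keytab instead of relying on a prior kinit.
 def loginFromKeytab(sparkConf: SparkConf): Unit = {
   val principal = sparkConf.get("spark.yarn.principal")  // set via --principal
   val keytab    = sparkConf.get("spark.yarn.keytab")     // set via --keytab
   UserGroupInformation.loginUserFromKeytab(principal, keytab)
 }
 {code}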



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9031:
---

Assignee: Apache Spark  (was: Josh Rosen)

 Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
 ---

 Key: SPARK-9031
 URL: https://issues.apache.org/jira/browse/SPARK-9031
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Reporter: Josh Rosen
Assignee: Apache Spark

 BlockObjectWriter has only one concrete non-test class, 
 DiskBlockObjectWriter.  In order to simplify the code in preparation for 
 other refactorings, I think that we should remove this base class and have 
 only DiskBlockObjectWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9031:
---

Assignee: Josh Rosen  (was: Apache Spark)

 Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
 ---

 Key: SPARK-9031
 URL: https://issues.apache.org/jira/browse/SPARK-9031
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Reporter: Josh Rosen
Assignee: Josh Rosen

 BlockObjectWriter has only one concrete non-test class, 
 DiskBlockObjectWriter.  In order to simplify the code in preparation for 
 other refactorings, I think that we should remove this base class and have 
 only DiskBlockObjectWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625920#comment-14625920
 ] 

Apache Spark commented on SPARK-9031:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/7391

 Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class
 ---

 Key: SPARK-9031
 URL: https://issues.apache.org/jira/browse/SPARK-9031
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Reporter: Josh Rosen
Assignee: Josh Rosen

 BlockObjectWriter has only one concrete non-test class, 
 DiskBlockObjectWriter.  In order to simplify the code in preparation for 
 other refactorings, I think that we should remove this base class and have 
 only DiskBlockObjectWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625994#comment-14625994
 ] 

Apache Spark commented on SPARK-9010:
-

User 'stanzhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/7393

 Improve the Spark Configuration document about `spark.kryoserializer.buffer`
 

 Key: SPARK-9010
 URL: https://issues.apache.org/jira/browse/SPARK-9010
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0
Reporter: StanZhai
Priority: Trivial
  Labels: documentation

 The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's 
 serialization buffer. Note that there will be one buffer per core on each 
 worker. This buffer will grow up to spark.kryoserializer.buffer.max if 
 needed."
 The spark.kryoserializer.buffer.max.mb setting is out-of-date in Spark 1.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9003) Add map/update function to MLlib/Vector

2015-07-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625939#comment-14625939
 ] 

Sean Owen commented on SPARK-9003:
--

[~josephkb] Please, not another one! The world has too many.

 Add map/update function to MLlib/Vector
 ---

 Key: SPARK-9003
 URL: https://issues.apache.org/jira/browse/SPARK-9003
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Yanbo Liang
Priority: Minor

 MLlib/Vector only supports the foreachActive function and lacks map/update, 
 which is inconvenient for some Vector operations.
 For example:
 val a = Vectors.dense(...)
 If we want to compute math.log for each element of a and get a Vector back, 
 we can only write:
 val b = Vectors.dense(a.toArray.map(math.log))
 or use toBreeze and fromBreeze to do the transformation with the Breeze API.
 Neither snippet is elegant; we would like to be able to write:
 val c = a.map(math.log)
 Also, MLlib/Matrix already implements map/update/foreachActive.
 I think Vector should also have map/update.
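
 A runnable illustration of the workaround described above versus the proposed API; the a.map(...) call is the requested addition and does not exist in MLlib 1.4:

 {code}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}

 val a: Vector = Vectors.dense(1.0, 2.0, 4.0)

 // Today: detour through a plain Array to apply an element-wise function.
 val b: Vector = Vectors.dense(a.toArray.map(math.log))

 // Proposed (not available yet): val c = a.map(math.log)
 {code}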



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9001) sbt doc fails due to javadoc errors

2015-07-14 Thread Joseph E. Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625970#comment-14625970
 ] 

Joseph E. Gonzalez commented on SPARK-9001:
---

While the issue is generally minor, it does block `build/sbt publish-local`.

 sbt doc fails due to javadoc errors
 ---

 Key: SPARK-9001
 URL: https://issues.apache.org/jira/browse/SPARK-9001
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Reporter: Joseph E. Gonzalez
Priority: Minor

 Running `build/sbt doc` on master fails due to javadoc errors. 
 This is an issue since `build/sbt publish-local` depends on building the 
 docs.
 Example error:
 [info] Generating 
 /spark/unsafe/target/scala-2.10/api/org/apache/spark/unsafe/bitset/BitSet.html...
 [error] 
 /spark/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java:93: 
 error: bad use of '>'
 [error]  *  for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
 [error] ^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625980#comment-14625980
 ] 

Bolke de Bruin commented on SPARK-9019:
---

Tracing this down, it seems that the tokens are not being set on the container 
in yarn.Client, which is required according to 
http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html.

Something like this:

  ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
  amContainer.setTokens(fsTokens);

in createContainerLaunchContext of 
yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
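
A hedged sketch of how that suggestion might look; the helper name and the idea that the caller already holds the Credentials and the AM's ContainerLaunchContext are assumptions about the surrounding code in Client.scala:

{code}
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Serialize the gathered delegation tokens and attach them to the AM container.
def attachTokens(credentials: Credentials, amContainer: ContainerLaunchContext): Unit = {
  val dob = new DataOutputBuffer()
  credentials.writeTokenStorageToStream(dob)
  val fsTokens = ByteBuffer.wrap(dob.getData, 0, dob.getLength)
  amContainer.setTokens(fsTokens)
}
{code}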

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   

[jira] [Created] (SPARK-9031) Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class

2015-07-14 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-9031:
-

 Summary: Merge BlockObjectWriter and DiskBlockObject writer to 
remove abstract class
 Key: SPARK-9031
 URL: https://issues.apache.org/jira/browse/SPARK-9031
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Reporter: Josh Rosen
Assignee: Josh Rosen


BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter.  
In order to simplify the code in preparation for other refactorings, I think 
that we should remove this base class and have only DiskBlockObjectWriter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8975) Implement a mechanism to send a new rate from the driver to the block generator

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625930#comment-14625930
 ] 

François Garillot commented on SPARK-8975:
--

Typesafe PR : https://github.com/typesafehub/spark/pull/15/files

 Implement a mechanism to send a new rate from the driver to the block 
 generator
 ---

 Key: SPARK-8975
 URL: https://issues.apache.org/jira/browse/SPARK-8975
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos

 Full design doc 
 [here|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
 - Add a new message, {{RateUpdate(newRate: Long)}} that ReceiverSupervisor 
 handles in its endpoint 
 - Add a new method to ReceiverTracker
 {{def sendRateUpdate(streamId: Int, newRate: Long): Unit}}
 this method sends an asynchronous RateUpdate message to the receiver 
 supervisor corresponding to streamId 
 - update the rate in the corresponding block generator.
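
 A hedged sketch of the shape these pieces could take, illustrative only and not the final Spark Streaming APIs; the ReceiverTrackerSketch class and its endpoint map are assumptions made for the example:

 {code}
 // The message the driver sends when a new rate bound is chosen.
 case class RateUpdate(newRate: Long)

 // Illustrative tracker: forwards the update asynchronously to the supervisor
 // callback registered for the given stream.
 class ReceiverTrackerSketch(endpoints: Map[Int, RateUpdate => Unit]) {
   def sendRateUpdate(streamId: Int, newRate: Long): Unit =
     endpoints.get(streamId).foreach(send => send(RateUpdate(newRate)))
 }
 {code}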



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9020) Support mutable state in code gen expressions

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9020:
---

Assignee: Apache Spark  (was: Wenchen Fan)

 Support mutable state in code gen expressions
 -

 Key: SPARK-9020
 URL: https://issues.apache.org/jira/browse/SPARK-9020
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark

 Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). 
 We currently don't support code-gen for any expressions that have mutable state.
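
 For context, a small sketch (not Catalyst's actual API) of what "mutable state" means here: an expression such as Rand carries a random-number generator, and generated code would need to keep that state in a field of the generated class rather than recompute it per row:

 {code}
 import java.util.Random

 // Interpreted evaluation keeps the state on the expression object itself;
 // generated code must hoist the equivalent state into a class member.
 class RandLikeExpression(seed: Long) {
   private val rng = new Random(seed)   // mutable state that codegen must preserve
   def eval(): Double = rng.nextDouble()
 }
 {code}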



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9010) Improve the Spark Configuration document about `spark.kryoserializer.buffer`

2015-07-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9010:
-
Assignee: StanZhai

 Improve the Spark Configuration document about `spark.kryoserializer.buffer`
 

 Key: SPARK-9010
 URL: https://issues.apache.org/jira/browse/SPARK-9010
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.4.0
Reporter: StanZhai
Assignee: StanZhai
Priority: Trivial
  Labels: documentation
 Fix For: 1.4.2, 1.5.0


 The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's 
 serialization buffer. Note that there will be one buffer per core on each 
 worker. This buffer will grow up to spark.kryoserializer.buffer.max if 
 needed."
 The spark.kryoserializer.buffer.max.mb setting is out-of-date in Spark 1.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626005#comment-14626005
 ] 

Sean Owen commented on SPARK-9019:
--

Same as SPARK-8851?

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
   at 
 

[jira] [Assigned] (SPARK-9020) Support mutable state in code gen expressions

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9020:
---

Assignee: Wenchen Fan  (was: Apache Spark)

 Support mutable state in code gen expressions
 -

 Key: SPARK-9020
 URL: https://issues.apache.org/jira/browse/SPARK-9020
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Wenchen Fan

 Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). 
 We currently don't support code-gen for any expressions that have mutable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9020) Support mutable state in code gen expressions

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625929#comment-14625929
 ] 

Apache Spark commented on SPARK-9020:
-

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/7392

 Support mutable state in code gen expressions
 -

 Key: SPARK-9020
 URL: https://issues.apache.org/jira/browse/SPARK-9020
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Wenchen Fan

 Some expressions have state in them (e.g. Rand, MonotonicallyIncreasingID). 
 We currently don't support code-gen for any expressions that have mutable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9025) Storage tab shows no blocks for cached RDDs

2015-07-14 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625976#comment-14625976
 ] 

Sean Owen commented on SPARK-9025:
--

I can't reproduce this on master. In Storage I get


RDD Name              | Storage Level                     | Cached Partitions | Fraction Cached | Size in Memory | Size in ExternalBlockStore | Size on Disk
ParallelCollectionRDD | Memory Deserialized 1x Replicated | 8                 | 100%            | 352.0 B        | 0.0 B                      | 0.0 B

 Storage tab shows no blocks for cached RDDs
 ---

 Key: SPARK-9025
 URL: https://issues.apache.org/jira/browse/SPARK-9025
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.5.0
Reporter: Andrew Or

 Simple repro: sc.parallelize(1 to 10).cache().count(), go to storage tab.
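
 The one-line repro above, spelled out for spark-shell (where sc is the SparkContext the shell provides):

 {code}
 val rdd = sc.parallelize(1 to 10).cache()
 rdd.count()   // materializes the cached blocks; then open the web UI's Storage tab
 {code}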



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8808) Fix assignments in SparkR

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8808:
---

Assignee: Apache Spark

 Fix assignments in SparkR
 -

 Key: SPARK-8808
 URL: https://issues.apache.org/jira/browse/SPARK-8808
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Assignee: Apache Spark

 {noformat}
 inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment.
   mockFile = c("Spark is pretty.", "Spark is awesome.")
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8979) Implement a PIDRateEstimator

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626118#comment-14626118
 ] 

François Garillot commented on SPARK-8979:
--

Parameter derivation available here: 
https://www.dropbox.com/s/dwgl7wa1z5wbkg6/PIDderivation.pdf?dl=0

 Implement a PIDRateEstimator
 

 Key: SPARK-8979
 URL: https://issues.apache.org/jira/browse/SPARK-8979
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos
 Fix For: 1.5.0


 Based on this [design 
 doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626043#comment-14626043
 ] 

Bolke de Bruin commented on SPARK-9019:
---

Will try in a few minutes; however, it did not only happen when using keytabs, 
but also when using the user's own credentials.

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
   at 
 

[jira] [Updated] (SPARK-8974) There is a bug in The spark-dynamic-executor-allocation may be not supported

2015-07-14 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-8974:
-
Summary: There is a bug in The spark-dynamic-executor-allocation may be not 
supported  (was: The spark-dynamic-executor-allocation may be not supported)

 There is a bug in The spark-dynamic-executor-allocation may be not supported
 

 Key: SPARK-8974
 URL: https://issues.apache.org/jira/browse/SPARK-8974
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: KaiXinXIaoLei
 Fix For: 1.5.0


 In yarn-client mode with spark.dynamicAllocation.enabled set to true, if 
 tasks are submitted while the ApplicationMaster is dead or disconnected 
 (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation 
 thread throws an exception. Then, even when the ApplicationMaster is running 
 and no tasks are running, the number of executors does not drop to zero. 
 So the dynamicAllocation feature is not supported correctly.
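
 For context, a minimal sketch of the configuration under which the report applies (yarn-client mode with dynamic allocation on; the external shuffle service is a prerequisite for dynamic allocation):

 {code}
 import org.apache.spark.SparkConf

 val conf = new SparkConf()
   .setMaster("yarn-client")
   .set("spark.dynamicAllocation.enabled", "true")
   .set("spark.shuffle.service.enabled", "true")   // required by dynamic allocation
 {code}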



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626183#comment-14626183
 ] 

Bolke de Bruin commented on SPARK-9019:
---

[~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. 
Trace remains the same.

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
   at 
 

[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626183#comment-14626183
 ] 

Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 10:41 AM:
-

[~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. 
Trace remains the same.

With the patch, a user without a keytab cannot use spark-submit anymore with 
--master yarn-cluster (failed token renewal).


was (Author: bolke):
[~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. 
Trace remains the same.

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
  

[jira] [Comment Edited] (SPARK-8975) Implement a mechanism to send a new rate from the driver to the block generator

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625930#comment-14625930
 ] 

François Garillot edited comment on SPARK-8975 at 7/14/15 11:14 AM:


Typesafe PR : https://github.com/typesafehub/spark/pull/15/


was (Author: huitseeker):
Typesafe PR : https://github.com/typesafehub/spark/pull/15/files

 Implement a mechanism to send a new rate from the driver to the block 
 generator
 ---

 Key: SPARK-8975
 URL: https://issues.apache.org/jira/browse/SPARK-8975
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos

 Full design doc 
 [here|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
 - Add a new message, {{RateUpdate(newRate: Long)}} that ReceiverSupervisor 
 handles in its endpoint 
 - Add a new method to ReceiverTracker
 {{def sendRateUpdate(streamId: Int, newRate: Long): Unit}}
 this method sends an asynchronous RateUpdate message to the receiver 
 supervisor corresponding to streamId 
 - update the rate in the corresponding block generator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)

2015-07-14 Thread Philipp Poetter (JIRA)
Philipp Poetter created SPARK-9032:
--

 Summary: scala.MatchError in DataFrameReader.json(String path)
 Key: SPARK-9032
 URL: https://issues.apache.org/jira/browse/SPARK-9032
 Project: Spark
  Issue Type: Bug
  Components: Java API, SQL
Affects Versions: 1.4.0
 Environment: Ubuntu 15.04
Reporter: Philipp Poetter


Executing read().json() on SQLContext (i.e. via DataFrameReader) raises a MatchError 
while trying to read JSON data, with a stack trace as follows:

15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, 
took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class 
org.apache.spark.sql.types.StringType$)
at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
at 
org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
at 
org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
at scala.Option.getOrElse(Option.scala:120)
at 
org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
at 
org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
at com.hp.sparkdemo.Example.main(Example.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to 
shut down
15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!

Offending code snippet (around line 23):
...
JavaSparkContext sctx = new JavaSparkContext(sparkConf);
SQLContext ctx = new SQLContext(sctx);
DataFrame frame = ctx.read().json(facebookJSON);
frame.printSchema();
...

The exception is reproducible using the following JSON:

{
   "data": [
      {
         "id": "X999_Y999",
         "from": {
            "name": "Tom Brady", "id": "X12"
         },
         "message": "Looking forward to 2010!",
         "actions": [
            {
               "name": "Comment",
               "link": "http://www.facebook.com/X999/posts/Y999"
            },
            {
               "name": "Like",
               "link": "http://www.facebook.com/X999/posts/Y999"
            }
         ],
         "type": "status",
         "created_time": "2010-08-02T21:27:44+0000",
         "updated_time": "2010-08-02T21:27:44+0000"
      },
      {
         "id": "X998_Y998",
         "from": {
            "name": "Peyton Manning", "id": "X18"
         },
         "message": "Where's my contract?",
         "actions": [
            {
               "name": "Comment",
               "link": "http://www.facebook.com/X998/posts/Y998"
            },
            {
               "name": "Like",
               "link": "http://www.facebook.com/X998/posts/Y998"
            }
         ],
         "type": "status",
         "created_time": "2010-08-02T21:27:44+0000",
         "updated_time": "2010-08-02T21:27:44+0000"
      }
   ]
}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8974) There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation may be not supported

2015-07-14 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-8974:
-
Summary: There is a bug in dynamicAllocation. The 
spark-dynamic-executor-allocation may be not supported  (was: There is a bug in 
The spark-dynamic-executor-allocation may be not supported)

 There is a bug in dynamicAllocation. The spark-dynamic-executor-allocation 
 may be not supported
 ---

 Key: SPARK-8974
 URL: https://issues.apache.org/jira/browse/SPARK-8974
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: KaiXinXIaoLei
 Fix For: 1.5.0


 In yarn-client mode with spark.dynamicAllocation.enabled set to true, if 
 tasks are submitted while the ApplicationMaster is dead or disconnected 
 (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation 
 thread throws an exception. Then, even when the ApplicationMaster is running 
 and no tasks are running, the number of executors does not drop to zero. 
 So the dynamicAllocation feature is not supported correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8974) There is a bug in dynamicAllocation. When there is no running tasks, the number of executor is not zero.

2015-07-14 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-8974:
-
Summary: There is a bug in dynamicAllocation. When there is no running 
tasks, the number of executor is not zero.  (was: There is a bug in 
dynamicAllocation. The spark-dynamic-executor-allocation may be not supported)

 There is a bug in dynamicAllocation. When there is no running tasks, the 
 number of executor is not zero.
 

 Key: SPARK-8974
 URL: https://issues.apache.org/jira/browse/SPARK-8974
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: KaiXinXIaoLei
 Fix For: 1.5.0


 In yarn-client mode with spark.dynamicAllocation.enabled set to true, if 
 tasks are submitted while the ApplicationMaster is dead or disconnected 
 (before a new ApplicationMaster starts), the spark-dynamic-executor-allocation 
 thread throws an exception. Then, even when the ApplicationMaster is running 
 and no tasks are running, the number of executors does not drop to zero. 
 So the dynamicAllocation feature is not supported correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8808) Fix assignments in SparkR

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626095#comment-14626095
 ] 

Apache Spark commented on SPARK-8808:
-

User 'sun-rui' has created a pull request for this issue:
https://github.com/apache/spark/pull/7395

 Fix assignments in SparkR
 -

 Key: SPARK-8808
 URL: https://issues.apache.org/jira/browse/SPARK-8808
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 {noformat}
 inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment.
   mockFile = c("Spark is pretty.", "Spark is awesome.")
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8808) Fix assignments in SparkR

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8808:
---

Assignee: (was: Apache Spark)

 Fix assignments in SparkR
 -

 Key: SPARK-8808
 URL: https://issues.apache.org/jira/browse/SPARK-8808
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 {noformat}
 inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment.
   mockFile = c("Spark is pretty.", "Spark is awesome.")
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626113#comment-14626113
 ] 

Bolke de Bruin commented on SPARK-9019:
---

Now with debug info (not yet with patch):

15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx 
(auth:SIMPLE) 
from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
15/07/14 11:03:49 DEBUG SaslRpcClient: Sending sasl message state: NEGOTIATE

15/07/14 11:03:49 DEBUG SaslRpcClient: Received SASL message state: NEGOTIATE
auths {
  method: TOKEN
  mechanism: DIGEST-MD5
  protocol: 
  serverId: default
  challenge: 
realm=\default\,nonce=\XXX\,qop=\auth\,charset=utf-8,algorithm=md5-sess
}
auths {
  method: KERBEROS
  mechanism: GSSAPI
  protocol: rm
  serverId: lxhnl002.ad.ing.net
}

15/07/14 11:03:49 DEBUG SaslRpcClient: Get token info proto:interface 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB 
info:org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo$2@5c53714b
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with 
service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
YARN_AM_RM_TOKEN and the token's service name is 
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
HIVE_DELEGATION_TOKEN and the token's service name is 
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
TIMELINE_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8188
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.16:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
HDFS_DELEGATION_TOKEN and the token's service name is 10.111.114.17:8020
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
HDFS_DELEGATION_TOKEN and the token's service name is ha-hdfs:hdpnlcb
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException 
as:yx66jx (auth:SIMPLE) 
cause:org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx 
(auth:SIMPLE) 
from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
15/07/14 11:03:49 WARN Client: Exception encountered while connecting to the 
server : org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
15/07/14 11:03:49 DEBUG UserGroupInformation: PrivilegedActionException 
as:yx66jx (auth:SIMPLE) cause:java.io.IOException: 
org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
via:[TOKEN, KERBEROS]



auth:SIMPLE is what worries me.
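
A minimal sketch of the likely direction of a fix (my assumption, not the actual patch): explicitly log in from the supplied principal/keytab before any RM/HDFS RPC, so the UGI reports auth:KERBEROS instead of falling back to SIMPLE.

{code}
import org.apache.hadoop.security.UserGroupInformation

// Assumption: principal/keytab come from --principal / --keytab; hard-coded here for brevity.
val principal = "sparkjob"
val keytab = "sparkjob.keytab"

if (UserGroupInformation.isSecurityEnabled) {
  // Programmatic kinit; afterwards UserGroupInformation.getLoginUser is Kerberos-authenticated.
  UserGroupInformation.loginUserFromKeytab(principal, keytab)
  println(s"Logged in as ${UserGroupInformation.getLoginUser}")
}
{code}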

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 

[jira] [Updated] (SPARK-9034) Reflect field names defined in GenericUDTF

2015-07-14 Thread Takeshi Yamamuro (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-9034:

Description: 
GenericUDTF#initialize() in Hive defines field names in a returned schema 
though,
the current HiveGenericUDTF drops these names.
We might need to reflect these in a logical plan tree.

  was:
GenericUDTF#initialize() defines field names in a returned schema though,
the current HiveGenericUDTF drops these names.
We might need to reflect these in a logical plan tree.


 Reflect field names defined in GenericUDTF
 --

 Key: SPARK-9034
 URL: https://issues.apache.org/jira/browse/SPARK-9034
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Takeshi Yamamuro

 GenericUDTF#initialize() in Hive defines field names in a returned schema 
 though,
 the current HiveGenericUDTF drops these names.
 We might need to reflect these in a logical plan tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9034) Reflect field names defined in GenericUDTF

2015-07-14 Thread Takeshi Yamamuro (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-9034:

Description: 
Hive GenericUDTF#initialize() defines field names in a returned schema though,
the current HiveGenericUDTF drops these names.
We might need to reflect these in a logical plan tree.

  was:
GenericUDTF#initialize() in Hive defines field names in a returned schema 
though,
the current HiveGenericUDTF drops these names.
We might need to reflect these in a logical plan tree.


 Reflect field names defined in GenericUDTF
 --

 Key: SPARK-9034
 URL: https://issues.apache.org/jira/browse/SPARK-9034
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Takeshi Yamamuro

 Hive GenericUDTF#initialize() defines field names in a returned schema though,
 the current HiveGenericUDTF drops these names.
 We might need to reflect these in a logical plan tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9034) Reflect field names defined in GenericUDTF

2015-07-14 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-9034:
---

 Summary: Reflect field names defined in GenericUDTF
 Key: SPARK-9034
 URL: https://issues.apache.org/jira/browse/SPARK-9034
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Takeshi Yamamuro


GenericUDTF#initialize() defines field names in a returned schema though,
the current HiveGenericUDTF drops these names.
We might need to reflect these in a logical plan tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626300#comment-14626300
 ] 

Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 1:00 PM:



15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with 
service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
YARN_AM_RM_TOKEN and the token's service name is 

I think that should match


was (Author: bolke):
It might be that we have a configuration issue (but I'm not sure):

15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with 
service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
YARN_AM_RM_TOKEN and the token's service name is 

I think that should match

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 

[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL

2015-07-14 Thread Pavel (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel updated SPARK-9033:
-
Description: 
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable and contains a field of type java.util.Map<String, 
String>. This issue also occurs with Spark Streaming when used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
   if(evRDD.count() == 0) return null;

DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]


also this occurs for fields of custom POJO classes:

scala.MatchError: class com.test.MyClass (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1] 

  was:
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable and contains a field of type java.util.Map<String, 
String>. This issue also occurs with Spark Streaming when used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = 

[jira] [Updated] (SPARK-8974) There is a bug in dynamicAllocation. When there is no running tasks, the number of executor a long time without running tasks, the number of executor does not reduce to t

2015-07-14 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-8974:
-
Summary: There is a bug in dynamicAllocation. When there are no running tasks 
for a long time, the number of executors does not reduce to the value of 
spark.dynamicAllocation.minExecutors.  (was: There is a bug in 
dynamicAllocation. When there is no running tasks, the number of executor is 
not zero.)

 There is a bug in dynamicAllocation. When there are no running tasks for a 
 long time, the number of executors does not reduce to the value of 
 spark.dynamicAllocation.minExecutors.
 -

 Key: SPARK-8974
 URL: https://issues.apache.org/jira/browse/SPARK-8974
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
Reporter: KaiXinXIaoLei
 Fix For: 1.5.0


 In yarn-client mode with spark.dynamicAllocation.enabled set to true, if tasks 
 are submitted while the ApplicationMaster is dead or disconnected (before a new 
 ApplicationMaster starts), the spark-dynamic-executor-allocation thread throws 
 an exception. When the ApplicationMaster is running again and no tasks are 
 running, the number of executors is not zero, so the dynamicAllocation feature 
 is effectively not supported.
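
A minimal configuration sketch of the expected behaviour (values are illustrative assumptions, not taken from this report): with these settings, idle executors should eventually be released until only minExecutors remain.

{code}
import org.apache.spark.SparkConf

// Illustrative values only. The external shuffle service is required for
// dynamic allocation on YARN.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "0")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.shuffle.service.enabled", "true")
{code}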



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626256#comment-14626256
 ] 

Bolke de Bruin commented on SPARK-9019:
---

And some more debugging information. Please note the selected auth:SIMPLE 
method.



15/07/14 11:03:45 INFO ApplicationMaster: Registered signal handlers for [TERM, 
HUP, INT]
15/07/14 11:03:45 DEBUG Shell: setsid exited with exit code 0
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of 
successful kerberos logins and latency (milliseconds)], about=, valueName=Time, 
type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[Rate of failed 
kerberos logins and latency (milliseconds)], about=, valueName=Time, 
type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MutableMetricsFactory: field 
org.apache.hadoop.metrics2.lib.MutableRate 
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(value=[GetGroups], 
about=, valueName=Time, type=DEFAULT, always=false, sampleName=Ops)
15/07/14 11:03:45 DEBUG MetricsSystemImpl: UgiMetrics, User and group related 
metrics
15/07/14 11:03:45 DEBUG Groups:  Creating new Groups object
15/07/14 11:03:45 DEBUG NativeCodeLoader: Trying to load the custom-built 
native-hadoop library...
15/07/14 11:03:45 DEBUG NativeCodeLoader: Failed to load native-hadoop with 
error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
15/07/14 11:03:45 DEBUG NativeCodeLoader: 
java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
15/07/14 11:03:45 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
15/07/14 11:03:45 DEBUG PerformanceAdvisory: Falling back to shell based
15/07/14 11:03:45 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping 
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
15/07/14 11:03:45 DEBUG Groups: Group mapping 
impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; 
cacheTimeout=30; warningDeltaMs=5000
15/07/14 11:03:45 DEBUG YarnSparkHadoopUtil: running as user: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login
15/07/14 11:03:45 DEBUG UserGroupInformation: hadoop login commit
15/07/14 11:03:45 DEBUG UserGroupInformation: using kerberos user:null
15/07/14 11:03:45 DEBUG UserGroupInformation: using local user:UnixPrincipal: 
yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: Using user: UnixPrincipal: 
yx66jx with name yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: User entry: yx66jx
15/07/14 11:03:45 DEBUG UserGroupInformation: UGI loginUser:yx66jx 
(auth:KERBEROS)
15/07/14 11:03:45 DEBUG UserGroupInformation: PrivilegedAction as:yx66jx 
(auth:SIMPLE) 
from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
15/07/14 11:03:46 INFO ApplicationMaster: ApplicationAttemptId: 
appattempt_1436783220608_0085_01
15/07/14 11:03:46 DEBUG BlockReaderLocal: 
dfs.client.use.legacy.blockreader.local = false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.read.shortcircuit = true
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.client.domain.socket.data.traffic 
= false
15/07/14 11:03:46 DEBUG BlockReaderLocal: dfs.domain.socket.path = 
/var/lib/hadoop-hdfs/dn_socket

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: 

[jira] [Created] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL

2015-07-14 Thread Pavel (JIRA)
Pavel created SPARK-9033:


 Summary: scala.MatchError: interface java.util.Map (of class 
java.lang.Class) with Spark SQL
 Key: SPARK-9033
 URL: https://issues.apache.org/jira/browse/SPARK-9033
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1, 1.2.2
Reporter: Pavel


I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable. This issue also occurs with Spark Streaming when 
used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
if(evRDD.count() == 0) return null;
DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
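
A common workaround, sketched here in Scala under my own assumptions (an Event with an id plus a map-valued attributes field; toy parsing): bypass bean reflection entirely and pass an explicit StructType with a MapType field.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hand-built schema, so createDataFrame never has to reflect over java.util.Map.
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("attributes", MapType(StringType, StringType), nullable = true)
))

// sc / sqlCtx are the SparkContext and SQLContext from the report.
val rowRDD = sc.textFile("/path").map { line =>
  val parts = line.split(",")                              // toy parsing, stands in for Event.fromString
  Row(parts(0), Map("raw" -> parts.drop(1).mkString(",")))
}
val events = sqlCtx.createDataFrame(rowRDD, schema)
events.registerTempTable("events")
{code}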



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9034) Reflect field names defined in GenericUDTF

2015-07-14 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626279#comment-14626279
 ] 

Takeshi Yamamuro commented on SPARK-9034:
-

I'll make a PR for this after SPARK-8955 and SPARK-8930 are resolved.
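
For context, a rough sketch of where the names could come from (my assumption, not the eventual patch): the StructObjectInspector returned by GenericUDTF#initialize already carries the field names, so they could be propagated into the plan instead of being discarded.

{code}
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector

// The inspector returned by GenericUDTF#initialize describes the UDTF's output row;
// its struct field refs expose the user-defined field names.
def outputFieldNames(output: StructObjectInspector): Seq[String] =
  output.getAllStructFieldRefs.asScala.map(_.getFieldName).toSeq
{code}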

 Reflect field names defined in GenericUDTF
 --

 Key: SPARK-9034
 URL: https://issues.apache.org/jira/browse/SPARK-9034
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Takeshi Yamamuro

 Hive GenericUDTF#initialize() defines field names in a returned schema though,
 the current HiveGenericUDTF drops these names.
 We might need to reflect these in a logical plan tree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5210) Support log rolling in EventLogger

2015-07-14 Thread Tao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626325#comment-14626325
 ] 

Tao Wang commented on SPARK-5210:
-

Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The 
HistoryServer usually crashes with an OOM exception when it reads a very large 
event log written by a long-running application.

We can tune its memory settings, but that is not an elegant solution, as logs 
generated by Streaming/Thrift Server can grow without bound.

We now plan to write the event log to separate files according to job id, say 
50 jobs per file. Then the HistoryServer only reads relatively small files, which 
are much less likely to cause OOM.
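
A toy sketch of that rotation scheme (assumptions: plain-text log lines, local files, made-up names like "events-0.log"; the real change would live in EventLoggingListener):

{code}
import java.io.{File, PrintWriter}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

// Roll to a new event-log file every `jobsPerFile` completed jobs, so the
// HistoryServer never has to load one huge file.
class RollingEventLogger(dir: File, jobsPerFile: Int = 50) extends SparkListener {
  private var jobCount = 0
  private var fileIndex = 0
  private var out = newWriter()

  private def newWriter() = new PrintWriter(new File(dir, s"events-$fileIndex.log"))

  def log(line: String): Unit = out.println(line)

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    jobCount += 1
    if (jobCount % jobsPerFile == 0) {
      out.close()
      fileIndex += 1
      out = newWriter()
    }
  }
}
{code}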

 Support log rolling in EventLogger
 --

 Key: SPARK-5210
 URL: https://issues.apache.org/jira/browse/SPARK-5210
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, Web UI
Reporter: Josh Rosen

 For long-running Spark applications (e.g. running for days / weeks), the 
 Spark event log may grow to be very large.
 As a result, it would be useful if EventLoggingListener supported log file 
 rolling / rotation.  Adding this feature will involve changes to the 
 HistoryServer in order to be able to load event logs from a sequence of files 
 instead of a single file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5210) Support log rolling in EventLogger

2015-07-14 Thread Tao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626325#comment-14626325
 ] 

Tao Wang edited comment on SPARK-5210 at 7/14/15 1:21 PM:
--

Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The 
HistoryServer usually crashes with an OOM exception when it reads a very large 
event log written by a long-running application.

We can tune its memory settings, but that is not an elegant solution, as logs 
generated by Streaming/Thrift Server can grow without bound.

We now plan to write the event log to separate files according to job id, say 
50 jobs per file. Then the HistoryServer only reads relatively small files, which 
are much less likely to cause OOM.

What do you think? 


was (Author: wangtaothetonic):
Hi [~joshrosen], we found the same problem in Streaming / Thrift Server logs. The 
HistoryServer usually crashes with an OOM exception when it reads a very large 
event log written by a long-running application.

We can tune its memory settings, but that is not an elegant solution, as logs 
generated by Streaming/Thrift Server can grow without bound.

We now plan to write the event log to separate files according to job id, say 
50 jobs per file. Then the HistoryServer only reads relatively small files, which 
are much less likely to cause OOM.

 Support log rolling in EventLogger
 --

 Key: SPARK-5210
 URL: https://issues.apache.org/jira/browse/SPARK-5210
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, Web UI
Reporter: Josh Rosen

 For long-running Spark applications (e.g. running for days / weeks), the 
 Spark event log may grow to be very large.
 As a result, it would be useful if EventLoggingListener supported log file 
 rolling / rotation.  Adding this feature will involve changes to the 
 HistoryServer in order to be able to load event logs from a sequence of files 
 instead of a single file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8977) Define the RateEstimator interface, and implement the ReceiverRateController

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626202#comment-14626202
 ] 

François Garillot commented on SPARK-8977:
--

Typesafe PR: https://github.com/typesafehub/spark/pull/16

 Define the RateEstimator interface, and implement the ReceiverRateController
 

 Key: SPARK-8977
 URL: https://issues.apache.org/jira/browse/SPARK-8977
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos
 Fix For: 1.5.0


 Full [design 
 doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
 Implement a rate controller for receiver-based InputDStreams that estimates a 
 maximum rate and sends it to each receiver supervisor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626300#comment-14626300
 ] 

Bolke de Bruin commented on SPARK-9019:
---

It might be that we have a configuration issue (but I'm not sure):

15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Looking for a token with 
service 10.111.114.16:8032
15/07/14 11:03:49 DEBUG RMDelegationTokenSelector: Token kind is 
YARN_AM_RM_TOKEN and the token's service name is 

I think that should match

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 

[jira] [Commented] (SPARK-8844) head/collect is broken in SparkR

2015-07-14 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626273#comment-14626273
 ] 

Sun Rui commented on SPARK-8844:


This is a bug in reading an empty DataFrame. Will submit a PR.

 head/collect is broken in SparkR 
 -

 Key: SPARK-8844
 URL: https://issues.apache.org/jira/browse/SPARK-8844
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.5.0
Reporter: Davies Liu
Priority: Blocker

 {code}
  t = tables(sqlContext)
  showDF(T)
 Error in (function (classes, fdef, mtable)  :
   unable to find an inherited method for function ‘showDF’ for signature 
 ‘logical’
  showDF(t)
 +-+---+
 |tableName|isTemporary|
 +-+---+
 +-+---+
  15/07/06 09:59:10 WARN Executor: Told to re-register on heartbeat
 
 
  head(t)
 Error in readTypedObject(con, type) :
   Unsupported type for deserialization
  collect(t)
 Error in readTypedObject(con, type) :
   Unsupported type for deserialization
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL

2015-07-14 Thread Pavel (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel updated SPARK-9033:
-
Description: 
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable. This issue also occurs with Spark Streaming when 
used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
   if(evRDD.count() == 0) return null;

DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]

  was:
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable. This issue also occurs with Spark Streaming when 
used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
if(evRDD.count() == 0) return null;
DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 

[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL

2015-07-14 Thread Pavel (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel updated SPARK-9033:
-
Description: 
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable and contains a field of type java.util.Map<String, 
String>. This issue also occurs with Spark Streaming when used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
   if(evRDD.count() == 0) return null;

DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]

  was:
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable. This issue also occurs with Spark Streaming when 
used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
   if(evRDD.count() == 0) return null;

DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 

[jira] [Created] (SPARK-9035) Spark on Mesos Thread Context Class Loader issues

2015-07-14 Thread John Omernik (JIRA)
John Omernik created SPARK-9035:
---

 Summary: Spark on Mesos Thread Context Class Loader issues
 Key: SPARK-9035
 URL: https://issues.apache.org/jira/browse/SPARK-9035
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0, 1.3.1, 1.3.0, 1.2.2
 Environment: Mesos on MapRFS. 
Reporter: John Omernik
Priority: Critical


There is an issue trying to run Spark on Mesos (using MapRFS).   I am able to 
run this in YARN (Using Myriad on Mesos) on the same cluster, just not directly 
on Mesos. I've corresponded with MapR and the issue appears to be the class 
loader being NULL.  They will look at trying to address it in their code as 
well, but the issue exists here as the desired behavior shouldn't be to pass 
NULL (see https://issues.apache.org/jira/browse/SPARK-1403)  Note, I did try to 
work to reopen SPARK-1403 and Patrick Wendell asked me to open a new issue, 
(that is this JIRA).

Environment:
MapR 4.1.0 (using MapRFS)
Mesos 22.1 
Spark 1.4 (The issue occurs on Spark 1.3.1, 1.3.0, 1.2.2 but not 1.2.0)





Some comments from Kannan at MapR (he is no longer with MapR; these comments 
were made before he left):


Here is the corresponding ShimLoader code. cl.getParent is hitting NPE. 

If you look at the Spark code base, you can see that setContextClassLoader is 
invoked in a few places, but not necessarily in the context of this stack trace.

  private static ClassLoader getRootClassLoader() {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    trace("getRootClassLoader: thread classLoader is '%s'",
        cl.getClass().getCanonicalName());
    while (cl.getParent() != null) {
      cl = cl.getParent();
    }
    trace("getRootClassLoader: root classLoader is '%s'",
        cl.getClass().getCanonicalName());
    return cl;
  }


  MapR cannot handle NULL in this case. Basically, it is trying to get a root 
classloader to use for loading a bunch of classes. It uses the thread's context 
class loader (TCCL) and keeps going up the parent chain. We could fall back to 
using the current class's classloader whenever TCCL is NULL. I need to check 
with some folks what the impact will be. I don't know the specific reason for 
choosing the TCCL here.

  I have raised an internal bug to fall back to using the current class loader 
if the TCCL is not set. Let us also figure out if there is a way for Spark to 
address this - if it is really a change in behavior from their side. I think we 
should still fix our code to not make this assumption. But since this is a core 
change, it may not get out soon.
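
A minimal sketch of that fallback (an assumption about the shape of the fix, not MapR's or Spark's actual change), in Scala for illustration:

{code}
// Walk up to the root class loader, but fall back to this class's own loader when
// the thread context class loader (TCCL) is null, instead of hitting an NPE.
def rootClassLoader(): ClassLoader = {
  var cl = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getClass.getClassLoader)
  while (cl.getParent != null) cl = cl.getParent
  cl
}
{code}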







Command Attempted in bin/pyspark

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row, HiveContext
sparkhc = HiveContext(sc)
test = sparkhc.sql("show tables")
for r in test.collect():
  print r







Stack Trace from CLI:
15/07/14 09:16:40 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkexecu...@hadoopvm5.mydomain.com:58221] has failed, 
address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
hadoopvm5.mydomain.com): ExecutorLostFailure (executor 
20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:16:48 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkexecu...@hadoopmapr3.mydomain.com:53763] has failed, 
address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:48 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, 
hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 
20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:16:53 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkexecu...@hadoopvm5.mydomain.com:52102] has failed, 
address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:53 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, 
hadoopvm5.mydomain.com): ExecutorLostFailure (executor 
20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:17:01 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkexecu...@hadoopmapr3.mydomain.com:58600] has failed, 
address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:17:01 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, 
hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 
20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:17:01 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; 
aborting job
Traceback (most recent call last):
  File stdin, line 1, in module
  File 
/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/pyspark/sql/dataframe.py,
 line 314, in collect
port = 
self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
  File 

[jira] [Commented] (SPARK-8996) Add Python API for Kolmogorov-Smirnov Test

2015-07-14 Thread Manoj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626409#comment-14626409
 ] 

Manoj Kumar commented on SPARK-8996:


Hi, Can I work on this?

 Add Python API for Kolmogorov-Smirnov Test
 --

 Key: SPARK-8996
 URL: https://issues.apache.org/jira/browse/SPARK-8996
 Project: Spark
  Issue Type: New Feature
  Components: MLlib, PySpark
Reporter: Xiangrui Meng

 Add Python API for the Kolmogorov-Smirnov test implemented in SPARK-8598. It 
 should be similar to ChiSqTest in Python.
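
For reference, a usage sketch of the Scala API from SPARK-8598 that the Python wrapper would mirror (assuming an existing SparkContext named sc):

{code}
import org.apache.spark.mllib.stat.Statistics

// One-sample, two-sided KS test of the sample against a standard normal N(0, 1).
val sample = sc.parallelize(Seq(0.1, 0.15, 0.2, 0.3, 0.25, -0.1, 0.05))
val result = Statistics.kolmogorovSmirnovTest(sample, "norm", 0.0, 1.0)
println(result)  // KS statistic and p-value
{code}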



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8125:
---

Assignee: Apache Spark  (was: Cheng Lian)

 Accelerate ParquetRelation2 metadata discovery
 --

 Key: SPARK-8125
 URL: https://issues.apache.org/jira/browse/SPARK-8125
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.0
Reporter: Cheng Lian
Assignee: Apache Spark
Priority: Blocker

 For large Parquet tables (e.g., with thousands of partitions), it can be very 
 slow to discover Parquet metadata for schema merging and generating splits 
 for Spark jobs. We need to accelerate this process. One possible solution 
 is to do the discovery via a distributed Spark job.
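
As a rough illustration of the "distributed Spark job" idea (all helper names below are hypothetical stand-ins, not Spark or Parquet APIs):

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Conceptual sketch only: read one Parquet footer per task, then merge the schemas.
// listLeafFiles / readFooterSchema / mergeSchemas are stubs standing in for real logic.
object DistributedFooterDiscovery extends Serializable {
  def listLeafFiles(tablePath: String): Seq[String] = Seq.empty
  def readFooterSchema(path: String): StructType =
    StructType(Seq(StructField("placeholder", StringType)))
  def mergeSchemas(a: StructType, b: StructType): StructType =
    StructType((a.fields ++ b.fields).distinct)

  def discover(sc: SparkContext, tablePath: String): Option[StructType] = {
    val paths = listLeafFiles(tablePath)
    if (paths.isEmpty) None
    else Some(sc.parallelize(paths, numSlices = 64).map(readFooterSchema).reduce(mergeSchemas))
  }
}
{code}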



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626456#comment-14626456
 ] 

Apache Spark commented on SPARK-8125:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7396

 Accelerate ParquetRelation2 metadata discovery
 --

 Key: SPARK-8125
 URL: https://issues.apache.org/jira/browse/SPARK-8125
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 For large Parquet tables (e.g., with thousands of partitions), it can be very 
 slow to discover Parquet metadata for schema merging and generating splits 
 for Spark jobs. We need to accelerate this process. One possible solution 
 is to do the discovery via a distributed Spark job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8125) Accelerate ParquetRelation2 metadata discovery

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8125:
---

Assignee: Cheng Lian  (was: Apache Spark)

 Accelerate ParquetRelation2 metadata discovery
 --

 Key: SPARK-8125
 URL: https://issues.apache.org/jira/browse/SPARK-8125
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.4.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker

 For large Parquet tables (e.g., with thousands of partitions), it can be very 
 slow to discover Parquet metadata for schema merging and generating splits 
 for Spark jobs. We need to accelerate this process. One possible solution 
 is to do the discovery via a distributed Spark job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9033) scala.MatchError: interface java.util.Map (of class java.lang.Class) with Spark SQL

2015-07-14 Thread Pavel (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel updated SPARK-9033:
-
Description: 
I've a java.util.Map<String, String> field in a POJO class and I'm trying to 
use it with createDataFrame (1.3.1) / applySchema (1.2.2) on the SQLContext, and 
I'm getting the following error in both the 1.2.2 & 1.3.1 versions of Spark SQL:

*sample code:
SQLContext sqlCtx = new SQLContext(sc.sc());
JavaRDD<Event> rdd = sc.textFile("/path").map(line -> Event.fromString(line)); 
//text line is split and assigned to the respective fields of the event class here
DataFrame schemaRDD = sqlCtx.createDataFrame(rdd, Event.class); <-- error thrown here
schemaRDD.registerTempTable("events");


The Event class is Serializable and contains a field of type java.util.Map<String, 
String>. This issue also occurs with Spark Streaming when used with SQL.

JavaDStream<String> receiverStream = jssc.receiverStream(new StreamingReceiver());
JavaDStream<String> windowDStream = receiverStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
jssc.checkpoint("event-streaming");

windowDStream.foreachRDD(evRDD -> {
   if(evRDD.count() == 0) return null;

DataFrame schemaRDD = sqlCtx.createDataFrame(evRDD, Event.class);
schemaRDD.registerTempTable("events");
...
}


*error:
scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]


This also occurs for fields of custom POJO classes:

scala.MatchError: class com.test.MyClass (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:244) 
~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1192) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:437) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:465) 
~[spark-sql_2.10-1.3.1.jar:1.3.1]

It also occurs for the Calendar type:

scala.MatchError: class java.util.Calendar (of class java.lang.Class)
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1193)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1192)
 ~[spark-sql_2.10-1.3.1.jar:1.3.1]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 ~[scala-library-2.10.5.jar:na]
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 ~[scala-library-2.10.5.jar:na]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) 
~[scala-library-2.10.5.jar:na]
   

[jira] [Commented] (SPARK-8978) Implement the DirectKafkaController

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626542#comment-14626542
 ] 

François Garillot commented on SPARK-8978:
--

Typesafe PR: https://github.com/typesafehub/spark/pull/18

 Implement the DirectKafkaController
 ---

 Key: SPARK-8978
 URL: https://issues.apache.org/jira/browse/SPARK-8978
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos
 Fix For: 1.5.0


 Based on this [design 
 doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing].
 The DirectKafkaInputDStream should use the rate estimate to control how many 
 records/partition to put in the next batch.
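 A minimal sketch of how a rate estimate could be turned into a per-partition cap
 (illustrative numbers and signature, not the actual implementation):
 {code}
 object RateLimitSketch {
   // Turn an estimated sustainable rate (records/sec for the whole stream) into a
   // cap on the number of records to pull from each Kafka partition next batch.
   def maxMessagesPerPartition(rateEstimate: Double,
                               batchIntervalMs: Long,
                               numPartitions: Int): Long = {
     val totalForBatch = rateEstimate * batchIntervalMs / 1000.0
     math.max(1L, (totalForBatch / numPartitions).toLong)
   }
 }

 // Example: a 2 s batch, an estimate of 10,000 records/sec and 8 partitions
 // gives a cap of 2,500 records per partition for the next batch.
 {code}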



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8979) Implement a PIDRateEstimator

2015-07-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626543#comment-14626543
 ] 

François Garillot commented on SPARK-8979:
--

Typesafe PR: https://github.com/typesafehub/spark/pull/17

 Implement a PIDRateEstimator
 

 Key: SPARK-8979
 URL: https://issues.apache.org/jira/browse/SPARK-8979
 Project: Spark
  Issue Type: Sub-task
  Components: Streaming
Reporter: Iulian Dragos
 Fix For: 1.5.0


 Based on this [design 
 doc|https://docs.google.com/document/d/1ls_g5fFmfbbSTIfQQpUxH56d0f3OksF567zwA00zK9E/edit?usp=sharing]
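 A generic PID controller sketch to make the idea concrete; the gains and the error
 definition are illustrative assumptions, not the tuning from the design doc or the
 Typesafe PR:
 {code}
 // New rate = old rate corrected by proportional, integral and derivative terms of
 // the error between the rate we asked for and the rate the last batch sustained.
 class SimplePidRateEstimator(kp: Double = 1.0, ki: Double = 0.2, kd: Double = 0.0) {
   private var latestTimeMs = -1L
   private var latestRate   = -1.0
   private var latestError  = 0.0
   private var errorSum     = 0.0   // running integral of the error

   /** Returns an updated rate limit (records/sec) from the latest completed batch. */
   def compute(timeMs: Long, numElements: Long, processingDelayMs: Long): Option[Double] = {
     if (numElements == 0 || processingDelayMs == 0) return None
     val processingRate = numElements.toDouble / processingDelayMs * 1000.0
     if (latestTimeMs < 0) {                  // first batch: adopt the observed rate
       latestTimeMs = timeMs
       latestRate = processingRate
       return Some(processingRate)
     }
     if (timeMs <= latestTimeMs) return None  // ignore out-of-order updates
     val dt     = (timeMs - latestTimeMs) / 1000.0
     val error  = latestRate - processingRate // positive when we over-committed
     errorSum  += error * dt
     val dError = (error - latestError) / dt
     val newRate = math.max(0.0, latestRate - kp * error - ki * errorSum - kd * dError)
     latestTimeMs = timeMs; latestRate = newRate; latestError = error
     Some(newRate)
   }
 }
 {code}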



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7751) Add @since to stable and experimental methods in MLlib

2015-07-14 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626533#comment-14626533
 ] 

Xiangrui Meng commented on SPARK-7751:
--

This is great! Thanks for providing the script!

 Add @since to stable and experimental methods in MLlib
 --

 Key: SPARK-7751
 URL: https://issues.apache.org/jira/browse/SPARK-7751
 Project: Spark
  Issue Type: Umbrella
  Components: Documentation, MLlib
Affects Versions: 1.4.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
Priority: Minor
  Labels: starter

 This is useful to check whether a feature exists in some version of Spark. 
 This is an umbrella JIRA to track the progress. We want to have @since tag 
 for both stable (those without any Experimental/DeveloperApi/AlphaComponent 
 annotations) and experimental methods in MLlib:
 * an example PR for Scala: https://github.com/apache/spark/pull/6101
 * an example PR for Python: https://github.com/apache/spark/pull/6295
 We need to dig through the git commit history to figure out in which Spark 
 version a method was first introduced. Take `NaiveBayes.setModelType` as 
 an example: we can grep for `def setModelType` at different version git tags.
 {code}
 meng@xm:~/src/spark
 $ git show 
 v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
  | grep def setModelType
 meng@xm:~/src/spark
 $ git show 
 v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
  | grep def setModelType
   def setModelType(modelType: String): NaiveBayes = {
 {code}
 If there are better ways, please let us know.
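 One way to automate that check across tags (a quick sketch, assuming it is pasted
 into the Scala REPL from inside a Spark checkout):
 {code}
 import scala.sys.process._

 // Find the first listed release tag at which `def setModelType` appears in the file.
 val tags = Seq("v1.2.0", "v1.3.0", "v1.4.0")
 val file = "mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala"
 val firstTag = tags.find { tag =>
   (Process(Seq("git", "show", s"$tag:$file")) #| Process(Seq("grep", "def setModelType"))).! == 0
 }
 println(firstTag.getOrElse("not found in any of the listed tags"))
 {code}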
 We cannot add all @since tags in a single PR, since that would be hard to review. 
 So we made sub-tasks for each package, for example 
 `org.apache.spark.classification`. Feel free to add more sub-tasks for Python 
 and the `spark.ml` package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9036) SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol

2015-07-14 Thread Ryan Williams (JIRA)
Ryan Williams created SPARK-9036:


 Summary: SparkListenerExecutorMetricsUpdate messages not included 
in JsonProtocol
 Key: SPARK-9036
 URL: https://issues.apache.org/jira/browse/SPARK-9036
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.4.0, 1.4.1
Reporter: Ryan Williams
Priority: Minor


The JsonProtocol added in SPARK-3454 [doesn't 
include|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L95-L96]
 code for ser/de of 
[{{SparkListenerExecutorMetricsUpdate}}|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L107-L110]
 messages.

The comment notes that they are not used, which presumably refers to the fact 
that the [{{EventLoggingListener}} doesn't write these 
events|https://github.com/apache/spark/blob/v1.4.1-rc4/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L200-L201].

However, individual listeners can and should make that determination for 
themselves; I have recently written custom listeners that would like to consume 
metrics-update messages as JSON, so it would be nice to round out the 
JsonProtocol implementation by supporting them.
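
A minimal sketch of the kind of custom listener described above; how the event is
turned into JSON is left as a placeholder, since that is exactly the JsonProtocol
support this issue asks for:

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

class MetricsUpdateJsonListener(sink: String => Unit) extends SparkListener {

  // Hypothetical serializer standing in for the missing JsonProtocol case.
  private def toJson(update: SparkListenerExecutorMetricsUpdate): String = ???

  override def onExecutorMetricsUpdate(
      update: SparkListenerExecutorMetricsUpdate): Unit = {
    sink(toJson(update))
  }
}
{code}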



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626574#comment-14626574
 ] 

Bolke de Bruin commented on SPARK-9019:
---

Can this be related to YARN-3103?

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
   at 
 

[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+

2015-07-14 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626482#comment-14626482
 ] 

Vincent Warmerdam commented on SPARK-8724:
--

So a tutorial just went live for people who are on Spark 1.4:

http://blog.rstudio.org/2015/07/14/spark-1-4-for-rstudio/

I suppose if people link to this for now it'd be just fine. For Spark 1.5 the 
EC2 provisioning script will come with RStudio.

 Need documentation on how to deploy or use SparkR in Spark 1.4.0+
 -

 Key: SPARK-8724
 URL: https://issues.apache.org/jira/browse/SPARK-8724
 Project: Spark
  Issue Type: Bug
  Components: R
Affects Versions: 1.4.0
Reporter: Felix Cheung
Priority: Minor

 As of now there doesn't seem to be any official documentation on how to 
 deploy SparkR with Spark 1.4.0+.
 Also, cluster-manager-specific documentation (like 
 http://spark.apache.org/docs/latest/spark-standalone.html) does not call out 
 which modes are supported for SparkR or give details on the deployment steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated SPARK-9019:
--
Comment: was deleted

(was: - this was incorrect -
)

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
   at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
   at 
 org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
 

[jira] [Comment Edited] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

2015-07-14 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625980#comment-14625980
 ] 

Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 7:33 AM:


- this was incorrect -



was (Author: bolke):
Tracing this down it seems that the tokens are not being set on the container 
in yarn.Client, which is required according to 
http://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html.

something like this:

  ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
  amContainer.setTokens(fsTokens);

in createContainerLaunchContext of 
yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
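
For reference, the pattern the YARN application-writing guide describes looks roughly
like the sketch below (illustration only, not the actual fix; note the author later
marked this comment as incorrect):

{code}
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.{Credentials, UserGroupInformation}
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

object TokenAttachSketch {
  // Serialize the current user's delegation tokens and attach them to the
  // AM container launch context.
  def attachTokens(amContainer: ContainerLaunchContext): Unit = {
    val credentials: Credentials = UserGroupInformation.getCurrentUser.getCredentials
    val dob = new DataOutputBuffer()
    credentials.writeTokenStorageToStream(dob)
    val fsTokens = ByteBuffer.wrap(dob.getData, 0, dob.getLength)
    amContainer.setTokens(fsTokens)
  }
}
{code}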

 spark-submit fails on yarn with kerberos enabled
 

 Key: SPARK-9019
 URL: https://issues.apache.org/jira/browse/SPARK-9019
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
 Environment: Hadoop 2.6 with YARN and kerberos enabled
Reporter: Bolke de Bruin
  Labels: kerberos, spark-submit, yarn

 It is not possible to run jobs using spark-submit on yarn with a kerberized 
 cluster. 
 Commandline:
 /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
 --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
 Fails with:
 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
 SelectChannelConnector@0.0.0.0:58380
 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
 port 58380.
 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
 http://10.111.114.9:58380
 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
 YarnClusterScheduler
 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
 for source because spark.app.id is not set.
 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
 43470
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
 BlockManager
 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
 manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
 10.111.114.9, 43470)
 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
 http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
 the server : org.apache.hadoop.security.AccessControlException: Client cannot 
 authenticate via:[TOKEN, KERBEROS]
 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
 to rm2
 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
 getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
 java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
 lxhnl013.ad.ing.net:8032 failed on connection exception: 
 java.net.ConnectException: Connection refused; For more details see:  
 http://wiki.apache.org/hadoop/ConnectionRefused
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
   at org.apache.hadoop.ipc.Client.call(Client.java:1472)
   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
   at 
 

[jira] [Resolved] (SPARK-9001) sbt doc fails due to javadoc errors

2015-07-14 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-9001.

   Resolution: Fixed
 Assignee: Joseph E. Gonzalez
Fix Version/s: 1.5.0

 sbt doc fails due to javadoc errors
 ---

 Key: SPARK-9001
 URL: https://issues.apache.org/jira/browse/SPARK-9001
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Reporter: Joseph E. Gonzalez
Assignee: Joseph E. Gonzalez
Priority: Minor
 Fix For: 1.5.0


  Running `build/sbt doc` on master fails due to errors in the javadocs. 
  This is an issue since `build/sbt publish-local` depends on building the docs.
  Example error:
  [info] Generating 
  /spark/unsafe/target/scala-2.10/api/org/apache/spark/unsafe/bitset/BitSet.html...
  [error] 
  /spark/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java:93: 
  error: bad use of '>'
  [error]    *  for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
  [error] ^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8624) DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet

2015-07-14 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626416#comment-14626416
 ] 

Liang-Chi Hsieh commented on SPARK-8624:


I think you can use DataFrameReader.option to set up needed parameters before 
calling DataFrameReader.parquet. It should solve your problem.
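
For example, something like this in the spark-shell (a sketch; "mergeSchema" is the
Parquet data source option name assumed here, and sqlContext is the predefined
SQLContext):

{code}
val df = sqlContext.read
  .option("mergeSchema", "false")
  .parquet("/path/to/parquet/folder")
{code}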

 DataFrameReader doesn't respect MERGE_SCHEMA setting for Parquet
 

 Key: SPARK-8624
 URL: https://issues.apache.org/jira/browse/SPARK-8624
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Rex Xiong
  Labels: parquet

 In 1.4.0, Parquet is read via DataFrameReader.parquet; when the 
 ParquetRelation2 object is created, the parameters are hard-coded as 
 Map.empty[String, String], so ParquetRelation2.shouldMergeSchemas is always true 
 (the default value).
 In previous versions, the spark.sql.hive.convertMetastoreParquet.mergeSchema 
 config was respected.
 This bug degrades performance a lot for a folder with hundreds of Parquet 
 files where we don't want a schema merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8967) Implement @since as an annotation

2015-07-14 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626662#comment-14626662
 ] 

Xiangrui Meng commented on SPARK-8967:
--

The issue with a Java annotation is that it doesn't show up correctly in the 
generated Scala doc; in particular, the version value disappears. I don't know of a 
solution. We could switch to a Scala annotation in MLlib, but this is not ideal.

 Implement @since as an annotation
 -

 Key: SPARK-8967
 URL: https://issues.apache.org/jira/browse/SPARK-8967
 Project: Spark
  Issue Type: New Feature
  Components: Documentation, Spark Core
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng
   Original Estimate: 1h
  Remaining Estimate: 1h

 We use the @since tag in JavaDoc. There is one issue: an overloaded 
 method inherits the doc from its parent if no JavaDoc is provided. 
 However, if we want to add @since, we have to add JavaDoc, and then we need to 
 copy the JavaDoc from the parent, which makes it hard to keep the docs in sync.
 A better solution would be to implement @since as an annotation that is not 
 part of the JavaDoc.
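 A rough sketch of what that could look like as a Scala annotation; the name and
 package are illustrative, not an existing Spark API:
 {code}
 import scala.annotation.StaticAnnotation

 // Carries the version outside of the JavaDoc/ScalaDoc text, so overloaded methods
 // can still inherit their parent's doc.
 class Since(version: String) extends StaticAnnotation

 // Usage:
 // @Since("1.4.0")
 // def setModelType(modelType: String): NaiveBayes = { ... }
 {code}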



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8945) Add and Subtract expression should support IntervalType

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8945:
---

Assignee: Apache Spark

 Add and Subtract expression should support IntervalType
 ---

 Key: SPARK-8945
 URL: https://issues.apache.org/jira/browse/SPARK-8945
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8945) Add and Subtract expression should support IntervalType

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626683#comment-14626683
 ] 

Apache Spark commented on SPARK-8945:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/7398

 Add and Subtract expression should support IntervalType
 ---

 Key: SPARK-8945
 URL: https://issues.apache.org/jira/browse/SPARK-8945
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9038) Missing TaskEnd event when task attempt is superseded by another (speculative) attempt

2015-07-14 Thread Ryan Williams (JIRA)
Ryan Williams created SPARK-9038:


 Summary: Missing TaskEnd event when task attempt is superseded by 
another (speculative) attempt
 Key: SPARK-9038
 URL: https://issues.apache.org/jira/browse/SPARK-9038
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1
Reporter: Ryan Williams


Yesterday I ran a job that produced [this event 
log|https://www.dropbox.com/s/y90rz0gxao5w9z9/application_1432740718700_3010?dl=0].

There are 17314 {{TaskStart}}'s and 17313 {{TaskEnd}}'s; task ID 15820 (aka 
13.0.526.0) is missing a {{TaskEnd}} event.

A speculative second attempt, ID 16295 (13.0.526.1), finished before it; 15820 
was the last task attempt running in stage-attempt 13.0 and job 3, and when it 
finished, the latter two were each marked as succeeded.

At the conclusion of stage 13 / job 3, I observed a few things to be in 
conflicting/inconsistent states:

*Reflecting 15820 as having finished successfully:*
* The stage page for 13.0 [showed SUCCESS in the Status column of the 
per-task-attempt 
table|http://cl.ly/image/2O0O42382p2W?_ga=1.265890767.118106744.1401937910].
* The driver stdout reported 15820's successful finish, and that it was being 
ignored due to another attempt of the same task (16295, per above) having 
already succeeded:
{code}
15/07/13 23:30:40 INFO scheduler.TaskSetManager: Ignoring task-finished event 
for 526.0 in stage 13.0 because task 526 has already completed successfully
15/07/13 23:30:40 INFO cluster.YarnScheduler: Removed TaskSet 13.0, whose tasks 
have all completed, from pool
15/07/13 23:30:40 INFO scheduler.DAGScheduler: Job 3 finished: collect at 
JointHistogram.scala:107, took 579.659523 s
{code}

*Not reflecting 15820 as having finished at all:*
* As I mentioned before, [the event 
log|https://www.dropbox.com/s/y90rz0gxao5w9z9/application_1432740718700_3010?dl=0]
 is missing a {{TaskEnd}} for 15820.
* The {{AllJobsPage}} shows 11258 tasks finished in job 3; it would have been 
11259 with 15820.
** Additionally, inspecting the page in the DOM revealed a 1-task-wide sliver 
of light-blue (i.e. running task(s)) in the progress bar.
** [This 
screenshot|http://cl.ly/image/3O201z0e0G2C?_ga=1.265890767.118106744.1401937910]
 shows both of these on the {{AllJobsPage}}.
* A history server, pointed at the event log, consistently shows 15820 as not 
having finished.
** This is somewhat unsurprising given that the event log powering the history 
server doesn't {{TaskEnd}} 15820, but seems notable nonetheless since the live 
UI seemingly *did* partially record the task as having ended (cf. stage page 
showing SUCCESS).
** Stage page shows 15820 as RUNNING.
** AllJobsPage shows 11258 tasks succeeded, 1 running.

I've gone over the relevant task-success code paths and can't understand how 
the stage page would show me SUCCESS in the live UI, without anything having 
been written to the event log or the AllJobsPage's counters having been 
updated. [Here is a bunch of my driver 
stdout|https://www.dropbox.com/s/pr7rswt4o2umm20/3010.stdout?dl=0], which shows 
nothing abnormal afaict; and [the dreaded message about events being 
dropped|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L40]
 did not appear anywhere while the app was running, which was one of my only 
guesses about how this could have happened (but which wouldn't fully explain 
all of the above anyway).

Interested in hearing anyone's thoughts about how I might have arrived at this 
inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-8945) Add and Subtract expression should support IntervalType

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-8945:
---

Assignee: (was: Apache Spark)

 Add and Subtract expression should support IntervalType
 ---

 Key: SPARK-8945
 URL: https://issues.apache.org/jira/browse/SPARK-8945
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9037) Task table pagination for the Stage page

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626686#comment-14626686
 ] 

Apache Spark commented on SPARK-9037:
-

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/7399

 Task table pagination for the Stage page
 

 Key: SPARK-9037
 URL: https://issues.apache.org/jira/browse/SPARK-9037
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Shixiong Zhu

 Implement task table pagination for the Stage page to resolve the UI 
 scalability issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9037) Task table pagination for the Stage page

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9037:
---

Assignee: Apache Spark

 Task table pagination for the Stage page
 

 Key: SPARK-9037
 URL: https://issues.apache.org/jira/browse/SPARK-9037
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Shixiong Zhu
Assignee: Apache Spark

 Implement task table pagination for the Stage page to resolve the UI 
 scalability issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9029) shortcut CaseKeyWhen if key is null

2015-07-14 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-9029:

Assignee: Wenchen Fan

 shortcut CaseKeyWhen if key is null
 ---

 Key: SPARK-9029
 URL: https://issues.apache.org/jira/browse/SPARK-9029
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan
Priority: Minor
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9037) Task table pagination for the Stage page

2015-07-14 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-9037:
---

 Summary: Task table pagination for the Stage page
 Key: SPARK-9037
 URL: https://issues.apache.org/jira/browse/SPARK-9037
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Shixiong Zhu


Implement task table pagination for the Stage page to resolve the UI 
scalability issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9037) Task table pagination for the Stage page

2015-07-14 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9037.
--
Resolution: Duplicate

[~zsxwing] Please search JIRA first; this has been filed a few times now. You 
know the drill.

 Task table pagination for the Stage page
 

 Key: SPARK-9037
 URL: https://issues.apache.org/jira/browse/SPARK-9037
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Shixiong Zhu

 Implement task table pagination for the Stage page to resolve the UI 
 scalability issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9029) shortcut CaseKeyWhen if key is null

2015-07-14 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-9029.
-
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7389
[https://github.com/apache/spark/pull/7389]

 shortcut CaseKeyWhen if key is null
 ---

 Key: SPARK-9029
 URL: https://issues.apache.org/jira/browse/SPARK-9029
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Priority: Minor
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8965) Add ml-guide Python Example: Estimator, Transformer, and Param

2015-07-14 Thread Arijit Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625831#comment-14625831
 ] 

Arijit Saha edited comment on SPARK-8965 at 7/14/15 6:17 PM:
-

Hi Joseph,

I would like to take up this task.

Thanks,
Arijit.


was (Author: arijit saha):
Hi Joseph,

I would like to take up this task.
Being a starter, will help me, to understand flow.

Thanks,
Arijit.

 Add ml-guide Python Example: Estimator, Transformer, and Param
 --

 Key: SPARK-8965
 URL: https://issues.apache.org/jira/browse/SPARK-8965
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, ML, PySpark
Reporter: Joseph K. Bradley
Priority: Minor
  Labels: starter

 Look at: 
 [http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param]
 We need an example doing exactly the same thing, but in Python. It 
 should be tested using the PySpark shell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9027) Generalize predicate pushdown into the metastore

2015-07-14 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-9027.
-
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7386
[https://github.com/apache/spark/pull/7386]

 Generalize predicate pushdown into the metastore
 

 Key: SPARK-9027
 URL: https://issues.apache.org/jira/browse/SPARK-9027
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9005) RegressionMetrics computing incorrect explainedVariance and r2

2015-07-14 Thread Ayman Farahat (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626815#comment-14626815
 ] 

Ayman Farahat commented on SPARK-9005:
--

I compared the R2 and RMSE after fitting an ALS model. Here are the results:
rank 40: r2 = 0.993274964231, explained var = 0.993566133802, count = 94652197, 
meanres = -0.0606718131255, meanres2 = 0.085020285731
rank 50: r2 = 0.993547408858, explained var = 0.993826795105, count = 94652197, 
meanres = -0.0594314727572, meanres2 = 0.081575944201

 RegressionMetrics computing incorrect explainedVariance and r2
 --

 Key: SPARK-9005
 URL: https://issues.apache.org/jira/browse/SPARK-9005
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Reporter: Feynman Liang
Assignee: Feynman Liang

 {{RegressionMetrics}} currently computes explainedVariance using 
 {{summary.variance(1)}} (variance of the residuals) where the [Wikipedia 
 definition|https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained] 
 uses the residual sum of squares {{math.pow(summary.normL2(1), 2)}}. The two 
 coincide only when the predictor is unbiased (e.g. an intercept term is 
  included in a linear model), but this is not always the case. We should 
  change it to be consistent.
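 In formulas (a sketch, writing the residuals as e_i = y_i - \hat{y}_i and ignoring
 the 1/n vs. 1/(n-1) normalization), the quantity currently computed versus the one
 in the definition is:
 {code}
 \mathrm{Var}(e) = \frac{1}{n}\sum_i \bigl(e_i - \bar{e}\bigr)^2
 \qquad\text{vs.}\qquad
 \frac{1}{n}\sum_i e_i^2 = \frac{\lVert e \rVert_2^2}{n}
 {code}
 The two agree exactly when \bar{e} = 0, i.e. when the predictor is unbiased, which
 matches the statement above.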



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9022) UnsafeProject

2015-07-14 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu reassigned SPARK-9022:
-

Assignee: Davies Liu

 UnsafeProject
 -

 Key: SPARK-9022
 URL: https://issues.apache.org/jira/browse/SPARK-9022
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu

 Create a version of Project that projects output out directly into serialized 
 UnsafeRow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions

2015-07-14 Thread Ankur Dave (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Dave resolved SPARK-8718.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 7104
[https://github.com/apache/spark/pull/7104]

 Improve EdgePartition2D for non perfect square number of partitions
 ---

 Key: SPARK-8718
 URL: https://issues.apache.org/jira/browse/SPARK-8718
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Reporter: Andrew Ray
Priority: Minor
 Fix For: 1.5.0


 The current implementation of EdgePartition2D has a major limitation:
 bq. One of the limitations of this approach is that the number of machines 
 must either be a perfect square. We partially address this limitation by 
 computing the machine assignment to the next largest perfect square and then 
 mapping back down to the actual number of machines. Unfortunately, this can 
 also lead to work imbalance and so it is suggested that a perfect square is 
 used.
  To remove this limitation I'm proposing the following code change. It allows 
  us to partition into any number of evenly sized bins while maintaining the 
  property that any vertex will need to be replicated at most 2 * 
  sqrt(numParts) times. To maintain the current behavior for perfect squares we use 
  the old algorithm in that case, although this could be removed if we don't 
  care about producing the exact same result.
 See this IPython notebook for a visualization of what is being proposed 
 [https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download 
 it to interactively change the number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides

2015-07-14 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SPARK-8343:
---
Labels: spark.tc  (was: )

 Improve the Spark Streaming Guides
 --

 Key: SPARK-8343
 URL: https://issues.apache.org/jira/browse/SPARK-8343
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Streaming
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
  Labels: spark.tc
 Fix For: 1.4.1, 1.5.0


 Improve the Spark Streaming Guides by fixing broken links, rewording 
 confusing sections, fixing typos, adding missing words, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark

2015-07-14 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SPARK-6485:
---
Labels: spark.tc  (was: )

 Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
 --

 Key: SPARK-6485
 URL: https://issues.apache.org/jira/browse/SPARK-6485
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Xiangrui Meng
  Labels: spark.tc

 We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in 
 PySpark. Internally, we can use DataFrames for serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark

2015-07-14 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627120#comment-14627120
 ] 

Mike Dusenberry commented on SPARK-6485:


Hey [~mengxr]. This is still coming, sorry about the delay!  I've been creating 
wrappers around the Scala/Java API, so it sounds like I'm on the right track.  
I plan to have it completed by the end of the week.

 Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
 --

 Key: SPARK-6485
 URL: https://issues.apache.org/jira/browse/SPARK-6485
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Xiangrui Meng
  Labels: spark.tc

 We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in 
 PySpark. Internally, we can use DataFrames for serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9022) UnsafeProject

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627035#comment-14627035
 ] 

Apache Spark commented on SPARK-9022:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/7402

 UnsafeProject
 -

 Key: SPARK-9022
 URL: https://issues.apache.org/jira/browse/SPARK-9022
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu

 Create a version of Project that projects output out directly into serialized 
 UnsafeRow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9022) UnsafeProject

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9022:
---

Assignee: Apache Spark  (was: Davies Liu)

 UnsafeProject
 -

 Key: SPARK-9022
 URL: https://issues.apache.org/jira/browse/SPARK-9022
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Apache Spark

 Create a version of Project that projects output out directly into serialized 
 UnsafeRow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9022) UnsafeProject

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9022:
---

Assignee: Davies Liu  (was: Apache Spark)

 UnsafeProject
 -

 Key: SPARK-9022
 URL: https://issues.apache.org/jira/browse/SPARK-9022
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Reynold Xin
Assignee: Davies Liu

 Create a version of Project that projects output directly into serialized 
 UnsafeRow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9043:
---

Assignee: Apache Spark

 Serialize key, value and combiner classes in ShuffleDependency
 --

 Key: SPARK-9043
 URL: https://issues.apache.org/jira/browse/SPARK-9043
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matt Massie
Assignee: Apache Spark

 ShuffleManager implementations are currently not given type information 
 regarding the key, value and combiner classes. Serialization of shuffle 
 objects relies on them being Java-serializable, with methods defined for 
 reading/writing the object, or alternatively on Kryo serialization, which 
 uses reflection.
 Serialization systems like Avro, Thrift and Protobuf generate classes with 
 zero-argument constructors and explicit schema information (e.g. 
 IndexedRecords in Avro have get, put and getSchema methods).
 By serializing the key, value and combiner class names in ShuffleDependency, 
 shuffle implementations will have access to schema information when 
 registerShuffle() is called.
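 As a hedged sketch of what a shuffle implementation could then do with the
 class names (the helper below is illustrative and assumes Avro-generated
 classes with zero-argument constructors):
 {code}
 import org.apache.avro.Schema
 import org.apache.avro.generic.IndexedRecord

 // Given a key/value/combiner class name carried by ShuffleDependency, recover
 // the Avro schema via the generated class's zero-argument constructor.
 def schemaFor(className: String): Schema =
   Class.forName(className).newInstance().asInstanceOf[IndexedRecord].getSchema
 {code}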



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency

2015-07-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627051#comment-14627051
 ] 

Apache Spark commented on SPARK-9043:
-

User 'massie' has created a pull request for this issue:
https://github.com/apache/spark/pull/7403

 Serialize key, value and combiner classes in ShuffleDependency
 --

 Key: SPARK-9043
 URL: https://issues.apache.org/jira/browse/SPARK-9043
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matt Massie

 ShuffleManager implementations are currently not given type information 
 regarding the key, value and combiner classes. Serialization of shuffle 
 objects relies on them being Java-serializable, with methods defined for 
 reading/writing the object, or alternatively on Kryo serialization, which 
 uses reflection.
 Serialization systems like Avro, Thrift and Protobuf generate classes with 
 zero-argument constructors and explicit schema information (e.g. 
 IndexedRecords in Avro have get, put and getSchema methods).
 By serializing the key, value and combiner class names in ShuffleDependency, 
 shuffle implementations will have access to schema information when 
 registerShuffle() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-9043) Serialize key, value and combiner classes in ShuffleDependency

2015-07-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9043:
---

Assignee: (was: Apache Spark)

 Serialize key, value and combiner classes in ShuffleDependency
 --

 Key: SPARK-9043
 URL: https://issues.apache.org/jira/browse/SPARK-9043
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Matt Massie

 ShuffleManager implementations are currently not given type information 
 regarding the key, value and combiner classes. Serialization of shuffle 
 objects relies on them being Java-serializable, with methods defined for 
 reading/writing the object, or alternatively on Kryo serialization, which 
 uses reflection.
 Serialization systems like Avro, Thrift and Protobuf generate classes with 
 zero-argument constructors and explicit schema information (e.g. 
 IndexedRecords in Avro have get, put and getSchema methods).
 By serializing the key, value and combiner class names in ShuffleDependency, 
 shuffle implementations will have access to schema information when 
 registerShuffle() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9045) Fix Scala 2.11 build break due to UnsafeExternalRowSorter

2015-07-14 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-9045:
--
Affects Version/s: 1.5.0
 Target Version/s: 1.5.0

 Fix Scala 2.11 build break due to UnsafeExternalRowSorter
 -

 Key: SPARK-9045
 URL: https://issues.apache.org/jira/browse/SPARK-9045
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker

 {code}
 [error] 
 /home/jenkins/workspace/Spark-Master-Scala211-Compile/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java:135:
  error: anonymous org.apache.spark.sql.execution.UnsafeExternalRowSorter$1 
 is not abstract and does not override abstract method 
 <B>minBy(Function1<InternalRow,B>,Ordering<B>) in TraversableOnce
 [error]   return new AbstractScalaRowIterator() {
 [error] ^
 [error]   where B,A are type-variables:
 [error] B extends Object declared in method 
 <B>minBy(Function1<A,B>,Ordering<B>)
 [error] A extends Object declared in interface TraversableOnce
 [error] 1 error
 [error] Compile failed at Jul 14, 2015 2:26:25 PM [26.443s]
 {code}
 It turns out that this can be fixed by making AbstractScalaRowIterator into a 
 concrete class instead of an abstract class.
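 A minimal sketch of that fix; the method bodies are placeholders and the
 details are assumptions, not the merged code:
 {code}
 import org.apache.spark.sql.catalyst.InternalRow

 // Concrete rather than abstract: the Java-side anonymous subclass then only
 // overrides hasNext/next and never needs to satisfy Scala 2.11's synthetic
 // TraversableOnce members such as minBy.
 class AbstractScalaRowIterator extends Iterator[InternalRow] {
   override def hasNext: Boolean = throw new NotImplementedError
   override def next(): InternalRow = throw new NotImplementedError
 }
 {code}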



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-7265:

Labels: spark.tc  (was: )

 Improving documentation for Spark SQL Hive support 
 ---

 Key: SPARK-7265
 URL: https://issues.apache.org/jira/browse/SPARK-7265
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.5.0


 Miscellaneous documentation improvements for Spark SQL Hive support and YARN 
 cluster deployment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-2859:

Labels: spark.tc  (was: )

 Update url of Kryo project in related docs
 --

 Key: SPARK-2859
 URL: https://issues.apache.org/jira/browse/SPARK-2859
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Guancheng Chen
Assignee: Guancheng Chen
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.0.3, 1.1.0


 The Kryo project has migrated from Google Code to GitHub, so we need to 
 update its URL in related docs such as tuning.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-8639:

Labels: spark.tc  (was: )

 Instructions for executing jekyll in docs/README.md could be slightly more 
 clear, typo in docs/api.md
 -

 Key: SPARK-8639
 URL: https://issues.apache.org/jira/browse/SPARK-8639
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Rosstin Murphy
Assignee: Rosstin Murphy
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.4.1, 1.5.0


 In docs/README.md, the text around line 31 states: "Execute 'jekyll' from the 
 'docs/' directory. Compiling the site with Jekyll will create a directory 
 called '_site' containing index.html as well as the rest of the compiled 
 files."
 It would be clearer to say: "Execute 'jekyll build' from the 'docs/' directory 
 to compile the site. Compiling the site with Jekyll will create a directory 
 called '_site' containing index.html as well as the rest of the compiled 
 files."
 In docs/api.md, "Here you can API docs for Spark and its submodules." should 
 be something like "Here you can read API docs for Spark and its submodules."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5562) LDA should handle empty documents

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-5562:

Labels: spark.tc  (was: starter)

 LDA should handle empty documents
 -

 Key: SPARK-5562
 URL: https://issues.apache.org/jira/browse/SPARK-5562
 Project: Spark
  Issue Type: Test
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Assignee: Alok Singh
Priority: Minor
  Labels: spark.tc, starter
 Fix For: 1.5.0

   Original Estimate: 96h
  Remaining Estimate: 96h

 Latent Dirichlet Allocation (LDA) could easily be given empty documents when 
 people select a small vocabulary.  We should check to make sure it is robust 
 to empty documents.
 This will hopefully take the form of a unit test, but may require modifying 
 the LDA implementation.
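 A minimal sketch of such a test, assuming a spark-shell style environment
 with sc in scope; the three-term vocabulary and the empty second document are
 illustrative:
 {code}
 import org.apache.spark.mllib.clustering.LDA
 import org.apache.spark.mllib.linalg.Vectors

 // Document 1 has all-zero term counts, i.e. an empty document.
 val corpus = sc.parallelize(Seq(
   (0L, Vectors.dense(1.0, 2.0, 0.0)),
   (1L, Vectors.dense(0.0, 0.0, 0.0)),
   (2L, Vectors.dense(0.0, 1.0, 3.0))))

 // The check is simply that training completes and produces k topics.
 val model = new LDA().setK(2).setMaxIterations(10).run(corpus)
 assert(model.topicsMatrix.numCols == 2)
 {code}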



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7357) Improving HBaseTest example

2015-07-14 Thread Christian Kadner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Kadner updated SPARK-7357:

Labels: spark.tc  (was: )

 Improving HBaseTest example
 ---

 Key: SPARK-7357
 URL: https://issues.apache.org/jira/browse/SPARK-7357
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Minor
  Labels: spark.tc
 Fix For: 1.5.0

   Original Estimate: 2m
  Remaining Estimate: 2m

 Minor improvement to the HBaseTest example: when HBase-related configurations, 
 e.g. zookeeper quorum, zookeeper client port or zookeeper.znode.parent, are 
 not set to the default (localhost:2181), the connection to ZooKeeper might 
 hang, as shown in the following log:
 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=xxx.xxx.xxx:2181 sessionTimeout=9 
 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to 
 server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown 
 error)
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to 
 xxx.xxx.xxx/9.30.94.121:2181, initiating session
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete 
 on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, 
 negotiated timeout = 4
 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper 
 is null
 This is because hbase-site.xml is not placed on the Spark classpath.
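 One possible improvement, sketched below: set the ZooKeeper properties
 explicitly in the example when hbase-site.xml is not on the classpath (host
 names and values are placeholders):
 {code}
 import org.apache.hadoop.hbase.HBaseConfiguration

 // Fall back to explicit settings instead of silently assuming localhost:2181.
 val conf = HBaseConfiguration.create()
 conf.set("hbase.zookeeper.quorum", "zk-host-1,zk-host-2,zk-host-3")
 conf.set("hbase.zookeeper.property.clientPort", "2181")
 conf.set("zookeeper.znode.parent", "/hbase")
 {code}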



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7920:
---
Labels:   (was: spark.tc)

 Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
 

 Key: SPARK-7920
 URL: https://issues.apache.org/jira/browse/SPARK-7920
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.0


 The MLlib ChiSqSelector class is not serializable, and so the example in the 
 ChiSqSelector documentation fails.  Also, that example is missing the import 
 of ChiSqSelector.  ChiSqSelector should just extend Serializable.
 Steps:
 1. Locate the MLlib ChiSqSelector documentation example.
 2. Fix the example by adding an import statement for ChiSqSelector.
 3. Attempt to run - notice that it will fail due to ChiSqSelector not being 
 serializable. 
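 A hedged sketch of the corrected documentation example; discretizedData is
 assumed to be the RDD of LabeledPoints built earlier in that example:
 {code}
 import org.apache.spark.mllib.feature.ChiSqSelector
 import org.apache.spark.mllib.regression.LabeledPoint

 // The documented example with the missing import added; before the fix this
 // failed with a serialization error because ChiSqSelector was not Serializable.
 val selector = new ChiSqSelector(50)
 val transformer = selector.fit(discretizedData)
 val filteredData = discretizedData.map { lp =>
   LabeledPoint(lp.label, transformer.transform(lp.features))
 }
 {code}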



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8927:
---
Labels:   (was: spark.tc)

 Doc format wrong for some config descriptions
 -

 Key: SPARK-8927
 URL: https://issues.apache.org/jira/browse/SPARK-8927
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Jon Alter
Assignee: Jon Alter
Priority: Trivial
 Fix For: 1.4.2, 1.5.0


 In the docs, a couple of configuration descriptions (under Network) are not 
 wrapped in <td></td> tags and are displayed immediately under the section 
 title instead of in their table row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7985:
---
Labels:   (was: spark.tc)

 Remove fittingParamMap references. Update ML Doc Estimator, Transformer, 
 and Param examples.
 

 Key: SPARK-7985
 URL: https://issues.apache.org/jira/browse/SPARK-7985
 Project: Spark
  Issue Type: Bug
  Components: Documentation, ML
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.0


 Update ML Doc's Estimator, Transformer, and Param Scala & Java examples to 
 use model.extractParamMap instead of model.fittingParamMap, which no longer 
 exists.  Remove all other references to fittingParamMap throughout Spark.
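 A hedged sketch of the updated pattern the examples should follow; the
 training DataFrame is assumed to exist as in the ML guide:
 {code}
 import org.apache.spark.ml.classification.LogisticRegression

 val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
 val model = lr.fit(training)

 // Previously shown as model.fittingParamMap, which no longer exists.
 println("Model was fit using parameters: " + model.extractParamMap())
 {code}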



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7969:
---
Labels:   (was: spark.tc)

 Drop method on Dataframes should handle Column
 --

 Key: SPARK-7969
 URL: https://issues.apache.org/jira/browse/SPARK-7969
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Olivier Girardot
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.1, 1.5.0


 For now, the drop method available on DataFrame since Spark 1.4.0 only 
 accepts a column name (as a string); it should also accept a Column as input.
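 A hedged sketch of both forms once the overload exists, assuming a
 spark-shell style sqlContext; the DataFrame is illustrative:
 {code}
 val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "extra")

 df.drop("extra")      // existing: drop by column name
 df.drop(df("extra"))  // proposed: drop by Column reference
 {code}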



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column

2015-07-14 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SPARK-7969:
---
Labels: spark.tc  (was: )

 Drop method on Dataframes should handle Column
 --

 Key: SPARK-7969
 URL: https://issues.apache.org/jira/browse/SPARK-7969
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Olivier Girardot
Assignee: Mike Dusenberry
Priority: Minor
  Labels: spark.tc
 Fix For: 1.4.1, 1.5.0


 For now, the drop method available on DataFrame since Spark 1.4.0 only 
 accepts a column name (as a string); it should also accept a Column as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.

2015-07-14 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SPARK-7883:
---
Target Version/s: 1.4.0, 1.0.3, 1.1.2, 1.2.3, 1.3.2  (was: 1.0.3, 1.1.2, 
1.2.3, 1.3.2, 1.4.0)
  Labels: spark.tc  (was: )

 Fixing broken trainImplicit example in MLlib Collaborative Filtering 
 documentation.
 ---

 Key: SPARK-7883
 URL: https://issues.apache.org/jira/browse/SPARK-7883
 Project: Spark
  Issue Type: Bug
  Components: Documentation, MLlib
Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
  Labels: spark.tc
 Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0


 The trainImplicit Scala example near the end of the MLlib Collaborative 
 Filtering documentation refers to an ALS.trainImplicit function signature 
 that does not exist.  Rather than add an extra function, let's just fix the 
 example.
 Currently, the example refers to a function that would have the following 
 signature: 
 def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: 
 Double) : MatrixFactorizationModel
 Instead, let's change the example to refer to this function, which does exist 
 (notice the addition of the lambda parameter):
 def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: 
 Double, alpha: Double) : MatrixFactorizationModel
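 For reference, a hedged sketch of the corrected example call; ratings and the
 numeric values are illustrative:
 {code}
 import org.apache.spark.mllib.recommendation.ALS

 val rank = 10
 val numIterations = 20
 val lambda = 0.01
 val alpha = 0.01

 // Matches the signature that actually exists (note the lambda parameter).
 val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)
 {code}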



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


