[jira] [Resolved] (SPARK-4108) Fix uses of @deprecated in catalyst dataTypes
[ https://issues.apache.org/jira/browse/SPARK-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-4108.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 2970: https://github.com/apache/spark/pull/2970

Fix uses of @deprecated in catalyst dataTypes
---------------------------------------------
Key: SPARK-4108
URL: https://issues.apache.org/jira/browse/SPARK-4108
Project: Spark
Issue Type: Task
Components: SQL
Reporter: Anant Daksh Asthana
Priority: Trivial
Fix For: 1.2.0

@deprecated takes two parameters, message and version, but sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala has a usage of @deprecated with just one parameter.
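For reference, a minimal sketch of the two-argument form of Scala's @deprecated annotation (method names here are hypothetical, not from dataTypes.scala): the first argument is the deprecation message, the second the version the deprecation applies from.

{code}
object Example {
  // Two-argument form: message plus the version the symbol was deprecated in.
  @deprecated("use newMethod instead", "1.2.0")
  def oldMethod(): Unit = newMethod()

  def newMethod(): Unit = ()
}
{code}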
[jira] [Commented] (SPARK-4079) Snappy bundled with Spark does not work on older Linux distributions
[ https://issues.apache.org/jira/browse/SPARK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191457#comment-14191457 ]

Patrick Wendell commented on SPARK-4079:
----------------------------------------

Yeah, that sounds like a good call. Did you want to do this?

Snappy bundled with Spark does not work on older Linux distributions
--------------------------------------------------------------------
Key: SPARK-4079
URL: https://issues.apache.org/jira/browse/SPARK-4079
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.0
Reporter: Marcelo Vanzin

This issue has existed at least since 1.0, but has been made worse by 1.1, since Snappy is now the default compression algorithm. When trying to use it on a CentOS 5 machine, for example, you'll get something like this:

{noformat}
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:319)
        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:226)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
        at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
        at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
        ...
Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5.3-af72bf3c-9dab-43af-a662-f9af657f06b1-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /tmp/snappy-1.0.5.3-af72bf3c-9dab-43af-a662-f9af657f06b1-libsnappyjava.so)
        at java.lang.ClassLoader$NativeLibrary.load(Native Method)
        at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1957)
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1882)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1843)
        at java.lang.Runtime.load0(Runtime.java:795)
        at java.lang.System.load(System.java:1061)
        at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39)
        ... 29 more
{noformat}

There are two approaches I can see here (well, 3):

* Declare CentOS 5 (and similar OSes) not supported, although that would suck for the people who are still on it and already use Spark
* Fall back to another compression codec if Snappy cannot be loaded
* Ask the Snappy guys to compile the library on an older OS...

I think the second would be the best compromise.
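A minimal sketch of the second option, illustrative only and not Spark's actual CompressionCodec code (the helper name `chooseCodec` is hypothetical): probe whether Snappy's native library loads, and fall back to the pure-JVM LZF codec if it does not.

{code}
import org.apache.spark.SparkConf

object CodecFallback {
  // Hypothetical helper: pick a codec name, falling back if the Snappy
  // native library cannot be extracted/loaded on this platform.
  def chooseCodec(conf: SparkConf): String =
    try {
      // Forcing class initialization triggers the native library load.
      Class.forName("org.xerial.snappy.Snappy")
      conf.get("spark.io.compression.codec", "snappy")
    } catch {
      case _: UnsatisfiedLinkError | _: ExceptionInInitializerError |
           _: ClassNotFoundException =>
        "lzf" // pure-JVM codec that works on older distributions
    }
}
{code}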
[jira] [Commented] (SPARK-3987) NNLS generates incorrect result
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191459#comment-14191459 ] Xiangrui Meng commented on SPARK-3987: -- Please check the condition number of the matrix you sent. Did you run ALS with a very small lambda? NNLS generates incorrect result --- Key: SPARK-3987 URL: https://issues.apache.org/jira/browse/SPARK-3987 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.1.0 Reporter: Debasish Das Assignee: Shuo Xiang Fix For: 1.1.1, 1.2.0 Hi, Please see the example gram matrix and linear term: val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, -163429.436848, -43537.709621, 18052.143842, -244518.179729, 
-226173.967766, 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 15039.566630, -1202.539106, -293517.883778, 56991.742991, -183046.845592, 148311.355507,
[jira] [Updated] (SPARK-4164) spark.kryo.registrator shall use comma separated value to support multiple registrator
[ https://issues.apache.org/jira/browse/SPARK-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarred Li updated SPARK-4164:
-----------------------------
Remaining Estimate: 2h
Original Estimate: 2h

spark.kryo.registrator shall use comma separated value to support multiple registrator
---------------------------------------------------------------------------------------
Key: SPARK-4164
URL: https://issues.apache.org/jira/browse/SPARK-4164
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Jarred Li
Original Estimate: 2h
Remaining Estimate: 2h

Currently, spark.kryo.registrator supports only one registrator class, for example: conf.set("spark.kryo.registrator", "org.apache.spark.graphx.GraphRegistrator"). If there is also a user-defined registrator class, it cannot be registered. To improve this, the code in KryoSerializer can be changed to support multiple classes joined by a separator (for example, a comma).
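A minimal sketch of the proposed change (the helper `registerAll` is hypothetical; only KryoRegistrator.registerClasses is from the Spark API): split the property value on commas and apply each registrator in turn.

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

object MultiRegistrator {
  // Hypothetical helper sketching what KryoSerializer could do with a
  // comma-separated spark.kryo.registrator value.
  def registerAll(kryo: Kryo, registratorNames: String): Unit =
    registratorNames.split(',').map(_.trim).filter(_.nonEmpty).foreach { name =>
      val registrator =
        Class.forName(name).newInstance().asInstanceOf[KryoRegistrator]
      registrator.registerClasses(kryo)
    }
}
{code}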
[jira] [Commented] (SPARK-4164) spark.kryo.registrator shall use comma separated value to support multiple registrator
[ https://issues.apache.org/jira/browse/SPARK-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191475#comment-14191475 ]

Jarred Li commented on SPARK-4164:
----------------------------------

I can work on this issue. Could somebody assign this issue to me? Thanks!

spark.kryo.registrator shall use comma separated value to support multiple registrator
---------------------------------------------------------------------------------------
Key: SPARK-4164
URL: https://issues.apache.org/jira/browse/SPARK-4164
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Jarred Li
[jira] [Created] (SPARK-4165) Actor with Companion throws ambiguous reference error in REPL
Shiti Saxena created SPARK-4165:
-----------------------------------

Summary: Actor with Companion throws ambiguous reference error in REPL
Key: SPARK-4165
URL: https://issues.apache.org/jira/browse/SPARK-4165
Project: Spark
Issue Type: Bug
Reporter: Shiti Saxena

Tried the following in the master branch REPL.

{noformat}
Spark context available as sc.

scala> import akka.actor.{Actor,Props}
import akka.actor.{Actor, Props}

scala> :pas
// Entering paste mode (ctrl-D to finish)

class EchoActor extends Actor {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}

// Exiting paste mode, now interpreting.

defined class EchoActor
defined module EchoActor

scala> EchoActor.props
<console>:15: error: reference to EchoActor is ambiguous;
it is imported twice in the same scope by
import $VAL1.EchoActor
and import INSTANCE.EchoActor
       EchoActor.props
{noformat}
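A commonly suggested workaround, sketched here on a hedged basis (it is not from the ticket): nest the class and its companion inside an enclosing object, so the REPL imports only the wrapper and the companion pair is never imported twice.

{code}
import akka.actor.{Actor, Props}

// Pasting this single wrapper may avoid the ambiguous-reference error,
// at the cost of referring to Wrapper.EchoActor everywhere.
object Wrapper {
  class EchoActor extends Actor {
    override def receive = {
      case message => sender ! message
    }
  }

  object EchoActor {
    def props: Props = Props(new EchoActor())
  }
}
{code}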
[jira] [Commented] (SPARK-3987) NNLS generates incorrect result
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191502#comment-14191502 ]

Debasish Das commented on SPARK-3987:
--------------------------------------

Nope...standard ALS...same as netflix params...0.065 as L2...My ratings are not within 1-5 but more like 1-10... Also, what's a good condition number for NNLS?

NNLS generates incorrect result
-------------------------------
Key: SPARK-3987
URL: https://issues.apache.org/jira/browse/SPARK-3987
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.1.0
Reporter: Debasish Das
Assignee: Shuo Xiang
Fix For: 1.1.1, 1.2.0
[jira] [Created] (SPARK-4166) Display the executor ID in the Web UI when ExecutorLostFailure happens
Shixiong Zhu created SPARK-4166:
-----------------------------------

Summary: Display the executor ID in the Web UI when ExecutorLostFailure happens
Key: SPARK-4166
URL: https://issues.apache.org/jira/browse/SPARK-4166
Project: Spark
Issue Type: Improvement
Components: Spark Core, Web UI
Affects Versions: 1.1.0
Reporter: Shixiong Zhu
Priority: Minor

Now when ExecutorLostFailure happens, the UI only displays "ExecutorLostFailure (executor lost)", without identifying which executor was lost.
[jira] [Resolved] (SPARK-4143) Move inner class DeferredObjectAdapter to top level
[ https://issues.apache.org/jira/browse/SPARK-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-4143.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0

Issue resolved by pull request 3007: https://github.com/apache/spark/pull/3007

Move inner class DeferredObjectAdapter to top level
---------------------------------------------------
Key: SPARK-4143
URL: https://issues.apache.org/jira/browse/SPARK-4143
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Cheng Hao
Assignee: Cheng Hao
Priority: Trivial
Fix For: 1.2.0

DeferredObjectAdapter is an inner class of HiveGenericUdf, which may cause some overhead in closure ser/de-ser. Move it to top level.
[jira] [Created] (SPARK-4167) Schedule task on Executor will be Imbalance while task run less than local-wait time
SuYan created SPARK-4167:
----------------------------

Summary: Schedule task on Executor will be Imbalance while task run less than local-wait time
Key: SPARK-4167
URL: https://issues.apache.org/jira/browse/SPARK-4167
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: SuYan

Recently, a Spark-on-YARN job of ours showed imbalanced executor scheduling. The sequence of events:

1. Due to a user mistake, the job's input contained 0-byte empty splits:
   1.1: tasks 0-99 are no-preference tasks (0 bytes); tasks 100-800 are node-local tasks
   1.2: the user runs the tasks for 500 loops
   1.3: there are 60 executors

2. Executor A got only 2 node-local tasks in the first loop, so it finished its node-local tasks first and then ran the no-preference tasks, which in our situation have smaller input splits than the node-local tasks. So executor A finished all the no-preference tasks while the others were still running node-local tasks.

In the second loop, every task is at process-local level and finishes within 3 seconds, so the other executors finish all their process-local tasks while executor A is still running its own. Because every task executor A runs also finishes within 3 seconds, the locality level always stays process-local, and the other executors all end up waiting for executor A. The same happens in the remaining loops.

To resolve this situation we had the user delete the empty input splits, but an implicit imbalance remains: if in some loop one executor gets more process-local tasks than the others, and those tasks all finish in under 3 seconds, then in the remaining loops the other executors will wait for that executor to finish all of its process-local tasks.
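One knob relevant to the behavior described above, sketched on a hedged basis (this is not a fix proposed in the ticket, and the values are illustrative): lowering spark.locality.wait shortens how long the scheduler holds tasks for a better-locality slot before relaxing the level, which can reduce this kind of waiting.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Default is 3000 (ms); tasks that finish faster than this keep the
  // scheduler pinned at process-local, as described above.
  .set("spark.locality.wait", "500")
  .set("spark.locality.wait.process", "500")
{code}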
[jira] [Commented] (SPARK-4166) Display the executor ID in the Web UI when ExecutorLostFailure happens
[ https://issues.apache.org/jira/browse/SPARK-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191519#comment-14191519 ]

Apache Spark commented on SPARK-4166:
--------------------------------------

User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/3033

Display the executor ID in the Web UI when ExecutorLostFailure happens
----------------------------------------------------------------------
Key: SPARK-4166
URL: https://issues.apache.org/jira/browse/SPARK-4166
Project: Spark
Issue Type: Improvement
Components: Spark Core, Web UI
Affects Versions: 1.1.0
Reporter: Shixiong Zhu
Priority: Minor
[jira] [Updated] (SPARK-4167) Schedule task on Executor will be Imbalance while task run less than local-wait time
[ https://issues.apache.org/jira/browse/SPARK-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SuYan updated SPARK-4167:
-------------------------

Schedule task on Executor will be Imbalance while task run less than local-wait time
-------------------------------------------------------------------------------------
Key: SPARK-4167
URL: https://issues.apache.org/jira/browse/SPARK-4167
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: SuYan
[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551 ]

Debasish Das commented on SPARK-2426:
--------------------------------------

[~mengxr] The matlab comparison scripts are open sourced over here:
https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md; please look at the section on Matlab comparisons. In a nutshell: for bounds, MOSEK and ADMM are similar; for elastic net, Proximal is 10X faster compared to MOSEK; for equality, MOSEK is 2-3X faster than Proximal, but both PDCO and ECOS produce much worse results compared to ADMM. Accelerated ADMM also did not work as well as default ADMM. Increasing the over-relaxation parameter helped accelerated ADMM, but I have not explored it yet. ADMM and PDCO are in Matlab, but ECOS and MOSEK both use mex files, so they are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse coding / regularized lsi and robust-plsa on the MovieLens dataset and validate product recommendation using the MAP measure...In terms of RMSE, default positive sparse coding...

What's the largest dataset the LDA PRs are running? I would like to try that on sparse coding as well...From these papers, sparse coding/RLSI should give results at par with LDA:
https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 1.0 0.99

rank=1000, equality=1.0, lambda=1.0, beta=0.99
L1regularization = lambda*beta
L2regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true

Quadratic Minimization for MLlib ALS
------------------------------------
Key: SPARK-2426
URL: https://issues.apache.org/jira/browse/SPARK-2426
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
Original Estimate: 504h
Remaining Estimate: 504h

Current ALS supports least squares and nonnegative least squares. I presented ADMM and IPM based Quadratic Minimization solvers to be used for the following ALS problems:
1. ALS with bounds
2. ALS with L1 regularization
3. ALS with equality constraint and bounds

Initial runtime comparisons are presented at Spark Summit:
http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark

Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will keep updating the runtime comparison results.

For integration the detailed plan is as follows:
1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
2. Integrate QuadraticMinimizer in mllib ALS
[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191551#comment-14191551 ]

Debasish Das edited comment on SPARK-2426 at 10/31/14 8:04 AM:
---------------------------------------------------------------

[~mengxr] The matlab comparison scripts are open sourced over here:
https://github.com/debasish83/ecos/blob/master/matlab/admm/qprandom.m
https://github.com/debasish83/ecos/blob/master/matlab/pdco4/code/pdcotestQP.m

The detailed comparisons are in the README.md; please look at the section on Matlab comparisons. In a nutshell: for bounds, MOSEK and ADMM are similar; for elastic net, Proximal is 10X faster compared to MOSEK; for equality, MOSEK is 2-3X faster than Proximal, but both PDCO and ECOS produce much worse results compared to ADMM. Accelerated ADMM also did not work as well as default ADMM. Increasing the over-relaxation parameter helps ADMM, but I have not explored it yet. ADMM and PDCO are in Matlab, but ECOS and MOSEK both use mex files, so they are expected to be more efficient.

Next I will add the performance results of running positivity, box, sparse coding / regularized lsi and robust-plsa on the MovieLens dataset and validate product recommendation using the MAP measure...In terms of RMSE, default positive sparse coding...

What's the largest dataset the LDA PRs are running? I would like to try that on sparse coding as well...From these papers, sparse coding/RLSI should give results at par with LDA:
https://www.cs.cmu.edu/~xichen/images/SLSA-sdm11-final.pdf
http://web.stanford.edu/group/mmds/slides2012/s-hli.pdf

The same randomized matrices can be generated and run in the PR as follows:

./bin/spark-class org.apache.spark.mllib.optimization.QuadraticMinimizer 1000 1 1.0 0.99

rank=1000, equality=1.0, lambda=1.0, beta=0.99
L1regularization = lambda*beta
L2regularization = lambda*(1-beta)

Generating randomized QPs with rank 1000 equalities 1
sparseQp 88.423 ms iterations 45 converged true
posQp 181.369 ms iterations 121 converged true
boundsQp 175.733 ms iterations 121 converged true
Qp Equality 2805.564 ms iterations 2230 converged true

Quadratic Minimization for MLlib ALS
------------------------------------
Key: SPARK-2426
URL: https://issues.apache.org/jira/browse/SPARK-2426
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
Original Estimate: 504h
Remaining Estimate: 504h
[jira] [Resolved] (SPARK-4162) Make scripts symlinkable
[ https://issues.apache.org/jira/browse/SPARK-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-4162.
------------------------------
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/SPARK-3482 and https://issues.apache.org/jira/browse/SPARK-2960. Have a look at the PR for 3482 and suggest changes. This has come up several times, so it would be good to get it fixed.

Make scripts symlinkable
------------------------
Key: SPARK-4162
URL: https://issues.apache.org/jira/browse/SPARK-4162
Project: Spark
Issue Type: Improvement
Components: Deploy, EC2, Spark Shell
Affects Versions: 1.1.0
Environment: Mac, linux
Reporter: Shay Seng

Scripts are not symlink-able because they all use:

FWDIR=$(cd `dirname $0`/..; pwd)

to detect the parent Spark dir, which doesn't take symlinks into account. Instead, replace the above line with:

SOURCE=$0
SCRIPT=`basename "$SOURCE"`
while [ -h "$SOURCE" ]; do
  SCRIPT=`basename "$SOURCE"`
  LOOKUP=`ls -ld "$SOURCE"`
  TARGET=`expr "$LOOKUP" : '.*-> \(.*\)$'`
  if expr "${TARGET:-.}/" : '/.*/$' > /dev/null; then
    SOURCE=${TARGET:-.}
  else
    SOURCE=`dirname "$SOURCE"`/${TARGET:-.}
  fi
done
FWDIR=$(cd `dirname "$SOURCE"`/..; pwd)
[jira] [Closed] (SPARK-4167) Schedule task on Executor will be Imbalance while task run less than local-wait time
[ https://issues.apache.org/jira/browse/SPARK-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SuYan closed SPARK-4167.
------------------------
Resolution: Not a Problem

Schedule task on Executor will be Imbalance while task run less than local-wait time
-------------------------------------------------------------------------------------
Key: SPARK-4167
URL: https://issues.apache.org/jira/browse/SPARK-4167
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: SuYan
[jira] [Created] (SPARK-4168) Completed Stages Number are misleading webUI when stages are more than 1000
Zhang, Liye created SPARK-4168:
----------------------------------

Summary: Completed Stages Number are misleading webUI when stages are more than 1000
Key: SPARK-4168
URL: https://issues.apache.org/jira/browse/SPARK-4168
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhang, Liye

The number of completed stages and failed stages shown on the webUI will always be less than 1000. This is really misleading when thousands of stages have already completed or failed. The number should be correct even when only part of all stages are listed on the webUI (stage info is removed when the number grows too large).
[jira] [Commented] (SPARK-4168) Completed Stages Number are misleading webUI when stages are more than 1000
[ https://issues.apache.org/jira/browse/SPARK-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191643#comment-14191643 ]

Apache Spark commented on SPARK-4168:
--------------------------------------

User 'liyezhang556520' has created a pull request for this issue: https://github.com/apache/spark/pull/3035

Completed Stages Number are misleading webUI when stages are more than 1000
----------------------------------------------------------------------------
Key: SPARK-4168
URL: https://issues.apache.org/jira/browse/SPARK-4168
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Zhang, Liye
[jira] [Created] (SPARK-4169) [Core] Locale dependent code
Niklas Wilcke created SPARK-4169:
------------------------------------

Summary: [Core] Locale dependent code
Key: SPARK-4169
URL: https://issues.apache.org/jira/browse/SPARK-4169
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Environment: Debian, Locale: de_DE
Reporter: Niklas Wilcke
Fix For: 1.2.0

With a non-English locale, the method isBindCollision in core/src/main/scala/org/apache/spark/util/Utils.scala doesn't work because it checks the exception message, which is locale dependent. The test suite core/src/test/scala/org/apache/spark/util/UtilsSuite.scala also contains a locale-dependent test: the string formatting of time durations uses a decimal separator, which is locale dependent.
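A hedged sketch of a locale-independent variant (illustrative, not the actual patch): test the exception's type along the cause chain rather than its localized message.

{code}
import java.net.BindException

object BindCheck {
  // Walk the cause chain; a BindException indicates a bind collision
  // regardless of the locale the message was rendered in.
  def isBindCollision(exception: Throwable): Boolean = exception match {
    case null => false
    case _: BindException => true
    case e if e.getCause != null => isBindCollision(e.getCause)
    case _ => false
  }
}
{code}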
[jira] [Created] (SPARK-4170) Closure problems when running Scala app that extends App
Sean Owen created SPARK-4170:
--------------------------------

Summary: Closure problems when running Scala app that extends App
Key: SPARK-4170
URL: https://issues.apache.org/jira/browse/SPARK-4170
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Reporter: Sean Owen
Priority: Minor

Michael Albert noted this problem on the mailing list (http://apache-spark-user-list.1001560.n3.nabble.com/BUG-when-running-as-quot-extends-App-quot-closures-don-t-capture-variables-td17675.html):

{code}
object DemoBug extends App {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(List("A", "B", "C", "D"))
    val str1 = "A"

    val rslt1 = rdd.filter(x => { x != "A" }).count
    val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count

    println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
}
{code}

This produces the output:
{code}
DemoBug: rslt1 = 3 rslt2 = 0
{code}

If instead there is a proper main(), it works as expected.

I also noticed this week that in a program which extends App, some values were inexplicably null in a closure. When changing to use main(), it was fine. I assume there is a problem with variables not being added to the closure when main() doesn't appear in the standard way.
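For contrast, a sketch of the main()-based variant that the report says behaves as expected (imports assumed; the expected counts follow from the description above):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object DemoFixed {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(List("A", "B", "C", "D"))
    val str1 = "A"

    // With a proper main(), str1 is captured correctly, so both counts are 3.
    val rslt1 = rdd.filter(x => x != "A").count
    val rslt2 = rdd.filter(x => str1 != null && x != "A").count

    println("DemoFixed: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
  }
}
{code}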
[jira] [Updated] (SPARK-4169) [Core] Locale dependent code
[ https://issues.apache.org/jira/browse/SPARK-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Wilcke updated SPARK-4169:
---------------------------------
Description:
With a non-English locale, the method isBindCollision in core/src/main/scala/org/apache/spark/util/Utils.scala doesn't work because it checks the exception message, which is locale dependent. The test suite core/src/test/scala/org/apache/spark/util/UtilsSuite.scala also contains a locale-dependent test: the string formatting of time durations uses a decimal separator, which is locale dependent.

I created a pull request on github to solve this issue.

was:
With a non-English locale, the method isBindCollision in core/src/main/scala/org/apache/spark/util/Utils.scala doesn't work because it checks the exception message, which is locale dependent. The test suite core/src/test/scala/org/apache/spark/util/UtilsSuite.scala also contains a locale-dependent test: the string formatting of time durations uses a decimal separator, which is locale dependent.

[Core] Locale dependent code
----------------------------
Key: SPARK-4169
URL: https://issues.apache.org/jira/browse/SPARK-4169
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Environment: Debian, Locale: de_DE
Reporter: Niklas Wilcke
Fix For: 1.2.0
Original Estimate: 0.25h
Remaining Estimate: 0.25h
[jira] [Updated] (SPARK-4169) [Core] Locale dependent code
[ https://issues.apache.org/jira/browse/SPARK-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Wilcke updated SPARK-4169:
---------------------------------
Labels: patch test (was: )

[Core] Locale dependent code
----------------------------
Key: SPARK-4169
URL: https://issues.apache.org/jira/browse/SPARK-4169
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Environment: Debian, Locale: de_DE
Reporter: Niklas Wilcke
Labels: patch, test
Fix For: 1.2.0
Original Estimate: 0.25h
Remaining Estimate: 0.25h
[jira] [Commented] (SPARK-4169) [Core] Locale dependent code
[ https://issues.apache.org/jira/browse/SPARK-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191700#comment-14191700 ]

Apache Spark commented on SPARK-4169:
--------------------------------------

User 'numbnut' has created a pull request for this issue: https://github.com/apache/spark/pull/3036

[Core] Locale dependent code
----------------------------
Key: SPARK-4169
URL: https://issues.apache.org/jira/browse/SPARK-4169
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.1.0
Environment: Debian, Locale: de_DE
Reporter: Niklas Wilcke
Labels: patch, test
Fix For: 1.2.0
[jira] [Updated] (SPARK-4165) Actor with Companion throws ambiguous reference error in REPL
[ https://issues.apache.org/jira/browse/SPARK-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiti Saxena updated SPARK-4165:
--------------------------------
Affects Version/s: 1.2.0
                   1.0.1
                   1.1.0

Actor with Companion throws ambiguous reference error in REPL
-------------------------------------------------------------
Key: SPARK-4165
URL: https://issues.apache.org/jira/browse/SPARK-4165
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.1, 1.1.0, 1.2.0
Reporter: Shiti Saxena
[jira] [Created] (SPARK-4171) StreamingContext.actorStream throws serializationError
Shiti Saxena created SPARK-4171:
-----------------------------------

Summary: StreamingContext.actorStream throws serializationError
Key: SPARK-4171
URL: https://issues.apache.org/jira/browse/SPARK-4171
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.2.0
Reporter: Shiti Saxena
[jira] [Updated] (SPARK-4171) StreamingContext.actorStream throws serializationError
[ https://issues.apache.org/jira/browse/SPARK-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiti Saxena updated SPARK-4171:
--------------------------------
Description: I encountered this issue when

StreamingContext.actorStream throws serializationError
------------------------------------------------------
Key: SPARK-4171
URL: https://issues.apache.org/jira/browse/SPARK-4171
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.2.0
Reporter: Shiti Saxena

I encountered this issue when
[jira] [Updated] (SPARK-4171) StreamingContext.actorStream throws serializationError
[ https://issues.apache.org/jira/browse/SPARK-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiti Saxena updated SPARK-4171:
--------------------------------
Description:
I encountered this issue when I was working on https://issues.apache.org/jira/browse/SPARK-3872. Running the following test case on v1.1.0 and the master branch (v1.2.0-SNAPSHOT) throws a serialization error.

{noformat}
test("actor input stream") {
  // Set up the streaming context and input streams
  val ssc = new StreamingContext(conf, batchDuration)
  val networkStream = ssc.actorStream[String](EchoActor.props, "TestActor",
    // Had to pass the local value of port to prevent from closing over entire scope
    StorageLevel.MEMORY_AND_DISK)
  println("created actor")
  networkStream.print()
  ssc.start()
  Thread.sleep(3 * 1000)
  println("started stream")
  Thread.sleep(3 * 1000)
  logInfo("Stopping server")
  logInfo("Stopping context")
  ssc.stop()
}

class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}
{noformat}

was: I encountered this issue when

StreamingContext.actorStream throws serializationError
------------------------------------------------------
Key: SPARK-4171
URL: https://issues.apache.org/jira/browse/SPARK-4171
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.2.0
Reporter: Shiti Saxena
[jira] [Updated] (SPARK-4171) StreamingContext.actorStream throws serializationError
[ https://issues.apache.org/jira/browse/SPARK-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiti Saxena updated SPARK-4171:
--------------------------------
Description:
I encountered this issue when I was working on https://issues.apache.org/jira/browse/SPARK-3872. Running the following test case on v1.1.0 and the master branch (v1.2.0-SNAPSHOT) throws a serialization error.

{noformat}
test("actor input stream") {
  // Set up the streaming context and input streams
  val ssc = new StreamingContext(conf, batchDuration)
  val networkStream = ssc.actorStream[String](EchoActor.props, "TestActor",
    // Had to pass the local value of port to prevent from closing over entire scope
    StorageLevel.MEMORY_AND_DISK)
  println("created actor")
  networkStream.print()
  ssc.start()
  Thread.sleep(3 * 1000)
  println("started stream")
  Thread.sleep(3 * 1000)
  logInfo("Stopping server")
  logInfo("Stopping context")
  ssc.stop()
}
{noformat}

where EchoActor is defined as

{noformat}
class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}
{noformat}

StreamingContext.actorStream throws serializationError
------------------------------------------------------
Key: SPARK-4171
URL: https://issues.apache.org/jira/browse/SPARK-4171
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.2.0
Reporter: Shiti Saxena
[jira] [Commented] (SPARK-4171) StreamingContext.actorStream throws serializationError
[ https://issues.apache.org/jira/browse/SPARK-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191736#comment-14191736 ] Shiti Saxena commented on SPARK-4171: - After applying the patch from https://github.com/apache/spark/pull/2158, I was able to replicate the issue in the REPL as well:
{noformat}
Spark context available as sc.

scala> import org.apache.spark.streaming.receiver.{ActorHelper, Receiver}
import org.apache.spark.streaming.receiver.{ActorHelper, Receiver}

scala> import akka.actor.{Actor,Props}
import akka.actor.{Actor, Props}

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> Seconds(1)
res0: org.apache.spark.streaming.Duration = 1000 ms

scala> val ssc= new StreamingContext(sc,res0)
ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@1b1bca6c

scala> :pas
// Entering paste mode (ctrl-D to finish)

class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}

defined class EchoActor
defined module EchoActor

scala> ssc.actorStream[String](EchoActor.props, "TestActor")
res1: org.apache.spark.streaming.dstream.ReceiverInputDStream[String] = org.apache.spark.streaming.dstream.PluggableInputDStream@56a620b4

scala> res1.print()

scala> ssc.start()
14/10/31 16:52:48 INFO ReceiverTracker: ReceiverTracker started
14/10/31 16:52:48 INFO ForEachDStream: metadataCleanupDelay = -1
14/10/31 16:52:48 INFO PluggableInputDStream: metadataCleanupDelay = -1
14/10/31 16:52:48 INFO PluggableInputDStream: Slide time = 1000 ms
14/10/31 16:52:48 INFO PluggableInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
14/10/31 16:52:48 INFO PluggableInputDStream: Checkpoint interval = null
14/10/31 16:52:48 INFO PluggableInputDStream: Remember duration = 1000 ms
14/10/31 16:52:48 INFO PluggableInputDStream: Initialized and validated org.apache.spark.streaming.dstream.PluggableInputDStream@56a620b4
14/10/31 16:52:48 INFO ForEachDStream: Slide time = 1000 ms
14/10/31 16:52:48 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
14/10/31 16:52:48 INFO ForEachDStream: Checkpoint interval = null
14/10/31 16:52:48 INFO ForEachDStream: Remember duration = 1000 ms
14/10/31 16:52:48 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@4a5a796
14/10/31 16:52:48 INFO ReceiverTracker: Starting 1 receivers
14/10/31 16:52:48 INFO SparkContext: Starting job: runJob at ReceiverTracker.scala:275
14/10/31 16:52:48 INFO DAGScheduler: Got job 0 (runJob at ReceiverTracker.scala:275) with 1 output partitions (allowLocal=false)
14/10/31 16:52:48 INFO DAGScheduler: Final stage: Stage 0(runJob at ReceiverTracker.scala:275)
14/10/31 16:52:48 INFO DAGScheduler: Parents of final stage: List()
14/10/31 16:52:48 INFO DAGScheduler: Missing parents: List()
14/10/31 16:52:48 INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at makeRDD at ReceiverTracker.scala:253), which has no missing parents
14/10/31 16:52:48 INFO RecurringTimer: Started timer for JobGenerator at time 1414754569000
14/10/31 16:52:48 INFO JobGenerator: Started JobGenerator at 1414754569000 ms
14/10/31 16:52:48 INFO JobScheduler: Started JobScheduler

scala> 14/10/31 16:52:48 INFO MemoryStore: ensureFreeSpace(1216) called with curMem=0, maxMem=278302556
14/10/31 16:52:48 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1216.0 B, free 265.4 MB)
14/10/31 16:52:48 INFO TaskSchedulerImpl: Cancelling stage 0
14/10/31 16:52:48 INFO DAGScheduler: Failed to run runJob at ReceiverTracker.scala:275
Exception in thread "Thread-38" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $line19.$read$$iwC$$iwC$EchoActor$
- field (class "$line19.$read$$iwC$$iwC$EchoActor$$anonfun$props$1", name: "$outer", type: "class $line19.$read$$iwC$$iwC$EchoActor$")
- object (class "$line19.$read$$iwC$$iwC$EchoActor$$anonfun$props$1", <function0>)
- element of array (index: 1)
- array (class "[Ljava.lang.Object;", size: 32)
- field (class "scala.collection.immutable.Vector", name: "display0", type: "class [Ljava.lang.Object;")
- object (class "scala.collection.immutable.Vector", Vector(class $line19.$read$$iwC$$iwC$EchoActor, <function0>))
- field (class "akka.actor.Props", name: "args", type: "interface scala.collection.immutable.Seq")
- object (class "akka.actor.Props", Props(Deploy(,Config(SimpleConfigObject({})),NoRouter,NoScopeGiven,,),class akka.actor.TypedCreatorFunctionConsumer,Vector(class $line19.$read$$iwC$$iwC$EchoActor, <function0>)))
- field (class "org.apache.spark.streaming.receiver.ActorReceiver", name:
[jira] [Updated] (SPARK-4171) StreamingContext.actorStream throws serializationError
[ https://issues.apache.org/jira/browse/SPARK-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiti Saxena updated SPARK-4171: Description: I encountered this issue when I was working on https://issues.apache.org/jira/browse/SPARK-3872. Running the following test case on v1.1.0 and the master branch (v1.2.0-SNAPSHOT) throws a serialization error.
{noformat}
test("actor input stream") {
  // Set up the streaming context and input streams
  val ssc = new StreamingContext(conf, batchDuration)
  val networkStream = ssc.actorStream[String](EchoActor.props, "TestActor",
    // Had to pass the local value of port to prevent from closing over entire scope
    StorageLevel.MEMORY_AND_DISK)
  println("created actor")
  networkStream.print()
  ssc.start()
  Thread.sleep(3 * 1000)
  println("started stream")
  Thread.sleep(3 * 1000)
  logInfo("Stopping server")
  logInfo("Stopping context")
  ssc.stop()
}
{noformat}
where EchoActor is defined as
{noformat}
class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}
{noformat}
The same code works with v1.0.1
was: I encountered this issue when I was working on https://issues.apache.org/jira/browse/SPARK-3872. Running the following test case on v1.1.0 and the master branch (v1.2.0-SNAPSHOT) throws a serialization error.
{noformat}
test("actor input stream") {
  // Set up the streaming context and input streams
  val ssc = new StreamingContext(conf, batchDuration)
  val networkStream = ssc.actorStream[String](EchoActor.props, "TestActor",
    // Had to pass the local value of port to prevent from closing over entire scope
    StorageLevel.MEMORY_AND_DISK)
  println("created actor")
  networkStream.print()
  ssc.start()
  Thread.sleep(3 * 1000)
  println("started stream")
  Thread.sleep(3 * 1000)
  logInfo("Stopping server")
  logInfo("Stopping context")
  ssc.stop()
}
{noformat}
where EchoActor is defined as
{noformat}
class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}
{noformat}
StreamingContext.actorStream throws serializationError -- Key: SPARK-4171 URL: https://issues.apache.org/jira/browse/SPARK-4171 Project: Spark Issue Type: Bug Affects Versions: 1.1.0, 1.2.0 Reporter: Shiti Saxena I encountered this issue when I was working on https://issues.apache.org/jira/browse/SPARK-3872. Running the following test case on v1.1.0 and the master branch (v1.2.0-SNAPSHOT) throws a serialization error.
{noformat}
test("actor input stream") {
  // Set up the streaming context and input streams
  val ssc = new StreamingContext(conf, batchDuration)
  val networkStream = ssc.actorStream[String](EchoActor.props, "TestActor",
    // Had to pass the local value of port to prevent from closing over entire scope
    StorageLevel.MEMORY_AND_DISK)
  println("created actor")
  networkStream.print()
  ssc.start()
  Thread.sleep(3 * 1000)
  println("started stream")
  Thread.sleep(3 * 1000)
  logInfo("Stopping server")
  logInfo("Stopping context")
  ssc.stop()
}
{noformat}
where EchoActor is defined as
{noformat}
class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

object EchoActor {
  def props: Props = Props(new EchoActor())
}
{noformat}
The same code works with v1.0.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
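The serialization trace in the earlier comment shows the {{Props(new EchoActor())}} creator closing over its enclosing scope (the {{$outer}} field on the anonymous {{<function0>}}), which is what fails to serialize. A minimal sketch of a workaround under that assumption, using Akka's class-based {{Props}} factory so no creator closure is captured ({{ssc}} is the streaming context from the test above; this is illustrative only, not the fix that eventually shipped):
{code}
import akka.actor.{Actor, Props}
import org.apache.spark.streaming.receiver.ActorHelper

// Top-level class: Props[EchoActor] instantiates it reflectively by class,
// so no enclosing object reference ends up inside the Props.
class EchoActor extends Actor with ActorHelper {
  override def receive = {
    case message => sender ! message
  }
}

// No EchoActor.props factory closing over an outer instance:
val networkStream = ssc.actorStream[String](Props[EchoActor], "TestActor")
{code}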
[jira] [Commented] (SPARK-3183) Add option for requesting full YARN cluster
[ https://issues.apache.org/jira/browse/SPARK-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191781#comment-14191781 ] Gen TANG commented on SPARK-3183: - The same workaround works for --num-executors. For memory, I am thinking of using _yarn.scheduler.maximum-allocation-mb_ as --executor-memory Add option for requesting full YARN cluster --- Key: SPARK-3183 URL: https://issues.apache.org/jira/browse/SPARK-3183 Project: Spark Issue Type: Improvement Components: YARN Reporter: Sandy Ryza This could possibly be in the form of --executor-cores ALL --executor-memory ALL --num-executors ALL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
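For reference, the ceiling mentioned in the comment above is readable from the YARN configuration; a minimal sketch using the standard Hadoop YARN client API (illustrative, not proposed Spark code):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration

val yarnConf = new YarnConfiguration()
// yarn.scheduler.maximum-allocation-mb: the largest container the scheduler
// will grant, i.e. the effective ceiling for --executor-memory.
val maxAllocMb = yarnConf.getInt(
  YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
  YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB)
{code}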
[jira] [Resolved] (SPARK-3780) YarnAllocator should look at the container completed diagnostic message
[ https://issues.apache.org/jira/browse/SPARK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-3780. -- Resolution: Fixed Fix Version/s: 1.2.0 YarnAllocator should look at the container completed diagnostic message --- Key: SPARK-3780 URL: https://issues.apache.org/jira/browse/SPARK-3780 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 1.1.0 Reporter: Thomas Graves Assignee: Sandy Ryza Fix For: 1.2.0 YARN will give us a diagnostic message along with a container-completed notification. We should print that diagnostic message for the Spark user. For instance, I believe that if the container gets shot for being over its memory limit, YARN would give us a useful diagnostic saying that. This would be really useful for the user to see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
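YARN hands that message back via {{ContainerStatus.getDiagnostics}}; a minimal sketch of the idea (a hedged illustration of the YARN API, not the patch that was merged for this ticket):
{code}
import org.apache.hadoop.yarn.api.records.ContainerStatus

def logContainerCompletion(status: ContainerStatus): Unit = {
  // Surface YARN's diagnostic (e.g. "running beyond physical memory limits")
  // to the Spark user instead of dropping it.
  val diag = Option(status.getDiagnostics).getOrElse("")
  println(s"Container ${status.getContainerId} exited with status " +
    s"${status.getExitStatus}: $diag")
}
{code}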
[jira] [Commented] (SPARK-2220) Fix remaining Hive Commands
[ https://issues.apache.org/jira/browse/SPARK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191841#comment-14191841 ] Cheng Lian commented on SPARK-2220: --- It turned out that {{ShellCommand}} and {{SourceCommand}} were previously interpreted incorrectly in Spark SQL. These two classes correspond to the {{\!}} and {{SOURCE}} syntaxes respectively in Spark SQL. However, back in Hive, {{\!}} is interpreted in different ways by Hive CLI and Beeline, and {{SOURCE}} is only supported by Hive CLI. In Hive CLI, {{\!}} starts a shell command (e.g. {{\!ls;}} and {{\!cat foo;}}), while in Beeline {{\!}} starts a Beeline command (e.g. {{\!connect jdbc:hive://localhost:1}} and {{\!run script.sql}}). The {{SOURCE file}} command in Hive CLI is equivalent to the {{\!run file}} command in Beeline. In short, the functionality of these two commands should not be implemented in {{sql/core}} and/or {{sql/hive}}; it is already implemented as part of the Spark SQL CLI and Hive Beeline. Fix remaining Hive Commands --- Key: SPARK-2220 URL: https://issues.apache.org/jira/browse/SPARK-2220 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Cheng Lian None of the following have an execution plan:
{code}
private[hive] case class ShellCommand(cmd: String) extends Command
private[hive] case class SourceCommand(filePath: String) extends Command
private[hive] case class AddFile(filePath: String) extends Command
{code}
dfs is being fixed in a related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2220) Fix remaining Hive Commands
[ https://issues.apache.org/jira/browse/SPARK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191873#comment-14191873 ] Apache Spark commented on SPARK-2220: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/3038 Fix remaining Hive Commands --- Key: SPARK-2220 URL: https://issues.apache.org/jira/browse/SPARK-2220 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Cheng Lian None of the following have an execution plan:
{code}
private[hive] case class ShellCommand(cmd: String) extends Command
private[hive] case class SourceCommand(filePath: String) extends Command
private[hive] case class AddFile(filePath: String) extends Command
{code}
dfs is being fixed in a related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232 ] Debasish Das edited comment on SPARK-2426 at 10/31/14 4:20 PM: --- Hi Xiangrui, The branch is ready for an initial review. I will do a lot of clean-up this week. I need some advice on whether we should bring in the additional ALS features first or integrate NNLS with QuadraticMinimizer so that we can handle large ranks as well. https://github.com/debasish83/spark/commits/qp-als
optimization/QuadraticMinimizer.scala is the placeholder for all quadratic minimization. Right now we support 5 features:
1. Least square
2. Quadratic minimization with positivity
3. Quadratic minimization with box: a generalization of positivity
4. Quadratic minimization with elastic net: L1 is at 0.99, elastic net control is not given to users
5. Quadratic minimization with affine constraints and bounds
There are many regularizers in Proximal.scala that can be re-used in the mllib updater... L1Updater in mllib is an example of a proximal algorithm... QuadraticMinimizer is optimized for direct solves right now (Cholesky / LU based on the problem we are solving). The CG core from Breeze will be used for iterative solves when ranks are high... I need a different variant of CG for formulation 5, so Breeze CG is not sufficient for all the formulations this branch supports and needs to be extended.
Right now I am experimenting with ADMM rho and lambda values so that the NNLS iterations are on par with least square with positivity. The ideas for rho and lambda tuning are the following:
1. Derive an optimal value of lambda for quadratic problems, similar to the idea of Nesterov's acceleration being used in algorithms like FISTA and accelerated ADMM from UCLA
2. Derive rho from approximate min and max eigenvalues of the gram matrix
For Matlab-based experiments within PDCO, ECOS (IPM), MOSEK and ADMM variants, ADMM is faster, producing result quality within 1e-4 of MOSEK. I will publish the numbers and the Matlab script through the ECOS jnilib open source (GPL licensed). I did not add any of the ECOS code here so that everything stays Apache.
For the topic modeling use-case, I expect to produce sparse coding results (L1 on product factors, L2 on user factors). Example runs:
NMF:
./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --class org.apache.spark.examples.mllib.MovieLensALS ./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --rank 20 --numIterations 10 --userConstraint POSITIVE --lambdaUser 0.065 --productConstraint POSITIVE --lambdaProduct 0.065 --kryo hdfs://localhost:8020/sandbox/movielens/
Sparse coding:
./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --class org.apache.spark.examples.mllib.MovieLensALS ./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --delimiter --rank 20 --numIterations 10 --userConstraint SMOOTH --lambdaUser 0.065 --productConstraint SPARSE --lambdaProduct 0.065 --kryo hdfs://localhost:8020/sandbox/movielens
Robust PLSA with least square loss:
./bin/spark-submit --total-executor-cores 4 --master spark://localhost:7077 --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --class org.apache.spark.examples.mllib.MovieLensALS ./examples/target/spark-examples_2.10-1.1.0-SNAPSHOT.jar --delimiter --rank 20 --numIterations 10 --userConstraint EQUALITY --lambdaUser 0.065 --productConstraint EQUALITY --lambdaProduct 0.065 --kryo hdfs://localhost:8020/sandbox/movielens
With this change, users can choose to apply user- and product-specific constraints: basically, positive factors for products (interpretability) and smooth factors for users to get more RMSE improvement. Thanks. Deb
was (Author: debasish83): Hi Xiangrui, The branch is ready for an initial review. I will do a lot of clean-up this week. I need some advice on whether we should bring in the additional ALS features first or integrate NNLS with QuadraticMinimizer so that we can handle large ranks as well. https://github.com/debasish83/spark/commits/qp-als optimization/QuadraticMinimizer.scala is the placeholder for all quadratic minimization. Right now we support 5 features: 1. Least square 2. Least square with positivity 3. Least square with bounds: a generalization of positivity 4. Least square with equality and positivity/bounds for LDA/PLSA 5. Least square + L1 constraint for sparse NMF There are many regularizers in Proximal.scala that can be re-used in the mllib updater... L1Updater in mllib is an example of a proximal algorithm... QuadraticMinimizer is optimized for direct solves right now (Cholesky / LU based on the problem we are solving)
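To make the proximal-operator vocabulary above concrete: the proximal operator of the L1 penalty is coordinate-wise soft-thresholding, which is what makes an L1 update a proximal step. A minimal self-contained sketch (generic math, not code from the qp-als branch):
{code}
// prox_{lambda*||.||_1}(x): shrink each coordinate toward zero by lambda.
def proxL1(x: Array[Double], lambda: Double): Array[Double] =
  x.map(v => math.signum(v) * math.max(math.abs(v) - lambda, 0.0))

proxL1(Array(1.5, -0.3, 0.7), 0.5)  // ~ Array(1.0, 0.0, 0.2)
{code}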
[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191997#comment-14191997 ] Debasish Das commented on SPARK-2426: - Matlab comparisons of MOSEK, ECOS, PDCO and ADMM are over here: https://github.com/debasish83/ecos/blob/master/README.md MOSEK is available for research purposes. Let me know if there are issues in running the matlab scripts. Quadratic Minimization for MLlib ALS Key: SPARK-2426 URL: https://issues.apache.org/jira/browse/SPARK-2426 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.0 Reporter: Debasish Das Assignee: Debasish Das Original Estimate: 504h Remaining Estimate: 504h Current ALS supports least squares and nonnegative least squares. I presented ADMM and IPM based Quadratic Minimization solvers to be used for the following ALS problems: 1. ALS with bounds 2. ALS with L1 regularization 3. ALS with Equality constraint and bounds Initial runtime comparisons are presented at Spark Summit. http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will keep updating the runtime comparison results. For integration the detailed plan is as follows: 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization 2. Integrate QuadraticMinimizer in mllib ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4080) IOException: unexpected exception type while deserializing tasks
[ https://issues.apache.org/jira/browse/SPARK-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192005#comment-14192005 ] Josh Rosen commented on SPARK-4080: --- Hi [~kul], Thanks for trying this out! I'm glad to see that my patch improved the error reporting here. What do you mean by creating more than one SparkContext? Are you creating multiple concurrently-running SparkContexts in the same driver JVM? IOException: unexpected exception type while deserializing tasks -- Key: SPARK-4080 URL: https://issues.apache.org/jira/browse/SPARK-4080 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Critical Fix For: 1.1.1, 1.2.0 When deserializing tasks on executors, we sometimes see {{IOException: unexpected exception type}}: {code} java.io.IOException: unexpected exception type java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} Here are some occurrences of this bug reported on the mailing list and GitHub: - https://www.mail-archive.com/user@spark.apache.org/msg12129.html - http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201409.mbox/%3ccaeawm8uop9tgarm5sceppzey5qxo+h8hu8ujzah5s-ajyzz...@mail.gmail.com%3E - https://github.com/yieldbot/flambo/issues/13 - https://www.mail-archive.com/user@spark.apache.org/msg13283.html This is probably caused by throwing exceptions other than IOException from our custom {{readExternal}} methods (see http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/ObjectStreamClass.java#1022). [~davies] spotted an instance of this in TorrentBroadcast, where a failed {{require}} throws a different exception, but this issue has been reported in Spark 1.1.0 as well. To fix this, I'm going to add try-catch blocks around all of our {{readExternal}} and {{writeExternal}} methods to re-throw caught exceptions as IOException. This fix should allow us to determine the actual exceptions that are causing deserialization failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
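The fix sketched in the last paragraph of the description amounts to a small wrapper around each {{readExternal}}/{{writeExternal}} body; a hedged sketch of that idea (the helper name here is illustrative):
{code}
import java.io.IOException

// Run a (de)serialization block, re-throwing anything that is not already an
// IOException as an IOException, so ObjectInputStream surfaces the real cause
// instead of "unexpected exception type".
def tryOrIOException[T](block: => T): T = {
  try {
    block
  } catch {
    case e: IOException => throw e
    case e: Throwable => throw new IOException(e)
  }
}

// Usage inside a custom Externalizable:
//   override def readExternal(in: ObjectInput): Unit = tryOrIOException { ... }
{code}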
[jira] [Commented] (SPARK-2189) Method for removing temp tables created by registerAsTable
[ https://issues.apache.org/jira/browse/SPARK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192084#comment-14192084 ] Apache Spark commented on SPARK-2189: - User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/3039 Method for removing temp tables created by registerAsTable -- Key: SPARK-2189 URL: https://issues.apache.org/jira/browse/SPARK-2189 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust Assignee: Cheng Lian Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4172) Progress API in Python
Davies Liu created SPARK-4172: - Summary: Progress API in Python Key: SPARK-4172 URL: https://issues.apache.org/jira/browse/SPARK-4172 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Davies Liu A poll-based progress API for Python -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4172) Progress API in Python
[ https://issues.apache.org/jira/browse/SPARK-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192106#comment-14192106 ] Apache Spark commented on SPARK-4172: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/3027 Progress API in Python -- Key: SPARK-4172 URL: https://issues.apache.org/jira/browse/SPARK-4172 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Davies Liu Assignee: Davies Liu A poll-based progress API for Python -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
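For context, the Scala side of this shape already exists as {{SparkContext.statusTracker}} (new for 1.2), which the proposed Python API mirrors; a hedged polling sketch against that Scala API (assuming a running {{sc}}):
{code}
// Poll-based progress: ask the status tracker what is active right now.
val tracker = sc.statusTracker
while (tracker.getActiveJobIds.nonEmpty) {
  println(s"active jobs: ${tracker.getActiveJobIds.mkString(", ")}; " +
    s"active stages: ${tracker.getActiveStageIds.mkString(", ")}")
  Thread.sleep(1000)
}
{code}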
[jira] [Resolved] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4016. --- Resolution: Fixed Issue resolved by pull request 2867 [https://github.com/apache/spark/pull/2867] Allow user to optionally show additional, advanced metrics in the UI Key: SPARK-4016 URL: https://issues.apache.org/jira/browse/SPARK-4016 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor Fix For: 1.2.0 Allowing the user to show/hide additional metrics will allow us to both (1) add more advanced metrics without cluttering the UI for the average user and (2) hide, by default, some of the metrics currently shown that are not widely used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4141) Hide Accumulators column on stage page when no accumulators exist
[ https://issues.apache.org/jira/browse/SPARK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4141: -- Assignee: (was: Josh Rosen) Hide Accumulators column on stage page when no accumulators exist - Key: SPARK-4141 URL: https://issues.apache.org/jira/browse/SPARK-4141 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Priority: Minor Labels: starter The task table on the details page for each stage has a column for accumulators. We should only show this column if the stage has accumulators, otherwise it clutters the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-4141) Hide Accumulators column on stage page when no accumulators exist
[ https://issues.apache.org/jira/browse/SPARK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-4141: - Assignee: Josh Rosen Hide Accumulators column on stage page when no accumulators exist - Key: SPARK-4141 URL: https://issues.apache.org/jira/browse/SPARK-4141 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Josh Rosen Priority: Minor Labels: starter The task table on the details page for each stage has a column for accumulators. We should only show this column if the stage has accumulators, otherwise it clutters the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3987) NNLS generates incorrect result
[ https://issues.apache.org/jira/browse/SPARK-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192165#comment-14192165 ] Shuo Xiang commented on SPARK-3987: --- [~debasish83][~mengxr] The condition number for the latest test case is 74.5 and the test case I put in my PR was 2. NNLS generates incorrect result --- Key: SPARK-3987 URL: https://issues.apache.org/jira/browse/SPARK-3987 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.1.0 Reporter: Debasish Das Assignee: Shuo Xiang Fix For: 1.1.1, 1.2.0 Hi, Please see the example gram matrix and linear term: val P2 = new DoubleMatrix(20, 20, 333907.312770, -60814.043975, 207935.829941, -162881.367739, -43730.396770, 17511.428983, -243340.496449, -225245.957922, 104700.445881, 32430.845099, 336378.693135, -373497.970207, -41147.159621, 53928.060360, -293517.883778, 53105.278068, 0.00, -85257.781696, 84913.970469, -10584.080103, -60814.043975, 13826.806664, -38032.612640, 33475.833875, 10791.916809, -1040.950810, 48106.552472, 45390.073380, -16310.282190, -2861.455903, -60790.833191, 73109.516544, 9826.614644, -8283.992464, 56991.742991, -6171.366034, 0.00, 19152.382499, -13218.721710, 2793.734234, 207935.829941, -38032.612640, 129661.677608, -101682.098412, -27401.299347, 10787.713362, -151803.006149, -140563.601672, 65067.935324, 20031.263383, 209521.268600, -232958.054688, -25764.179034, 33507.951918, -183046.845592, 32884.782835, 0.00, -53315.811196, 52770.762546, -6642.187643, -162881.367739, 33475.833875, -101682.098412, 85094.407608, 25422.850782, -5437.646141, 124197.166330, 116206.265909, -47093.484134, -11420.168521, -163429.436848, 189574.783900, 23447.172314, -24087.375367, 148311.355507, -20848.385466, 0.00, 46835.814559, -38180.352878, 6415.873901, -43730.396770, 10791.916809, -27401.299347, 25422.850782, 8882.869799, 15.638084, 35933.473986, 34186.371325, -10745.330690, -974.314375, -43537.709621, 54371.010558, 7894.453004, -5408.929644, 42231.381747, -3192.010574, 0.00, 15058.753110, -8704.757256, 2316.581535, 17511.428983, -1040.950810, 10787.713362, -5437.646141, 15.638084, 2794.949847, -9681.950987, -8258.171646, 7754.358930, 4193.359412, 18052.143842, -15456.096769, -253.356253, 4089.672804, -12524.380088, 5651.579348, 0.00, -1513.302547, 6296.461898, 152.427321, -243340.496449, 48106.552472, -151803.006149, 124197.166330, 35933.473986, -9681.950987, 182931.600236, 170454.352953, -72361.174145, -19270.461728, -244518.179729, 279551.060579, 33340.452802, -37103.267653, 219025.288975, -33687.141423, 0.00, 67347.950443, -58673.009647, 8957.800259, -225245.957922, 45390.073380, -140563.601672, 116206.265909, 34186.371325, -8258.171646, 170454.352953, 159322.942894, -66074.960534, -16839.743193, -226173.967766, 260421.044094, 31624.194003, -33839.612565, 203889.695169, -30034.828909, 0.00, 63525.040745, -53572.741748, 8575.071847, 104700.445881, -16310.282190, 65067.935324, -47093.484134, -10745.330690, 7754.358930, -72361.174145, -66074.960534, 35869.598076, 13378.653317, 106033.647837, -111831.682883, -10455.465743, 18537.392481, -88370.612394, 20344.288488, 0.00, -22935.482766, 29004.543704, -2409.461759, 32430.845099, -2861.455903, 20031.263383, -11420.168521, -974.314375, 4193.359412, -19270.461728, -16839.743193, 13378.653317, 6802.081898, 33256.395091, -30421.985199, -1296.785870, 7026.518692, -24443.378205, 9221.982599, 0.00, -4088.076871, 10861.014242, -25.092938, 336378.693135, -60790.833191, 209521.268600, -163429.436848, -43537.709621, 18052.143842, 
-244518.179729, -226173.967766, 106033.647837, 33256.395091, 339200.268106, -375442.716811, -41027.594509, 54636.778527, -295133.248586, 54177.278365, 0.00, -85237.666701, 85996.957056, -10503.209968, -373497.970207, 73109.516544, -232958.054688, 189574.783900, 54371.010558, -15456.096769, 279551.060579, 260421.044094, -111831.682883, -30421.985199, -375442.716811, 427793.208465, 50528.074431, -57375.986301, 335203.382015, -52676.385869, 0.00, 102368.307670, -90679.792485, 13509.390393, -41147.159621, 9826.614644, -25764.179034, 23447.172314, 7894.453004, -253.356253, 33340.452802, 31624.194003, -10455.465743, -1296.785870, -41027.594509, 50528.074431, 7255.977434, -5281.636812, 39298.355527, -3440.450858, 0.00, 13717.870243, -8471.405582, 2071.812204, 53928.060360, -8283.992464, 33507.951918, -24087.375367, -5408.929644, 4089.672804, -37103.267653, -33839.612565, 18537.392481, 7026.518692, 54636.778527, -57375.986301, -5281.636812, 9735.061160, -45360.674033, 10634.633559, 0.00, -11652.364691, 15039.566630, -1202.539106, -293517.883778, 56991.742991, -183046.845592,
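For anyone reproducing the conditioning numbers quoted in the comment above (74.5 vs. 2), the condition number in question is the ratio of the extreme singular values of the gram matrix; a minimal Breeze sketch (illustrative, not part of the test suite):
{code}
import breeze.linalg.{DenseMatrix, svd}

// cond(P) = sigma_max / sigma_min for a symmetric positive-definite gram matrix.
def cond(p: DenseMatrix[Double]): Double = {
  val s = svd(p).singularValues // sorted in decreasing order
  s(0) / s(s.length - 1)
}
{code}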
[jira] [Commented] (SPARK-4079) Snappy bundled with Spark does not work on older Linux distributions
[ https://issues.apache.org/jira/browse/SPARK-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192188#comment-14192188 ] Kostas Sakellis commented on SPARK-4079: yes, I'm taking this over from Marcelo. Snappy bundled with Spark does not work on older Linux distributions Key: SPARK-4079 URL: https://issues.apache.org/jira/browse/SPARK-4079 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Marcelo Vanzin This issue has existed at least since 1.0, but has been made worse by 1.1 since snappy is now the default compression algorithm. When trying to use it on a CentOS 5 machine, for example, you'll get something like this: {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:319) at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:226) at org.xerial.snappy.Snappy.clinit(Snappy.java:48) at org.xerial.snappy.SnappyOutputStream.init(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) ... Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5.3-af72bf3c-9dab-43af-a662-f9af657f06b1-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /tmp/snappy-1.0.5.3-af72bf3c-9dab-43af-a662-f9af657f06b1-libsnappyjava.so) at java.lang.ClassLoader$NativeLibrary.load(Native Method) at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1957) at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1882) at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1843) at java.lang.Runtime.load0(Runtime.java:795) at java.lang.System.load(System.java:1061) at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39) ... 29 more {noformat} There are two approaches I can see here (well, 3): * Declare CentOS 5 (and similar OSes) not supported, although that would suck for the people who are still on it and already use Spark * Fallback to another compression codec if Snappy cannot be loaded * Ask the Snappy guys to compile the library on an older OS... I think the second would be the best compromise. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3826) Support JDBC/ODBC server with Hive 0.13.1
[ https://issues.apache.org/jira/browse/SPARK-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3826. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2685 [https://github.com/apache/spark/pull/2685] Support JDBC/ODBC server with Hive 0.13.1 - Key: SPARK-3826 URL: https://issues.apache.org/jira/browse/SPARK-3826 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Assignee: wangfei Priority: Blocker Fix For: 1.2.0 Currently hive-thriftserver does not support Hive 0.13; make it support both 0.12 and 0.13. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4077) A broken string timestamp value can cause Spark SQL to return wrong values for valid string timestamp values
[ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-4077. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3019 [https://github.com/apache/spark/pull/3019] A broken string timestamp value can cause Spark SQL to return wrong values for valid string timestamp values --- Key: SPARK-4077 URL: https://issues.apache.org/jira/browse/SPARK-4077 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Yin Huai Assignee: Venkata Ramana G Fix For: 1.2.0 The following case returns wrong results. The text file is
{code}
2014-12-11 00:00:00,1
2014-12-11astring00:00:00,2
{code}
The DDL statement and the query are shown below...
{code}
sql("""
  create external table date_test(my_date timestamp, id int)
  row format delimited fields terminated by ','
  lines terminated by '\n'
  LOCATION 'dateTest'
""")

sql("select * from date_test").collect.foreach(println)
{code}
The result is
{code}
[1969-12-31 19:00:00.0,1]
[null,2]
{code}
If I change the data to
{code}
2014-12-11 00:00:00,1
2014-12-11 00:00:00,2
{code}
The result is fine. For the data with the broken string timestamp value, I tried runSqlHive. The result is fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
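The {{[1969-12-31 19:00:00.0,1]}} row above is epoch zero rendered in a UTC-5 timezone, which suggests the string-to-timestamp cast misbehaves for valid rows once any row is malformed. For reference, the underlying JDK parse is strict about the format (a hedged illustration, not the Spark code path):
{code}
// Strict JDK parsing of the two rows in the data file above:
java.sql.Timestamp.valueOf("2014-12-11 00:00:00")        // parses fine
java.sql.Timestamp.valueOf("2014-12-11astring00:00:00")  // throws IllegalArgumentException
{code}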
[jira] [Resolved] (SPARK-4154) Query does not work if it has not between in Spark SQL and HQL
[ https://issues.apache.org/jira/browse/SPARK-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-4154. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3017 [https://github.com/apache/spark/pull/3017] Query does not work if it has not between in Spark SQL and HQL - Key: SPARK-4154 URL: https://issues.apache.org/jira/browse/SPARK-4154 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.1.0 Reporter: Ravindra Pesala Assignee: Ravindra Pesala Fix For: 1.2.0 If the query contains {{not between}}, it does not work.
{code}
SELECT * FROM src where key not between 10 and 20
{code}
It gives the following error
{code}
Exception in thread "main" java.lang.RuntimeException: Unsupported language features in query: SELECT * FROM src where key not between 10 and 20
TOK_QUERY
  TOK_FROM
    TOK_TABREF
      TOK_TABNAME
        src
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        TOK_TMP_FILE
    TOK_SELECT
      TOK_SELEXPR
        TOK_ALLCOLREF
    TOK_WHERE
      TOK_FUNCTION
        between
        KW_TRUE
        TOK_TABLE_OR_COL
          key
        10
        20

scala.NotImplementedError: No parse rules for ASTNode type: 256, text: KW_TRUE : KW_TRUE
+ org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1088)
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:251)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
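Until the fix, the same predicate can be written without {{not between}}; a hedged workaround sketch (equivalent because {{between}} is inclusive on both bounds):
{code}
// Same rows as: SELECT * FROM src WHERE key NOT BETWEEN 10 AND 20
sql("SELECT * FROM src WHERE key < 10 OR key > 20")
{code}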
[jira] [Resolved] (SPARK-2220) Fix remaining Hive Commands
[ https://issues.apache.org/jira/browse/SPARK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2220. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3038 [https://github.com/apache/spark/pull/3038] Fix remaining Hive Commands --- Key: SPARK-2220 URL: https://issues.apache.org/jira/browse/SPARK-2220 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Cheng Lian Fix For: 1.2.0 None of the following have an execution plan:
{code}
private[hive] case class ShellCommand(cmd: String) extends Command
private[hive] case class SourceCommand(filePath: String) extends Command
private[hive] case class AddFile(filePath: String) extends Command
{code}
dfs is being fixed in a related PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4173) EdgePartitionBuilder uses wrong value for first clustered index
Ankur Dave created SPARK-4173: - Summary: EdgePartitionBuilder uses wrong value for first clustered index Key: SPARK-4173 URL: https://issues.apache.org/jira/browse/SPARK-4173 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.1.0, 1.0.2, 1.2.0 Reporter: Ankur Dave Assignee: Ankur Dave Lines 48 and 49 in EdgePartitionBuilder reference {{srcIds}} before it has been initialized, causing an incorrect value to be stored for the first cluster. https://github.com/apache/spark/blob/23468e7e96bf047ba53806352558b9d661567b23/graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartitionBuilder.scala#L48-49 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4141) Hide Accumulators column on stage page when no accumulators exist
[ https://issues.apache.org/jira/browse/SPARK-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4141. --- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3031 [https://github.com/apache/spark/pull/3031] Hide Accumulators column on stage page when no accumulators exist - Key: SPARK-4141 URL: https://issues.apache.org/jira/browse/SPARK-4141 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Priority: Minor Labels: starter Fix For: 1.2.0 The task table on the details page for each stage has a column for accumulators. We should only show this column if the stage has accumulators, otherwise it clutters the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4174) Optionally provide notifications to Receivers when DStream has been generated
Hari Shreedharan created SPARK-4174: --- Summary: Optionally provide notifications to Receivers when DStream has been generated Key: SPARK-4174 URL: https://issues.apache.org/jira/browse/SPARK-4174 Project: Spark Issue Type: Bug Reporter: Hari Shreedharan Receivers receiving data from message queues, like ActiveMQ, Kafka, etc., can replay messages if required. Using the HDFS WAL mechanism for such systems affects efficiency, as we are incurring an unnecessary HDFS write when we can recover the data from the queue anyway. We can fix this by providing a notification to the receiver when the RDD is generated from the blocks. We need to consider the case where a receiver might fail before the RDD is generated and come back on a different executor when the RDD is generated. Either way, this is likely to cause duplicates and not data loss -- so we may be ok. I am thinking about something of the order of accepting a callback function which gets called when the RDD is generated. We can keep the function local in a map of batch id -> function, which gets called when the RDD gets generated (we can inform the ReceiverSupervisorImpl via Akka when the driver generates the RDD). Of course, just an early thought - I will work on a design doc for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4174) Streaming: Optionally provide notifications to Receivers when DStream has been generated
[ https://issues.apache.org/jira/browse/SPARK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Shreedharan updated SPARK-4174: Summary: Streaming: Optionally provide notifications to Receivers when DStream has been generated (was: Optionally provide notifications to Receivers when DStream has been generated) Streaming: Optionally provide notifications to Receivers when DStream has been generated Key: SPARK-4174 URL: https://issues.apache.org/jira/browse/SPARK-4174 Project: Spark Issue Type: Bug Reporter: Hari Shreedharan Assignee: Hari Shreedharan Receivers receiving data from message queues, like ActiveMQ, Kafka, etc., can replay messages if required. Using the HDFS WAL mechanism for such systems affects efficiency, as we are incurring an unnecessary HDFS write when we can recover the data from the queue anyway. We can fix this by providing a notification to the receiver when the RDD is generated from the blocks. We need to consider the case where a receiver might fail before the RDD is generated and come back on a different executor when the RDD is generated. Either way, this is likely to cause duplicates and not data loss -- so we may be ok. I am thinking about something of the order of accepting a callback function which gets called when the RDD is generated. We can keep the function local in a map of batch id -> function, which gets called when the RDD gets generated (we can inform the ReceiverSupervisorImpl via Akka when the driver generates the RDD). Of course, just an early thought - I will work on a design doc for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4174) Streaming: Optionally provide notifications to Receivers when DStream has been generated
[ https://issues.apache.org/jira/browse/SPARK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Shreedharan updated SPARK-4174: Issue Type: Improvement (was: Bug) Streaming: Optionally provide notifications to Receivers when DStream has been generated Key: SPARK-4174 URL: https://issues.apache.org/jira/browse/SPARK-4174 Project: Spark Issue Type: Improvement Reporter: Hari Shreedharan Assignee: Hari Shreedharan Receivers receiving data from message queues, like ActiveMQ, Kafka, etc., can replay messages if required. Using the HDFS WAL mechanism for such systems affects efficiency, as we are incurring an unnecessary HDFS write when we can recover the data from the queue anyway. We can fix this by providing a notification to the receiver when the RDD is generated from the blocks. We need to consider the case where a receiver might fail before the RDD is generated and come back on a different executor when the RDD is generated. Either way, this is likely to cause duplicates and not data loss -- so we may be ok. I am thinking about something of the order of accepting a callback function which gets called when the RDD is generated. We can keep the function local in a map of batch id -> function, which gets called when the RDD gets generated (we can inform the ReceiverSupervisorImpl via Akka when the driver generates the RDD). Of course, just an early thought - I will work on a design doc for this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
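A minimal sketch of the callback registry described in the ticket (all names here are hypothetical; this is the shape of the idea, not a design):
{code}
import scala.collection.mutable

// batch time -> callback, fired once when the driver reports that the RDD
// for that batch has been generated.
class RDDGeneratedNotifier {
  private val callbacks = mutable.Map.empty[Long, () => Unit]

  def register(batchTimeMs: Long)(callback: () => Unit): Unit =
    callbacks.synchronized { callbacks(batchTimeMs) = callback }

  def onRDDGenerated(batchTimeMs: Long): Unit =
    callbacks.synchronized { callbacks.remove(batchTimeMs) }.foreach(f => f())
}
{code}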
[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192273#comment-14192273 ] Nicholas Chammas commented on SPARK-3821: - Hey folks, I was hoping to post a design doc here this week and get feedback but I will have to push that back to next week. Been very busy this week and will be away from a computer all weekend. Apologies. Develop an automated way of creating Spark images (AMI, Docker, and others) --- Key: SPARK-3821 URL: https://issues.apache.org/jira/browse/SPARK-3821 Project: Spark Issue Type: Improvement Components: Build, EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Right now the creation of Spark AMIs or Docker containers is done manually. With tools like [Packer|http://www.packer.io/], we should be able to automate this work, and do so in such a way that multiple types of machine images can be created from a single template. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4150) rdd.setName returns None in PySpark
[ https://issues.apache.org/jira/browse/SPARK-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4150. --- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3011 [https://github.com/apache/spark/pull/3011] rdd.setName returns None in PySpark --- Key: SPARK-4150 URL: https://issues.apache.org/jira/browse/SPARK-4150 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Trivial Fix For: 1.2.0 We should return self so we can do {code} rdd.setName('abc').cache().count() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1267) Add a pip installer for PySpark
[ https://issues.apache.org/jira/browse/SPARK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192279#comment-14192279 ] Davies Liu commented on SPARK-1267: --- Because PySpark depends on the Spark packages, a Python user cannot use it after 'pip install pyspark', so there is not much benefit from this. Once we release PySpark separately from Spark, we would have to keep compatibility across versions of PySpark and Spark, which would be a nightmare for us (we could not move fast to improve the implementation of PySpark). So, I think we cannot do this in the near future. [~prabinb], do you mind closing the PR? Add a pip installer for PySpark --- Key: SPARK-1267 URL: https://issues.apache.org/jira/browse/SPARK-1267 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Prabin Banka Priority: Minor Labels: pyspark Please refer to this mail archive, http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3CCAOEPXP7jKiw-3M8eh2giBcs8gEkZ1upHpGb=fqoucvscywj...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3870) EOL character enforcement
[ https://issues.apache.org/jira/browse/SPARK-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3870. --- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2726 [https://github.com/apache/spark/pull/2726] EOL character enforcement - Key: SPARK-3870 URL: https://issues.apache.org/jira/browse/SPARK-3870 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 1.2.0 Reporter: Kousuke Saruta Priority: Minor Fix For: 1.2.0 We have shell scripts and Windows batch files, so we should enforce proper EOL character. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3640) KinesisUtils should accept a credentials object instead of forcing DefaultCredentialsProvider
[ https://issues.apache.org/jira/browse/SPARK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192411#comment-14192411 ] Chris Fregly commented on SPARK-3640: - Agreed that this was not ideal when I first chose this implementation. And as you mentioned, the NotSerializableException is exactly why I went with the DefaultCredentialsProvider. So I spent some time trying to solve this using AWS IAM roles on separate users under your root AWS account. This appears to work well with the existing DefaultCredentialsProvider. Is this a viable option for you? Basically, every user would get their own ACCESS_KEY_ID and SECRET_KEY. This would be used in place of the root credentials. For thoroughness, I've included links to the instructions as well as an example IAM policy JSON (I'll also add this to the Spark Kinesis Developer Guide: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html):
Creating IAM users
http://docs.aws.amazon.com/IAM/latest/UserGuide/Using_SettingUpUser.html
https://console.aws.amazon.com/iam/home?#security_credential
Setting up Kinesis, DynamoDB, and CloudWatch IAM Policy for the new users
http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-using-iam.html
IAM Policy Generator
http://awspolicygen.s3.amazonaws.com/policygen.html
Attaching the Custom Policy
https://console.aws.amazon.com/iam/home?#users
Select the user
Select Attach Policy
Select Custom Policy
IAM Policy JSON
This is already generated using the Policy Generator above... just fill in the missing pieces specific to your environment.
{code}
{
  "Statement": [
    {
      "Sid": "Stmt1414784467497",
      "Action": "kinesis:*",
      "Effect": "Allow",
      "Resource": "arn:aws:kinesis:region-of-stream:aws-account-id:stream/stream-name"
    },
    {
      "Sid": "Stmt1414784693732",
      "Action": "dynamodb:*",
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:us-east-1:aws-account-id:table/dynamodb-tablename"
    },
    {
      "Sid": "Stmt1414785131046",
      "Action": "cloudwatch:*",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
{code}
Notes:
* The region of the DynamoDB table is intentionally hard-coded to us-east-1 as this is how Kinesis currently works
* The DynamoDB table is the same as the application name of the Kinesis Streaming Application. The sample included with the Spark distribution uses KinesisWordCount for the application/table name.
Is this a sufficient workaround? Using IAM policies is an AWS best practice, but I'm not sure if this aligns with your existing environment. If not, I can continue to investigate exposing that CredentialsProvider. Lemme know, Aniket! KinesisUtils should accept a credentials object instead of forcing DefaultCredentialsProvider - Key: SPARK-3640 URL: https://issues.apache.org/jira/browse/SPARK-3640 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.1.0 Reporter: Aniket Bhatnagar Labels: kinesis KinesisUtils should accept AWS Credentials as a parameter and should default to DefaultCredentialsProvider if no credentials are provided. Currently, the implementation forces usage of DefaultCredentialsProvider which can be a pain especially when jobs are run by multiple unix users. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4158) Spark throws exception when Mesos resources are missing
[ https://issues.apache.org/jira/browse/SPARK-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192448#comment-14192448 ] RJ Nowling commented on SPARK-4158: --- I verified that the associated patch fixes this issue on our local cluster running Spark 1.1.0 and Mesos 0.21. Spark throws exception when Mesos resources are missing --- Key: SPARK-4158 URL: https://issues.apache.org/jira/browse/SPARK-4158 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.1.0 Reporter: Brenden Matthews Spark throws an exception when trying to check resources which haven't been offered by Mesos. This is an error in Spark, and should be corrected as such. Here's a sample:
{code}
val data
Exception in thread "Thread-41" java.lang.IllegalArgumentException: No resource called cpus in [name: mem type: SCALAR scalar { value: 2067.0 } role: * , name: disk type: SCALAR scalar { value: 900.0 } role: * , name: ports type: RANGES ranges { range { begin: 31000 end: 32000 } } role: * ]
at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.org$apache$spark$scheduler$cluster$mesos$CoarseMesosSchedulerBackend$$getResource(CoarseMesosSchedulerBackend.scala:236)
at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend$$anonfun$resourceOffers$1.apply(CoarseMesosSchedulerBackend.scala:200)
at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend$$anonfun$resourceOffers$1.apply(CoarseMesosSchedulerBackend.scala:197)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.resourceOffers(CoarseMesosSchedulerBackend.scala:197)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192491#comment-14192491 ] Zhijie Shen commented on SPARK-1537: bq. BTW, if you want a list of things I think are important for Spark, here are some quick ones: Thanks for sharing the details, which are far more helpful for clearing things up than big but vague statements. Let me go through the aforementioned JIRAs: * YARN-2521: I'd like to keep it open for some further client improvements, such as local timeline data caching, while YARN-2673 already made the client retry when the server temporarily doesn't respond. Please note that the concern I think it's pretty critical when you can't upload your data because the server is down is *no longer true* after YARN-2673. On the other side, from the point of view of the API, it should stay stable. * YARN-2423: This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data. We do this to support the use case where developers write Java code to implement a UI for analyzing the timeline data. Framework integration mainly deals with PUT APIs, and those Java client libs are already there. Taking one step back: apart from the client libs, the RESTful APIs are always there, which are programming-language neutral and useful to non-Java developers. * YARN-2444: It may be a bug or an improper use case. According to the exception, the user doesn't pass authorization for some reason. It was reported against 2.5 and is probably no longer valid after we fixed a bunch of security issues for 2.6. We need to do more validation on this issue before drawing a conclusion. Anyway, it's obviously an internal issue happening in secure mode only, which should not require API changes. bq. I understand it doesn't affect the client API and we can still have the code in It seems that we agree that the current timeline service offering is not blocking the Spark integration work. Add integration with Yarn's Application Timeline Server --- Key: SPARK-1537 URL: https://issues.apache.org/jira/browse/SPARK-1537 Project: Spark Issue Type: New Feature Components: YARN Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server). At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
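For readers following the PUT-vs-GET distinction above, the PUT path goes through Hadoop's TimelineClient. A minimal sketch against the Hadoop 2.4+ Java API (the entity type and id values are made up):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
import org.apache.hadoop.yarn.client.api.TimelineClient

val client = TimelineClient.createTimelineClient()
client.init(new Configuration())
client.start()

// Build one timeline entity; type and id are illustrative.
val entity = new TimelineEntity()
entity.setEntityType("SPARK_APPLICATION")
entity.setEntityId("application_0001")
entity.setStartTime(System.currentTimeMillis())

client.putEntities(entity)  // the PUT side; the GET client libs are what YARN-2423 would add
client.stop()
{code}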
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192502#comment-14192502 ] Marcelo Vanzin commented on SPARK-1537: --- bq. This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data. Spark needs both to put and read data, otherwise the ATS is useless for Spark. The current goal of Spark is to use the ATS as a store for its history data, since the data itself is not considered public or stable. So there is no point in the integration if you can only write data. (I know you can read data through other means, but I don't want to write a custom REST client just to get ATS support in.) bq. It was reported against 2.5 and is probably no longer valid after we fixed a bunch of security issues for 2.6. I'm not sure why you say it's security-related, since there's nothing security-related in the example code I posted. And if something doesn't work in 2.5 but works in 2.6, it means we (and by that I mean Spark) have to restrict our support to the versions where things work - even if the underlying API is exactly the same. Add integration with Yarn's Application Timeline Server --- Key: SPARK-1537 URL: https://issues.apache.org/jira/browse/SPARK-1537 Project: Spark Issue Type: New Feature Components: YARN Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server). At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192538#comment-14192538 ] Zhijie Shen commented on SPARK-1537: bq. Spark needs both to put and read data It's again a vague statement. Can you share your design details, so that we can evaluate whether it is really necessary? And what is the actual way of visualizing the data? Integration work is not a single bug-fix patch; we can divide the work into a sequence of subtasks, and the first step is to enable a Spark job to put its data into the timeline server. By doing this, not only can Spark's own web frontend visualize job history, but third-party tools can do Spark job analysis too. bq. I'm not sure why you say it's security-related, since there's nothing security-related in the example code I posted. I said that, according to the exception, the user doesn't pass authorization for some reason. If you don't agree, please post your investigation on YARN-2444; the YARN folks will help you with this issue. bq. if something doesn't work in 2.5 but works in 2.6 Regardless of the timeline service integration, Spark on YARN already picks particular Hadoop versions. It doesn't make sense to ask for a feature from an early version that doesn't have it. Add integration with Yarn's Application Timeline Server --- Key: SPARK-1537 URL: https://issues.apache.org/jira/browse/SPARK-1537 Project: Spark Issue Type: New Feature Components: YARN Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server). At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192561#comment-14192561 ] Marcelo Vanzin commented on SPARK-1537: --- bq. It's again a vague statement. I don't know what is vague about wanting to read the data you write. bq. Can you share your design details I already did better than that, much earlier in this bug: I shared the actual code. For this particular question, here it is: https://github.com/vanzin/spark/blob/yarn-timeline/yarn/timeline/src/main/scala/org/apache/spark/deploy/yarn/timeline/YarnTimelineProvider.scala See how it reads data from the ATS? It feeds it into the Spark history server, where the data can be visualized. It's using Yarn internal APIs, which is generally bad practice. bq. If you don't agree, please post your investigation on YARN-2444; the YARN folks will help you with this issue. I posted the error and the code to reproduce it. I don't know what else you expect from me. If you think it's an authorization issue, test it with 2.6 and close the bug if you believe it's fixed. bq. Regardless of the timeline service integration, Spark on YARN already picks particular Hadoop versions. It doesn't make sense to ask for a feature from an early version that doesn't have it. I'm not sure I really understood what you're trying to say here. Yes, we have to pick versions. We need a version that supports the features we need. Even if the API in 2.5 didn't change in 2.6, it seems to have bugs that prevent my current code from working, so there is no point in trying to integrate with 2.5 as far as I'm concerned. And as far as I know, 2.6 hasn't been released yet. (BTW, my code used to work with 2.4.) Add integration with Yarn's Application Timeline Server --- Key: SPARK-1537 URL: https://issues.apache.org/jira/browse/SPARK-1537 Project: Spark Issue Type: New Feature Components: YARN Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin It would be nice to have Spark integrate with Yarn's Application Timeline Server (see YARN-321, YARN-1530). This would allow users running Spark on Yarn to have a single place to go for all their history needs, and avoid having to manage a separate service (Spark's built-in server). At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, although there is still some ongoing work. But the basics are there, and I wouldn't expect them to change (much) at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4175) Exception on stage page
Sandy Ryza created SPARK-4175: - Summary: Exception on stage page Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical {code} 14/10/31 14:52:58 WARN servlet.ServletHandler: /stages/stage/ java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:313) at scala.None$.get(Option.scala:311) at org.apache.spark.ui.jobs.StagePage.taskRow(StagePage.scala:331) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at scala.collection.immutable.Stream.map(Stream.scala:376) at org.apache.spark.ui.UIUtils$.listingTable(UIUtils.scala:282) at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:171) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68) at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) {code} I'm guessing this was caused by SPARK-4016? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
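The None.get comes from dereferencing an Option unconditionally for tasks that have no metrics yet. A self-contained sketch of the usual defensive pattern (the Metrics stand-in and its field are illustrative, not the actual fix):
{code}
// Stand-in for the real TaskMetrics; only the field used below.
case class Metrics(executorRunTime: Long)

val maybeMetrics: Option[Metrics] = None  // e.g. a task that is still running

// Instead of maybeMetrics.get, which throws NoSuchElementException on None:
val runTime: Long = maybeMetrics match {
  case Some(m) => m.executorRunTime
  case None    => 0L  // no metrics reported yet
}

// Or, more compactly:
val runTime2: Long = maybeMetrics.map(_.executorRunTime).getOrElse(0L)
{code}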
[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192604#comment-14192604 ] Sandy Ryza commented on SPARK-4016: --- It looks like after this change, stage-level summary metrics no longer include in-progress tasks. Is this on purpose? Allow user to optionally show additional, advanced metrics in the UI Key: SPARK-4016 URL: https://issues.apache.org/jira/browse/SPARK-4016 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor Fix For: 1.2.0 Allowing the user to show/hide additional metrics will allow us to both (1) add more advanced metrics without cluttering the UI for the average user and (2) hide, by default, some of the metrics currently shown that are not widely used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192609#comment-14192609 ] Sandy Ryza commented on SPARK-4016: --- Also, it looks like this can cause an exception: SPARK-4175 Allow user to optionally show additional, advanced metrics in the UI Key: SPARK-4016 URL: https://issues.apache.org/jira/browse/SPARK-4016 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor Fix For: 1.2.0 Allowing the user to show/hide additional metrics will allow us to both (1) add more advanced metrics without cluttering the UI for the average user and (2) hide, by default, some of the metrics currently shown that are not widely used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4175) Exception on stage page
[ https://issues.apache.org/jira/browse/SPARK-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192608#comment-14192608 ] Apache Spark commented on SPARK-4175: - User 'sryza' has created a pull request for this issue: https://github.com/apache/spark/pull/3043 Exception on stage page --- Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical {code} 14/10/31 14:52:58 WARN servlet.ServletHandler: /stages/stage/ java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:313) at scala.None$.get(Option.scala:311) at org.apache.spark.ui.jobs.StagePage.taskRow(StagePage.scala:331) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at scala.collection.immutable.Stream.map(Stream.scala:376) at org.apache.spark.ui.UIUtils$.listingTable(UIUtils.scala:282) at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:171) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68) at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) {code} I'm guessing this was caused by SPARK-4016? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3561) Allow for pluggable execution contexts in Spark
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3561: - Fix Version/s: (was: 1.2.0) Allow for pluggable execution contexts in Spark --- Key: SPARK-3561 URL: https://issues.apache.org/jira/browse/SPARK-3561 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0 Reporter: Oleg Zhurakousky Labels: features Attachments: SPARK-3561.pdf Currently Spark provides integration with external resource managers such as Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the current architecture of Spark-on-YARN can be enhanced to provide significantly better utilization of cluster resources for large scale, batch and/or ETL applications when run alongside other applications (Spark and others) and services in YARN. Proposal: The proposed approach would introduce a pluggable JobExecutionContext (trait) - a gateway and a delegate to the Hadoop execution environment - as a non-public API (@Experimental) not exposed to end users of Spark. The trait will define 6 operations: * hadoopFile * newAPIHadoopFile * broadcast * runJob * persist * unpersist Each method directly maps to the corresponding method in the current version of SparkContext. The JobExecutionContext implementation will be accessed by SparkContext via a master URL such as execution-context:foo.bar.MyJobExecutionContext, with the default implementation containing the existing code from SparkContext, thus allowing the current (corresponding) methods of SparkContext to delegate to such an implementation. An integrator will now have the option to provide a custom implementation by either implementing it from scratch or extending from DefaultExecutionContext. Please see the attached design doc for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
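A rough Scala rendering of such a trait, reconstructed from the six operations listed in the description; the parameter lists are placeholders, since the real signatures live in the attached design doc:
{code}
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

// Illustrative only: one method per operation named in the proposal.
trait JobExecutionContext {
  def hadoopFile[K, V](sc: SparkContext, path: String /* ... */): RDD[(K, V)]
  def newAPIHadoopFile[K, V](sc: SparkContext, path: String /* ... */): RDD[(K, V)]
  def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]
  def runJob[T, U: ClassTag](rdd: RDD[T], func: Iterator[T] => U): Array[U]
  def persist[T](rdd: RDD[T]): RDD[T]
  def unpersist[T](rdd: RDD[T]): RDD[T]
}
{code}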
[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192653#comment-14192653 ] Kay Ousterhout commented on SPARK-4016: --- [~sandyr] definitely not intentional to change the behavior of stage-level summary metrics -- can you clarify where you're seeing this? Allow user to optionally show additional, advanced metrics in the UI Key: SPARK-4016 URL: https://issues.apache.org/jira/browse/SPARK-4016 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor Fix For: 1.2.0 Allowing the user to show/hide additional metrics will allow us to both (1) add more advanced metrics without cluttering the UI for the average user and (2) hide, by default, some of the metrics currently shown that are not widely used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4016) Allow user to optionally show additional, advanced metrics in the UI
[ https://issues.apache.org/jira/browse/SPARK-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192655#comment-14192655 ] Kay Ousterhout commented on SPARK-4016: --- (I think the summary table was always for only finished tasks, as controlled by this line: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L169) Allow user to optionally show additional, advanced metrics in the UI Key: SPARK-4016 URL: https://issues.apache.org/jira/browse/SPARK-4016 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor Fix For: 1.2.0 Allowing the user to show/hide additional metrics will allow us to both (1) add more advanced metrics without cluttering the UI for the average user and (2) hide, by default, some of the metrics currently shown that are not widely used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4176) Support decimals with precision > 18 in Parquet
Matei Zaharia created SPARK-4176: Summary: Support decimals with precision > 18 in Parquet Key: SPARK-4176 URL: https://issues.apache.org/jira/browse/SPARK-4176 Project: Spark Issue Type: New Feature Components: SQL Reporter: Matei Zaharia After https://issues.apache.org/jira/browse/SPARK-3929, only decimals with precision <= 18 (whose unscaled value can be read into a Long) will be readable from Parquet, so we still need more work to support the larger ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
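The cutoff at precision 18 follows from the range of a signed 64-bit Long; a quick arithmetic check:
{code}
// The largest 18-digit unscaled decimal always fits in a Long
// (Long.MaxValue = 9223372036854775807, a 19-digit number),
// while a 19-digit value can overflow.
val maxUnscaled18 = BigInt(10).pow(18) - 1               // 999999999999999999
assert(maxUnscaled18 <= BigInt(Long.MaxValue))           // fits in a Long
assert(BigInt(10).pow(19) - 1 > BigInt(Long.MaxValue))   // 19 digits may not
{code}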
[jira] [Resolved] (SPARK-3652) upgrade spark sql hive version to 0.13.1
[ https://issues.apache.org/jira/browse/SPARK-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei resolved SPARK-3652. Resolution: Fixed upgrade spark sql hive version to 0.13.1 Key: SPARK-3652 URL: https://issues.apache.org/jira/browse/SPARK-3652 Project: Spark Issue Type: Dependency upgrade Components: SQL Affects Versions: 1.1.0 Reporter: wangfei The Spark SQL Hive version is currently 0.12.0; compiling with 0.13.1 produces errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3322) ConnectionManager logs an error when the application ends
[ https://issues.apache.org/jira/browse/SPARK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192701#comment-14192701 ] wangfei commented on SPARK-3322: yes, let's close this. ConnectionManager logs an error when the application ends - Key: SPARK-3322 URL: https://issues.apache.org/jira/browse/SPARK-3322 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: wangfei Although it does not influence the result, it always logs an error from ConnectionManager. Sometimes it only logs ConnectionManagerId(vm2,40992) not found, and sometimes it also logs a CancelledKeyException. The log output is as follows: 14/08/29 16:54:53 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(vm2,40992) not found 14/08/29 16:54:53 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@457245f9 java.nio.channels.CancelledKeyException at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386) at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-2460) Optimize SparkContext.hadoopFile api
[ https://issues.apache.org/jira/browse/SPARK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei closed SPARK-2460. -- Resolution: Fixed Optimize SparkContext.hadoopFile api - Key: SPARK-2460 URL: https://issues.apache.org/jira/browse/SPARK-2460 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: wangfei Fix For: 1.2.0 1. Use SparkContext.hadoopRDD() instead of instantiating HadoopRDD directly in SparkContext.hadoopFile. 2. Broadcast the JobConf in HadoopRDD, not the Configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
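For context, the public API path that point 1 refers to looks like this (a usage sketch with standard Hadoop input classes, not the patch itself):
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.SparkContext

def readViaHadoopRDD(sc: SparkContext, path: String) = {
  // Build a JobConf (the object worth broadcasting, per point 2) ...
  val jobConf = new JobConf(sc.hadoopConfiguration)
  FileInputFormat.setInputPaths(jobConf, path)
  // ... and go through sc.hadoopRDD instead of `new HadoopRDD(...)`.
  sc.hadoopRDD(jobConf, classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text])
}
{code}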
[jira] [Created] (SPARK-4177) update build doc for JDBC/CLI already supporting hive 13
wangfei created SPARK-4177: -- Summary: update build doc for JDBC/CLI already supporting hive 13 Key: SPARK-4177 URL: https://issues.apache.org/jira/browse/SPARK-4177 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Fix the build doc, since hive 13 is already supported in the JDBC/CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4177) update build doc for already supporting hive 13 in jdbc/cli
[ https://issues.apache.org/jira/browse/SPARK-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192707#comment-14192707 ] Apache Spark commented on SPARK-4177: - User 'scwf' has created a pull request for this issue: https://github.com/apache/spark/pull/3042 update build doc for already supporting hive 13 in jdbc/cli --- Key: SPARK-4177 URL: https://issues.apache.org/jira/browse/SPARK-4177 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: wangfei Fix For: 1.2.0 Fix the build doc, since hive 13 is already supported in the JDBC/CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3974) Block matrix abstractions and partitioners
[ https://issues.apache.org/jira/browse/SPARK-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192731#comment-14192731 ] Burak Yavuz commented on SPARK-3974: Hi everyone, The design doc for Block Matrix abstractions and the work on matrix multiplication can be found here: goo.gl/zbU1Nz Let me know if you have any comments / suggestions. I will hopefully have the PR for this ready by next Friday. Block matrix abstractions and partitioners -- Key: SPARK-3974 URL: https://issues.apache.org/jira/browse/SPARK-3974 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh Assignee: Burak Yavuz We need abstractions for block matrices with fixed block sizes, with each block being dense. Partitioners along both rows and columns are required. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
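Pending that PR, a partitioner over (blockRow, blockCol) keys typically takes the following shape (illustrative only; the design doc's actual API may differ):
{code}
import org.apache.spark.Partitioner

// Maps a (blockRow, blockCol) key to a partition in row-major order.
class GridPartitioner(numRowBlocks: Int, numColBlocks: Int) extends Partitioner {
  override def numPartitions: Int = numRowBlocks * numColBlocks

  override def getPartition(key: Any): Int = key match {
    case (i: Int, j: Int) => i * numColBlocks + j
    case _ => throw new IllegalArgumentException(s"Unexpected key: $key")
  }
}
{code}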
[jira] [Created] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation
Sandy Ryza created SPARK-4178: - Summary: Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
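The likely mechanics of the bug and fix, sketched under the assumption that bytes read are tracked via a filesystem-statistics callback (names and structure here are illustrative, not the actual PR):
{code}
import org.apache.hadoop.mapred.{InputFormat, InputSplit, JobConf, RecordReader, Reporter}

// Snapshot the bytes-read counter BEFORE constructing the RecordReader, so
// bytes consumed during instantiation are attributed to this task as well.
// `readBytes` is a hypothetical stand-in for the FileSystem statistics callback.
def openWithMetrics[K, V](
    fmt: InputFormat[K, V], split: InputSplit, conf: JobConf,
    readBytes: () => Long): (RecordReader[K, V], () => Long) = {
  val baseline = readBytes()
  val reader = fmt.getRecordReader(split, conf, Reporter.NULL)
  (reader, () => readBytes() - baseline)
}
{code}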
[jira] [Commented] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation
[ https://issues.apache.org/jira/browse/SPARK-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192773#comment-14192773 ] Sandy Ryza commented on SPARK-4178: --- Thanks [~kostas] for noticing this. Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4178) Hadoop input metrics ignore bytes read in RecordReader instantiation
[ https://issues.apache.org/jira/browse/SPARK-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192775#comment-14192775 ] Apache Spark commented on SPARK-4178: - User 'sryza' has created a pull request for this issue: https://github.com/apache/spark/pull/3045 Hadoop input metrics ignore bytes read in RecordReader instantiation Key: SPARK-4178 URL: https://issues.apache.org/jira/browse/SPARK-4178 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4175) Exception on stage page
[ https://issues.apache.org/jira/browse/SPARK-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-4175. --- Resolution: Fixed Fix Version/s: 1.2.0 Exception on stage page --- Key: SPARK-4175 URL: https://issues.apache.org/jira/browse/SPARK-4175 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical Fix For: 1.2.0 {code} 14/10/31 14:52:58 WARN servlet.ServletHandler: /stages/stage/ java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:313) at scala.None$.get(Option.scala:311) at org.apache.spark.ui.jobs.StagePage.taskRow(StagePage.scala:331) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.jobs.StagePage$$anonfun$8.apply(StagePage.scala:173) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at org.apache.spark.ui.UIUtils$$anonfun$listingTable$2.apply(UIUtils.scala:282) at scala.collection.immutable.Stream.map(Stream.scala:376) at org.apache.spark.ui.UIUtils$.listingTable(UIUtils.scala:282) at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:171) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68) at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) {code} I'm guessing this was caused by SPARK-4016? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2329) Add multi-label evaluation metrics
[ https://issues.apache.org/jira/browse/SPARK-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-2329. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 1270 [https://github.com/apache/spark/pull/1270] Add multi-label evaluation metrics -- Key: SPARK-2329 URL: https://issues.apache.org/jira/browse/SPARK-2329 Project: Spark Issue Type: New Feature Components: MLlib Reporter: Alexander Ulanov Assignee: Alexander Ulanov Fix For: 1.2.0 Original Estimate: 72h Remaining Estimate: 72h There is no class in Spark MLlib for measuring the performance of multi-label classifiers. Multilabel classification is when the document is labeled with several labels (classes). This task involves adding the class for multilabel evaluation and unit tests. The following measures are to be implemented: Precision, Recall and F1-measure (1) based on documents averaged by the number of documents; (2) per label; (3) based on labels micro and macro averaged; (4) Hamming loss. Reference: Tsoumakas, Grigorios, Ioannis Katakis, and Ioannis Vlahavas. Mining multi-label data. Data mining and knowledge discovery handbook. Springer US, 2010. 667-685. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3838) Python code example for Word2Vec in user guide
[ https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-3838. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2952 [https://github.com/apache/spark/pull/2952] Python code example for Word2Vec in user guide -- Key: SPARK-3838 URL: https://issues.apache.org/jira/browse/SPARK-3838 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Assignee: Anant Daksh Asthana Priority: Trivial Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1547) Add gradient boosting algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1547. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2607 [https://github.com/apache/spark/pull/2607] Add gradient boosting algorithm to MLlib Key: SPARK-1547 URL: https://issues.apache.org/jira/browse/SPARK-1547 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.0 Reporter: Manish Amde Assignee: Manish Amde Fix For: 1.2.0 This task requires adding the gradient boosting algorithm to Spark MLlib. The implementation needs to adapt the gradient boosting algorithm to the scalable tree implementation. The tasks involves: - Comparing the various tradeoffs and finalizing the algorithm before implementation - Code implementation - Unit tests - Functional tests - Performance tests - Documentation [Ensembles design document (Google doc) | https://docs.google.com/document/d/1J0Q6OP2Ggx0SOtlPgRUkwLASrAkUJw6m6EK12jRDSNg/] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4127) Streaming Linear Regression- Python bindings
[ https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Daksh Asthana updated SPARK-4127: --- Summary: Streaming Linear Regression- Python bindings (was: Streaming Linear Regression) Streaming Linear Regression- Python bindings Key: SPARK-4127 URL: https://issues.apache.org/jira/browse/SPARK-4127 Project: Spark Issue Type: Improvement Components: MLlib, PySpark Reporter: Anant Daksh Asthana Priority: Minor Create Python bindings for Streaming Linear Regression (MLlib). The MLlib example file relevant to this issue can be found at: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3787) Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version
[ https://issues.apache.org/jira/browse/SPARK-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192931#comment-14192931 ] Apache Spark commented on SPARK-3787: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/3046 Assembly jar name is wrong when we build with sbt omitting -Dhadoop.version --- Key: SPARK-3787 URL: https://issues.apache.org/jira/browse/SPARK-3787 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.0 Reporter: Kousuke Saruta When we build with sbt with a hadoop profile but without the hadoop version property, like: {code} sbt/sbt -Phadoop-2.2 assembly {code} the jar name always uses the default version (1.0.4). When we build with maven under the same conditions, the default version for each profile is used. For instance, if we build like: {code} mvn -Phadoop-2.2 package {code} the jar name uses 2.2.0, the default version for the hadoop-2.2 profile. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS
[ https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192935#comment-14192935 ] Debasish Das commented on SPARK-2426: - Refactored QuadraticMinimizer and NNLS from mllib optimization to breeze.optimize.quadratic: https://github.com/scalanlp/breeze/pull/321 I will update the PR as well, but the latest breeze depends on Scala 2.11 while Spark still uses 2.10. All license and copyright information has also moved to breeze, so no changes to Spark's license/notice files are needed. Quadratic Minimization for MLlib ALS Key: SPARK-2426 URL: https://issues.apache.org/jira/browse/SPARK-2426 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.0.0 Reporter: Debasish Das Assignee: Debasish Das Original Estimate: 504h Remaining Estimate: 504h Current ALS supports least squares and nonnegative least squares. I presented ADMM and IPM based Quadratic Minimization solvers to be used for the following ALS problems: 1. ALS with bounds 2. ALS with L1 regularization 3. ALS with Equality constraint and bounds Initial runtime comparisons are presented at Spark Summit. http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will keep updating the runtime comparison results. For integration the detailed plan is as follows: 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization 2. Integrate QuadraticMinimizer in mllib ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
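For readers less familiar with the setup, each per-user (or per-item) ALS subproblem these solvers target is a small quadratic program; in standard form (my notation, not taken from the talk or the PR):
{code}
\min_x \; \tfrac{1}{2}\, x^\top (A^\top A + \lambda I)\, x \;-\; (A^\top b)^\top x
\quad \text{s.t.} \quad x \in C
{code}
where C encodes the variant: x >= 0 for NNLS, a box l <= x <= u for ALS with bounds, an added L1 penalty mu*||x||_1 for ALS with L1 regularization, or an equality constraint such as sum(x) = 1 combined with bounds.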
[jira] [Resolved] (SPARK-3254) Streaming K-Means
[ https://issues.apache.org/jira/browse/SPARK-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-3254. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2942 [https://github.com/apache/spark/pull/2942] Streaming K-Means - Key: SPARK-3254 URL: https://issues.apache.org/jira/browse/SPARK-3254 Project: Spark Issue Type: New Feature Components: MLlib, Streaming Reporter: Xiangrui Meng Assignee: Jeremy Freeman Fix For: 1.2.0 Streaming K-Means with proper decay settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1847) Pushdown filters on non-required parquet columns
[ https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1847. -- Resolution: Fixed Fix Version/s: 1.2.0 Pushdown filters on non-required parquet columns Key: SPARK-1847 URL: https://issues.apache.org/jira/browse/SPARK-1847 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust Assignee: Yash Datta Fix For: 1.2.0 From Andre: TODO: we currently only filter on non-nullable (Parquet REQUIRED) attributes until https://github.com/Parquet/parquet-mr/issues/371 has been resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3968) Use parquet-mr filter2 api in spark sql
[ https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3968: - Assignee: Yash Datta Use parquet-mr filter2 api in spark sql --- Key: SPARK-3968 URL: https://issues.apache.org/jira/browse/SPARK-3968 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: Yash Datta Assignee: Yash Datta Priority: Minor Fix For: 1.2.0 The parquet-mr project has introduced a new filter api, along with several fixes (like filtering on optional fields). It can also eliminate entire RowGroups based on statistics like min/max. We can leverage that to further improve the performance of queries with filters. The filter2 api also introduces the ability to create custom filters. We can create a custom filter for the optimized In clause (InSet), so that elimination happens in the ParquetRecordReader itself (will create a separate ticket for that). This fixes the ticket below: https://issues.apache.org/jira/browse/SPARK-1847 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
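For a flavor of the filter2 API this ticket adopts (parquet-mr 1.x package names, i.e. the pre-rename parquet.* namespace; the column name is made up):
{code}
import parquet.filter2.compat.FilterCompat
import parquet.filter2.predicate.FilterApi

// Predicate: id > 100. Statistics-based RowGroup elimination means groups
// whose max(id) <= 100 are skipped without reading any records.
val id     = FilterApi.intColumn("id")
val pred   = FilterApi.gt(id, java.lang.Integer.valueOf(100))
val filter = FilterCompat.get(pred)  // what gets handed to the record reader
{code}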
[jira] [Updated] (SPARK-1847) Pushdown filters on non-required parquet columns
[ https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1847: - Assignee: Yash Datta Pushdown filters on non-required parquet columns Key: SPARK-1847 URL: https://issues.apache.org/jira/browse/SPARK-1847 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.0.0 Reporter: Michael Armbrust Assignee: Yash Datta Fix For: 1.2.0 From Andre: TODO: we currently only filter on non-nullable (Parquet REQUIRED) attributes until https://github.com/Parquet/parquet-mr/issues/371 has been resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org