[jira] [Commented] (SPARK-12066) spark sql throw java.lang.ArrayIndexOutOfBoundsException when use table.* with join
[ https://issues.apache.org/jira/browse/SPARK-12066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067714#comment-15067714 ] Ricky Yang commented on SPARK-12066: The problem has been confirmed: there is a line of dirty data in the Hive table, but Spark SQL does not catch this exception. The exception is as follows:
java.lang.ArrayIndexOutOfBoundsException: 9731
 at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:81)
 at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
 at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:110)
 at org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:171)
 at org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:166)
 at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:386)
 at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:382)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 at scala.collection.AbstractIterator.to(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
 at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1792)
 at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1792)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
 at org.apache.spark.scheduler.Task.run(Task.scala:70)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
> spark sql throw java.lang.ArrayIndexOutOfBoundsException when use table.* with join
> -
>
> Key: SPARK-12066
> URL: https://issues.apache.org/jira/browse/SPARK-12066
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0, 1.5.2
> Environment: linux
> Reporter: Ricky Yang
> Priority: Blocker
>
> Throws java.lang.ArrayIndexOutOfBoundsException when I use the following Spark SQL on Spark standalone or YARN.
> The SQL:
> select ta.*
> from bi_td.dm_price_seg_td tb
> join bi_sor.sor_ord_detail_tf ta
> on 1 = 1
> where ta.sale_dt = '20140514'
> and ta.sale_price >= tb.pri_from
> and ta.sale_price < tb.pri_to limit 10 ;
> But the result is correct when * is not used, as in the following:
> select ta.sale_dt
> from bi_td.dm_price_seg_td tb
> join bi_sor.sor_ord_detail_tf ta
> on 1 = 1
> where ta.sale_dt = '20140514'
> and ta.sale_price >= tb.pri_from
> and ta.sale_price < tb.pri_to limit 10 ;
> The standalone version is 1.4.0 and the Spark-on-YARN version is 1.5.2.
> Error log:
> 15/11/30 14:19:59 ERROR SparkSQLDriver: Failed in [select ta.*
> from bi_td.dm_price_seg_td tb
> join bi_sor.sor_ord_detail_tf ta
> on 1 = 1
> where ta.sale_dt = '20140514'
> and ta.sale_price >= tb.pri_
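Since the comment above traces the failure to a single corrupt row that only blows up once the extra columns selected by {{ta.*}} are deserialized, one rough way to narrow down where that row lives is to force deserialization partition by partition and catch the error instead of failing the job. The sketch below is not from the ticket; it assumes Spark 1.4/1.5-style APIs, an existing SparkContext {{sc}}, and reuses the table, partition, and column names quoted in the report.
{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`

// Scan only the partition from the report and force deserialization of all columns;
// record where each partition's iterator blows up instead of failing the whole job.
val rows = hiveContext.sql(
  "SELECT ta.* FROM bi_sor.sor_ord_detail_tf ta WHERE ta.sale_dt = '20140514'")

val suspectOffsets = rows.rdd.mapPartitionsWithIndex { (partition, iter) =>
  var consumed = 0L
  val failed =
    try { while (iter.hasNext) { iter.next(); consumed += 1 }; false }
    catch { case _: Exception => true }  // e.g. the ArrayIndexOutOfBoundsException above
  if (failed) Iterator((partition, consumed)) else Iterator.empty
}

suspectOffsets.collect().foreach { case (p, n) =>
  println(s"partition $p fails after $n rows")
}
{code}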
[jira] [Assigned] (SPARK-12475) Upgrade Zinc from 0.3.5.3 to 0.3.9
[ https://issues.apache.org/jira/browse/SPARK-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12475: Assignee: Apache Spark (was: Josh Rosen) > Upgrade Zinc from 0.3.5.3 to 0.3.9 > -- > > Key: SPARK-12475 > URL: https://issues.apache.org/jira/browse/SPARK-12475 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Apache Spark > > We should update to the latest version of Zinc in order to match our SBT > version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12458) Add ExpressionDescription to datetime functions
[ https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12458: Assignee: (was: Apache Spark) > Add ExpressionDescription to datetime functions > --- > > Key: SPARK-12458 > URL: https://issues.apache.org/jira/browse/SPARK-12458 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12458) Add ExpressionDescription to datetime functions
[ https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12458: Assignee: Apache Spark > Add ExpressionDescription to datetime functions > --- > > Key: SPARK-12458 > URL: https://issues.apache.org/jira/browse/SPARK-12458 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
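For context, {{ExpressionDescription}} is the annotation in the catalyst expressions package whose {{usage}}/{{extended}} text is surfaced by {{DESCRIBE FUNCTION [EXTENDED]}}. A sketch of what this sub-task asks for, with the expression's body elided and the description wording invented purely for illustration:
{code}
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription, UnaryExpression}

// The annotation is added to the existing datetime expression case classes;
// the text below is illustrative, not the wording that was actually committed.
@ExpressionDescription(
  usage = "_FUNC_(date) - Returns the year component of date/timestamp.",
  extended = "> SELECT _FUNC_('2015-12-22');\n 2015")
case class Year(child: Expression) extends UnaryExpression {
  // ... existing implementation of Year, unchanged ...
}
{code}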
[jira] [Resolved] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-11823. Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved by pull request 10425 [https://github.com/apache/spark/pull/10425] > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Fix For: 2.0.0, 1.6.1 > > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11807) Remove support for Hadoop < 2.2 (i.e. Hadoop 1 and 2.0)
[ https://issues.apache.org/jira/browse/SPARK-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067608#comment-15067608 ] Reynold Xin commented on SPARK-11807: - [~srowen] please submit prs that can simplify code (reflection) from the removal of hadoop 1.x. Maybe create a new ticket and link to this one when you do them? > Remove support for Hadoop < 2.2 (i.e. Hadoop 1 and 2.0) > --- > > Key: SPARK-11807 > URL: https://issues.apache.org/jira/browse/SPARK-11807 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > Labels: releasenotes > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
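The simplification being asked for above is of this general shape: shims that used reflection to paper over Hadoop 1.x/2.x API differences can become plain calls. The example below is hypothetical and does not point at a specific Spark code path.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.JobContext

object HadoopShimExample {
  // Before: reflective lookup, needed while Hadoop 1.x and 2.x exposed
  // incompatible JobContext types.
  def getConfigurationViaReflection(context: JobContext): Configuration =
    context.getClass.getMethod("getConfiguration")
      .invoke(context).asInstanceOf[Configuration]

  // After: a plain call is enough once only Hadoop 2.2+ is supported.
  def getConfigurationDirectly(context: JobContext): Configuration =
    context.getConfiguration
}
{code}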
[jira] [Resolved] (SPARK-11807) Remove support for Hadoop < 2.2 (i.e. Hadoop 1 and 2.0)
[ https://issues.apache.org/jira/browse/SPARK-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-11807. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove support for Hadoop < 2.2 (i.e. Hadoop 1 and 2.0) > --- > > Key: SPARK-11807 > URL: https://issues.apache.org/jira/browse/SPARK-11807 > Project: Spark > Issue Type: Sub-task > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > Labels: releasenotes > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12327) lint-r checks fail with commented code
[ https://issues.apache.org/jira/browse/SPARK-12327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067575#comment-15067575 ] Felix Cheung commented on SPARK-12327: -- https://github.com/apache/spark/pull/10408 > lint-r checks fail with commented code > -- > > Key: SPARK-12327 > URL: https://issues.apache.org/jira/browse/SPARK-12327 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Shivaram Venkataraman > > We get this after our R version downgrade > {code} > R/RDD.R:183:68: style: Commented code should be removed. > rdd@env$jrdd_val <- callJMethod(rddRef, "asJavaRDD") # > rddRef$asJavaRDD() > > ^~ > R/RDD.R:228:63: style: Commented code should be removed. > #' http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence. > ^~~~ > R/RDD.R:388:24: style: Commented code should be removed. > #' collectAsMap(rdd) # list(`1` = 2, `3` = 4) >^~ > R/RDD.R:603:61: style: Commented code should be removed. > #' unlist(collect(filterRDD(rdd, function (x) { x < 3 }))) # c(1, 2) > ^~~~ > R/RDD.R:762:20: style: Commented code should be removed. > #' take(rdd, 2L) # list(1, 2) >^~ > R/RDD.R:830:42: style: Commented code should be removed. > #' sort(unlist(collect(distinct(rdd # c(1, 2, 3) > ^~~ > R/RDD.R:980:47: style: Commented code should be removed. > #' collect(keyBy(rdd, function(x) { x*x })) # list(list(1, 1), list(4, 2), > list(9, 3)) > > ^~~~ > R/RDD.R:1194:27: style: Commented code should be removed. > #' takeOrdered(rdd, 6L) # list(1, 2, 3, 4, 5, 6) > ^~ > R/RDD.R:1215:19: style: Commented code should be removed. > #' top(rdd, 6L) # list(10, 9, 7, 6, 5, 4) > ^~~ > R/RDD.R:1270:50: style: Commented code should be removed. > #' aggregateRDD(rdd, zeroValue, seqOp, combOp) # list(10, 4) > ^~~ > R/RDD.R:1374:6: style: Commented code should be removed. > #' # list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", > 2)) > > ^~ > R/RDD.R:1415:6: style: Commented code should be removed. > #' # list(list("a", 0), list("b", 1), list("c", 2), list("d", 3), list("e", > 4)) > > ^~ > R/RDD.R:1461:6: style: Commented code should be removed. > #' # list(list(1, 2), list(3, 4)) > ^~~~ > R/RDD.R:1527:6: style: Commented code should be removed. > #' # list(list(0, 1000), list(1, 1001), list(2, 1002), list(3, 1003), list(4, > 1004)) > > ^~~ > R/RDD.R:1564:6: style: Commented code should be removed. > #' # list(list(1, 1), list(1, 2), list(2, 1), list(2, 2)) > ^~~~ > R/RDD.R:1595:6: style: Commented code should be removed. > #' # list(1, 1, 3) > ^ > R/RDD.R:1627:6: style: Commented code should be removed. > #' # list(1, 2, 3) > ^ > R/RDD.R:1663:6: style: Commented code should be removed. > #' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6))) > ^~ > R/deserialize.R:22:3: style: Commented code should be removed. > # void -> NULL > ^~~~ > R/deserialize.R:23:3: style: Commented code should be removed. > # Int -> integer > ^~ > R/deserialize.R:24:3: style: Commented code should be removed. > # String -> character > ^~~ > R/deserialize.R:25:3: style: Commented code should be removed. > # Boolean -> logical > ^~ > R/deserialize.R:26:3: style: Commented code should be removed. > # Float -> double > ^~~ > R/deserialize.R:27:3: style: Commented code should be removed. > # Double -> double > ^~~~ > R/deserialize.R:28:3: style: Commented code should be removed. > # Long -> double > ^~ > R/deserialize.R:29:3: style: Commented code should be removed. 
> # Array[Byte] -> raw > ^~ > R/deserialize.R:30:3: style: Commented code should be removed. > # Date -> Date > ^~~~ > R/deserialize.R:31:3: style: Commented code should be removed. > # Time -> POSIXct > ^~~ > R/deserialize.R
[jira] [Assigned] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12476: Assignee: (was: Apache Spark) > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > Current plan: > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12476: Assignee: Apache Spark > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro >Assignee: Apache Spark > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > Current plan: > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067528#comment-15067528 ] Apache Spark commented on SPARK-12476: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/10427 > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > Current plan: > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
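The mechanism behind this ticket is the {{BaseRelation.unhandledFilters}} hook added in Spark 1.6: filters a relation reports as handled no longer get a redundant Spark-side {{Filter}} in the physical plan. A minimal, self-contained sketch of the contract (not the actual patch in the PR above; the real change would live in {{JDBCRelation}} and reuse its filter-to-WHERE-clause compilation):
{code}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Toy relation illustrating the contract: whatever unhandledFilters returns is
// re-evaluated by a Spark-side Filter; everything else is trusted to the source.
class ToyJdbcLikeRelation(override val sqlContext: SQLContext) extends BaseRelation {
  override def schema: StructType = StructType(Seq(StructField("col0", StringType)))

  // Pretend only equality predicates can be compiled into the JDBC WHERE clause.
  private def compilable(f: Filter): Boolean = f.isInstanceOf[EqualTo]

  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(compilable)
}
{code}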
[jira] [Updated] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-12476: - Description: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' Current plan: {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} was: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' Current plan: {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > Current plan: > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > Scan > 
JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-12476: - Description: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' {code} Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} was: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' {code} Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; ``` == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter > -- > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > {code} > Current plan: > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan 
== > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-12476: - Description: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' Current plan: {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} was: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' {code} Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; {code} == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > Current plan: > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical 
Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-12476: - Summary: Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter (was: Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter) > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter > - > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > {code} > Current plan: > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > {code} > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter
[ https://issues.apache.org/jira/browse/SPARK-12476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-12476: - Description: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' {code} Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] {code} This patch enables a plan below; ``` == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` was: Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' ``` Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` This patch enables a plan below; ``` == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` > Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter > -- > > Key: SPARK-12476 > URL: https://issues.apache.org/jira/browse/SPARK-12476 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Takeshi Yamamuro > > Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' > {code} > Current plan: > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter (col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > {code} > This patch enables a plan below; > ``` > == Optimized Logical Plan == > Project [col0#0,col1#1] > +- Filter (col0#0 = xxx) >+- Relation[col0#0,col1#1] > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver}) > == Physical Plan == > +- Filter 
(col0#0 = xxx) >+- Scan > JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, > password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: > [EqualTo(col0,xxx)] > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12476) Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter
Takeshi Yamamuro created SPARK-12476: Summary: Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Fileter Key: SPARK-12476 URL: https://issues.apache.org/jira/browse/SPARK-12476 Project: Spark Issue Type: Improvement Components: SQL Reporter: Takeshi Yamamuro Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' ``` Current plan: == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` This patch enables a plan below; ``` == Optimized Logical Plan == Project [col0#0,col1#1] +- Filter (col0#0 = xxx) +- Relation[col0#0,col1#1] JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver}) == Physical Plan == +- Filter (col0#0 = xxx) +- Scan JDBCRelation(jdbc:postgresql:postgres,testRel,[Lorg.apache.spark.Partition;@2ac7c683,{user=maropu, password=, driver=org.postgresql.Driver})[col0#0,col1#1] PushedFilters: [EqualTo(col0,xxx)] ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12475) Upgrade Zinc from 0.3.5.3 to 0.3.9
[ https://issues.apache.org/jira/browse/SPARK-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067509#comment-15067509 ] Apache Spark commented on SPARK-12475: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/10426 > Upgrade Zinc from 0.3.5.3 to 0.3.9 > -- > > Key: SPARK-12475 > URL: https://issues.apache.org/jira/browse/SPARK-12475 > Project: Spark > Issue Type: Improvement > Components: Build, Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > We should update to the latest version of Zinc in order to match our SBT > version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12475) Upgrade Zinc from 0.3.5.3 to 0.3.9
Josh Rosen created SPARK-12475: -- Summary: Upgrade Zinc from 0.3.5.3 to 0.3.9 Key: SPARK-12475 URL: https://issues.apache.org/jira/browse/SPARK-12475 Project: Spark Issue Type: Improvement Components: Build, Project Infra Reporter: Josh Rosen Assignee: Josh Rosen We should update to the latest version of Zinc in order to match our SBT version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12474) support deserialization for physical plan and hive logical plan from JSON string
[ https://issues.apache.org/jira/browse/SPARK-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-12474: Priority: Minor (was: Major) > support deserialization for physical plan and hive logical plan from JSON > string > > > Key: SPARK-12474 > URL: https://issues.apache.org/jira/browse/SPARK-12474 > Project: Spark > Issue Type: New Feature >Reporter: Wenchen Fan >Priority: Minor > > SPARK-12321 added a framework based on reflection that can serialize > {{TreeNode}} to JSON and deserialize it back. However, it can't handle all > corner cases, and we bypass them in the tests; see > https://github.com/apache/spark/pull/10311/files#diff-238d584c15e16c24f49a40bcf163fe13R190 > and > https://github.com/apache/spark/pull/10311/files#diff-238d584c15e16c24f49a40bcf163fe13R212. > Known corner cases: > 1. ExpressionEncoder > 2. BaseRelation > 3. Hive logical plan > 4. physical plan > The framework lives in the catalyst module and may not be able to handle corner > cases from other modules; one idea is to define a {{JsonSerializable}} trait > and implement it for the corner cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12474) support deserialization for physical plan and hive logical plan from JSON string
Wenchen Fan created SPARK-12474: --- Summary: support deserialization for physical plan and hive logical plan from JSON string Key: SPARK-12474 URL: https://issues.apache.org/jira/browse/SPARK-12474 Project: Spark Issue Type: New Feature Reporter: Wenchen Fan SPARK-12321 added a framework based on reflection that can serialize {{TreeNode}} to JSON and deserialize it back. However, it can't handle all corner cases, and we bypass them in the tests; see https://github.com/apache/spark/pull/10311/files#diff-238d584c15e16c24f49a40bcf163fe13R190 and https://github.com/apache/spark/pull/10311/files#diff-238d584c15e16c24f49a40bcf163fe13R212. Known corner cases: 1. ExpressionEncoder 2. BaseRelation 3. Hive logical plan 4. physical plan The framework lives in the catalyst module and may not be able to handle corner cases from other modules; one idea is to define a {{JsonSerializable}} trait and implement it for the corner cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
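A minimal sketch of the {{JsonSerializable}} idea floated in the description; the trait name comes from the ticket, everything else here is hypothetical rather than an agreed design.
{code}
// Plans the reflection-based framework cannot handle could opt in to
// explicit JSON serialization through a dedicated trait.
trait JsonSerializable {
  /** Serialize this node to a JSON string the framework can round-trip. */
  def toJson: String
}

// A corner-case node (e.g. one wrapping a BaseRelation) would then implement
// the trait itself instead of relying on reflection.
case class ExampleRelationNode(tableName: String) extends JsonSerializable {
  override def toJson: String =
    s"""{"class":"ExampleRelationNode","table":"$tableName"}"""
}
{code}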
[jira] [Comment Edited] (SPARK-10873) can't sort columns on history page
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067477#comment-15067477 ] Zhuo Liu edited comment on SPARK-10873 at 12/22/15 2:56 AM: Hi Alex, yes, that is exactly what I am going to do, both column sorting and search box should be fixed together with jQuery DataTables. [~ajbozarth] was (Author: zhuoliu): Hi Alex, yes, that is exactly what I am going to do, both column sorting and search box should be fixed together with jQuery DataTables. > can't sort columns on history page > -- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10873) can't sort columns on history page
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067477#comment-15067477 ] Zhuo Liu commented on SPARK-10873: -- Hi Alex, yes, that is exactly what I am going to do, both column sorting and search box should be fixed together with jQuery DataTables. > can't sort columns on history page > -- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067430#comment-15067430 ] Apache Spark commented on SPARK-11823: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/10425 > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12472) OOM when sort a table and save as parquet
[ https://issues.apache.org/jira/browse/SPARK-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067413#comment-15067413 ] Davies Liu commented on SPARK-12472: could be workarounded by decreasing the memory used by both Spark and Parquet using spark.memory.fraction (for example, 0.4) and parquet.memory.pool.ratio (for example, 0.3, in core-site.xml) > OOM when sort a table and save as parquet > - > > Key: SPARK-12472 > URL: https://issues.apache.org/jira/browse/SPARK-12472 > Project: Spark > Issue Type: Bug >Reporter: Davies Liu > > {code} > t = sqlContext.table('store_sales') > t.unionAll(t).coalesce(2).sortWithinPartitions(t[0]).write.partitionBy('ss_sold_date_sk').parquet("/tmp/ttt") > {code} > {code} > 15/12/21 14:35:52 WARN TaskSetManager: Lost task 1.0 in stage 25.0 (TID 96, > 192.168.0.143): java.lang.OutOfMemoryError: Java heap space > at > org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:86) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:32) > at > org.apache.spark.util.collection.TimSort$SortState.ensureCapacity(TimSort.java:951) > at > org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:699) > at > org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525) > at > org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453) > at > org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325) > at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153) > at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:226) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187) > at > org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244) > at > org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:327) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:342) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168) > at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90) > at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
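A sketch of how the suggested workaround might be applied, using the example values from the comment. {{spark.memory.fraction}} is a regular Spark setting; {{parquet.memory.pool.ratio}} normally belongs in {{core-site.xml}} as noted above and is set on {{hadoopConfiguration}} here purely for illustration.
{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sort-and-write-parquet")
  .set("spark.memory.fraction", "0.4")  // shrink Spark's unified memory region
val sc = new SparkContext(conf)

// Shrink Parquet's write-buffer pool as well (example value from the comment).
sc.hadoopConfiguration.set("parquet.memory.pool.ratio", "0.3")
{code}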
[jira] [Resolved] (SPARK-11931) "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test jdbc cancel" is sometimes very slow
[ https://issues.apache.org/jira/browse/SPARK-11931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-11931. Resolution: Duplicate > "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test jdbc > cancel" is sometimes very slow > > > Key: SPARK-11931 > URL: https://issues.apache.org/jira/browse/SPARK-11931 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Josh Rosen >Assignee: Josh Rosen > > The "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test > jdbc cancel" test is sometimes very slow. It usually takes about 8 seconds to > run, but in some cases can take up to 5 minutes: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=1.2.1,label=spark-test/4874/testReport/org.apache.spark.sql.hive.thriftserver/HiveThriftBinaryServerSuite/test_jdbc_cancel/history/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11823: Assignee: Apache Spark (was: Josh Rosen) > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Apache Spark > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11823: Assignee: Josh Rosen (was: Apache Spark) > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11823: Assignee: Apache Spark (was: Josh Rosen) > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Apache Spark > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11823: Assignee: Josh Rosen (was: Apache Spark) > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12473) Reuse serializer instances for performance
[ https://issues.apache.org/jira/browse/SPARK-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12473: -- Description: After commit de02782 of page rank regressed from 242s to 260s, about 7%. Although currently it's only 7%, we will likely register more classes in the future so we should do this the right way. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. We should explore the alternative of reusing thread-local serializer instances, which would lead to much fewer calls to `kryo.register`. was: After commit de02782 of page rank regressed from 242s to 260s, about 7%. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. We should explore the alternative of reusing thread-local serializer instances, which would lead to much fewer calls to `kryo.register`. > Reuse serializer instances for performance > -- > > Key: SPARK-12473 > URL: https://issues.apache.org/jira/browse/SPARK-12473 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > > After commit de02782 of page rank regressed from 242s to 260s, about 7%. > Although currently it's only 7%, we will likely register more classes in the > future so we should do this the right way. > The commit added 26 types to register every time we create a Kryo serializer > instance. I ran a small microbenchmark to prove that this is noticeably > expensive: > {code} > import org.apache.spark.serializer._ > import org.apache.spark.SparkConf > def makeMany(num: Int): Long = { > val start = System.currentTimeMillis > (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } > System.currentTimeMillis - start > } > // before commit de02782, averaged over multiple runs > makeMany(5000) == 1500 > // after commit de02782, averaged over multiple runs > makeMany(5000) == 2750 > {code} > Since we create multiple serializer instances per partition, this means a > 5000-partition stage will unconditionally see an increase of > 1s for the > stage. In page rank, we may run many such stages. 
> We should explore the alternative of reusing thread-local serializer > instances, which would lead to much fewer calls to `kryo.register`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12473) Reuse serializer instances for performance
[ https://issues.apache.org/jira/browse/SPARK-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12473: -- Description: After commit de02782 of page rank regressed from 242s to 260s, about 7%. Although currently it's only 7%, we will likely register more classes in the future so this will only increase. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. We should explore the alternative of reusing thread-local serializer instances, which would lead to much fewer calls to `kryo.register`. was: After commit de02782 of page rank regressed from 242s to 260s, about 7%. Although currently it's only 7%, we will likely register more classes in the future so we should do this the right way. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. We should explore the alternative of reusing thread-local serializer instances, which would lead to much fewer calls to `kryo.register`. > Reuse serializer instances for performance > -- > > Key: SPARK-12473 > URL: https://issues.apache.org/jira/browse/SPARK-12473 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > > After commit de02782 of page rank regressed from 242s to 260s, about 7%. > Although currently it's only 7%, we will likely register more classes in the > future so this will only increase. > The commit added 26 types to register every time we create a Kryo serializer > instance. 
I ran a small microbenchmark to prove that this is noticeably > expensive: > {code} > import org.apache.spark.serializer._ > import org.apache.spark.SparkConf > def makeMany(num: Int): Long = { > val start = System.currentTimeMillis > (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } > System.currentTimeMillis - start > } > // before commit de02782, averaged over multiple runs > makeMany(5000) == 1500 > // after commit de02782, averaged over multiple runs > makeMany(5000) == 2750 > {code} > Since we create multiple serializer instances per partition, this means a > 5000-partition stage will unconditionally see an increase of > 1s for the > stage. In page rank, we may run many such stages. > We should explore the alternative of reusing thread-local serializer > instances, which would lead to much fewer calls to `kryo.register`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
[ https://issues.apache.org/jira/browse/SPARK-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12463: Assignee: Apache Spark > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > > > Key: SPARK-12463 > URL: https://issues.apache.org/jira/browse/SPARK-12463 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen >Assignee: Apache Spark > > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > configuration for cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12473) Reuse serializer instances for performance
[ https://issues.apache.org/jira/browse/SPARK-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12473: -- Description: After commit de02782 of page rank regressed from 242s to 260s, about 7%. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. We should explore the alternative of reusing thread-local serializer instances, which would lead to much fewer calls to `kryo.register`. was: After commit de02782 of page rank regressed from 242s to 260s, about 7%. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. > Reuse serializer instances for performance > -- > > Key: SPARK-12473 > URL: https://issues.apache.org/jira/browse/SPARK-12473 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > > After commit de02782 of page rank regressed from 242s to 260s, about 7%. > The commit added 26 types to register every time we create a Kryo serializer > instance. I ran a small microbenchmark to prove that this is noticeably > expensive: > {code} > import org.apache.spark.serializer._ > import org.apache.spark.SparkConf > def makeMany(num: Int): Long = { > val start = System.currentTimeMillis > (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } > System.currentTimeMillis - start > } > // before commit de02782, averaged over multiple runs > makeMany(5000) == 1500 > // after commit de02782, averaged over multiple runs > makeMany(5000) == 2750 > {code} > Since we create multiple serializer instances per partition, this means a > 5000-partition stage will unconditionally see an increase of > 1s for the > stage. In page rank, we may run many such stages. > We should explore the alternative of reusing thread-local serializer > instances, which would lead to much fewer calls to `kryo.register`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12473) Reuse serializer instances for performance
Andrew Or created SPARK-12473: - Summary: Reuse serializer instances for performance Key: SPARK-12473 URL: https://issues.apache.org/jira/browse/SPARK-12473 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Andrew Or Assignee: Andrew Or After commit de02782, the running time of page rank regressed from 242s to 260s, about 7%. The commit added 26 types to register every time we create a Kryo serializer instance. I ran a small microbenchmark to prove that this is noticeably expensive: {code} import org.apache.spark.serializer._ import org.apache.spark.SparkConf def makeMany(num: Int): Long = { val start = System.currentTimeMillis (1 to num).foreach { _ => new KryoSerializer(new SparkConf).newKryo() } System.currentTimeMillis - start } // before commit de02782, averaged over multiple runs makeMany(5000) == 1500 // after commit de02782, averaged over multiple runs makeMany(5000) == 2750 {code} Since we create multiple serializer instances per partition, this means a 5000-partition stage will unconditionally see an increase of > 1s for the stage. In page rank, we may run many such stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
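A minimal sketch of the thread-local reuse idea proposed above, under the assumption that the registration cost should be paid once per thread rather than on every newKryo() call; the names used here (serializer, cachedKryo) are illustrative, and the real change would live inside Spark's Kryo serialization code rather than in user code:
{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// One serializer per JVM; each thread lazily builds and then reuses a single
// Kryo instance, so the type registrations happen once per thread, not per call.
val serializer = new KryoSerializer(new SparkConf)
val cachedKryo = new ThreadLocal[Kryo] {
  override def initialValue(): Kryo = serializer.newKryo()
}

// Call sites would fetch the cached instance instead of constructing a new one:
val kryo = cachedKryo.get()
{code}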
[jira] [Commented] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067345#comment-15067345 ] Josh Rosen commented on SPARK-11823: I think I spotted the problem; it may be a bad use of Thread.sleep() in a test: https://github.com/apache/spark/pull/6207/files#r30935200 > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
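The linked review comment is not quoted here; as a general, hedged illustration of the kind of fix implied (replacing a fixed Thread.sleep() in a test with a bounded poll), ScalaTest's Eventually can retry a condition until it holds or a timeout expires. This is a sketch only, with a hypothetical serverIsUp() standing in for whatever the real test waits on:
{code}
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

def serverIsUp(): Boolean = true  // hypothetical stand-in for the real readiness check

// Poll until the condition holds instead of sleeping for a fixed duration,
// so the test neither races ahead nor burns the full wait time on success.
eventually(timeout(60.seconds), interval(1.second)) {
  assert(serverIsUp())
}
{code}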
[jira] [Assigned] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-11823: -- Assignee: Josh Rosen > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp >Assignee: Josh Rosen > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11931) "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test jdbc cancel" is sometimes very slow
[ https://issues.apache.org/jira/browse/SPARK-11931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-11931: -- Assignee: Josh Rosen > "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test jdbc > cancel" is sometimes very slow > > > Key: SPARK-11931 > URL: https://issues.apache.org/jira/browse/SPARK-11931 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Reporter: Josh Rosen >Assignee: Josh Rosen > > The "org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.test > jdbc cancel" test is sometimes very slow. It usually takes about 8 seconds to > run, but in some cases can take up to 5 minutes: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/HADOOP_VERSION=1.2.1,label=spark-test/4874/testReport/org.apache.spark.sql.hive.thriftserver/HiveThriftBinaryServerSuite/test_jdbc_cancel/history/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
[ https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067314#comment-15067314 ] Josh Rosen commented on SPARK-11823: It looks like this has caused a huge number of timeouts in the Master Maven Hadoop 2.4 builds this week: https://spark-tests.appspot.com/jobs/Spark-Master-Maven-with-YARN%20%C2%BB%20hadoop-2.4%2Cspark-test I'm going to pull some logs and take a look. > HiveThriftBinaryServerSuite tests timing out, leaves hanging processes > -- > > Key: SPARK-11823 > URL: https://issues.apache.org/jira/browse/SPARK-11823 > Project: Spark > Issue Type: Bug > Components: Tests >Reporter: shane knapp > Attachments: > spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out, > stack.log > > > i've noticed on a few branches that the HiveThriftBinaryServerSuite tests > time out, and when that happens, the build is aborted but the tests leave > behind hanging processes that eat up cpu and ram. > most recently, i discovered this happening w/the 1.6 SBT build, specifically > w/the hadoop 2.0 profile: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console > [~vanzin] grabbed the jstack log, which i've attached to this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12414) Remove closure serializer
[ https://issues.apache.org/jira/browse/SPARK-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12414: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-11806 > Remove closure serializer > - > > Key: SPARK-12414 > URL: https://issues.apache.org/jira/browse/SPARK-12414 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > There is a config `spark.closure.serializer` that accepts exactly one value: > the java serializer. This is because there are currently bugs in the Kryo > serializer that make it not a viable candidate. This was uncovered by an > unsuccessful attempt to make it work: SPARK-7708. > My high level point is that the Java serializer has worked well for at least > 6 Spark versions now, and it is an incredibly complicated task to get other > serializers (not just Kryo) to work with Spark's closures. IMO the effort is > not worth it and we should just remove this documentation and all the code > associated with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10873) can't sort columns on history page
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067304#comment-15067304 ] Alex Bozarth commented on SPARK-10873: -- Thanks [~zhuoliu], is your fix for this going to include adding the search feature (like in jQuery DataTables)? Otherwise I might look at picking up SPARK-10874 > can't sort columns on history page > -- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067301#comment-15067301 ] Martin Schade commented on SPARK-12453: --- That would do the trick as well. I wasn't sure which you would prefer. My PR just changes the version to the one that matches the KCL. > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
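Spark pins the SDK version in its own Maven build, which is what the PR changes. As a hedged illustration for an application that depends on spark-streaming-kinesis-asl, an sbt override could force the SDK version that matches KCL 1.3.0 per the report above; the coordinates below are the standard AWS artifact and are shown as an assumption, not a recommendation from this issue:
{code}
// build.sbt
dependencyOverrides += "com.amazonaws" % "aws-java-sdk" % "1.9.37"
{code}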
[jira] [Assigned] (SPARK-12471) Spark daemons should log their pid in the log file
[ https://issues.apache.org/jira/browse/SPARK-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12471: Assignee: Apache Spark > Spark daemons should log their pid in the log file > -- > > Key: SPARK-12471 > URL: https://issues.apache.org/jira/browse/SPARK-12471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Nong Li >Assignee: Apache Spark > > This is useful when debugging from the log files without the processes > running. This information makes it possible to combine the log files with > other system information (e.g. dmesg output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12458) Add ExpressionDescription to datetime functions
[ https://issues.apache.org/jira/browse/SPARK-12458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067278#comment-15067278 ] Dilip Biswal commented on SPARK-12458: -- I would like to work on this one. > Add ExpressionDescription to datetime functions > --- > > Key: SPARK-12458 > URL: https://issues.apache.org/jira/browse/SPARK-12458 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12472) OOM when sort a table and save as parquet
Davies Liu created SPARK-12472: -- Summary: OOM when sort a table and save as parquet Key: SPARK-12472 URL: https://issues.apache.org/jira/browse/SPARK-12472 Project: Spark Issue Type: Bug Reporter: Davies Liu {code} t = sqlContext.table('store_sales') t.unionAll(t).coalesce(2).sortWithinPartitions(t[0]).write.partitionBy('ss_sold_date_sk').parquet("/tmp/ttt") {code} {code} 15/12/21 14:35:52 WARN TaskSetManager: Lost task 1.0 in stage 25.0 (TID 96, 192.168.0.143): java.lang.OutOfMemoryError: Java heap space at org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:86) at org.apache.spark.util.collection.unsafe.sort.UnsafeSortDataFormat.allocate(UnsafeSortDataFormat.java:32) at org.apache.spark.util.collection.TimSort$SortState.ensureCapacity(TimSort.java:951) at org.apache.spark.util.collection.TimSort$SortState.mergeLo(TimSort.java:699) at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:525) at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:453) at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325) at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153) at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37) at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:226) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:187) at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170) at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244) at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:327) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:342) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168) at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90) at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:88) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
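A hedged workaround sketch, assuming the OOM comes from forcing the doubled table into only two partitions so that each partition's in-memory sort grows very large: keep more partitions so each sort buffer stays small. The table, column, and path come from the reproduction above; the partition count of 200 is an arbitrary illustrative value, and whether this avoids the OOM depends on data volume and executor memory:
{code}
val t = sqlContext.table("store_sales")
t.unionAll(t)
  .repartition(200)                        // instead of coalesce(2): smaller per-partition sorts
  .sortWithinPartitions(t(t.columns.head)) // same sort key as the reproduction
  .write.partitionBy("ss_sold_date_sk")
  .parquet("/tmp/ttt")
{code}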
[jira] [Assigned] (SPARK-12456) Add ExpressionDescription to misc functions
[ https://issues.apache.org/jira/browse/SPARK-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12456: Assignee: Apache Spark > Add ExpressionDescription to misc functions > --- > > Key: SPARK-12456 > URL: https://issues.apache.org/jira/browse/SPARK-12456 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12456) Add ExpressionDescription to misc functions
[ https://issues.apache.org/jira/browse/SPARK-12456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12456: Assignee: (was: Apache Spark) > Add ExpressionDescription to misc functions > --- > > Key: SPARK-12456 > URL: https://issues.apache.org/jira/browse/SPARK-12456 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12471) Spark daemons should log their pid in the log file
[ https://issues.apache.org/jira/browse/SPARK-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12471: Assignee: (was: Apache Spark) > Spark daemons should log their pid in the log file > -- > > Key: SPARK-12471 > URL: https://issues.apache.org/jira/browse/SPARK-12471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Nong Li > > This is useful when debugging from the log files without the processes > running. This information makes it possible to combine the log files with > other system information (e.g. dmesg output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12471) Spark daemons should log their pid in the log file
[ https://issues.apache.org/jira/browse/SPARK-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12471: Assignee: Apache Spark > Spark daemons should log their pid in the log file > -- > > Key: SPARK-12471 > URL: https://issues.apache.org/jira/browse/SPARK-12471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Nong Li >Assignee: Apache Spark > > This is useful when debugging from the log files without the processes > running. This information makes it possible to combine the log files with > other system information (e.g. dmesg output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12471) Spark daemons should log their pid in the log file
[ https://issues.apache.org/jira/browse/SPARK-12471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067218#comment-15067218 ] Apache Spark commented on SPARK-12471: -- User 'nongli' has created a pull request for this issue: https://github.com/apache/spark/pull/10422 > Spark daemons should log their pid in the log file > -- > > Key: SPARK-12471 > URL: https://issues.apache.org/jira/browse/SPARK-12471 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Nong Li > > This is useful when debugging from the log files without the processes > running. This information makes it possible to combine the log files with > other system information (e.g. dmesg output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
[ https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12470: Assignee: Apache Spark > Incorrect calculation of row size in > o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner > --- > > Key: SPARK-12470 > URL: https://issues.apache.org/jira/browse/SPARK-12470 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Pete Robbins >Assignee: Apache Spark >Priority: Minor > > While looking into https://issues.apache.org/jira/browse/SPARK-12319 I > noticed that the row size is incorrectly calculated. > The "sizeReduction" value is calculated in words: >// The number of words we can reduce when we concat two rows together. > // The only reduction comes from merging the bitset portion of the two > rows, saving 1 word. > val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords > but then it is subtracted from the size of the row in bytes: >|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - > $sizeReduction); > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
[ https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12470: Assignee: (was: Apache Spark) > Incorrect calculation of row size in > o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner > --- > > Key: SPARK-12470 > URL: https://issues.apache.org/jira/browse/SPARK-12470 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Pete Robbins >Priority: Minor > > While looking into https://issues.apache.org/jira/browse/SPARK-12319 I > noticed that the row size is incorrectly calculated. > The "sizeReduction" value is calculated in words: >// The number of words we can reduce when we concat two rows together. > // The only reduction comes from merging the bitset portion of the two > rows, saving 1 word. > val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords > but then it is subtracted from the size of the row in bytes: >|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - > $sizeReduction); > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12299) Remove history serving functionality from standalone Master
[ https://issues.apache.org/jira/browse/SPARK-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-12299: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-11806 > Remove history serving functionality from standalone Master > --- > > Key: SPARK-12299 > URL: https://issues.apache.org/jira/browse/SPARK-12299 > Project: Spark > Issue Type: Sub-task > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or > > The standalone Master currently continues to serve the historical UIs of > applications that have completed and enabled event logging. This poses > problems, however, if the event log is very large, e.g. SPARK-6270. The > Master might OOM or hang while it rebuilds the UI, rejecting applications in > the mean time. > Personally, I have had to make modifications in the code to disable this > myself, because I wanted to use event logging in standalone mode for > applications that produce a lot of logging. > Removing this from the Master would simplify the process significantly. This > issue supersedes SPARK-12062. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12299) Remove history serving functionality from standalone Master
[ https://issues.apache.org/jira/browse/SPARK-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-12299: -- Assignee: Josh Rosen > Remove history serving functionality from standalone Master > --- > > Key: SPARK-12299 > URL: https://issues.apache.org/jira/browse/SPARK-12299 > Project: Spark > Issue Type: Sub-task > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Josh Rosen > > The standalone Master currently continues to serve the historical UIs of > applications that have completed and enabled event logging. This poses > problems, however, if the event log is very large, e.g. SPARK-6270. The > Master might OOM or hang while it rebuilds the UI, rejecting applications in > the mean time. > Personally, I have had to make modifications in the code to disable this > myself, because I wanted to use event logging in standalone mode for > applications that produce a lot of logging. > Removing this from the Master would simplify the process significantly. This > issue supersedes SPARK-12062. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12471) Spark daemons should log their pid in the log file
Nong Li created SPARK-12471: --- Summary: Spark daemons should log their pid in the log file Key: SPARK-12471 URL: https://issues.apache.org/jira/browse/SPARK-12471 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Nong Li This is useful when debugging from the log files without the processes running. This information makes it possible to combine the log files with other system information (e.g. dmesg output) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
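A minimal sketch of one common JVM idiom for obtaining the pid (the portion of the runtime MXBean name before the '@'); this is shown as an assumption about how such logging could be done, not as the actual patch:
{code}
import java.lang.management.ManagementFactory

// RuntimeMXBean#getName typically returns "pid@hostname" on HotSpot JVMs.
val processName = ManagementFactory.getRuntimeMXBean.getName
val pid = processName.split("@")(0)

// A daemon could then emit the pid once at startup, e.g.:
println(s"Started daemon with process name: $processName (pid $pid)")
{code}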
[jira] [Commented] (SPARK-5226) Add DBSCAN Clustering Algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067200#comment-15067200 ] mustafa elbehery commented on SPARK-5226: - A better implementation, based on a research paper on parallel DBSCAN, can be found here: https://github.com/irvingc/dbscan-on-spark. The approach solves the bottleneck of the reduce step, in which discovered clusters are merged. Hope it helps. > Add DBSCAN Clustering Algorithm to MLlib > > > Key: SPARK-5226 > URL: https://issues.apache.org/jira/browse/SPARK-5226 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Muhammad-Ali A'rabi >Priority: Minor > Labels: DBSCAN, clustering > > MLlib is all k-means now, and I think we should add some new clustering > algorithms to it. First candidate is DBSCAN as I think. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
[ https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067201#comment-15067201 ] Apache Spark commented on SPARK-12470: -- User 'robbinspg' has created a pull request for this issue: https://github.com/apache/spark/pull/10421 > Incorrect calculation of row size in > o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner > --- > > Key: SPARK-12470 > URL: https://issues.apache.org/jira/browse/SPARK-12470 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Pete Robbins >Priority: Minor > > While looking into https://issues.apache.org/jira/browse/SPARK-12319 I > noticed that the row size is incorrectly calculated. > The "sizeReduction" value is calculated in words: >// The number of words we can reduce when we concat two rows together. > // The only reduction comes from merging the bitset portion of the two > rows, saving 1 word. > val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords > but then it is subtracted from the size of the row in bytes: >|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - > $sizeReduction); > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12470) Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
[ https://issues.apache.org/jira/browse/SPARK-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pete Robbins updated SPARK-12470: - Component/s: SQL Summary: Incorrect calculation of row size in o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner (was: Incorrect calculation of row size in o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner) > Incorrect calculation of row size in > o.a.s.sql.catalyst.expressions.codegen.GenerateUnsafeRowJoiner > --- > > Key: SPARK-12470 > URL: https://issues.apache.org/jira/browse/SPARK-12470 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Pete Robbins >Priority: Minor > > While looking into https://issues.apache.org/jira/browse/SPARK-12319 I > noticed that the row size is incorrectly calculated. > The "sizeReduction" value is calculated in words: >// The number of words we can reduce when we concat two rows together. > // The only reduction comes from merging the bitset portion of the two > rows, saving 1 word. > val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords > but then it is subtracted from the size of the row in bytes: >|out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - > $sizeReduction); > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12363) PowerIterationClustering test case failed if we deprecated KMeans.setRuns
[ https://issues.apache.org/jira/browse/SPARK-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12363: Assignee: Apache Spark > PowerIterationClustering test case failed if we deprecated KMeans.setRuns > - > > Key: SPARK-12363 > URL: https://issues.apache.org/jira/browse/SPARK-12363 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Yanbo Liang >Assignee: Apache Spark >Priority: Minor > > We plan to deprecated `runs` of KMeans, PowerIterationClustering will > leverage KMeans to train model. > I removed `setRuns` used in PowerIterationClustering, but one of the test > cases failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12470) Incorrect calculation of row size in o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner
Pete Robbins created SPARK-12470: Summary: Incorrect calculation of row size in o.a.s.catalyst.expressions.codegen.GenerateUnsafeRowJoiner Key: SPARK-12470 URL: https://issues.apache.org/jira/browse/SPARK-12470 Project: Spark Issue Type: Bug Affects Versions: 1.5.2 Reporter: Pete Robbins Priority: Minor While looking into https://issues.apache.org/jira/browse/SPARK-12319 I noticed that the row size is incorrectly calculated. The "sizeReduction" value is calculated in words: // The number of words we can reduce when we concat two rows together. // The only reduction comes from merging the bitset portion of the two rows, saving 1 word. val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords but then it is subtracted from the size of the row in bytes: |out.pointTo(buf, ${schema1.size + schema2.size}, sizeInBytes - $sizeReduction); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
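A hedged sketch of the correction implied by the description: sizeReduction is measured in 8-byte words, so it has to be converted to bytes before it is subtracted from sizeInBytes. The values below are hypothetical, and this only illustrates the unit conversion, not the generated code itself:
{code}
// Hypothetical word counts for the two input bitsets and the merged output bitset.
val bitset1Words = 1
val bitset2Words = 1
val outputBitsetWords = 1

val sizeReduction = bitset1Words + bitset2Words - outputBitsetWords // in 8-byte words
val sizeInBytes = 64                                                // hypothetical row size in bytes
val correctedSizeInBytes = sizeInBytes - sizeReduction * 8          // convert words to bytes first
{code}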
[jira] [Commented] (SPARK-5226) Add DBSCAN Clustering Algorithm to MLlib
[ https://issues.apache.org/jira/browse/SPARK-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067177#comment-15067177 ] Stephen Boesch commented on SPARK-5226: --- It seems that Aliaksei has not been able to add it to spark-packages. So I presume DBSCAN is still an open ticket? > Add DBSCAN Clustering Algorithm to MLlib > > > Key: SPARK-5226 > URL: https://issues.apache.org/jira/browse/SPARK-5226 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Muhammad-Ali A'rabi >Priority: Minor > Labels: DBSCAN, clustering > > MLlib is all k-means now, and I think we should add some new clustering > algorithms to it. First candidate is DBSCAN as I think. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12388) Change default compressor to LZ4
[ https://issues.apache.org/jira/browse/SPARK-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12388. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10342 [https://github.com/apache/spark/pull/10342] > Change default compressor to LZ4 > > > Key: SPARK-12388 > URL: https://issues.apache.org/jira/browse/SPARK-12388 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > Labels: releasenotes > Fix For: 2.0.0 > > > According the benchmark [1], LZ4-java could be 80% (or 30%) faster than > Snappy. > After changing the compressor to LZ4, I saw 20% improvement on end-to-end > time for a TPCDS query (Q4). > [1] https://github.com/ning/jvm-compressor-benchmark/wiki -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
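The issue does not name the configuration key whose default changes; as an assumption, the general I/O codec setting is shown below for anyone who prefers to pin the codec explicitly rather than rely on a default:
{code}
import org.apache.spark.SparkConf

// "lz4" and "snappy" are both accepted values for this key.
val conf = new SparkConf().set("spark.io.compression.codec", "lz4")
{code}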
[jira] [Commented] (SPARK-12339) NullPointerException on stage kill from web UI
[ https://issues.apache.org/jira/browse/SPARK-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067154#comment-15067154 ] Andrew Or commented on SPARK-12339: --- I've updated the affected version to 2.0 since SPARK-11206 was merged only there. Please let me know if this is not the case. > NullPointerException on stage kill from web UI > -- > > Key: SPARK-12339 > URL: https://issues.apache.org/jira/browse/SPARK-12339 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > The following message is in the logs after killing a stage: > {code} > scala> INFO Executor: Executor killed task 1.0 in stage 7.0 (TID 33) > INFO Executor: Executor killed task 0.0 in stage 7.0 (TID 32) > WARN TaskSetManager: Lost task 1.0 in stage 7.0 (TID 33, localhost): > TaskKilled (killed intentionally) > WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 32, localhost): > TaskKilled (killed intentionally) > INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, > from pool > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > {code} > To reproduce, start a job and kill the stage from web UI, e.g.: > {code} > val rdd = sc.parallelize(0 to 9, 2) > rdd.mapPartitionsWithIndex { case (n, it) => Thread.sleep(10 * 1000); it > }.count > {code} > Go to web UI and in Stages tab click "kill" for the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067156#comment-15067156 ] Sean Owen commented on SPARK-12453: --- Ah, I misread the PR; it already just removes aws.java.sdk.version and the manual management of the dependency. Just deleting the version and the dependencyManagement entry does the trick, right? > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5882) Add a test for GraphLoader.edgeListFile
[ https://issues.apache.org/jira/browse/SPARK-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-5882. -- Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add a test for GraphLoader.edgeListFile > --- > > Key: SPARK-5882 > URL: https://issues.apache.org/jira/browse/SPARK-5882 > Project: Spark > Issue Type: Test > Components: GraphX >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Trivial > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12392) Optimize a location order of broadcast blocks by considering preferred local hosts
[ https://issues.apache.org/jira/browse/SPARK-12392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12392. --- Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Optimize a location order of broadcast blocks by considering preferred local > hosts > -- > > Key: SPARK-12392 > URL: https://issues.apache.org/jira/browse/SPARK-12392 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro > Fix For: 2.0.0 > > > When multiple workers exist in a host, we can bypass unnecessary remote > access for broadcasts; block managers fetch broadcast blocks from the same > host instead of remote hosts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12339) NullPointerException on stage kill from web UI
[ https://issues.apache.org/jira/browse/SPARK-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12339: -- Affects Version/s: (was: 1.6.0) 2.0.0 > NullPointerException on stage kill from web UI > -- > > Key: SPARK-12339 > URL: https://issues.apache.org/jira/browse/SPARK-12339 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Jacek Laskowski >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > The following message is in the logs after killing a stage: > {code} > scala> INFO Executor: Executor killed task 1.0 in stage 7.0 (TID 33) > INFO Executor: Executor killed task 0.0 in stage 7.0 (TID 32) > WARN TaskSetManager: Lost task 1.0 in stage 7.0 (TID 33, localhost): > TaskKilled (killed intentionally) > WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 32, localhost): > TaskKilled (killed intentionally) > INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, > from pool > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > {code} > To reproduce, start a job and kill the stage from web UI, e.g.: > {code} > val rdd = sc.parallelize(0 to 9, 2) > rdd.mapPartitionsWithIndex { case (n, it) => Thread.sleep(10 * 1000); it > }.count > {code} > Go to web UI and in Stages tab click "kill" for the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12339) NullPointerException on stage kill from web UI
[ https://issues.apache.org/jira/browse/SPARK-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12339. --- Resolution: Fixed Assignee: Alex Bozarth (was: Apache Spark) Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > NullPointerException on stage kill from web UI > -- > > Key: SPARK-12339 > URL: https://issues.apache.org/jira/browse/SPARK-12339 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: Jacek Laskowski >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > The following message is in the logs after killing a stage: > {code} > scala> INFO Executor: Executor killed task 1.0 in stage 7.0 (TID 33) > INFO Executor: Executor killed task 0.0 in stage 7.0 (TID 32) > WARN TaskSetManager: Lost task 1.0 in stage 7.0 (TID 33, localhost): > TaskKilled (killed intentionally) > WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 32, localhost): > TaskKilled (killed intentionally) > INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, > from pool > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > ERROR LiveListenerBus: Listener SQLListener threw an exception > java.lang.NullPointerException > at > org.apache.spark.sql.execution.ui.SQLListener.onTaskEnd(SQLListener.scala:167) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at 
scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1169) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) > {code} > To reproduce, start a job and kill the stage from web UI, e.g.: > {code} > val rdd = sc.parallelize(0 to 9, 2) > rdd.mapPartitionsWithIndex { case (n, it) => Thread.sleep(10 * 1000); it > }.count > {code} > Go to web UI and in Stages tab click "kill" for the stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
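The stack trace points at SQLListener.onTaskEnd, which suggests the listener assumes task metrics are always present even when a task ends as TaskKilled. As an illustration only (not Spark's actual fix, and the listener name is hypothetical), a user-side listener can guard against missing metrics like this:
{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener: skip the update when metrics are absent,
// e.g. for tasks that ended with TaskKilled.
class NullSafeTaskEndListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      println(s"stage ${taskEnd.stageId}: task ran for ${metrics.executorRunTime} ms")
    }
  }
}

// Registration against a live SparkContext:
// sc.addSparkListener(new NullSafeTaskEndListener())
{code}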
[jira] [Resolved] (SPARK-12466) Harmless Master NPE in tests
[ https://issues.apache.org/jira/browse/SPARK-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12466. --- Resolution: Fixed > Harmless Master NPE in tests > > > Key: SPARK-12466 > URL: https://issues.apache.org/jira/browse/SPARK-12466 > Project: Spark > Issue Type: Bug > Components: Deploy, Tests >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.6.1, 2.0.0 > > > {code} > [info] ReplayListenerSuite: > [info] - Simple replay (58 milliseconds) > java.lang.NullPointerException > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982) > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:117) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:115) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at scala.concurrent.Promise$class.complete(Promise.scala:55) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > [info] - End-to-end replay (10 seconds, 755 milliseconds) > {code} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull > caused by https://github.com/apache/spark/pull/10284 > Thanks to [~ted_yu] for reporting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-2331) SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T]
[ https://issues.apache.org/jira/browse/SPARK-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-2331. -- Resolution: Fixed Fix Version/s: 2.0.0 > SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T] > -- > > Key: SPARK-2331 > URL: https://issues.apache.org/jira/browse/SPARK-2331 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Ian Hummel >Assignee: Reynold Xin >Priority: Minor > Fix For: 2.0.0 > > > The return type for SparkContext.emptyRDD is EmptyRDD[T]. > It should be RDD[T]. That means you have to add extra type annotations on > code like the below (which creates a union of RDDs over some subset of paths > in a folder) > {code} > val rdds = Seq("a", "b", "c").foldLeft[RDD[String]](sc.emptyRDD[String]) { > (rdd, path) ⇒ > rdd.union(sc.textFile(path)) > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
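Until the return type changes, the extra annotation can also be avoided by ascribing the seed value; a minimal sketch of that workaround, assuming sc is a live SparkContext:
{code}
import org.apache.spark.rdd.RDD

val paths = Seq("a", "b", "c")
// Ascribing the seed to RDD[String] lets foldLeft infer the right type,
// so the foldLeft[RDD[String]] annotation from the example above is not needed.
val union = paths.foldLeft(sc.emptyRDD[String]: RDD[String]) { (rdd, path) =>
  rdd.union(sc.textFile(path))
}
{code}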
[jira] [Updated] (SPARK-12440) Avoid setCheckpointDir warning when filesystem is not local
[ https://issues.apache.org/jira/browse/SPARK-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12440: -- Priority: Trivial (was: Major) > Avoid setCheckpointDir warning when filesystem is not local > --- > > Key: SPARK-12440 > URL: https://issues.apache.org/jira/browse/SPARK-12440 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.2, 1.6.0, 1.6.1 >Reporter: Pierre Borckmans >Priority: Trivial > > In SparkContext method `setCheckpointDir`, a warning is issued when spark > master is not local and the passed directory for the checkpoint dir appears > to be local. > In practice, when relying on hdfs configuration file and using relative path > (incomplete URI without hdfs scheme, ...), this warning should not be issued > and might be confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
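A sketch of the scenario being described, assuming fs.defaultFS in the cluster's Hadoop configuration points at HDFS, so the relative path below actually resolves to a non-local filesystem despite looking local:
{code}
// Relative path, no scheme: resolved against the default (HDFS) filesystem,
// yet the current check can still emit the local-directory warning.
sc.setCheckpointDir("checkpoints/myjob")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint()
rdd.count()  // the action materializes the checkpoint
{code}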
[jira] [Updated] (SPARK-12440) Avoid setCheckpointDir warning when filesystem is not local
[ https://issues.apache.org/jira/browse/SPARK-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12440: -- Summary: Avoid setCheckpointDir warning when filesystem is not local (was: [CORE] Avoid setCheckpointDir warning when filesystem is not local) > Avoid setCheckpointDir warning when filesystem is not local > --- > > Key: SPARK-12440 > URL: https://issues.apache.org/jira/browse/SPARK-12440 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.2, 1.6.0, 1.6.1 >Reporter: Pierre Borckmans > > In SparkContext method `setCheckpointDir`, a warning is issued when spark > master is not local and the passed directory for the checkpoint dir appears > to be local. > In practice, when relying on hdfs configuration file and using relative path > (incomplete URI without hdfs scheme, ...), this warning should not be issued > and might be confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067125#comment-15067125 ] Martin Schade commented on SPARK-12453: --- Makes sense, thank you. Ideally it should be 1.9.37 instead of 1.9.40, though: both KCL 1.4.0 and KPL 0.10.1 reference 1.9.37. https://github.com/awslabs/amazon-kinesis-producer/blob/v0.10.1/java/amazon-kinesis-producer/pom.xml https://github.com/awslabs/amazon-kinesis-client/blob/v1.4.0/pom.xml In the latest version of KPL (v0.10.2) 1.10.34 is referenced, and in the latest KCL (1.6.1) it is 1.10.20, so the versions are not easy to keep in sync. It would take some testing to see which combination actually works. > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
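For applications hitting this on 1.5.2, one workaround is to pin the SDK at build time. A build.sbt sketch under the assumption that the discussion above is right about 1.9.37 being the version KCL 1.3.0/1.4.0 expect (re-verify the pin against your own KCL/KPL versions):
{code}
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.5.2"
)

// Force the SDK version that KCL 1.3.0 / 1.4.0 were built against.
dependencyOverrides += "com.amazonaws" % "aws-java-sdk" % "1.9.37"
{code}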
[jira] [Updated] (SPARK-12440) [CORE] Avoid setCheckpointDir warning when filesystem is not local
[ https://issues.apache.org/jira/browse/SPARK-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12440: -- Component/s: Spark Core > [CORE] Avoid setCheckpointDir warning when filesystem is not local > -- > > Key: SPARK-12440 > URL: https://issues.apache.org/jira/browse/SPARK-12440 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.2, 1.6.0, 1.6.1 >Reporter: Pierre Borckmans > > In SparkContext method `setCheckpointDir`, a warning is issued when spark > master is not local and the passed directory for the checkpoint dir appears > to be local. > In practice, when relying on hdfs configuration file and using relative path > (incomplete URI without hdfs scheme, ...), this warning should not be issued > and might be confusing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
[ https://issues.apache.org/jira/browse/SPARK-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-12463: -- Component/s: Mesos > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > > > Key: SPARK-12463 > URL: https://issues.apache.org/jira/browse/SPARK-12463 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen > > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > configuration for cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
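A sketch of what the unified configuration would look like once the Mesos-specific key is gone; the values below are the standard standalone recovery settings, shown for illustration rather than as the final key set:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.deploy.recoveryMode", "ZOOKEEPER")          // instead of spark.deploy.mesos.recoveryMode
  .set("spark.deploy.zookeeper.url", "zk1:2181,zk2:2181")
  .set("spark.deploy.zookeeper.dir", "/spark")
{code}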
[jira] [Resolved] (SPARK-12374) Improve performance of Range APIs via adding logical/physical operators
[ https://issues.apache.org/jira/browse/SPARK-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12374. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10335 [https://github.com/apache/spark/pull/10335] > Improve performance of Range APIs via adding logical/physical operators > --- > > Key: SPARK-12374 > URL: https://issues.apache.org/jira/browse/SPARK-12374 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Critical > Fix For: 2.0.0 > > > Creating an actual logical/physical operator for range for matching the > performance of RDD Range APIs. > Compared with the old Range API, the new version is 3 times faster than the > old version. > {code} > scala> val startTime = System.currentTimeMillis; sqlContext.oldRange(0, > 10, 1, 15).count(); val endTime = System.currentTimeMillis; val start > = new Timestamp(startTime); val end = new Timestamp(endTime); val elapsed = > (endTime - startTime)/ 1000.0 > startTime: Long = 1450416394240 > > endTime: Long = 1450416421199 > start: java.sql.Timestamp = 2015-12-17 21:26:34.24 > end: java.sql.Timestamp = 2015-12-17 21:27:01.199 > elapsed: Double = 26.959 > {code} > {code} > scala> val startTime = System.currentTimeMillis; sqlContext.range(0, > 10, 1, 15).count(); val endTime = System.currentTimeMillis; val start > = new Timestamp(startTime); val end = new Timestamp(endTime); val elapsed = > (endTime - startTime)/ 1000.0 > startTime: Long = 1450416360107 > > endTime: Long = 1450416368590 > start: java.sql.Timestamp = 2015-12-17 21:26:00.107 > end: java.sql.Timestamp = 2015-12-17 21:26:08.59 > elapsed: Double = 8.483 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12150) numPartitions argument to sqlContext.range() should be optional
[ https://issues.apache.org/jira/browse/SPARK-12150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12150. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10335 [https://github.com/apache/spark/pull/10335] > numPartitions argument to sqlContext.range() should be optional > > > Key: SPARK-12150 > URL: https://issues.apache.org/jira/browse/SPARK-12150 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Henri DF >Priority: Minor > Fix For: 2.0.0 > > > It's a little inconsistent that the first two sqlContext.range() methods > don't take a numPartitions arg, while the third one does. > And more importantly, it's a little inconvenient that the numPartitions arg > is mandatory for the third range() method - it means that if you want to > specify a step, you suddenly have to think about partitioning - an orthogonal > concern. > My suggestion would be to make numPartitions optional, like it is on the > sparkContext.range(..). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
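A sketch of the asymmetry being described, assuming a SQLContext named sqlContext and the Spark 1.6 signatures:
{code}
// Fine today: numPartitions is not required when no step is given.
val a = sqlContext.range(100L)
val b = sqlContext.range(0L, 100L)

// As soon as a step is needed, numPartitions becomes mandatory:
val c = sqlContext.range(0L, 100L, 2L, 4)   // start, end, step, numPartitions

// The request: allow a step without forcing a partitioning decision, e.g.
// val d = sqlContext.range(0L, 100L, 2L)
{code}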
[jira] [Commented] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067115#comment-15067115 ] Sean Owen commented on SPARK-12453: --- OK, I see what happened here: https://github.com/apache/spark/commit/87f82a5fb9c4350a97c761411069245f07aad46f How about updating to 1.9.40 for consistency? really, it sounds like there's no point manually setting the SDK version here -- how about preemptively bringing those parts of SPARK-12269 back? Then really it should go into master first, and be backported, and then further updated by 12269. This is why I view it as sort of a duplicate, since it could as well come from back-porting just a subset of 12269. I don't know if a new 1.5.x release will happen. > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12466) Harmless Master NPE in tests
[ https://issues.apache.org/jira/browse/SPARK-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067112#comment-15067112 ] Apache Spark commented on SPARK-12466: -- User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/10417 > Harmless Master NPE in tests > > > Key: SPARK-12466 > URL: https://issues.apache.org/jira/browse/SPARK-12466 > Project: Spark > Issue Type: Bug > Components: Deploy, Tests >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.6.1, 2.0.0 > > > {code} > [info] ReplayListenerSuite: > [info] - Simple replay (58 milliseconds) > java.lang.NullPointerException > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982) > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:117) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:115) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at scala.concurrent.Promise$class.complete(Promise.scala:55) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > [info] - End-to-end replay (10 seconds, 755 milliseconds) > {code} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull > caused by https://github.com/apache/spark/pull/10284 > Thanks to [~ted_yu] for reporting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12466) Harmless Master NPE in tests
[ https://issues.apache.org/jira/browse/SPARK-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12466: Assignee: Apache Spark (was: Andrew Or) > Harmless Master NPE in tests > > > Key: SPARK-12466 > URL: https://issues.apache.org/jira/browse/SPARK-12466 > Project: Spark > Issue Type: Bug > Components: Deploy, Tests >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Apache Spark > Fix For: 1.6.1, 2.0.0 > > > {code} > [info] ReplayListenerSuite: > [info] - Simple replay (58 milliseconds) > java.lang.NullPointerException > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982) > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:117) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:115) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at scala.concurrent.Promise$class.complete(Promise.scala:55) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > [info] - End-to-end replay (10 seconds, 755 milliseconds) > {code} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull > caused by https://github.com/apache/spark/pull/10284 > Thanks to [~ted_yu] for reporting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12466) Harmless Master NPE in tests
[ https://issues.apache.org/jira/browse/SPARK-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12466: Assignee: Andrew Or (was: Apache Spark) > Harmless Master NPE in tests > > > Key: SPARK-12466 > URL: https://issues.apache.org/jira/browse/SPARK-12466 > Project: Spark > Issue Type: Bug > Components: Deploy, Tests >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.6.1, 2.0.0 > > > {code} > [info] ReplayListenerSuite: > [info] - Simple replay (58 milliseconds) > java.lang.NullPointerException > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982) > at > org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:117) > at scala.concurrent.Future$$anonfun$onSuccess$1.apply(Future.scala:115) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) > at > scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) > at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) > at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) > at scala.concurrent.Promise$class.complete(Promise.scala:55) > at > scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > [info] - End-to-end replay (10 seconds, 755 milliseconds) > {code} > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull > caused by https://github.com/apache/spark/pull/10284 > Thanks to [~ted_yu] for reporting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Schade reopened SPARK-12453: --- This is a bug specifically in 1.5.2 and is not fixed by updating the aws-java-sdk version on master. Hence I created a pull request for the 1.5.2 branch, not for master. The ticket SPARK-12269 won't fix this on the 1.5 branch from what I can see. Not sure if that is the right way to fix an issue on the branch, but it seemed the most reasonable to me. > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12453) Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version
[ https://issues.apache.org/jira/browse/SPARK-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12453. --- Resolution: Duplicate > Spark Streaming Kinesis Example broken due to wrong AWS Java SDK version > > > Key: SPARK-12453 > URL: https://issues.apache.org/jira/browse/SPARK-12453 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Martin Schade >Priority: Critical > Labels: easyfix > > The Spark Streaming Kinesis Example (kinesis-asl) is broken due to wrong AWS > Java SDK version (1.9.16) referenced with the AWS KCL version (1.3.0). > AWS KCL 1.3.0 references AWS Java SDK version 1.9.37. > Using 1.9.16 in combination with 1.3.0 does fail to get data out of the > stream. > I tested Spark Streaming with 1.9.37 and it works fine. > Testing a simple KCL client outside of Spark with 1.3.0 and 1.9.16 also > fails, so it is due to the specific versions used in 1.5.2 and not a Spark > related implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12468) getParamMap in Pyspark ML API returns empty dictionary in example for Documentation
[ https://issues.apache.org/jira/browse/SPARK-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12468: Assignee: (was: Apache Spark) > getParamMap in Pyspark ML API returns empty dictionary in example for > Documentation > --- > > Key: SPARK-12468 > URL: https://issues.apache.org/jira/browse/SPARK-12468 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.2 >Reporter: Zachary Brown >Priority: Minor > > The `extractParamMap()` method for a model that has been fit returns an empty > dictionary, e.g. (from the [Pyspark ML API > Documentation](http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param)): > ```python > from pyspark.mllib.linalg import Vectors > from pyspark.ml.classification import LogisticRegression > from pyspark.ml.param import Param, Params > # Prepare training data from a list of (label, features) tuples. > training = sqlContext.createDataFrame([ > (1.0, Vectors.dense([0.0, 1.1, 0.1])), > (0.0, Vectors.dense([2.0, 1.0, -1.0])), > (0.0, Vectors.dense([2.0, 1.3, 1.0])), > (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"]) > # Create a LogisticRegression instance. This instance is an Estimator. > lr = LogisticRegression(maxIter=10, regParam=0.01) > # Print out the parameters, documentation, and any default values. > print "LogisticRegression parameters:\n" + lr.explainParams() + "\n" > # Learn a LogisticRegression model. This uses the parameters stored in lr. > model1 = lr.fit(training) > # Since model1 is a Model (i.e., a transformer produced by an Estimator), > # we can view the parameters it used during fit(). > # This prints the parameter (name: value) pairs, where names are unique IDs > for this > # LogisticRegression instance. > print "Model 1 was fit using parameters: " > print model1.extractParamMap() > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12247: Assignee: (was: Apache Spark) > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter > > We need to add a section in the documentation about collaborative filtering > in the dataframe API: > - copy explanations about collaborative filtering and ALS from spark.mllib > - provide an example with spark.ml's ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12247: Assignee: Apache Spark > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter >Assignee: Apache Spark > > We need to add a section in the documentation about collaborative filtering > in the dataframe API: > - copy explanations about collaborative filtering and ALS from spark.mllib > - provide an example with spark.ml's ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12468) getParamMap in Pyspark ML API returns empty dictionary in example for Documentation
[ https://issues.apache.org/jira/browse/SPARK-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12468: Assignee: Apache Spark > getParamMap in Pyspark ML API returns empty dictionary in example for > Documentation > --- > > Key: SPARK-12468 > URL: https://issues.apache.org/jira/browse/SPARK-12468 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.2 >Reporter: Zachary Brown >Assignee: Apache Spark >Priority: Minor > > The `extractParamMap()` method for a model that has been fit returns an empty > dictionary, e.g. (from the [Pyspark ML API > Documentation](http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param)): > ```python > from pyspark.mllib.linalg import Vectors > from pyspark.ml.classification import LogisticRegression > from pyspark.ml.param import Param, Params > # Prepare training data from a list of (label, features) tuples. > training = sqlContext.createDataFrame([ > (1.0, Vectors.dense([0.0, 1.1, 0.1])), > (0.0, Vectors.dense([2.0, 1.0, -1.0])), > (0.0, Vectors.dense([2.0, 1.3, 1.0])), > (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"]) > # Create a LogisticRegression instance. This instance is an Estimator. > lr = LogisticRegression(maxIter=10, regParam=0.01) > # Print out the parameters, documentation, and any default values. > print "LogisticRegression parameters:\n" + lr.explainParams() + "\n" > # Learn a LogisticRegression model. This uses the parameters stored in lr. > model1 = lr.fit(training) > # Since model1 is a Model (i.e., a transformer produced by an Estimator), > # we can view the parameters it used during fit(). > # This prints the parameter (name: value) pairs, where names are unique IDs > for this > # LogisticRegression instance. > print "Model 1 was fit using parameters: " > print model1.extractParamMap() > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12468) getParamMap in Pyspark ML API returns empty dictionary in example for Documentation
[ https://issues.apache.org/jira/browse/SPARK-12468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067080#comment-15067080 ] Zachary Brown commented on SPARK-12468: --- Found a possible fix for this by modifying the `_fit()` method of the JavaEstimator class in `python/pyspark/ml/wrapper.py` to update the paramMap of the returned model. Created a pull request for it here: https://github.com/apache/spark/pull/10419 > getParamMap in Pyspark ML API returns empty dictionary in example for > Documentation > --- > > Key: SPARK-12468 > URL: https://issues.apache.org/jira/browse/SPARK-12468 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.2 >Reporter: Zachary Brown >Priority: Minor > > The `extractParamMap()` method for a model that has been fit returns an empty > dictionary, e.g. (from the [Pyspark ML API > Documentation](http://spark.apache.org/docs/latest/ml-guide.html#example-estimator-transformer-and-param)): > ```python > from pyspark.mllib.linalg import Vectors > from pyspark.ml.classification import LogisticRegression > from pyspark.ml.param import Param, Params > # Prepare training data from a list of (label, features) tuples. > training = sqlContext.createDataFrame([ > (1.0, Vectors.dense([0.0, 1.1, 0.1])), > (0.0, Vectors.dense([2.0, 1.0, -1.0])), > (0.0, Vectors.dense([2.0, 1.3, 1.0])), > (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"]) > # Create a LogisticRegression instance. This instance is an Estimator. > lr = LogisticRegression(maxIter=10, regParam=0.01) > # Print out the parameters, documentation, and any default values. > print "LogisticRegression parameters:\n" + lr.explainParams() + "\n" > # Learn a LogisticRegression model. This uses the parameters stored in lr. > model1 = lr.fit(training) > # Since model1 is a Model (i.e., a transformer produced by an Estimator), > # we can view the parameters it used during fit(). > # This prints the parameter (name: value) pairs, where names are unique IDs > for this > # LogisticRegression instance. > print "Model 1 was fit using parameters: " > print model1.extractParamMap() > ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12469) Consistent Accumulators for Spark
holdenk created SPARK-12469: --- Summary: Consistent Accumulators for Spark Key: SPARK-12469 URL: https://issues.apache.org/jira/browse/SPARK-12469 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: holdenk Tasks executed on Spark workers are unable to modify values from the driver, and accumulators are the one exception to this. Accumulators in Spark are implemented in such a way that when a stage is recomputed (say for cache eviction) the accumulator will be updated a second time. This makes accumulators inside of transformations more difficult to use for things like counting invalid records (one of the primary potential use cases of collecting side information during a transformation). However, in some cases this counting during re-evaluation is exactly the behaviour we want (say in tracking total execution time for a particular function). Spark would benefit from a version of accumulators which did not double count even if stages were re-executed. Motivating example: {code} val parseTime = sc.accumulator(0L) val parseFailures = sc.accumulator(0L) val parsedData = sc.textFile(...).flatMap { line => val start = System.currentTimeMillis() val parsed = Try(parse(line)) if (parsed.isFailure) parseFailures += 1 parseTime += System.currentTimeMillis() - start parsed.toOption } parsedData.cache() val resultA = parsedData.map(...).filter(...).count() // some intervening code. Almost anything could happen here -- some of parsedData may // get kicked out of the cache, or an executor where data was cached might get lost val resultB = parsedData.filter(...).map(...).flatMap(...).count() // now we look at the accumulators {code} Here we would want parseFailures to have been incremented only once for every line that failed to parse. Unfortunately, the current Spark accumulator API doesn't support the parseFailures use case, since if some data has been evicted it's possible that it will be double counted. See the full design document at https://docs.google.com/document/d/1lR_l1g3zMVctZXrcVjFusq2iQVpr4XvRK_UUDsDr6nk/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
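Until such a feature exists, one workaround under the current API is to carry the counts in the data itself, so a recomputed partition recomputes its count rather than adding to it. A sketch along the lines of the motivating example, where parse() and the input path are placeholders as in the example above:
{code}
import scala.util.Try

val lines = sc.textFile("hdfs:///path/to/input")   // placeholder path
val parsedWithFlag = lines.map { line =>
  val parsed = Try(parse(line))                    // parse() as in the example above
  (parsed.toOption, if (parsed.isFailure) 1L else 0L)
}
parsedWithFlag.cache()

// Recomputing a lost partition recomputes its flags too, so this count
// cannot be inflated by re-execution the way an accumulator can.
val parseFailures = parsedWithFlag.map(_._2).reduce(_ + _)
val parsedData    = parsedWithFlag.flatMap(_._1)
{code}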
[jira] [Assigned] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
[ https://issues.apache.org/jira/browse/SPARK-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12463: Assignee: (was: Apache Spark) > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > > > Key: SPARK-12463 > URL: https://issues.apache.org/jira/browse/SPARK-12463 > Project: Spark > Issue Type: Task >Reporter: Timothy Chen > > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > configuration for cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12465: Assignee: Apache Spark > Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir > -- > > Key: SPARK-12465 > URL: https://issues.apache.org/jira/browse/SPARK-12465 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen >Assignee: Apache Spark > > Remove spark.deploy.mesos.zookeeper.dir and use existing configuration > spark.deploy.zookeeper.dir for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
[ https://issues.apache.org/jira/browse/SPARK-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12463: Assignee: Apache Spark > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > > > Key: SPARK-12463 > URL: https://issues.apache.org/jira/browse/SPARK-12463 > Project: Spark > Issue Type: Task >Reporter: Timothy Chen >Assignee: Apache Spark > > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > configuration for cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12465: Assignee: Apache Spark > Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir > -- > > Key: SPARK-12465 > URL: https://issues.apache.org/jira/browse/SPARK-12465 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen >Assignee: Apache Spark > > Remove spark.deploy.mesos.zookeeper.dir and use existing configuration > spark.deploy.zookeeper.dir for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
[ https://issues.apache.org/jira/browse/SPARK-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067065#comment-15067065 ] Apache Spark commented on SPARK-12463: -- User 'tnachen' has created a pull request for this issue: https://github.com/apache/spark/pull/10057 > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > > > Key: SPARK-12463 > URL: https://issues.apache.org/jira/browse/SPARK-12463 > Project: Spark > Issue Type: Task >Reporter: Timothy Chen > > Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode > configuration for cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12465: Assignee: (was: Apache Spark) > Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir > -- > > Key: SPARK-12465 > URL: https://issues.apache.org/jira/browse/SPARK-12465 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen > > Remove spark.deploy.mesos.zookeeper.dir and use existing configuration > spark.deploy.zookeeper.dir for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12464) Remove spark.deploy.mesos.zookeeper.url and use spark.deploy.zookeeper.url
[ https://issues.apache.org/jira/browse/SPARK-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067062#comment-15067062 ] Andrew Or commented on SPARK-12464: --- By the way for future reference you probably don't need a separate issue for each config. Just have an issue that says `Remove spark.deploy.mesos.* and use spark.deploy.* instead`. Since you already opened these we can just keep them. > Remove spark.deploy.mesos.zookeeper.url and use spark.deploy.zookeeper.url > -- > > Key: SPARK-12464 > URL: https://issues.apache.org/jira/browse/SPARK-12464 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen >Assignee: Apache Spark > > Remove spark.deploy.mesos.zookeeper.url and use existing configuration > spark.deploy.zookeeper.url for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067059#comment-15067059 ] Apache Spark commented on SPARK-12465: -- User 'tnachen' has created a pull request for this issue: https://github.com/apache/spark/pull/10057 > Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir > -- > > Key: SPARK-12465 > URL: https://issues.apache.org/jira/browse/SPARK-12465 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen > > Remove spark.deploy.mesos.zookeeper.dir and use existing configuration > spark.deploy.zookeeper.dir for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12321) JSON format for logical/physical execution plans
[ https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12321: - Assignee: Wenchen Fan > JSON format for logical/physical execution plans > > > Key: SPARK-12321 > URL: https://issues.apache.org/jira/browse/SPARK-12321 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12465: Assignee: (was: Apache Spark) > Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir > -- > > Key: SPARK-12465 > URL: https://issues.apache.org/jira/browse/SPARK-12465 > Project: Spark > Issue Type: Task > Components: Mesos >Reporter: Timothy Chen > > Remove spark.deploy.mesos.zookeeper.dir and use existing configuration > spark.deploy.zookeeper.dir for Mesos cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12321) JSON format for logical/physical execution plans
[ https://issues.apache.org/jira/browse/SPARK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12321. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10311 [https://github.com/apache/spark/pull/10311] > JSON format for logical/physical execution plans > > > Key: SPARK-12321 > URL: https://issues.apache.org/jira/browse/SPARK-12321 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org