[jira] [Created] (SPARK-12648) UDF with Option[Double] throws ClassCastException

2016-01-05 Thread Mikael Valot (JIRA)
Mikael Valot created SPARK-12648:


 Summary: UDF with Option[Double] throws ClassCastException
 Key: SPARK-12648
 URL: https://issues.apache.org/jira/browse/SPARK-12648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Mikael Valot


I can write a UDF that returns an Option[Double], and the DataFrame's schema 
is correctly inferred to be a nullable double. 
However, I cannot seem to write a UDF that takes an Option as an 
argument:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlc = new SQLContext(sc)
import sqlc.implicits._
val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", "weight")
import org.apache.spark.sql.functions._
val addTwo = udf((d: Option[Double]) => d.map(_+2)) 
df.withColumn("plusTwo", addTwo(df("weight"))).show()

=>
2016-01-05T14:41:52 Executor task launch worker-0 ERROR 
org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(:18) 
~[na:na]
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source) ~[na:na]
at 
org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
 ~[spark-sql_2.10-1.6.0.jar:1.6.0]
at 
org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
 ~[spark-sql_2.10-1.6.0.jar:1.6.0]
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
~[scala-library-2.10.5.jar:na]
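
One possible workaround (a sketch, not a fix for the underlying cast): since Catalyst 
passes the unwrapped column value rather than an Option, the UDF can accept a boxed 
java.lang.Double, which is null for SQL NULL, and still return an Option so the result 
column stays nullable:
{noformat}
// Sketch of a workaround: take java.lang.Double (null for SQL NULL) instead of Option[Double].
val addTwoNullable = udf((d: java.lang.Double) => Option(d).map(_ + 2))
df.withColumn("plusTwo", addTwoNullable(df("weight"))).show()
{noformat}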




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7831) Mesos dispatcher doesn't deregister as a framework from Mesos when stopped

2016-01-05 Thread Nilanjan Raychaudhuri (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083142#comment-15083142
 ] 

Nilanjan Raychaudhuri commented on SPARK-7831:
--

I am working on a possible fix for this. I will submit a pull request soon.

> Mesos dispatcher doesn't deregister as a framework from Mesos when stopped
> --
>
> Key: SPARK-7831
> URL: https://issues.apache.org/jira/browse/SPARK-7831
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.4.0
> Environment: Spark 1.4.0-rc1, Mesos 0.2.2 (compiled from source)
>Reporter: Luc Bourlier
>
> To run Spark on Mesos in cluster mode, a Spark Mesos dispatcher has to be 
> running.
> It is launched using {{sbin/start-mesos-dispatcher.sh}}. The Mesos dispatcher 
> registers as a framework in the Mesos cluster.
> After using {{sbin/stop-mesos-dispatcher.sh}} to stop the dispatcher, the 
> application is correctly terminated locally, but the framework is still 
> listed as {{active}} in the Mesos dashboard.
> I would expect the framework to be de-registered when the dispatcher is 
> stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082973#comment-15082973
 ] 

Vijay Kiran commented on SPARK-12632:
-

[~somi...@us.ibm.com] I added a couple of comments. I guess `recommendation.py` 
needs to be fixed as well, but I think [~bryanc] will have more to say on this 
:)

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12632:


Assignee: (was: Apache Spark)

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12632:


Assignee: Apache Spark

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082964#comment-15082964
 ] 

Apache Spark commented on SPARK-12632:
--

User 'somideshmukh' has created a pull request for this issue:
https://github.com/apache/spark/pull/10602

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082943#comment-15082943
 ] 

Vijay Kiran edited comment on SPARK-12632 at 1/5/16 12:20 PM:
--

[~somi...@us.ibm.com] Did you start working on this already? I opened PRs for the 
other three, and made changes to these files as well.
WIP: 
https://github.com/vijaykiran/spark/commit/f7c6c49638710cc62d36dbf3b306abed0983b30f


was (Author: vijaykiran):
[~somi...@us.ibm.com] Did you start working on this already? I opened PRs for the 
other three, and made changes to these files as well.

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082943#comment-15082943
 ] 

Vijay Kiran commented on SPARK-12632:
-

[~somi...@us.ibm.com] Did you start working on this already? I opened PRs for the 
other three, and made changes to these files as well.

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12095) Window function rowsBetween throws exception

2016-01-05 Thread Tristan Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082921#comment-15082921
 ] 

Tristan Reid commented on SPARK-12095:
--

The SQL syntax doesn't appear to work at all. 
  `select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl`

Is that the case?

> Window function rowsBetween throws exception
> 
>
> Key: SPARK-12095
> URL: https://issues.apache.org/jira/browse/SPARK-12095
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Irakli Machabeli
>
> From pyspark :
>  windowSpec=Window.partitionBy('A', 'B').orderBy('A','B', 
> 'C').rowsBetween('UNBOUNDED PRECEDING','CURRENT')
> Py4JError: An error occurred while calling o1107.rowsBetween. Trace:
> py4j.Py4JException: Method rowsBetween([class java.lang.String, class 
> java.lang.Long]) does not exist
> from SQL query parser fails immediately:
> Py4JJavaError: An error occurred while calling o18.sql.
> : java.lang.RuntimeException: [1.20] failure: ``union'' expected but `(' found
> select rank() OVER (PARTITION BY c1 ORDER BY c2 ) as rank from tbl
>^
> at scala.sys.package$.error(package.scala:27)
> at 
> org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
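
For reference, a Scala sketch of how the {{rowsBetween}} call quoted above is meant to 
look on 1.5/1.6: the frame bounds are Longs (Long.MinValue for UNBOUNDED PRECEDING, 0 
for CURRENT ROW), not strings. Column names A, B, C are taken from the description:
{noformat}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Long.MinValue = UNBOUNDED PRECEDING, 0 = CURRENT ROW on Spark 1.5/1.6.
val windowSpec = Window.partitionBy("A", "B").orderBy("A", "B", "C")
  .rowsBetween(Long.MinValue, 0)
// Example use, assuming an existing DataFrame df with these columns:
// df.withColumn("runningSum", sum("C").over(windowSpec))
{noformat}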



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1061) allow Hadoop RDDs to be read w/ a partitioner

2016-01-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1061.
--
Resolution: Won't Fix

> allow Hadoop RDDs to be read w/ a partitioner
> -
>
> Key: SPARK-1061
> URL: https://issues.apache.org/jira/browse/SPARK-1061
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>
> Using partitioners to get narrow dependencies can save tons of time on a 
> shuffle.  However, after saving an RDD to hdfs, and then reloading it, all 
> partitioner information is lost.  This means that you can never get a narrow 
> dependency when loading data from hadoop.
> I think we could get around this by:
> 1) having a modified version of hadoop rdd that kept track of the original part 
> file (or maybe just prevented splits altogether ...)
> 2) adding an "assumePartition(partitioner: Partitioner, verify: Boolean)" function 
> to RDD.  It would create a new RDD, which had the exact same data but just 
> pretended that the RDD had the given partitioner applied to it.  And if 
> verify=true, it could add a mapPartitionsWithIndex to check that each record 
> was in the right partition.
> http://apache-spark-user-list.1001560.n3.nabble.com/setting-partitioners-with-hadoop-rdds-td976.html
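
A rough Scala sketch of idea (2) above, just to make it concrete (the verify=true pass 
with mapPartitionsWithIndex is left out); this is an illustration, not a proposed API:
{noformat}
import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

// Wrapper RDD that reports the given partitioner without moving any data.
class AssumePartitionedRDD[K, V](prev: RDD[(K, V)], part: Partitioner)
  extends RDD[(K, V)](prev) {
  override val partitioner: Option[Partitioner] = Some(part)
  override def getPartitions: Array[Partition] = prev.partitions
  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    prev.iterator(split, context)
}
{noformat}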



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12331) R^2 for regression through the origin

2016-01-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-12331:
--
Assignee: Imran Younus

> R^2 for regression through the origin
> -
>
> Key: SPARK-12331
> URL: https://issues.apache.org/jira/browse/SPARK-12331
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Imran Younus
>Assignee: Imran Younus
>Priority: Minor
> Fix For: 2.0.0
>
>
> The value of R^2 (coefficient of determination) obtained from 
> LinearRegressionModel is not consistent with R and statsmodels when the 
> fitIntercept is false i.e., regression through the origin. In this case, both 
> R and statsmodels use the definition of R^2 given by eq(4') in the following 
> review paper:
> https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
> Here is the definition from this paper:
> R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2
> The paper also describes why this should be the case. I've double checked 
> that the value of R^2 from statsmodels and R are consistent with this 
> definition. On the other hand, scikit-learn doesn't use the above definition. 
> I would recommend using the above definition in Spark.
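
A small Scala sketch of that definition, assuming y holds the observed values and yHat 
the fitted values from a no-intercept model:
{noformat}
// R^2 through the origin: sum of squared fitted values over sum of squared observations.
def r2ThroughOrigin(y: Seq[Double], yHat: Seq[Double]): Double =
  yHat.map(v => v * v).sum / y.map(v => v * v).sum
{noformat}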



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12331) R^2 for regression through the origin

2016-01-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12331.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 10384
[https://github.com/apache/spark/pull/10384]

> R^2 for regression through the origin
> -
>
> Key: SPARK-12331
> URL: https://issues.apache.org/jira/browse/SPARK-12331
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Imran Younus
>Priority: Minor
> Fix For: 2.0.0
>
>
> The value of R^2 (coefficient of determination) obtained from 
> LinearRegressionModel is not consistent with R and statsmodels when the 
> fitIntercept is false i.e., regression through the origin. In this case, both 
> R and statsmodels use the definition of R^2 given by eq(4') in the following 
> review paper:
> https://online.stat.psu.edu/~ajw13/stat501/SpecialTopics/Reg_thru_origin.pdf
> Here is the definition from this paper:
> R^2 = \sum_i \hat{y}_i^2 / \sum_i y_i^2
> The paper also describes why this should be the case. I've double checked 
> that the value of R^2 from statsmodels and R are consistent with this 
> definition. On the other hand, scikit-learn doesn't use the above definition. 
> I would recommend using the above definition in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12634) Make Parameter Descriptions Consistent for PySpark MLlib Tree

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12634:


Assignee: Apache Spark

> Make Parameter Descriptions Consistent for PySpark MLlib Tree
> -
>
> Key: SPARK-12634
> URL: https://issues.apache.org/jira/browse/SPARK-12634
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up tree.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12634) Make Parameter Descriptions Consistent for PySpark MLlib Tree

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12634:


Assignee: (was: Apache Spark)

> Make Parameter Descriptions Consistent for PySpark MLlib Tree
> -
>
> Key: SPARK-12634
> URL: https://issues.apache.org/jira/browse/SPARK-12634
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up tree.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12634) Make Parameter Descriptions Consistent for PySpark MLlib Tree

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082890#comment-15082890
 ] 

Apache Spark commented on SPARK-12634:
--

User 'vijaykiran' has created a pull request for this issue:
https://github.com/apache/spark/pull/10601

> Make Parameter Descriptions Consistent for PySpark MLlib Tree
> -
>
> Key: SPARK-12634
> URL: https://issues.apache.org/jira/browse/SPARK-12634
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up tree.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082889#comment-15082889
 ] 

Sean Owen commented on SPARK-12647:
---

*shrug* At this point it probably doesn't matter; this is mostly for next time. The 
concern is just that someone finds your fix to the first JIRA but not the fix 
to the fix. I linked them here at least.

> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU

2016-01-05 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082881#comment-15082881
 ] 

Kazuaki Ishizaki commented on SPARK-3785:
-

# You can specify cpu-cores by using conventional Spark options like 
"--executor-cores".
# Do you want to execute an operation on a matrix represented by an RDD? The 
current version has two possible GPU memory limitations:
#* Since it copies all of the data in an RDD partition between the CPU and GPU, a 
GPU kernel for a task cannot exceed the capacity of the GPU memory.
#* Since tasks are executed concurrently, the sum of the GPU memory required by the 
tasks running at any one time cannot exceed the capacity of the GPU memory.

Comment 2 is a very good question. To exploit GPUs in Spark, it is necessary to 
devise better approaches.

> Support off-loading computations to a GPU
> -
>
> Key: SPARK-3785
> URL: https://issues.apache.org/jira/browse/SPARK-3785
> Project: Spark
>  Issue Type: Brainstorming
>  Components: MLlib
>Reporter: Thomas Darimont
>Priority: Minor
>
> Are there any plans to adding support for off-loading computations to the 
> GPU, e.g. via an open-cl binding? 
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082879#comment-15082879
 ] 

Pete Robbins edited comment on SPARK-12647 at 1/5/16 11:30 AM:
---

[~sowen] should I close this and move the PR?



was (Author: robbinspg):
@sowen should I close this and move the PR?


> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082879#comment-15082879
 ] 

Pete Robbins commented on SPARK-12647:
--

@sowen should I close this and move the PR?


> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12622) spark-submit fails on executors when jar has a space in it

2016-01-05 Thread Adrian Bridgett (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082875#comment-15082875
 ] 

Adrian Bridgett commented on SPARK-12622:
-

Damn - sorry, that's my obfuscation error, sorry about that :-(  It 
should read:
{noformat}
Added JAR file:/tmp/f%20oo.jar at http://10.1.201.77:35016/jars/f%20oo.jar with 
timestamp 1451917055779
{noformat}

Let me also post the full stack trace:
{noformat}
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO -   16/01/04 14:23:00 WARN 
scheduler.TaskSetManager: Lost task 19.0 in stage 0.0 (TID 20, 
ip-10-1-200-159.ec2.internal): java.lang.ClassNotFoundException: 
ProcessFoo$$anonfun$46
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.lang.Class.forName0(Native Method)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.lang.Class.forName(Class.java:348)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
[2016-01-04 14:23:00,053] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2016-01-04 14:23:00,054] {daily_tmo.py:153} INFO - at 
java.lang.reflect.Method.invoke(Method.java:497)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
[2016-01-04 14:23:00,055] {daily_tmo.py:153} INFO - at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
[2016-01-04 14:23:00,055]

[jira] [Commented] (SPARK-12634) Make Parameter Descriptions Consistent for PySpark MLlib Tree

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082872#comment-15082872
 ] 

Vijay Kiran commented on SPARK-12634:
-

I'm editing tree.py.

> Make Parameter Descriptions Consistent for PySpark MLlib Tree
> -
>
> Key: SPARK-12634
> URL: https://issues.apache.org/jira/browse/SPARK-12634
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up tree.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12622) spark-submit fails on executors when jar has a space in it

2016-01-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082862#comment-15082862
 ] 

Sean Owen commented on SPARK-12622:
---

Oh I see it. Ultimately I assume it's because the JAR isn't found locally, 
though the question is why. This looks suspicious:
{{Added JAR file:/tmpf%20oo.jar at http://10.1.201.77:43888/jars/f%oo.jar}}

The second http URL can't be right. I don't have any more ideas but that looks 
like somewhere to start looking.
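
For reference, a small Scala sketch of how the path should round-trip through 
java.net.URI; {{f%oo.jar}} is not a valid percent-encoding of "f oo.jar", so something 
between {{SparkContext.addJar}} and the HTTP file server is probably mangling the name:
{noformat}
import java.net.URI

// Encoding: a space in the path should become %20.
val encoded = new URI("file", null, "/tmp/f oo.jar", null)
println(encoded)                                   // file:/tmp/f%20oo.jar
// Decoding: getPath gives back the original path with the space.
println(new URI("file:/tmp/f%20oo.jar").getPath)   // /tmp/f oo.jar
{noformat}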

> spark-submit fails on executors when jar has a space in it
> --
>
> Key: SPARK-12622
> URL: https://issues.apache.org/jira/browse/SPARK-12622
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.6.0
> Environment: Linux, Mesos 
>Reporter: Adrian Bridgett
>Priority: Minor
>
> spark-submit --class foo "Foo.jar"  works
> but when using "f oo.jar" it starts to run and then breaks on the executors 
> as they cannot find the various functions.
> Out of interest (as HDFS CLI uses this format) I tried f%20oo.jar - this 
> fails immediately.
> {noformat}
> spark-submit --class Foo /tmp/f\ oo.jar
> ...
> spark.jars=file:/tmp/f%20oo.jar
> 6/01/04 14:56:47 INFO spark.SparkContext: Added JAR file:/tmpf%20oo.jar at 
> http://10.1.201.77:43888/jars/f%oo.jar with timestamp 1451919407769
> 16/01/04 14:57:48 WARN scheduler.TaskSetManager: Lost task 4.0 in stage 0.0 
> (TID 2, ip-10-1-200-232.ec2.internal): java.lang.ClassNotFoundException: 
> Foo$$anonfun$46
> {noformat}
> SPARK-6568 is related but maybe specific to the Windows environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU

2016-01-05 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082859#comment-15082859
 ] 

Kazuaki Ishizaki commented on SPARK-3785:
-

Using CUDA is an intermediate approach to evaluate idea A. A future version will 
drive GPU code from a Spark program without hand-written CUDA code. That version 
may generate GPU binaries through CUDA or OpenCL, using one of them as a backend 
in a compiler.

> Support off-loading computations to a GPU
> -
>
> Key: SPARK-3785
> URL: https://issues.apache.org/jira/browse/SPARK-3785
> Project: Spark
>  Issue Type: Brainstorming
>  Components: MLlib
>Reporter: Thomas Darimont
>Priority: Minor
>
> Are there any plans to adding support for off-loading computations to the 
> GPU, e.g. via an open-cl binding? 
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12622) spark-submit fails on executors when jar has a space in it

2016-01-05 Thread Adrian Bridgett (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082855#comment-15082855
 ] 

Adrian Bridgett commented on SPARK-12622:
-

The job fails with the ClassNotFoundException; if I rename the jar file and 
resubmit, it all works.

> spark-submit fails on executors when jar has a space in it
> --
>
> Key: SPARK-12622
> URL: https://issues.apache.org/jira/browse/SPARK-12622
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.6.0
> Environment: Linux, Mesos 
>Reporter: Adrian Bridgett
>Priority: Minor
>
> spark-submit --class foo "Foo.jar"  works
> but when using "f oo.jar" it starts to run and then breaks on the executors 
> as they cannot find the various functions.
> Out of interest (as HDFS CLI uses this format) I tried f%20oo.jar - this 
> fails immediately.
> {noformat}
> spark-submit --class Foo /tmp/f\ oo.jar
> ...
> spark.jars=file:/tmp/f%20oo.jar
> 6/01/04 14:56:47 INFO spark.SparkContext: Added JAR file:/tmpf%20oo.jar at 
> http://10.1.201.77:43888/jars/f%oo.jar with timestamp 1451919407769
> 16/01/04 14:57:48 WARN scheduler.TaskSetManager: Lost task 4.0 in stage 0.0 
> (TID 2, ip-10-1-200-232.ec2.internal): java.lang.ClassNotFoundException: 
> Foo$$anonfun$46
> {noformat}
> SPARK-6568 is related but maybe specific to the Windows environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-01-05 Thread Vijay Kiran (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kiran updated SPARK-12633:

Comment: was deleted

(was: Opened a PR  https://github.com/apache/spark/pull/10600)

> Make Parameter Descriptions Consistent for PySpark MLlib Regression
> ---
>
> Key: SPARK-12633
> URL: https://issues.apache.org/jira/browse/SPARK-12633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> regression.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12633:


Assignee: Apache Spark

> Make Parameter Descriptions Consistent for PySpark MLlib Regression
> ---
>
> Key: SPARK-12633
> URL: https://issues.apache.org/jira/browse/SPARK-12633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> regression.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082853#comment-15082853
 ] 

Apache Spark commented on SPARK-12633:
--

User 'vijaykiran' has created a pull request for this issue:
https://github.com/apache/spark/pull/10600

> Make Parameter Descriptions Consistent for PySpark MLlib Regression
> ---
>
> Key: SPARK-12633
> URL: https://issues.apache.org/jira/browse/SPARK-12633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> regression.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082854#comment-15082854
 ] 

Vijay Kiran commented on SPARK-12633:
-

Opened a PR  https://github.com/apache/spark/pull/10600

> Make Parameter Descriptions Consistent for PySpark MLlib Regression
> ---
>
> Key: SPARK-12633
> URL: https://issues.apache.org/jira/browse/SPARK-12633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> regression.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12633) Make Parameter Descriptions Consistent for PySpark MLlib Regression

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12633:


Assignee: (was: Apache Spark)

> Make Parameter Descriptions Consistent for PySpark MLlib Regression
> ---
>
> Key: SPARK-12633
> URL: https://issues.apache.org/jira/browse/SPARK-12633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> regression.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12647:


Assignee: Apache Spark

> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Assignee: Apache Spark
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12647:


Assignee: (was: Apache Spark)

> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082847#comment-15082847
 ] 

Apache Spark commented on SPARK-12647:
--

User 'robbinspg' has created a pull request for this issue:
https://github.com/apache/spark/pull/10599

> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082846#comment-15082846
 ] 

Sean Owen commented on SPARK-12647:
---

[~robbinspg] rather than make a new JIRA, you should reopen your existing one 
and provide another PR. The additional change must logically go with your 
original one.

> 1.6 branch test failure 
> o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
> reducers: aggregate operator
> ---
>
> Key: SPARK-12647
> URL: https://issues.apache.org/jira/browse/SPARK-12647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Pete Robbins
>Priority: Minor
>
> All 1.6 branch builds failing eg 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/
> 3 did not equal 2
> PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12647) 1.6 branch test failure o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of reducers: aggregate operator

2016-01-05 Thread Pete Robbins (JIRA)
Pete Robbins created SPARK-12647:


 Summary: 1.6 branch test failure 
o.a.s.sql.execution.ExchangeCoordinatorSuite.determining the number of 
reducers: aggregate operator
 Key: SPARK-12647
 URL: https://issues.apache.org/jira/browse/SPARK-12647
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Pete Robbins
Priority: Minor


All 1.6 branch builds failing eg 
https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-1.6-test-maven-pre-yarn-2.0.0-mr1-cdh4.1.2/lastCompletedBuild/testReport/org.apache.spark.sql.execution/ExchangeCoordinatorSuite/determining_the_number_of_reducers__aggregate_operator/

3 did not equal 2

PR for SPARK-12470 causes change in partition size so test needs updating



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12622) spark-submit fails on executors when jar has a space in it

2016-01-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082829#comment-15082829
 ] 

Sean Owen commented on SPARK-12622:
---

I don't see details of the actual problem here. Everything so far looks 
correct. {{file:/tmp/f%20oo.jar}} is a valid URI for the file, so that can't be 
rejected. What breaks?

> spark-submit fails on executors when jar has a space in it
> --
>
> Key: SPARK-12622
> URL: https://issues.apache.org/jira/browse/SPARK-12622
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.6.0
> Environment: Linux, Mesos 
>Reporter: Adrian Bridgett
>Priority: Minor
>
> spark-submit --class foo "Foo.jar"  works
> but when using "f oo.jar" it starts to run and then breaks on the executors 
> as they cannot find the various functions.
> Out of interest (as HDFS CLI uses this format) I tried f%20oo.jar - this 
> fails immediately.
> {noformat}
> spark-submit --class Foo /tmp/f\ oo.jar
> ...
> spark.jars=file:/tmp/f%20oo.jar
> 6/01/04 14:56:47 INFO spark.SparkContext: Added JAR file:/tmpf%20oo.jar at 
> http://10.1.201.77:43888/jars/f%oo.jar with timestamp 1451919407769
> 16/01/04 14:57:48 WARN scheduler.TaskSetManager: Lost task 4.0 in stage 0.0 
> (TID 2, ip-10-1-200-232.ec2.internal): java.lang.ClassNotFoundException: 
> Foo$$anonfun$46
> {noformat}
> SPARK-6568 is related but maybe specific to the Windows environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12630) Make Parameter Descriptions Consistent for PySpark MLlib Classification

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12630:


Assignee: Apache Spark

> Make Parameter Descriptions Consistent for PySpark MLlib Classification
> ---
>
> Key: SPARK-12630
> URL: https://issues.apache.org/jira/browse/SPARK-12630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> classification.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12630) Make Parameter Descriptions Consistent for PySpark MLlib Classification

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12630:


Assignee: (was: Apache Spark)

> Make Parameter Descriptions Consistent for PySpark MLlib Classification
> ---
>
> Key: SPARK-12630
> URL: https://issues.apache.org/jira/browse/SPARK-12630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> classification.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12630) Make Parameter Descriptions Consistent for PySpark MLlib Classification

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082821#comment-15082821
 ] 

Apache Spark commented on SPARK-12630:
--

User 'vijaykiran' has created a pull request for this issue:
https://github.com/apache/spark/pull/10598

> Make Parameter Descriptions Consistent for PySpark MLlib Classification
> ---
>
> Key: SPARK-12630
> URL: https://issues.apache.org/jira/browse/SPARK-12630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> classification.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12646) Support _HOST in kerberos principal for connecting to secure cluster

2016-01-05 Thread Hari Krishna Dara (JIRA)
Hari Krishna Dara created SPARK-12646:
-

 Summary: Support _HOST in kerberos principal for connecting to 
secure cluster
 Key: SPARK-12646
 URL: https://issues.apache.org/jira/browse/SPARK-12646
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Reporter: Hari Krishna Dara
Priority: Minor


Hadoop supports _HOST as a token that is dynamically replaced with the actual 
hostname at the time Kerberos authentication is done. This is supported across much 
of the Hadoop stack, including YARN. When configuring Spark to connect to a secure 
cluster (e.g., yarn-cluster or yarn-client as master), it would be natural to 
extend support for this token to Spark as well.
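
A minimal Scala sketch of the Hadoop-style expansion being referred to, using Hadoop's 
own helper (the principal "spark/_HOST@EXAMPLE.COM" is just a hypothetical example):
{noformat}
import java.net.InetAddress
import org.apache.hadoop.security.SecurityUtil

// _HOST is replaced with the local canonical hostname before authentication.
val expanded = SecurityUtil.getServerPrincipal(
  "spark/_HOST@EXAMPLE.COM",
  InetAddress.getLocalHost.getCanonicalHostName)
// e.g. spark/worker-1.example.com@EXAMPLE.COM
{noformat}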



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12630) Make Parameter Descriptions Consistent for PySpark MLlib Classification

2016-01-05 Thread Vijay Kiran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082764#comment-15082764
 ] 

Vijay Kiran commented on SPARK-12630:
-

I've made the changes; after I run the tests, I'll open a PR.

> Make Parameter Descriptions Consistent for PySpark MLlib Classification
> ---
>
> Key: SPARK-12630
> URL: https://issues.apache.org/jira/browse/SPARK-12630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up 
> classification.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12623) map key_values to values

2016-01-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082739#comment-15082739
 ] 

Sean Owen commented on SPARK-12623:
---

There is a {{preservesPartitioning}} flag on some API methods that lets you 
specify that your function of {{(key, value)}} pairs won't change keys, or at 
least won't change the partitioning. Unfortunately, for historical reasons this 
wasn't exposed on the {{map()}} function, but was exposed on {{mapPartitions}}. 
It's a little clunky to invoke if you only need map, but not much -- you get an 
iterator that you then map as before.

That would at least let you do what you're trying to do. As to exposing a 
specialized method for this, yeah it's not crazy or anything but I doubt it 
would be viewed as worth it when there's a fairly direct way to do what you 
want. (Or else, I'd say argue for a new param to map, but that has its own 
obscure issues.)
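
A minimal Scala sketch of that workaround, assuming an RDD[(String, Int)] named rdd and 
a key-aware function f (the names are illustrative only):
{noformat}
import org.apache.spark.rdd.RDD

// Map values with access to the key; preservesPartitioning = true keeps the existing
// partitioner because the keys are left untouched.
def withKeyAwareValues(rdd: RDD[(String, Int)], f: (String, Int) => Int): RDD[(String, Int)] =
  rdd.mapPartitions(
    iter => iter.map { case (k, v) => (k, f(k, v)) },
    preservesPartitioning = true)
{noformat}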

> map key_values to values
> 
>
> Key: SPARK-12623
> URL: https://issues.apache.org/jira/browse/SPARK-12623
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Elazar Gershuni
>Priority: Minor
>  Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the argument to mapValues() take a key as an argument? 
> Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simpler analyzer that takes the argument to 
> map(), and analyzes it to see whether it (trivially) doesn't change the key, 
> e.g. 
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> Problem is, if I find that it is the case, I can't call mapValues() with that 
> function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, since mapValues 
> receives only `v` as an argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12645) SparkR support hash function

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12645:


Assignee: Apache Spark

> SparkR support hash function 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>Assignee: Apache Spark
>
> Add hash function for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12645) SparkR support hash function

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12645:


Assignee: (was: Apache Spark)

> SparkR support hash function 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>
> Add hash function for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12645) SparkR support hash function

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082732#comment-15082732
 ] 

Apache Spark commented on SPARK-12645:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10597

> SparkR support hash function 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>
> Add hash function for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12645) SparkR support hash function

2016-01-05 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-12645:

Description: Add hash function for SparkR  (was: SparkR add function hash 
for DataFrame)

> SparkR support hash function 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>
> Add hash function for SparkR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12645) SparkR add function hash

2016-01-05 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-12645:

Summary: SparkR add function hash   (was: SparkR add function hash)

> SparkR add function hash 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>
> SparkR add function hash for DataFrame



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12645) SparkR support hash function

2016-01-05 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-12645:

Summary: SparkR support hash function   (was: SparkR add function hash )

> SparkR support hash function 
> -
>
> Key: SPARK-12645
> URL: https://issues.apache.org/jira/browse/SPARK-12645
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Yanbo Liang
>
> SparkR add function hash for DataFrame



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12645) SparkR add function hash

2016-01-05 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-12645:
---

 Summary: SparkR add function hash
 Key: SPARK-12645
 URL: https://issues.apache.org/jira/browse/SPARK-12645
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Yanbo Liang


SparkR add function hash for DataFrame



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12403) "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore

2016-01-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12403.
---
Resolution: Not A Problem

OK, but as far as I can tell from this conversation it's an issue with a 
third-party ODBC driver.

> "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore
> 
>
> Key: SPARK-12403
> URL: https://issues.apache.org/jira/browse/SPARK-12403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Environment: ODBC connector query 
>Reporter: Lunen
>
> We are unable to query the SPARK tables using the ODBC driver from Simba 
> Spark (Databricks - "Simba Spark ODBC Driver 1.0"). We are able to do a show 
> databases and show tables, but not run any queries, e.g.:
> Working:
> Select * from openquery(SPARK,'SHOW DATABASES')
> Select * from openquery(SPARK,'SHOW TABLES')
> Not working:
> Select * from openquery(SPARK,'Select * from lunentest')
> The error I get is:
> OLE DB provider "MSDASQL" for linked server "SPARK" returned message 
> "[Simba][SQLEngine] (31740) Table or view not found: spark..lunentest".
> Msg 7321, Level 16, State 2, Line 2
> An error occurred while preparing the query "Select * from lunentest" for 
> execution against OLE DB provider "MSDASQL" for linked server "SPARK"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12403) "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore

2016-01-05 Thread Lunen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082687#comment-15082687
 ] 

Lunen commented on SPARK-12403:
---

I've managed to get in contact with the people who develop the Spark ODBC 
drivers. They told me that they OEM the driver to Databricks and that they 
don't understand why the latest driver would not be made available. I've also 
tested a trial version of the developer's latest driver and it works perfectly 
fine.

I asked on Databricks' forum and sent emails to their sales and info departments 
explaining the situation. Hopefully someone can help.

> "Simba Spark ODBC Driver 1.0" not working with 1.5.2 anymore
> 
>
> Key: SPARK-12403
> URL: https://issues.apache.org/jira/browse/SPARK-12403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Environment: ODBC connector query 
>Reporter: Lunen
>
> We are unable to query the SPARK tables using the ODBC driver from Simba 
> Spark (Databricks - "Simba Spark ODBC Driver 1.0"). We are able to do a show 
> databases and show tables, but not run any queries, e.g.:
> Working:
> Select * from openquery(SPARK,'SHOW DATABASES')
> Select * from openquery(SPARK,'SHOW TABLES')
> Not working:
> Select * from openquery(SPARK,'Select * from lunentest')
> The error I get is:
> OLE DB provider "MSDASQL" for linked server "SPARK" returned message 
> "[Simba][SQLEngine] (31740) Table or view not found: spark..lunentest".
> Msg 7321, Level 16, State 2, Line 2
> An error occurred while preparing the query "Select * from lunentest" for 
> execution against OLE DB provider "MSDASQL" for linked server "SPARK"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-05 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082674#comment-15082674
 ] 

Mario Briggs commented on SPARK-12177:
--

implemented here - 
https://github.com/mariobriggs/spark/commit/2fcbb721b99b48e336ba7ef7c317c279c9483840

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released and it introduces a new consumer API that is 
> not compatible with the old one, so I added the new consumer API. I made separate 
> classes in the package org.apache.spark.streaming.kafka.v09 with the changed API. 
> I didn't remove the old classes, for better backward compatibility: users will not 
> need to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12401) Add support for enums in postgres

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12401:


Assignee: (was: Apache Spark)

> Add support for enums in postgres
> -
>
> Key: SPARK-12401
> URL: https://issues.apache.org/jira/browse/SPARK-12401
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Jaka Jancar
>Priority: Minor
>
> JSON and JSONB types [are now 
> converted|https://github.com/apache/spark/pull/8948/files] into strings on 
> the Spark side instead of throwing. It would be great if [enumerated 
> types|http://www.postgresql.org/docs/current/static/datatype-enum.html] were 
> treated similarly instead of failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12401) Add support for enums in postgres

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12401:


Assignee: Apache Spark

> Add support for enums in postgres
> -
>
> Key: SPARK-12401
> URL: https://issues.apache.org/jira/browse/SPARK-12401
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Jaka Jancar
>Assignee: Apache Spark
>Priority: Minor
>
> JSON and JSONB types [are now 
> converted|https://github.com/apache/spark/pull/8948/files] into strings on 
> the Spark side instead of throwing. It would be great if [enumerated 
> types|http://www.postgresql.org/docs/current/static/datatype-enum.html] were 
> treated similarly instead of failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12401) Add support for enums in postgres

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082659#comment-15082659
 ] 

Apache Spark commented on SPARK-12401:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/10596

> Add support for enums in postgres
> -
>
> Key: SPARK-12401
> URL: https://issues.apache.org/jira/browse/SPARK-12401
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Jaka Jancar
>Priority: Minor
>
> JSON and JSONB types [are now 
> converted|https://github.com/apache/spark/pull/8948/files] into strings on 
> the Spark side instead of throwing. It would be great if [enumerated 
> types|http://www.postgresql.org/docs/current/static/datatype-enum.html] were 
> treated similarly instead of failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12622) spark-submit fails on executors when jar has a space in it

2016-01-05 Thread Adrian Bridgett (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082653#comment-15082653
 ] 

Adrian Bridgett commented on SPARK-12622:
-

Ajesh - that'd be a good improvement (I raised the ticket because it's not obvious 
what the problem is, rather than because I really want spaces to work!). I'd worry 
that someone would then raise a problem about "file:/tmp/f%20oo.jar" failing :-)

Jayadevan - I disliked the space when I saw it (sbt assembly of some in-house 
code) but didn't know whether it was invalid or not (though I made a mental note to 
ask if we could lose the space). FYI, it looks like it's due to the name in our sbt 
build being "foo data", so we get "foo data-assembly-1.0.jar". Interestingly, the sbt 
example also has spaces: 
http://www.scala-sbt.org/0.13/docs/Howto-Project-Metadata.html
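
For reference, a hypothetical build.sbt sketch of the naming pattern described 
above, assuming the sbt-assembly plugin (0.14.x) is already on the build in 
project/plugins.sbt; the project name and version are made up:

{code}
// A project name with a space propagates into the assembly jar name,
// e.g. "foo data-assembly-1.0.jar".
name := "foo data"
version := "1.0"

// One way to avoid the space without renaming the project: override the
// assembly jar name explicitly via the sbt-assembly setting.
assemblyJarName in assembly := s"foo-data-assembly-${version.value}.jar"
{code}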

> spark-submit fails on executors when jar has a space in it
> --
>
> Key: SPARK-12622
> URL: https://issues.apache.org/jira/browse/SPARK-12622
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.6.0
> Environment: Linux, Mesos 
>Reporter: Adrian Bridgett
>Priority: Minor
>
> spark-submit --class foo "Foo.jar"  works
> but when using "f oo.jar" it starts to run and then breaks on the executors 
> as they cannot find the various functions.
> Out of interest (as HDFS CLI uses this format) I tried f%20oo.jar - this 
> fails immediately.
> {noformat}
> spark-submit --class Foo /tmp/f\ oo.jar
> ...
> spark.jars=file:/tmp/f%20oo.jar
> 6/01/04 14:56:47 INFO spark.SparkContext: Added JAR file:/tmpf%20oo.jar at 
> http://10.1.201.77:43888/jars/f%oo.jar with timestamp 1451919407769
> 16/01/04 14:57:48 WARN scheduler.TaskSetManager: Lost task 4.0 in stage 0.0 
> (TID 2, ip-10-1-200-232.ec2.internal): java.lang.ClassNotFoundException: 
> Foo$$anonfun$46
> {noformat}
> SPARK-6568 is related but maybe specific to the Windows environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12623) map key_values to values

2016-01-05 Thread Elazar Gershuni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082639#comment-15082639
 ] 

Elazar Gershuni edited comment on SPARK-12623 at 1/5/16 8:41 AM:
-

That does not answer the question/feature request. Mapping values to values can 
be achieved by similar code to the one you suggested:

{code}
rdd.map { case (key, value) => (key, myFunctionOf(value)) }
{code}

Yet Spark does provide {{rdd.mapValues()}}, for performance reasons (retaining 
the partitioning - avoiding the need to reshuffle when the key does not change).
I would like to enjoy similar benefits for my case too. The code that you 
suggested does not provide them, since Spark cannot know that the key does not change.

I'm sorry if that's not the place for the question/feature request, but it 
really isn't a user question.


was (Author: elazar):
That does not answer the question/feature request. Mapping values to values can 
be achieved by similar code to the one you suggested:

rdd.map { case (key, value) => (key, myFunctionOf(value)) }

Yet Spark does provide rdd.mapValues(), for performance reasons (retaining the 
partitioning - avoiding the need to reshuffle when the key does not change).
I would like to enjoy similar benefits for my case too. The code that you 
suggested does not, since spark cannot know that the key does not change.

I'm sorry if that's not the place for the question/feature request, but it 
really isn't a user question.

> map key_values to values
> 
>
> Key: SPARK-12623
> URL: https://issues.apache.org/jira/browse/SPARK-12623
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Elazar Gershuni
>Priority: Minor
>  Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the argument to mapValues() take a key as an argument? 
> Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simpler analyzer that takes the argument to 
> map(), and analyzes it to see whether it (trivially) doesn't change the key, 
> e.g. 
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> Problem is, if I find that it is the case, I can't call mapValues() with that 
> function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, since mapValues 
> receives only `v` as an argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11806) Spark 2.0 deprecations and removals

2016-01-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-11806:

Description: 
This is an umbrella ticket to track things we are deprecating and removing in 
Spark 2.0.



  was:
This is an umbrella ticket to track things we are deprecating and removing in 
Spark 2.0.

All sub-tasks are currently assigned to Reynold to prevent others from picking 
up prematurely.




> Spark 2.0 deprecations and removals
> ---
>
> Key: SPARK-11806
> URL: https://issues.apache.org/jira/browse/SPARK-11806
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>  Labels: releasenotes
>
> This is an umbrella ticket to track things we are deprecating and removing in 
> Spark 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12641) Remove unused code related to Hadoop 0.23

2016-01-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-12641.
-
   Resolution: Fixed
 Assignee: Kousuke Saruta
Fix Version/s: 2.0.0

> Remove unused code related to Hadoop 0.23
> -
>
> Key: SPARK-12641
> URL: https://issues.apache.org/jira/browse/SPARK-12641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently we don't support Hadoop 0.23, but there is still a bit of code related 
> to it, so let's clean it up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12632) Make Parameter Descriptions Consistent for PySpark MLlib FPM and Recommendation

2016-01-05 Thread somil deshmukh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082643#comment-15082643
 ] 

somil deshmukh commented on SPARK-12632:


I would like to work on this

> Make Parameter Descriptions Consistent for PySpark MLlib FPM and 
> Recommendation
> ---
>
> Key: SPARK-12632
> URL: https://issues.apache.org/jira/browse/SPARK-12632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 1.6.0
>Reporter: Bryan Cutler
>Priority: Trivial
>  Labels: doc, starter
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Follow example parameter description format from parent task to fix up fpm.py 
> and recommendation.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12623) map key_values to values

2016-01-05 Thread Elazar Gershuni (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082639#comment-15082639
 ] 

Elazar Gershuni commented on SPARK-12623:
-

That does not answer the question/feature request. Mapping values to values can 
be achieved by similar code to the one you suggested:

rdd.map { case (key, value) => (key, myFunctionOf(value)) }

Yet Spark does provide rdd.mapValues(), for performance reasons (retaining the 
partitioning - avoiding the need to reshuffle when the key does not change).
I would like to enjoy similar benefits for my case too. The code that you 
suggested does not provide them, since Spark cannot know that the key does not change.

I'm sorry if that's not the place for the question/feature request, but it 
really isn't a user question.
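
To make the partitioning point above concrete, a small spark-shell style sketch 
(assuming an active SparkContext sc); it shows that mapValues keeps the partitioner 
while an otherwise equivalent map does not:

{code}
import org.apache.spark.HashPartitioner

val byKey = sc.parallelize(Seq(("a", 1), ("b", 2))).partitionBy(new HashPartitioner(4))

val kept = byKey.mapValues(_ + 1)                   // partitioner is retained
val lost = byKey.map { case (k, v) => (k, v + 1) }  // partitioner is discarded

println(kept.partitioner)  // Some(org.apache.spark.HashPartitioner@...)
println(lost.partitioner)  // None
{code}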

> map key_values to values
> 
>
> Key: SPARK-12623
> URL: https://issues.apache.org/jira/browse/SPARK-12623
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Elazar Gershuni
>Priority: Minor
>  Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the argument to mapValues() take a key as an argument? 
> Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simpler analyzer that takes the argument to 
> map(), and analyzes it to see whether it (trivially) doesn't change the key, 
> e.g. 
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> Problem is, if I find that it is the case, I can't call mapValues() with that 
> function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, since mapValues 
> receives only `v` as an argument.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12570) DecisionTreeRegressor: provide variance of prediction: user guide update

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12570:


Assignee: Apache Spark

> DecisionTreeRegressor: provide variance of prediction: user guide update
> 
>
> Key: SPARK-12570
> URL: https://issues.apache.org/jira/browse/SPARK-12570
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>Priority: Minor
>
> See linked JIRA for details.  This should update the table of output columns 
> and text.  Examples are probably not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12570) DecisionTreeRegressor: provide variance of prediction: user guide update

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082631#comment-15082631
 ] 

Apache Spark commented on SPARK-12570:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10594

> DecisionTreeRegressor: provide variance of prediction: user guide update
> 
>
> Key: SPARK-12570
> URL: https://issues.apache.org/jira/browse/SPARK-12570
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> See linked JIRA for details.  This should update the table of output columns 
> and text.  Examples are probably not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12570) DecisionTreeRegressor: provide variance of prediction: user guide update

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12570:


Assignee: (was: Apache Spark)

> DecisionTreeRegressor: provide variance of prediction: user guide update
> 
>
> Key: SPARK-12570
> URL: https://issues.apache.org/jira/browse/SPARK-12570
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> See linked JIRA for details.  This should update the table of output columns 
> and text.  Examples are probably not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12644) Vectorize/Batch decode parquet

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12644:


Assignee: Nong Li  (was: Apache Spark)

> Vectorize/Batch decode parquet
> --
>
> Key: SPARK-12644
> URL: https://issues.apache.org/jira/browse/SPARK-12644
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Nong Li
>
> The parquet encodings are largely designed to decode faster in batches, 
> column by column. This can speed up the decoding considerably.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12644) Vectorize/Batch decode parquet

2016-01-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-12644:


Assignee: Apache Spark  (was: Nong Li)

> Vectorize/Batch decode parquet
> --
>
> Key: SPARK-12644
> URL: https://issues.apache.org/jira/browse/SPARK-12644
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Apache Spark
>
> The parquet encodings are largely designed to decode faster in batches, 
> column by column. This can speed up the decoding considerably.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12644) Vectorize/Batch decode parquet

2016-01-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082629#comment-15082629
 ] 

Apache Spark commented on SPARK-12644:
--

User 'nongli' has created a pull request for this issue:
https://github.com/apache/spark/pull/10593

> Vectorize/Batch decode parquet
> --
>
> Key: SPARK-12644
> URL: https://issues.apache.org/jira/browse/SPARK-12644
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Nong Li
>Assignee: Nong Li
>
> The parquet encodings are largely designed to decode faster in batches, 
> column by column. This can speed up the decoding considerably.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


