[jira] [Assigned] (SPARK-11468) Add R API for stddev/variance

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11468:


Assignee: Apache Spark

> Add R API for stddev/variance
> -
>
> Key: SPARK-11468
> URL: https://issues.apache.org/jira/browse/SPARK-11468
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Davies Liu
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11468) Add R API for stddev/variance

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991325#comment-14991325
 ] 

Apache Spark commented on SPARK-11468:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/9489

> Add R API for stddev/variance
> -
>
> Key: SPARK-11468
> URL: https://issues.apache.org/jira/browse/SPARK-11468
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11468) Add R API for stddev/variance

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11468:


Assignee: (was: Apache Spark)

> Add R API for stddev/variance
> -
>
> Key: SPARK-11468
> URL: https://issues.apache.org/jira/browse/SPARK-11468
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Davies Liu
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11449) PortableDataStream should be a factory

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11449.
---
   Resolution: Fixed
 Assignee: Herman van Hovell
Fix Version/s: 1.6.0

Resolved by https://github.com/apache/spark/pull/9417

> PortableDataStream should be a factory
> --
>
> Key: SPARK-11449
> URL: https://issues.apache.org/jira/browse/SPARK-11449
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Minor
> Fix For: 1.6.0
>
>
> {{PortableDataStream}}'s close behavior caught me by surprise the other day. 
> I assumed, incorrectly, that closing the input stream it provides would also 
> close the {{PortableDataStream}}. This leads to quite a confusing situation 
> when you try to reuse the {{PortableDataStream}}: the state of the 
> {{PortableDataStream}} indicates that it is open, whereas the underlying 
> input stream is actually closed.
> I'd like either to improve the documentation, or add an {{InputStream}} 
> wrapper that closes the {{PortableDataStream}} when you close the 
> {{InputStream}}. Any thoughts?
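
A minimal sketch of the wrapper idea above, assuming only that there is some cleanup 
to run (such as closing the {{PortableDataStream}}) when the returned stream is 
closed; the class and callback names are illustrative, not an existing Spark API:
{code}
import java.io.{FilterInputStream, InputStream}

// A stream that runs an extra cleanup action when it is closed, e.g. closing
// the PortableDataStream that produced it.
class CleanupOnCloseStream(in: InputStream, onClose: () => Unit) extends FilterInputStream(in) {
  override def close(): Unit = {
    try super.close() finally onClose()
  }
}
{code}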



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8469) Application timeline view unreadable with many executors

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8469:
-
Target Version/s:   (was: 1.6.0)

> Application timeline view unreadable with many executors
> 
>
> Key: SPARK-8469
> URL: https://issues.apache.org/jira/browse/SPARK-8469
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
> Attachments: Screen Shot 2015-06-18 at 5.51.21 PM.png
>
>
> This is a problem with using dynamic allocation with many executors. We may 
> want to limit the number of stacked events somehow. See the attached 
> screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1227) Diagnostics for Classification

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-1227.
--
Resolution: Won't Fix

> Diagnostics for Classification
> -
>
> Key: SPARK-1227
> URL: https://issues.apache.org/jira/browse/SPARK-1227
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Martin Jaggi
>Assignee: Martin Jaggi
>
> Currently, the attained objective function is not computed (for efficiency 
> reasons, as one evaluation requires one full pass through the data).
> For diagnostics and comparing different algorithms, we should however provide 
> this as a separate function (one MR).
> Doing this requires the loss and regularizer functions themselves, not only 
> their gradients (which are currently in the Gradient class). How about adding 
> the new function directly on the corresponding models in classification/* and 
> regression/* ? Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9226) Change default log level to WARN in python REPL

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-9226.
--
  Resolution: Won't Fix
Target Version/s:   (was: 1.6.0)

> Change default log level to WARN in python REPL
> ---
>
> Key: SPARK-9226
> URL: https://issues.apache.org/jira/browse/SPARK-9226
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Reporter: Auberon López
>Priority: Minor
>
> SPARK-7261 provides separate logging properties to be used in the Scala 
> REPL, by default changing the logging level to WARN instead of INFO. The 
> same improvement can be implemented for the Python REPL, which will make 
> using PySpark interactively a cleaner experience that is closer to parity 
> with the Scala shell.
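
Independent of the REPL's default properties, the level can also be lowered at 
runtime; a minimal sketch (the same method exists on the PySpark SparkContext 
since 1.4):
{code}
sc.setLogLevel("WARN")  // suppress INFO output for the current context
{code}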



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11526.
---
Resolution: Invalid

There isn't any concrete information here, like specific details of the error. 
It's not even clear this is a Spark issue. Please read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>  Labels: easyfix
>
> I have tried to use the Spark SQL JDBC data source to connect to a *PostGIS* 
> database. Although the connection works fine with a plain *PostgreSQL* 
> database, it throws an Unsupported Type exception when I try to query a 
> database with the _PostGIS_ extension.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11525) Support spark packages containing R source code in Standalone mode

2015-11-05 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-11525:

Description: Currently we support 

> Support spark packages containing R source code in Standalone mode
> --
>
> Key: SPARK-11525
> URL: https://issues.apache.org/jira/browse/SPARK-11525
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Sun Rui
>
> Currently we support 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11145) Cannot filter using a partition key and another column

2015-11-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991314#comment-14991314
 ] 

Jeff Zhang commented on SPARK-11145:


I ran it on master; it seems to have been resolved.

> Cannot filter using a partition key and another column
> --
>
> Key: SPARK-11145
> URL: https://issues.apache.org/jira/browse/SPARK-11145
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 1.5.1
>Reporter: Julien Buret
>
> A DataFrame loaded from partitioned Parquet files cannot be filtered by a 
> predicate comparing a partition key and another column.
> In this case, all records are returned.
> Example
> {code}
> from pyspark.sql import SQLContext
> sqlContext = SQLContext(sc)
> d = [
> {'name': 'a', 'YEAR': 2015, 'year_2': 2014, 'statut': 'a'},
> {'name': 'b', 'YEAR': 2014, 'year_2': 2014, 'statut': 'a'},
> {'name': 'c', 'YEAR': 2013, 'year_2': 2011, 'statut': 'a'},
> {'name': 'd', 'YEAR': 2014, 'year_2': 2013, 'statut': 'a'},
> {'name': 'e', 'YEAR': 2016, 'year_2': 2017, 'statut': 'p'}
> ]
> rdd = sc.parallelize(d)
> df = sqlContext.createDataFrame(rdd)
> df.write.partitionBy('YEAR').mode('overwrite').parquet('data')
> df2 = sqlContext.read.parquet('data')
> df2.filter(df2.YEAR == df2.year_2).show()
> {code}
> returns:
> {code}
> +----+------+------+----+
> |name|statut|year_2|YEAR|
> +----+------+------+----+
> |   d|     a|  2013|2014|
> |   b|     a|  2014|2014|
> |   c|     a|  2011|2013|
> |   e|     p|  2017|2016|
> |   a|     a|  2014|2015|
> +----+------+------+----+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4134) Dynamic allocation: tone down scary executor lost messages when killing on purpose

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-4134:
-
Target Version/s:   (was: 1.6.0)

> Dynamic allocation: tone down scary executor lost messages when killing on 
> purpose
> --
>
> Key: SPARK-4134
> URL: https://issues.apache.org/jira/browse/SPARK-4134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> After SPARK-3822 goes in, we are now able to dynamically kill executors after 
> an application has started. However, when we do that we get a ton of scary 
> error messages telling us that we've done something wrong. It would be good to 
> detect when this is the case and prevent these messages from surfacing.
> This may be difficult, however, because the connection manager tends to be 
> quite verbose in unconditionally logging disconnection messages. This is a 
> very nice-to-have for 1.2 but certainly not a blocker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7977) Disallow println

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991408#comment-14991408
 ] 

Sean Owen commented on SPARK-7977:
--

This has nothing to do with this JIRA, and is a question for 
u...@spark.apache.org.

> Disallow println
> 
>
> Key: SPARK-7977
> URL: https://issues.apache.org/jira/browse/SPARK-7977
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Reynold Xin
>Assignee: Jon Alter
>  Labels: starter
> Fix For: 1.5.0
>
>
> Very often we see pull requests that add println calls from debugging, which 
> the author forgot to remove before code review.
> We can use the regex checker to disallow println. For legitimate use of 
> println, we can then disable the rule where they are used.
> Add to scalastyle-config.xml file:
> {code}
>   <check class="org.scalastyle.scalariform.TokenChecker" enabled="true">
>     <parameters><parameter name="regex">^println$</parameter></parameters>
>   </check>
> {code}
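
For the "legitimate use" escape hatch mentioned above, a rule can be suppressed 
around specific lines with scalastyle on/off comments, e.g.:
{code}
// scalastyle:off println
println("intentional console output, exempt from the check")
// scalastyle:on println
{code}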



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11425) Improve hybrid aggregation (sort-based after hash-based)

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-11425:
--
Assignee: Davies Liu

> Improve hybrid aggregation (sort-based after hash-based)
> 
>
> Key: SPARK-11425
> URL: https://issues.apache.org/jira/browse/SPARK-11425
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Davies Liu
> Fix For: 1.6.0
>
>
> After aggregation, the dataset could be smaller than the inputs, so it's better 
> to do hash-based aggregation for all inputs, then use sort-based 
> aggregation to merge them.
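
A toy sketch of that idea (not Spark's actual implementation): hash-aggregate 
the input, spill each partial map as a sorted run when it grows too large, and 
merge the sorted runs at the end.
{code}
import scala.collection.mutable

def hashThenSortMerge(input: Iterator[(String, Long)], maxMapSize: Int): Seq[(String, Long)] = {
  val runs = mutable.ArrayBuffer.empty[Seq[(String, Long)]]
  var current = mutable.HashMap.empty[String, Long]
  for ((k, v) <- input) {
    current(k) = current.getOrElse(k, 0L) + v
    if (current.size >= maxMapSize) {             // "spill" a sorted partial result
      runs += current.toSeq.sortBy(_._1)
      current = mutable.HashMap.empty[String, Long]
    }
  }
  if (current.nonEmpty) runs += current.toSeq.sortBy(_._1)
  // Each run is sorted by key, so the runs can be combined with a sort-based
  // merge; a simple regroup stands in for that merge in this toy version.
  runs.flatten.groupBy(_._1).toSeq.map { case (k, vs) => (k, vs.map(_._2).sum) }.sortBy(_._1)
}
{code}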



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11398) unnecessary def dialectClassName in HiveContext, and misleading dialect conf at the start of spark-sql

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-11398:
--
Assignee: Zhenhua Wang

> unnecessary def dialectClassName in HiveContext, and misleading dialect conf 
> at the start of spark-sql
> --
>
> Key: SPARK-11398
> URL: https://issues.apache.org/jira/browse/SPARK-11398
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
> Fix For: 1.6.0
>
>
> 1. def dialectClassName in HiveContext is unnecessary. 
> In HiveContext, if conf.dialect == "hiveql", getSQLDialect() will return new 
> HiveQLDialect(this);
> else it will use super.getSQLDialect(). Then in super.getSQLDialect(), it 
> calls dialectClassName, which is overridden in HiveContext and still returns 
> super.dialectClassName.
> So we'll never reach the code "classOf[HiveQLDialect].getCanonicalName" of 
> def dialectClassName in HiveContext.
> 2. When we start bin/spark-sql, the default context is HiveContext, and the 
> corresponding dialect is hiveql.
> However, if we type "set spark.sql.dialect;", the result is "sql", which is 
> inconsistent with the actual dialect and is misleading. For example, we can 
> use sql like "create table" which is only allowed in hiveql, but this dialect 
> conf shows it's "sql".
> Although this problem will not cause any execution error, it's misleading to 
> spark sql users. Therefore I think we should fix it.
> In this pr, instead of overriding def dialect in conf of HiveContext, I set 
> the SQLConf.DIALECT directly as "hiveql", such that result of "set 
> spark.sql.dialect;" will be "hiveql", not "sql". After the change, we can 
> still use "sql" as the dialect in HiveContext through "set 
> spark.sql.dialect=sql". Then the conf.dialect in HiveContext will become sql. 
> Because in SQLConf, def dialect = getConf(), and now the dialect in 
> "settings" becomes "sql".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11519) Spark MemoryStore with hadoop SequenceFile cache the values is same record.

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11519.
---
Resolution: Not A Problem

Not quite sure what you're reporting, but it is expected that objects returned 
from InputFormats are reused. Callers have to clone the objects if storing them.
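
A minimal sketch of that clone-before-caching pattern; the path and record 
types below are illustrative, not taken from this report:
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Hadoop record readers reuse the same Writable instances, so convert (or
// clone) each record into an immutable value before caching.
val rdd = sc.newAPIHadoopFile(
    "/path/to/seqfile",
    classOf[SequenceFileInputFormat[LongWritable, Text]],
    classOf[LongWritable],
    classOf[Text])
  .map { case (k, v) => (k.get, v.toString) }  // materialize fresh values
  .cache()
{code}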

> Spark MemoryStore with hadoop SequenceFile cache the values is same record.
> ---
>
> Key: SPARK-11519
> URL: https://issues.apache.org/jira/browse/SPARK-11519
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: jdk.1.7.0, spark1.1.0, hadoop2.3.0
>Reporter: xukaiqiang
>
> Using Spark's newAPIHadoopFile to read a Hadoop SequenceFile and then caching 
> it in memory, the cache stores the same Java object for every record.
> Reading the Hadoop file with SequenceFileRecordReader as a NewHadoopRDD, the 
> key/value pairs look like:
> [1, com.data.analysis.domain.RecordObject@54cdb594]
> [2, com.data.analysis.domain.RecordObject@54cdb594]
> [3, com.data.analysis.domain.RecordObject@54cdb594]
> Although each value is the same Java object, I am sure the contents are not 
> the same.
> When the Spark memory cache is used, the MemoryStore vector saves all the 
> records, but every value is the last value read from the NewHadoopRDD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5114) Should Evaluator be a PipelineStage

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991434#comment-14991434
 ] 

Sean Owen commented on SPARK-5114:
--

[~josephkb] [~mengxr] as I'm sweeping through stuff targeted for 1.6, I see 
most of the issues that are being targeted and pushed version to version are in 
MLlib. Lots are in these nearly-finished umbrellas, which have some open-ended 
tasks like this. Is this one resolved, for example? How about the other old 
umbrella issues? Some seem like they can be untargeted, resolved, or just 
outright closed.

> Should Evaluator be a PipelineStage
> ---
>
> Key: SPARK-5114
> URL: https://issues.apache.org/jira/browse/SPARK-5114
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.2.0
>Reporter: Joseph K. Bradley
>
> Pipelines can currently contain Estimators and Transformers.
> Question for debate: Should Pipelines be able to contain Evaluators?
> Pros:
> * Schema check: Evaluators take input datasets with particular schema, which 
> should perhaps be checked before running a Pipeline.
> * Intermediate results:
> ** If a Transformer removes a column (which is not done by built-in 
> Transformers currently but might be reasonable in the future), then the user 
> can never evaluate that column.  (However, users could keep all columns 
> around.)
> ** If users have to evaluate after running a Pipeline, then each evaluated 
> column may have to be re-materialized.
> Cons:
> * API: Evaluators do not transform datasets.   They produce a scalar (or a 
> few values), which makes it hard to say how they fit into a Pipeline or a 
> PipelineModel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11515) QuantileDiscretizer should take random seed

2015-11-05 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991294#comment-14991294
 ] 

Yu Ishikawa commented on SPARK-11515:
-

I'll work on this issue.

> QuantileDiscretizer should take random seed
> ---
>
> Key: SPARK-11515
> URL: https://issues.apache.org/jira/browse/SPARK-11515
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> QuantileDiscretizer takes a random sample to select bins.  It currently does 
> not specify a seed for the XORShiftRandom, but it should take a seed by 
> extending the HasSeed Param.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11518) The script spark-submit.cmd can not handle spark directory with space.

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-11518:
--
   Priority: Minor  (was: Major)
Component/s: (was: Spark Core)
 Windows
 Deploy

[~cliu] can you suggest a fix to the scripts that quotes paths?

> The script spark-submit.cmd can not handle spark directory with space.
> --
>
> Key: SPARK-11518
> URL: https://issues.apache.org/jira/browse/SPARK-11518
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Affects Versions: 1.4.1
>Reporter: Cele Liu
>Priority: Minor
>
> After unzipping Spark into D:\Program Files\Spark, submitting an app fails 
> with the error:
> 'D:\Program' is not recognized as an internal or external command,
> operable program or batch file.
> In spark-submit.cmd, the script does not handle the space:
> cmd /V /E /C %~dp0spark-submit2.cmd %*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11461) ObjectFile saving/loading should use configured serializer

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991358#comment-14991358
 ] 

Sean Owen commented on SPARK-11461:
---

I think the semantics of the methods imply Java serialization since it looks 
like it's intended for consumption by other parts of the Hadoop ecosystem as a 
SequenceFile. How about just using {{saveAsHadoopFile}} directly?

> ObjectFile saving/loading should use configured serializer
> --
>
> Key: SPARK-11461
> URL: https://issues.apache.org/jira/browse/SPARK-11461
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Ryan Williams
>
> [RDD.saveAsObjectFile|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1452]
>  and 
> [SparkContext.objectFile|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L1223]
>  use 
> [Utils.serialize|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/util/Utils.scala#L78-L85]
>  and 
> [Utils.deserialize|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/util/Utils.scala#L94-L105]
>  which are hard-coded to use Java SerDe rather than the serializer configured 
> via the {{spark.serializer}} conf param.
> I'd like to write RDDs as Object-/Sequence-Files using e.g. Kryo serde 
> instead of Java; is there a way to do this, or any reason that Spark 
> currently only supports Java?
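
A hedged sketch of one way to do this today: mirror what {{saveAsObjectFile}} 
does, but serialize with whatever serializer the job is configured with (e.g. 
Kryo via {{spark.serializer}}). The helper name is made up, and {{SparkEnv}} is 
a developer API:
{code}
import scala.reflect.ClassTag

import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkEnv
import org.apache.spark.rdd.RDD

def saveWithConfiguredSerializer[T: ClassTag](rdd: RDD[T], path: String): Unit = {
  rdd.mapPartitions { iter =>
    val ser = SparkEnv.get.serializer.newInstance()
    // Group records into small batches, serialize each batch, and store the
    // bytes in a SequenceFile, much as saveAsObjectFile does with Java serialization.
    iter.grouped(10).map { group =>
      (NullWritable.get(), new BytesWritable(ser.serialize(group.toArray).array()))
    }
  }.saveAsSequenceFile(path)
}
{code}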



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11440) Declare rest of @Experimental items non-experimental if they've existed since 1.2.0

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11440.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 9396
[https://github.com/apache/spark/pull/9396]

> Declare rest of @Experimental items non-experimental if they've existed since 
> 1.2.0
> ---
>
> Key: SPARK-11440
> URL: https://issues.apache.org/jira/browse/SPARK-11440
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core, Streaming
>Affects Versions: 1.5.1
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.6.0
>
>
> Follow-on to SPARK-11184. This removes {{Experimental}} annotations on 
> methods that have existed since at least 1.2.0. That's almost entirely stuff 
> in core and streaming. 
> SQL experimental items are largely 1.3.0 onwards; arguably they could be made 
> non-Experimental too, and I'm happy to do that.
> We've already reviewed MLlib, and ML is still properly Experimental in the 
> main for now. Details in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5256) Improving MLlib optimization APIs

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5256:
-
Target Version/s:   (was: 1.6.0)

> Improving MLlib optimization APIs
> -
>
> Key: SPARK-5256
> URL: https://issues.apache.org/jira/browse/SPARK-5256
> Project: Spark
>  Issue Type: Umbrella
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> *Goal*: Improve APIs for optimization
> *Motivation*: There have been several disjoint mentions of improving the 
> optimization APIs to make them more pluggable, extensible, etc.  This JIRA is 
> a place to discuss what API changes are necessary for the long term, and to 
> provide links to other relevant JIRAs.
> Eventually, I hope this leads to a design doc outlining:
> * current issues
> * requirements such as supporting many types of objective functions, 
> optimization algorithms, and parameters to those algorithms
> * ideal API
> * breakdown of smaller JIRAs needed to achieve that API
> I will soon create an initial design doc, and I will try to watch this JIRA 
> and include ideas from JIRA comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5269) BlockManager.dataDeserialize always creates a new serializer instance

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5269:
-
Target Version/s:   (was: 1.6.0)

> BlockManager.dataDeserialize always creates a new serializer instance
> -
>
> Key: SPARK-5269
> URL: https://issues.apache.org/jira/browse/SPARK-5269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ivan Vergiliev
>Assignee: Matt Cheah
>  Labels: performance, serializers
>
> BlockManager.dataDeserialize always creates a new instance of the serializer, 
> which is pretty slow in some cases. I'm using Kryo serialization and have a 
> custom registrator, and its register method is showing up as taking about 15% 
> of the execution time in my profiles. This started happening after I 
> increased the number of keys in a job with a shuffle phase by a factor of 40.
> One solution I can think of is to create a ThreadLocal SerializerInstance for 
> the defaultSerializer, and only create a new one if a custom serializer is 
> passed in. AFAICT a custom serializer is passed only from 
> DiskStore.getValues, and that, on the other hand, depends on the serializer 
> passed to ExternalSorter. I don't know how often this is used, but I think 
> this can still be a good solution for the standard use case.
> Oh, and also - ExternalSorter already has a SerializerInstance, so if the 
> getValues method is called from a single thread, maybe we can pass that 
> directly?
> I'd be happy to try a patch but would probably need a confirmation from 
> someone that this approach would indeed work (or an idea for another).
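
A minimal sketch of the ThreadLocal idea, with illustrative names in a 
standalone object rather than the actual BlockManager code:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{KryoSerializer, Serializer, SerializerInstance}

object SerializerInstanceCache {
  private val defaultSerializer: Serializer = new KryoSerializer(new SparkConf())

  // One instance per thread, so callers do not pay newInstance() (and a custom
  // Kryo registrator) on every deserialization.
  private val cached = new ThreadLocal[SerializerInstance] {
    override def initialValue(): SerializerInstance = defaultSerializer.newInstance()
  }

  def instanceFor(custom: Option[Serializer] = None): SerializerInstance =
    custom.map(_.newInstance()).getOrElse(cached.get())
}
{code}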



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6717) Clear shuffle files after checkpointing in ALS

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6717:
-
Target Version/s:   (was: 1.6.0)

> Clear shuffle files after checkpointing in ALS
> --
>
> Key: SPARK-6717
> URL: https://issues.apache.org/jira/browse/SPARK-6717
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>  Labels: als
>
> In ALS iterations, we checkpoint RDDs to cut lineage and to reduce shuffle 
> files. However, whether to clean shuffle files depends on the system GC, 
> which may not be triggered in ALS iterations. So after checkpointing, before 
> we let the RDD object go out of scope, we should clean its shuffle 
> dependencies explicitly. This function could either stay inside ALS or go to 
> Core.
> Without this feature, we can call System.gc() periodically to clean shuffle 
> files of RDDs that went out of scope.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8546) PMML export for Naive Bayes

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8546:
-
Target Version/s:   (was: 1.6.0)

> PMML export for Naive Bayes
> ---
>
> Key: SPARK-8546
> URL: https://issues.apache.org/jira/browse/SPARK-8546
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Xusen Yin
>Priority: Minor
>
> The naive Bayes section of the PMML standard can be found at 
> http://www.dmg.org/v4-1/NaiveBayes.html. We should first figure out how to 
> generate PMML for both binomial and multinomial naive Bayes models using 
> JPMML (maybe [~vfed] can help).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3586) Support nested directories in Spark Streaming

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-3586:
-
Target Version/s:   (was: 1.6.0)

> Support nested directories in Spark Streaming
> -
>
> Key: SPARK-3586
> URL: https://issues.apache.org/jira/browse/SPARK-3586
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.1.0
>Reporter: XiaoJing wang
>Priority: Minor
>
> For text files, there is the method streamingContext.textFileStream(dataDirectory). 
> Spark Streaming will monitor the directory dataDirectory and process any 
> files created in that directory, but files written in nested directories are 
> not supported.
> E.g., with
> streamingContext.textFileStream(/test) 
> and the directory contents:
> /test/file1
> /test/file2
> /test/dr/file1
> this method ("textFileStream") can only read:
> /test/file1
> /test/file2
> /test/dr/
> but the file /test/dr/file1 is not read.
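
A possible workaround sketch (not a fix) while nested directories are 
unsupported: monitor each known subdirectory explicitly and union the resulting 
streams. Directory names below are illustrative.
{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("nested-dirs")
val ssc = new StreamingContext(conf, Seconds(10))
val dirs = Seq("/test", "/test/dr")
// One textFileStream per directory, unioned into a single DStream.
val lines = dirs.map(ssc.textFileStream).reduce(_ union _)
lines.print()
ssc.start()
ssc.awaitTermination()
{code}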



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11518) The script spark-submit.cmd can not handle spark directory with space.

2015-11-05 Thread Cele Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991425#comment-14991425
 ] 

Cele Liu commented on SPARK-11518:
--

No, I just worked around it by not putting Spark into such a folder.

> The script spark-submit.cmd can not handle spark directory with space.
> --
>
> Key: SPARK-11518
> URL: https://issues.apache.org/jira/browse/SPARK-11518
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Affects Versions: 1.4.1
>Reporter: Cele Liu
>Priority: Minor
>
> After unzipping Spark into D:\Program Files\Spark, submitting an app fails 
> with the error:
> 'D:\Program' is not recognized as an internal or external command,
> operable program or batch file.
> In spark-submit.cmd, the script does not handle the space:
> cmd /V /E /C %~dp0spark-submit2.cmd %*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11525) Support spark packages containing R source code in Standalone mode

2015-11-05 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated SPARK-11525:

Description: 
Currently SparkR supports spark packages containing R source code in YARN 
modes. However, this does not work in Standalone modes.
To support this feature in Standalone modes, something needs to be done:
1. Distribute an archive of the additional R packages built from R source code 
contained in spark packages or jars specified via spark-submit command line 
options to cluster nodes;
2. On cluster nodes, decompress the archive to a writable directory and pass 
the path to R processes.

  was:Currently we support 


> Support spark packages containing R source code in Standalone mode
> --
>
> Key: SPARK-11525
> URL: https://issues.apache.org/jira/browse/SPARK-11525
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Sun Rui
>
> Currently SparkR supports spark packages containing R source code in YARN 
> modes. However, this does not work in Standalone modes.
> To support this feature in Standalone modes, something needs to be done:
> 1. Distribute an archive of the additional R packages built from R source 
> code contained in spark packages or jars specified via spark-submit command 
> line options to cluster nodes;
> 2. On cluster nodes, decompress the archive to a writable directory and pass 
> the path to R processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7977) Disallow println

2015-11-05 Thread sparkerjin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991399#comment-14991399
 ] 

sparkerjin commented on SPARK-7977:
---

Hi,
 
I ran spark-submit in cluster mode, but there was no output about the driver 
ID or the status of the driver.

The descriptions are as follows:
1. Run spark-submit in cluster mode:
[root@jasonspark02 spark-1.5.1-bin-hadoop2.4]# bin/spark-submit --deploy-mode 
cluster --class org.apache.spark.examples.SparkPi 
./lib/spark-examples-1.5.1-hadoop2.4.0.jar
Running Spark using the REST application submission protocol.
15/11/05 02:13:43 INFO rest.RestSubmissionClient: Submitting a request to 
launch an application in spark://jasonspark02:7077.
15/11/05 02:13:43 WARN rest.RestSubmissionClient: Unable to connect to server 
spark://jasonspark02:7077.
Warning: Master endpoint spark://jasonspark02:7077 was not a REST server. 
Falling back to legacy submission gateway instead.
15/11/05 02:13:43 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
[root@jasonspark02 spark-1.5.1-bin-hadoop2.4]#

2. Problem:
We can see that there is no information about the driver ID or the state of 
the driver. I think users need this information to learn more about the job.

Running spark-submit in cluster mode with --verbose also does not show this 
info.

3. Reason:
I looked into the code and found that --verbose or -v is not passed into the 
childArgs in spark-submit, and Level.WARN is used as the default in 
Client.scala.

4. Expected:
I think users should know the driver ID and status after they submit a job,
such as:
[root@jasonpark02 spark-1.5.1-bin-hadoop2.4]# bin/spark-submit --deploy-mode 
cluster --conf spark.ego.uname=u1 --conf spark.ego.passwd=u1  --class 
org.apache.spark.examples.SparkPi ./lib/spark-examples-1.5.1-hadoop2.4.0.jar
Running Spark using the REST application submission protocol.
15/11/05 03:08:43 INFO rest.RestSubmissionClient: Submitting a request to 
launch an application in spark://jasonspark02:7077.
15/11/05 03:08:44 WARN rest.RestSubmissionClient: Unable to connect to server 
spark://jasonspark02:7077.
Warning: Master endpoint spark://jinspark02:7077 was not a REST server. Falling 
back to legacy submission gateway instead.
15/11/05 03:08:44 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
15/11/05 03:08:44 INFO spark.SecurityManager: Changing view acls to: root
15/11/05 03:08:44 INFO spark.SecurityManager: Changing modify acls to: root
15/11/05 03:08:44 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); users with 
modify permissions: Set(root)
15/11/05 03:08:45 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/05 03:08:45 INFO util.Utils: Successfully started service 'driverClient' 
on port 47454.
15/11/05 03:08:45 INFO deploy.ClientEndpoint: Driver successfully submitted as 
driver-20151105030845- <--   I think the 
info is very important
15/11/05 03:08:45 INFO deploy.ClientEndpoint: ... waiting before polling master 
for driver state
15/11/05 03:08:50 INFO deploy.ClientEndpoint: ... polling master for driver 
state
15/11/05 03:08:50 INFO deploy.ClientEndpoint: State of 
driver-20151105030845-

What do you think? If you have any ideas, please let me know. Thanks.

Jie Hua




> Disallow println
> 
>
> Key: SPARK-7977
> URL: https://issues.apache.org/jira/browse/SPARK-7977
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Reporter: Reynold Xin
>Assignee: Jon Alter
>  Labels: starter
> Fix For: 1.5.0
>
>
> Very often we see pull requests that add println calls from debugging, which 
> the author forgot to remove before code review.
> We can use the regex checker to disallow println. For legitimate use of 
> println, we can then disable the rule where they are used.
> Add to scalastyle-config.xml file:
> {code}
>   <check class="org.scalastyle.scalariform.TokenChecker" enabled="true">
>     <parameters><parameter name="regex">^println$</parameter></parameters>
>   </check>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11102) Uninformative exception when specifing non-exist input for JSON data source

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991398#comment-14991398
 ] 

Apache Spark commented on SPARK-11102:
--

User 'zjffdu' has created a pull request for this issue:
https://github.com/apache/spark/pull/9490

> Uninformative exception when specifing non-exist input for JSON data source
> ---
>
> Key: SPARK-11102
> URL: https://issues.apache.org/jira/browse/SPARK-11102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Jeff Zhang
>Priority: Minor
>
> If I specify a non-existent input path for the JSON data source, the following 
> exception is thrown, which is not very readable. 
> {code}
> 15/10/14 16:14:39 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 19.9 KB, free 251.4 KB)
> 15/10/14 16:14:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on 192.168.3.3:54725 (size: 19.9 KB, free: 2.2 GB)
> 15/10/14 16:14:39 INFO SparkContext: Created broadcast 0 from json at 
> <console>:19
> java.io.IOException: No input paths specified in job
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1087)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1085)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$.apply(InferSchema.scala:58)
>   at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
>   at scala.Option.getOrElse(Option.scala:120)
>   at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
>   at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
>   at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
>   at 
> org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:106)
>   at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:221)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>   at $iwC$$iwC$$iwC.<init>(<console>:32)
>   at $iwC$$iwC.<init>(<console>:34)
>   at $iwC.<init>(<console>:36)
> {code}
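
For context, the call that surfaces this trace is simply a read of a missing 
path (the path below is illustrative); the ask is for a clearer error at that 
point:
{code}
val df = sqlContext.read.json("/path/that/does/not/exist")
// currently fails with the generic "No input paths specified in job" IOException above
{code}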



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9579) Improve Word2Vec unit tests

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9579:
-
Target Version/s:   (was: 1.6.0)

> Improve Word2Vec unit tests
> ---
>
> Key: SPARK-9579
> URL: https://issues.apache.org/jira/browse/SPARK-9579
> Project: Spark
>  Issue Type: Test
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> Word2Vec unit tests should be improved in a few ways:
> * Test individual components of the algorithm.  This may mean breaking the 
> code into smaller methods which can be tested individually.
> * Test vs another library, if possible.  Following the example of unit tests 
> for LogisticRegression, create robust unit tests making sure the two 
> implementations produce similar results.  This may be too hard to do robustly 
> (and deterministically).  In that case, the first improvement will suffice.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9140) Replace TimeTracker by Stopwatch

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9140:
-
Target Version/s:   (was: 1.6.0)

> Replace TimeTracker by Stopwatch
> 
>
> Key: SPARK-9140
> URL: https://issues.apache.org/jira/browse/SPARK-9140
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> We can replace TimeTracker in tree implementations by Stopwatch. The initial 
> PR could use local stopwatches only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11486) TungstenAggregate may fail when switching to sort-based aggregation when there are string in grouping columns and no aggregation buffer columns

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-11486:
--
Assignee: Davies Liu

> TungstenAggregate may fail when switching to sort-based aggregation when 
> there are string in grouping columns and no aggregation buffer columns
> ---
>
> Key: SPARK-11486
> URL: https://issues.apache.org/jira/browse/SPARK-11486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Josh Rosen
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.6.0
>
>
> This was discovered by [~davies]:
> {code}
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.update(UnsafeRow.java:193)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(generated.java:40)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:643)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:517)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:779)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:128)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$3.apply(TungstenAggregate.scala:137)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$3.apply(TungstenAggregate.scala:137)
>   at 
> org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/28 23:25:08 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_3, 
> runningTasks: 0
> {code}
> See discussion at 
> https://github.com/apache/spark/pull/9383#issuecomment-153466959



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5934) DStreamGraph.clearMetadata attempts to unpersist the same RDD multiple times

2015-11-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991353#comment-14991353
 ] 

Saisai Shao commented on SPARK-5934:


I think here in your code:

{code}
val output = input.cache().transform(x => x)
{code}

The transformation you did does not actually generate a new RDD, so 
QueueInputDStream and TransformedDStream maintain the same reference to RDD 2; 
in this way, calling {{clearMetadata}} will unpersist this RDD twice.

From my understanding, this is not a bug; it is your case that leads to this 
WARNING log. If you change to {{ input.cache().transform(x => x.map(i => 1)) 
}}, the warning log will not occur.


> DStreamGraph.clearMetadata attempts to unpersist the same RDD multiple times
> 
>
> Key: SPARK-5934
> URL: https://issues.apache.org/jira/browse/SPARK-5934
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Streaming
>Affects Versions: 1.2.1
>Reporter: Nick Pritchard
>Priority: Minor
>
> It seems that since DStream.clearMetadata calls itself recursively on the 
> dependencies, it attempts to unpersist the same RDD, which results in 
> warn logs like this:
> {quote}
> WARN BlockManager: Asked to remove block rdd_2_1, which does not exist
> {quote}
> or this:
> {quote}
> WARN BlockManager: Block rdd_2_1 could not be removed as it was not found in 
> either the disk, memory, or tachyon store
> {quote}
> This is preceded by logs like:
> {quote}
> DEBUG TransformedDStream: Unpersisting old RDDs: 2
> DEBUG QueueInputDStream: Unpersisting old RDDs: 2
> {quote}
> Here is a reproducible case:
> {code:scala}
> object Test {
>   def main(args: Array[String]): Unit = {
> val conf = new SparkConf().setMaster("local[2]").setAppName("Test")
> val ssc = new StreamingContext(conf, Seconds(1))
> val queue = new mutable.Queue[RDD[Int]]
> val input = ssc.queueStream(queue)
> val output = input.cache().transform(x => x)
> output.print()
> ssc.start()
> for (i <- 1 to 5) {
>   val rdd = ssc.sparkContext.parallelize(Seq(i))
>   queue.enqueue(rdd)
>   Thread.sleep(1000)
> }
> ssc.stop()
>   }
> }
> {code}
> It doesn't seem to be a fatal error, but the WARN messages are a bit 
> unsettling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11459) Allow configuring checkpoint dir, filenames

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991352#comment-14991352
 ] 

Sean Owen commented on SPARK-11459:
---

That's not the purpose of checkpoints though. The path has to contain a unique 
element to differentiate different RDDs. You should indeed explicitly 'save' 
data if you need to serialize it to external storage.
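For illustration, here is a minimal sketch (not part of this issue; the paths are 
hypothetical) contrasting the two mechanisms: checkpointing, whose directory layout 
and file names Spark manages itself, and an explicit save, where the caller controls 
the output location:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointVsSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("CheckpointVsSave"))
    val rdd = sc.parallelize(1 to 100)

    // Checkpointing: Spark picks the subdirectory (UUID) and file names under this dir.
    sc.setCheckpointDir("/tmp/spark-checkpoints")   // hypothetical path
    rdd.checkpoint()
    rdd.count()                                     // an action materializes the checkpoint

    // Explicit save: the caller chooses the output path for external storage.
    rdd.saveAsTextFile("/tmp/my-rdd-output")        // hypothetical path

    sc.stop()
  }
}
{code}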

> Allow configuring checkpoint dir, filenames
> ---
>
> Key: SPARK-11459
> URL: https://issues.apache.org/jira/browse/SPARK-11459
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Ryan Williams
>Priority: Minor
>
> I frequently want to persist some RDDs to disk and choose the names of the 
> files that they are saved as.
> Currently, the {{RDD.checkpoint}} flow [writes to a directory with a UUID in 
> its 
> name|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2050],
>  and the file is [always named after the RDD's 
> ID|https://github.com/apache/spark/blob/v1.5.1/core/src/main/scala/org/apache/spark/rdd/ReliableRDDCheckpointData.scala#L96].
> Is there any reason not to allow the user to e.g. pass a string to 
> {{RDD.checkpoint}} that will set the location that the RDD is checkpointed to?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11378) StreamingContext.awaitTerminationOrTimeout does not return

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11378.
---
   Resolution: Fixed
Fix Version/s: 1.6.0
   1.5.3

Issue resolved by pull request 9336
[https://github.com/apache/spark/pull/9336]

> StreamingContext.awaitTerminationOrTimeout does not return
> --
>
> Key: SPARK-11378
> URL: https://issues.apache.org/jira/browse/SPARK-11378
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming
>Affects Versions: 1.5.1
>Reporter: Nick Evans
>Assignee: Nick Evans
>Priority: Minor
> Fix For: 1.5.3, 1.6.0
>
>
> The docs for {{SparkContext.awaitTerminationOrTimeout}} state it will "Return 
> `true` if it's stopped; (...) or `false` if the waiting time elapsed before 
> returning from the method."
> This is currently not the case - the function does not return and thus any 
> logic built on awaitTerminationOrTimeout will not work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7041) Avoid writing empty files in BypassMergeSortShuffleWriter

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-7041.
--
  Resolution: Duplicate
Target Version/s:   (was: 1.6.0)

I'll close this in favor of the more recent issue since there is an active PR 
and discussion there.

> Avoid writing empty files in BypassMergeSortShuffleWriter
> -
>
> Key: SPARK-7041
> URL: https://issues.apache.org/jira/browse/SPARK-7041
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> In BypassMergeSortShuffleWriter, we may end up opening disk writers files for 
> empty partitions; this occurs because we manually call {{open()}} after 
> creating the writer, causing serialization and compression input streams to 
> be created; these streams may write headers to the output stream, resulting 
> in non-zero-length files being created for partitions that contain no 
> records.  This is unnecessary, though, since the disk object writer will 
> automatically open itself when the first write is performed.  Removing this 
> eager {{open()}} call and rewriting the consumers to cope with the 
> non-existence of empty files results in a large performance benefit for 
> certain sparse workloads when using sort-based shuffle.
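The lazy-open idea can be illustrated with a small generic sketch (purely 
illustrative; this is not Spark's disk object writer): the file is only opened, and 
any serialization or compression headers only written, once the first record 
arrives, so partitions that stay empty never produce a file.

{code}
import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}

// Illustrative only: a per-partition writer that defers opening its file until the
// first write, so partitions that never receive a record create no file at all.
class LazyPartitionWriter(path: String) {
  private var out: OutputStream = null

  def write(bytes: Array[Byte]): Unit = {
    if (out == null) {
      out = new BufferedOutputStream(new FileOutputStream(path))  // opened on first write
    }
    out.write(bytes)
  }

  def close(): Unit = if (out != null) out.close()
}
{code}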



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8884) 1-sample Anderson-Darling Goodness-of-Fit test

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8884:
-
Target Version/s:   (was: 1.6.0)

> 1-sample Anderson-Darling Goodness-of-Fit test
> --
>
> Key: SPARK-8884
> URL: https://issues.apache.org/jira/browse/SPARK-8884
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jose Cambronero
>Priority: Minor
>
> We have implemented a 1-sample Anderson-Darling goodness-of-fit test to add 
> to the current hypothesis testing functionality. The current implementation 
> supports various distributions (normal, exponential, gumbel, logistic, and 
> weibull). However, users must provide distribution parameters for all except 
> normal/exponential (in which case they are estimated from the data). In 
> contrast to other tests, such as the Kolmogorov Smirnov test, we only support 
> specific distributions as the critical values depend on the distribution 
> being tested. 
> The distributed implementation of AD takes advantage of the fact that we can 
> calculate a portion of the statistic within each partition of a sorted data 
> set, independent of the global order of those observations. We can then carry 
> some additional information that allows us to adjust the final amounts once 
> we have collected 1 result per partition.
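As a rough illustration of the per-partition idea (a sketch only, not the 
implementation referenced in this issue; it assumes a fully specified 
standard-normal null hypothesis and uses commons-math3, which Spark already depends 
on, for the CDF): each partition of the globally sorted data computes partial sums 
using only its local 1-based rank, and the driver then shifts those sums by the 
number of records in earlier partitions.

{code}
import org.apache.commons.math3.distribution.NormalDistribution
import org.apache.spark.{SparkConf, SparkContext}

object AndersonDarlingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ADSketch"))
    val dist = new NormalDistribution(0.0, 1.0)   // hypothesized (fully specified) null
    val data = sc.parallelize(Seq.fill(10000)(scala.util.Random.nextGaussian()), numSlices = 4)

    // 1) Global sort: after sortBy, partition p holds smaller values than partition p + 1.
    val sorted = data.sortBy(identity)

    // 2) Per-partition partial sums computed with only the local 1-based rank k.
    val partials = sorted.mapPartitionsWithIndex { (pid, it) =>
      var m = 0L; var s1 = 0.0; var s2 = 0.0; var w1 = 0.0; var w2 = 0.0
      it.foreach { x =>
        m += 1
        val lnF = math.log(dist.cumulativeProbability(x))
        val lnS = math.log(1.0 - dist.cumulativeProbability(x))
        s1 += lnF; s2 += lnS
        w1 += (2 * m - 1) * lnF
        w2 += (2 * m - 1) * lnS
      }
      Iterator((pid, (m, s1, s2, w1, w2)))
    }.collect().sortBy(_._1).map(_._2)            // exactly one result per partition

    // 3) Driver-side adjustment: shift each local rank by the record count of earlier
    //    partitions (the offset), using
    //    A^2 = -n - (1/n) * sum_i [ (2i - 1) ln F(x_i) + (2(n - i) + 1) ln(1 - F(x_i)) ].
    val n = partials.map(_._1).sum
    var offset = 0L
    var total = 0.0
    partials.foreach { case (m, s1, s2, w1, w2) =>
      total += (w1 + 2.0 * offset * s1) + ((2.0 * n - 2.0 * offset) * s2 - w2)
      offset += m
    }
    val a2 = -n - total / n
    println(s"Anderson-Darling statistic A^2 = $a2")
    sc.stop()
  }
}
{code}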



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8542) PMML export for Decision Trees

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8542:
-
Target Version/s:   (was: 1.6.0)

> PMML export for Decision Trees
> --
>
> Key: SPARK-8542
> URL: https://issues.apache.org/jira/browse/SPARK-8542
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>Assignee: Jasmine George
>Priority: Minor
>   Original Estimate: 216h
>  Remaining Estimate: 216h
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8665) Update ALS documentation to include performance tips

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-8665:
-
Target Version/s:   (was: 1.6.0)

> Update ALS documentation to include performance tips
> 
>
> Key: SPARK-8665
> URL: https://issues.apache.org/jira/browse/SPARK-8665
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With the new ALS implementation, users still need to deal with 
> computation/communication trade-offs. It would be nice to document this 
> clearly based on the issues on the mailing list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)
mustafa elbehery created SPARK-11526:


 Summary: JDBC to PostGIS throws UnSupported Type exception
 Key: SPARK-11526
 URL: https://issues.apache.org/jira/browse/SPARK-11526
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1
 Environment: Linux Based
Reporter: mustafa elbehery


I have tried to use SparkSQL JDBC to connect to *PostGIS* Database. Although 
the connection works fine with a normal *PostgresSql* Database, it throws 
UnSupported Type Exception when I try to query a Database with _PostGIS_ 
extension.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-11518) The script spark-submit.cmd can not handle spark directory with space.

2015-11-05 Thread Cele Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cele Liu updated SPARK-11518:
-
Comment: was deleted

(was: No. I just use work around to not put Spark into a such folder.)

> The script spark-submit.cmd can not handle spark directory with space.
> --
>
> Key: SPARK-11518
> URL: https://issues.apache.org/jira/browse/SPARK-11518
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Affects Versions: 1.4.1
>Reporter: Cele Liu
>Priority: Minor
>
> Unzip Spark into D:\Program Files\Spark; when we submit the app, we get the 
> error:
> 'D:\Program' is not recognized as an internal or external command,
> operable program or batch file.
> In spark-submit.cmd, the script does not handle space:
> cmd /V /E /C %~dp0spark-submit2.cmd %*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11525) Support spark packages containing R source code in Standalone mode

2015-11-05 Thread Sun Rui (JIRA)
Sun Rui created SPARK-11525:
---

 Summary: Support spark packages containing R source code in 
Standalone mode
 Key: SPARK-11525
 URL: https://issues.apache.org/jira/browse/SPARK-11525
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Affects Versions: 1.5.1
Reporter: Sun Rui






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7425) spark.ml Predictor should support other numeric types for label

2015-11-05 Thread Stefano Baghino (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991287#comment-14991287
 ] 

Stefano Baghino commented on SPARK-7425:


Thank you for the feedback, [~josephkb].
I agree with what you say, it's just that the edit proposed by [~gweidner] 
would result in regressions when the SchemaUtils.checkColumnType routine is 
called with non-NumericTypes (I've checked usages throughout the code). I'm 
checking if there's another way to work on the issue without causing unwanted 
regressions. I actually implemented a solution and "just" need to write proper 
tests; I'll need a little time, as I'm doing this in my spare time. :)
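For reference, one possible shape of such a relaxation (a rough illustration only, 
not the actual patch under discussion and not [~gweidner]'s edit) is a separate 
check that accepts any {{NumericType}} for the label column, leaving the existing 
{{SchemaUtils.checkColumnType}} callers untouched:

{code}
import org.apache.spark.sql.types.{NumericType, StructType}

object LabelTypeCheck {
  // Hypothetical helper: accept any numeric label column instead of requiring DoubleType.
  def checkNumericLabel(schema: StructType, labelCol: String): Unit = {
    val actual = schema(labelCol).dataType
    require(actual.isInstanceOf[NumericType],
      s"Label column $labelCol must be of a numeric type but was $actual.")
  }
}
{code}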

> spark.ml Predictor should support other numeric types for label
> ---
>
> Key: SPARK-7425
> URL: https://issues.apache.org/jira/browse/SPARK-7425
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>  Labels: starter
>
> Currently, the Predictor abstraction expects the input labelCol type to be 
> DoubleType, but we should support other numeric types.  This will involve 
> updating the PredictorParams.validateAndTransformSchema method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11509) ipython notebooks do not work on clusters created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11509.
---
Resolution: Not A Problem

> ipython notebooks do not work on clusters created using 
> spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 script
> --
>
> Key: SPARK-11509
> URL: https://issues.apache.org/jira/browse/SPARK-11509
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, EC2, PySpark
>Affects Versions: 1.5.1
> Environment: AWS cluster
> [ec2-user@ip-172-31-29-60 ~]$ uname -a
> Linux ip-172-31-29-60.us-west-1.compute.internal 3.4.37-40.44.amzn1.x86_64 #1 
> SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Andrew Davidson
>
> I recently downloaded  spark-1.5.1-bin-hadoop2.6 to my local mac.
> I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create an AWS cluster. I am 
> able to run the Java SparkPi example on the cluster; however, I am not able to 
> run ipython notebooks on the cluster. (I connect using an ssh tunnel)
> According to the 1.5.1 getting started doc 
> http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
> The following should work
>  PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook 
> --no-browser --port=7000" /root/spark/bin/pyspark
> I am able to connect to the notebook server and start a notebook; however:
> bug 1) the default sparkContext does not exist
> from pyspark import SparkContext
> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
> textFile.take(3)
> ---
> NameError Traceback (most recent call last)
>  in ()
>   1 from pyspark import SparkContext
> > 2 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
>   3 textFile.take(3)
> NameError: name 'sc' is not defined
> bug 2)
>  If I create a SparkContext I get the following Python version mismatch 
> error:
> sc = SparkContext("local", "Simple App")
> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.md")
> textFile.take(3)
>  File "/root/spark/python/lib/pyspark.zip/pyspark/worker.py", line 64, in main
> ("%d.%d" % sys.version_info[:2], version))
> Exception: Python in worker has different version 2.7 than that in driver 
> 2.6, PySpark cannot run with different minor versions
> I am able to run ipython notebooks on my local Mac as follows. (By default 
> you would get an error that the driver and workers are using different versions 
> of Python.)
> $ cat ~/bin/pySparkNotebook.sh
> #!/bin/sh 
> set -x # turn debugging on
> #set +x # turn debugging off
> export PYSPARK_PYTHON=python3
> export PYSPARK_DRIVER_PYTHON=python3
> IPYTHON_OPTS=notebook $SPARK_ROOT/bin/pyspark $*$ 
> I have spent a lot of time trying to debug the pyspark script; however, I 
> cannot figure out what the problem is.
> Please let me know if there is something I can do to help.
> Andy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10055) San Francisco Crime Classification

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10055:
--
Target Version/s:   (was: 1.6.0)

> San Francisco Crime Classification
> --
>
> Key: SPARK-10055
> URL: https://issues.apache.org/jira/browse/SPARK-10055
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Kai Sasaki
>
> Apply ML pipeline API to San Francisco Crime Classification 
> (https://www.kaggle.com/c/sf-crime).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11518) The script spark-submit.cmd can not handle spark directory with space.

2015-11-05 Thread Cele Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991428#comment-14991428
 ] 

Cele Liu commented on SPARK-11518:
--

No. I just use a workaround of not putting Spark into such a folder.

> The script spark-submit.cmd can not handle spark directory with space.
> --
>
> Key: SPARK-11518
> URL: https://issues.apache.org/jira/browse/SPARK-11518
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Affects Versions: 1.4.1
>Reporter: Cele Liu
>Priority: Minor
>
> Unzip Spark into D:\Program Files\Spark; when we submit the app, we get the 
> error:
> 'D:\Program' is not recognized as an internal or external command,
> operable program or batch file.
> In spark-submit.cmd, the script does not handle space:
> cmd /V /E /C %~dp0spark-submit2.cmd %*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11518) The script spark-submit.cmd can not handle spark directory with space.

2015-11-05 Thread Cele Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991423#comment-14991423
 ] 

Cele Liu commented on SPARK-11518:
--

No. I just use a workaround of not putting Spark into such a folder.

> The script spark-submit.cmd can not handle spark directory with space.
> --
>
> Key: SPARK-11518
> URL: https://issues.apache.org/jira/browse/SPARK-11518
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Windows
>Affects Versions: 1.4.1
>Reporter: Cele Liu
>Priority: Minor
>
> Unzip Spark into D:\Program Files\Spark; when we submit the app, we get the 
> error:
> 'D:\Program' is not recognized as an internal or external command,
> operable program or batch file.
> In spark-submit.cmd, the script does not handle space:
> cmd /V /E /C %~dp0spark-submit2.cmd %*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11506) Code Optimization to remove a redundant operation

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-11506:
--
Assignee: Alok

> Code Optimization to remove a redundant operation
> -
>
> Key: SPARK-11506
> URL: https://issues.apache.org/jira/browse/SPARK-11506
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Alok
>Assignee: Alok
>Priority: Trivial
>  Labels: easyfix, performance
> Fix For: 1.6.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Creating this JIRA to track all 'trivial' code optimizations that represent a 
> small improvement in Spark MLLib.
> These changes are not 'must-haves' but definitely improve MLLIB's performance 
> slightly. Specifically, the changes in this JIRA fall in 'trivial' category 
> as the correctness of the code is not altered and the performance gain has 
> not been proven (theoretically / experimentally) to be major.
> Starting this umbrella JIRA, since I could not find another JIRA where I 
> could link my PR, which removes an unnecessary operation in MLLIB's Online 
> LDA implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11506) Code Optimization to remove a redundant operation

2015-11-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-11506.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 9456
[https://github.com/apache/spark/pull/9456]

> Code Optimization to remove a redundant operation
> -
>
> Key: SPARK-11506
> URL: https://issues.apache.org/jira/browse/SPARK-11506
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Alok
>Priority: Trivial
>  Labels: easyfix, performance
> Fix For: 1.6.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Creating this JIRA to track all 'trivial' code optimizations that represent a 
> small improvement in Spark MLLib.
> These changes are not 'must-haves' but definitely improve MLLIB's performance 
> slightly. Specifically, the changes in this JIRA fall in 'trivial' category 
> as the correctness of the code is not altered and the performance gain has 
> not been proven (theoretically / experimentally) to be major.
> Starting this umbrella JIRA, since I could not find another JIRA where I 
> could link my PR, which removes an unnecessary operation in MLLIB's Online 
> LDA implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; {color:red} val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; {color:red}  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; {color:red} val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; {color:red}  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of 

[jira] [Assigned] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11527:


Assignee: (was: Apache Spark)

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991484#comment-14991484
 ] 

mustafa elbehery edited comment on SPARK-11526 at 11/5/15 10:33 AM:


[~sowen] Correct me if I am wrong, but I think the *PostGIS* extension is not 
supported by the SparkSQL JDBC connector; that's why the returned type is 
*unsupported* rather than "OTHER". I discussed this issue with [~rams] during 
Spark Summit, and he mentioned that it could be fixed.


was (Author: elbehery):
[~sowen] Correct me if I am wrong, I think the *PostGIS* extension is not 
supported in SparkSql JDBC Connector, thats why the returned type is 
*unsupported* not "OTHER". I have discussed this issue with [~rams] during the 
Spark Summit, and he mentioned that it could be fixed.

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991484#comment-14991484
 ] 

mustafa elbehery commented on SPARK-11526:
--

[~sowen] Correct me if I am wrong, but I think the *PostGIS* extension is not 
supported by the SparkSQL JDBC connector; that's why the returned type is 
*unsupported* rather than "OTHER". I discussed this issue with [~rams] during the 
Spark Summit, and he mentioned that it could be fixed.

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991638#comment-14991638
 ] 

Sean Owen commented on SPARK-11526:
---

It means PostGIS returns a non-standard type, in the way your app uses it, that 
neither JDBC nor Spark support. I don't know how to get around that if that's 
your requirement. 
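One workaround that is sometimes used in this situation (illustrative only, not 
verified against this setup; the table and column names are hypothetical) is to 
push the geometry conversion down to PostGIS by passing a subquery as {{dbtable}}, 
so that only JDBC-standard types such as text reach Spark:

{code}
// ST_AsText converts the PostGIS geometry column to WKT text on the database side.
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:postgresql:nycesri",
    "dbtable" -> "(SELECT gid, name, ST_AsText(geom) AS geom_wkt FROM ny_counties_clip) AS t"
  )).load()
{code}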

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; {color:red} val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; {color:red}  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I 

[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991460#comment-14991460
 ] 

mustafa elbehery commented on SPARK-11526:
--

I have re-opened the issue

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10689) User guide and example code for AFTSurvivalRegression

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10689:


Assignee: Apache Spark

> User guide and example code for AFTSurvivalRegression
> -
>
> Key: SPARK-10689
> URL: https://issues.apache.org/jira/browse/SPARK-10689
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Xiangrui Meng
>Assignee: Apache Spark
>
> Add user guide and example code for AFTSurvivalRegression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10689) User guide and example code for AFTSurvivalRegression

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991459#comment-14991459
 ] 

Apache Spark commented on SPARK-10689:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/9491

> User guide and example code for AFTSurvivalRegression
> -
>
> Key: SPARK-10689
> URL: https://issues.apache.org/jira/browse/SPARK-10689
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Xiangrui Meng
>
> Add user guide and example code for AFTSurvivalRegression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10689) User guide and example code for AFTSurvivalRegression

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10689:


Assignee: (was: Apache Spark)

> User guide and example code for AFTSurvivalRegression
> -
>
> Key: SPARK-10689
> URL: https://issues.apache.org/jira/browse/SPARK-10689
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Reporter: Xiangrui Meng
>
> Add user guide and example code for AFTSurvivalRegression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991480#comment-14991480
 ] 

Apache Spark commented on SPARK-11527:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/9492

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11527:


Assignee: Apache Spark

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Assignee: Apache Spark
>Priority: Minor
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
   Priority: Critical  (was: Major)
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



*However*, when I tried to use the same way for querying *nycesri*, I have got 
*unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.

  was:I have tried to use SparkSQL JDBC to connect to *PostGIS* Database. 
Although the connection works fine with a normal *PostgresSql* Database, it 
throws UnSupported Type Exception when I try to query a Database with _PostGIS_ 
extension.  


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery reopened SPARK-11526:
--

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Yanbo Liang (JIRA)
Yanbo Liang created SPARK-11527:
---

 Summary: PySpark AFTSurvivalRegressionModel should expose 
coefficients/intercept/scale
 Key: SPARK-11527
 URL: https://issues.apache.org/jira/browse/SPARK-11527
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Yanbo Liang
Priority: Minor


PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991497#comment-14991497
 ] 

mustafa elbehery commented on SPARK-11526:
--

OK, I am sure the problem is not in the code, since the same code works with a 
standard database type. Does this mean I cannot connect to PostGIS through the 
SparkSQL JDBC connector?

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991452#comment-14991452
 ] 

mustafa elbehery commented on SPARK-11526:
--

updated 

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Comment: was deleted

(was: updated )

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991453#comment-14991453
 ] 

mustafa elbehery commented on SPARK-11526:
--

updated 

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11390) Query plan with/without filterPushdown indistinguishable

2015-11-05 Thread Vishesh Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991538#comment-14991538
 ] 

Vishesh Garg commented on SPARK-11390:
--

I was under the impression that this was just a plan-tree presentation issue, 
and that the filter was indeed getting pushed down via the 
*PrunedFilteredScan.buildScan()* method. However, now I'm not sure that's the 
case, because the internal plan structure also seems to suggest otherwise:

{noformat}
== Physical Plan ==
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[count#3L])
  TungstenExchange SinglePartition
    TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#6L])
      Project
        Filter (age#1 < 15)
          Scan OrcRelation[hdfs://localhost:9000/user/spec/people][age#1]
Code Generation: true
{noformat}
Am I missing something here?
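
A way to check this directly, independent of how the plan tree is printed, is a 
data source relation that logs whatever filters the planner hands to buildScan. 
A minimal sketch against the 1.5.x data source API; the relation, its schema, 
and the column names are made up purely for illustration:

{noformat}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

// Hypothetical relation whose only job is to print which columns and filters
// the planner actually pushes into buildScan.
class LoggingRelation(override val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType =
    StructType(Seq(StructField("name", StringType), StructField("age", IntegerType)))

  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    // If pushdown works, the age predicate appears in "pushed".
    println(s"required=${requiredColumns.mkString(",")} pushed=${filters.mkString(",")}")
    sqlContext.sparkContext.emptyRDD[Row]
  }
}
{noformat}

If the age predicate shows up in the printed filters, pushdown is happening even 
though explain() still renders a separate Filter node.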

> Query plan with/without filterPushdown indistinguishable
> 
>
> Key: SPARK-11390
> URL: https://issues.apache.org/jira/browse/SPARK-11390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: All
>Reporter: Vishesh Garg
>Priority: Minor
>
> The execution plan of a query remains the same regardless of whether the 
> filterPushdown flag has been set to "true" or "false", as can be seen below: 
> ==
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> sqlContext.sql("SELECT name FROM people WHERE age = 15").explain()
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 = 15)
>   Scan OrcRelation[hdfs://localhost:9000/user/spec/people][name#6,age#7]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> sqlContext.sql("SELECT name FROM people WHERE age = 15").explain()
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 = 15)
>   Scan OrcRelation[hdfs://localhost:9000/user/spec/people][name#6,age#7]
> ==
> Ideally, when the filterPushdown flag is set to "true", both the scan and the 
> filter nodes should be merged together to make it clear that the filtering is 
> being done by the data source itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Attachment: Selection_007.png

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; bq. val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load()



*However*, when I tried to use the same way for querying *nycesri*, I have got 
*unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code; bq.  val jdbcDF = 
sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load()



I have tried to use PostGIS_JDBC.jar, but it did not work.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
> extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; bq. val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load()
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code; bq.  val jdbcDF = 
> sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load()
> I have tried to use PostGIS_JDBC.jar, but it did not work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
the connection works fine with a normal *PostgresSql* Database, it throws 
UnSupported Type Exception when I try to query a Database with _PostGIS_ 
extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports geometry Types and  _spatial_ queries, 
using _postgis_ extension. 


When I tried to use the  
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]from
 SparkShell as mentioned in SparkSql docs, I had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports geometry Types and  _spatial_ queries, 
using _postgis_ extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]from
>  SparkShell as mentioned in SparkSql docs, I had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}

[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
the connection works fine with a normal *PostgresSql* Database, it throws 
UnSupported Type Exception when I try to query a Database with _PostGIS_ 
extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports geometry Types and  _spatial_ queries, 
using _postgis_ extension. 


When I tried to use the  
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 from SparkShell as mentioned in SparkSql docs, I had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
the connection works fine with a normal *PostgresSql* Database, it throws 
UnSupported Type Exception when I try to query a Database with _PostGIS_ 
extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports geometry Types and  _spatial_ queries, 
using _postgis_ extension. 


When I tried to use the  
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]from
 SparkShell as mentioned in SparkSql docs, I had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() 

[jira] [Updated] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread mustafa elbehery (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mustafa elbehery updated SPARK-11526:
-
Description: 
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports geometry Types and  _spatial_ queries, 
using _postgis_ extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.

  was:
I have tried to use SparkSQL 
[JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
 to connect to *PostGIS* Database. Although the connection works fine with a 
normal *PostgresSql* Database, it throws UnSupported Type Exception when I try 
to query a Database with _PostGIS_ extension.

To Further Explain, I have two Databases in my Postgresql, as follows :-

1- *postgres* : a normal database which supports only primitive types.
2- *nycesri* : a database which supports _spatial_ queries, using _postgis_ 
extension. 


When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
had the following results :- 

1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
could query it using the following code; 

{color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:postgres",
  "dbtable" -> "affiliations")).load() {color}



2- *However*, when I tried to use the same way for querying *nycesri*, I have 
got *unsupported Type * exception, probably because the _postGis_ extension 
unsupported. Following is the used code;

 {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgis:nycesri",
  "dbtable" -> "ny_counties_clip")).load() {color}



I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
screenshot of the exception.


> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL 
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  to connect to *PostGIS* Database. Although the connection works fine with a 
> normal *PostgresSql* Database, it throws UnSupported Type Exception when I 
> try to query a Database with _PostGIS_ extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the JDBC from SparkShell as mentioned in SparkSql docs, I 
> had the following results :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use 

[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991472#comment-14991472
 ] 

Sean Owen commented on SPARK-11526:
---

[~elbehery] you still haven't addressed the key question: why is this a Spark 
issue? The error indicates the database is returning a type "OTHER" which is 
not something Spark can support, since it's an unknown type. This is a problem 
with PostGIS and its relation to standard SQL types. I'm going to re-close 
this.

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11526) JDBC to PostGIS throws UnSupported Type exception

2015-11-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991489#comment-14991489
 ] 

Sean Owen commented on SPARK-11526:
---

No, have a look at your stack trace and the code. Spark explicitly doesn't 
support the JDBC type "OTHER", since it can't: it's not a particular type.
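
If reading such a table is still needed, one possible route (not a fix in Spark 
itself) is a custom JdbcDialect that maps the vendor-specific type to something 
Spark can represent. A rough, untested sketch against the 1.5 jdbc dialect API; 
the typeName check and the choice of BinaryType are assumptions, not a 
PostGIS-endorsed mapping:

{noformat}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

// Hypothetical dialect: treat PostGIS "geometry" columns (reported to JDBC as
// java.sql.Types.OTHER) as opaque binary so the rest of the row can be loaded.
object PostGisDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:postgresql") || url.startsWith("jdbc:postgis")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    if (sqlType == Types.OTHER && typeName.equalsIgnoreCase("geometry")) Some(BinaryType)
    else None
  }
}

JdbcDialects.registerDialect(PostGisDialect)
{noformat}

Whether an opaque binary column is actually useful for spatial data is a separate 
question; this only avoids the unsupported-type failure at schema resolution.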

> JDBC to PostGIS throws UnSupported Type exception
> -
>
> Key: SPARK-11526
> URL: https://issues.apache.org/jira/browse/SPARK-11526
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: Linux Based
>Reporter: mustafa elbehery
>Priority: Critical
>  Labels: easyfix
> Attachments: Selection_007.png
>
>
> I have tried to use SparkSQL JDBC  to connect to *PostGIS* Database. Although 
> the connection works fine with a normal *PostgresSql* Database, it throws 
> UnSupported Type Exception when I try to query a Database with _PostGIS_ 
> extension.
> To Further Explain, I have two Databases in my Postgresql, as follows :-
> 1- *postgres* : a normal database which supports only primitive types.
> 2- *nycesri* : a database which supports geometry Types and  _spatial_ 
> queries, using _postgis_ extension. 
> When I tried to use the  
> [JDBC|http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases]
>  from SparkShell as mentioned in SparkSql docs, I had the following results 
> :- 
> 1- with *postgres*, I have retrieved the tables in the DataFrame object, and 
> could query it using the following code; 
> {color:red} val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:postgres",
>   "dbtable" -> "affiliations")).load() {color}
> 2- *However*, when I tried to use the same way for querying *nycesri*, I have 
> got *unsupported Type * exception, probably because the _postGis_ 
> extension unsupported. Following is the used code;
>  {color:red}  val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgis:nycesri",
>   "dbtable" -> "ny_counties_clip")).load() {color}
> I have tried to use PostGIS_JDBC.jar, but it did not work.  I have attached a 
> screenshot of the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8181) date/time function: hour

2015-11-05 Thread Chip Sands (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991957#comment-14991957
 ] 

Chip Sands commented on SPARK-8181:
---

Using the Spark Thrift Server, hour('1961-08-30 06:06:10') is returning -17 for 
the hour in 1.5 and 1.5.1.
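
For reference, a minimal reproduction sketch from the shell (assuming the usual 
sqlContext); the expected value is 6:

{noformat}
// Expected output in both cases: 6
sqlContext.sql("SELECT hour('1961-08-30 06:06:10')").show()
sqlContext.sql("SELECT hour(cast('1961-08-30 06:06:10' as timestamp))").show()
{noformat}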

> date/time function: hour
> 
>
> Key: SPARK-8181
> URL: https://issues.apache.org/jira/browse/SPARK-8181
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Tarek Auel
> Fix For: 1.5.0
>
>
> hour(string|date|timestamp): int
> Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, 
> hour('12:58:59') = 12.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11474) Options to jdbc load are lower cased

2015-11-05 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-11474.
-
   Resolution: Fixed
 Assignee: Huaxin Gao
Fix Version/s: 1.6.0
   1.5.3

> Options to jdbc load are lower cased
> 
>
> Key: SPARK-11474
> URL: https://issues.apache.org/jira/browse/SPARK-11474
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.5.1
> Environment: Linux & Mac
>Reporter: Stephen Samuel
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 1.5.3, 1.6.0
>
>
> We recently upgraded from spark 1.3.0 to 1.5.1 and one of the features we 
> wanted to take advantage of was the fetchSize added to the jdbc data frame 
> reader.
> In 1.5.1 there appears to be a bug or regression, whereby an options map has 
> its keys lowercased. This means the existing properties from prior to 1.4 are 
> ok, such as dbtable, url and driver, but the newer fetchSize gets converted 
> to fetchsize.
> To re-produce:
> val conf = new SparkConf(true).setMaster("local").setAppName("fetchtest")
> val sc = new SparkContext(conf)
> val sql = new SQLContext(sc)
> val options = Map("url" -> , "driver" -> , "fetchSize" -> )
> val df = sql.load("jdbc", options)
> Breakpoint at line 371 in JDBCRDD and you'll see the options are all 
> lowercased, so:
> val fetchSize = properties.getProperty("fetchSize", "0").toInt
> results in 0
> Now I know sql.load is deprecated, but this might be occurring on other 
> methods too. The workaround is to use the java.util.Properties overload, 
> which keeps the case sensitive keys.
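
Until the fix is picked up, a sketch of the workaround mentioned above: the 
java.util.Properties overload preserves key case, so fetchSize is not 
lowercased. The URL, driver, table name, and fetch size below are placeholders:

{noformat}
import java.util.Properties

val props = new Properties()
props.setProperty("driver", "org.postgresql.Driver")  // placeholder driver
props.setProperty("fetchSize", "1000")                // key case is preserved
val df = sqlContext.read.jdbc("jdbc:postgresql:mydb", "mytable", props)
{noformat}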



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version

2015-11-05 Thread Jeff Mink (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991989#comment-14991989
 ] 

Jeff Mink edited comment on SPARK-11021 at 11/5/15 5:06 PM:


We came across a seemingly similar issue with using older versions of Hive 
through Spark (It's hard to tell, because I don't see the query that caused the 
error above).

We are running Spark 1.5.1 with Hive 1.0. Any time we ran an INSERT OVERWRITE 
or CREATE TABLE AS SELECT through Spark's SQL context, we would see the 
following:

{noformat}
15/11/05 09:51:47 INFO output.FileOutputCommitter: Saved output of task 
'attempt_201511050951__m_00_0' to 
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/-ext-1/_temporary/0/task_201511050951__m_00
...
15/11/05 09:51:47 INFO common.FileUtils: deleting  
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1
{noformat}

The reason it was doing this is because there is a new setting in later 
versions of Hive (I think 1.2?) 'hive.exec.stagingdir' that, by default, is set 
to '.hive-staging'. This causes the staging data to be written to a 
subdirectory of the table that we are overwriting, which means that when our 
process gets to the OVERWRITE stage, it deletes the staging folder along with 
everything else in the table's location.

The fix for this was to edit our '/opt/spark/hive-site.xml' and add the 
following entry (of course, you can set this to whatever works for you):

{noformat}
  
hive.exec.stagingdir
/tmp/hive/spark-${user.name}
  
{noformat}
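
The same property can presumably also be set per session rather than in 
hive-site.xml; a hedged sketch, assuming a HiveContext named sqlContext and that 
Hive honors the value when set this way (table names are placeholders):

{noformat}
// Assumption: hive.exec.stagingdir is read from the session conf, so setting it
// on the HiveContext keeps staging files out of the target table's directory.
sqlContext.setConf("hive.exec.stagingdir", "/tmp/hive/spark-staging")
sqlContext.sql("INSERT OVERWRITE TABLE target_table SELECT * FROM source_table")
{noformat}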


was (Author: jxmink):
We came across a seemingly similar issue with using older versions of Hive 
through Spark (It's hard to tell, because I don't see the query that caused the 
error above).

We are running Spark 1.5.1 with Hive 1.0. Any time we ran an INSERT OVERWRITE 
or CREATE TABLE AS SELECT through Spark's SQL context, we would see the 
following:
```
15/11/05 09:51:47 INFO output.FileOutputCommitter: Saved output of task 
'attempt_201511050951__m_00_0' to 
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/-ext-1/_temporary/0/task_201511050951__m_00
...
15/11/05 09:51:47 INFO common.FileUtils: deleting  
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1
```

The reason it was doing this is because there is a new setting in later 
versions of Hive (I think 1.2?) `hive.exec.stagingdir` that, by default, is set 
to `.hive-staging`. This causes the staging data to be written to a 
subdirectory of the table that we are overwriting, which means that when our 
process gets to the OVERWRITE stage, it deletes the staging folder along with 
everything else in the table's location.

The fix for this was to edit our `/opt/spark/hive-site.xml` and add the 
following entry (of course, you can set this to whatever works for you):
```
  
hive.exec.stagingdir
/tmp/hive/spark-${user.name}
  
```

> SparkSQL cli throws exception when using with Hive 0.12 metastore in 
> spark-1.5.0 version
> 
>
> Key: SPARK-11021
> URL: https://issues.apache.org/jira/browse/SPARK-11021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: iward
>
> After upgrading Spark from 1.4.1 to 1.5.0, I get the following exception when 
> I set the following properties in spark-defaults.conf:
> {noformat}
> spark.sql.hive.metastore.version=0.12.0
> spark.sql.hive.metastore.jars=hive 0.12 jars and hadoop jars
> {noformat}
> when I run a task,it got following exception:
> {noformat}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:249)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>   at 
> 

[jira] [Updated] (SPARK-11473) R-like summary statistics with intercept for OLS via normal equation solver

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-11473:
--
Assignee: Yanbo Liang

> R-like summary statistics with intercept for OLS via normal equation solver
> ---
>
> Key: SPARK-11473
> URL: https://issues.apache.org/jira/browse/SPARK-11473
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
> Fix For: 1.6.0
>
>
> SPARK-9836 has provided R-like summary statistics for coefficients; we should 
> also add these statistics for the intercept.
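
For context, a small usage sketch of where these statistics surface, assuming 
the 1.6 normal-equation solver; the training DataFrame and its columns are 
placeholders:

{noformat}
import org.apache.spark.ml.regression.LinearRegression

// With solver "normal", the training summary carries R-like statistics;
// this change extends them to cover the intercept term as well.
val lr = new LinearRegression().setSolver("normal")
val model = lr.fit(training)                      // "training" is a placeholder
val summary = model.summary
println(summary.coefficientStandardErrors.mkString(", "))
println(summary.tValues.mkString(", "))
println(summary.pValues.mkString(", "))
{noformat}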



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11473) R-like summary statistics with intercept for OLS via normal equation solver

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-11473.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 9485
[https://github.com/apache/spark/pull/9485]

> R-like summary statistics with intercept for OLS via normal equation solver
> ---
>
> Key: SPARK-11473
> URL: https://issues.apache.org/jira/browse/SPARK-11473
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
> Fix For: 1.6.0
>
>
> SPARK-9836 has provided R-like summary statistics for coefficients; we should 
> also add these statistics for the intercept.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version

2015-11-05 Thread Jeff Mink (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991989#comment-14991989
 ] 

Jeff Mink commented on SPARK-11021:
---

We came across a seemingly similar issue with using older versions of Hive 
through Spark (It's hard to tell, because I don't see the query that caused the 
error above).

We are running Spark 1.5.1 with Hive 1.0. Any time we ran an INSERT OVERWRITE 
or CREATE TABLE AS SELECT through Spark's SQL context, we would see the 
following:
```
15/11/05 09:51:47 INFO output.FileOutputCommitter: Saved output of task 
'attempt_201511050951__m_00_0' to 
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/-ext-1/_temporary/0/task_201511050951__m_00
...
15/11/05 09:51:47 INFO common.FileUtils: deleting  
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1
```

The reason it was doing this is because there is a new setting in later 
versions of Hive (I think 1.2?) `hive.exec.stagingdir` that, by default, is set 
to `.hive-staging`. This causes the staging data to be written to a 
subdirectory of the table that we are overwriting, which means that when our 
process gets to the OVERWRITE stage, it deletes the staging folder along with 
everything else in the table's location.

The fix for this was to edit our `/opt/spark/hive-site.xml` and add the 
following entry (of course, you can set this to whatever works for you):
```
  
hive.exec.stagingdir
/tmp/hive/spark-${user.name}
  
```

> SparkSQL cli throws exception when using with Hive 0.12 metastore in 
> spark-1.5.0 version
> 
>
> Key: SPARK-11021
> URL: https://issues.apache.org/jira/browse/SPARK-11021
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: iward
>
> After upgrading Spark from 1.4.1 to 1.5.0, I get the following exception when 
> I set the following properties in spark-defaults.conf:
> {noformat}
> spark.sql.hive.metastore.version=0.12.0
> spark.sql.hive.metastore.jars=hive 0.12 jars and hadoop jars
> {noformat}
> when I run a task,it got following exception:
> {noformat}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:249)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
>   at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:927)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:927)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:144)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:129)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:61)
>   at 
> 

[jira] [Resolved] (SPARK-11501) spark.rpc config not propagated to executors

2015-11-05 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-11501.

   Resolution: Fixed
 Assignee: Nishkam Ravi
Fix Version/s: 1.6.0

> spark.rpc config not propagated to executors 
> -
>
> Key: SPARK-11501
> URL: https://issues.apache.org/jira/browse/SPARK-11501
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.5.2, 1.6.0
>Reporter: Nishkam Ravi
>Assignee: Nishkam Ravi
> Fix For: 1.6.0
>
>
> spark.rpc conf doesn't get propagated to executors because RpcEnv.create is 
> done before properties are fetched from the driver. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11473) R-like summary statistics with intercept for OLS via normal equation solver

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-11473:
--
Target Version/s: 1.6.0

> R-like summary statistics with intercept for OLS via normal equation solver
> ---
>
> Key: SPARK-11473
> URL: https://issues.apache.org/jira/browse/SPARK-11473
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yanbo Liang
> Fix For: 1.6.0
>
>
> SPARK-9836 has provided R-like summary statistics for coefficients; we should 
> also add these statistics for the intercept.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7425) spark.ml Predictor should support other numeric types for label

2015-11-05 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991863#comment-14991863
 ] 

Glenn Weidner commented on SPARK-7425:
--

No changes were proposed to SchemaUtils.checkColumnType.

> spark.ml Predictor should support other numeric types for label
> ---
>
> Key: SPARK-7425
> URL: https://issues.apache.org/jira/browse/SPARK-7425
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>  Labels: starter
>
> Currently, the Predictor abstraction expects the input labelCol type to be 
> DoubleType, but we should support other numeric types.  This will involve 
> updating the PredictorParams.validateAndTransformSchema method.
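
A hedged sketch of the kind of change implied (helper names such as checkNumericType are illustrative, not the actual patch): accept any NumericType for the label column and cast it to DoubleType before training.

  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.types.{DoubleType, NumericType, StructType}

  // Illustrative check: require a numeric label column rather than strictly DoubleType.
  def checkNumericType(schema: StructType, colName: String): Unit = {
    val actual = schema(colName).dataType
    require(actual.isInstanceOf[NumericType],
      s"Column $colName must be of numeric type but was actually $actual.")
  }

  // Illustrative cast of the label column to DoubleType before fitting.
  def castLabelToDouble(dataset: DataFrame, labelCol: String): DataFrame =
    dataset.withColumn(labelCol, dataset(labelCol).cast(DoubleType))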



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991953#comment-14991953
 ] 

Apache Spark commented on SPARK-10648:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/9498

> Spark-SQL JDBC fails to set a default precision and scale when they are not 
> defined in an oracle schema.
> 
>
> Key: SPARK-10648
> URL: https://issues.apache.org/jira/browse/SPARK-10648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: using oracle 11g, ojdbc7.jar
>Reporter: Travis Hegner
>
> Using Oracle 11g as a data source with ojdbc7.jar. When importing data into a 
> Scala app, I am getting an exception "Overflowed precision". Sometimes I 
> would get the exception "Unscaled value too large for precision".
> This issue likely affects older versions as well, but this was the version I 
> verified it on.
> I narrowed it down to the fact that the schema detection system was trying to 
> set the precision to 0, and the scale to -127.
> I have a proposed pull request to follow.
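
Until the fix lands, a possible workaround sketch (this is not the proposed pull request): a custom JdbcDialect that maps Oracle's unconstrained NUMBER (reported as precision 0, scale -127) to a concrete DecimalType; the 38/10 default below is an assumption.

  import java.sql.Types
  import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
  import org.apache.spark.sql.types._

  object OracleNumberDialect extends JdbcDialect {
    override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

    override def getCatalystType(
        sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
      // Oracle reports an unconstrained NUMBER with precision 0 and scale -127.
      if (sqlType == Types.NUMERIC && size == 0) Some(DecimalType(38, 10)) else None
    }
  }

  JdbcDialects.registerDialect(OracleNumberDialect)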



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-11527.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 9492
[https://github.com/apache/spark/pull/9492]

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Priority: Minor
> Fix For: 1.6.0
>
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
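
For reference, a hedged Scala sketch of the members already exposed on the Scala side (1.6 ML API) that the Python wrapper is expected to mirror; 'training' is an assumed DataFrame with label, censor, and features columns:

  import org.apache.spark.ml.regression.AFTSurvivalRegression

  val aft = new AFTSurvivalRegression()
    .setLabelCol("label")
    .setCensorCol("censor")
  val model = aft.fit(training)  // 'training' is an assumed existing DataFrame

  println(model.coefficients)    // Vector of regression coefficients
  println(model.intercept)       // Double
  println(model.scale)           // Double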



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-11527:
--
Assignee: Yanbo Liang

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 1.6.0
>
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11527) PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale

2015-11-05 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-11527:
--
Target Version/s: 1.6.0

> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale
> -
>
> Key: SPARK-11527
> URL: https://issues.apache.org/jira/browse/SPARK-11527
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Yanbo Liang
>Assignee: Yanbo Liang
>Priority: Minor
> Fix For: 1.6.0
>
>
> PySpark AFTSurvivalRegressionModel should expose coefficients/intercept/scale



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8144) For PySpark SQL, automatically convert values provided in readwriter options to string

2015-11-05 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992180#comment-14992180
 ] 

Joseph K. Bradley commented on SPARK-8144:
--

I'm really OK with closing this; I don't think it's that important (maybe 
better to improve docs).

> For PySpark SQL, automatically convert values provided in readwriter options 
> to string
> --
>
> Key: SPARK-8144
> URL: https://issues.apache.org/jira/browse/SPARK-8144
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 1.4.0
>Reporter: Joseph K. Bradley
>
> Because of typos in lines 81 and 240 of:
> [https://github.com/apache/spark/blob/16fc49617e1dfcbe9122b224f7f63b7bfddb36ce/python/pyspark/sql/readwriter.py]
> (Search for "option(")
> CC: [~yhuai] [~davies]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11447) Null comparison requires type information but type extraction fails for complex types

2015-11-05 Thread kevin yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992271#comment-14992271
 ] 

kevin yu commented on SPARK-11447:
--

Hello Kapil: Thanks a lot. I am looking into it now. Kevin

> Null comparison requires type information but type extraction fails for 
> complex types
> -
>
> Key: SPARK-11447
> URL: https://issues.apache.org/jira/browse/SPARK-11447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Kapil Singh
>
> While comparing a Column to a null literal, the comparison works only if the 
> type of the null literal matches the type of the Column it is being compared 
> to. Example Scala code (can be run from spark-shell):
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.catalyst.expressions._
> val inputRowsData = Seq(Seq("abc"),Seq(null),Seq("xyz"))
> val inputRows = for(seq <- inputRowsData) yield Row.fromSeq(seq)
> val dfSchema = StructType(Seq(StructField("column", StringType, true)))
> val df = sqlContext.createDataFrame(sc.makeRDD(inputRows), dfSchema)
> //DOESN'T WORK
> val filteredDF = df.filter(df("column") <=> (new Column(Literal(null))))
> //WORKS
> val filteredDF = df.filter(df("column") <=> (new Column(Literal.create(null, 
> SparkleFunctions.dataType(df("column"))))))
> Why should type information be required for a null comparison? If it's 
> required, it's not always possible to extract type information from complex 
> types (e.g. StructType). The following Scala code (can be run from spark-shell) 
> throws org.apache.spark.sql.catalyst.analysis.UnresolvedException:
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.catalyst.expressions._
> val inputRowsData = Seq(Seq(Row.fromSeq(Seq("abc", 
> "def"))),Seq(Row.fromSeq(Seq(null, "123"))),Seq(Row.fromSeq(Seq("ghi", 
> "jkl"))))
> val inputRows = for(seq <- inputRowsData) yield Row.fromSeq(seq)
> val dfSchema = StructType(Seq(StructField("column", 
> StructType(Seq(StructField("p1", StringType, true), StructField("p2", 
> StringType, true))), true)))
> val df = sqlContext.createDataFrame(sc.makeRDD(inputRows), dfSchema)
> val filteredDF = df.filter(df("column")("p1") <=> (new 
> Column(Literal.create(null, SparkleFunctions.dataType(df("column")("p1"))))))
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> dataType on unresolved object, tree: column#0[p1]
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedExtractValue.dataType(unresolved.scala:243)
>   at 
> org.apache.spark.sql.ArithmeticFunctions$class.dataType(ArithmeticFunctions.scala:76)
>   at 
> org.apache.spark.sql.SparkleFunctions$.dataType(SparkleFunctions.scala:14)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
>   at $iwC$$iwC$$iwC.<init>(<console>:63)
>   at $iwC$$iwC.<init>(<console>:65)
>   at $iwC.<init>(<console>:67)
>   at <init>(<console>:69)
>   at .<init>(<console>:73)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at 
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>   at 
> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>   at 
> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>   at 
> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>   at 
> 
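
A possible workaround sketch (not from the report): build the typed null literal from the DataFrame schema, or compare against a cast null column, instead of calling dataType on the unresolved nested column. 'df' here is the DataFrame from the examples above.

  import org.apache.spark.sql.functions.lit
  import org.apache.spark.sql.types.{StringType, StructType}

  // Flat column: compare against a null literal cast to the column's type.
  val flatFiltered = df.filter(df("column") <=> lit(null).cast(StringType))

  // Nested field: look the type up in the schema rather than on the unresolved expression.
  val p1Type = df.schema("column").dataType.asInstanceOf[StructType]("p1").dataType
  val nestedFiltered = df.filter(df("column")("p1") <=> lit(null).cast(p1Type))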

[jira] [Assigned] (SPARK-11532) Remove implicit conversion from Expression to Column

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11532:


Assignee: Reynold Xin  (was: Apache Spark)

> Remove implicit conversion from Expression to Column
> 
>
> Key: SPARK-11532
> URL: https://issues.apache.org/jira/browse/SPARK-11532
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11532) Remove implicit conversion from Expression to Column

2015-11-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992277#comment-14992277
 ] 

Apache Spark commented on SPARK-11532:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/9500

> Remove implicit conversion from Expression to Column
> 
>
> Key: SPARK-11532
> URL: https://issues.apache.org/jira/browse/SPARK-11532
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11532) Remove implicit conversion from Expression to Column

2015-11-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-11532:


Assignee: Apache Spark  (was: Reynold Xin)

> Remove implicit conversion from Expression to Column
> 
>
> Key: SPARK-11532
> URL: https://issues.apache.org/jira/browse/SPARK-11532
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11535) StringIndexer should handle empty String specially

2015-11-05 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-11535:
-

 Summary: StringIndexer should handle empty String specially
 Key: SPARK-11535
 URL: https://issues.apache.org/jira/browse/SPARK-11535
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Priority: Minor


StringIndexer will treat an empty string like any other string and index it 
properly.  However, the feature attribute name will be set to the empty string, 
which causes a failure in OneHotEncoder.  We should handle it specially by 
calling it something like "(empty_string)" (and maybe append an integer if that 
string already exists).

See [https://issues.apache.org/jira/browse/SPARK-10513] for a description of 
the problem.
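
As a stopgap, a hedged sketch of a pre-processing workaround (the "category" column, the 'df' DataFrame, and the placeholder token are illustrative): map the empty string to a placeholder before StringIndexer so the resulting attribute name is non-empty and OneHotEncoder does not fail.

  import org.apache.spark.sql.functions.{col, when}

  // Replace "" with a placeholder token before indexing.
  val cleaned = df.withColumn("category",
    when(col("category") === "", "(empty_string)").otherwise(col("category")))
  // Then run StringIndexer / OneHotEncoder on the "category" column of 'cleaned'.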



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


