[jira] [Commented] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope

2016-04-26 Thread John Ferguson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259247#comment-15259247
 ] 

John Ferguson commented on SPARK-14919:
---

So basically, no matter what we - Spark, us, the world - as a community ask of 
them regarding this behavior, they don't listen?  Is there an alternative? I 
have my POM-based workaround, but to be honest it feels filthy.

> Spark Cannot be used with software that requires jackson-databind 2.6+: 
> RDDOperationScope
> -
>
> Key: SPARK-14919
> URL: https://issues.apache.org/jira/browse/SPARK-14919
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.6.1
> Environment: Linux, OSX
>Reporter: John Ferguson
>






[jira] [Resolved] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope

2016-04-26 Thread John Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Ferguson resolved SPARK-14919.
---
Resolution: Not A Problem

Although it is not optimal, by declaring dependencies in the POM of the 
application consuming Spark, we can force the required Jackson dependencies, 
specifically the Scala module ones, to be up to date.  This was not immediately 
obvious without digging into a lot of other documentation, such as: 
https://github.com/FasterXML/jackson-module-scala/issues/177  

However, given that Jackson has been moving forward without much care for how 
changes impact legacy code, this issue may return.
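
For reference, a minimal sketch of that POM workaround. It assumes Maven's 
dependencyManagement mechanism; the 2.6.5 version and the _2.10 Scala suffix 
are illustrative and should match your own build:

<dependencyManagement>
  <dependencies>
    <!-- Illustrative versions: pin the whole Jackson family to one 2.6.x
         release so Spark's transitive 2.4.x artifacts cannot win resolution -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.6.5</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_2.10</artifactId>
      <version>2.6.5</version>
    </dependency>
  </dependencies>
</dependencyManagement>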

> Spark Cannot be used with software that requires jackson-databind 2.6+: 
> RDDOperationScope
> -
>
> Key: SPARK-14919
> URL: https://issues.apache.org/jira/browse/SPARK-14919
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.6.1
> Environment: Linux, OSX
>Reporter: John Ferguson
>






[jira] [Closed] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope

2016-04-26 Thread John Ferguson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Ferguson closed SPARK-14919.
-

See resolution.

> Spark Cannot be used with software that requires jackson-databind 2.6+: 
> RDDOperationScope
> -
>
> Key: SPARK-14919
> URL: https://issues.apache.org/jira/browse/SPARK-14919
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.6.1
> Environment: Linux, OSX
>Reporter: John Ferguson
>






[jira] [Created] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope

2016-04-26 Thread John Ferguson (JIRA)
John Ferguson created SPARK-14919:
-

 Summary: Spark Cannot be used with software that requires 
jackson-databind 2.6+: RDDOperationScope
 Key: SPARK-14919
 URL: https://issues.apache.org/jira/browse/SPARK-14919
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 1.6.1
 Environment: Linux, OSX
Reporter: John Ferguson


When using Spark 1.4.x or Spark 1.6.1 in an application that has a front end 
requiring jackson-databind 2.6+, we see the following exceptions:

Subset of stack trace:
==
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator 
property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
  at 
com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
  at 
com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
  at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
  at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
  at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
  at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:405)
  at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:354)
  at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:262)
  at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:242)
  at 
com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
  at 
com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
  at 
com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3664)
  at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3556)
  at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2576)
  at 
org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
  at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
  at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
  at scala.Option.map(Option.scala:145)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
  at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
  at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
  at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
  at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
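
For illustration, a minimal sketch of the failing call path, assuming a 
local-mode Spark 1.6.1 app whose classpath also carries jackson-databind 2.6+ 
(the README.md path and app name are placeholders). Any RDD-creating call that 
goes through RDDOperationScope.withScope should reproduce the trace above:

import org.apache.spark.{SparkConf, SparkContext}

object JacksonRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("jackson-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // textFile -> SparkContext.withScope -> RDDOperationScope.fromJson,
    // which throws JsonMappingException under jackson-databind 2.6+
    sc.textFile("README.md").count()
    sc.stop()
  }
}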







[jira] [Created] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names

2015-04-01 Thread John Ferguson (JIRA)
John Ferguson created SPARK-6666:


 Summary: org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote 
column names
 Key: SPARK-6666
 URL: https://issues.apache.org/jira/browse/SPARK-6666
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment:  
Reporter: John Ferguson
Priority: Critical


Is there a way to have JDBC DataFrames use quoted/escaped column names?  Right 
now, it looks like JDBCRDD "sees" the names correctly in the schema it builds, 
but it does not escape them in the SQL it generates when they are not 
compliant:

org.apache.spark.sql.jdbc.JDBCRDD:

// Joins the column names with commas exactly as given; names are never
// quoted, so non-compliant identifiers reach the database unescaped.
private val columnList: String = {
  val sb = new StringBuilder()
  columns.foreach(x => sb.append(",").append(x))
  if (sb.length == 0) "1" else sb.substring(1)
}
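
For what it's worth, a rough sketch of the quoting I have in mind, assuming 
double-quoted (ANSI/Postgres-style) identifiers; per-dialect quote characters 
and escaping of embedded quotes are left out for brevity:

// Hypothetical sketch: a column such as Earnings/Share would be emitted
// as "Earnings/Share" instead of being parsed as an expression or
// lower-cased by the driver.
private val columnList: String = {
  if (columns.isEmpty) "1"
  else columns.map(c => "\"" + c + "\"").mkString(",")
}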


If you see value in this, I would take a shot at adding the quoting (escaping) 
of column names here, along the lines of the sketch above.  If the names aren't 
quoted, some drivers, like PostgreSQL's, will simply fold all names to lower 
case when parsing the query.  As you can see in the TL;DR below, that means 
they won't match the schema I am given.

TL;DR:
 
I am able to connect to a Postgres database in the shell (with driver 
referenced):

   val jdbcDf = 
sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")

In fact when I run:

   jdbcDf.registerTempTable("sp500")
   val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI 
FROM sp500")

and

   val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))

The values come back as expected.  However, if I try:

   jdbcDf.show

Or if I try
   
   val all = sqlContext.sql("SELECT * FROM sp500")
   all.show

I get errors about column names not being found.  In fact, the error mentions 
the column names all lower-cased.  For now I will change my schema to be more 
restrictive.  Right now it is, per a Stack Overflow poster, not ANSI compliant: 
it relies on things that double-quoted identifiers allow in pgsql, MySQL and 
SQLServer.  BTW, our users are giving us tables like this because various 
tools they already use support non-compliant names.  In fact, this is mild 
compared to what we've had to support.

Currently the schema in question uses mixed case, quoted names with special 
characters and spaces:

CREATE TABLE sp500
(
"Symbol" text,
"Name" text,
"Sector" text,
"Price" double precision,
"Dividend Yield" double precision,
"Price/Earnings" double precision,
"Earnings/Share" double precision,
"Book Value" double precision,
"52 week low" double precision,
"52 week high" double precision,
"Market Cap" double precision,
"EBITDA" double precision,
"Price/Sales" double precision,
"Price/Book" double precision,
"SEC Filings" text
) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org