[jira] [Commented] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
[ https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259247#comment-15259247 ]

John Ferguson commented on SPARK-14919:
---------------------------------------

So basically, no matter what we - Spark, us, the world - as a community ask of them regarding this behavior, they don't listen? Is there an alternative? I mean, I have my POM-based workaround, but to be honest it feels filthy.

> Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14919
>                 URL: https://issues.apache.org/jira/browse/SPARK-14919
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.6.1
>         Environment: Linux, OSX
>            Reporter: John Ferguson
>
> When using Spark 1.4.x or Spark 1.6.1 in an application that has a front end
> requiring jackson-databind 2.6+, we see the following exceptions:
>
> com.fasterxml.jackson.databind.JsonMappingException: Could not find creator
> property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
>  at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
>
> [full stack trace in the issue creation notice below]
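[Editorial aside, not proposed in this thread: one alternative sometimes used for exactly this kind of Jackson clash is to relocate ("shade") the newer Jackson inside the application consuming Spark, so the front end and Spark each see their own copy. A minimal maven-shade-plugin sketch; the shadedPattern package name is hypothetical:]

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <!-- Rewrite the app's (and its front end's) Jackson references to a
                 private package, leaving Spark's bundled Jackson 2.4.x untouched. -->
            <relocation>
              <pattern>com.fasterxml.jackson</pattern>
              <shadedPattern>myapp.shaded.com.fasterxml.jackson</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>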
[jira] [Resolved] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
[ https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Ferguson resolved SPARK-14919.
-----------------------------------
    Resolution: Not A Problem

Although it is not optimal, by declaring the required Jackson dependencies - specifically the Scala module - directly in the POM of the application consuming Spark, we can force them up to date. This was not immediately obvious without digging into a lot of other documentation, such as: https://github.com/FasterXML/jackson-module-scala/issues/177

However, given that Jackson has been moving forward without much care for how changes impact legacy code, this may be an issue that returns.

> Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14919
>                 URL: https://issues.apache.org/jira/browse/SPARK-14919
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.6.1
>         Environment: Linux, OSX
>            Reporter: John Ferguson
>
> [full stack trace in the issue creation notice below]
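[A minimal sketch of the POM override described in the resolution above. The version numbers are illustrative assumptions - any consistent 2.6+ set should behave the same - and the Scala module's artifact id must match the application's Scala binary version (2.10 shown here, the default for Spark 1.6.x):]

  <dependencyManagement>
    <dependencies>
      <!-- Pin all Jackson artifacts to one 2.6+ version so Spark's transitive
           2.4.x copies cannot win Maven's dependency mediation. -->
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>2.6.5</version>
      </dependency>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.6.5</version>
      </dependency>
      <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>2.6.5</version>
      </dependency>
      <!-- The piece that is easy to miss, per jackson-module-scala issue 177. -->
      <dependency>
        <groupId>com.fasterxml.jackson.module</groupId>
        <artifactId>jackson-module-scala_2.10</artifactId>
        <version>2.6.5</version>
      </dependency>
    </dependencies>
  </dependencyManagement>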
[jira] [Closed] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
[ https://issues.apache.org/jira/browse/SPARK-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Ferguson closed SPARK-14919.
---------------------------------

See resolution.

> Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14919
>                 URL: https://issues.apache.org/jira/browse/SPARK-14919
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.6.1
>         Environment: Linux, OSX
>            Reporter: John Ferguson
>
> [full stack trace in the issue creation notice below]
[jira] [Created] (SPARK-14919) Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
John Ferguson created SPARK-14919:
----------------------------------

             Summary: Spark Cannot be used with software that requires jackson-databind 2.6+: RDDOperationScope
                 Key: SPARK-14919
                 URL: https://issues.apache.org/jira/browse/SPARK-14919
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 1.6.1
         Environment: Linux, OSX
            Reporter: John Ferguson


When using Spark 1.4.x or Spark 1.6.1 in an application that has a front end requiring jackson-databind 2.6+, we see the following exceptions:

Subset of stack trace:
======================
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
	at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
	at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
	at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:405)
	at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:354)
	at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:262)
	at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:242)
	at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
	at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
	at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3664)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3556)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2576)
	at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
	at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
	at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
	at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
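[For context: as the trace shows, sc.textFile goes through nested RDDOperationScope.withScope calls, and the inner call deserializes the outer scope's JSON via Jackson, which is where the mismatch surfaces. A minimal reproduction sketch, assuming Spark 1.6.x on a classpath where jackson-databind has been forced to 2.6+ while the bundled jackson-module-scala remains at 2.4.x; the input path is arbitrary:]

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: with the mismatched Jackson versions described above, the
  // first scoped RDD-creating call should throw the JsonMappingException
  // from the stack trace, before any data is actually read.
  object JacksonClashRepro {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setMaster("local[1]").setAppName("jackson-clash-repro")
      val sc = new SparkContext(conf)
      try {
        // textFile -> withScope -> hadoopFile -> withScope -> RDDOperationScope.fromJson
        sc.textFile("README.md").count()
      } finally {
        sc.stop()
      }
    }
  }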
[jira] [Created] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
John Ferguson created SPARK-6666:
---------------------------------

             Summary: org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
                 Key: SPARK-6666
                 URL: https://issues.apache.org/jira/browse/SPARK-6666
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0
         Environment: 
            Reporter: John Ferguson
            Priority: Critical


Is there a way to have JDBC DataFrames use quoted/escaped column names? Right now, it looks like it "sees" the names correctly in the schema it creates, but does not escape them in the SQL it generates when they are not compliant:

org.apache.spark.sql.jdbc.JDBCRDD

  private val columnList: String = {
    val sb = new StringBuilder()
    columns.foreach(x => sb.append(",").append(x))
    if (sb.length == 0) "1" else sb.substring(1)
  }

If you see value in this, I would take a shot at adding the quoting (escaping) of column names here (see the sketch after this message). If you don't do it, some drivers - like postgresql's - will simply lower-case all names when parsing the query. As you can see in the TL;DR below, that means they won't match the schema I am given.

TL;DR: I am able to connect to a Postgres database in the shell (with driver referenced):

  val jdbcDf = sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")

In fact, when I run:

  jdbcDf.registerTempTable("sp500")
  val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")

and

  val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))

the values come back as expected. However, if I try:

  jdbcDf.show

or if I try:

  val all = sqlContext.sql("SELECT * FROM sp500")
  all.show

I get errors about column names not being found. In fact, the error includes a mention of the column names all lower-cased. For now I will change my schema to be more restrictive. Right now it is, per a Stack Overflow poster, not ANSI compliant, although everything it does is allowed via ""-quoting in pgsql, MySQL and SQLServer. BTW, our users are giving us tables like this... because various tools they already use support non-compliant names. In fact, this is mild compared to what we've had to support. Currently the schema in question uses mixed-case, quoted names with special characters and spaces:

  CREATE TABLE sp500
  (
    "Symbol" text,
    "Name" text,
    "Sector" text,
    "Price" double precision,
    "Dividend Yield" double precision,
    "Price/Earnings" double precision,
    "Earnings/Share" double precision,
    "Book Value" double precision,
    "52 week low" double precision,
    "52 week high" double precision,
    "Market Cap" double precision,
    "EBITDA" double precision,
    "Price/Sales" double precision,
    "Price/Book" double precision,
    "SEC Filings" text
  )
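[A minimal sketch of the quoting change suggested above, as a drop-in replacement for the columnList shown. It assumes ANSI double-quote identifiers; a real fix would need each database dialect's own quote character (backticks for MySQL, brackets for SQL Server):]

  // Sketch only: quote each column name with ANSI double quotes, escaping any
  // embedded double quote by doubling it, so mixed-case and special-character
  // names survive the round trip to the database.
  private val columnList: String = {
    def quoted(name: String): String = "\"" + name.replace("\"", "\"\"") + "\""
    if (columns.isEmpty) "1" else columns.map(quoted).mkString(",")
  }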