[GitHub] spark pull request: [SPARK-12010][SQL] Spark JDBC requires support...

CK50 Wed, 02 Dec 2015 02:27:13 -0800

Github user CK50 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10066#discussion_r46398015
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/CassandraDialect.scala ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.jdbc
    +
    +import java.sql.Types
    +
    +import org.apache.spark.sql.types._
    +
    +
    +private case object CassandraDialect extends JdbcDialect {
    +
    +  override def canHandle(url: String): Boolean =
    +    url.startsWith("jdbc:datadirect:cassandra") ||
    +    url.startsWith("jdbc:weblogic:cassandra")
    +
    +  override def getInsertStatement(table: String, rddSchema: StructType): 
String = {
    +    val sql = new StringBuilder(s"INSERT INTO $table ( ")
    +    var fieldsLeft = rddSchema.fields.length
    +    var i = 0
    +    // Build list of column names
    +    while (fieldsLeft > 0) {
    +      sql.append(rddSchema.fields(i).name)
    +      if (fieldsLeft > 1) sql.append(", ")
    --- End diff --
    
    @Sean: I am working on your suggested changes. Looks like the code can 
    be collapsed into one or two lines.
    
    Meanwhile I have found that inserting into an Oracle table having more 
    columns than columns in the dataframe results in
    
    java.sql.BatchUpdateException: ORA-00947: not enough values if there are 
    any unmapped columns.
    
    This does not matter, as long as the table matches exactly the 
    dataframe. But as soon as someone wants to insert into an existing table 
    with more columns than the dataframe has, this is a problem.
    
    So it may indeed be better to include the suggested change for other 
    technologies as well.
    
    The key question I see is: Is it okay to rely on the dataframe column 
    names matching the target table column names?
    
    If so, do you suggest changing the default behaviour to include column 
    names for all dialects?
    
    Does Spark automated tests have coverage for different databases? / 
    Would any regression be caught prior to merge?
    
    btw,
    re squashing commits: I will try, but for now I need to better 
    understand how all of this works in GitHub.
    
    On 01.12.2015 13:17, Sean Owen wrote:
    >
    > In 
    > sql/core/src/main/scala/org/apache/spark/sql/jdbc/CassandraDialect.scala 
    > <https://github.com/apache/spark/pull/10066#discussion_r46270863>:
    >
    > > +
    > > +
    > > +private case object CassandraDialect extends JdbcDialect {
    > > +
    > > +  override def canHandle(url: String): Boolean =
    > > +    url.startsWith("jdbc:datadirect:cassandra") ||
    > > +    url.startsWith("jdbc:weblogic:cassandra")
    > > +
    > > +  override def getInsertStatement(table: String, rddSchema: 
StructType): String = {
    > > +    val sql = new StringBuilder(s"INSERT INTO $table ( ")
    > > +    var fieldsLeft = rddSchema.fields.length
    > > +    var i = 0
    > > +    // Build list of column names
    > > +    while (fieldsLeft > 0) {
    > > +      sql.append(rddSchema.fields(i).name)
    > > +      if (fieldsLeft > 1) sql.append(", ")
    >
    > Nits: braces and newline for the |if|; use |a += 1| instead of |a = a 
    > + 1| in the two lines below; extra blank line above near the imports.
    >
    > You should probably also make this more idiomatic and compact. For 
    > example this |while| loop collapses to 
    > |rddSchema.fields.map(_.name).mkString(", ")|, I believe. Similarly 
    > for the final |while|. And then the entire method doesn't need to 
    > manually build it up with |StringBuilder|. This is probably a couple 
    > lines of code.
    >
    > â
    > Reply to this email directly or view it on GitHub 
    > <https://github.com/apache/spark/pull/10066/files#r46270863>.
    >
    
    
    -- 
    Oracle <http://www.oracle.com>
    Christian Kurz | Consulting Member of Technical Staff
    Phone: +49 228 30899431 <tel:+49%20228%2030899431> | Mobile: +49 170 
    2964124 <tel:+49%20170%202964124>
    Oracle Product Development
    
    ORACLE Deutschland B.V. & Co. KG | Hamborner Str. 51 | 40472 DÃ¼sseldorf
    
    ORACLE Deutschland B.V. & Co. KG
    Hauptverwaltung: Riesstr. 25, D-80992 MÃ¼nchen
    Registergericht: Amtsgericht MÃ¼nchen, HRA 95603
    
    KomplementÃ¤rin: ORACLE Deutschland Verwaltung B.V.
    Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
    Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
    GeschÃ¤ftsfÃ¼hrer: Alexander van der Ven, Astrid Kepper, Val Maher
    
    Green Oracle <http://www.oracle.com/commitment> Oracle is committed to 
    developing practices and products that help protect the environment




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-12010][SQL] Spark JDBC requires support...

Reply via email to