[GitHub] spark pull request: [SPARK-5040][SQL] Support expressing unresolve...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3862#issuecomment-68471873

Hi @rxin, I think you meant to name the class StringToAttributeConversionHelper.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4804] StringContext method to allow Str...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3649

[SPARK-4804] StringContext method to allow String references to column names in Catalyst DSL

You can merge this pull request into a Git repository by running: $ git pull https://github.com/culler/spark dslstringcontext

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3649.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3649

commit d6460f6396e1a5cb3f356236cf22a1ef1aad3147 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:11:09Z
Adds RichDate and RichTimestamp classes with comparison operators, allowing them to be used in DSL expressions. These classes provide initializers which accept string representations of dates or times. They are renamed as Date and Timestamp when the members of an SQLContext are in scope.

commit fa073246a392a1234141bceb61a6a71780fd3b23 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:17:33Z
Adds new implicit conversions which allow DSL expressions to start with a literal, e.g. `0 < 'x`. These conversions expose a conflict with the scalatest === operator if assert(X === Y) is used when the conversions are in scope. To fix this, several tests are modified, as recommended in the scalatest documentation, by making the change: assert(X === Y) -> assert(convertToEqualizer(X).===(Y))

commit 492feefaea6531b3374bf3e753e61234b79e1184 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:38:18Z
Clarification of one comment.

commit 53aced3bd6f7dba5d124a7070283e95e05b61914 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-08T19:46:10Z
Fixed implicit conversions for RichDate and RichTimestamp and added a test which would have detected the problem.
commit bd539b6ce8a45e9a6d32b77fe55296a7a960d4bf Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-08T21:00:14Z
Rebased and fixed the assert commands in ParquetQuerySuite.scala.

commit 353d12ca82eea8b097076adcf5b315641bbb81b9 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-09T16:54:10Z
Add unapply methods to RichDate and RichTimestamp and make their types available after importing the members of an SQLContext.

commit eff68fe04ac699ad180fd7a96b71d6b14159aba0 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-10T03:28:25Z
Modified RichTimestamp.toString to use the Hive format.

commit 759e883ed1eab286ef6b437ec8c194bc8cced50b Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-10T13:56:16Z
Rebasing and making comments consistent with ScalaDocs.

commit 0c86b93e350bb3927f07907b256418036789c038 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-10T15:05:41Z
Removing all changes to SpecificMutableRow.scala. Mutable Dates and Timestamps are not useful.

commit 1414f302fe5587dba5c2dec4f33f7001223584ce Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-10T23:31:19Z
New DSL equals operator -=-. Restored all test files as this does not collide with scalatest's === operator.

commit dabcaf262d68565ecae0092e70fe81f0be6307bd Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-10T23:44:42Z
Cleaning up more comments and one more test.

commit ad2940dad053089182dd65c2d9d0db6c4fedc0cf Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-11T04:57:08Z
Removed LhsLiterals for another PR. Cleaned up.

commit f917e06db9925dc02bc98df3e25ba60642d007a3 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-11T05:02:22Z
Removed test which is no longer part of this PR.

commit 4c32c0416093fedcb70acdcbfbeb112ded4406eb Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-11T05:25:55Z
Restored lost import of scalatest.Assertions.convertToEqualizer.
commit 4cfb8640f3b185de1e2ffacaf30f6d72a8c1d9dc Author: Marc Culler marc.cul...@gmail.com Date: 2014-12-09T20:51:32Z
Merge branch 'master' of https://github.com/apache/spark

commit b3a432d94bfd6856653c4019e0da4f9949582619 Author: Marc Culler marc.cul...@gmail.com Date: 2014-12-09T21:31:23Z
Clean copy of upstream master

commit a6e7a307b97cb951dcbb20aa52e08ac1c24d43da Author: Marc Culler marc.cul...@gmail.com Date: 2014-12-09T21:47:19Z
Merge branch 'master' of https://github.com/apache/spark into clean

commit 26aa58f951df85c279da844775dbee9d7a0953ff Author: Marc Culler marc.cul...@gmail.com Date: 2014-12-09T22:34:59Z
Added StringContext method to convert String to UnresolvedAttribute.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-63154275

Maybe next week will be better. The PR is now much smaller, with the changes to test files removed and the DSL part moved into PR #3210.

On Fri, Nov 7, 2014 at 3:02 PM, Michael Armbrust notificati...@github.com wrote:
> Thanks for working on this. It is a pretty big change so I'll review it next week after we get some critical fixes in for 1.2.
> Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3158#issuecomment-62211082.
[GitHub] spark pull request: [SPARK-4205][SQL] DSL expressions with a liter...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3210

[SPARK-4205][SQL] DSL expressions with a literal on the left hand side

Currently the DSL expression parser does not recognize expressions which begin with a literal, such as: `0 < 'x`. This PR adds a new implicit class which allows such expressions to be included in the DSL. Due to conflicts with scalatest, which converts `===` and `!==` in its `assert` macro, these two operators cannot be used by the new class. It adds two new operators `-=-` and `!=-` which behave identically to `===` and `!==` within the DSL, but can also be used in expressions which start with a literal. The new class replaces the comment: `// TODO more implicit class for literal?` I think this may be what the author had in mind.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/culler/spark lhs

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3210.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3210

commit 1d42fff2f5d77df8b9d0c0fcbcabf0d46d01c174 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-11T20:34:33Z
Adding an implicit class which enables DSL expressions which begin with a literal to be recognized.
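The mechanism the PR describes can be sketched in isolation. This is a simplified, hypothetical version: `LhsLiteralDemo`, `Expr`, and the case classes below are stand-ins, not Catalyst's actual classes. It shows how wrapping the bare literal in an implicit class lets an expression start with a literal, and how a distinctly named operator (`-=-`) plays the role of `===`:

```scala
// Simplified stand-ins for Catalyst's Literal/Attribute/Expression
// (assumptions for this sketch, not the real Spark SQL classes).
object LhsLiteralDemo {
  sealed trait Expr
  case class Literal(value: Any) extends Expr
  case class Attribute(name: Symbol) extends Expr
  case class LessThan(left: Expr, right: Expr) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr

  // The implicit class wraps a bare literal so an expression may begin
  // with it; -=- stands in for === to sidestep the ScalaTest name clash.
  implicit class LhsLiteral(x: Any) {
    private val lit = Literal(x)
    def <(other: Symbol): Expr = LessThan(lit, Attribute(other))
    def -=-(other: Symbol): Expr = EqualTo(lit, Attribute(other))
  }

  def main(args: Array[String]): Unit = {
    // In the DSL of the era one would write 0 < 'x; Symbol("x") is used
    // here so the sketch also compiles where symbol literals are deprecated.
    val e1 = 0 < Symbol("x")
    val e2 = 0 -=- Symbol("x")
    assert(e1 == LessThan(Literal(0), Attribute(Symbol("x"))))
    assert(e2 == EqualTo(Literal(0), Attribute(Symbol("x"))))
    println("literal-on-the-left expressions parsed")
  }
}
```

The conversion fires because `Int` has no `<` or `-=-` overload accepting a `Symbol`, so the compiler searches for an implicit view on the receiver; this is the same trigger the PR relies on.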
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62618892

Thanks!

On Mon, Nov 10, 2014 at 6:17 PM, Michael Armbrust notificati...@github.com wrote:
> Quick note: you don't need a fork per PR. You can have multiple branches in a single fork, each of them with a corresponding PR.
> Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3158#issuecomment-62480969.
[GitHub] spark pull request: [SPARK-4205][SQL] DSL expressions with a liter...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3210#issuecomment-62619183

@marmbrus and @liancheng, here is the other half of PR #3158.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3158#discussion_r20080979

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificMutableRow.scala ---
@@ -169,6 +170,35 @@ final class MutableByte extends MutableValue { newCopy.asInstanceOf[this.type] } }
+final class MutableDate extends MutableValue {
--- End diff --

Thanks @liancheng, I was wondering about that. I will remove them.

On Mon, Nov 10, 2014 at 2:09 AM, Cheng Lian notificati...@github.com wrote:
> In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificMutableRow.scala:
> @@ -169,6 +170,35 @@ final class MutableByte extends MutableValue { newCopy.asInstanceOf[this.type] } } +final class MutableDate extends MutableValue {
> According to our discussion here https://github.com/apache/spark/pull/3066#discussion_r19784664, these two classes are not necessary.
> Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3158/files#r20070128.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3158#discussion_r20082032

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/timetypes.scala ---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import java.sql.{Date, Timestamp}
+import scala.language.implicitConversions
+
+/**
+ * Subclass of java.sql.Date which provides the usual comparison
+ * operators (as required for catalyst expressions) and which can
+ * be constructed from a string.
+ *
+ * scala> val d1 = Date("2014-02-01")
+ * d1: Date = 2014-02-01
+ *
+ * scala> val d2 = Date("2014-02-02")
+ * d2: Date = 2014-02-02
+ *
+ * scala> d1 < d2
+ * res1: Boolean = true
+ */
+class RichDate(milliseconds: Long) extends Date(milliseconds) {
+  def < (that: Date): Boolean = this.before(that)
+  def > (that: Date): Boolean = this.after(that)
+  def <= (that: Date): Boolean = (this.before(that) || this.equals(that))
+  def >= (that: Date): Boolean = (this.after(that) || this.equals(that))
+  def === (that: Date): Boolean = this.equals(that)
+}
+
+object RichDate {
+  def apply(init: String) = new RichDate(Date.valueOf(init).getTime)
+
+  def unapply(richdate: RichDate): Option[Date] = Some(new Date(richdate.getTime))
+}
+
+/**
+ * Analogous subclass of java.sql.Timestamp.
+ *
+ * scala> val ts1 = Timestamp("2014-03-04 12:34:56.12")
+ * ts1: Timestamp = 2014-03-04 12:34:56.12
+ *
+ * scala> val ts2 = Timestamp("2014-03-04 12:34:56.13")
+ * ts2: Timestamp = 2014-03-04 12:34:56.13
+ *
+ * scala> ts1 < ts2
+ * res13: Boolean = true
--- End diff --

Thanks @liancheng. I am sorry I did not notice I had done that.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62395022

> Hey @culler, I noticed that a significant part of this PR is about changing a === b to EQ(a).===(b). I understand that this is used to avoid function name collision between ScalaTest === and the newly introduced === DSL operator for Date and Timestamp.

Yes, I am very unhappy about that. I will study further how to work around it some other way. However, you have not understood the problem. The problem does not have anything to do with my tests. Even if I completely remove my tests, the problem will still arise. Also, the changes I am proposing do not change the DSL at all. The ugly `EQ` construction *only* appears in scala source files which import *both* the scalatest FunSuite *and* the SQLContext conversions. That ugly stuff *never* appears in a DSL expression.

Let me try to explain. I have added some new implicit conversions. When these conversions are in scope they add `===` operators to some basic types, e.g. Int. The definition of this `===` operator is: `def === (other: Symbol) = EqualTo(literal, other)` so this implicit conversion would not be applied when evaluating expressions such as the ones in test files which import scalatest. E.g. it does not apply in `assert(literals.size === 4)` because the right hand side (4) is not a Symbol. So there really is no conflict here. The problem is that even though the implicit conversion does not apply, the scalatest `assert` **macro** is nonetheless deciding that `literals.size` already has an `===` operator, and so it refuses to apply its own implicit conversion which would add a different `===` operator.

Note that this situation is different from the one which arises with the DSL `===` operator. In that case the operator is simply a method of Expression. If a DSL expression begins with a Symbol the implicit conversions convert the Symbol to an Expression which then has an `===` operator. But this does not work when the DSL expression begins with a literal. In fact, this is exactly the reason why the DSL could not recognize an expression like `0 < 'x`. This is the problem I am solving.

> EQ(a).===(b) looks too cumbersome, and loses all the meaning about DSL operators -- conciseness and readability.

Please note, this construction would never be used in a DSL expression.

> The === name collision problem had already been solved once when org.apache.spark.sql.catalyst.dsl.=== was introduced, and it has been proven to be easy to work around without such significant change.

The situation was different, since that `===` operator is only a method of Expression.

> So my suggestion is, to work around this issue, you may add your tests in DslQuerySuite with the help of checkAnswer, and simply don't use both === within the same scope.

Understood. But that will not solve the problem. The other tests will still fail to compile.

Given that the `EQ` construction is not acceptable for use in a test, I will look into the possibility of requiring a separate import for these left hand side conversions, so that they would not need to be in scope for any of the tests that are not testing this specific feature. Maybe that is the cleanest workaround.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62480405

Hi @marmbrus and @liancheng, I have now tried what seems to be a very simple and direct solution to the `===` problem. My patch no longer uses `===`. I added a new equality operator to the DSL called `-=-`. It is identical to the `===` operator (which is still in place) except that it also works for DSL expressions which begin with a literal, and it does not cause any conflicts with scalatest.

I reverted all of the changes to the various test files that were intended to work around the `===` issue. That brings this PR down to 5 files. All of the changes related to the DSL are in one file: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala. All of the changes related to RichDate and RichTimestamp are in the other 4 files. This means that the two different changes have effectively been isolated and the structure of the PR should be more transparent now. It would not be hard to remove the one file and start another PR for it, except for the technical detail that GitHub only allows one fork at a time. I suppose I could fork myself, so to speak, and open another GitHub account for my alter ego.

PS: The `===` issue is not just about having too many implicit conversions in scope -- it is also about combining those conversions with macros.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62508858

OK @marmbrus and @liancheng, this PR is now fit and slim. The LhsLiterals have been removed and will appear in a new PR. This one passes all tests and I think it is ready for review, finally. Thank you, @liancheng, for your patience, thoughtful comments and careful reading.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62339566

Tests are failing when comparing Hive results to Catalyst results due to different string conversions of Timestamps: 2011-01-01 01:01:01 (Hive) vs 2011-01-01 01:01:01.0 (Catalyst).
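The mismatch above comes from `java.sql.Timestamp.toString`, which always prints a fractional-seconds field, so a whole-second value gains a trailing ".0". A minimal illustration; formatting the Hive-style string with `SimpleDateFormat` is an assumption for this sketch, not the PR's actual fix:

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

object TimestampFormatDemo {
  def main(args: Array[String]): Unit = {
    val ts = Timestamp.valueOf("2011-01-01 01:01:01")

    // Timestamp.toString always emits fractional seconds, so a
    // whole-second value prints with a trailing ".0" (the Catalyst side).
    println(ts.toString) // 2011-01-01 01:01:01.0

    // A Hive-style rendering without the fractional part can be produced
    // with an explicit pattern (sketch assumption, not Spark's code).
    val hiveStyle = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts)
    println(hiveStyle) // 2011-01-01 01:01:01
  }
}
```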
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-62257326

Thanks! I'm proud to be a member of the club. :^)

On Fri, Nov 7, 2014 at 11:16 PM, Cheng Lian notificati...@github.com wrote:
> @culler https://github.com/culler Don't worry, I guess every contributor has screwed up rebasing at least once :)
> Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3066#issuecomment-62246351.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler closed the pull request at: https://github.com/apache/spark/pull/3066
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-62158264

Hi @liancheng, now that I have completely screwed up this PR by attempting to rebase the repository, I will close it and open a new one which will hopefully be clean.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3158

[SPARK-4205][SQL] Timestamp and Date with comparisons / DSL literals

This is the same as pull request #3066, which I closed due to corruption of the repository after I tried to rebase so as to include modifications to a test file added after the original pull request was issued. There are two parts: (1) new RichDate and RichTimestamp classes provide comparison operators, allowing them to be used in DSL expressions, and initializers which accept string representations of dates or times; (2) new implicit conversions are added which allow recognition of DSL expressions which have a literal on the left, e.g. `0 < 'x`.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/culler/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3158.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3158

commit 7180345ba7ce255a2fc389ae8c55998ed20b9a82 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:11:09Z
Adds RichDate and RichTimestamp classes with comparison operators, allowing them to be used in DSL expressions. These classes provide initializers which accept string representations of dates or times. They are renamed as Date and Timestamp when the members of an SQLContext are in scope.

commit bcf6e6bb143f8e4a5f22356fadae54fce4f57041 Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:17:33Z
Adds new implicit conversions which allow DSL expressions to start with a literal, e.g. `0 < 'x`. These conversions expose a conflict with the scalatest === operator if assert(X === Y) is used when the conversions are in scope. To fix this, several tests are modified, as recommended in the scalatest documentation, by making the change: assert(X === Y) -> assert(convertToEqualizer(X).===(Y))

commit ef5e4a4230d671ed2ae19f74c280f5e8c44f41aa Author: Marc Culler marc.cul...@gmail.com Date: 2014-11-07T16:38:18Z
Clarification of one comment.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-62175638

@liancheng and @rxin, I am reopening pull request #3066 as #3158 so it can be based on a current commit of the spark source. I messed up #3066 by trying to rebase it after a new test file was added which required minor changes to compile. Sorry for any confusion this causes.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date with comp...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3158#issuecomment-6677

Thanks @marmbrus. That sounds good. If it isn't too much trouble, I'd appreciate seeing what jenkins has to say before you review it next week.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-62036504

The file sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala was added after this PR and caused a test to fail due to the collision between === operators. I am adding the whole file to this PR (with explicit conversions for === added to it). I hope that is the right way to handle this.
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-61830155

Thanks @liancheng. I deleted the comment to avoid wasting your time since I figured out what was going on. Actually it was not my test suite that I was worried about. Many existing tests would not compile because my new conversions add an === operator to types such as Int when the new conversions are in scope, as they are when testing the DSL. I followed scalatest's advice for dealing with cases where an === operator already exists, namely, to apply their conversion explicitly: assert(X === Y) -> assert(convertToEqualizer(X).===(Y)) Will that be OK? (I import convertToEqualizer as EQ for brevity.) I wanted to pass the existing tests before adding any new ones. I am checking that now. It looks good so far, but unfortunately, 11 test files had to be edited as above.

On Tue, Nov 4, 2014 at 11:31 PM, Cheng Lian notificati...@github.com wrote:
> Hey @culler https://github.com/culler, saw your comment about testing yesterday (gone now?) but didn't have time to respond. Yes, the === operator is used in both Spark SQL DSL and ScalaTest. To work around this, you can either add your test case in DslQuerySuite and use checkAnswer for assertion, or simply avoid using ===; there are some other assertion functions in ScalaTest, for example, assertResult is one of my favorites :)
> Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3066#issuecomment-61761929.
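The rewrite pattern discussed above can be sketched without depending on ScalaTest itself. Here `Equalizer` and `convertToEqualizer` are standalone mocks that only mirror the shape of ScalaTest's Assertions API (an assumption for this sketch), so the snippet runs on its own:

```scala
object EqualizerDemo {
  // Mock of ScalaTest's Equalizer: wraps the left operand and supplies
  // an === method, so the comparison goes through an explicit conversion.
  class Equalizer[T](val left: T) {
    def ===(right: Any): Boolean = left == right
  }
  def convertToEqualizer[T](left: T): Equalizer[T] = new Equalizer(left)

  def main(args: Array[String]): Unit = {
    val literals = Seq(1, 2, 3, 4)
    // Instead of assert(literals.size === 4), apply the conversion
    // explicitly so no competing === implicit needs to be consulted.
    assert(convertToEqualizer(literals.size).===(4))
    println("explicit-conversion assertion passed")
  }
}
```

With the real ScalaTest import (`org.scalatest.Assertions.convertToEqualizer`), the call site looks identical, which is why the edits to the 11 test files were mechanical.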
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-61870151

Hi @liancheng. Tests have been added. The PR builds, satisfies lint, and passes the tests on my system. (There were two failures in SparkSubmitSuite, but I believe they are unrelated to this PR.)
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19730958

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -146,6 +146,31 @@ package object dsl {
     def upper(e: Expression) = Upper(e)
     def lower(e: Expression) = Lower(e)
+    /*
+     * Conversions to provide the standard operators in the special case
+     * where a literal is being combined with a symbol. Without these an
+     * expression such as 0 < 'x is not recognized.
+     */
+    implicit class InitialLiteral(x: Any) {
+      val literal = Literal(x)
+      def + (other: Symbol): Expression = { literal + other }
+      def - (other: Symbol): Expression = { literal - other }
+      def * (other: Symbol): Expression = { literal * other }
+      def / (other: Symbol): Expression = { literal / other }
+      def % (other: Symbol): Expression = { literal % other }
+      def && (other: Symbol): Expression = { literal && other }
+      def || (other: Symbol): Expression = { literal || other }
+      def < (other: Symbol): Expression = { literal < other }
+      def <= (other: Symbol): Expression = { literal <= other }
+      def > (other: Symbol): Expression = { literal > other }
+      def >= (other: Symbol): Expression = { literal >= other }
+      def === (other: Symbol): Expression = { literal === other }
+      def <=> (other: Symbol): Expression = { literal <=> other }
+      def !== (other: Symbol): Expression = { literal !== other }

--- End diff --

Thanks.
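To see why an implicit class on Any lets an expression begin with a bare literal, here is a cut-down, self-contained sketch of the same pattern; Expr is a hypothetical stand-in for catalyst's Expression, not the real class:

```scala
object InitialLiteralSketch {
  // Hypothetical stand-in for a catalyst Expression.
  case class Expr(repr: String)

  // The pattern from the diff: wrap a bare value so an expression may
  // *begin* with a literal. Int has no < accepting a Symbol, so the
  // compiler falls back to this implicit conversion on the receiver.
  implicit class InitialLiteral(x: Any) {
    def <(other: Symbol): Expr = Expr(s"($x < ${other.name})")
    def ===(other: Symbol): Expr = Expr(s"($x = ${other.name})")
  }

  // Symbol("x") is the same as the 'x symbol-literal shorthand.
  val example: Expr = 0 < Symbol("x")
}
```

Without the implicit class, `0 < 'x` fails to type-check because none of Int's built-in `<` overloads accept a Symbol.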
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19731906

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala ---
@@ -139,6 +140,12 @@ class JoinedRow extends Row {
   def getString(i: Int): String =
     if (i < row1.size) row1.getString(i) else row2.getString(i - row1.size)
+  def getDate(i: Int): Date =
+    if (i < row1.size) row1.getDate(i) else row2.getDate(i - row1.size)
+
+  def getTimestamp(i: Int): Timestamp =
+    if (i < row1.size) row1.getTimestamp(i) else row2.getTimestamp(i - row1.size)
+

--- End diff --

I see. I am removing them.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19733162

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Row.scala ---
@@ -42,6 +44,31 @@ object Row {
   * This method can be used to construct a [[Row]] from a [[Seq]] of values.
   */
  def fromSeq(values: Seq[Any]): Row = new GenericRow(values.toArray)
+
+  /**
+   * This method can be used to construct a [[Row]] from a [[Seq]] of Strings,
+   * converting each item to the type specified in a [[StructType]] schema.
+   * Only primitive types can be used.
+   */
+  def fromStringsBySchema(strings: Seq[String], schema: StructType): Row = {
+    val values = for {
+      (field, str) <- schema.fields zip strings
+      item = field.dataType match {
+        case IntegerType   => str.toInt
+        case LongType      => str.toLong
+        case DoubleType    => str.toDouble
+        case FloatType     => str.toFloat
+        case ByteType      => str.toByte
+        case ShortType     => str.toShort
+        case StringType    => str
+        case BooleanType   => (str != "")
+        case DateType      => Date.valueOf(str)
+        case TimestampType => Timestamp.valueOf(str)
+        case DecimalType() => new BigDecimal(str)
+      }
+    } yield item
+    new GenericRow(values.toArray)
+  }

--- End diff --

It doesn't need to be a Row method. I will move it outside (and worry about the Boolean case). There is an issue lurking here, however. A SchemaRDD should have Rows which match its schema, but mismatches generate no errors until the last stage, e.g. when one collects the results of a query. My thought was that it would be helpful to have a safe Row constructor which would check its data against a schema and raise an error if they do not match (after applying whatever implicit conversions are available). (Yes, this is a separate issue and does not belong here.)
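The idea behind fromStringsBySchema can be sketched independently of Spark's types. The cut-down DataType hierarchy below is a hypothetical stand-in, not catalyst's actual one:

```scala
object SchemaParseSketch {
  // Hypothetical, simplified stand-ins for catalyst's DataType hierarchy.
  sealed trait DataType
  case object IntType    extends DataType
  case object DoubleType extends DataType
  case object StrType    extends DataType

  // Convert each string to the type named at the same position in the
  // schema, as fromStringsBySchema does with StructType fields.
  def fromStrings(strings: Seq[String], schema: Seq[DataType]): Seq[Any] =
    schema.zip(strings).map {
      case (IntType, s)    => s.toInt
      case (DoubleType, s) => s.toDouble
      case (StrType, s)    => s
    }
}
```

As with the real method, a malformed string (e.g. "abc" under IntType) surfaces as an exception only when the conversion actually runs, which is the laziness concern raised in the comment above.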
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19733534

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
@@ -187,7 +187,8 @@ case object NullType extends DataType
 object NativeType {
   val all = Seq(
-    IntegerType, BooleanType, LongType, DoubleType, FloatType, ShortType, ByteType, StringType)
+    IntegerType, BooleanType, LongType, DoubleType, FloatType, ShortType,
+    ByteType, StringType, DateType, TimestampType)

--- End diff --

Yes, but the line would exceed 100 columns with the addition of the two new items, DateType and TimestampType. (Or so the compiler tells me.) Perhaps there is a reason why these types should not be regarded as native, but I could not see any such reason.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19733740

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/timetypes.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import java.sql.{Date => JDate, Timestamp => JTimestamp}
+import scala.language.implicitConversions
+
+/*
+ * Subclass of java.sql.Date which provides the usual comparison
+ * operators (as required for catalyst expressions) and which can
+ * be constructed from a string.
+ *
+ * scala> val d1 = Date("2014-02-01")
+ * d1: Date = 2014-02-01
+ *
+ * scala> val d2 = Date("2014-02-02")
+ * d2: Date = 2014-02-02
+ *
+ * scala> d1 < d2
+ * res1: Boolean = true
+ */
+
+class Date(milliseconds: Long) extends JDate(milliseconds) {

--- End diff --

OK. That makes sense and avoids problems. But how about if they are still imported as org.apache.spark.sql.Date and org.apache.spark.sql.Timestamp at the top level? My reasoning is based on providing consistency for a user. A user is not required to use RichInts or RichStrings as literals in a catalyst expression.
So why would they expect to have to use RichDates or RichTimestamps?
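The approach under discussion (a RichDate subclass exposed to users under a friendlier name) can be sketched roughly as follows. The factory and operator names are assumptions based on the diff, not the merged API:

```scala
import java.sql.{Date => JDate}

// Rough sketch of the approach in the diff: subclass java.sql.Date,
// add comparison operators, and accept a string in a factory method.
class RichDate(milliseconds: Long) extends JDate(milliseconds) {
  def <(that: JDate): Boolean = this.before(that)
  def >(that: JDate): Boolean = this.after(that)
}

object RichDate {
  // JDate.valueOf parses the JDBC escape format "yyyy-[m]m-[d]d".
  def apply(s: String): RichDate = new RichDate(JDate.valueOf(s).getTime)
}
```

Because RichDate extends java.sql.Date, it can be passed anywhere a plain java.sql.Date is expected, while also supporting `<` and `>` in DSL-style comparisons.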
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19735638

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/package.scala ---
@@ -451,4 +451,33 @@ package object sql {
   * Builder for [[Metadata]]. If there is a key collision, the latter will overwrite the former.
   */
  type MetadataBuilder = catalyst.util.MetadataBuilder
+
+  /**
+   * :: DeveloperApi ::
+   *
+   * A Date class which supports the standard comparison operators, for
+   * use in DSL expressions. Implicit conversions to java.sql.Date
+   * are provided. The class initializer accepts a String, e.g.
+   *
+   * val ts = Date("2014-01-01")

--- End diff --

Thanks!
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19736442

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/package.scala ---
@@ -451,4 +451,33 @@ package object sql {
   * Builder for [[Metadata]]. If there is a key collision, the latter will overwrite the former.
   */
  type MetadataBuilder = catalyst.util.MetadataBuilder
+
+  /**
+   * :: DeveloperApi ::
+   *
+   * A Date class which supports the standard comparison operators, for
+   * use in DSL expressions. Implicit conversions to java.sql.Date
+   * are provided. The class initializer accepts a String, e.g.
+   *
+   * val ts = Date("2014-01-01")

--- End diff --

And more thanks for reading the code so carefully and for all of the thoughtful comments and helpful advice. I am pushing fixes.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-61488635

I have corrected the issues pointed out in the comments. Please retest.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on a diff in the pull request: https://github.com/apache/spark/pull/3066#discussion_r19737251

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala ---
@@ -372,6 +372,11 @@ private[sql] object TIMESTAMP extends NativeColumnType(TimestampType, 9, 12) {
   override def setField(row: MutableRow, ordinal: Int, value: Timestamp): Unit = {
     row(ordinal) = value
   }
+
+  def append(v: Date, buffer: ByteBuffer) {
+    buffer.putLong(v.getTime)
+  }
+

--- End diff --

This won't compile without an implementation of the append method, because it is declared in the abstract ColumnType class.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-61498140

Hmmm. The GitHub docs seem to suggest that I am supposed to mention @liancheng and @rxin in my comments. So I am doing that. My apologies if this is superfluous or annoying. - @culler
[GitHub] spark pull request: [SPARK-4205][SQL] Timestamp and Date classes w...
Github user culler commented on the pull request: https://github.com/apache/spark/pull/3066#issuecomment-61599608

Hi Cheng, I will happily add tests. There is one thing I could use some help with, though. It would be very helpful if I could run the existing test suite, but I am currently unable to compile several of the tests involving catalyst expressions. The problem is that the compiler objects whenever a triple equals (===) is used inside an assert command. I realize that this is a feature of FunSuite, but with the current Spark source distribution and the FunSuite jar that it installs, this feature does not work for me. The compiler says that the === operator expects a Symbol argument, which suggests that it is somehow being redefined by the catalyst libraries. (It works if I just import FunSuite alone.) Can you tell me the trick for getting FunSuite working without having to change all of those test suite files? Thanks, and sorry to have to bother you with this. - Marc

On Mon, Nov 3, 2014 at 10:31 PM, Cheng Lian notificati...@github.com wrote:

    Hey @culler https://github.com/culler, would you please add tests for all the new features you introduced in this PR? Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3066#issuecomment-61592236.
[GitHub] spark pull request: Timestamp and Date classes which work in the c...
GitHub user culler opened a pull request: https://github.com/apache/spark/pull/3066

Timestamp and Date classes which work in the catalyst DSL.

These Timestamp and Date classes support the usual comparison operators, so they can be used in catalyst Expressions. They come with implicit conversions to and from the underlying Java classes. They also have a convenient initializer which accepts a String. Additional implicit conversions fix an issue with the DSL where expressions like 0 < 'x are not recognized. An independent change adds a method to Row for creating a row from a sequence of Strings and a schema.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/culler/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3066.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3066