Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82836/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82836 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82836/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82836 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82836/testReport)**
for PR 18664 at commit
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
Jenkins, retest this please.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82832/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82832 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82832/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82832 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82832/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82826 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82826/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82826/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82826 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82826/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82650/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82650 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82650/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
I think I sort of have things working now the way we discussed. Working
with timestamps in `toPandas()` was pretty straightforward, but there are some
differences with them in `pandas_udf` and
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82650 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82650/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82641/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82641 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82641/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Thanks @ueshin , I agree it is better to convert the timezone to Python
system local first and then localize to make tz-naive in case the Python system
local tz is different than
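The convert-then-localize step described above can be sketched with pandas; this is an illustrative sketch only (the helper name and sample values are not from the PR), assuming a tz-aware Series arriving in UTC, which is how Spark stores timestamps internally.

```python
import pandas as pd

# Sketch of the approach discussed above: convert the UTC-based values to
# the local timezone first, then drop the tz info so the result is tz-naive
# with local wall-clock times, matching non-Arrow toPandas() behavior.
def to_naive_local(series: pd.Series, local_tz: str) -> pd.Series:
    # tz_convert changes only the rendered timezone (same UTC instant);
    # tz_localize(None) then strips the tz, keeping the local wall time.
    return series.dt.tz_convert(local_tz).dt.tz_localize(None)

utc = pd.Series(pd.to_datetime(["2017-10-01 12:00:00"]).tz_localize("UTC"))
print(to_naive_local(utc, "US/Pacific"))  # 2017-10-01 05:00:00, tz-naive
```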
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82641 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82641/testReport)**
for PR 18664 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
That's a great explanation. I think you are right. Using
`SQLConf.SESSION_LOCAL_TIMEZONE` makes much more sense to me now.
---
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I disagree with using `DateTimeUtils.defaultTimeZone()` for the timezone.
If `DateTimeUtils.defaultTimeZone()` is different from the system timezone in
Python, the return values are different between
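The mismatch ueshin describes can be illustrated with a small sketch (not code from the PR; the two zone names are arbitrary stand-ins for a JVM default and a Python system zone): rendering the same internal UTC instant as a naive local time under two different defaults yields two different wall-clock values, which is why a single shared setting like `spark.sql.session.timeZone` is safer.

```python
import pandas as pd

# The same internal UTC instant, made tz-naive under two different
# "default" timezones, produces different values -- so which default
# applies (JVM vs Python) changes the user-visible result.
instant = pd.Timestamp("2017-10-01 12:00:00", tz="UTC")

jvm_view = instant.tz_convert("Asia/Tokyo").tz_localize(None)   # hypothetical JVM default
py_view = instant.tz_convert("US/Pacific").tz_localize(None)    # hypothetical Python system tz
print(jvm_view)  # 2017-10-01 21:00:00
print(py_view)   # 2017-10-01 05:00:00
```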
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82613 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82613/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82613/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82613 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82613/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82612/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82612 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82612/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
@HyukjinKwon @ueshin , please take a look. This should handle timestamps
with Arrow the same as without Arrow. I still need to add some tests for
timestamps with `pandas_udf`s.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82612 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82612/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82611/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82611 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82611/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #82611 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82611/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
I'll work on doing (1) to have conversions in Python for Arrow to match
Non-Arrow and we can see how that turns out.
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
> I'm just wondering what if we use timestamp in nested types. Currently we
don't support nested types but in the future?
I'll try to take this into account, or at least add a note for
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
> BTW, do you think it is possible to easily de-duplicate timezone handling
for both with-Arrow and without-Arrow within Python side if we go for 1. in the
separate PR?
@HyukjinKwon ,
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I'd say I prefer 1, too. I'm just wondering what if we use timestamp in
nested types. Currently we don't support nested types but in the future?
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
> Write Arrow data with SESSION_LOCAL timestamp (as is currently in this PR)
BTW, could we just use `DateTimeUtils.defaultTimeZone()` instead of
`SQLConf.SESSION_LOCAL_TIMEZONE` if you
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
@BryanCutler, BTW, do you think it is possible to de-duplicate timezone
handling within Python side if we go for 1.?
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
I think I prefer 1. Do you maybe have a preference @ueshin? I believe you
are more insightful in this.
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Ok sounds good. Could I get some opinions on the best way to convert
internal Spark timestamps since they are stored as UTC time? I think we have
the following options:
1. Write
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
Yup, I think we already don't have a timezone in `udf` too? I think we are
fine as long as it keeps the existing behaviour. Let's not forget to handle
all those cases when we deal with timezone
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
@HyukjinKwon and @ueshin so with Arrow, the Pandas DataFrame from
`toPandas()` timestamp columns will not have a timezone - are we going to do
the same thing for `pandas_udf` Series? I was
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I'm sorry for the delay.
I agree with @HyukjinKwon's suggestion to keep the behavior of current
`toPandas` without Arrow for now.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
Yup, that's what I suggested. To me, it sounds like a few issues are convoluted
here, and I want to proceed separately with what we are clear on for now.
---
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Made [SPARK-1](https://issues.apache.org/jira/browse/SPARK-1) for the
user doc; once we decide what to do with timestamps it can be completed
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
Bryan, I haven't created it. Go ahead!
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Thanks all for the discussion. I think there are a lot of subtleties at
play here, and what may or may not be considered a bug can depend on the user's
intent. Regardless, I agree that there
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
I am okay with proceeding separately for dealing with timezone, and
matching the behaviour with Arrow to the existing behaviour without Arrow here
with respect to timezone.
Less sure
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
If we all agree on the necessity of a design doc first, I can create a Jira
and we can make progress there.
What do you all think? @BryanCutler @gatorsmile @HyukjinKwon
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
I agree. I think a high-level document describing these differences would help
so we can discuss it. I think we should be more careful about Arrow-version
behavior before releasing support for timestamp
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
Yup, I admit there could be some exceptions (there have been actually) but
that should still be the baseline we should basically pursue. Probably, we
could treat this Arrow optimisation as an
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
> The baseline should be (as said above): Internal optimisation should not
introduce any behaviour change, and we are discouraged to change the previous
behaviour unless it has bugs in general.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/18664
The baseline should be (as said above): Internal optimisation should not
introduce any behaviour change, and we are discouraged to change the previous
behaviour unless it has bugs in general.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
cc @ueshin
---
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
Thanks @gatorsmile for the constructive feedback!
I don't want to make this more complicated, but I also want to make sure we
are aware that there is also a difference between
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18664
I think all of the involved reviewers agree this is a pretty serious design
issue. We are unable to change the behavior after we officially release it.
Thus, we have to be very very careful
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
I agree with Bryan. I think we might want to rethink the assumption that
toPandas result with arrow / without arrow should be 100% the same.
For instance, non-Arrow doesn't respect
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
@ueshin @HyukjinKwon , I think it would be critical for users to have
timestamps working for Arrow. Just to recap, the remaining issue here was that
`toPandas()` without Arrow does not have
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80646/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80646 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80646/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80646 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80646/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
I'm ok with that @ueshin , I'll revert back to the PR you made, then remove
the default value and throw an exception if there is a TimestampType and
`timeZoneId` is `None`.
---
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
@BryanCutler I'm sorry for the delay.
I think it's too strict as an API to use `SparkSession` to apply timezone.
How about throwing an exception instead of using
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Hi @ueshin , do you have an idea on how to proceed here?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80180/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80180 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80180/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80168/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80167/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80168 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80168/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80167 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80167/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80169/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80169 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80169/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80169 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80169/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80168 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80168/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
Sorry @ueshin, I forgot to push the changes described in my last comment,
please take a look when you can.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80167 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80167/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80136/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80136 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80136/testReport)**
for PR 18664 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80134/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80134 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80134/testReport)**
for PR 18664 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80136 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80136/testReport)**
for PR 18664 at commit
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
I merged your changes @ueshin , but having the timezone as an Option this way
makes me a little nervous. It will be easy for people to omit it, and doing
so won't cause an immediate failure, but
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18664
**[Test build #80134 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80134/testReport)**
for PR 18664 at commit
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18664
To Wes's concern, I think we are only dealing with values in UTC here; both
Spark and Arrow internally represent timestamps as microseconds since epoch.
To the two issues Bryan and
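The shared representation mentioned above can be sketched in plain Python; the helper names and sample value below are illustrative, assuming the internal form is an integer count of microseconds since the Unix epoch in UTC (as Spark's TimestampType and Arrow's `timestamp('us')` both use).

```python
from datetime import datetime, timezone

# Sketch: round-trip a UTC datetime through the microseconds-since-epoch
# integer form that both Spark and Arrow use internally.
def to_internal_us(dt: datetime) -> int:
    return int(dt.timestamp() * 1_000_000)

def from_internal_us(us: int) -> datetime:
    return datetime.fromtimestamp(us / 1_000_000, tz=timezone.utc)

ts = datetime(2017, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
us = to_internal_us(ts)
print(us)  # 1506859200000000
assert from_internal_us(us) == ts
```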
Github user wesm commented on the issue:
https://github.com/apache/spark/pull/18664
For item 2, in Arrow-land if the data is time zone aware, then it must be
internally normalized to UTC. Conversions are therefore metadata-only
operations and do not require any computation. The
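Wes's point about metadata-only conversions can be demonstrated with pandas (a sketch, not code from the PR; pandas stores the epoch value in nanoseconds via `Timestamp.value`): converting a UTC-normalized value to another zone changes only the displayed timezone, not the stored instant.

```python
import pandas as pd

# Timezone conversion on UTC-normalized data is metadata-only:
# the underlying epoch value (nanoseconds since epoch) is untouched.
ts_utc = pd.Timestamp("2017-10-01 12:00:00", tz="UTC")
ts_ny = ts_utc.tz_convert("America/New_York")

print(ts_ny)                        # 2017-10-01 08:00:00-04:00
assert ts_ny.value == ts_utc.value  # same instant, different metadata
```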
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/18664
> I don't think Scala/Java Timestamp encoder has the same issue
Scala and Python handle Timestamps the same way: they both store internally
as time from `1970-01-01 00:00:00.0 UTC` and
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I don't think Scala/Java Timestamp encoder has the same issue because
`java.sql.Timestamp` always has the timestamp value from `1970-01-01 00:00:00.0
UTC` regardless of timezone as the same as Spark