[GitHub] jzhuge commented on issue #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
jzhuge commented on issue #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165#issuecomment-452937118
 
 
   Sure.
   
   On Wed, Jan 9, 2019 at 5:18 PM Sean Owen  wrote:
   
   > Oops, good catch @jzhuge. Can you run `jekyll build` locally to also
   > update the HTML? If it's any trouble I can do it in a separate PR.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or mute the thread.
   >
   
   
   -- 
   John Zhuge
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Remove non-Tungsten mode in Spark 3?

2019-01-09 Thread Sean Owen
I haven't touched Tungsten, but have proposed removing the deprecated old
memory manager and settings -- yes I think that's the primary argument for
it.
https://github.com/apache/spark/pull/23457

On Wed, Jan 9, 2019 at 6:06 PM Erik Erlandson  wrote:

> Removing the user facing config seems like a good idea from the standpoint
> of reducing cognitive load, and documentation
>
> On Fri, Jan 4, 2019 at 7:03 AM Sean Owen  wrote:
>
>> OK, maybe leave in tungsten for 3.0.
>> I did a quick check, and removing StaticMemoryManager saves a few hundred
>> lines. It's used in MemoryStore tests internally though, and not a trivial
>> change to remove it. It's also used directly in HashedRelation. It could
>> still be worth removing it as a user-facing option to reduce confusion
>> about memory tuning, but it wouldn't take out much code. What do you all
>> think?
>>
>
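
For readers who want to see the knobs under discussion, here is a minimal, illustrative Scala sketch (values are placeholders, not recommendations); `spark.memory.useLegacyMode` selects the old StaticMemoryManager, and `spark.memory.offHeap.enabled` / `spark.memory.offHeap.size` control Tungsten off-heap allocation:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative only: these are the user-facing settings the thread proposes to
// simplify or remove for Spark 3. The values shown are the Spark 2.x defaults.
val conf = new SparkConf()
  .setAppName("memory-config-sketch")
  .setMaster("local[*]")
  .set("spark.memory.useLegacyMode", "false")   // true selects the old StaticMemoryManager
  .set("spark.memory.offHeap.enabled", "false") // Tungsten off-heap allocation, off by default
  .set("spark.memory.offHeap.size", "0")        // must be a positive size if off-heap is enabled

val spark = SparkSession.builder().config(conf).getOrCreate()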


[GitHub] srowen commented on issue #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
srowen commented on issue #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165#issuecomment-452934048
 
 
   Oops, good catch @jzhuge. Can you run `jekyll build` locally to also update
the HTML? If it's any trouble I can do it in a separate PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Sean Owen
Hm, OK, those other profiles should be unrelated. I'll see if I can
figure it out, but it's likely this is specific to the machine I am
testing on somehow.
For that reason, I'll say +1 on the basis that these tests really do pass.

On Wed, Jan 9, 2019 at 6:05 PM Dongjoon Hyun  wrote:
>
> I tested with Maven and `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive 
> -Phive-thriftserver` on CentOS/JDK8.
>
> The difference seems to be `-Pmesos -Psparkr` from your and `-Pkinesis-asl` 
> from mine.
>
> Do you think it's related? BTW, at least, we have a green balls on Jenkins.
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/591/
>
>
> On Wed, Jan 9, 2019 at 3:37 PM Sean Owen  wrote:
>>
>> BTW did you run with the same profiles, I wonder; I test with,
>> generally, -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos
>> -Psparkr
>>
>> I am checking mostly because none of that weird error would happen
>> without testing hive-thriftserver.
>>
>> The others are probably just flakiness or something else odd, and I'd
>> look past them if others are not seeing them.
>>
>> The licenses and signatures looked fine, and it built correctly, at least.
>>
>> On Wed, Jan 9, 2019 at 5:09 PM Dongjoon Hyun  wrote:
>> >
>> > Hi, Sean.
>> >
>> > It looks strange. I didn't hit them. I'm not sure but it looks like some 
>> > flakiness at 2.2.x era.
>> > For me, those test passes. (I ran twice before starting a vote and during 
>> > this voting from the source tar file)
>> >
>> > Bests,
>> > Dongjoon
>> >
>> > On Wed, Jan 9, 2019 at 1:42 PM Sean Owen  wrote:
>> >>
>> >> I wonder if anyone else is seeing the following issues, or whether
>> >> it's specific to my environment:
>> >>
>> >> With -Phive-thriftserver, it compiles fine. However during tests, I get 
>> >> ...
>> >> [error] 
>> >> /home/ubuntu/spark-2.2.3/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java:64:
>> >> error: package org.eclipse.jetty.server does not exist
>> >> [error]   protected org.eclipse.jetty.server.Server httpServer;
>> >> [error] ^
>> >>
>> >> That's weird. I'd have to dig into the POM to see if this dependency
>> >> for some reason would not be available at test time. But does this
>> >> profile pass for anyone else?
>> >>
>> >> I'm also seeing test failures like the following. Yes, there's more,
>> >> just seeing if anyone sees these?
>> >>
>> >> - event ordering *** FAILED ***
>> >>   The code passed to failAfter did not complete within 10 seconds.
>> >> (StreamingQueryListenerSuite.scala:411)
>> >>
>> >> - HDFSMetadataLog: metadata directory collision *** FAILED ***
>> >>   The await method on Waiter timed out. (HDFSMetadataLogSuite.scala:201)
>> >>
>> >> - recovery *** FAILED ***
>> >>   == Results ==
>> >>   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
>> >>   !struct<_1:int,_2:int>  struct<>
>> >>   ![10,5]
>> >>
>> >>
>> >>
>> >> On Tue, Jan 8, 2019 at 1:14 PM Dongjoon Hyun  
>> >> wrote:
>> >> >
>> >> > Please vote on releasing the following candidate as Apache Spark 
>> >> > version 2.2.3.
>> >> >
>> >> > The vote is open until January 11 11:30AM (PST) and passes if a 
>> >> > majority +1 PMC votes are cast, with
>> >> > a minimum of 3 +1 votes.
>> >> >
>> >> > [ ] +1 Release this package as Apache Spark 2.2.3
>> >> > [ ] -1 Do not release this package because ...
>> >> >
>> >> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >> >
>> >> > The tag to be voted on is v2.2.3-rc1 (commit 
>> >> > 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>> >> > https://github.com/apache/spark/tree/v2.2.3-rc1
>> >> >
>> >> > The release files, including signatures, digests, etc. can be found at:
>> >> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>> >> >
>> >> > Signatures used for Spark RCs can be found in this file:
>> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >
>> >> > The staging repository for this release can be found at:
>> >> > https://repository.apache.org/content/repositories/orgapachespark-1295
>> >> >
>> >> > The documentation corresponding to this release can be found at:
>> >> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>> >> >
>> >> > The list of bug fixes going into 2.2.3 can be found at the following 
>> >> > URL:
>> >> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>> >> >
>> >> > FAQ
>> >> >
>> >> > =
>> >> > How can I help test this release?
>> >> > =
>> >> >
>> >> > If you are a Spark user, you can help us test this release by taking
>> >> > an existing Spark workload and running on this release candidate, then
>> >> > reporting any regressions.
>> >> >
>> >> > If you're working in PySpark you can set up a virtual env and install
>> >> > the current RC and see if anything important breaks, in the Java/Scala
>> >> > you can add the staging 
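
A minimal Scala/sbt sketch of the staging-repository step described in the quoted instructions above (the repository URL is the one from this vote email; the spark-sql coordinates are just one example of an artifact to resolve and test against):

// build.sbt (sketch)
resolvers += "Apache Spark 2.2.3 RC1 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1295"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.3" % "provided"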

Re: DataSourceV2 community sync tonight

2019-01-09 Thread Wenchen Fan
There are 2 remaining problems in the write-side API refactor:
1. how to put the `queryId` parameter in the write API
2. how to put the streaming OutputMode parameter in the write API

I'd like to discuss them and hopefully we can reach a consensus on how to move forward.

Thanks,
Wenchen

On Thu, Jan 10, 2019 at 2:40 AM Ryan Blue  wrote:

> Hi everyone,
>
> This is a quick reminder that there is a DSv2 community sync tonight at 5
> PM PST. These community syncs are open to anyone that wants to participate.
> If you’d like to be added to the invite, please send me a direct message.
>
> The main topic for this sync is the catalog API. To make discussion
> easier, I think we should separate the current discussion into a few
> orthogonal areas and discuss each:
>
>- Catalog API plugin system
>- Plan for migration to a new catalog API
>- Catalog API design approach using separate interfaces (TableCatalog,
>UDFCatalog, FunctionCatalog, etc.)
>- TableCatalog API proposal and implementation, PR #21306
>(not the proposed user-facing API)
>
> If we have time, we can also talk about the user-facing API
> proposed in the SPIP.
>
> Thanks,
>
> rb
> --
> Ryan Blue
> Software Engineer
> Netflix
>
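
To make the two open questions concrete, here is a purely hypothetical Scala sketch -- invented names, not the proposed API -- of one way a write-side builder could carry `queryId` and the streaming OutputMode without exposing them to batch writes:

import org.apache.spark.sql.streaming.OutputMode

// Hypothetical illustration only; see the linked write-side refactor PR for the real proposal.
trait WriteConfig // placeholder result type so the sketch is self-contained

trait WriteConfigBuilder {
  def build(): WriteConfig
}

// One possible shape: streaming-only parameters live on a separate builder,
// so the batch path never needs a queryId or an OutputMode.
trait StreamingWriteConfigBuilder extends WriteConfigBuilder {
  def withQueryId(queryId: String): StreamingWriteConfigBuilder
  def withOutputMode(mode: OutputMode): StreamingWriteConfigBuilder
}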


[GitHub] jzhuge opened a new pull request #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
jzhuge opened a new pull request #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-09 Thread Erik Erlandson
Curious how SPARK-25299 (where file tracking is pushed to Spark drivers, at
least in option-5) interacts with Splash. The shuffle data location in
SPARK-25299 would now have additional "fallback" logic for recovering from
executor loss.

On Thu, Jan 3, 2019 at 6:24 AM Peter Rudenko 
wrote:

> Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
> https://github.com/Mellanox/SparkRDMA
> Thanks for your effort on improving the Spark Shuffle API. We are very
> interested in participating in this. I have several comments for now:
> 1. Went through these 4 documents:
>
>
> https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#
> 
>
>
> https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit
>
>
> https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40
>
>
> https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#
> 
> As I understood, there are 2 discussions: improving the shuffle manager API
> itself (Splash manager) and improving the external shuffle service.
> 2. We may consider revisiting SPIP: RDMA Accelerated Shuffle Engine, i.e.
> whether to support RDMA in the main codebase or at least as a first-class
> shuffle plugin (there are not many other open-source shuffle plugins). We actively
> develop it, adding new features. RDMA is now available on Azure (
> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
> Alibaba, and other cloud providers. For now we support only memory <->
> memory transfer, but RDMA is extensible to NVM and GPU data transfer.
> 3. We have users that are interested in having this feature (
> https://issues.apache.org/jira/browse/SPARK-12196) - we can consider
> adding it to this new API.
>
> Let me know if you need help with review / testing / benchmarking.
> I'll look more at the documents and the PR.
>
> Thanks,
> Peter Rudenko
> Software engineer at Mellanox Technologies.
>
>
> On Wed, Dec 19, 2018 at 20:54, John Zhuge  wrote:
>
>> Matt, appreciate the update!
>>
>> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah  wrote:
>>
>>> Hi everyone,
>>>
>>>
>>>
>>> Earlier this year, we proposed SPARK-25299, proposing the idea
>>> of using other storage systems for persisting shuffle files. Since that
>>> time, we have been continuing to work on prototypes for this project. In
>>> the interest of increasing transparency into our work, we have created a
>>> progress report document
>>> where you may find a summary of the work we have been doing, as well as
>>> links to our prototypes on Github. We would ask that anyone who is very
>>> familiar with the inner workings of Spark’s shuffle could provide feedback
>>> and comments on our work thus far. We welcome any further discussion in
>>> this space. You may comment in this e-mail thread or by commenting on the
>>> progress report document.
>>>
>>>
>>>
>>> Looking forward to hearing from you. Thanks,
>>>
>>>
>>>
>>> -Matt Cheah
>>>
>>
>>
>> --
>> John
>>
>
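
For context on how an external shuffle plugin such as SparkRDMA plugs in today, the existing extension point is the `spark.shuffle.manager` setting; in the Scala sketch below the fully-qualified class name and the jar paths are assumptions, so check the SparkRDMA README for the exact values:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// spark.shuffle.manager is the plugin point the SPARK-25299 / Splash / SparkRDMA
// discussion revolves around. Class name and paths below are illustrative assumptions.
val conf = new SparkConf()
  .setAppName("shuffle-plugin-sketch")
  .set("spark.shuffle.manager", "org.apache.spark.shuffle.rdma.RdmaShuffleManager")
  .set("spark.driver.extraClassPath", "/path/to/spark-rdma.jar")
  .set("spark.executor.extraClassPath", "/path/to/spark-rdma.jar")

val spark = SparkSession.builder().config(conf).getOrCreate()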


Re: Remove non-Tungsten mode in Spark 3?

2019-01-09 Thread Erik Erlandson
Removing the user-facing config seems like a good idea from the standpoint
of reducing cognitive load and documentation.

On Fri, Jan 4, 2019 at 7:03 AM Sean Owen  wrote:

> OK, maybe leave in tungsten for 3.0.
> I did a quick check, and removing StaticMemoryManager saves a few hundred
> lines. It's used in MemoryStore tests internally though, and not a trivial
> change to remove it. It's also used directly in HashedRelation. It could
> still be worth removing it as a user-facing option to reduce confusion
> about memory tuning, but it wouldn't take out much code. What do you all
> think?
>
> On Thu, Jan 3, 2019 at 9:41 PM Reynold Xin  wrote:
>
>> The issue with the offheap mode is it is a pretty big behavior change and
>> does require additional setup (also for users that run with UDFs that
>> allocate a lot of heap memory, it might not be as good).
>>
>> I can see us removing the legacy mode since it's been legacy for a long
>> time and perhaps very few users need it. How much code does it remove
>> though?
>>
>>
>> On Thu, Jan 03, 2019 at 2:55 PM, Sean Owen  wrote:
>>
>>> Just wondering if there is a good reason to keep around the pre-tungsten
>>> on-heap memory mode for Spark 3, and make spark.memory.offHeap.enabled
>>> always true? It would simplify the code somewhat, but I don't feel I'm so
>>> aware of the tradeoffs.
>>>
>>> I know we didn't deprecate it, but it's been off by default for a long
>>> time. It could be deprecated, too.
>>>
>>> Same question for spark.memory.useLegacyMode and all its various
>>> associated settings? Seems like these should go away at some point, and
>>> Spark 3 is a good point. Same issue about deprecation though.
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>
>>


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Dongjoon Hyun
I tested with Maven and `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
-Phive-thriftserver` on CentOS/JDK8.

The difference seems to be `-Pmesos -Psparkr` on your side and `-Pkinesis-asl`
on mine.

Do you think it's related? BTW, at least we have green balls on Jenkins.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/591/


On Wed, Jan 9, 2019 at 3:37 PM Sean Owen  wrote:

> BTW did you run with the same profiles, I wonder; I test with,
> generally, -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos
> -Psparkr
>
> I am checking mostly because none of that weird error would happen
> without testing hive-thriftserver.
>
> The others are probably just flakiness or something else odd, and I'd
> look past them if others are not seeing them.
>
> The licenses and signatures looked fine, and it built correctly, at least.
>
> On Wed, Jan 9, 2019 at 5:09 PM Dongjoon Hyun 
> wrote:
> >
> > Hi, Sean.
> >
> > It looks strange. I didn't hit them. I'm not sure but it looks like some
> flakiness at 2.2.x era.
> > For me, those test passes. (I ran twice before starting a vote and
> during this voting from the source tar file)
> >
> > Bests,
> > Dongjoon
> >
> > On Wed, Jan 9, 2019 at 1:42 PM Sean Owen  wrote:
> >>
> >> I wonder if anyone else is seeing the following issues, or whether
> >> it's specific to my environment:
> >>
> >> With -Phive-thriftserver, it compiles fine. However during tests, I get
> ...
> >> [error]
> /home/ubuntu/spark-2.2.3/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java:64:
> >> error: package org.eclipse.jetty.server does not exist
> >> [error]   protected org.eclipse.jetty.server.Server httpServer;
> >> [error] ^
> >>
> >> That's weird. I'd have to dig into the POM to see if this dependency
> >> for some reason would not be available at test time. But does this
> >> profile pass for anyone else?
> >>
> >> I'm also seeing test failures like the following. Yes, there's more,
> >> just seeing if anyone sees these?
> >>
> >> - event ordering *** FAILED ***
> >>   The code passed to failAfter did not complete within 10 seconds.
> >> (StreamingQueryListenerSuite.scala:411)
> >>
> >> - HDFSMetadataLog: metadata directory collision *** FAILED ***
> >>   The await method on Waiter timed out. (HDFSMetadataLogSuite.scala:201)
> >>
> >> - recovery *** FAILED ***
> >>   == Results ==
> >>   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
> >>   !struct<_1:int,_2:int>  struct<>
> >>   ![10,5]
> >>
> >>
> >>
> >> On Tue, Jan 8, 2019 at 1:14 PM Dongjoon Hyun 
> wrote:
> >> >
> >> > Please vote on releasing the following candidate as Apache Spark
> version 2.2.3.
> >> >
> >> > The vote is open until January 11 11:30AM (PST) and passes if a
> majority +1 PMC votes are cast, with
> >> > a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 2.2.3
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v2.2.3-rc1 (commit
> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
> >> > https://github.com/apache/spark/tree/v2.2.3-rc1
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1295
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
> >> >
> >> > The list of bug fixes going into 2.2.3 can be found at the following
> URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
> >> >
> >> > FAQ
> >> >
> >> > =
> >> > How can I help test this release?
> >> > =
> >> >
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload and running on this release candidate, then
> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark you can set up a virtual env and install
> >> > the current RC and see if anything important breaks, in the Java/Scala
> >> > you can add the staging repository to your project's resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with an out-of-date RC going forward).
> >> >
> >> > ===
> >> > What should happen to JIRA tickets still targeting 2.2.3?
> >> > ===
> >> >
> >> > The current list of open tickets targeted at 2.2.3 can be found at:
> >> > 

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Sean Owen
BTW did you run with the same profiles, I wonder; I test with,
generally, -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos
-Psparkr

I am checking mostly because none of that weird error would happen
without testing hive-thriftserver.

The others are probably just flakiness or something else odd, and I'd
look past them if others are not seeing them.

The licenses and signatures looked fine, and it built correctly, at least.

On Wed, Jan 9, 2019 at 5:09 PM Dongjoon Hyun  wrote:
>
> Hi, Sean.
>
> It looks strange. I didn't hit them. I'm not sure but it looks like some 
> flakiness at 2.2.x era.
> For me, those test passes. (I ran twice before starting a vote and during 
> this voting from the source tar file)
>
> Bests,
> Dongjoon
>
> On Wed, Jan 9, 2019 at 1:42 PM Sean Owen  wrote:
>>
>> I wonder if anyone else is seeing the following issues, or whether
>> it's specific to my environment:
>>
>> With -Phive-thriftserver, it compiles fine. However during tests, I get ...
>> [error] 
>> /home/ubuntu/spark-2.2.3/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java:64:
>> error: package org.eclipse.jetty.server does not exist
>> [error]   protected org.eclipse.jetty.server.Server httpServer;
>> [error] ^
>>
>> That's weird. I'd have to dig into the POM to see if this dependency
>> for some reason would not be available at test time. But does this
>> profile pass for anyone else?
>>
>> I'm also seeing test failures like the following. Yes, there's more,
>> just seeing if anyone sees these?
>>
>> - event ordering *** FAILED ***
>>   The code passed to failAfter did not complete within 10 seconds.
>> (StreamingQueryListenerSuite.scala:411)
>>
>> - HDFSMetadataLog: metadata directory collision *** FAILED ***
>>   The await method on Waiter timed out. (HDFSMetadataLogSuite.scala:201)
>>
>> - recovery *** FAILED ***
>>   == Results ==
>>   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
>>   !struct<_1:int,_2:int>  struct<>
>>   ![10,5]
>>
>>
>>
>> On Tue, Jan 8, 2019 at 1:14 PM Dongjoon Hyun  wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark version 
>> > 2.2.3.
>> >
>> > The vote is open until January 11 11:30AM (PST) and passes if a majority 
>> > +1 PMC votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.2.3
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.2.3-rc1 (commit 
>> > 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>> > https://github.com/apache/spark/tree/v2.2.3-rc1
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1295
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>> >
>> > The list of bug fixes going into 2.2.3 can be found at the following URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your project's resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.2.3?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.2.3 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> > Version/s" = 2.2.3
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > 

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Dongjoon Hyun
Hi, Sean.

It looks strange. I didn't hit them. I'm not sure, but it looks like some
flakiness in the 2.2.x era.
For me, those tests pass. (I ran them twice: before starting the vote and during
this vote, from the source tar file.)

Bests,
Dongjoon

On Wed, Jan 9, 2019 at 1:42 PM Sean Owen  wrote:

> I wonder if anyone else is seeing the following issues, or whether
> it's specific to my environment:
>
> With -Phive-thriftserver, it compiles fine. However during tests, I get ...
> [error]
> /home/ubuntu/spark-2.2.3/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java:64:
> error: package org.eclipse.jetty.server does not exist
> [error]   protected org.eclipse.jetty.server.Server httpServer;
> [error] ^
>
> That's weird. I'd have to dig into the POM to see if this dependency
> for some reason would not be available at test time. But does this
> profile pass for anyone else?
>
> I'm also seeing test failures like the following. Yes, there's more,
> just seeing if anyone sees these?
>
> - event ordering *** FAILED ***
>   The code passed to failAfter did not complete within 10 seconds.
> (StreamingQueryListenerSuite.scala:411)
>
> - HDFSMetadataLog: metadata directory collision *** FAILED ***
>   The await method on Waiter timed out. (HDFSMetadataLogSuite.scala:201)
>
> - recovery *** FAILED ***
>   == Results ==
>   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
>   !struct<_1:int,_2:int>  struct<>
>   ![10,5]
>
>
>
> On Tue, Jan 8, 2019 at 1:14 PM Dongjoon Hyun 
> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 2.2.3.
> >
> > The vote is open until January 11 11:30AM (PST) and passes if a majority
> +1 PMC votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.2.3
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.2.3-rc1 (commit
> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
> > https://github.com/apache/spark/tree/v2.2.3-rc1
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1295
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
> >
> > The list of bug fixes going into 2.2.3 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.2.3?
> > ===
> >
> > The current list of open tickets targeted at 2.2.3 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.2.3
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Sean Owen
I wonder if anyone else is seeing the following issues, or whether
it's specific to my environment:

With -Phive-thriftserver, it compiles fine. However during tests, I get ...
[error] 
/home/ubuntu/spark-2.2.3/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java:64:
error: package org.eclipse.jetty.server does not exist
[error]   protected org.eclipse.jetty.server.Server httpServer;
[error] ^

That's weird. I'd have to dig into the POM to see if this dependency
for some reason would not be available at test time. But does this
profile pass for anyone else?

I'm also seeing test failures like the following. Yes, there are more;
I'm just checking whether anyone else sees these.

- event ordering *** FAILED ***
  The code passed to failAfter did not complete within 10 seconds.
(StreamingQueryListenerSuite.scala:411)

- HDFSMetadataLog: metadata directory collision *** FAILED ***
  The await method on Waiter timed out. (HDFSMetadataLogSuite.scala:201)

- recovery *** FAILED ***
  == Results ==
  !== Correct Answer - 1 ==   == Spark Answer - 0 ==
  !struct<_1:int,_2:int>  struct<>
  ![10,5]



On Tue, Jan 8, 2019 at 1:14 PM Dongjoon Hyun  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.2.3.
>
> The vote is open until January 11 11:30AM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.2.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.3-rc1 (commit 
> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
> https://github.com/apache/spark/tree/v2.2.3-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1295
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>
> The list of bug fixes going into 2.2.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343560
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.2.3?
> ===
>
> The current list of open tickets targeted at 2.2.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.2.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



DataSourceV2 community sync tonight

2019-01-09 Thread Ryan Blue
Hi everyone,

This is a quick reminder that there is a DSv2 community sync tonight at 5
PM PST. These community syncs are open to anyone that wants to participate.
If you’d like to be added to the invite, please send me a direct message.

The main topic for this sync is the catalog API. To make discussion easier,
I think we should separate the current discussion into a few orthogonal
areas and discuss each:

   - Catalog API plugin system
   - Plan for migration to a new catalog API
   - Catalog API design approach using separate interfaces (TableCatalog,
   UDFCatalog, FunctionCatalog, etc.)
   - TableCatalog API proposal and implementation, PR #21306
   (not the proposed user-facing API)

If we have time, we can also talk about the user-facing API
proposed in the SPIP.

Thanks,

rb
-- 
Ryan Blue
Software Engineer
Netflix
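
For readers who have not opened the linked documents, a rough hypothetical Scala sketch of the "separate interfaces" idea is shown below; all method signatures are invented for illustration and are not the TableCatalog proposal or PR #21306:

// Invented illustration of splitting catalog capabilities into separate interfaces.
trait CatalogPlugin {
  def name: String
  def initialize(options: Map[String, String]): Unit
}

trait Table // placeholder so the sketch is self-contained

// A catalog implements only the capabilities it supports.
trait TableCatalog extends CatalogPlugin {
  def loadTable(ident: Seq[String]): Table
  def createTable(ident: Seq[String], columns: Map[String, String]): Table
  def dropTable(ident: Seq[String]): Boolean
}

trait FunctionCatalog extends CatalogPlugin {
  def loadFunction(ident: Seq[String]): AnyRef
}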


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Denny Lee
+1


On Wed, Jan 9, 2019 at 4:30 AM Dongjoon Hyun 
wrote:

> +1
>
> Bests,
> Dongjoon.
>
> On Tue, Jan 8, 2019 at 6:30 PM Wenchen Fan  wrote:
>
>> +1
>>
>> On Wed, Jan 9, 2019 at 3:37 AM DB Tsai  wrote:
>>
>>> +1
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> --
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0x5CED8B896A6BDFA0
>>>
>>> On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.3.
>>> >
>>> > The vote is open until January 11 11:30AM (PST) and passes if a
>>> majority +1 PMC votes are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.2.3
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.2.3-rc1 (commit
>>> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>>> > https://github.com/apache/spark/tree/v2.2.3-rc1
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1295
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>>> >
>>> > The list of bug fixes going into 2.2.3 can be found at the following
>>> URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your project's resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with an out-of-date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.2.3?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.2.3 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.2.3
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>


Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-09 Thread Dongjoon Hyun
+1

Bests,
Dongjoon.

On Tue, Jan 8, 2019 at 6:30 PM Wenchen Fan  wrote:

> +1
>
> On Wed, Jan 9, 2019 at 3:37 AM DB Tsai  wrote:
>
>> +1
>>
>> Sincerely,
>>
>> DB Tsai
>> --
>> Web: https://www.dbtsai.com
>> PGP Key ID: 0x5CED8B896A6BDFA0
>>
>> On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun 
>> wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark
>> version 2.2.3.
>> >
>> > The vote is open until January 11 11:30AM (PST) and passes if a
>> majority +1 PMC votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.2.3
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.2.3-rc1 (commit
>> 4acb6ba37b94b90aac445e6546426145a5f9eba2):
>> > https://github.com/apache/spark/tree/v2.2.3-rc1
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1295
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.2.3-rc1-docs/
>> >
>> > The list of bug fixes going into 2.2.3 can be found at the following
>> URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12343560
>> >
>> > FAQ
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your project's resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to JIRA tickets still targeting 2.2.3?
>> > ===
>> >
>> > The current list of open tickets targeted at 2.2.3 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.2.3
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==
>> > But my bug isn't fixed?
>> > ==
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
Jörn, could you explain your proposal a bit more, please? We are not
modifying the existing decimal datatype; this is how it works now. If you
check the PR, the only difference is how we compute the result of the
division operation. The discussion about precision and scale is: shall
we limit them more than we do now? Currently we support any scale
<= precision and any precision in the range (1, 38].

On Wed, Jan 9, 2019 at 09:13 Jörn Franke  wrote:

> Maybe it is better to introduce a new datatype that supports negative
> scale, otherwise the migration and testing efforts for organizations
> running Spark application becomes too large. Of course the current decimal
> will be kept as it is.
>
> On Jan 7, 2019, at 15:08, Marco Gaido  wrote:
>
> In general we can say that some datasources allow them, others fail. At
> the moment, we are doing no casting before writing (so we can state so in
> the doc). But since there is ongoing discussion for DSv2, we can maybe add
> a flag/interface there for "negative scale intolerant" DS and try and cast
> before writing to them. What do you think about this?
>
> On Mon, Jan 7, 2019 at 15:03 Wenchen Fan  wrote:
>
>> AFAIK parquet spec says decimal scale can't be negative. If we want to
>> officially support negative-scale decimal, we should clearly define the
>> behavior when writing negative-scale decimals to parquet and other data
>> sources. The most straightforward way is to fail for this case, but maybe
>> we can do something better, like casting decimal(1, -20) to decimal(20, 0)
>> before writing.
>>
>> On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido 
>> wrote:
>>
>>> Hi Wenchen,
>>>
>>> thanks for your email. I agree adding doc for decimal type, but I am not
>>> sure what you mean speaking of the behavior when writing: we are not
>>> performing any automatic casting before writing; if we want to do that, we
>>> need a design about it I think.
>>>
>>> I am not sure if it makes sense to set a min for it. That would break
>>> backward compatibility (for very weird use case), so I wouldn't do that.
>>>
>>> Thanks,
>>> Marco
>>>
>>> On Mon, Jan 7, 2019 at 05:53 Wenchen Fan  wrote:
>>>
 I think we need to do this for backward compatibility, and according to
 the discussion in the doc, SQL standard allows negative scale.

 To do this, I think the PR should also include a doc for the decimal
 type, like the definition of precision and scale(this one
 
 looks pretty good), and the result type of decimal operations, and the
 behavior when writing out decimals(e.g. we can cast decimal(1, -20) to
 decimal(20, 0) before writing).

 Another question is, shall we set a min scale? e.g. shall we allow
 decimal(1, -1000)?

 On Thu, Oct 25, 2018 at 9:49 PM Marco Gaido 
 wrote:

> Hi all,
>
> a bit more than one month ago, I sent a proposal for handling properly
> decimals with negative scales in our operations. This is a long standing
> problem in our codebase as we derived our rules from Hive and SQLServer
> where negative scales are forbidden, while in Spark they are not.
>
> The discussion has been stale for a while now. No more comments on the
> design doc:
> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit#heading=h.x7062zmkubwm
> .
>
> So I am writing this e-mail in order to check whether there are more
> comments on it or we can go ahead with the PR.
>
> Thanks,
> Marco
>
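
A small, hedged Scala illustration of the cast-before-write idea from the quoted discussion; whether a given expression actually carries a negative scale depends on the Spark version, so the point is only the explicit cast to a non-negative-scale decimal before writing to a sink (such as Parquet) whose spec disallows negative scale:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder()
  .appName("decimal-scale-sketch")
  .master("local[*]")
  .getOrCreate()

// A computed decimal column; intermediate results of decimal arithmetic are
// where negative scales can show up internally.
val df = spark.sql("SELECT CAST(123 AS DECIMAL(3, 0)) * CAST(1e7 AS DECIMAL(10, 0)) AS x")

// The workaround discussed above: cast to an explicit non-negative scale
// (here decimal(38, 0)) before handing the data to the data source.
val safe = df.select(col("x").cast(DecimalType(38, 0)).alias("x"))
safe.write.mode("overwrite").parquet("/tmp/decimal_scale_sketch")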



Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Jörn Franke
Maybe it is better to introduce a new datatype that supports negative scale;
otherwise the migration and testing effort for organizations running Spark
applications becomes too large. Of course, the current decimal type will be kept as
it is.

> On Jan 7, 2019, at 15:08, Marco Gaido  wrote:
> 
> In general we can say that some datasources allow them, others fail. At the 
> moment, we are doing no casting before writing (so we can state so in the 
> doc). But since there is ongoing discussion for DSv2, we can maybe add a 
>> flag/interface there for "negative scale intolerant" DS and try and cast 
> before writing to them. What do you think about this?
> 
>> On Mon, Jan 7, 2019 at 15:03 Wenchen Fan  wrote:
>> AFAIK parquet spec says decimal scale can't be negative. If we want to 
>> officially support negative-scale decimal, we should clearly define the 
>> behavior when writing negative-scale decimals to parquet and other data 
>> sources. The most straightforward way is to fail for this case, but maybe we 
>> can do something better, like casting decimal(1, -20) to decimal(20, 0) 
>> before writing.
>> 
>>> On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido  wrote:
>>> Hi Wenchen,
>>> 
>>> thanks for your email. I agree adding doc for decimal type, but I am not 
>>> sure what you mean speaking of the behavior when writing: we are not 
>>> performing any automatic casting before writing; if we want to do that, we 
>>> need a design about it I think.
>>> 
>>> I am not sure if it makes sense to set a min for it. That would break 
>>> backward compatibility (for very weird use case), so I wouldn't do that.
>>> 
>>> Thanks,
>>> Marco
>>> 
 On Mon, Jan 7, 2019 at 05:53 Wenchen Fan  wrote:
 I think we need to do this for backward compatibility, and according to 
 the discussion in the doc, SQL standard allows negative scale.
 
 To do this, I think the PR should also include a doc for the decimal type, 
 like the definition of precision and scale(this one looks pretty good), 
 and the result type of decimal operations, and the behavior when writing 
 out decimals(e.g. we can cast decimal(1, -20) to decimal(20, 0) before 
 writing).
 
 Another question is, shall we set a min scale? e.g. shall we allow 
 decimal(1, -1000)?
 
> On Thu, Oct 25, 2018 at 9:49 PM Marco Gaido  
> wrote:
> Hi all,
> 
> a bit more than one month ago, I sent a proposal for handling properly 
> decimals with negative scales in our operations. This is a long standing 
> problem in our codebase as we derived our rules from Hive and SQLServer 
> where negative scales are forbidden, while in Spark they are not.
> 
> The discussion has been stale for a while now. No more comments on the 
> design doc: 
> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit#heading=h.x7062zmkubwm.
> 
> So I am writing this e-mail in order to check whether there are more 
> comments on it or we can go ahead with the PR.
> 
> Thanks,
> Marco


Re: [DISCUSS] Support decimals with negative scale in decimal operation

2019-01-09 Thread Marco Gaido
Oracle does the same: "The *scale* must be less than or equal to the
precision." (see
https://docs.oracle.com/javadb/10.6.2.1/ref/rrefsqlj15260.html).

On Wed, Jan 9, 2019 at 05:31 Wenchen Fan  wrote:

> Some more thoughts. If we support unlimited negative scale, why can't we
> support unlimited positive scale? e.g. 0.0001 can be decimal(1, 4) instead
> of (4, 4). I think we need more references here: how other databases deal
> with decimal type and parse decimal literals?
>
> On Mon, Jan 7, 2019 at 10:36 PM Wenchen Fan  wrote:
>
>> I'm OK with it, i.e. fail the write if there are negative-scale decimals
>> (we need to document it though). We can improve it later in data source v2.
>>
>> On Mon, Jan 7, 2019 at 10:09 PM Marco Gaido 
>> wrote:
>>
>>> In general we can say that some datasources allow them, others fail. At
>>> the moment, we are doing no casting before writing (so we can state so in
>>> the doc). But since there is ongoing discussion for DSv2, we can maybe add
>>> a flag/interface there for "negative scale intolerant" DS and try and cast
>>> before writing to them. What do you think about this?
>>>
>>> On Mon, Jan 7, 2019 at 15:03 Wenchen Fan  wrote:
>>>
 AFAIK parquet spec says decimal scale can't be negative. If we want to
 officially support negative-scale decimal, we should clearly define the
 behavior when writing negative-scale decimals to parquet and other data
 sources. The most straightforward way is to fail for this case, but maybe
 we can do something better, like casting decimal(1, -20) to decimal(20, 0)
 before writing.

 On Mon, Jan 7, 2019 at 9:32 PM Marco Gaido 
 wrote:

> Hi Wenchen,
>
> thanks for your email. I agree adding doc for decimal type, but I am
> not sure what you mean speaking of the behavior when writing: we are not
> performing any automatic casting before writing; if we want to do that, we
> need a design about it I think.
>
> I am not sure if it makes sense to set a min for it. That would break
> backward compatibility (for very weird use case), so I wouldn't do that.
>
> Thanks,
> Marco
>
> On Mon, Jan 7, 2019 at 05:53 Wenchen Fan <
> cloud0...@gmail.com> wrote:
>
>> I think we need to do this for backward compatibility, and according
>> to the discussion in the doc, SQL standard allows negative scale.
>>
>> To do this, I think the PR should also include a doc for the decimal
>> type, like the definition of precision and scale(this one
>> 
>> looks pretty good), and the result type of decimal operations, and the
>> behavior when writing out decimals(e.g. we can cast decimal(1, -20) to
>> decimal(20, 0) before writing).
>>
>> Another question is, shall we set a min scale? e.g. shall we allow
>> decimal(1, -1000)?
>>
>> On Thu, Oct 25, 2018 at 9:49 PM Marco Gaido 
>> wrote:
>>
>>> Hi all,
>>>
>>> a bit more than one month ago, I sent a proposal for handling
>>> properly decimals with negative scales in our operations. This is a long
>>> standing problem in our codebase as we derived our rules from Hive and
>>> SQLServer where negative scales are forbidden, while in Spark they are 
>>> not.
>>>
>>> The discussion has been stale for a while now. No more comments on
>>> the design doc:
>>> https://docs.google.com/document/d/17ScbMXJ83bO9lx8hB_jeJCSryhT9O_HDEcixDq0qmPk/edit#heading=h.x7062zmkubwm
>>> .
>>>
>>> So I am writing this e-mail in order to check whether there are more
>>> comments on it or we can go ahead with the PR.
>>>
>>> Thanks,
>>> Marco
>>>
>>