Re: Spark Packaging Jenkins
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp wrote:
> this may push into early next week... these builds were set up before my
> time, and i'm currently unraveling how they all work before pushing a
> commit to fix stuff.
>
> nothing like some code archaeology to make my friday more exciting! :)
>
> shane
>
> [...]

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
Re: Spark Packaging Jenkins
this may push into early next week... these builds were set up before my
time, and i'm currently unraveling how they all work before pushing a
commit to fix stuff.

nothing like some code archaeology to make my friday more exciting! :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun wrote:
> Thank you, Shane!
>
> Bests,
> Dongjoon.
>
> [...]

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
Re: Spark Packaging Jenkins
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp wrote:
> yeah, i'll get on that today. thanks for the heads up.
>
> [...]
Re: Spark Packaging Jenkins
yeah, i'll get on that today. thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun wrote:
> Hi, All
>
> As a part of the release process, we need to check Packaging/Compile/Test
> Jenkins status.
>
> [...]

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
Spark Packaging Jenkins
Hi, All

As a part of the release process, we need to check Packaging/Compile/Test
Jenkins status.

http://spark.apache.org/release-process.html

1. Spark Packaging:
   https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile:
   https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test:
   https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because they use GitHub
(https://github.com/apache/spark.git).
But (1) seems to be broken because it's looking for the old repo
(https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of
the new GitBox.

Can we fix this this week?

Bests,
Dongjoon.
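A fix along these lines usually amounts to swapping the repository URL in the affected Jenkins job's configuration. A minimal sketch of that swap, run here against a local stand-in for the job's config.xml (the real file lives on the Jenkins master, and the XML shape below is illustrative, not the actual job definition):

```shell
# Create a stand-in for the Jenkins job config that still points at the
# retired git-wip-us repository (illustrative XML, not the real job file).
cat > config.xml <<'EOF'
<scm><url>https://git-wip-us.apache.org/repos/asf/spark.git</url></scm>
EOF

# Swap the old host for GitBox everywhere in the file.
sed -i 's#git-wip-us.apache.org/repos/asf#gitbox.apache.org/repos/asf#g' config.xml

# Confirm the job now points at GitBox.
grep 'gitbox.apache.org' config.xml
```

The same edit can of course be made through the Jenkins job UI instead of on disk.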
Re: Spark History UI + Keycloak Integration
On Fri, Jan 4, 2019 at 3:25 AM G, Ajay (Nokia - IN/Bangalore) wrote:
...
> Added session handler for all context -
> contextHandler.setSessionHandler(new SessionHandler())
...
> Keycloak authentication seems to work. Is this the right approach? If it
> is fine I can submit a PR.

I don't remember many details about servlet session management, and whether
it can be enabled some other way, but that seems OK. I'd just make it a new
config, since otherwise Spark doesn't need the extra overhead.

--
Marcelo
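To illustrate the "make it a new config" suggestion: a minimal pure-Scala sketch of a gate that defaults to off, so existing deployments pay no session-tracking overhead. The key name `spark.ui.enableSessions` is hypothetical, and a real patch would register a proper ConfigEntry in Spark's config system rather than reading a raw map:

```scala
object UiSessionConfig {
  // Hypothetical config key; not an actual Spark setting.
  val SessionsEnabledKey = "spark.ui.enableSessions"

  // Defaults to false: only users who explicitly opt in would get a
  // Jetty SessionHandler attached to the UI contexts.
  def sessionsEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse(SessionsEnabledKey, "false").toBoolean
}
```

The Jetty wiring would then be conditional, along the lines of `if (UiSessionConfig.sessionsEnabled(conf)) contextHandler.setSessionHandler(new SessionHandler())`.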
Re: [DISCUSS] Handling correctness/data loss jiras
Committers,

When you merge tickets fixing correctness bugs, please make sure you tag the
tickets with the "correctness" label. I've found multiple tickets today that
didn't do that.

On Fri, Aug 17, 2018 at 7:11 AM, Tom Graves wrote:

> Since we haven't heard any objections to this, the documentation has been
> updated (thanks to Sean).
>
> All devs please make sure to re-read:
> http://spark.apache.org/contributing.html
>
> Note the set of labels used in Jira has been documented, and correctness or
> data loss issues should be marked as blocker by default. There is also a
> label to mark the jira as having something needing to go into the
> release-notes.
>
> Tom
>
> On Tuesday, August 14, 2018, 3:32:27 PM CDT, Imran Rashid wrote:
>
> +1 on what we should do.
>
> On Mon, Aug 13, 2018 at 3:06 PM, Tom Graves wrote:
>
>> > I mean, what are concrete steps beyond saying this is a problem? That's
>> > the important thing to discuss.
>>
>> Sorry, I'm a bit confused by your statement but also think I agree. I
>> started this thread for this reason. I pointed out that I thought it was
>> a problem and also brought up things I thought we could do to help fix it.
>>
>> Maybe I wasn't clear in the first email: the list of things I had were
>> proposals on what we do for a jira that is for a correctness/data loss
>> issue. It's the committers and developers that are involved in this
>> though, so if people don't agree or aren't going to do them, then it
>> doesn't work.
>>
>> Just to restate what I think we should do:
>>
>> - label any correctness/data loss jira with "correctness"
>> - jira should be marked as a blocker by default if someone suspects a
>>   corruption/loss issue
>> - make sure the description is clear about when it occurs and the impact
>>   to the user
>> - ensure it's backported to all active branches
>> - see if we can have a separate section in the release notes for these
>>
>> The last one I guess is more a one-time thing that I can file a jira for.
>> The first four would be done for each jira filed.
>>
>> I'm proposing we do these things, and as such, if people agree, we would
>> also document those things in the committers or developers guide and send
>> email to the list.
>>
>> Tom
>>
>> On Monday, August 13, 2018, 11:17:22 AM CDT, Sean Owen wrote:
>>
>> Generally: if someone thinks correctness fix X should be backported
>> further, I'd say just do it, if it's to an active release branch (see
>> below). Anything that important has to outweigh most any other concern,
>> like behavior changes.
>>
>> On Mon, Aug 13, 2018 at 11:08 AM Tom Graves wrote:
>>
>>> I'm not really sure what you mean by this; this proposal is to introduce
>>> a process for this type of issue so it's at least brought to people's
>>> attention. We can't do anything to make people work on certain things.
>>> If they aren't raised as important issues then it's really easy to miss
>>> these things. If it's a blocker we should also not be doing any new
>>> releases without a fix for it, which may motivate people to look at it.
>>
>> I mean, what are concrete steps beyond saying this is a problem? That's
>> the important thing to discuss.
>>
>> There's a good one here: let's say anything that's likely to be a
>> correctness or data loss issue should automatically be labeled
>> 'correctness' as such and set to Blocker.
>>
>> That can go into the how-to-contribute manual in the docs and in a note
>> to dev@.
>>
>>> I agree it would be good for us to make it more official about which
>>> branches are being maintained. I think at this point it's still 2.1.x,
>>> 2.2.x, and 2.3.x since we recently did releases of all of these. Since
>>> 2.4 will be coming out we should definitely think about stopping
>>> maintenance of 2.1.x. Perhaps we need a table on our release page about
>>> this. But this should be a separate thread.
>>
>> I propose writing something like this in the 'versioning' doc page, to at
>> least establish a policy:
>>
>> Minor release branches will, generally, be maintained with bug fix
>> releases for a period of 18 months. For example, branch 2.1.x is no
>> longer considered maintained as of July 2018, 18 months after the release
>> of 2.1.0 in December 2016.
>>
>> This gives us -- and more importantly users -- some understanding of what
>> to expect for backporting and fixes.
>>
>> I am going to revive the thread about adding PMC / committers as it's
>> overdue. That may not do much, but, more
Re: Remove non-Tungsten mode in Spark 3?
OK, maybe leave in tungsten for 3.0.

I did a quick check, and removing StaticMemoryManager saves a few hundred
lines. It's used in MemoryStore tests internally though, and not a trivial
change to remove it. It's also used directly in HashedRelation. It could
still be worth removing it as a user-facing option to reduce confusion about
memory tuning, but it wouldn't take out much code. What do you all think?

On Thu, Jan 3, 2019 at 9:41 PM Reynold Xin wrote:
> The issue with the offheap mode is it is a pretty big behavior change and
> does require additional setup (also for users that run with UDFs that
> allocate a lot of heap memory, it might not be as good).
>
> I can see us removing the legacy mode since it's been legacy for a long
> time and perhaps very few users need it. How much code does it remove
> though?
>
> On Thu, Jan 03, 2019 at 2:55 PM, Sean Owen wrote:
>
>> Just wondering if there is a good reason to keep around the pre-tungsten
>> on-heap memory mode for Spark 3, and make spark.memory.offHeap.enabled
>> always true? It would simplify the code somewhat, but I don't feel I'm so
>> aware of the tradeoffs.
>>
>> I know we didn't deprecate it, but it's been off by default for a long
>> time. It could be deprecated, too.
>>
>> Same question for spark.memory.useLegacyMode and all its various
>> associated settings? Seems like these should go away at some point, and
>> Spark 3 is a good point. Same issue about deprecation though.
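For concreteness, these are the settings under discussion as they would appear in spark-defaults.conf. The boolean values shown are the documented defaults; the off-heap size is an illustrative value, since off-heap mode requires an explicit size when enabled:

```
# Legacy (pre-Tungsten) memory manager, off by default since Spark 1.6;
# the removal candidate, along with its associated fraction knobs.
spark.memory.useLegacyMode       false
spark.shuffle.memoryFraction     0.2
spark.storage.memoryFraction     0.6

# Tungsten off-heap mode, also off by default.
spark.memory.offHeap.enabled     false
spark.memory.offHeap.size        2g
```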
Logging exception of custom spark app
Hi everyone,

I have an application that uses Spark to perform some computation. It can be
used both in a spark-shell or in a spark-submit. I want to log all exceptions
thrown by my code inside a file in order to have some detailed info when a
user has an error. I tried this:

  Thread.currentThread().setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
    def uncaughtException(t: Thread, e: Throwable): Unit = {
      logger.error("exception logged")
      println("exception logged")
    }
  })

but it is not working. I saw that Spark already sets an
uncaughtExceptionHandler, so probably this code is not effective. The other
option would be to try-catch all public methods of my API, log the exception
when it happens and then rethrow it. But I think this is not optimal. Do you
have any suggestions?

Alessandro Liparoti
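Since Spark installs its own uncaught-exception handler on its internal threads, the try-catch route may be less painful than it sounds if the boilerplate is factored into a single helper, so each public method needs only one wrapping call. A minimal sketch, where `log` stands in for whatever logging framework writes to your file:

```scala
object ApiErrorLogging {
  // Stand-in for a real file-backed logger (e.g. slf4j/log4j).
  private def log(msg: String): Unit = Console.err.println(msg)

  // Run `body`, logging any exception before rethrowing it unchanged,
  // so callers still observe the original failure.
  def loggingErrors[T](method: String)(body: => T): T =
    try body
    catch {
      case e: Throwable =>
        log(s"exception in $method: ${e.getMessage}")
        throw e
    }
}
```

A public API method then becomes, for example, `def compute(): Int = ApiErrorLogging.loggingErrors("compute") { doCompute() }`, keeping the logging concern out of the method bodies themselves.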
Spark History UI + Keycloak Integration
Hello,

We were trying to enable spark-history UI authentication through Keycloak.
From the Spark documentation we found out that we can use javax filters to
enable the UI authentication. Keycloak already provides a java
keycloak-servlet-filter-adapter which can be used. I have added the
following configuration in spark-defaults.conf:

  spark.ui.filters org.keycloak.adapters.servlet.KeycloakOIDCFilter
  spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file /home/ag/spark-2.4.0-bin-2.7.3/conf/keycloak.json

I was facing the below issue while running:

  java.lang.IllegalStateException: No SessionManager
    at org.spark_project.jetty.server.Request.getSession(Request.java:1544)
    at org.keycloak.adapters.servlet.FilterSessionStore.saveRequest(FilterSessionStore.java:374)

This was because none of the ServletContexts in spark-history has session
management. I have made the below changes:

1. Added a session id manager in JettyUtils.scala:
   server.setSessionIdManager(new HashSessionIdManager())
2. Added a session handler for all contexts -
   contextHandler.setSessionHandler(new SessionHandler()) - in:
   * JettyUtils.scala - at createServletHandler, createStaticHandler and
     createProxyHandler
   * HistoryServer.scala - at initialize for the /history context
   * ApiRootResource.scala - at getServletHandler for the /api context
3. Placed the required Keycloak runtime jars in the Spark class-path.

Keycloak authentication seems to work. Is this the right approach? If it is
fine I can submit a PR.

@Vanzin I saw that you have done some refactoring in the Spark UI code in
https://github.com/apache/spark/pull/23302 - can you please suggest some
inputs?

Thanks and Regards,
Ajay G