Re: Spark Packaging Jenkins

2019-01-04 Thread shane knapp
https://issues.apache.org/jira/browse/SPARK-26537



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark Packaging Jenkins

2019-01-04 Thread shane knapp
this may push into early next week...  these builds were set up before my
time, and i'm currently unraveling how they all work before pushing a
commit to fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane




Re: Spark Packaging Jenkins

2019-01-04 Thread Dongjoon Hyun
Thank you, Shane!

Bests,
Dongjoon.



Re: Spark Packaging Jenkins

2019-01-04 Thread shane knapp
yeah, i'll get on that today.  thanks for the heads up.




Spark Packaging Jenkins

2019-01-04 Thread Dongjoon Hyun
Hi, All

As part of the release process, we need to check the Packaging/Compile/Test
Jenkins status.

http://spark.apache.org/release-process.html

1. Spark Packaging:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because they use GitHub (
https://github.com/apache/spark.git).
But, (1) seems to be broken because it's looking for the old repo (
https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of the
new GitBox repository.

Can we fix this this week?

Bests,
Dongjoon.


Re: Spark History UI + Keycloak Integration

2019-01-04 Thread Marcelo Vanzin
On Fri, Jan 4, 2019 at 3:25 AM G, Ajay (Nokia - IN/Bangalore)
 wrote:
...
> Added session handler for all context -   
> contextHandler.setSessionHandler(new SessionHandler())
...
> Keycloak authentication seems to work. Is this the right approach? If it is
> fine, I can submit a PR.

I don't remember many details about servlet session management, and
whether it can be enabled some other way, but that seems ok. I'd just
make it a new config, since otherwise Spark doesn't need the extra
overhead.

-- 
Marcelo




Re: [DISCUSS] Handling correctness/data loss jiras

2019-01-04 Thread Reynold Xin
Committers,

When you merge tickets fixing correctness bugs, please make sure you tag the
tickets with the "correctness" label. I've found multiple tickets today that
didn't have it.

On Fri, Aug 17, 2018 at 7:11 AM, Tom Graves < tgraves...@yahoo.com.invalid > 
wrote:

> 
> Since we haven't heard any objections to this, the documentation has been
> updated (Thanks to Sean).
> 
> 
> All devs please make sure to re-read:
> http://spark.apache.org/contributing.html
> 
> 
> Note the set of labels used in Jira has been documented and correctness or
> data loss issues should be marked as blocker by default.  There is also a
> label to mark the jira as having something needing to go into the
> release-notes.
> 
> 
> 
> 
> Tom
> 
> 
> On Tuesday, August 14, 2018, 3:32:27 PM CDT, Imran Rashid
> <irashid@cloudera.com.INVALID> wrote:
> 
> 
> 
> 
> +1 on what we should do.
> 
> 
> On Mon, Aug 13, 2018 at 3:06 PM, Tom Graves
> <tgraves_cs@yahoo.com.invalid> wrote:
> 
>> 
>> 
>> 
>> > I mean, what are concrete steps beyond saying this is a problem? That's
>> the important thing to discuss.
>> 
>> 
>> Sorry I'm a bit confused by your statement but also think I agree.  I
>> started this thread for this reason. I pointed out that I thought it was a
>> problem and also brought up things I thought we could do to help fix it.  
>> 
>> 
>> 
>> Maybe I wasn't clear in the first email, the list of things I had were
>> proposals on what we do for a jira that is for a correctness/data loss
>> issue. Its the committers and developers that are involved in this though
>> so if people don't agree or aren't going to do them, then it doesn't work.
>> 
>> 
>> 
>> Just to restate what I think we should do:
>> 
>> 
>> - label any correctness/data loss jira with "correctness"
>> - jira should be marked as a blocker by default if someone suspects a
>> corruption/loss issue
>> - Make sure the description is clear about when it occurs and impact to
>> the user.   
>> - ensure its back ported to all active branches
>> - See if we can have a separate section in the release notes for these
>> 
>> 
>> The last one I guess is more a one time thing that i can file a jira for. 
>> The first 4 would be done for each jira filed.
>> 
>> 
>> I'm proposing we do these things and as such if people agree we would also
>> document those things in the committers or developers guide and send email
>> to the list. 
>> 
>> 
>>  
>> 
>> 
>> Tom
>> On Monday, August 13, 2018, 11:17:22 AM CDT, Sean Owen
>> <srowen@apache.org> wrote:
>> 
>> 
>> 
>> 
>> Generally: if someone thinks correctness fix X should be backported
>> further, I'd say just do it, if it's to an active release branch (see
>> below). Anything that important has to outweigh most any other concern,
>> like behavior changes.
>> 
>> 
>> On Mon, Aug 13, 2018 at 11:08 AM Tom Graves <tgraves_cs@yahoo.com> wrote:
>> 
>>> I'm not really sure what you mean by this, this proposal is to introduce a
>>> process for this type of issue so its at least brought to peoples
>>> attention. We can't do anything to make people work on certain things.  If
>>> they aren't raised as important issues then its really easy to miss these
>>> things.  If its a blocker we should also not be doing any new releases
>>> without a fix for it which may motivate people to look at it.
>>> 
>> 
>> 
>> 
>> I mean, what are concrete steps beyond saying this is a problem? That's
>> the important thing to discuss.
>> 
>> 
>> There's a good one here: let's say anything that's likely to be a
>> correctness or data loss issue should automatically be labeled
>> 'correctness' as such and set to Blocker. 
>> 
>> 
>> That can go into the how-to-contribute manual in the docs and in a note to
>> dev@.
>>  
>>  
>> 
>>> 
>>> I agree it would be good for us to make it more official about which
>>> branches are being maintained.  I think at this point its still 2.1.x,
>>> 2.2.x, and 2.3.x since we recently did releases of all of these.  Since
>>> 2.4 will be coming out we should definitely think about stop maintaining
>>> 2.1.x.  Perhaps we need a table on our release page about this.  But this
>>> should be a separate thread.
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> I propose writing something like this in the 'versioning' doc page, to at
>> least establish a policy:
>> 
>> 
>> Minor release branches will, generally, be maintained with bug-fix
>> releases for a period of 18 months. For example, branch 2.1.x is no longer
>> considered maintained as of July 2018, 18 months after the release of
>> 2.1.0 in December 2016.
>> 
>> 
>> This gives us -- and more importantly users -- some understanding of what
>> to expect for backporting and fixes.
>> 
>> 
>> 
>> 
>> I am going to revive the thread about adding PMC / committers as it's
>> overdue. That may not do much, but, more 

Re: Remove non-Tungsten mode in Spark 3?

2019-01-04 Thread Sean Owen
OK, maybe leave Tungsten in for 3.0.
I did a quick check, and removing StaticMemoryManager saves a few hundred
lines. It's used in MemoryStore tests internally though, and not a trivial
change to remove it. It's also used directly in HashedRelation. It could
still be worth removing it as a user-facing option to reduce confusion
about memory tuning, but it wouldn't take out much code. What do you all
think?
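
For context, the settings under discussion are set in spark-defaults.conf; a sketch with illustrative values (the 2g size is just an example):

    # Tungsten off-heap mode, off unless explicitly enabled
    spark.memory.offHeap.enabled   true
    spark.memory.offHeap.size      2g

    # Pre-Tungsten legacy mode, backed by StaticMemoryManager (false by default)
    spark.memory.useLegacyMode     false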

On Thu, Jan 3, 2019 at 9:41 PM Reynold Xin  wrote:

> The issue with the offheap mode is it is a pretty big behavior change and
> does require additional setup (also for users that run with UDFs that
> allocate a lot of heap memory, it might not be as good).
>
> I can see us removing the legacy mode since it's been legacy for a long
> time and perhaps very few users need it. How much code does it remove
> though?
>
>
> On Thu, Jan 03, 2019 at 2:55 PM, Sean Owen  wrote:
>
>> Just wondering if there is a good reason to keep around the pre-tungsten
>> on-heap memory mode for Spark 3, and make spark.memory.offHeap.enabled
>> always true? It would simplify the code somewhat, but I don't feel I'm so
>> aware of the tradeoffs.
>>
>> I know we didn't deprecate it, but it's been off by default for a long
>> time. It could be deprecated, too.
>>
>> Same question for spark.memory.useLegacyMode and all its various
>> associated settings? Seems like these should go away at some point, and
>> Spark 3 is a good point. Same issue about deprecation though.
>>
>>
>
>


Logging exception of custom spark app

2019-01-04 Thread Alessandro Liparoti
Hi everyone,

I have an application that uses Spark to perform some computation. It can
be used both in a spark-shell and in a spark-submit. I want to log all
exceptions thrown by my code to a file, in order to have some detailed
info when a user hits an error.

I tried with this

Thread.currentThread().setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
  def uncaughtException(t: Thread, e: Throwable): Unit = {
    logger.error("exception logged")
    println("exception logged")
  }
})

but it is not working. I saw that Spark already sets its own
uncaughtExceptionHandler, so this code is probably never invoked.
The other option would be to try-catch all public methods of my API, log the
exception when it happens and then rethrow it. But I think this is not
optimal.
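A middle ground (a minimal sketch, not tested against Spark; `logError` is a stand-in for whatever logging framework the application actually uses) is a single wrapper that logs and rethrows, applied once per public entry point instead of hand-written try-catch blocks everywhere:

```scala
object ErrorLogging {
  // Stand-in for a real file-backed logger (e.g. slf4j), kept in memory
  // so this sketch is self-contained.
  private val logged = scala.collection.mutable.ArrayBuffer.empty[String]
  def logError(msg: String): Unit = logged += msg
  def loggedMessages: Seq[String] = logged.toSeq

  // Run `body`; on failure, log the exception and rethrow it
  // so callers still see the original error.
  def withLogging[T](opName: String)(body: => T): T =
    try body
    catch {
      case e: Throwable =>
        logError(s"$opName failed: ${e.getClass.getName}: ${e.getMessage}")
        throw e
    }
}
```

Each public method then becomes `def compute() = ErrorLogging.withLogging("compute") { ... }`, which keeps the logging in one place and leaves Spark's own uncaughtExceptionHandler untouched.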

Do you have any suggestion?
*Alessandro Liparoti*


Spark History UI + Keycloak Integration

2019-01-04 Thread G, Ajay (Nokia - IN/Bangalore)
Hello,

We were trying to enable Spark history UI authentication through Keycloak.
From the Spark documentation we found out that we can use javax servlet
filters to enable UI authentication. Keycloak already provides a
keycloak-servlet-filter-adapter which can be used.

I have added the following configuration in spark-defaults.conf

spark.ui.filters org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file
 /home/ag/spark-2.4.0-bin-2.7.3/conf/keycloak.json

I was facing the following issue at runtime:

java.lang.IllegalStateException: No SessionManager
   at 
org.spark_project.jetty.server.Request.getSession(Request.java:1544)
   at 
org.keycloak.adapters.servlet.FilterSessionStore.saveRequest(FilterSessionStore.java:374)


This was because none of the ServletContexts in spark-history has session
management. I have made the below changes:

  1.  Added a session ID manager in JettyUtils.scala:
       server.setSessionIdManager(new HashSessionIdManager())

  2.  Added a session handler for all contexts:
       contextHandler.setSessionHandler(new SessionHandler())

      in

      *   JettyUtils.scala - at createServletHandler, createStaticHandler and
          createProxyHandler
      *   HistoryServer.scala - at initialize for the /history context
      *   ApiRootResource.scala - at getServletHandler for the /api context

  3.  Placed the required Keycloak runtime jars in the Spark class-path.


Keycloak authentication seems to work. Is this the right approach? If it is
fine, I can submit a PR.
@Vanzin, I saw that you have done some refactoring in the Spark UI code in
https://github.com/apache/spark/pull/23302 - can you please suggest some inputs?

Thanks and Regards,
Ajay G