Re: 'spark-master-docs' job missing in Jenkins

2020-02-25 Thread Hyukjin Kwon
Hm, I believe we should still run this. The PR builders do not run the doc
build (more specifically, `cd docs && jekyll build`).

Fortunately, the Javadoc, Scaladoc, SparkR documentation and PySpark API
documentation are tested in the PR builder.
However, the MD files themselves under `docs` and the SQL built-in function
documentation (https://spark.apache.org/docs/latest/api/sql/index.html) are
no longer tested, if I am not mistaken. I believe spark-master-docs was the
only job that tested them.

Would it be difficult to re-enable it?

On Wed, Feb 26, 2020 at 12:37 PM shane knapp ☠ wrote:

> it's been gone for quite a long time.  these docs were being built but not
> published.
>
> relevant discussion:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-moving-the-spark-jenkins-job-builder-repo-from-dbricks-spark-tp25325p26222.html
>
> shane
>
> On Tue, Feb 25, 2020 at 6:18 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I just noticed that we apparently don't build the documentation in Jenkins
>> anymore.
>> I remember we had this job:
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs
>> Does anybody know what happened to this job?
>>
>> Thanks.
>>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-25 Thread Xiao Li
+1

Xiao

On Mon, Feb 24, 2020 at 3:03 PM Michael Armbrust  wrote:

> Hello Everyone,
>
> As more users have started upgrading to Spark 3.0 preview (including
> myself), there have been many discussions around APIs that have been broken
> compared with Spark 2.x. In many of these discussions, one of the
> rationales for breaking an API seems to be "Spark follows semantic
> versioning, so this major release is our chance to get it right [by
> breaking APIs]". Similarly,
> in many cases the response to questions about why an API was completely
> removed has been, "this API has been deprecated since x.x, so we have to
> remove it".
>
> As a long-time contributor to and user of Spark, I find this
> interpretation of the policy concerning. This reasoning misses the
> intention of the original policy, and I am worried that it will hurt the
> long-term success of the project.
>
> I definitely understand that these are hard decisions, and I'm not
> proposing that we never remove anything from Spark. However, I would like
> to give some additional context and also propose a different rubric for
> thinking about API breakage moving forward.
>
> Spark adopted semantic versioning back in 2014 during the preparations for
> the 1.0 release. As this was the first major release -- and as, up until
> fairly recently, Spark had only been an academic project -- no real
> promises about API stability had ever been made.
>
> During the discussion, some committers suggested that this was an
> opportunity to clean up cruft and give the Spark APIs a once-over, making
> cosmetic changes to improve consistency. However, in the end, it was
> decided that in many cases it was not in the best interests of the Spark
> community to break things just because we could. Matei actually said it
> pretty forcefully:
>
> I know that some names are suboptimal, but I absolutely detest breaking
> APIs, config names, etc. I’ve seen it happen way too often in other
> projects (even things we depend on that are officially post-1.0, like Akka
> or Protobuf or Hadoop), and it’s very painful. I think that we as fairly
> cutting-edge users are okay with libraries occasionally changing, but many
> others will consider it a show-stopper. Given this, I think that any
> cosmetic change now, even though it might improve clarity slightly, is not
> worth the tradeoff in terms of creating an update barrier for existing
> users.
>
> In the end, while some changes were made, most APIs remained the same and
> users of Spark <= 0.9 were pretty easily able to upgrade to 1.0. I think
> this served the project very well, as compatibility means users are able to
> upgrade and we keep as many people on the latest versions of Spark (though
> maybe not the latest APIs of Spark) as possible.
>
> As Spark grows, I think compatibility actually becomes more important and
> we should be more conservative rather than less. Today, there are very
> likely more Spark programs running than there were at any other time in the
> past. Spark is no longer a tool used only by advanced hackers; it is now
> also running "traditional enterprise workloads." In many cases these jobs
> are powering important processes long after the original author leaves.
>
> Broken APIs can also affect libraries that extend Spark. This can be even
> harder for users: if a library they depend on has not been upgraded to use
> the new APIs, they are stuck.
>
> Given all of this, I'd like to propose the following rubric as an addition
> to our semantic versioning policy. After discussion and if people agree
> this is a good idea, I'll call a vote of the PMC to ratify its inclusion in
> the official policy.
>
> Considerations When Breaking APIs
>
> The Spark project strives to avoid breaking APIs or silently changing
> behavior, even at major versions. While this is not always possible, the
> balance of the following factors should be considered before choosing to
> break an API.
>
> Cost of Breaking an API
>
> Breaking an API almost always has a non-trivial cost to the users of
> Spark. A broken API means that Spark programs need to be rewritten before
> they can be upgraded. However, there are a few considerations when thinking
> about what the cost will be:
>
>    - Usage - an API that is actively used in many different places is
>      always very costly to break. While it is hard to know usage for sure,
>      there are a number of ways we can estimate it:
>       - How long has the API been in Spark?
>       - Is the API common even for basic programs?
>       - How often do we see recent questions in JIRA or on the mailing lists?
>       - How often does it appear in StackOverflow or blogs?
>    - Behavior after the break - How will a program that works today work
>      after the break?

Re: 'spark-master-docs' job missing in Jenkins

2020-02-25 Thread shane knapp ☠
it's been gone for quite a long time.  these docs were being built but not
published.

relevant discussion:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-moving-the-spark-jenkins-job-builder-repo-from-dbricks-spark-tp25325p26222.html

shane

On Tue, Feb 25, 2020 at 6:18 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I just noticed that we apparently don't build the documentation in Jenkins
> anymore.
> I remember we had this job:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs
> Does anybody know what happened to this job?
>
> Thanks.
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


'spark-master-docs' job missing in Jenkins

2020-02-25 Thread Hyukjin Kwon
Hi all,

I just noticed that we apparently don't build the documentation in Jenkins
anymore.
I remember we had this job:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs
Does anybody know what happened to this job?

Thanks.


Re: What options do I have to handle third party classes that are not serializable?

2020-02-25 Thread Jeff Evans
Did you try this?  https://stackoverflow.com/a/2114387/375670
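
In Spark specifically, a common workaround is to keep the non-serializable
object out of anything that crosses the driver/executor boundary: build it
lazily on the executors (for example behind a @transient lazy val in a
serializable wrapper, or once per partition) and return only serializable
values from tasks rather than the FacetsConfig itself. Below is a minimal
sketch of both variants; the FacetsConfigHolder/NonSerializableDemo names and
the sample data are made up for illustration, not taken from your job.

import org.apache.lucene.facet.FacetsConfig
import org.apache.spark.sql.SparkSession

// Serializable wrapper: the wrapper travels with the task closure, but the
// FacetsConfig field is @transient, so it is skipped during serialization
// and lazily re-created on each executor the first time it is used.
class FacetsConfigHolder extends Serializable {
  @transient lazy val config: FacetsConfig = new FacetsConfig()
}

object NonSerializableDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lucene-non-serializable-demo")
      .master("local[*]")
      .getOrCreate()

    val docs = spark.sparkContext.parallelize(Seq("doc1", "doc2", "doc3"))
    val holder = new FacetsConfigHolder

    // Option 1: reference the wrapper inside the task; only serializable
    // results (plain strings here) come back to the driver.
    val viaHolder =
      docs.map(d => s"$d indexed with ${holder.config.getClass.getSimpleName}")

    // Option 2: build the non-serializable object once per partition, keep
    // it entirely executor-side, and again return only serializable values.
    val viaPartitions = docs.mapPartitions { iter =>
      val config = new FacetsConfig()  // created on the executor, never shipped
      iter.map(d => s"$d indexed with ${config.getClass.getSimpleName}")
    }

    viaHolder.collect().foreach(println)
    viaPartitions.collect().foreach(println)
    spark.stop()
  }
}

If the object really does have to travel (for example as part of a task
result), registering a custom Kryo serializer for it is another option, but
extracting just the serializable pieces you need is usually simpler.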


On Tue, Feb 25, 2020 at 10:23 AM yeikel valdes  wrote:

> I am currently using a third party library (Lucene) with Spark that is not
> serializable. Because of that, it generates the following exception:
>
> Job aborted due to stage failure: Task 144.0 in stage 25.0 (TID 2122) had a 
> not serializable result: org.apache.lucene.facet.FacetsConfig Serialization 
> stack: - object not serializable (class: 
> org.apache.lucene.facet.FacetsConfig, value: 
> org.apache.lucene.facet.FacetsConfg
>
> While it would be ideal if this class was serializable, there is really 
> nothing I can do to change this third party library in order to add 
> serialization to it.
>
> What options do I have, and what's the recommended option to handle this 
> problem?
>
> Thank you!
>
>
>


What options do I have to handle third party classes that are not serializable?

2020-02-25 Thread yeikel valdes
I am currently using a third party library (Lucene) with Spark that is not
serializable. Because of that, it generates the following exception:


Job aborted due to stage failure: Task 144.0 in stage 25.0 (TID 2122) had a not 
serializable result: org.apache.lucene.facet.FacetsConfig Serialization stack: 
- object not serializable (class: org.apache.lucene.facet.FacetsConfig, value: 
org.apache.lucene.facet.FacetsConfg
While it would be ideal if this class was serializable, there is really nothing 
I can do to change this third party library in order to add serialization to it.
What options do I have, and what's the recommended option to handle this 
problem?
Thank you!