Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Reynold Xin
I used to go through every single doc page before each release, during the RC 
phase, to make sure we don’t unintentionally leave things public. Unfortunately 
I no longer have time to do that. I found it very useful because I always 
caught some mistakes that had crept in through organic development.

> On Nov 13, 2018, at 8:00 PM, Wenchen Fan wrote:
> 
> > Could you clarify what you mean here? MiMa has some known limitations such 
> > as not handling "private[blah]" very well
> 
> Yes that's what I mean.
> 
> What I want to know here is which classes/methods we expect to be private. 
> I think things marked as "private[blabla]" are expected to be private for 
> sure; it's just that MiMa and the doc generator can't handle them well. We 
> can fix that later, probably by using the @Private annotation.
> 
> > seems like it's tracked by a bunch of exclusions in the Unidoc object
> 
> That's good. At least we have a clear definition of which packages are 
> meant to be private. We should make it consistent between MiMa and the doc 
> generator though.
> 
>> On Wed, Nov 14, 2018 at 10:41 AM Marcelo Vanzin wrote:
>> On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote:
>> > Recently I updated the MiMa exclusion rules, and found MiMa tracks some 
>> > private classes/methods unexpectedly.
>> 
>> Could you clarify what you mean here? MiMa has some known limitations
>> such as not handling "private[blah]" very well (because that means
>> public in Java). Spark has (had?) a tool to generate an exclusions
>> file for MiMa, but I'm not sure how up-to-date it is.
>> 
>> > AFAIK, we have several rules:
>> > 1. everything which is really private that end users can't access, e.g. 
>> > package private classes, private methods, etc.
>> > 2. classes under certain packages. I don't know if we have a list, the 
>> > catalyst package is considered as a private package.
>> > 3. everything which has a @Private annotation.
>> 
>> That's my understanding of the scope of the rules.
>> 
>> (2) to me means "things that show up in the public API docs". That's,
>> AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
>> of exclusions in the Unidoc object (I remember that being different in
>> the past).
>> 
>> (3) might be a limitation of the doc generation tool? Not sure if it's
>> easy to say "do not document classes that have @Private". At the very
>> least, that annotation seems to be missing the "@Documented"
>> annotation, which would make that info present in the javadoc. I do
>> not know if the scala doc tool handles that.
>> 
>> -- 
>> Marcelo


Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Wenchen Fan
> Could you clarify what you mean here? MiMa has some known limitations
> such as not handling "private[blah]" very well

Yes that's what I mean.

What I want to know here is which classes/methods we expect to be private.
I think things marked as "private[blabla]" are expected to be private for
sure; it's just that MiMa and the doc generator can't handle them well. We
can fix that later, probably by using the @Private annotation.

> seems like it's tracked by a bunch of exclusions in the Unidoc object

That's good. At least we have a clear definition of which packages are
meant to be private. We should make it consistent between MiMa and the doc
generator though.

On Wed, Nov 14, 2018 at 10:41 AM Marcelo Vanzin wrote:

> On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote:
> > Recently I updated the MiMa exclusion rules, and found MiMa tracks some
> > private classes/methods unexpectedly.
>
> Could you clarify what you mean here? MiMa has some known limitations
> such as not handling "private[blah]" very well (because that means
> public in Java). Spark has (had?) a tool to generate an exclusions
> file for MiMa, but I'm not sure how up-to-date it is.
>
> > AFAIK, we have several rules:
> > 1. everything which is really private that end users can't access, e.g.
> > package private classes, private methods, etc.
> > 2. classes under certain packages. I don't know if we have a list, the
> > catalyst package is considered as a private package.
> > 3. everything which has a @Private annotation.
>
> That's my understanding of the scope of the rules.
>
> (2) to me means "things that show up in the public API docs". That's,
> AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
> of exclusions in the Unidoc object (I remember that being different in
> the past).
>
> (3) might be a limitation of the doc generation tool? Not sure if it's
> easy to say "do not document classes that have @Private". At the very
> least, that annotation seems to be missing the "@Documented"
> annotation, which would make that info present in the javadoc. I do
> not know if the scala doc tool handles that.
>
> --
> Marcelo
>


Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Marcelo Vanzin
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote:
> Recently I updated the MiMa exclusion rules, and found MiMa tracks some 
> private classes/methods unexpectedly.

Could you clarify what you mean here? MiMa has some known limitations
such as not handling "private[blah]" very well (because that means
public in Java). Spark has (had?) a tool to generate an exclusions
file for MiMa, but I'm not sure how up-to-date it is.

> AFAIK, we have several rules:
> 1. everything which is really private that end users can't access, e.g. 
> package private classes, private methods, etc.
> 2. classes under certain packages. I don't know if we have a list, the 
> catalyst package is considered as a private package.
> 3. everything which has a @Private annotation.

That's my understanding of the scope of the rules.

(2) to me means "things that show up in the public API docs". That's,
AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
of exclusions in the Unidoc object (I remember that being different in
the past).

(3) might be a limitation of the doc generation tool? Not sure if it's
easy to say "do not document classes that have @Private". At the very
least, that annotation seems to be missing the "@Documented"
annotation, which would make that info present in the javadoc. I do
not know if the scala doc tool handles that.
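
To make the "@Documented" point concrete, here is a minimal Java sketch.
PrivateApi is a hypothetical stand-in for Spark's @Private annotation, not
its actual definition, and InternalHelper is an invented class. The sketch
only shows how "@Documented" and a runtime retention are declared, and how a
tool could detect the marker through reflection. It says nothing about
whether the Scaladoc tool honors the annotation.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class DocumentedDemo {
    // True when the given class carries the (hypothetical) marker annotation.
    static boolean isMarkedPrivate(Class<?> cls) {
        return cls.isAnnotationPresent(PrivateApi.class);
    }

    public static void main(String[] args) {
        System.out.println(isMarkedPrivate(InternalHelper.class));
        System.out.println(isMarkedPrivate(String.class));
    }
}

// Hypothetical stand-in for Spark's @Private; an assumption for illustration.
@Documented                          // javadoc shows the marker on annotated elements
@Retention(RetentionPolicy.RUNTIME)  // marker stays visible to reflection-based tools
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PrivateApi {}

// Invented internal class flagged with the marker.
@PrivateApi
class InternalHelper {}
```

With "@Documented" present, javadoc lists the marker on each annotated
class, so a reader of the generated docs can at least see that the class is
not meant as public API.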

-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Sean Owen
You should find that 'surprisingly public' classes are there because
of language technicalities. For example DummySerializerInstance is
public because it's a Java class, and can't be used outside its
package otherwise.

Likewise, I think MiMa just looks at bytecode, and private[spark]
classes are public in the bytecode for similar reasons (although Scala
enforces the access within Scala as expected). Hence it will flag
changes to "nonpublic" private[spark] classes.

I think things that are meant to be marked private are, well, marked
private, or else as private as possible and flagged with annotations
like @Private. (It does sound like DummySerializerInstance should be
so annotated?) Yes, the catalyst package in its entirety is one big
exception - private by fiat, not by painstaking flagging of every
class.

The issue to me is really docs. If we have Javadoc/Scaladoc for private
classes, and there's a way to avoid that, such as with annotations, that
should be fixed.
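
The three rules discussed in this thread could be combined into one check
along the following lines. This is only a sketch under assumptions:
PrivacyRules, its method, and the package list are hypothetical, and the
only package named in the thread is catalyst (assumed here to be
org.apache.spark.sql.catalyst). It is not how MiMa or the Unidoc exclusions
are actually implemented.

```java
import java.lang.reflect.Modifier;

class PrivacyRules {
    // Assumption: packages that are private wholesale (rule 2).
    // Only catalyst is mentioned in the thread.
    static final String[] PRIVATE_PACKAGES = {"org.apache.spark.sql.catalyst"};

    // Decides whether a class counts as private, given its fully qualified
    // name, its bytecode-level modifiers, and whether it has the marker annotation.
    static boolean isPrivate(String className, int modifiers, boolean hasPrivateAnnotation) {
        // Rule 1: really private, i.e. not public at the bytecode level.
        if (!Modifier.isPublic(modifiers)) {
            return true;
        }
        // Rule 2: lives under a package that is private as a whole.
        for (String pkg : PRIVATE_PACKAGES) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        // Rule 3: public in bytecode, but flagged with the marker annotation.
        return hasPrivateAnnotation;
    }

    public static void main(String[] args) {
        String dummy = "org.apache.spark.serializer.DummySerializerInstance";
        System.out.println(isPrivate(dummy, Modifier.PUBLIC, false));
        System.out.println(isPrivate(dummy, Modifier.PUBLIC, true));
        System.out.println(isPrivate(
            "org.apache.spark.sql.catalyst.expressions.Expression", Modifier.PUBLIC, false));
    }
}
```

Under these rules a public class such as DummySerializerInstance only counts
as private once it carries the annotation, which is why annotating it would
matter for both the compatibility check and the docs.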
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote:
>
> Hi all,
>
> Recently I updated the MiMa exclusion rules, and found MiMa tracks some 
> private classes/methods unexpectedly.
>
> Note that "private" here means that we give no guarantee about 
> compatibility. We don't provide documentation, and users take the risk 
> when using them.
>
> The API documentation also exposes some obviously private classes, which 
> is not expected either, e.g. 
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.serializer.DummySerializerInstance
>
> I looked around and can't find a clear definition of "private" in Spark.
>
> AFAIK, we have several rules:
> 1. everything which is really private that end users can't access, e.g. 
> package private classes, private methods, etc.
> 2. classes under certain packages. I don't know if we have a list, the 
> catalyst package is considered as a private package.
> 3. everything which has a @Private annotation.
>
> I'm sending this email to collect more feedback, and hope we can come up with 
> a clear definition about what is "private".
>
> Thanks,
> Wenchen
