Re: Python xmlrunner being used?

2020-07-24 Thread Hyukjin Kwon
It's used in Jenkins IIRC

On Fri, Jul 24, 2020 at 11:43 PM, Driesprong, Fokko wrote:

> I found this ticket: https://issues.apache.org/jira/browse/SPARK-7021
>
> Is anybody actually using this?
>
> Cheers, Fokko
>
> On Fri, 24 Jul 2020 at 16:27, Driesprong, Fokko wrote:
>
>> Hi all,
>>
>> Does anyone know if the xmlrunner package is still being used?
>>
>> We're working on enforcing some static code analysis checks on the Python
>> codebase, and the imports of xmlrunner generate quite a bit of noise:
>> https://github.com/apache/spark/pull/29121
>>
>> It looks like it is the entry point for a lot of tests:
>> https://github.com/apache/spark/search?p=1=xmlrunner_q=xmlrunner
>> This will only run when the test file is explicitly invoked.
>>
>> However, looking at the coverage report generation:
>> https://github.com/apache/spark/blob/master/python/run-tests-with-coverage
>> This is being generated using coverage.
>>
>> I also can't find where it is being installed. Does anyone have any
>> historical knowledge on this?
>>
>> Kind regards, Fokko


Re: [DISCUSS] [Spark confs] Making spark.jars conf take precedence over spark default classpath

2020-07-24 Thread Imran Rashid
Hi Nupur,

Is what you're trying to do already possible via the
spark.{driver,executor}.userClassPathFirst options?

https://github.com/apache/spark/blob/b890fdc8df64f1d0b0f78b790d36be883e852b0d/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L853
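
For anyone trying this out, here is a minimal, hedged sketch (the jar path is a
placeholder) of the confs Imran mentions, expressed as a SparkConf. Note that,
as with spark.jars, driver-side classpath settings only take effect if they are
known before the driver JVM starts, so in practice they are usually supplied via
spark-defaults.conf or spark-submit --conf rather than set in application code.

  import org.apache.spark.SparkConf

  // Sketch only: ask Spark to prefer user-supplied jars over its own classpath.
  // "spark.driver.userClassPathFirst" and "spark.executor.userClassPathFirst"
  // are existing Spark confs; the jar path below is hypothetical.
  val conf = new SparkConf()
    .set("spark.jars", "/path/to/sample-jar-2.0.0.jar")
    .set("spark.driver.userClassPathFirst", "true")
    .set("spark.executor.userClassPathFirst", "true")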

On Wed, Jul 22, 2020 at 5:50 PM nupurshukla  wrote:

> Hello,
>
> I am prototyping a change in the behavior of the spark.jars conf for my
> use case. The spark.jars conf is used to specify a list of jars to include on
> the driver and executor classpaths.
>
> *Current behavior:* The spark.jars conf value is not read until after the JVM
> has already started and the system classloader has already been loaded, hence
> the jars added using this conf get "appended" to the Spark classpath. This
> means that Spark looks for the jar on its default classpath first and then
> looks at the path specified in the spark.jars conf.
>
> *Proposed prototype:* I am proposing a new behavior where spark.jars takes
> precedence over the Spark default classpath in terms of how jars are
> discovered. This can be achieved by using the
> spark.{driver,executor}.extraClassPath conf. This conf modifies the actual
> launch command of the driver (or executors), and hence this path is
> "prepended" to the classpath and thus takes precedence over the default
> classpath. Could the behavior of the spark.jars conf be modified by adding its
> value to the value of spark.{driver,executor}.extraClassPath during argument
> parsing in SparkSubmitArguments.scala
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>,
> so that we achieve the precedence order (left to right): jars specified in
> spark.jars > spark.{driver,executor}.extraClassPath > Spark default classpath?
>
> *Pseudo sample code:*
> In loadEnvironmentArguments()
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L151>:
>
> if (jars != null) {
>   if (driverExtraClassPath != null) {
>     driverExtraClassPath = driverExtraClassPath + "," + jars
>   } else {
>     driverExtraClassPath = jars
>   }
> }
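
As an illustrative aside (not part of the proposal above): if the intended
precedence order really is spark.jars > extraClassPath > default classpath, the
jar entries would presumably need to be prepended rather than appended, and
joined with the path separator that extraClassPath expects (spark.jars itself is
comma-separated). A hedged sketch, assuming local jar paths and the existing
driverExtraClassPath field in SparkSubmitArguments:

  // Sketch only: prepend spark.jars entries so they win over extraClassPath.
  // Assumes the entries are local filesystem paths; remote URLs (hdfs://, http://)
  // would need to be resolved to local files first.
  if (jars != null) {
    val jarPaths = jars.split(",").mkString(java.io.File.pathSeparator)
    driverExtraClassPath =
      if (driverExtraClassPath != null) {
        jarPaths + java.io.File.pathSeparator + driverExtraClassPath
      } else {
        jarPaths
      }
  }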
>
>
> *As an example*, consider these jars:
> sample-jar-1.0.0.jar, present in Spark's default classpath
> sample-jar-2.0.0.jar, present on all nodes of the cluster at path //
> new-jar-1.0.0.jar, present on all nodes of the cluster at path //
> (and not in the Spark default classpath)
>
> And two scenarios: two Spark jobs are submitted with the following spark.jars
> conf values (see the attached image):
> <http://apache-spark-developers-list.1001551.n3.nabble.com/file/t3705/Capture.png>
>
>
> What are your thoughts on this? Could this have any undesired side-effects?
> Or has this already been explored and there are some known issues with this
> approach?
>
> Thanks,
> Nupur
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Process for backports?

2020-07-24 Thread Sean Owen
This is pretty much standard semantic versioning: https://semver.org/
Maintenance branch releases are for bug fixes; minor releases have new features.
Of course at some level it all depends on judgment, too.

On Fri, Jul 24, 2020 at 1:08 PM venkat_20  wrote:
>
> Can you please elaborate on why backporting of features is not allowed?
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Process for backports?

2020-07-24 Thread venkat_20
Can you please elaborate on why backporting of features is not allowed?



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Starting work on last Scala 2.13 updates

2020-07-24 Thread Dongjoon Hyun
Thank you so much, Sean!

Bests,
Dongjoon.


On Fri, Jul 24, 2020 at 8:56 AM Sean Owen  wrote:

> Status update - we should have Scala 2.13 compiling, with the
> exception of the REPL.
> Looks like 99% or so of tests pass too, but the remaining ones might
> be hard to debug. I haven't looked hard yet.
> https://issues.apache.org/jira/browse/SPARK-25075 is the umbrella.
> Anyone who's interested can run "dev/change-scala-version.sh 2.13", then
> build with -Pscala-2.13, and have a look at some of the failures.
>
> On Sat, Jul 11, 2020 at 1:32 PM Sean Owen  wrote:
> >
> > I call attention to https://github.com/apache/spark/pull/28971 which
> > represents part 1 of several changes making up the last large change
> > for Scala 2.13, except for REPL updates: dealing with the fact that
> > default collection types will be immutable.
> >
> > The goal of course is to cross-compile without separate source trees,
> > if possible, and I think this is the best bet, which we've discussed
> > for over a year.
> >
> > Before merging or applying this strategy more widely, I wanted to call
> > attention to it.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Starting work on last Scala 2.13 updates

2020-07-24 Thread Holden Karau
This is awesome progress :)

On Fri, Jul 24, 2020 at 8:56 AM Sean Owen  wrote:

> Status update - we should have Scala 2.13 compiling, with the
> exception of the REPL.
> Looks like 99% or so of tests pass too, but the remaining ones might
> be hard to debug. I haven't looked hard yet.
> https://issues.apache.org/jira/browse/SPARK-25075 is the umbrella.
> Anyone who's interested can run "dev/change-scala-version.sh 2.13", then
> build with -Pscala-2.13, and have a look at some of the failures.
>
> On Sat, Jul 11, 2020 at 1:32 PM Sean Owen  wrote:
> >
> > I call attention to https://github.com/apache/spark/pull/28971 which
> > represents part 1 of several changes making up the last large change
> > for Scala 2.13, except for REPL updates: dealing with the fact that
> > default collection types will be immutable.
> >
> > The goal of course is to cross-compile without separate source trees,
> > if possible, and I think this is the best bet, which we've discussed
> > for over a year.
> >
> > Before merging or applying this strategy more widely, I wanted to call
> > attention to it.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Starting work on last Scala 2.13 updates

2020-07-24 Thread Sean Owen
Status update - we should have Scala 2.13 compiling, with the
exception of the REPL.
Looks like 99% or so of tests pass too, but the remaining ones might
be hard to debug. I haven't looked hard yet.
https://issues.apache.org/jira/browse/SPARK-25075 is the umbrella.
Anyone who's interested can run "dev/change-scala-version.sh 2.13", then
build with -Pscala-2.13, and have a look at some of the failures.

On Sat, Jul 11, 2020 at 1:32 PM Sean Owen  wrote:
>
> I call attention to https://github.com/apache/spark/pull/28971 which
> represents part 1 of several changes making up the last large change
> for Scala 2.13, except for REPL updates: dealing with the fact that
> default collection types will be immutable.
>
> The goal of course is to cross-compile without separate source trees,
> if possible, and I think this is the best bet, which we've discussed
> for over a year.
>
> Before merging or applying this strategy more widely, I wanted to call
> attention to it.
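
The collection-default change mentioned above can be illustrated with a small,
hypothetical Scala snippet (the names are made up for illustration): in 2.13 the
default scala.Seq is scala.collection.immutable.Seq, while in 2.12 it was
scala.collection.Seq, so code handing a mutable buffer to an API that expects a
Seq needs an explicit immutable copy to compile the same way on both versions.

  import scala.collection.mutable.ArrayBuffer

  // Cross-compiles on 2.12 and 2.13: toVector always yields an immutable Seq,
  // whereas returning the buffer directly only satisfies Seq on 2.12.
  def freeze[A](buf: ArrayBuffer[A]): Seq[A] = buf.toVector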

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: InterpretedUnsafeProjection - error in getElementSize

2020-07-24 Thread Wenchen Fan
Can you create a JIRA ticket? There are many people happy to help fix it.

On Tue, Jul 21, 2020 at 9:49 PM Janda Martin  wrote:

> Hi,
>   I think that I found an error in
> InterpretedUnsafeProjection::getElementSize. This method differs from the
> similar implementation in GenerateUnsafeProjection.
>
>  InterpretedUnsafeProjection::getElementSize returns the wrong size for
> UDTs. I suggest using code similar to that in GenerateUnsafeProjection.
>
>
> Test type:
> new ArrayType(new CustomUDT())
>
> CustomUDT with sqlType=StringType
>
>
>
>   Thank you
>  Martin
>
>
> InterpretedUnsafeProjection implementation:
>
>   private def getElementSize(dataType: DataType): Int = dataType match {
> case NullType | StringType | BinaryType | CalendarIntervalType |
>  _: DecimalType | _: StructType | _: ArrayType | _: MapType => 8
> case _ => dataType.defaultSize
>   }
>
>
> GenerateUnsafeProjection implementation:
>
> val et = UserDefinedType.sqlType(elementType)
> ...
>
> val elementOrOffsetSize = et match {
>   case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
>   case _ if CodeGenerator.isPrimitiveType(jt) => et.defaultSize
>   case _ => 8  // we need 8 bytes to store offset and length
> }
>
>
>
> PS: The following line is not necessary because DecimalType is not a primitive
> type, so it is already covered by the default case (size = 8):
>
>   case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
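
For reference, a hedged sketch of the fix being suggested, as it might sit inside
InterpretedUnsafeProjection. It relies on Spark's internal UserDefinedType.sqlType
helper (quoted from GenerateUnsafeProjection above), so it is not compilable
outside Spark's own packages.

  // Sketch only: unwrap a UDT to its underlying sqlType before deciding the
  // element size, so that e.g. a UDT backed by StringType gets the 8-byte
  // offset-and-length slot it needs.
  private def getElementSize(dataType: DataType): Int =
    UserDefinedType.sqlType(dataType) match {
      case NullType | StringType | BinaryType | CalendarIntervalType |
           _: DecimalType | _: StructType | _: ArrayType | _: MapType => 8
      case other => other.defaultSize
    }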
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: Python xmlrunner being used?

2020-07-24 Thread Driesprong, Fokko
I found this ticket: https://issues.apache.org/jira/browse/SPARK-7021

Is anybody actually using this?

Cheers, Fokko

On Fri, 24 Jul 2020 at 16:27, Driesprong, Fokko wrote:

> Hi all,
>
> Does anyone know if the xmlrunner package is still being used?
>
> We're working on enforcing some static code analysis checks on the Python
> codebase, and the imports of xmlrunner generate quite a bit of noise:
> https://github.com/apache/spark/pull/29121
>
> It looks like it is the entry point for a lot of tests:
> https://github.com/apache/spark/search?p=1=xmlrunner_q=xmlrunner
> This will only run when the test file is explicitly invoked.
>
> However, looking at the coverage report generation:
> https://github.com/apache/spark/blob/master/python/run-tests-with-coverage
> This is being generated using coverage.
>
> I also can't find where it is being installed. Does anyone have any
> historical knowledge on this?
>
> Kind regards, Fokko
>


Python xmlrunner being used?

2020-07-24 Thread Driesprong, Fokko
Hi all,

Does anyone know if the xmlrunner package is still being used?

We're working on enforcing some static code analysis checks on the Python
codebase, and the imports of xmlrunner generate quite a bit of noise:
https://github.com/apache/spark/pull/29121

It looks like it is the entry point for a lot of tests:
https://github.com/apache/spark/search?p=1=xmlrunner_q=xmlrunner
This will only run when the test file is explicitly invoked.

However, looking at the coverage report generation:
https://github.com/apache/spark/blob/master/python/run-tests-with-coverage
This is being generated using coverage.

I also can't find where it is being installed. Does anyone have any
historical knowledge on this?

Kind regards, Fokko


Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-24 Thread Tom Graves
+1
Tom

On Tuesday, July 21, 2020, 03:35:18 PM CDT, Holden Karau wrote:
 
Hi Spark Developers,

There has been a rather active discussion regarding the specific vetoes that
occurred during Spark 3. From that I believe we are now mostly in agreement that
it would be best to clarify our rules around code vetoes & merging in general.
Personally I believe this change is important to help improve the appearance of
a level playing field in the project.

Once discussion settles I'll run this by a copy editor (my grammar isn't
amazing) and bring it forward for a vote.

The current Spark committer guide is at
https://spark.apache.org/committers.html. I am proposing we add a section on
when it is OK to merge PRs, directly above the section on how to merge PRs. The
text I am proposing to amend our committer guidelines with is:

PRs shall not be merged during active on-topic discussion except for issues
like critical security fixes of a public vulnerability. Under extenuating
circumstances PRs may be merged during active off-topic discussion and the
discussion directed to a more appropriate venue. Time should be given prior to
merging for those involved with the conversation to explain if they believe
they are on topic.


Lazy consensus requires giving time for discussion to settle, while
understanding that people may not be working on Spark as their full-time job
and may take holidays. It is believed that by doing this we can limit how often
people feel the need to exercise their veto.


For the purposes of a -1 on code changes, qualified voters include all PMC
members and committers in the project. For a -1 to be a valid veto it must
include a technical reason. The reason can include things like the change
introducing a maintenance burden or not being in the direction of Spark.


If there is a -1 from a non-committer, multiple committers or the PMC should be 
consulted before moving forward.




If the original person who cast the veto cannot be reached in a reasonable
time frame, allowing for likely holidays, it is up to the PMC to decide the next
steps within the guidelines of the ASF. This must be decided by a consensus vote
under the ASF voting rules.


These policies serve to reiterate the core principle that code must not be 
merged with a pending veto or before a consensus has been reached (lazy or 
otherwise).


It is the PMC’s hope that vetoes continue to be infrequent, and that when they
occur, all parties take the time to build consensus prior to additional feature
work.




Being a committer means exercising your judgement, while working in a community
with diverse views. There is nothing wrong with getting a second (or third, or
fourth) opinion when you are uncertain. Thank you for your dedication to the
Spark project; it is appreciated by the developers and users of Spark.




It is hoped that these guidelines do not slow down development; rather, by
removing some of the uncertainty, they make it easier for us to reach consensus.
If you have ideas on how to improve these guidelines, or other parts of how the
Spark project operates, you should reach out on the dev@ list to start the
discussion.





Kind Regards,
Holden
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
YouTube Live Streams: https://www.youtube.com/user/holdenkarau