Re: Missing module spark-hadoop-cloud in Maven central

2021-06-21 Thread Dongjoon Hyun
Hi, Stephen and Steve.

The Apache Spark community has started to publish it as a snapshot, and Apache
Spark 3.2.0 will be the first release to include it.

- 
https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hadoop-cloud_2.12/3.2.0-SNAPSHOT/

Please check the snapshot artifacts and file an Apache Spark JIRA if you hit
any issues.
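
For example, you should be able to resolve it with something like this (a
rough, untested sketch; it assumes Maven 3 and the Scala 2.12 artifact):

    mvn dependency:get \
      -DremoteRepositories=https://repository.apache.org/content/groups/snapshots \
      -Dartifact=org.apache.spark:spark-hadoop-cloud_2.12:3.2.0-SNAPSHOT

If that downloads cleanly, the snapshot is resolvable from your environment.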

Bests,
Dongjoon.

On 2021/06/02 19:05:29, Steve Loughran  wrote: 
> Off the record: it really irritates me too, as it forces me to do local builds
> even though I shouldn't have to. Sometimes I do that for other reasons, but
> still.
> 
> Getting the cloud-storage module in was hard enough at the time that I
> wasn't going to push harder; I essentially stopped trying to get one into
> Spark after that, having effectively been told to go and play in my own fork
> (*).
> 
> https://github.com/apache/spark/pull/12004#issuecomment-259020494
> 
> Given that effort almost failed, to then say "now include the artifact in
> releases" wasn't something I was going to do; I had everything I needed for
> my own build, and trying to add new PRs struck me as an exercise in
> confrontation and futility.
> 
> Sean, if I do submit a PR which makes hadoop-cloud default on the right
> versions, but strips the dependencies from the final tarball, would that
> get some attention?
> 
> (*) Sean of course, was a notable exception and very supportive.
> 
> On Wed, 2 Jun 2021 at 00:56, Stephen Coy  wrote:
> 
> > I have been building Apache Spark from source just so I can get this
> > dependency.
> >
> >
> >    1. git checkout v3.1.1
> >    2. dev/make-distribution.sh --name hadoop-cloud-3.2 --tgz \
> >       -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver \
> >       -Dhadoop.version=3.2.0
> >
> >
> > It is kind of a nuisance having to do this though.
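> >
> > (A quick sanity check on the result; illustrative only, since the exact
> > jar names depend on your versions:
> >
> >    tar -tzf spark-3.1.1-bin-hadoop-cloud-3.2.tgz | grep -E 'hadoop-(aws|cloud)'
> >
> > This should list the hadoop-aws and spark-hadoop-cloud jars.)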
> >
> > Steve C
> >
> >
> > On 31 May 2021, at 10:34 pm, Sean Owen  wrote:
> >
> > I know it's not enabled by default when the binary artifacts are built,
> > but I'm not exactly sure why it's not built separately at all. It's almost
> > a dependencies-only pom artifact, but there are two source files. Steve, do
> > you have an angle on that?
> >
> > On Mon, May 31, 2021 at 5:37 AM Erik Torres  wrote:
> >
> >> Hi,
> >>
> >> I'm following this documentation to
> >> configure my Spark-based application to interact with Amazon S3. However, I
> >> cannot find the spark-hadoop-cloud module in Maven central for the
> >> non-commercial distribution of Apache Spark. From the documentation I would
> >> expect that I can get this module as a Maven dependency in my project.
> >> However, I ended up building the spark-hadoop-cloud module from Spark's
> >> source code.
> >>
> >> Is this the expected way to set up the integration with Amazon S3? I think
> >> I'm missing something here.
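> >>
> >> (Concretely, what I expected to be able to do is something like the
> >> following, though these coordinates are exactly what I cannot find in
> >> Maven Central; the keys and paths are placeholders:
> >>
> >>    spark-submit \
> >>      --packages org.apache.spark:spark-hadoop-cloud_2.12:3.1.1 \
> >>      --conf spark.hadoop.fs.s3a.access.key=PLACEHOLDER \
> >>      --conf spark.hadoop.fs.s3a.secret.key=PLACEHOLDER \
> >>      my_job.py s3a://my-bucket/input)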
> >>
> >> Thanks in advance!
> >>
> >> Erik
> >>
> >
> >
> 




Re: CVEs

2021-06-21 Thread Eric Richardson
OK, that sounds like a plan. I will gather what I found and either reach out
on the security channel or try to upgrade via a pull request.

Thanks for pointing me in the right direction.

>


Re: CVEs

2021-06-21 Thread Sean Owen
Yeah, if it were clearly exploitable right now we'd handle it via private@
instead of JIRA; it depends on what you think the importance is. If in doubt,
reply to priv...@spark.apache.org



Re: CVEs

2021-06-21 Thread Sean Owen
You could comment on https://issues.apache.org/jira/browse/SPARK-35550,
which covered the update to Jackson 2.12.3. If there's a decent case for
backporting and it doesn't have major compatibility issues, we can do it.

Then, if you have time, try back-porting the patch to branch-3.1 and run the
tests. (Or just open the pull request against branch-3.1 and let the tests
figure it out.) If it passes, that's pretty good evidence it's OK.
Or get as far as you can on that and I/we can help backport.
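
Roughly, for example (a sketch, not a recipe; the commit hash is whatever
git log shows for the SPARK-35550 commit on master):

    git checkout branch-3.1
    git log --oneline master | grep SPARK-35550  # find the commit to backport
    git cherry-pick <commit-hash>                # resolve any conflicts
    ./build/mvn -DskipTests clean package        # then run the affected module tests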

Here were previous comments on compatibility:
https://github.com/apache/spark/pull/32688

3.2 will be Scala 2.12 and possibly experimentally 2.13, but not Scala 2.13
only.





Re: CVEs

2021-06-21 Thread Holden Karau
If you get to a point where you find something you think is highly likely a
valid vulnerability, the best path forward is likely reaching out to private@
to figure out how to do a security release.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: CVEs

2021-06-21 Thread Eric Richardson
Thanks for the quick reply. Yes: since it is included in the jars, it is
unclear to me whether it is used internally.

I can substitute the jar in the distro to keep the scanner from finding it,
but then it is unclear whether I could be breaking something or not. Given
that 3.1.2 is the latest release, I guess you might expect that it would
pass the scanners, but I am not sure whether that version spans 3.0.x and
3.1.x either.
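
(To be concrete, the substitution I am talking about is just something like
this, with illustrative version numbers, and I realize hand-swapping jars is
unsupported:

    cd $SPARK_HOME/jars
    mv jackson-databind-2.10.0.jar /tmp/
    curl -LO https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.10.5.1/jackson-databind-2.10.5.1.jar

That is why I am nervous about breaking something.)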

Can I report findings in an issue when I am pretty darn sure it is a valid
vulnerability, if that is OK? That would at least raise the visibility.

Will 3.2.x be Scala 2.13.x only, or cross-compiled with 2.12?

I realize Spark is a beast, so I just want to help if I can, but also not
create extra work if it is not useful for me or the Spark team/contributors.



Re: CVEs

2021-06-21 Thread Sean Owen
Whether it matters really depends on whether the CVE affects Spark.
Sometimes it clearly could, and then we'd try to back-port dependency updates
to active branches.
Sometimes it clearly doesn't, and sometimes the dependency is updated anyway
for good measure (mostly to keep it off static analyzer reports), but we
probably wouldn't backport.

Jackson has been a persistent one, but in this case Spark is already on
2.12.x in master, and it wasn't clear the last time I looked at those CVEs
that they can affect Spark itself. End-user apps perhaps, but those apps can
supply their own Jackson.
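
(For an end-user app, that usually means bundling the newer Jackson in the
app's assembly jar and, if needed, preferring it over Spark's copy. A sketch
only; com.example.MyApp and the jar name are made up, and these flags have
their own compatibility caveats:

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --class com.example.MyApp my-app-assembly.jar

where my-app-assembly.jar includes jackson-databind 2.12.x.)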

If someone had a legitimate view that this is potentially more serious, I
think we could probably backport that update, but Jackson can be a little bit
tricky with compatibility IIRC, so it would just bear some testing.




CVEs

2021-06-21 Thread Eric Richardson
Hi,

I am working with Spark 3.1.2 and getting several vulnerabilities popping up
in scans. I am wondering whether the Spark distros are scanned, etc., and how
people resolve these.

For example, I am finding https://nvd.nist.gov/vuln/detail/CVE-2020-25649

This looks like it is fixed in 2.11.0
(https://github.com/FasterXML/jackson-databind/issues/2589), but Spark
supplies 2.10.0.
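
(For reference, the bundled version is visible from the jar names in the
distribution, assuming a standard binary distro layout:

    ls $SPARK_HOME/jars | grep jackson

which is where the scanner is picking it up.)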

Thanks,
Eric


Re: Long schedule delay time of one spark task

2021-06-21 Thread sarutak

Hi,

Have you already checked SPARK-30458?
https://issues.apache.org/jira/browse/SPARK-30458

The problem you met seems related to that issue.

Kousuke


Hi all,
I have a Spark Streaming job. One of its tasks shows an abnormally long
running time compared to the others.
When I check the event timeline graph, it shows 11 minutes of computing
time, but when I check the summary table, the max duration is only 2.4 min
and the scheduler delay is 11 min.
Which data is correct?

P.S. I'm using Spark 2.3.





how Spark achieves memory fairness between tasks?

2021-06-21 Thread hatef alipoor
Dear Spark community

I was watching this presentation, which is about Spark memory management.

He talks about how they achieve fairness between different tasks in one
executor (12:00). He presents the idea of dynamic memory assignment between
tasks, and he says that Spark spills other tasks' pages to disk if more tasks
begin to execute.
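
(As a concrete reading of that policy, with my own numbers: if the executor's
execution pool is M = 8 GB and N = 2 tasks are running, each task may grow up
to M/N = 4 GB. If two more tasks start, so N = 4, the per-task cap drops to
2 GB and each new task is guaranteed at least M/(2N) = 1 GB before it has to
wait, which is what can force an existing task's pages to be spilled.)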

I read before that tasks in Spark are essentially threads, and in Java we
don't have the capability to manage the memory of threads and establish
memory fairness between them. I wonder how Spark achieves this.

I asked this question on Stack Overflow (you can see it here), and the
answers postulate that there is no spill-to-disk part. I am really confused
about what is actually going on under the hood.

Thank you very much.

Sincerely,
Hatef Alipoor