Re: Does Spark have a plan to move away from sun.misc.Unsafe?

2018-10-25 Thread Vadim Semenov
Here you go:
the umbrella ticket:
https://issues.apache.org/jira/browse/SPARK-24417

and the sun.misc.Unsafe one:
https://issues.apache.org/jira/browse/SPARK-24421
On Wed, Oct 24, 2018 at 8:08 PM kant kodali  wrote:
>
> Hi All,
>
> Does Spark have a plan to move away from sun.misc.Unsafe to VarHandles? I am
> trying to find a JIRA issue for this.
>
> Thanks!



-- 
Sent from my iPhone

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Patrick Brown
Done:

https://issues.apache.org/jira/browse/SPARK-25837

On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin  wrote:

> Ah that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
>  wrote:
> >
> > On my production application I am running ~200 jobs at once, but
> continue to submit jobs in this manner for sometimes ~1 hour.
> >
> > The reproduction code above generally only has 4-ish jobs running at
> > once, and as you can see runs through 50k jobs in this manner.
> >
> > I guess I should clarify my above statement: the issue seems to appear
> > when running multiple jobs at once as well as in sequence for a while,
> > and may well have something to do with high master CPU usage (hence the
> > collect in the code). My rough guess would be that whatever manages
> > clearing out completed jobs gets overwhelmed (my master was a 4-core
> > machine while running this, and htop reported almost full CPU usage
> > across all 4 cores).
> >
> > The attached screenshot shows the state of the web UI after running the
> > repro code; you can see the UI is displaying some 43k completed jobs (it
> > takes a long time to load). After a few minutes of inactivity this will
> > clear out; however, as my production application continues to submit
> > jobs every once in a while, the issue persists.
> >
> > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin 
> wrote:
> >>
> >> When you say many jobs at once, what ballpark are you talking about?
> >>
> >> The code in 2.3+ does try to keep data about all running jobs and
> >> stages regardless of the limit. If you're running into issues because
> >> of that we may have to look again at whether that's the right thing to
> >> do.
> >> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
> >>  wrote:
> >> >
> >> > I believe I may be able to reproduce this now, it seems like it may
> be something to do with many jobs at once:
> >> >
> >> > Spark 2.3.1
> >> >
> >> > > spark-shell --conf spark.ui.retainedJobs=1
> >> >
> >> > scala> import scala.concurrent._
> >> > scala> import scala.concurrent.ExecutionContext.Implicits.global
> >> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until i).collect.length) } }
> >> >
> >> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin 
> wrote:
> >> >>
> >> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
> >> >> single stage (+ the tasks related to that single stage), same thing
> in
> >> >> memory (checked with jvisualvm).
> >> >>
> >> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin 
> wrote:
> >> >> >
> >> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >> >> >  wrote:
> >> >> > > I recently upgraded to Spark 2.3.1. I have had these same
> >> >> > > settings in my spark-submit script, which worked on 2.0.2, and
> >> >> > > which according to the documentation have not changed:
> >> >> > >
> >> >> > > spark.ui.retainedTasks=1
> >> >> > > spark.ui.retainedStages=1
> >> >> > > spark.ui.retainedJobs=1
> >> >> >
> >> >> > I tried that locally on the current master and it seems to be
> working.
> >> >> > I don't have 2.3 easily in front of me right now, but will take a
> look
> >> >> > Monday.
> >> >> >
> >> >> > --
> >> >> > Marcelo
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>


Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-25 Thread Marcelo Vanzin
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?

Thanks
On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
 wrote:
>
> On my production application I am running ~200 jobs at once, but continue to 
> submit jobs in this manner for sometimes ~1 hour.
>
> The reproduction code above generally only has 4-ish jobs running at once,
> and as you can see runs through 50k jobs in this manner.
>
> I guess I should clarify my above statement: the issue seems to appear when
> running multiple jobs at once as well as in sequence for a while, and may
> well have something to do with high master CPU usage (hence the collect in
> the code). My rough guess would be that whatever manages clearing out
> completed jobs gets overwhelmed (my master was a 4-core machine while
> running this, and htop reported almost full CPU usage across all 4 cores).
>
> The attached screenshot shows the state of the web UI after running the
> repro code; you can see the UI is displaying some 43k completed jobs (it
> takes a long time to load). After a few minutes of inactivity this will
> clear out; however, as my production application continues to submit jobs
> every once in a while, the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin  wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to keep data about all running jobs and
>> stages regardless of the limit. If you're running into issues because
>> of that we may have to look again at whether that's the right thing to
>> do.
>> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
>>  wrote:
>> >
>> > I believe I may be able to reproduce this now, it seems like it may be 
>> > something to do with many jobs at once:
>> >
>> > Spark 2.3.1
>> >
>> > > spark-shell --conf spark.ui.retainedJobs=1
>> >
>> > scala> import scala.concurrent._
>> > scala> import scala.concurrent.ExecutionContext.Implicits.global
>> > scala> for (i <- 0 until 5) { Future { println(sc.parallelize(0 until i).collect.length) } }
>> >
>> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin  
>> > wrote:
>> >>
>> >> Just tried on 2.3.2 and worked fine for me. UI had a single job and a
>> >> single stage (+ the tasks related to that single stage), same thing in
>> >> memory (checked with jvisualvm).
>> >>
>> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin  
>> >> wrote:
>> >> >
>> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >> >  wrote:
>> >> > > I recently upgraded to Spark 2.3.1. I have had these same settings in
>> >> > > my spark-submit script, which worked on 2.0.2, and which according to
>> >> > > the documentation have not changed:
>> >> > >
>> >> > > spark.ui.retainedTasks=1
>> >> > > spark.ui.retainedStages=1
>> >> > > spark.ui.retainedJobs=1
>> >> >
>> >> > I tried that locally on the current master and it seems to be working.
>> >> > I don't have 2.3 easily in front of me right now, but will take a look
>> >> > Monday.
>> >> >
>> >> > --
>> >> > Marcelo
>> >>
>> >>
>> >>
>> >> --
>> >> Marcelo
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo
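For anyone landing on this thread: the settings under discussion are plain
Spark configuration keys, so they can be passed either as --conf flags to
spark-submit / spark-shell (as in the repro above) or set on the SparkConf
before the context is created. A minimal sketch with illustrative values,
assuming a standalone driver program run locally:

import org.apache.spark.{SparkConf, SparkContext}

object UiRetentionDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ui-retention-demo")
      .setMaster("local[*]")                // or let spark-submit supply the master
      .set("spark.ui.retainedJobs", "1")    // completed jobs kept by the UI store
      .set("spark.ui.retainedStages", "1")  // completed stages kept
      .set("spark.ui.retainedTasks", "1")   // completed tasks kept
    val sc = new SparkContext(conf)

    // Fire a burst of small jobs so there is something for the UI to retain (or not).
    (0 until 100).foreach(i => sc.parallelize(0 until i).count())

    sc.stop()
  }
}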

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark SQL Error

2018-10-25 Thread Sai Kiran Kodukula
Hi all,

I am getting the following error message in one of my Spark SQL queries. I
realize this may be related to the Spark version or a configuration change,
but I want to understand the details and how to resolve it.

Thanks

spark.sql.codegen.aggregate.map.twolevel.enabled is set to true, but
current version of codegened fast hashmap does not support this aggregate
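Not a confirmed fix, but for context: the key named in the message is an
internal Spark SQL codegen setting, and the message fires when the two-level
fast hash map path cannot handle a particular aggregate. A hedged workaround
sketch, using the exact key from the message and placeholder table/column
names, is to turn the flag off for the affected statement and re-run it:

// Hedged workaround sketch: the config key comes from the message above;
// the table and columns are placeholders for the real query.
spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
spark.sql("SELECT customer_id, count(DISTINCT order_id) AS orders " +
  "FROM orders GROUP BY customer_id").show()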


Re: Watermarking without aggregation with Structured Streaming

2018-10-25 Thread sanjay_awat
Hello peay-2,

Were you able to find a solution to your problem? Were you able to get the
watermark timestamp made available through a function?

Regards,
Sanjay


peay-2 wrote
> Thanks for the pointers. I guess right now the only workaround would be to
> apply a "dummy" aggregation (e.g., group by the timestamp itself) only to
> have the stateful processing logic kick in and apply the filtering?
> 
> For my purposes, an alternative solution to pushing it out to the source
> would be to make the watermark timestamp available through a function so
> that it can be used in a regular filter clause. Based on my experiments,
> the timestamp is computed and updated even when no stateful computations
> occur. I am not sure how easy that would be to contribute though, maybe
> someone can suggest a starting point?
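For readers looking for the shape of that workaround, here is a rough sketch
of the "dummy" aggregation idea, assuming a source with an event-time column;
the source, column names and thresholds are illustrative, not from the thread:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("watermark-without-real-agg")
  .master("local[2]")
  .getOrCreate()
import spark.implicits._

// Stand-in streaming source; replace with the real Kafka/file source.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()
  .withColumnRenamed("timestamp", "eventTime")

// Grouping by the event-time column itself (plus the payload) forces the
// stateful aggregation path, so the watermark drops late rows even though
// no real aggregation is wanted.
val filtered = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy($"eventTime", $"value")
  .agg(count(lit(1)).as("n"))

filtered.writeStream
  .outputMode("append")
  .format("console")
  .start()
  .awaitTermination()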





--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: [External Sender] Having access to spark results

2018-10-25 Thread Affan Syed
Femi,
We have a solution that needs to run both on-prem and in the cloud.

Not sure how that impacts anything; what we want is to run an analytical
query on a large dataset (ours is over Cassandra) -- so batch in that
sense, but think on-demand -- and then have the entire result (not just the
first x rows) available for a web application to access.

Web applications work over a REST API, so while the query can be submitted
through something like Livy or the Thrift Server, the concern is how we get
the final result back in a useful form.

I could think of two ways of doing that.

A global temp table would work, but that falls under my first point -- it
seems a bit involved. My point was: has someone already solved this problem
and run through all the steps?


- Affan


On Thu, Oct 25, 2018 at 12:39 PM Femi Anthony <
olufemi.anth...@capitalone.com> wrote:

> What sort of environment are you running Spark on - in the cloud, on
> premises? Is it a real-time or batch-oriented application?
> Please provide more details.
> Femi
>
> On Thu, Oct 25, 2018 at 3:29 AM Affan Syed  wrote:
>
>> Spark users,
>> We would really like some input here on how the results of a Spark query
>> can be made accessible to a web application. Given Spark is widely used in
>> the industry, I would have thought this part would have lots of
>> answers/tutorials, but I didn't find anything.
>>
>> Here are a few options that come to mind
>>
>> 1) Spark results are saved in another DB (perhaps a traditional one) and
>> a query request returns the new table name for access through a paginated
>> query. That seems doable, although a bit convoluted as we need to handle
>> the completion of the query.
>>
>> 2) Spark results are pumped into a messaging queue from which a
>> socket-server-like connection is made.
>>
>> What confuses me is that other connectors to Spark, like those for
>> Tableau, using something like JDBC, should have all the data (not the top
>> 500 rows that we typically get via Livy or other REST interfaces to Spark).
>> How do those connectors get all the data through a single connection?
>>
>>
>> Can someone with expertise help bring clarity?
>>
>> Thank you.
>>
>> Affan
>>
>
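One concrete shape of the global temp table idea, sketched under the
assumption that the web tier talks to a long-running Spark application (for
example a job server, or a Thrift Server started inside that same
application); the table and view names are made up for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("results-via-global-temp-view").getOrCreate()

// Run the analytical query once and keep the full result set around.
val result = spark.sql("SELECT region, sum(amount) AS total FROM sales GROUP BY region")
result.cache()
result.createOrReplaceGlobalTempView("query_1234_result")

// Any other session of the same Spark application (for example one serving
// JDBC clients) can now read the whole result and paginate however it likes.
spark.newSession()
  .sql("SELECT * FROM global_temp.query_1234_result ORDER BY region LIMIT 100")
  .show()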


Fwd: Having access to spark results

2018-10-25 Thread onmstester onmstester
What about using cache() or save as a global temp table for subsequent access?

Sent using Zoho Mail

 Forwarded message 
From: Affan Syed
To: "spark users"
Date: Thu, 25 Oct 2018 10:58:43 +0330
Subject: Having access to spark results

 Forwarded message 

Spark users,

We would really like some input here on how the results of a Spark query
can be made accessible to a web application. Given Spark is widely used in
the industry, I would have thought this part would have lots of
answers/tutorials, but I didn't find anything.

Here are a few options that come to mind:

1) Spark results are saved in another DB (perhaps a traditional one) and a
query request returns the new table name for access through a paginated
query. That seems doable, although a bit convoluted as we need to handle
the completion of the query.

2) Spark results are pumped into a messaging queue from which a
socket-server-like connection is made.

What confuses me is that other connectors to Spark, like those for Tableau,
using something like JDBC, should have all the data (not the top 500 rows
that we typically get via Livy or other REST interfaces to Spark). How do
those connectors get all the data through a single connection?

Can someone with expertise help bring clarity?

Thank you.

Affan

Re: [External Sender] Having access to spark results

2018-10-25 Thread Femi Anthony
What sort of environment are you running Spark on - in the cloud, on
premises? Is it a real-time or batch-oriented application?
Please provide more details.
Femi

On Thu, Oct 25, 2018 at 3:29 AM Affan Syed  wrote:

> Spark users,
> We would really like some input here on how the results of a Spark query
> can be made accessible to a web application. Given Spark is widely used in
> the industry, I would have thought this part would have lots of
> answers/tutorials, but I didn't find anything.
>
> Here are a few options that come to mind
>
> 1) Spark results are saved in another DB (perhaps a traditional one) and
> a query request returns the new table name for access through a paginated
> query. That seems doable, although a bit convoluted as we need to handle
> the completion of the query.
>
> 2) Spark results are pumped into a messaging queue from which a
> socket-server-like connection is made.
>
> What confuses me is that other connectors to Spark, like those for
> Tableau, using something like JDBC, should have all the data (not the top
> 500 rows that we typically get via Livy or other REST interfaces to Spark).
> How do those connectors get all the data through a single connection?
>
>
> Can someone with expertise help bring clarity?
>
> Thank you.
>
> Affan
>




Having access to spark results

2018-10-25 Thread Affan Syed
Spark users,
We would really like some input here on how the results of a Spark query
can be made accessible to a web application. Given Spark is widely used in
the industry, I would have thought this part would have lots of
answers/tutorials, but I didn't find anything.

Here are a few options that come to mind

1) Spark results are saved in another DB (perhaps a traditional one) and a
query request returns the new table name for access through a paginated
query. That seems doable, although a bit convoluted as we need to handle
the completion of the query.

2) Spark results are pumped into a messaging queue from which a
socket-server-like connection is made.

What confuses me is that other connectors to Spark, like those for Tableau,
using something like JDBC, should have all the data (not the top 500 rows
that we typically get via Livy or other REST interfaces to Spark). How do
those connectors get all the data through a single connection?


Can someone with expertise help bring clarity?

Thank you.

Affan
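A minimal sketch of option 1 above, assuming the results go to an external
relational database that the web application can then paginate against with
plain SQL; the JDBC URL, credentials and table name are placeholders:

import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("results-to-external-db").getOrCreate()

// The analytical query whose full result should be exposed to the web app.
val result = spark.sql("SELECT user_id, count(*) AS events FROM activity GROUP BY user_id")

val props = new Properties()
props.setProperty("user", "report_writer")
props.setProperty("password", "change-me")

// Write the complete result to a fresh table; hand this table name back to
// the web application, which can then page through it with LIMIT/OFFSET.
val resultTable = "query_results_20181025"
result.write
  .mode(SaveMode.Overwrite)
  .jdbc("jdbc:postgresql://reports-db:5432/reports", resultTable, props)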