Storage at node or executor level

2017-12-22 Thread Jean Georges Perrin
Hi all,

This is more of a general architecture question; I have my own idea, but wanted to
confirm or refute it...

When your executor is accessing data, where is it stored: at the executor level 
or at the worker level? 

jg



Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-22 Thread Naresh Dulam
Hi Sunitha,

Make the class that contains the common function you are calling implement
Serializable (java.io.Serializable).
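
For example, a minimal sketch (the PersonService class and its process method
are just hypothetical names) could look like this:

import java.io.Serializable;

// Hypothetical helper that holds the "common function". Because it implements
// Serializable, an instance can be captured by the map/foreach closure and
// shipped to the executors without the "Task not serializable" error.
public class PersonService implements Serializable {
  private static final long serialVersionUID = 1L;

  // The common function, invoked once per record.
  public void process(long id, String name) {
    System.out.println("processing " + id + " / " + name);
  }
}

// Usage with the existing foreach (the instance is created on the driver and
// serialized to each executor):
//   final PersonService service = new PersonService();
//   personRDD.foreach(person -> service.process(person.getId(), person.getName()));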


Thank you,
Naresh

On Wed, Dec 20, 2017 at 9:58 PM Sunitha Chennareddy <
chennareddysuni...@gmail.com> wrote:

> Hi,
>
> Thank You All..
>
> Here is my requirement: I have a DataFrame that contains rows
> retrieved from an Oracle table.
> I need to iterate over the DataFrame, fetch each record, and call a common
> function, passing a few parameters.
>
> The issue I am facing is that I am not able to call the common function:
>
> JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new
> Function<Row, Person>() {
>   @Override
>   public Person call(Row row) throws Exception {
>   Person person = new Person();
>   person.setId(row.getDecimal(0).longValue());
>   person.setName(row.getString(1));
>
> personLst.add(person);
> return person;
>   }
> });
>
> personRDD.foreach(new VoidFunction<Person>() {
> private static final long serialVersionUID = 1123456L;
>
> @Override
> public void call(Person person) throws Exception
> {
>   System.out.println(person.getId());
> // Here I tried to call the common function
> }
>});
>
> I am able to print the data in the foreach loop; however, when I try to call the
> common function it gives me the error below.
> Error Message: org.apache.spark.SparkException: Task not serializable
>
> I kindly request you to share some ideas (sample code / a link to refer to) on
> how to call a common function/interface method by passing values from each
> record of the dataframe.
>
> Regards,
> Sunitha
>
>
> On Tue, Dec 19, 2017 at 1:20 PM, Weichen Xu 
> wrote:
>
>> Hi Sunitha,
>>
>> In the mapper function you cannot update outer variables, so
>> `personLst.add(person)` does not work: the closure runs on the executors
>> against a serialized copy of the list, which is why the list on the driver
>> stays empty.
>>
>> You can use `rdd.collect()` to get a local list of `Person` objects
>> first, then you can safely iterate on the local list and do any update you
>> want.
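>>
>> A rough sketch of that (it reuses the personRDD and Person type from your
>> snippet; commonFunction is only a placeholder for your own method):
>>
>> // collect() brings every Person back to the driver, so this is only
>> // appropriate when the result fits comfortably in driver memory.
>> List<Person> localPersons = personRDD.collect();
>> for (Person person : localPersons) {
>>   // Driver-side code: no closure serialization is involved here.
>>   commonFunction(person.getId(), person.getName());
>> }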
>>
>> Thanks.
>>
>> On Tue, Dec 19, 2017 at 2:16 PM, Sunitha Chennareddy <
>> chennareddysuni...@gmail.com> wrote:
>>
>>> Hi Deepak,
>>>
>>> I am able to map a row to the Person class; the issue is that I want to call
>>> another method.
>>> I tried converting to a list, and it is not working without using collect.
>>>
>>> Regards
>>> Sunitha
>>> On Tuesday, December 19, 2017, Deepak Sharma 
>>> wrote:
>>>
 I am not sure about Java, but in Scala it would be something like
 df.rdd.map{ x => MyClass(x.getString(0), ...) }

 HTH

 --Deepak

 On Dec 19, 2017 09:25, "Sunitha Chennareddy" > wrote:

 Hi All,

 I am new to Spark, and I want to convert a DataFrame to a List without
 using collect().

 Main requirement is I need to iterate through the rows of dataframe and
 call another function by passing column value of each row (person.getId())

 Here is the snippet I have tried. Kindly help me resolve the issue;
 personLst is returning size 0:

 List<Person> personLst = new ArrayList<Person>();
 JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new
 Function<Row, Person>() {
   public Person call(Row row) throws Exception {
   Person person = new Person();
   person.setId(row.getDecimal(0).longValue());
   person.setName(row.getString(1));

 personLst.add(person);
 // here I tried to call another function but control never passed
 return person;
   }
 });
 logger.info("personLst size =="+personLst.size());
 logger.info("personRDD count ==="+personRDD.count());

 //output is
 personLst size == 0
 personRDD count === 3



>>
>


Re: [E] How to stop streaming before the application gets killed

2017-12-22 Thread Rastogi, Pankaj
You can add a shutdown hook to your JVM and request the Spark StreamingContext
to stop gracefully.

  /**
   * Shutdown hook to stop the StreamingContext gracefully
   * @param ssCtx the running StreamingContext
   */
  def addShutdownHook(ssCtx: StreamingContext) = {

    Runtime.getRuntime.addShutdownHook(new Thread() {

      override def run() = {

        println("In shutdown hook")
        // stop gracefully: also stop the SparkContext, and let in-flight
        // batches finish before shutting down
        ssCtx.stop(true, true)
      }
    })
  }
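
If you are on the Java API, a minimal sketch of the same idea (the app name,
batch interval and GracefulStop class are made up for illustration) would be:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class GracefulStop {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("graceful-stop");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // ... define the input streams and transformations here ...

    // Register the hook before starting, so the TERM signal delivered when the
    // application is killed triggers a graceful stop: let in-flight batches
    // finish, then stop the SparkContext as well.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      System.out.println("In shutdown hook");
      jssc.stop(true, true);   // stopSparkContext = true, stopGracefully = true
    }));

    jssc.start();
    jssc.awaitTermination();
  }
}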

Pankaj

On Fri, Dec 22, 2017 at 9:56 AM, Toy  wrote:

> I'm trying to write a deployment job for a Spark application. Basically the
> job will send yarn application --kill app_id to the cluster, but after the
> application receives the signal it dies without finishing whatever it is
> processing or stopping the stream.
>
> I'm using Spark Streaming. What's the best way to stop a Spark application
> so that we won't lose any data?
>
>
>


How to stop streaming before the application gets killed

2017-12-22 Thread Toy
I'm trying to write a deployment job for a Spark application. Basically the
job will send yarn application --kill app_id to the cluster, but after the
application receives the signal it dies without finishing whatever it is
processing or stopping the stream.

I'm using Spark Streaming. What's the best way to stop a Spark application so
that we won't lose any data?


unsubscribe

2017-12-22 Thread 施朝浩
unsubscribe





Re: Passing an array of more than 22 elements in a UDF

2017-12-22 Thread ayan guha
Hi, I think you are on the right track. You can pack all your parameters into a
suitable data structure, such as an array or a map, and pass that structure as a
single parameter to your UDF.
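
A rough Java sketch of that idea (the sumAll UDF, the c1/c2/c3 column names and
the df DataFrame are made-up examples):

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.JavaConverters;
import scala.collection.Seq;

import static org.apache.spark.sql.functions.array;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class ArrayUdfSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("array-udf").getOrCreate();

    // Register a UDF that takes one array argument instead of 22+ scalar
    // arguments. Spark hands an array column to a Java UDF as a Scala Seq.
    spark.udf().register("sumAll", (UDF1<Seq<Integer>, Integer>) values -> {
      int total = 0;
      for (Integer v : JavaConverters.seqAsJavaListConverter(values).asJava()) {
        total += v;
      }
      return total;
    }, DataTypes.IntegerType);

    // Wrap however many integer columns you need into a single array column
    // and pass that one column to the UDF, e.g. on a DataFrame df:
    //   df.withColumn("total", callUDF("sumAll", array(col("c1"), col("c2"), col("c3"))));
  }
}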

On Fri, 22 Dec 2017 at 2:55 pm, Aakash Basu 
wrote:

> Hi,
>
> I am using Spark 2.2 with Java; can anyone please suggest how to take
> more than 22 parameters in a UDF? For instance, what if I want to pass all the
> parameters as an array of integers?
>
> Thanks,
> Aakash.
>
-- 
Best Regards,
Ayan Guha


Passing an array of more than 22 elements in a UDF

2017-12-22 Thread Aakash Basu
Hi,

I am using Spark 2.2 with Java; can anyone please suggest how to take
more than 22 parameters in a UDF? For instance, what if I want to pass all the
parameters as an array of integers?

Thanks,
Aakash.