Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-22 Thread Naresh Dulam
Hi Sunitha,

Make the class that contains the common function you are calling
serializable, i.e. have it implement java.io.Serializable.
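
To illustrate the idea, here is a minimal sketch that uses only plain JDK serialization and no Spark at all. As far as I understand, Spark serializes the function object together with everything it references before shipping it to the executors, and "Task not serializable" is reported when that step fails. The names `PlainHelper`, `SerializableHelper`, `Task` and `TaskSerializationDemo` are hypothetical stand-ins for the class holding the common function and for Spark's function interfaces; they are not part of any Spark API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class holding the "common function"; NOT serializable.
class PlainHelper {
    long process(long id) { return id * 2; }
}

// Same helper, but marked Serializable so it can travel with a task.
class SerializableHelper implements Serializable {
    private static final long serialVersionUID = 1L;
    long process(long id) { return id * 2; }
}

// Hypothetical stand-in for a serializable function interface.
interface Task extends Serializable {
    long call(long id);
}

class TaskSerializationDemo {

    // Returns true if Java serialization accepts the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The task captures a helper that cannot be serialized.
    static Task taskUsingPlainHelper() {
        PlainHelper helper = new PlainHelper();
        return id -> helper.process(id);
    }

    // The task captures a helper that implements Serializable.
    static Task taskUsingSerializableHelper() {
        SerializableHelper helper = new SerializableHelper();
        return id -> helper.process(id);
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(taskUsingPlainHelper()));        // prints false
        System.out.println(canSerialize(taskUsingSerializableHelper())); // prints true
    }
}
```

The same reasoning applies to an anonymous `VoidFunction` declared inside an instance method: it holds a reference to the enclosing object, so that enclosing class also has to be serializable if the common function lives there.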


Thank you,
Naresh



Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-20 Thread Sunitha Chennareddy
Hi,

Thank you all.

Here is my requirement: I have a DataFrame that contains a list of rows
retrieved from an Oracle table.
I need to iterate over the DataFrame, fetch each record, and call a common
function, passing it a few parameters.

The issue I am facing is that I am not able to call the common function.

JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new Function<Row, Person>() {
    @Override
    public Person call(Row row) throws Exception {
        Person person = new Person();
        person.setId(row.getDecimal(0).longValue());
        person.setName(row.getString(1));

        personLst.add(person);
        return person;
    }
});

personRDD.foreach(new VoidFunction<Person>() {
    private static final long serialVersionUID = 1123456L;

    @Override
    public void call(Person person) throws Exception {
        System.out.println(person.getId());
        // Here I tried to call the common function
    }
});

I am able to print the data in the foreach loop; however, when I try to call
the common function, I get the error below.
Error message: org.apache.spark.SparkException: Task not serializable

Could you please share some ideas (sample code or a link to refer to) on how
to call a common function or interface method, passing it values from each
record of the DataFrame?

Regards,
Sunitha




Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Weichen Xu
Hi Sunitha,

Inside the mapper function you cannot update outer variables, which is what
`personLst.add(person)` tries to do. The function runs on the executors
against its own copy of the variable, so the list on the driver stays empty.

You can use `rdd.collect()` to get a local list of `Person` objects first.
After that you can safely iterate over the local list and make any updates
you want.
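
The effect can be reproduced without Spark. The sketch below is a rough model only: it serializes and deserializes a function object with plain JDK serialization, which approximates what happens when a task is shipped to an executor. `RowTask` and `ClosureCopyDemo` are hypothetical names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a serializable function interface.
interface RowTask extends Serializable {
    void call(String name);
}

class ClosureCopyDemo {

    // Serializes and deserializes an object, yielding an independent copy.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The task runs on a deserialized copy, as it would on an executor.
    static int driverListSizeAfterShippedRun() {
        List<String> personLst = new ArrayList<>();
        RowTask task = name -> personLst.add(name);
        RowTask shipped = roundTrip(task);   // the captured list is copied too
        shipped.call("a");
        shipped.call("b");
        shipped.call("c");
        return personLst.size();             // the original list was never touched
    }

    // The task runs directly in the same JVM, with no copy involved.
    static int listSizeAfterDirectRun() {
        List<String> personLst = new ArrayList<>();
        RowTask task = name -> personLst.add(name);
        task.call("a");
        task.call("b");
        task.call("c");
        return personLst.size();
    }

    public static void main(String[] args) {
        System.out.println(driverListSizeAfterShippedRun()); // prints 0
        System.out.println(listSizeAfterDirectRun());        // prints 3
    }
}
```

Separately, `map` is a lazy transformation, so in the original snippet the mapper had not run at all when `personLst.size()` was logged before `personRDD.count()`.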

Thanks.



Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Sunitha Chennareddy
Hi Jorn,

In my case I have to call a common interface function, passing it the values
of each record in the RDD. I tried iterating, but I was not able to trigger
the common function from the call method, as noted in the comment in the
code snippet in my earlier mail.

Could you please share your views?

Regards
Sunitha



Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Jörn Franke
This is correct behavior. If you need to call another method, simply append
another map, flatMap or whatever you need.

Depending on your use case you may also use reduce and reduceByKey.
However, you should never (!) use a global variable as in your snippet. This
cannot work because you are working in a distributed setting.
The code will probably fail on a cluster, or fail at random.
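
As a rough illustration of "append another map", here is a sketch that uses `java.util.stream` as an analogue rather than Spark itself, so that it runs with the JDK alone. The chaining pattern is the same in spirit: each record flows through one step after another, and no shared outer variable is mutated. `ChainedMapDemo` and `commonFunction` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ChainedMapDemo {

    // Hypothetical common function, applied to one record at a time.
    static String commonFunction(long id) {
        return "processed-" + id;
    }

    static List<String> run(List<String> rawIds) {
        return rawIds.stream()
                .map(Long::parseLong)                 // first step: convert the raw record
                .map(ChainedMapDemo::commonFunction)  // appended step: call the common function
                .collect(Collectors.toList());        // results are returned, not pushed into a global
    }

    public static void main(String[] args) {
        // prints [processed-1, processed-2, processed-3]
        System.out.println(run(Arrays.asList("1", "2", "3")));
    }
}
```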



Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Sunitha Chennareddy
Hi Deepak,

I am able to map each row to the Person class; the issue is that I want to
call another method.
I tried converting to a list, and it is not working without using collect.

Regards
Sunitha


Re: Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Deepak Sharma
I am not sure about Java, but in Scala it would be something like
df.rdd.map { x => MyClass(x.getString(0), ...) }

HTH

--Deepak



Help Required on Spark - Convert DataFrame to List with out using collect

2017-12-18 Thread Sunitha Chennareddy
Hi All,

I am new to Spark. I want to convert a DataFrame to a List without
using collect().

The main requirement is that I need to iterate through the rows of the
DataFrame and call another function, passing it a column value of each row
(person.getId()).

Here is the snippet I have tried. Kindly help me resolve the issue;
personLst is returning size 0:

List<Person> personLst = new ArrayList<Person>();
JavaRDD<Person> personRDD = person_dataframe.toJavaRDD().map(new Function<Row, Person>() {
    public Person call(Row row) throws Exception {
        Person person = new Person();
        person.setId(row.getDecimal(0).longValue());
        person.setName(row.getString(1));

        personLst.add(person);
        // here I tried to call another function but control never passed
        return person;
    }
});
logger.info("personLst size ==" + personLst.size());
logger.info("personRDD count ===" + personRDD.count());

//output is
personLst size == 0
personRDD count === 3