Re: Google Summer of Code - ideas

2015-02-28 Thread Sath
All,

  I would like to contribute to the Google Summer of Code projects. Please
guide me on how to start the process.


Sath 


> On Feb 28, 2015, at 11:34 AM, Manoj Kumar  
> wrote:
> 
> Hi,
> 
> Thanks a lot.
> Yes indeed, I am interested. I shall start looking at all the related
> JIRAs in a while.
> 
> 
> 
> -- 
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> 
> http://github.com/MechCoder




Re: Scheduler hang?

2015-02-28 Thread Victor Tso-Guillen
Moving user to bcc.

What I found was that the TaskSetManager for my task set of 5 tasks had
preferred locations set for 4 of the 5. Three had localhost/
and had completed. The one with no preferred location had also completed.
The last one's preferred location was set by our code to my IP address.
Local mode can hang on this because of
https://issues.apache.org/jira/browse/SPARK-4939, addressed by
https://github.com/apache/spark/pull/4147, which is obviously not an
optimal solution, but since it's only local mode, it's good enough. I'm not
going to wait for those seconds to tick by to complete the task, so I'll
fix the IP-address reporting side for local mode in my code.
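
For the record, here's a minimal sketch of the shape that triggers it
(illustrative only: the makeRDD location-preference overload stands in for
however your code ends up reporting the machine's IP):

import org.apache.spark.{SparkConf, SparkContext}

object LocalityHangSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[8]").setAppName("locality-sketch"))
    // makeRDD accepts per-element preferred locations (hostnames). In local
    // mode the backend only offers "localhost", so a preference naming the
    // machine's real IP never matches, and the task sits out the locality
    // wait instead of running immediately.
    val ip = java.net.InetAddress.getLocalHost.getHostAddress
    val rdd = sc.makeRDD(Seq(
      (1, Seq("localhost")), // matches local mode's offer; runs immediately
      (2, Seq(ip))           // stalls until the locality wait expires
    ))
    rdd.collect()
    sc.stop()
  }
}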

On Thu, Feb 26, 2015 at 8:32 PM, Victor Tso-Guillen  wrote:

> Of course, breakpointing on every status update and revive-offers
> invocation kept the problem from happening. Where could the race be?
>
> On Thu, Feb 26, 2015 at 7:55 PM, Victor Tso-Guillen 
> wrote:
>
>> I'd love to hear some input on this. I did get a standalone cluster up on
>> my local machine, and the problem didn't present itself. I'm pretty
>> confident that means the problem is in the LocalBackend or something near it.
>>
>> On Thu, Feb 26, 2015 at 1:37 PM, Victor Tso-Guillen 
>> wrote:
>>
>>> Okay, I confirmed my suspicions of a hang. I made a request that stopped
>>> progressing, though the already-scheduled tasks had finished. I made a
>>> separate request that was small enough not to hang, and it kicked the hung
>>> job enough to finish. I think what's happening is that the scheduler or the
>>> local backend is not kicking the revive-offers messaging at the right time,
>>> but I have to dig into the code some more to nail the culprit. Does anyone
>>> on this list have experience in those code areas who could help?
>>>
>>> On Thu, Feb 26, 2015 at 2:27 AM, Victor Tso-Guillen 
>>> wrote:
>>>
 Thanks for the link. Unfortunately, I turned on RDD compression and
 nothing changed. I tried switching from netty to nio and saw no change :(

 On Thu, Feb 26, 2015 at 2:01 AM, Akhil Das 
 wrote:

> Not many that I know of, but I bumped into this one:
> https://issues.apache.org/jira/browse/SPARK-4516
>
> Thanks
> Best Regards
>
> On Thu, Feb 26, 2015 at 3:26 PM, Victor Tso-Guillen 
> wrote:
>
>> Is there any potential problem going from 1.1.1 to 1.2.1 with shuffle
>> dependencies that produce no data?
>>
>> On Thu, Feb 26, 2015 at 1:56 AM, Victor Tso-Guillen 
>> wrote:
>>
>>> The data is small. The job is composed of many small stages.
>>>
>>> * I found that the problem shows up even with fewer than 222
>>> partitions. What would be gained by going higher?
>>> * Pushing up the parallelism only pushes up the boundary at which
>>> the system appears to hang. I'm worried about some sort of message
>>> loss or inconsistency.
>>> * Yes, we are using Kryo.
>>> * I'll try that, but I'm again a little confused about why you're
>>> recommending it. I'm stumped, so I might as well.
>>>
>>> On Wed, Feb 25, 2015 at 11:13 PM, Akhil Das <
>>> ak...@sigmoidanalytics.com> wrote:
>>>
 What operation are you trying to do, and how big is the data that
 you are operating on?

 Here are a few things you can try:

 - Repartition the RDD to a higher number than 222
 - Specify the master as local[*] or local[10]
 - Use Kryo Serializer (.set("spark.serializer",
 "org.apache.spark.serializer.KryoSerializer"))
 - Enable RDD Compression (.set("spark.rdd.compress","true") )
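
 Putting those together, a minimal sketch (illustrative only; the app name
 and input path are placeholders):

 import org.apache.spark.{SparkConf, SparkContext}

 val conf = new SparkConf()
   .setMaster("local[*]") // or local[10]
   .setAppName("tuning-sketch") // placeholder
   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
   .set("spark.rdd.compress", "true")
 val sc = new SparkContext(conf)
 // Repartition to a number higher than the current 222.
 val rdd = sc.textFile("hdfs:///some/input").repartition(500) // placeholder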


 Thanks
 Best Regards

 On Thu, Feb 26, 2015 at 10:15 AM, Victor Tso-Guillen <
 v...@paxata.com> wrote:

> I'm getting this really reliably on Spark 1.2.1. Basically I'm in
> local mode with parallelism at 8. I have 222 tasks, and I never seem to
> get far past 40. Usually in the 20s to 30s it will just hang. The last
> log output is below, along with a screenshot of the UI.
>
> 2015-02-25 20:39:55.779 GMT-0800 INFO  [task-result-getter-3] TaskSetManager - Finished task 3.0 in stage 16.0 (TID 22) in 612 ms on localhost (1/5)
> 2015-02-25 20:39:55.825 GMT-0800 INFO  [Executor task launch worker-10] Executor - Finished task 1.0 in stage 16.0 (TID 20). 2492 bytes result sent to driver
> 2015-02-25 20:39:55.825 GMT-0800 INFO  [Executor task launch worker-8] Executor - Finished task 2.0 in stage 16.0 (TID 21). 2492 bytes result sent to driver
> 2015-02-25 20:39:55.831 GMT-0800 INFO  [task-result-getter-0] TaskSetManager - Finished task 1.0 in stage 16.0 (TID 20) in 670 ms on localhost (2/5)
> 2015-02-25 20:39:55.836 GMT-0800 INFO  [task-result-getter-1] TaskSetManager - Finished task 2.0 in stage 16.0 (TID 21) in 67

Re: Google Summer of Code - ideas

2015-02-28 Thread Manoj Kumar
Hi,

Thanks a lot.
Yes indeed, I am interested. I shall start looking at all the related
JIRAs in a while.



-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com

http://github.com/MechCoder


Re: How to create a Row from a List or Array in Spark using Scala

2015-02-28 Thread DEVAN M.S.
  It's there in the Scala API: Row.fromSeq(array). I don't know much
about the Java API.
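
For example, a quick sketch (the values list here is just a stand-in):

import org.apache.spark.sql.catalyst.expressions.Row
import java.util.{ArrayList => JavaArrayList}
import scala.collection.JavaConverters._

val values: JavaArrayList[Any] = new JavaArrayList()
values.add("a")
values.add(1)
// Build the Row from the whole collection instead of calling get(index).
val row = Row.fromSeq(values.asScala.toSeq)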



Devan M.S. | Research Associate | Cyber Security | AMRITA VISHWA
VIDYAPEETHAM | Amritapuri | Cell +919946535290 |


On Sat, Feb 28, 2015 at 1:28 PM, r7raul1...@163.com 
wrote:

> import org.apache.spark.sql.catalyst.expressions._
> import java.util.{ArrayList => JavaArrayList}
>
> val values: JavaArrayList[Any] = new JavaArrayList()
> // Using get(index) like this is not good. How can I create a Row from a
> // List or Array in Spark using Scala?
> val computedValues = Row(values.get(0), values.get(1))
>
>
>
> r7raul1...@163.com
>


How to create a Row from a List or Array in Spark using Scala

2015-02-28 Thread r7raul1...@163.com
import org.apache.spark.sql.catalyst.expressions._
import java.util.{ArrayList => JavaArrayList}

val values: JavaArrayList[Any] = new JavaArrayList()
// Using get(index) like this is not good. How can I create a Row from a
// List or Array in Spark using Scala?
val computedValues = Row(values.get(0), values.get(1))



r7raul1...@163.com