Re: Contributing to PySpark

2016-10-18 Thread Holden Karau
Hi Krishna,

Thanks for your interest in contributing to PySpark! I don't personally use
either of those IDEs, so I'll leave that part for someone else to answer -
but in general you can find the Building Spark documentation at
http://spark.apache.org/docs/latest/building-spark.html, which includes
notes on how to run the Python tests as well. You will also probably want
to check out the contributing to Spark guide at
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.
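
For the "how do I test my changes" part, here is a minimal sketch of what
a local PySpark test can look like (illustrative only: it assumes you have
built Spark and that pyspark is importable, e.g. with SPARK_HOME set and
python/ on your PYTHONPATH; the test class and assertion are made up for
the example):

    import unittest

    from pyspark import SparkContext

    class CountByValueTest(unittest.TestCase):
        def setUp(self):
            # Run Spark locally with two worker threads; no cluster needed.
            self.sc = SparkContext("local[2]", "pyspark-sanity-test")

        def tearDown(self):
            self.sc.stop()

        def test_count_by_value(self):
            rdd = self.sc.parallelize(["a", "b", "a"])
            counts = rdd.countByValue()
            self.assertEqual(counts["a"], 2)

    if __name__ == "__main__":
        unittest.main()

The full Python suite can also be run with the python/run-tests script in
the Spark source tree, as described on the building-spark page above.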

Cheers,

Holden :)

On Tue, Oct 18, 2016 at 2:16 AM, Krishna Kalyan <krishnakaly...@gmail.com>
wrote:

> Hello,
> I am a master's student. Could someone please let me know how to set up my
> development environment to contribute to PySpark?
> The questions I had were:
> a) Should I use IntelliJ IDEA or PyCharm?
> b) How do I test my changes?
>
> Regards,
> Krishna
>



-- 
Cell: 425-233-8271
Twitter: https://twitter.com/holdenkarau


Contributing to PySpark

2016-10-18 Thread Krishna Kalyan
Hello,
I am a master's student. Could someone please let me know how to set up my
development environment to contribute to PySpark?
The questions I had were:
a) Should I use IntelliJ IDEA or PyCharm?
b) How do I test my changes?

Regards,
Krishna


Re: Contributing to pyspark

2015-06-12 Thread Manoj Kumar
1. Yes, because the issues are tracked in JIRA.
2. Nope (at least as far as MLlib is concerned), because most of it is
just wrappers around the underlying Scala functions or methods, not
implemented in pure Python.
3. I'm not sure about this. It seems to work fine for me!
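
To make point 2 concrete: if you open python/pyspark/mllib/clustering.py
you can see that the training call is just forwarded through Py4J to the
Scala implementation on the JVM. From the user's side it still looks like
plain Python (a small sketch against the public API; the data and
parameters here are arbitrary):

    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext("local[2]", "wrapper-demo")
    points = sc.parallelize([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])

    # KMeans.train is a thin Python wrapper: the clustering itself runs in
    # Scala on the JVM; Python only serializes the arguments and results.
    model = KMeans.train(points, k=2, maxIterations=10)
    print(model.clusterCenters)

    sc.stop()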

HTH

On Fri, Jun 12, 2015 at 10:41 AM, Usman Ehtesham Gul uehtesha...@gmail.com
wrote:

 Hello Manoj,

 First of all, thank you for the quick reply. Just a couple more things.
 I have started reading the link you provided; I will definitely filter the
 JIRA issues by the PySpark label.

 Can you verify:

 1) We fork from GitHub, right? I ask because on GitHub I see it's a mirror
 and there is no Issues section. I am assuming that is because issues are
 tracked in JIRA.
 2) To contribute to PySpark, we will have to clone the whole project. But
 if our changes/contributions are only specific to PySpark, we can make those
 without relying on core Spark and the other client libraries, right?
 3) I think the email u...@spark.apache.org is broken. I am getting mail
 from mailer-dae...@apache.org saying that email could not be delivered to
 this address. Can you check this?

 Thank you again. Hope to hear from you soon.

 Usman

 On Jun 12, 2015, at 12:57 AM, Manoj Kumar manojkumarsivaraj...@gmail.com
 wrote:

 Hi,

 Thanks for your interest in PySpark.

 The first thing is to have a look at the how-to-contribute guide
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
 and filter the JIRAs using the label PySpark.

 If you have your own improvement in mind, you can file a JIRA, discuss
 it, and then send a pull request.

 HTH

 Regards.

 On Fri, Jun 12, 2015 at 9:36 AM, Usman Ehtesham uehtesha...@gmail.com
 wrote:

 Hello,

 I am currently taking a course in Apache Spark via edX (
 https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
 and at the same time I am trying to read the code for PySpark too. I wanted
 to ask: if I would like to contribute to PySpark specifically, how can I do
 that? I do not intend to contribute to core Apache Spark any time soon
 (mainly because I do not know Scala), but I am very comfortable in Python.

 Any tips on how to contribute specifically to PySpark without being
 affected by other parts of Spark would be greatly appreciated.

 P.S.: I ask this because there is a small change/improvement I would like
 to propose. Also, since I just started learning Spark, I would like to
 read and understand the PySpark code as I learn about Spark. :)

 Hope to hear from you soon.

 Usman Ehtesham Gul
 https://github.com/ueg1990




 --
 Godspeed,
 Manoj Kumar,
 http://manojbits.wordpress.com
 http://github.com/MechCoder





-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder


Re: Contributing to pyspark

2015-06-11 Thread Manoj Kumar
Hi,

Thanks for your interest in PySpark.

The first thing is to have a look at the how-to-contribute guide
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and
filter the JIRAs using the label PySpark.

If you have your own improvement in mind, you can file a JIRA, discuss it,
and then send a pull request.

HTH

Regards.

On Fri, Jun 12, 2015 at 9:36 AM, Usman Ehtesham uehtesha...@gmail.com
wrote:

 Hello,

 I am currently taking a course in Apache Spark via edX (
 https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
 and at the same time I am trying to read the code for PySpark too. I wanted
 to ask: if I would like to contribute to PySpark specifically, how can I do
 that? I do not intend to contribute to core Apache Spark any time soon
 (mainly because I do not know Scala), but I am very comfortable in Python.

 Any tips on how to contribute specifically to PySpark without being
 affected by other parts of Spark would be greatly appreciated.

 P.S.: I ask this because there is a small change/improvement I would like
 to propose. Also, since I just started learning Spark, I would like to
 read and understand the PySpark code as I learn about Spark. :)

 Hope to hear from you soon.

 Usman Ehtesham Gul
 https://github.com/ueg1990




-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder


Contributing to pyspark

2015-06-11 Thread Usman Ehtesham
Hello,

I am currently taking a course in Apache Spark via edX (
https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x)
and at the same time I am trying to read the code for PySpark too. I wanted
to ask: if I would like to contribute to PySpark specifically, how can I do
that? I do not intend to contribute to core Apache Spark any time soon
(mainly because I do not know Scala), but I am very comfortable in Python.

Any tips on how to contribute specifically to PySpark without being
affected by other parts of Spark would be greatly appreciated.

P.S.: I ask this because there is a small change/improvement I would like
to propose. Also, since I just started learning Spark, I would like to
read and understand the PySpark code as I learn about Spark. :)

Hope to hear from you soon.

Usman Ehtesham Gul
https://github.com/ueg1990