Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-10 Thread Michael Armbrust
-1 (binding)

We were just alerted to a pretty serious regression since 1.3.0 (
https://issues.apache.org/jira/browse/SPARK-6851).  Should have a fix
shortly.

Michael

On Fri, Apr 10, 2015 at 6:10 AM, Corey Nolet cjno...@gmail.com wrote:

 +1 (non-binding)

 - Verified signatures
 - built on Mac OSX
 - built on Fedora 21

 All builds were done using profiles: hive, hive-thriftserver, hadoop-2.4,
 yarn

 +1 tested ML-related items on Mac OS X

 On Wed, Apr 8, 2015 at 7:59 PM, Krishna Sankar ksanka...@gmail.com
 wrote:

  +1 (non-binding, of course)
 
  1. Compiled OSX 10.10 (Yosemite) OK Total time: 14:16 min
   mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
  -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
  2. Tested pyspark, MLlib - ran the tests and compared results with 1.3.0
 pyspark works well with the new iPython 3.0.0 release
  2.1. statistics (min,max,mean,Pearson,Spearman) OK
  2.2. Linear/Ridge/Lasso Regression OK
  2.3. Decision Tree, Naive Bayes OK
  2.4. KMeans OK
 Center And Scale OK
  2.5. RDD operations OK
State of the Union Texts - MapReduce, Filter,sortByKey (word count)
  2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
 Model evaluation/optimization (rank, numIter, lambda) with itertools OK
  3. Scala - MLlib
  3.1. statistics (min,max,mean,Pearson,Spearman) OK
  3.2. LinearRegressionWithSGD OK
  3.3. Decision Tree OK
  3.4. KMeans OK
  3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
  4.0. Spark SQL from Python OK
  4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
 
  On Tue, Apr 7, 2015 at 10:46 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Please vote on releasing the following candidate as Apache Spark version
   1.3.1!
  
   The tag to be voted on is v1.3.1-rc2 (commit 7c4473a):
   https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7c4473aa5a7f5de0323394aaedeefbf9738e8eb5
  
   The list of fixes present in this release can be found at:
   http://bit.ly/1C2nVPY
  
   The release files, including signatures, digests, etc. can be found at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/orgapachespark-1083/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~pwendell/spark-1.3.1-rc2-docs/
  
   The patches on top of RC1 are:
  
   [SPARK-6737] Fix memory leak in OutputCommitCoordinator
   https://github.com/apache/spark/pull/5397
  
   [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py
   https://github.com/apache/spark/pull/5302
  
   [SPARK-6205] [CORE] UISeleniumSuite fails for Hadoop 2.x test with
   NoClassDefFoundError
   https://github.com/apache/spark/pull/4933
  
   Please vote on releasing this package as Apache Spark 1.3.1!
  
   The vote is open until Saturday, April 11, at 07:00 UTC and passes
   if a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 1.3.1
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
  
 



Guidance for becoming Spark contributor

2015-04-10 Thread Nitin Mathur
Hi Spark Dev Team,

I want to start contributing to Spark. This will be my first open source
contribution.

It would be great to get some guidance on where to start.

Thanks,
- Nitin


Re: Guidance for becoming Spark contributor

2015-04-10 Thread Nicholas Chammas
Have you reviewed this guide?

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Nick

On Fri, Apr 10, 2015 at 7:29 PM Nitin Mathur ntnmat...@gmail.com wrote:

 Hi Spark Dev Team,

 I want to start contributing to Spark. This will be my first open source
 contribution.

 It would be great to get some guidance on where to start.

 Thanks,
 - Nitin



Query regarding inferring data types in pyspark

2015-04-10 Thread Suraj Shetiya
Hi,

In pyspark, when I read a JSON file using sqlContext, the date field is not
inferred as a date; it is read as a string instead. When I try to convert it
using df.withColumn("DateCol", df.DateCol.cast("timestamp")), the value is not
parsed successfully and a null is inserted instead. Should I use a UDF to
convert the date? Is this expected behaviour (not throwing an error when
fields fail to cast)?

-- 
Regards,
Suraj
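
One possible way to approach the UDF route asked about above, offered as a
minimal sketch rather than a confirmed answer from the thread: parse the
string in plain Python, try a few formats, and return None when nothing
matches, which mirrors the null that the failed cast produces. The helper
name parse_timestamp, the format list, and the output column name
DateColParsed are all hypothetical; the actual date format in the JSON file
is not given in the post.

```python
from datetime import datetime

# Hypothetical formats; adjust to whatever the JSON file actually contains.
CANDIDATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y")


def parse_timestamp(value, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first matching format, or None if none match."""
    if value is None:
        return None
    text = value.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


# Plain-Python check of the helper, no Spark needed.
samples = ["2015-04-10 19:29:00", "2015-04-10", "04/10/2015", "not a date", None]
parsed = [parse_timestamp(s) for s in samples]

# In pyspark the helper could be wrapped as a UDF (assumed usage, not run here):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import TimestampType
#   to_ts = udf(parse_timestamp, TimestampType())
#   df = df.withColumn("DateColParsed", to_ts(df.DateCol))
```

Writing the result to a separate column keeps the original string available,
so rows where parsing returned None can be inspected afterwards.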