Looking for contributors to the build task: use errorprone

2015-05-29 Thread Reynold Xin
If somebody has some free cycles, we'd greatly appreciate some investigation and a patch to integrate Google's error-prone with Spark's Maven build. https://issues.apache.org/jira/browse/SPARK-7938

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Actually, the Scala API too only accepts a column name. On Fri, May 29, 2015 at 11:23, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi, testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as its input type:

Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Hi, testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as its input type: .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no) File /usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py, line
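The fix the thread is asking for amounts to type dispatch inside drop(). A minimal pure-Python sketch of accepting either a name or a Column — the `Column` class and `drop` function here are simplified stand-ins, not the real PySpark API:

```python
# Simplified stand-ins, not the real PySpark classes.
class Column:
    def __init__(self, name):
        self.name = name

def drop(columns, col):
    # Accept either a string name or a Column instance.
    name = col.name if isinstance(col, Column) else col
    return [c for c in columns if c != name]

print(drop(["pol_no", "age"], Column("pol_no")))  # ['age']
print(drop(["pol_no", "age"], "pol_no"))          # ['age']
```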

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-29 Thread Peter Rudenko
Hi Yin, I’m using the spark-hive dependency, and the tests for my app work with Spark 1.3.1; it seems to be something with Hive and sbt. Running the following statement from spark-shell works, but from the sbt console in RC3 I get this error: scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) 15/05/29

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-29 Thread Josh Rosen
Hey, want to file a JIRA for this? This will make it easier to track progress on this issue. Definitely upload the profiler screenshots there, too, since that's helpful information. https://issues.apache.org/jira/browse/SPARK On Wed, May 27, 2015 at 11:12 AM, Nitin Goyal

Re: ClosureCleaner slowing down Spark SQL queries

2015-05-29 Thread Yin Huai
For Spark SQL internal operations, we can probably just create a MapPartitionsRDD directly (like https://github.com/apache/spark/commit/5287eec5a6948c0c6e0baaebf35f512324c0679a ). On Fri, May 29, 2015 at 11:04 AM, Josh Rosen rosenvi...@gmail.com wrote: Hey, want to file a JIRA for this? This
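The reason bypassing the ClosureCleaner helps is that cleaning only exists to strip accidental captures out of closures; a function that captures nothing needs no cleaning at all. A pure-Python analogy of the two situations (this is not Spark code, just an illustration of closure capture):

```python
# Pure-Python analogy: a closure defined inside a method that touches
# `self` drags the whole enclosing object along; a standalone function
# captures nothing, so there is nothing to clean or traverse.
class Driver:
    def __init__(self):
        self.big_state = list(range(1_000_000))

    def task(self):
        # References self, so the lambda captures the entire Driver.
        return lambda x: x + len(self.big_state)

def clean_task():
    return lambda x: x + 1  # captures nothing

f = Driver().task()
g = clean_task()
print(f.__closure__ is not None)  # True: the Driver is captured
print(g.__closure__)              # None: nothing captured
```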

[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-29 Thread Patrick Wendell
Thanks for all the discussion on the vote thread. I am canceling this vote in favor of RC3. On Sun, May 24, 2015 at 12:22 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc2 (commit

[VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Taka Shinagawa
Mike, the broken Configuration link can be fixed by adding the missing dash '-' to the first line of docs/configuration.md and running 'jekyll build'. https://github.com/apache/spark/pull/6513 On Fri, May 29, 2015 at 6:38 PM, Mike Ringenburg mik...@cray.com wrote: The Configuration link on the
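The failure mode here is easy to check mechanically: Jekyll only treats a file as having front matter if it opens with a `---` line, so a dropped dash silently disables the page's layout and links. A small sketch of that check — the helper below is hypothetical, not something from the Spark repo:

```python
def has_valid_front_matter(text):
    # Jekyll front matter must open with a line of exactly three dashes
    # and be closed by another one; a missing dash disables it silently.
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return False
    return "---" in lines[1:]

broken = "--\nlayout: global\n---\n# Configuration"
fixed = "---\nlayout: global\n---\n# Configuration"
print(has_valid_front_matter(broken), has_valid_front_matter(fixed))  # False True
```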

StreamingContextSuite fails with NoSuchMethodError

2015-05-29 Thread Ted Yu
Hi, I ran the following command on 1.4.0 RC3: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package I saw the following failure: StreamingContextSuite: - from no conf constructor - from no conf + spark home - from no conf + spark home + env

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Mike Ringenburg
The Configuration link on the docs appears to be broken. Mike On May 29, 2015, at 4:41 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit

Saving DataFrame in Tachyon

2015-05-29 Thread sara mustafa
Hi All, I have Spark 1.3.0 and Tachyon 0.5.0. Saving an RDD in Tachyon succeeds, but saving a DataFrame fails with the following error: java.lang.IllegalArgumentException: Wrong FS: tachyon://localhost:19998/myres, expected: hdfs://localhost:54310 at
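The "Wrong FS" error comes from Hadoop comparing the scheme of the supplied path against the filesystem asked to handle it. The standard-library sketch below reproduces just that comparison; the values come from the error message above, and the check itself is a simplification of what Hadoop's FileSystem path validation does, not the actual implementation:

```python
from urllib.parse import urlparse

expected_scheme = "hdfs"  # the scheme of the resolved default filesystem
path = "tachyon://localhost:19998/myres"

scheme = urlparse(path).scheme
if scheme != expected_scheme:
    # This is where Hadoop raises IllegalArgumentException("Wrong FS: ...").
    print(f"Wrong FS: {path}, expected: {expected_scheme}://...")
```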

Using UDFs in Java without registration

2015-05-29 Thread Justin Uang
I would like to define a UDF in Java via a closure and then use it without registration. In Scala, I believe there are two ways to do this: myUdf = functions.udf({ _ + 5 }) myDf.select(myUdf(myDf("age"))) or myDf.select(functions.callUDF({ _ + 5 }, DataTypes.IntegerType, myDf("age")))
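Stripped of Spark specifics, the two usage styles differ only in whether the function travels by name through a registry or as a value. A plain-Python sketch of that distinction (nothing here is PySpark or Spark SQL API):

```python
# Registered style: the UDF is stored under a name and looked up later.
registry = {}

def register(name, fn):
    registry[name] = fn

def add_five(x):
    return x + 5

register("addFive", add_five)
print(registry["addFive"](1))  # 6

# Direct style: the function value is passed around, no registry involved.
print(add_five(1))  # 6
```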