Re: Dataset API agg question

2016-06-07 Thread Reynold Xin
Take a look at the implementation of typed sum/avg (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scalalang/typed.scala); you can implement a typed max/min. On Tue, Jun 7, 2016 at 4:31 PM, Alexander Pivovarov wrote: >
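
For reference, a minimal sketch (not from the thread) of what a typed max aggregator modeled on those helpers might look like in Spark 2.0; the TypedMax name and the Int-valued extractor are illustrative:

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    // Typed aggregator that keeps the maximum of an Int extracted from each input row.
    class TypedMax[IN](f: IN => Int) extends Aggregator[IN, Int, Int] {
      def zero: Int = Int.MinValue                         // identity element for max
      def reduce(b: Int, a: IN): Int = math.max(b, f(a))   // fold one input value into the buffer
      def merge(b1: Int, b2: Int): Int = math.max(b1, b2)  // combine partial maxima across partitions
      def finish(reduction: Int): Int = reduction
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }

    // Usage on a Dataset[(Int, Int)], taking the max of the second element per key:
    // ds.groupByKey(_._1).agg(new TypedMax[(Int, Int)](_._2).toColumn)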

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
Please go ahead. On Tue, Jun 7, 2016 at 4:45 PM, franklyn wrote: > Thanks for reproducing it, Ted. Should I make a JIRA issue? >

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread franklyn
Thanks for reproducing it, Ted. Should I make a JIRA issue? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Can-t-use-UDFs-with-Dataframes-in-spark-2-0-preview-scala-2-10-tp17845p17852.html Sent from the Apache Spark Developers List mailing list

Re: Dataset API agg question

2016-06-07 Thread Alexander Pivovarov
Ted, it does not work like that; you have to .map(toAB).toDS. On Tue, Jun 7, 2016 at 4:07 PM, Ted Yu wrote: > Have you tried the following? > > Seq(1->2, 1->5, 3->6).toDS("a", "b") > > then you can refer to columns by name. > > FYI > > > On Tue, Jun 7, 2016 at 3:58 PM,

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
I built with Scala 2.10. >>> df.select(add_one(df.a).alias('incremented')).collect() The above just hung. On Tue, Jun 7, 2016 at 3:31 PM, franklyn wrote: > Thanks, Ted! > > I'm using > >

Re: Dataset API agg question

2016-06-07 Thread Ted Yu
Have you tried the following? Seq(1->2, 1->5, 3->6).toDS("a", "b") Then you can refer to columns by name. FYI On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov wrote: > I'm trying to switch from RDD API to Dataset API > My question is about reduceByKey method > >

Dataset API agg question

2016-06-07 Thread Alexander Pivovarov
I'm trying to switch from the RDD API to the Dataset API. My question is about the reduceByKey method, e.g. in the following example I'm trying to rewrite sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10) using the DS API. This is what I have so far: Seq(1->2, 1->5,
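
For context, a minimal sketch (assuming a SparkSession named spark with spark.implicits._ in scope; not the poster's actual attempt) of one way the RDD line above can be expressed with the Dataset API, using groupByKey plus reduceGroups in place of reduceByKey:

    import spark.implicits._

    val ds = Seq(1 -> 2, 1 -> 5, 3 -> 6).toDS()            // Dataset[(Int, Int)]

    val maxPerKey = ds
      .groupByKey(_._1)                                     // key by the first element
      .reduceGroups((a, b) => if (a._2 >= b._2) a else b)   // keep the pair with the larger value
      .map { case (k, (_, v)) => k -> v }                   // drop the duplicated key from the result

    maxPerKey.take(10)                                      // expected (in some order): (1,5), (3,6)

A typed aggregator plugged into groupByKey(...).agg(...), along the lines of the max/min sketch above, is another option.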

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread franklyn
Thanks, Ted! I'm using https://github.com/apache/spark/commit/8f5a04b6299e3a47aca13cbb40e72344c0114860 and building with scala-2.10. I can confirm that it works with scala-2.11.

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Ted Yu
With commit 200f01c8fb15680b5630fbd122d44f9b1d096e02 using Scala 2.11:
Using Python version 2.7.9 (default, Apr 29 2016 10:48:06)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import IntegerType, StructField, StructType
>>> from

Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

2016-06-07 Thread Franklyn D'souza
I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following:
./dev/change-version-to-2.10.sh
./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6 -Pyarn -Phive
and then ran the following code in a pyspark

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Shivaram Venkataraman
As far as I know, the process is just to copy docs/_site from the build to the appropriate location in the SVN repo (i.e. site/docs/2.0.0-preview). Thanks Shivaram On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen wrote: > As a stop-gap, I can edit that page to have a small section

Standalone Cluster Mode: how does spark allocate spark.executor.cores?

2016-06-07 Thread ElfoLiNk
Hi, I'm trying to find how and where Spark allocates cores per executor in the source code. Is it possible to control the allocated cores programmatically in standalone cluster mode? Regards, Matteo
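
In standalone mode the per-executor core count is driven by configuration (spark.executor.cores, capped overall by spark.cores.max), and, as far as I recall, the allocation logic itself lives in the standalone Master (scheduleExecutorsOnWorkers/startExecutorsOnWorkers in Master.scala). A minimal sketch of setting these programmatically; the master URL and numbers are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("cores-demo")
      .setMaster("spark://master-host:7077")   // hypothetical standalone master URL
      .set("spark.executor.cores", "2")        // cores given to each executor
      .set("spark.cores.max", "8")             // total cores this application may claim

    val sc = new SparkContext(conf)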

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Tom Graves
Thanks Sean, you were right, a hard refresh made it show up. Seems like we should at least link to the preview docs from http://spark.apache.org/documentation.html. Tom On Tuesday, June 7, 2016 10:04 AM, Sean Owen wrote: It's there (refresh maybe?). See the end of the

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Sean Owen
As a stop-gap, I can edit that page to have a small section about preview releases and point to the nightly docs. Not sure who has the power to push 2.0.0-preview to site/docs, but if that's done then we can symlink "preview" in that dir to it and be done, and update this section about preview

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Sean Owen
It's there (refresh maybe?). See the end of the downloads dropdown. For the moment you can see the docs in the nightly docs build: https://home.apache.org/~pwendell/spark-nightly/spark-branch-2.0-docs/latest/ I don't know what the best way to put this into the main site is; under a /preview

Re: Welcoming Yanbo Liang as a committer

2016-06-07 Thread Xiangrui Meng
Congrats!! On Mon, Jun 6, 2016, 8:12 AM Gayathri Murali wrote: > Congratulations Yanbo Liang! Well deserved. > > > On Sun, Jun 5, 2016 at 7:10 PM, Shixiong(Ryan) Zhu < > shixi...@databricks.com> wrote: > >> Congrats, Yanbo! >> >> On Sun, Jun 5, 2016 at 6:25 PM,

streaming JobScheduler and error handling confusing behavior

2016-06-07 Thread Krot Viacheslav
Hi, I don't know if it is a bug or a feature, but one thing in streaming error handling seems confusing to me: I create a streaming context, start it, and call #awaitTermination like this: try { ssc.awaitTermination(); } catch (Exception e) { LoggerFactory.getLogger(getClass()).error("Job failed.
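
For context, a minimal Scala sketch of the pattern being described (the poster's code is Java and truncated above; the socket source and print job here are placeholders), where the confusing part is whether a failed job actually surfaces as an exception out of awaitTermination:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("streaming-error-demo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    ssc.socketTextStream("localhost", 9999)      // placeholder input stream
       .foreachRDD(rdd => rdd.foreach(println))  // placeholder job body

    ssc.start()
    try {
      ssc.awaitTermination()                     // blocks; job errors are expected to be rethrown here
    } catch {
      case e: Exception =>
        // the thread is about whether/when execution actually reaches this branch
        println(s"Job failed: ${e.getMessage}")
        ssc.stop(stopSparkContext = true, stopGracefully = false)
    }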