Spark on YARN: only 1 or 2 vcores getting allocated to the containers being created.

2016-08-02 Thread satyajit vegesna
Hi All, I am trying to run a Spark job using YARN, and I specify the --executor-cores value as 20. But when I check the "Nodes of the cluster" page at http://hostname:8088/cluster/nodes, I see 4 containers getting created on each node in the cluster, and can only see 1 vcore getting assigned…
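A common cause, and a hedged sketch of the usual fix: by default the YARN capacity scheduler uses DefaultResourceCalculator, which schedules containers by memory alone, so the ResourceManager UI reports 1 vcore per container no matter what --executor-cores requests. Switching the calculator in capacity-scheduler.xml (a standard Hadoop property; verify against your Hadoop version) makes vcores count:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

After restarting the ResourceManager, the vcore counts on the nodes page should reflect the requested executor cores.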

Re: SQL Based Authorization for SparkSQL

2016-08-02 Thread Ted Yu
There was SPARK-12008, which was closed. Not sure if there is an active JIRA in this regard. On Tue, Aug 2, 2016 at 6:40 PM, 马晓宇 wrote: > Hi guys, > > I wonder if anyone is working on SQL-based authorization already or not. > > This is something we need badly right now, and we tried to embed a > Hive…

SQL Based Authorization for SparkSQL

2016-08-02 Thread 马晓宇
Hi guys, I wonder if anyone is working on SQL-based authorization already or not. This is something we need badly right now, and we tried to embed a Hive frontend in front of SparkSQL to achieve this, but it's not quite an elegant solution. If SparkSQL has a way to do it, or anyone already work…

Re: AccumulatorV2 += operator

2016-08-02 Thread Holden Karau
I believe it was intentional, with the idea that it would be more unified between the Java and Scala APIs. If you're talking about the javadoc mention in https://github.com/apache/spark/pull/14466/files - I believe the += is meant to refer to what the internal implementation of the add function can be for…

AccumulatorV2 += operator

2016-08-02 Thread Bryan Cutler
It seems like the += operator is missing from the new accumulator API, although the docs still make reference to it. Does anyone know if it was intentionally not put in? I'm happy to do a PR for it or update the docs to just use the add() method; I just want to check if there was some reason first. Bryan
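For reference, a minimal sketch of the add()-based usage against the Spark 2.0 API (app and accumulator names invented here):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("acc-demo").getOrCreate()
  val acc = spark.sparkContext.longAccumulator("records")   // a built-in AccumulatorV2
  spark.sparkContext.parallelize(1 to 100).foreach(_ => acc.add(1))  // add(), not +=
  println(acc.value)  // 100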

Graph edge type pattern matching in GraphX

2016-08-02 Thread Ulanov, Alexander
Dear Spark developers, Could you suggest how to perform pattern matching on the type of the graph edge in the following scenario? I need to perform some math by means of aggregateMessages on the graph edges if the edges are Double. Here is the code: def my[VD: ClassTag, ED: ClassTag](graph: Graph[V…
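One way to branch on the (erased) edge type at runtime is to compare ClassTags and cast; a hedged sketch, with a made-up sum-of-edge-weights message since the original math is truncated above:

  import scala.reflect.{ClassTag, classTag}
  import org.apache.spark.graphx._

  def my[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) = {
    if (classTag[ED] == classTag[Double]) {
      // Safe after the tag check: view the edges as Double
      val g = graph.asInstanceOf[Graph[VD, Double]]
      g.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.attr),  // forward each edge weight downstream
        _ + _)                           // sum the incoming weights per vertex
    } else {
      graph.vertices.sparkContext.emptyRDD[(VertexId, Double)]
    }
  }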

Re: What happens in Dataset limit followed by rdd

2016-08-02 Thread Sun Rui
Spark does optimise subsequent limits, for example: scala> df1.limit(3).limit(1).explain == Physical Plan == CollectLimit 1 +- *SerializeFromObject [assertnotnull(input[0, $line14.$read$$iw$$iw$my, true], top level non-flat input object).x AS x#2] +- Scan ExternalRDDScan[obj#1] However, limit…

Re: Testing --supervise flag

2016-08-02 Thread Noorul Islam Kamal Malmiyoda
Widening to dev@spark. On Mon, Aug 1, 2016 at 4:21 PM, Noorul Islam K M wrote: > > Hi all, > > I was trying to test the --supervise flag of spark-submit. > > The documentation [1] says that the flag helps in restarting your > application automatically if it exited with a non-zero exit code. > > I am lo…
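For context, the flag is honored only in cluster deploy mode on the standalone and Mesos masters, which narrows where it can be tested. A hedged invocation sketch (host, class, and jar path are placeholders):

  spark-submit \
    --master spark://master-host:7077 \
    --deploy-mode cluster \
    --supervise \
    --class com.example.MainApp \
    /path/to/app.jar

With --supervise, the master restarts the driver if it exits with a non-zero code, so a deliberate System.exit(1) in the application is a simple way to observe the restart.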

Re: What happens in Dataset limit followed by rdd

2016-08-02 Thread Maciej Szymkiewicz
Thank you for your prompt response and great examples, Sun Rui, but I am still confused about one thing. Do you see any particular reason not to merge subsequent limits? The following case, (limit n (map f (limit m ds))), could be optimized to (map f (limit n (limit m ds))) and further to…
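A quick way to probe what the optimizer does with adjacent versus map-separated limits, sketched for a Spark 2.0 shell (sizes arbitrary):

  import spark.implicits._

  val ds = spark.range(1000)
  ds.limit(100).limit(10).explain(true)             // adjacent limits collapse to one
  ds.limit(100).map(_ + 1).limit(10).explain(true)  // limits split by a map stay separate

Comparing the optimized logical plans shows whether a rule such as CombineLimits fired in each case.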

Re: What happens in Dataset limit followed by rdd

2016-08-02 Thread Sun Rui
Based on your code, here is a simpler test case on Spark 2.0: case class my(x: Int) val rdd = sc.parallelize(0.until(1), 1000).map { x => my(x) } val df1 = spark.createDataFrame(rdd) val df2 = df1.limit(1) df1.map { r => r.getAs[Int](0) }.first df2.map { r => r.getAs[Int](0) }.first // Much slower…
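A runnable reconstruction of that test, assuming the parallelized range in the truncated preview was large (the 1,000,000 below is a guess):

  import spark.implicits._

  case class my(x: Int)
  val rdd = sc.parallelize(0 until 1000000, 1000).map(x => my(x))
  val df1 = spark.createDataFrame(rdd)
  val df2 = df1.limit(1)
  df1.map(r => r.getAs[Int](0)).first  // fast: stops after the first rows it finds
  df2.map(r => r.getAs[Int](0)).first  // much slower: the global limit funnels data into one partition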

Re: [MLlib] Term Frequency in TF-IDF seems incorrect

2016-08-02 Thread Nick Pentreath
Note that both HashingTF and CountVectorizer are usually used for creating TF-IDF normalized vectors. The definition (https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Definition) of term frequency in TF-IDF is actually the "number of times the term occurs in the document". So it's perhaps a bit of a…
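To illustrate the raw-count definition, a minimal spark.ml sketch (toy documents invented here): HashingTF emits raw term-occurrence counts, and IDF then applies the weighting.

  import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

  val docs = spark.createDataFrame(Seq(
    (0, "spark spark shuffle"),
    (1, "graphx pregel")
  )).toDF("id", "text")

  val words = new Tokenizer().setInputCol("text").setOutputCol("words").transform(docs)
  val tf = new HashingTF().setInputCol("words").setOutputCol("rawTF").transform(words)
  // "spark" occurs twice in doc 0, so its hashed slot holds 2.0: a raw count, not a frequency
  val idfModel = new IDF().setInputCol("rawTF").setOutputCol("tfidf").fit(tf)
  idfModel.transform(tf).select("tfidf").show(false)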