Re: Reply: Spark-submit Problems

2016-10-15 Thread Tobi Bosede
Hi Mekal, thanks for wanting to help. I have attached the python script as well as the different exceptions here. I have also pasted the cluster exception below so I can highlight the relevant parts. [abosede2@badboy ~]$ spark-submit --master spark://10.160.5.48:7077 trade_data_count.py Ivy

Re: Reply: Spark-submit Problems

2016-10-15 Thread Mekal Zheng
Show me your code. On October 16, 2016 at 08:24 +0800, hxfeng <980548...@qq.com> wrote: > show you pi.py code and what is the exception message? > > > -- Original Message -- > From: "Tobi Bosede";; > Sent: Sunday, October 16, 2016, 8:04 AM > To: "user"; > >

Aggregate UDF (UDAF) in Python

2016-10-15 Thread Tobi Bosede
Hello, I am trying to use a UDF that calculates the inter-quartile range (IQR) for pivot() and SQL in pyspark, and in both scenarios got the error that my function wasn't an aggregate function. Does anyone know if UDAF functionality is available in Python? If not, what can I do as a workaround?
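Defining a true UDAF from Python is not supported in these Spark versions; a common workaround is to express the aggregate with built-in SQL functions such as percentile_approx instead of a Python function. The sketch below shows the quantity being computed in pure Python (linear-interpolation quartiles) and, as a string, the kind of Spark SQL that avoids the UDAF entirely; the column and table names (price, grp, trades) are hypothetical.

```python
# IQR = Q3 - Q1. Pure-Python reference for what the aggregate should compute,
# using linear interpolation between order statistics.
def iqr(values):
    xs = sorted(values)

    def quantile(q):
        pos = (len(xs) - 1) * q
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        return xs[lo] * (1 - frac) + xs[hi] * frac

    return quantile(0.75) - quantile(0.25)

# In Spark SQL the same aggregate can be written with built-in functions,
# so no Python UDAF is needed (hypothetical column/table names):
SPARK_SQL_WORKAROUND = """
SELECT grp,
       percentile_approx(price, 0.75) - percentile_approx(price, 0.25) AS iqr
FROM trades
GROUP BY grp
"""

print(iqr([1, 2, 3, 4, 5, 6, 7, 8]))  # 3.5
```

percentile_approx is approximate; for exact quartiles on small groups, collecting the group and computing in the driver (as above) is an option, at the cost of scalability.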

Reply: Spark-submit Problems

2016-10-15 Thread hxfeng
show you pi.py code and what is the exception message? -- Original Message -- From: "Tobi Bosede";; Sent: Sunday, October 16, 2016, 8:04 AM To: "user"; Subject: Spark-submit Problems Hi everyone, I am having

Spark-submit Problems

2016-10-15 Thread Tobi Bosede
Hi everyone, I am having problems submitting an app through spark-submit when the master is not "local". However the pi.py example which comes with Spark works with any master. I believe my script has the same structure as pi.py, but for some reason my script is not as flexible. Specifically, the

NoClassDefFoundError: org/apache/spark/Logging in SparkSession.getOrCreate

2016-10-15 Thread Brad Cox
I'm experimenting with Spark 2.0.1 for the first time and hitting a problem right out of the gate. My main routine starts with this which I think is the standard idiom. SparkSession sparkSession = SparkSession .builder()
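org.apache.spark.Logging was removed from Spark's public API in 2.0, so this NoClassDefFoundError usually means some artifact on the classpath was compiled against Spark 1.x (for example an old connector or a stale spark-assembly). A sketch of the kind of dependency pinning that avoids mixing versions, assuming a Maven build and Scala 2.11 artifacts (the exact dependency set is illustrative):

```xml
<!-- All Spark artifacts must share the same version and Scala suffix. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.1</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.1</version>
  <scope>provided</scope>
</dependency>
```

Running `mvn dependency:tree` and checking for any org.apache.spark artifact at a 1.x version is a quick way to confirm the mismatch.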

reading files with .list extension

2016-10-15 Thread Hafiz Mujadid
hi, I want to load files with the .list extension, such as actors.list.gz here, in apache-spark. Can anybody please suggest the Hadoop input format for such files? Thanks
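No special input format should be needed: Hadoop's TextInputFormat (what sc.textFile uses) picks the compression codec from the .gz suffix and ignores the rest of the name, so the .list part is irrelevant. A minimal sketch of what happens under the hood, using Python's gzip module in place of the Hadoop codec (the file name is made up):

```python
import gzip
import os
import tempfile

# Simulate an "actors.list.gz" file: plain text lines, gzip-compressed.
# In PySpark this would simply be: sc.textFile("actors.list.gz")
# Caveat: gzip is not splittable, so Spark reads the whole file as one
# partition; repartition() after loading if you need parallelism.
path = os.path.join(tempfile.mkdtemp(), "actors.list.gz")
with gzip.open(path, "wt") as f:
    f.write("Actor One\nActor Two\n")

# Reading it back line by line, as TextInputFormat would:
with gzip.open(path, "rt") as f:
    lines = f.read().splitlines()

print(lines)  # ['Actor One', 'Actor Two']
```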

Re: mllib model in production web API

2016-10-15 Thread Nicolas Long
Hi Sean and Aseem, thanks both. A simple thing which sped things up greatly was simply to load our sql (for one record effectively) directly and then convert to a dataframe, rather than using Spark to load it. Sounds stupid, but this took us from > 5 seconds to ~1 second on a very small instance.

Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-15 Thread codlife
Hi: I have doubts about the design of spark.read.json: why is the expected input not a standard json file? Can anyone tell me the internal reason? Any advice is appreciated. -- View this message in context:
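The reason is splittability: with one complete JSON object per line ("JSON Lines"), each executor can take an arbitrary byte range of the file, snap to newline boundaries, and parse its lines independently, whereas a standard pretty-printed JSON document has no safe split points without parsing the whole file. A minimal sketch of the format, using only the standard json module:

```python
import json

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# What spark.read.json expects: one self-contained JSON object per line.
# Each line parses on its own, so the file can be processed in parallel
# chunks split at newlines, with no global parse needed.
jsonl = "\n".join(json.dumps(r) for r in records)

# A multi-line JSON array, by contrast, spreads a single record across
# many lines, so splitting at an arbitrary newline would break parsing.
parsed = [json.loads(line) for line in jsonl.splitlines()]
print(parsed == records)  # True
```

Later Spark releases added a multiLine option to read standard JSON documents, at the cost of treating each file as a single unsplittable record.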

Re: import sql.implicits._

2016-10-15 Thread Jakub Dubovsky
Hey Koert, thanks for the explanation. I did not recall that every rdd/df/dataset has a "parent" context/sqlContext. When I think about it, this kinda makes sense. J. On Fri, Oct 14, 2016 at 11:54 PM, Koert Kuipers wrote: > about the stackoverflow question, do this: > > def