Maybe your master or Zeppelin server is running out of memory, and the more data
it receives, the more memory swapping it has to do. Something to check.
On Wed, May 17, 2017 at 11:14 AM -0400, "Junaid Nasir" wrote:
I have a large data set of 1B records
Thanks. It looks like they posted the release just now because it wasn't
showing before.
On Fri, May 5, 2017 at 11:04 AM -0400, "Jules Damji" wrote:
Go to this link http://spark.apache.org/downloads.html
Cheers,
Jules
Hi
The website says it is released. Where can it be downloaded?
Thanks
So what was the answer?
Original message From: Andrew Holway
Date: 1/15/17 11:37 AM (GMT-05:00) To: Marco
Mistroni Cc: Neil Jonkers , User
Subject: Re: Running Spark on EMR
Darn. I didn't respond to the list. Sorry.
On Su
Has anyone got a good guide for getting a Spark master to talk to remote workers
inside Docker containers? I followed the tips I found by searching, but it still
doesn't work. Spark 1.6.2.
I exposed all the ports and tried to set the local IP inside the container to the
host IP, but Spark complains it can't bind the UI ports.
Thanks
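In case it helps, the usual Spark 1.x workaround is to pin every port Spark would
otherwise pick at random, publish exactly those ports from the container, and
advertise addresses via SPARK_LOCAL_IP / SPARK_PUBLIC_DNS in the container's
environment. A minimal sketch; the port numbers are illustrative:

import org.apache.spark.SparkConf

// Pin the normally random ports so the container can publish them.
val conf = new SparkConf()
  .set("spark.driver.port", "7001")
  .set("spark.fileserver.port", "7002")
  .set("spark.broadcast.port", "7003")
  .set("spark.blockManager.port", "7004")
// In the container environment, something like:
//   SPARK_LOCAL_IP=<container address>
//   SPARK_PUBLIC_DNS=<host address>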
Just replying for info since it's not identical to your request but in the same
spirit.
Darren
Original message From: Chetan Khatri
Date: 1/4/17 6:34 AM (GMT-05:00) To: Lars
Albertsson Cc: user , Spark Dev List
S
Date: 9/2/16 4:03 AM (GMT-05:00)
To: Mich Talebzadeh
Cc: Jakob Odersky, ayan guha, Tal Grynbaum, darren, kant kodali, Assaf Mendelson, user
Subject: Re: Scala Vs Python
Whatever benefits you may accrue from the rapid prototyping and coding in
Python will be offset against the tim
This topic is a concern for us as well. In the data science world no one uses
native Scala or Java by choice. It's R and Python, and Python is growing. Yet
in Spark, Python is third in line for feature support, if at all.
This is why we have decoupled from Spark in our project. It's really
unfortu
This is fantastic news.
Original message
From: Paolo Patierno
Date: 7/3/16 4:41 AM (GMT-05:00)
To: user@spark.apache.org
Subject: AMQP extension for Apache Spark Streaming (messaging/IoT)
Hi all,
I'm working on an AMQP exten
Original message
From: Malcolm Lockyer
Date: 05/30/2016 10:40 PM (GMT-05:00)
To: user@spark.apache.org
Subject: Re: Spark + Kafka processing trouble
On Tue, May 31, 2016 at 1:56 PM, Darren Govoni wrote:
> So you are calling a
So you are calling a SQL query (to a single database) within a Spark operation
distributed across your workers?
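Something shaped like the sketch below, say? (The connection string, query, and
the RDD's element type are illustrative assumptions.) Each partition opens its
own connection on the worker, since a connection made on the driver can't be
serialized:

import java.sql.DriverManager

// ids is assumed to be an RDD[Int] of keys to look up.
val enriched = ids.mapPartitions { part =>
  val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/app", "user", "secret")
  val stmt = conn.prepareStatement("SELECT name FROM users WHERE id = ?")
  val out = part.map { id =>
    stmt.setInt(1, id)
    val rs = stmt.executeQuery()
    val name = if (rs.next()) rs.getString(1) else ""
    rs.close()
    (id, name)
  }.toList          // materialize before closing the connection
  conn.close()
  out.iterator
}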
Original message
From: Malcolm Lockyer
Date: 05/30/2016 9:45 PM (GMT-05:00)
To: user@spark.apache.org
Su
Hi, I have a Python egg with a __main__.py in it. I am able to execute the egg
by itself fine.
Is there a way to just submit the egg to Spark and have it run? It seems an
external .py script is needed, which would be unfortunate if true.
Thanks
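For what it's worth, my understanding is that on Spark 1.x an external driver
script is indeed needed: spark-submit wants a primary .py file, and the egg
rides along on the Python path via --py-files. A sketch with hypothetical file
names:

spark-submit --py-files myapp.egg run_myapp.py

where run_myapp.py does nothing but import the egg's entry point and call it
(e.g. from myapp.__main__ import main; main()).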
M (GMT-05:00)
To: Darren Govoni , Jules Damji ,
Joshua Sorrell
Cc: user@spark.apache.org
Subject: Re: Does pyspark still lag far behind the Scala API in terms of
features
Plenty of people get their data in Parquet, Avro, or ORC files; or from a
database; or do their initial loading of u
Dataframes are essentially structured tables with schemas. So where does the
untyped data sit before it becomes structured, if not in a traditional RDD?
For us, almost all the processing comes before there is structure to it.
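To make that concrete, a minimal sketch (the path, format, and Record case
class are illustrative): the messy text sits in a plain RDD and gets cleaned
there; a schema appears only at the very end.

import org.apache.spark.sql.SQLContext
import scala.util.Try

case class Record(id: Long, value: String)

val raw = sc.textFile("hdfs:///data/raw")     // untyped, unstructured lines
val parsed = raw.flatMap { line =>
  Try {
    val Array(id, value) = line.split(",", 2)
    Record(id.trim.toLong, value.trim)
  }.toOption                                  // silently drop malformed lines
}
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = parsed.toDF()                        // structure is imposed only here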
This might be hard to do. One generalization of this problem is
https://en.m.wikipedia.org/wiki/Longest_path_problem
Given a node (e.g. A), find the longest path. All interior relations are transitive
and can be inferred.
But finding a distributed Spark way of doing it in P time would be interesting.
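(For a DAG it is tractable. A sketch of one distributed approach with GraphX's
Pregel API; graph: Graph[_, _] is an assumed input, and on a cyclic graph this
relaxation would not converge:)

import org.apache.spark.graphx._

// Longest-path length ending at each vertex, by repeatedly relaxing edges.
val init = graph.mapVertices((_, _) => 0L)
val longest = init.pregel(0L, activeDirection = EdgeDirection.Out)(
  (id, dist, msg) => math.max(dist, msg),             // keep the longest distance seen
  triplet =>
    if (triplet.srcAttr + 1L > triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + 1L)) // extend the path by one edge
    else Iterator.empty,
  (a, b) => math.max(a, b)                            // merge parallel messages
)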
I meant to write 'last task in stage'.
Original message
From: Darren Govoni
Date: 02/16/2016 6:55 AM (GMT-05:00)
To: Abhishek Modi , user@spark.apache.org
Subject: RE: Unusually large deserialisation time
I think this is part of the bigger issue of serious deadlock conditions
occurring in Spark that many of us have posted about.
Would the task in question be the past task of a stage, by chance?
Original message
From: Abhishek Modi
From: Andy Max
Date: 02/11/2016 2:44 PM (GMT-05:00)
To: Darren Govoni
Cc: user@spark.apache.org
Subject: Re: Spark workers disconnecting on 1.5.2
No, ours are running in Docker containers spread across a few physical servers.
Databricks runs their service on AWS. Wonder if they are seeing this issue.
I see this too. Might explain some other serious problems we're having with
1.5.2.
Is your cluster in AWS?
Original message
From: Andy Max
Date: 02/11/2016 2:12 PM (GMT-05:00)
To: user@spark.apache.org
Subject: Spark w
From: "Sanders, Isaac B"
Date: 01/25/2016 8:59 AM (GMT-05:00)
To: Ted Yu
Cc: Darren Govoni , Renu Yadav , Muthu
Jayakumar , user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
Is the thread dump the stack trace you are talking about? If so, I will see if
I can
Why not deploy it, then build a custom distribution with Scala 2.11 and just
overlay it?
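If it helps, the procedure documented for Spark 1.5/1.6 is roughly the
following (the Hadoop profile depends on your cluster):

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package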
Original message
From: Nuno Santos
Date: 01/25/2016 7:38 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Re: Launching EC2 ins
if I
only run 10 MB of it, it will succeed. This suggests a serious, fundamental scaling
problem.
The workers have plenty of resources.
Original message
From: "Sanders, Isaac B"
Date: 01/24/2016 2:54 PM (GMT-05:00)
To: Ren
To: Darren Govoni , "Sanders, Isaac B"
, Ted Yu
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
Does increasing the number of partitions help? You could try something 3
times what you currently have. Another trick I used was to partition the
problem int
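Concretely, the first suggestion is a one-liner; the multiplier is illustrative:

val repartitioned = rdd.repartition(rdd.partitions.size * 3)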
Me too. I had to shrink my dataset to get it to work. For us, at least, Spark
seems to have scaling issues.
Original message
From: "Sanders, Isaac B"
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu
Cc: user@spark.apac
I've experienced this same problem. Always the last stage hangs. Indeterminate.
No errors in logs. I run Spark 1.5.2. Can't find an explanation. But it's
definitely a showstopper.
Original message
From: Ted Yu
Date: 01/2
Gotta roll your own. Look at Kafka and websockets, for example.
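A rough sketch of that route (the broker address, topic name, and the results
DStream are assumptions): push each micro-batch into a Kafka topic that a
websocket-backed dashboard subscribes to.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

results.foreachRDD { rdd =>
  rdd.foreachPartition { part =>
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)  // one producer per partition
    part.foreach(v => producer.send(new ProducerRecord("viz-topic", v.toString)))
    producer.close()
  }
}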
Original message
From: patcharee
Date: 01/20/2016 2:54 PM (GMT-05:00)
To: user@spark.apache.org
Subject: visualize data from spark streaming
Hi,
How to
I also would be interested in some best practices for making this work.
Where will the writeup be posted? On the Mesosphere website?
Original message
From: Sathish Kumaran Vairavelu
Date: 01/19/2016 7:00 PM (GMT-05:00)
To: T
What's the rationale behind that? It certainly limits the kind of flow logic we
can do in one statement.
Original message
From: David Russell
Date: 01/18/2016 10:44 PM (GMT-05:00)
To: charles li
Cc: user@spark.apache
Here's the executor trace.
Thread 58: Executor task launch worker-3 (RUNNABLE)
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:152)
java.net.SocketI
Hi,
I've had this nagging problem where a task hangs and then the
entire job hangs. Using pyspark, Spark 1.5.1.
The job output looks like this, and hangs after the last task:
..
15/12/29 17:00:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory
I'll throw a thought in here.
Dataframes are nice if your data is uniform and clean, with a consistent schema.
However, in many big data problems this is seldom the case.
Original message
From: Chris Fregly
Date: 12/28/2015
I use Python too. I'm actually surprised it's not the primary language, since it
is by far more used in data science than Java and Scala combined.
If I had a second choice of scripting language for general apps, I'd want Groovy
over Scala.
Maybe this is helpful
https://github.com/lensacom/sparkit-learn/blob/master/README.rst
Original message
From: Mustafa Elbehery
Date: 12/06/2015 3:59 PM (GMT-05:00)
To: user
Subject: PySpark RDD with NumpyArray Structu
On its own this doesn't give me a direction to look in, without the actual logs
from $SPARK_HOME or the stderr from the worker UI.
Just IMHO; maybe someone knows what this means, but it seems like it
could be caused by a lot of things.
On 12/2/2015 6:48 PM, Darren Govoni wrote:
Hi all,
Wondering if
Hi all,
Wondering if someone can provide some insight into why this pyspark app is
just hanging. Here is the output.
...
15/12/03 01:47:05 INFO TaskSetManager: Starting task 21.0 in stage 0.0
(TID 21, 10.65.143.174, PROCESS_LOCAL, 1794787 bytes)
15/12/03 01:47:05 INFO TaskSetManager: Starting task 22
I agree 100%. Making the model requires large data and many CPUs;
using it does not.
This is a very useful side effect of ML models.
If MLlib can't use models outside Spark, that's a real shame.
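At minimum, for linear models the learned parameters can be pulled out and used
to score anywhere; a sketch, assuming model is a trained MLlib linear model such
as LogisticRegressionModel:

val coefficients = model.weights.toArray  // a plain Array[Double], usable outside Spark
val intercept = model.intercept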
Original message
From: "Kothuvat
Hi,
I read on this page
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
about Python support for "receiverless" Kafka integration (Approach 2),
but it says it's incomplete as of version 1.4.
Has this been updated in version 1.5?
val aDstream = ...
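// Note: transform applies distinct() to each batch RDD independently,
// so elements are deduplicated only within a batch, never across batches.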
val distinctStream = aDstream.transform(_.distinct())
but the elements in distinctStream are not distinct.
Did I use it wrong?
On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai wrote:
> From the log you pasted I think this (-rw-r--r-- 1 root root 80K Mar
> 18 16:54 shuffle_47_519_0.data) is not shuffle spilled data, but the
> final shuffle result.
>
Why is the shuffle result written to disk?
> As I said, did you think
I've already done that:
From the Spark UI, the Environment tab's Spark properties shows:
spark.shuffle.spill false
On Wed, Mar 18, 2015 at 6:34 PM, Akhil Das
wrote:
> I think you can disable it with spark.shuffle.spill=false
>
> Thanks
> Best Regards
>
> On Wed, Mar 18, 2015 at
Thanks, Shao
On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai wrote:
> Yeah, as I said, your job processing time is much larger than the sliding
> window, and streaming jobs are executed one by one in sequence, so the next
> job will wait until the first job is finished, so the total latency will be
sliding window is just 3 seconds, so you will
> process each 60 seconds' worth of data in 3 seconds; if the processing latency is
> larger than the sliding window, maybe your computation power cannot reach
> the qps you wanted.
>
>
>
> I think you need to identify the bottleneck
I use Spark Streaming to read messages from Kafka; the producer creates
about 1,500 messages per second.
import scala.util.hashing.MurmurHash3
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

def hash(x: String): Int = {
  MurmurHash3.stringHash(x)
}
val stream = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap,
  StorageLevel.MEMORY_ONLY_SER).map(_._2)