date:20160322

Re: SPARK-13843 Next steps

2016-03-22 Thread Cody Koeninger

I'm in favor of everything in /extras and /external being removed, but I'm more in favor of making a decision and moving on. On Tue, Mar 22, 2016 at 12:20 PM, Marcelo Vanzin wrote: > +1 for getting flume back. > > On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote: >> Hello all, >> >> I'd l

Re: SPARK-13843 Next steps

2016-03-22 Thread Marcelo Vanzin

+1 for getting flume back. On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote: > Hello all, > > I'd like to close out the discussion on SPARK-13843 by getting a poll from > the community on which components we should seriously reconsider re-adding > back to Apache Spark. For reference, here

EclairJS for "Powered by Spark" Wiki page

2016-03-22 Thread DavidFallside

Can someone please post the following information on the "Powered by Spark" wiki pages, thank you. Organization: IBM www.ibm.com/spark Project URL: https://github.com/EclairJS/eclairjs-node Brief project description: EclairJS enables Node.js developers to code against Spark, and data scientist

Re: toPandas very slow

2016-03-22 Thread Josh Levy-Kramer

Hi all, Wez, I read your thread earlier today after I sent this message and its exciting someone of your caliber working on the issue :) For a short term solution i've created a Gist which performs the toPandas operation using the mapPartitions method suggested by Mark: https://gist.github.com/jo

Re: toPandas very slow

2016-03-22 Thread Wes McKinney

hi all, I recently did an analysis of the performance of toPandas summary: http://wesmckinney.com/blog/pandas-and-apache-arrow/ ipython notebook: https://gist.github.com/wesm/0cb5531b1c2e346a0007 One solution I'm planning for this is an alternate serializer for Spark DataFrames, with an optimize

Re: toPandas very slow

2016-03-22 Thread Mark Vervuurt

Hi Josh, The work around we figured out to solve network latency and out of memory problems with the toPandas method was to create Pandas DataFrames or Numpy Arrays using MapPartitions for each partition. Maybe a standard solution around this line of thought could be built. The integration is q

new object store driver for Spark

2016-03-22 Thread Gil Vernik

We recently released an object store connector for Spark. https://github.com/SparkTC/stocator Currently this connector contains driver for the Swift based object store ( like SoftLayer or any other Swift cluster ), but it can easily support additional object stores. There is a pending patch to s

Job description only visible after job finish

2016-03-22 Thread hansbogert

Hi, I’m trying to do some dynamic scheduling by an external application by looking at the jobs in a Spark framework. I need the job description to know which kind of query I’m dealing with. The problem is that the job description (set with: sparkCtx.setJobDescription) but in case of a job with m

toPandas very slow

2016-03-22 Thread Josh Levy-Kramer

Hi, A common pattern in my work is querying large tables in Spark DataFrames and then needing to do more detailed analysis locally when the data can fit into memory. However, i've hit a few blockers. In Scala no well developed DataFrame library exists and in Python the `toPandas` function is very

Re: error occurs to compile spark 1.6.1 using scala 2.11.8

2016-03-22 Thread Ted Yu

>From the error message, it seems some artifacts from Scala 2.10.4 were left around. FYI maven 3.3.9 is required for master branch. On Tue, Mar 22, 2016 at 3:07 AM, Allen wrote: > Hi, > > I am facing an error when doing compilation from IDEA, please see the > attached. I fired the build process

StatefulNetworkWordCount behaviour

2016-03-22 Thread Rishi Mishra

I am trying out StatefulNetworkWordCount from latest Spark master branch. When I run this example I see a odd behaviour. If in a batch a key is repeated the output stream prints for each repetition e.g. If I key in "ab" five times for input it will show like (ab,1) (ab,2) (ab,3) (ab,4) (ab,5) Is

Re: Can we remove private[spark] from Metrics Source and SInk traits?

2016-03-22 Thread Steve Loughran

On 19 Mar 2016, at 16:16, Pete Robbins mailto:robbin...@gmail.com>> wrote: There are several open Jiras to add new Sinks OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194 StatsD https://issues.apache.org/jira/browse/SPARK-11574 statsd is nicely easy to test: either listen in on a (l

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james

I guess different workload cause diff result ? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-tp16773p16789.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread Nezih Yigitbasi

Interesting. After experimenting with various parameters increasing spark.sql.shuffle.partitions and decreasing spark.buffer.pageSize helped my job go through. BTW I will be happy to help getting this issue fixed. Nezih On Tue, Mar 22, 2016 at 1:07 AM james wrote: Hi, > I also found 'Unable to

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james

Hi, I also found 'Unable to acquire memory' issue using Spark 1.6.1 with Dynamic allocation on YARN. My case happened with setting spark.sql.shuffle.partitions larger than 200. From error stack, it has a diff with issue reported by Nezih and not sure if these has same root cause. Thanks James 16

Re: SPARK-13843 Next steps

2016-03-22 Thread Jean-Baptiste Onofré

OK, so kafka, kinesis and flume will stay in Spark. Thanks, Regards JB On 03/22/2016 08:30 AM, Reynold Xin wrote: Kinesis is still in it. I think it's OK to add Flume back. On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré mailto:j...@nanthrax.net>> wrote: Thanks for the update Kosta

Re: SPARK-13843 Next steps

2016-03-22 Thread Reynold Xin

Kinesis is still in it. I think it's OK to add Flume back. On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré wrote: > Thanks for the update Kostas, > > for now, kafka stays in Spark and Kinesis will be removed, right ? > > Regards > JB > > On 03/22/2016 08:27 AM, Kostas Sakellis wrote: > >>

Re: SPARK-13843 Next steps

2016-03-22 Thread Jean-Baptiste Onofré

Thanks for the update Kostas, for now, kafka stays in Spark and Kinesis will be removed, right ? Regards JB On 03/22/2016 08:27 AM, Kostas Sakellis wrote: Hello all, I'd like to close out the discussion on SPARK-13843 by getting a poll from the community on which components we should seriousl

SPARK-13843 Next steps

2016-03-22 Thread Kostas Sakellis

Hello all, I'd like to close out the discussion on SPARK-13843 by getting a poll from the community on which components we should seriously reconsider re-adding back to Apache Spark. For reference, here are the modules that were removed as part of SPARK-13843 and pushed to: https://github.com/spar

Re: SPARK-13843 Next steps

Re: SPARK-13843 Next steps

EclairJS for "Powered by Spark" Wiki page

Re: toPandas very slow

Re: toPandas very slow

Re: toPandas very slow

new object store driver for Spark

Job description only visible after job finish

toPandas very slow

Re: error occurs to compile spark 1.6.1 using scala 2.11.8

StatefulNetworkWordCount behaviour

Re: Can we remove private[spark] from Metrics Source and SInk traits?

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

Re: SPARK-13843 Next steps

Re: SPARK-13843 Next steps

Re: SPARK-13843 Next steps

SPARK-13843 Next steps

19 matches

Site Navigation

Mail list logo

Footer information