I'm in favor of everything in /extras and /external being removed, but
I'm more in favor of making a decision and moving on.
On Tue, Mar 22, 2016 at 12:20 PM, Marcelo Vanzin wrote:
> +1 for getting flume back.
>
> On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote:
>> Hello all,
>>
>> I'd l
+1 for getting flume back.
On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote:
> Hello all,
>
> I'd like to close out the discussion on SPARK-13843 by getting a poll from
> the community on which components we should seriously reconsider re-adding
> back to Apache Spark. For reference, here
Can someone please post the following information on the "Powered by Spark"
wiki page? Thank you.
Organization: IBM www.ibm.com/spark
Project URL: https://github.com/EclairJS/eclairjs-node
Brief project description: EclairJS enables Node.js developers to code
against Spark, and data scientist
Hi all,
Wes, I read your thread earlier today after I sent this message, and it's
exciting to have someone of your caliber working on the issue :)
As a short-term solution, I've created a Gist which performs the toPandas
operation using the mapPartitions method suggested by Mark:
https://gist.github.com/jo
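For anyone who can't follow the truncated link, the approach is roughly the
following (a sketch, not the actual Gist; the function name is mine):

    import pandas as pd

    def to_pandas_via_partitions(df):
        # Build one pandas DataFrame per partition on the executors,
        # instead of collecting raw Row objects to the driver.
        cols = df.columns
        def build(iterator):
            yield pd.DataFrame(list(iterator), columns=cols)
        parts = df.rdd.mapPartitions(build).collect()
        return pd.concat(parts, ignore_index=True)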
Hi all,
I recently did an analysis of the performance of toPandas
summary: http://wesmckinney.com/blog/pandas-and-apache-arrow/
ipython notebook: https://gist.github.com/wesm/0cb5531b1c2e346a0007
One solution I'm planning for this is an alternate serializer for
Spark DataFrames, with an optimize
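The gist of the measurement is timing toPandas against a generated frame,
along these lines (an illustrative micro-benchmark, not the notebook itself;
assumes a Spark 1.6-era sqlContext):

    import time

    df = sqlContext.range(0, 1 << 22)  # ~4M rows; the size is arbitrary
    start = time.time()
    pdf = df.toPandas()
    print("toPandas: %.2f s for %d rows" % (time.time() - start, len(pdf)))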
Hi Josh,
The workaround we figured out to solve the network latency and out-of-memory
problems with the toPandas method was to create Pandas DataFrames or NumPy
arrays for each partition using mapPartitions. Maybe a standard solution along
this line of thought could be built. The integration is q
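The NumPy variant of the same per-partition trick looks roughly like this (a
sketch assuming all columns are numeric; the function name is illustrative):

    import numpy as np

    def to_numpy_via_partitions(df):
        # One ndarray per partition; empty partitions are dropped below.
        def build(iterator):
            yield np.array([tuple(row) for row in iterator], dtype=np.float64)
        parts = [p for p in df.rdd.mapPartitions(build).collect() if p.size]
        return np.vstack(parts)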
We recently released an object store connector for Spark.
https://github.com/SparkTC/stocator
Currently this connector contains a driver for Swift-based object stores
(like SoftLayer or any other Swift cluster), but it can easily support
additional object stores.
There is a pending patch to s
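For reference, wiring a connector like this into PySpark looks roughly as
follows (a sketch based on the project's README; verify the scheme and
implementation class against the repo, and note the authentication
properties are omitted here):

    # Register the connector's filesystem implementation, then read
    # objects through the swift2d:// scheme. Container and object
    # names are placeholders.
    sc._jsc.hadoopConfiguration().set(
        "fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
    rdd = sc.textFile("swift2d://mycontainer.myservice/data.txt")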
Hi,
I'm trying to do some dynamic scheduling from an external application by
looking at the jobs in a Spark framework.
I need the job description to know which kind of query I'm dealing with. The
problem is that the job description (set with sparkCtx.setJobDescription)
but in case of a job with m
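One avenue worth noting is the monitoring REST API on the driver UI port,
which exposes per-job metadata including the description (a sketch; host,
port, and the exact fields depend on your Spark version):

    import requests  # third-party HTTP client; any client works

    base = "http://driver-host:4040/api/v1"  # placeholder host/port
    app_id = requests.get(base + "/applications").json()[0]["id"]
    jobs = requests.get("%s/applications/%s/jobs" % (base, app_id)).json()
    for job in jobs:
        # 'description' is absent for jobs that never set one.
        print("%s: %s" % (job["jobId"], job.get("description")))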
Hi,
A common pattern in my work is querying large tables into Spark DataFrames
and then needing to do more detailed analysis locally once the data fits into
memory. However, I've hit a few blockers. In Scala, no well-developed local
DataFrame library exists, and in Python the `toPandas` function is very
From the error message, it seems some artifacts from Scala 2.10.4 were left
around.
FYI, Maven 3.3.9 is required for the master branch.
On Tue, Mar 22, 2016 at 3:07 AM, Allen wrote:
> Hi,
>
> I am facing an error when compiling from IDEA; please see the
> attached. I fired the build process
I am trying out StatefulNetworkWordCount from the latest Spark master branch.
When I run this example I see an odd behaviour: if a key is repeated within a
batch, the output stream prints once per repetition. For example, if I type
"ab" five times as input, it shows:
(ab,1)
(ab,2)
(ab,3)
(ab,4)
(ab,5)
Is
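If the example on master uses mapWithState, this may be expected: the state
function there is invoked, and a record emitted, once per input element, so
five occurrences of "ab" in one batch produce five updates. updateStateByKey,
by contrast, emits one aggregated record per key per batch; a minimal PySpark
sketch (host, port, and checkpoint path are placeholders):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="StatefulWordCount")
    ssc = StreamingContext(sc, 1)
    ssc.checkpoint("checkpoint")  # stateful ops require a checkpoint dir

    def update(new_values, running):
        # Invoked once per key per batch with all new values at once,
        # so repeated keys in a batch yield a single aggregated record.
        return sum(new_values) + (running or 0)

    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .updateStateByKey(update))
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()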
On 19 Mar 2016, at 16:16, Pete Robbins wrote:
There are several open JIRAs to add new sinks:
OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194
StatsD https://issues.apache.org/jira/browse/SPARK-11574
StatsD is nicely easy to test: either listen in on a (l
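For example, a throwaway UDP listener is enough to see exactly what a StatsD
sink emits (a sketch; adjust the address to wherever the sink points):

    import socket

    # StatsD metrics are plain-text UDP datagrams, so printing whatever
    # arrives on the port shows a sink's raw output. 8125 is the
    # conventional StatsD port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 8125))
    while True:
        data, _ = sock.recvfrom(65535)
        print(data.decode("utf-8").strip())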
I guess a different workload causes a different result?
Interesting. After experimenting with various parameters, increasing
spark.sql.shuffle.partitions and decreasing spark.buffer.pageSize helped my
job go through. BTW, I will be happy to help get this issue fixed.
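For the record, the settings were along these lines (the values are
illustrative, not recommendations, and must be in place before the
SparkContext starts):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.sql.shuffle.partitions", "800")  # default is 200
            .set("spark.buffer.pageSize", "2m"))         # smaller Tungsten pages
    sc = SparkContext(conf=conf)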
Nezih
On Tue, Mar 22, 2016 at 1:07 AM, james wrote:
> Hi,
> I also found 'Unable to
Hi,
I also found an 'Unable to acquire memory' issue using Spark 1.6.1 with
dynamic allocation on YARN. My case happened when setting
spark.sql.shuffle.partitions larger than 200. From the error stack, it differs
from the issue reported by Nezih, so I'm not sure whether they share the same
root cause.
Thanks
James
16
OK, so Kafka, Kinesis and Flume will stay in Spark.
Thanks,
Regards
JB
On 03/22/2016 08:30 AM, Reynold Xin wrote:
Kinesis is still in it. I think it's OK to add Flume back.
On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré wrote:
Thanks for the update Kostas
Kinesis is still in it. I think it's OK to add Flume back.
On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré
wrote:
> Thanks for the update Kostas,
>
> For now, Kafka stays in Spark and Kinesis will be removed, right?
>
> Regards
> JB
>
> On 03/22/2016 08:27 AM, Kostas Sakellis wrote:
>
>>
Thanks for the update Kostas,
For now, Kafka stays in Spark and Kinesis will be removed, right?
Regards
JB
On 03/22/2016 08:27 AM, Kostas Sakellis wrote:
Hello all,
I'd like to close out the discussion on SPARK-13843 by getting a poll
from the community on which components we should seriousl
Hello all,
I'd like to close out the discussion on SPARK-13843 by getting a poll from
the community on which components we should seriously reconsider re-adding
back to Apache Spark. For reference, here are the modules that were removed
as part of SPARK-13843 and pushed to: https://github.com/spar