something like an IndexedRDD). But in your case you mention that serialization
overhead is the bottleneck, so maybe you could try filtering out
unchanged keys before persisting the data? Just an idea.
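A minimal sketch of that delta idea on plain Scala maps (the object and
method names are illustrative, not an existing Spark API; on an RDD the same
predicate would go into a filter after joining old and new snapshots):

```scala
// Illustrative helper: keep only entries whose value is new or has changed
// relative to the previous snapshot, so only those get serialized/persisted.
object DeltaFilter {
  def changedKeys[K, V](previous: Map[K, V], current: Map[K, V]): Map[K, V] =
    current.filter { case (k, v) => !previous.get(k).contains(v) }
}
```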
Andre
On 22/03/15 10:43, "Andre Schumacher" wrote:
Hi,
For testing you could also just use the Kafka 0.7.2 console consumer and
pipe its output to netcat (nc) and process that as in the example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala
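For reference, a plain-Scala sketch of the word-count step that example
performs on each batch of lines read from the socket (names are illustrative;
the real example does this on a Spark Streaming DStream, not a local Seq):

```scala
// Illustrative word count over a batch of text lines, mirroring the
// flatMap/count logic in NetworkWordCount on local collections.
object WordCount {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))        // break each line into words
      .filter(_.nonEmpty)           // drop empty tokens from repeated spaces
      .groupBy(identity)            // group identical words together
      .map { case (w, occ) => (w, occ.size) }
}
```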
That worked for me. Back
Hi,
I don't think anybody has tested importing Impala tables
directly. Is there any chance to export these first, say as
unpartitioned Hive tables, and then import those? Just an idea.
Andre
On 07/21/2014 11:46 PM, chutium wrote:
> no, something like this
>
> 14/07/20 00:19:29 ERROR cluste
Hi,
are you using the amplab/spark-1.0.0 images from the global registry?
Andre
On 06/17/2014 01:36 AM, Mohit Jaggi wrote:
> Hi Folks,
>
> I am having trouble getting spark driver running in docker. If I run a
> pyspark example on my mac it works but the same example on a docker image
> (Via b
Hi,
On 06/12/2014 05:47 PM, Toby Douglass wrote:
> In these future jobs, when I come to load the aggregated RDD, will Spark
> load and only load the columns being accessed by the query? or will Spark
> load everything, to convert it into an internal representation, and then
> execute the query?