> …it, you would use
>
> OffsetRange("topicname", 0, 0, 1)
>
> Kafka's simple shell consumer in that case would print
>
> next offset = 1
>
>
> So instead, trying to consume
>
> OffsetRange("topicname", 0, 1, 2)
>
> shouldn't be expected to return anything.
Hi,
I am using KafkaUtils.createRDD to retrieve data from Kafka for batch
processing, and when invoking KafkaUtils.createRDD with an OffsetRange where
OffsetRange.fromOffset == OffsetRange.untilOffset for a particular
partition, I get an empty RDD.
The documentation is clear that until is exclusive and
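
(For reference, the exclusive-until semantics can be exercised directly with
the batch Kafka API. The sketch below is illustrative only: the broker
address is a placeholder, and it assumes a topic "topicname" holding a single
message at offset 0, matching the discussion above.)

import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val sc = new SparkContext("local[2]", "offset-range-demo")
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// until is exclusive: [0, 1) covers exactly the message at offset 0 ...
val oneMessage = OffsetRange("topicname", 0, 0, 1)
// ... while [1, 1) is empty by definition, so it contributes no records.
val emptyRange = OffsetRange("topicname", 0, 1, 1)

val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, Array(oneMessage, emptyRange))
println(s"record count = ${rdd.count()}")  // 1: only the first range has data
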
> …to first repartition the data by partition columns.
>
> Cheng
>
>
> On 7/15/15 7:05 PM, Nikos Viorres wrote:
>
> Hi,
>
> I am trying to test partitioning for DataFrames with Parquet usage, so I
> attempted to do df.write().partitionBy("some_column").parquet(path)
Hi,
I am trying to test partitioning for DataFrames with Parquet usage, so I
attempted to do df.write().partitionBy("some_column").parquet(path) on a
small dataset of 20,000 records, which when saved as Parquet locally with
gzip takes 4 MB of disk space.
However, on my dev machine with
-Dspark.master=
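
(A sketch of the "repartition first" suggestion from the reply above. It uses
the column-based repartition signature from newer Spark releases rather than
the 1.4-era API in this thread, and the data, column name, and output path
are made-up placeholders.)

import spark.implicits._  // assumes an existing SparkSession named `spark`

// Hypothetical stand-in for the 20,000-record dataset in the thread.
val df = (1 to 20000).map(i => (i, s"group_${i % 10}")).toDF("id", "some_column")

// Shuffling by the partition column first puts all rows for a given value in
// the same task, so partitionBy writes one file per value instead of one
// small file per (task, value) pair.
df.repartition($"some_column")
  .write
  .partitionBy("some_column")
  .parquet("/tmp/partition_demo")
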
Hi Akhil,
Yes, that's what we are planning on doing at the end of the day. At the
moment I am doing performance testing before the job hits production,
testing on 4 cores to get baseline figures, and I deduced that in order to
grow to 10 - 15 million keys we'll need a batch interval of ~20 secs
Hi all,
We are having a few issues with the performance of the updateStateByKey
operation in Spark Streaming (1.2.1 at the moment) and any advice would be
greatly appreciated. Specifically, on each tick of the system (which is set
at 10 secs) we need to update a state tuple where the key is the user_id
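
(Below is a minimal sketch of the updateStateByKey pattern described above,
not the poster's actual job: the event source, the (count, last_seen) state
shape, and the socket input standing in for the real stream are all
assumptions. The 10-sec batch interval comes from the thread, and the
checkpoint directory is required by the API.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// State per user_id: (event count, last-seen timestamp).
def updateState(newEvents: Seq[Long], state: Option[(Long, Long)]): Option[(Long, Long)] = {
  val (count, lastSeen) = state.getOrElse((0L, 0L))
  if (newEvents.isEmpty) Some((count, lastSeen))      // carried forward on every tick
  else Some((count + newEvents.size, newEvents.max))
}

val conf = new SparkConf().setMaster("local[4]").setAppName("state-demo")
val ssc = new StreamingContext(conf, Seconds(10))     // 10-sec tick, as in the thread
ssc.checkpoint("/tmp/state-demo-checkpoint")          // required by updateStateByKey

// DStream[(user_id, timestamp)]; a socket stream stands in for the real source.
val events = ssc.socketTextStream("localhost", 9999)
  .map { line => val Array(user, ts) = line.split(","); (user, ts.toLong) }

events.updateStateByKey(updateState _).print()

ssc.start()
ssc.awaitTermination()

Note that updateStateByKey invokes the update function for every key in the
state on every batch, not just for keys with new data, so batch time grows
with the total key count; that matches the scaling behaviour discussed above.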