It really is fairly easy to set up, and seems to be quite good so far.

-Thunder

-----Original Message-----
From: amiori...@gmail.com [mailto:amiori...@gmail.com] On Behalf Of Alberto Miorin
Sent: Friday, March 13, 2015 12:15 PM
To: users@kafka.apache.org
Cc: otis.gospodne...@gmail.com
Subject: Re: Alternative to camus

> We use spark on mesos. I don't want to partition our cluster because of one
> YARN job (camus).
>
> Best
> Alberto
It seemed really counter-intuitive; I can only imagine that it happened
because nobody wanted to refactor the existing KafkaInputDStream to use the
SimpleConsumer instead of the High Level Consumer (unless I'm misreading
the source - it looks like that's what the new DirectKafkaInputDStream is
doing).
Also very interested in hearing about them.
I prefer war stories in the form of Jira issues for the relevant project ;)
There's a good chance we can make things less horrible if issues are reported.
Gwen
On Fri, Mar 13, 2015 at 12:48 PM, Andrew Otto wrote:
1) You save everything 2 times (Kafka and HDFS).
2) You need to enable the checkpoint feature, which means you cannot change
the configuration of the job, because the Spark Streaming context is
deserialized from HDFS every time you restart the job.
3) It's not clear what happens if HDFS is unavailable.
> We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
> I will only say one thing : "a nightmare". ;-)
I’d be really interested in hearing about your experience here. I’m exploring
streaming frameworks a bit, and Spark Streaming is just so easy to use and set
up. I’d be
I really like the new approach. The WAL in HDFS never made much sense
to me (I mean, Kafka is a log. I know they don't want the Kafka
dependency, but a log for a log makes no sense).
Still experimental, but I think that's the right direction.
On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
wrote
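Gwen's "a log for a log" point is easy to see with a toy sketch (plain Python, not Spark's actual WAL code, and the class and record layout here are illustrative assumptions): a write-ahead log is just length-prefixed records appended durably to a file and replayed in order on recovery - which is exactly the guarantee a Kafka partition already provides, so an HDFS WAL in front of Kafka stores every record twice.

```python
import os
import struct
import tempfile

class WriteAheadLog:
    """Toy WAL: length-prefixed records appended to a file,
    replayed in order on recovery."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def append(self, record: bytes):
        # Prefix each record with a 4-byte big-endian length,
        # then force it to disk before acknowledging, like a real WAL.
        self.f.write(struct.pack(">i", len(record)) + record)
        self.f.flush()
        os.fsync(self.f.fileno())

    def replay(self):
        # Recovery path: walk the file front to back, yielding records.
        with open(self.path, "rb") as f:
            while header := f.read(4):
                (n,) = struct.unpack(">i", header)
                yield f.read(n)

path = os.path.join(tempfile.mkdtemp(), "0.wal")
wal = WriteAheadLog(path)
wal.append(b"event-1")
wal.append(b"event-2")
print(list(wal.replay()))  # -> [b'event-1', b'event-2']
```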
Thanks for the heads-up, Alberto, that's good to know. We were about to
start a few projects working with Spark Streaming + Kafka; sounds like
there's still quite a bit of work to be done there.
-Will
On Fri, Mar 13, 2015 at 3:38 PM, Alberto Miorin
wrote:
We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
I will only say one thing: "a nightmare". ;-)
Let's see if things are better with 1.3.0:
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
On Fri, Mar 13, 2015 at 8:33 PM, William Briggs wrote:
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
it supports using an HDFS write-ahead log to ensure zero data loss while
streaming:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
-Will
I'll try this too. It looks very promising.
Thx
On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira wrote:
> There's a KafkaRDD that can be used in Spark:
> https://github.com/tresata/spark-kafka. It doesn't exactly replace
> Camus, but should be useful in building a Camus-like system in Spark.
There's a KafkaRDD that can be used in Spark:
https://github.com/tresata/spark-kafka. It doesn't exactly replace
Camus, but should be useful in building a Camus-like system in Spark.
On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
wrote:
Flume solution looks very good.
Thx.
On Fri, Mar 13, 2015 at 8:15 PM, William Briggs wrote:
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see this scaling
particularly well - how do you know which server to copy logs from in a
clustered and replicated environment? What happens when Kafka detects a
failure?
We use spark on mesos. I don't want to partition our cluster because of one
YARN job (camus).
Best
Alberto
On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:
Just curious - why - is Camus not suitable/working?
Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin
wrote:
I was wondering if anybody has already tried to mirror a kafka topic to
hdfs just copying the log files from the topic directory of the broker
(like 23244237.log).
The file format is very simple:
https://twitter.com/amiorin/status/576448691139121152/photo/1
Implementing an InputFormat ...
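For illustration, here is a minimal reader/writer for that on-disk layout - a sketch, assuming the pre-0.10 v0 message format that 2015-era brokers wrote (8-byte offset, 4-byte size, then CRC32, magic and attributes bytes, and length-prefixed key/value, with -1 encoding a null key); not production code, and any real InputFormat would also need to handle compressed message sets.

```python
import struct
import zlib

def encode_entry(offset, key, value):
    """Build one v0 log entry: offset, size, then the message
    (CRC over magic+attributes+key+value)."""
    key_field = struct.pack(">i", -1) if key is None else struct.pack(">i", len(key)) + key
    value_field = struct.pack(">i", len(value)) + value
    body = struct.pack(">bb", 0, 0) + key_field + value_field  # magic=0, attributes=0
    message = struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF) + body
    return struct.pack(">qi", offset, len(message)) + message

def read_entries(data):
    """Walk a .log segment buffer and yield (offset, key, value) tuples,
    verifying each entry's CRC along the way."""
    pos = 0
    while pos < len(data):
        offset, size = struct.unpack_from(">qi", data, pos)
        pos += 12
        (crc,) = struct.unpack_from(">I", data, pos)
        body = data[pos + 4 : pos + size]
        assert zlib.crc32(body) & 0xFFFFFFFF == crc, "corrupt entry"
        p = pos + 6  # skip crc (4), magic (1), attributes (1)
        (klen,) = struct.unpack_from(">i", data, p); p += 4
        key = None if klen == -1 else data[p : p + klen]
        p += max(klen, 0)
        (vlen,) = struct.unpack_from(">i", data, p); p += 4
        value = data[p : p + vlen]
        yield offset, key, value
        pos += size

# Simulate a tiny segment whose base offset matches the filename.
segment = encode_entry(23244237, None, b"hello") + encode_entry(23244238, b"k", b"world")
print(list(read_entries(segment)))
# -> [(23244237, None, b'hello'), (23244238, b'k', b'world')]
```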