> On Thu, Sep 29, 2016 at 10:40 AM, Deepak Sharma wrote:
> > If you use Spark direct streams, it ensures an end-to-end guarantee for
> > messages.
> >
> >
> > On Thu, Sep 29, 2016 at 9:05 PM, Ali Akhtar wrote:
> >>
> >> My concern with Postgres: do you care about lost /
> >> duplicated data? Are your writes idempotent?
> >>
> >> Absent any other information about the problem, I'd stay away from
> >> cassandra/flume/hdfs/hbase/whatever, and use a spark direct stream
> >> feeding postgres.
> >>
>
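[The "are your writes idempotent?" question above is the crux of using Postgres as the sink. A minimal sketch of an idempotent write keyed on a stable event id — the table name, columns, and `event_id` key are hypothetical, and sqlite stands in for Postgres here so the example is self-contained (in Postgres you would write `INSERT ... ON CONFLICT (event_id) DO UPDATE` instead of `INSERT OR REPLACE`):]

```python
import sqlite3

def write_event(conn, event):
    # Keyed on a stable event_id: replaying the same Kafka message
    # (e.g. after a failed batch) overwrites instead of duplicating.
    conn.execute(
        "INSERT OR REPLACE INTO events (event_id, payload) VALUES (?, ?)",
        (event["event_id"], event["payload"]),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

evt = {"event_id": "abc-123", "payload": "standardized json"}
write_event(conn, evt)
write_event(conn, evt)  # replay the same message: still exactly one row
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With writes shaped like this, at-least-once delivery from the direct stream is enough, because duplicates collapse onto the same row.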
On Thu, Sep 29, 2016 at 8:24 PM, Ali Akhtar wrote:
>
>> I don't think I need separate speed and batch storage layers. Just
>> taking in raw data from Kafka, standardizing it, and storing it somewhere
>> the web UI can query seems like it will be enough.
>>
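["Somewhere the web UI can query" can be as simple as one flat, standardized table. A sketch of what that might look like — the table name, columns, and sample rows are illustrative only, with sqlite standing in for the real store:]

```python
import sqlite3

# One flat, standardized table the web UI queries directly.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        source TEXT,   -- which producer/API emitted the record
        ts     TEXT,   -- normalized ISO-8601 timestamp
        name   TEXT,
        value  REAL
    )""")

rows = [
    ("api_a", "2016-09-29T19:24:00Z", "requests", 42.0),
    ("api_b", "2016-09-29T19:25:00Z", "requests", 7.0),
]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", rows)

# The kind of aggregate a dashboard would run:
total = conn.execute(
    "SELECT SUM(value) FROM metrics WHERE name = 'requests'"
).fetchone()[0]
```

As long as the queries are aggregates over a standardized schema like this, a single relational table avoids the operational weight of HBase/Cassandra.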
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
It needs to be able to scale to a very large amount of data, yes.
On Thu, Sep 29, 2016 at 7:00 PM, Deepak Sharma wrote:
> What is the message inflow?
> If it's really high, Spark will definitely be of great use.
>
> Thanks
> Deepak
>
> On Sep 29, 2016 19:24, "Ali Akhtar" wrote:
I have a somewhat tricky use case, and I'm looking for ideas.
I have 5-6 Kafka producers, reading various APIs, and writing their raw
data into Kafka.
I need to:
- Do ETL on the data, and standardize it.
- Store the standardized data somewhere (HBase / Cassandra / Raw HDFS /
ElasticSearch / Postgres).
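[The ETL/standardize step above could start as a pure per-record transform, independent of where the data lands. A sketch, assuming nothing about the real payloads — the raw field names (`time`/`timestamp`, `val`/`value`) are made up, and each of the 5-6 producers would need its own mapping:]

```python
from datetime import datetime, timezone

def standardize(source, raw):
    """Map one raw API record onto a common schema.

    The raw field names here are hypothetical stand-ins; the point is
    that every producer's payload is normalized to the same shape
    before storage.
    """
    ts = raw.get("timestamp") or raw.get("time")
    return {
        "source": source,
        "ts": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        "value": float(raw.get("value", raw.get("val", 0))),
    }

rec = standardize("api_a", {"time": "1475177040", "val": "42"})
```

Keeping the transform pure like this makes it trivially testable, and it can then be applied per message inside a Spark direct stream (or a plain Kafka consumer) unchanged.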