Re: storm-rdbms consume data from kafka spout fast enough?

Sa Li Tue, 09 Dec 2014 16:16:25 -0800

Hi, Irek

What you have done is exactly what I want. I was running my topology in
LocalCluster, but now I have submitted it to the Storm cluster:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/etc/apache-storm-0.9.3/lib/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/stuser/backup/pof.analytics.messaging/kafka-storm-ingress/target/kafka-storm-ingress-0.0.1-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/Static
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[ch.qos.logback.classic.util.ContextSelectorStaticBinder]
DB connected .....
531  [main] INFO  backtype.storm.StormSubmitter - Jar not uploaded to
master yet. Submitting jar...
542  [main] INFO  backtype.storm.StormSubmitter - Uploading topology jar
target/kafka-storm-ingress-0.0.1-SNAPSHOT-jar-with-dependencies.jar to
assigned location: /app/storm/nimbus/inbox/s
r
739  [main] INFO  backtype.storm.StormSubmitter - Successfully uploaded
topology jar to assigned location:
/app/storm/nimbus/inbox/stormjar-f3b2a8bd-0d16-4ba5-9d94-51b3ecf53e5b.jar
740  [main] INFO  backtype.storm.StormSubmitter - Submitting topology 2 in
distributed mode with conf
{"topology.max.task.parallelism":5,"nimbus.host":"10.100.70.128","topology.workers":2,
":6627,"storm.zookeeper.servers":["10.100.70.128"],"topology.trident.batch.emit.interval.millis":2000}
842  [main] INFO  backtype.storm.StormSubmitter - Finished submitting
topology: 2

But nothing is shown in the UI; that is one issue. Back to batch mode:
when you do the batch copy, my assumption is that you accumulate tuples in
a byte array, copy/multi-insert them into the DB, clear the array, and
refill it, and so on. Is that the way, or is there an existing API I can use?
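
The accumulate-and-flush idea described above can be sketched in plain Java.
This is only an illustration under stated assumptions, not storm-rdbms code:
BatchBuffer and its sink callback are hypothetical names, and the sink is
where a real writer bolt would run a JDBC batch (PreparedStatement.addBatch
followed by executeBatch) or a PostgreSQL COPY. A real bolt would also need
to ack its tuples only after the flush succeeds, and to flush on a timer so
that a partial batch does not wait forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects rows and hands them off in fixed-size batches. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // assumed to perform the multi-row insert
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffer one row; flush automatically once the batch is full. */
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Hand a copy of the buffered rows to the sink, then clear the buffer. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    /** Number of rows waiting for the next flush. */
    int pendingCount() {
        return pending.size();
    }
}
```

With a batch size of 3, adding 7 rows triggers two flushes of 3 rows each and
leaves 1 row pending until flush() is called explicitly (for example from a
timer or on shutdown).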

Thanks

Alec

On Tue, Dec 9, 2014 at 2:04 PM, Irek Khasyanov <[email protected]> wrote:

> >Do I need to make bulk copy?
>
> It depends. If your topology fails, the Kafka spout will start reading from
> the last known offset. If you then have a lot of data to write, inserting
> one row at a time can be a bottleneck.
>
> You can actually test it: stop the topology, write around 10,000 messages to
> Kafka, and start the topology. In the Storm UI you will see the capacity for
> the writer bolt. If it is colored red and over 1.0, that bolt is
> your bottleneck.
>
> We have a Kafka to HP Vertica stream. Vertica doesn't like single-row
> inserts, so we added batches of 10K rows. With 4 workers everything looks great.
>
>
>
> On 10 December 2014 at 00:34, Sa Li <[email protected]> wrote:
>
>> Hello, all
>>
>> I have a question. As I posted in several earlier threads, I am using
>> storm-rdbms to write into a PostgreSQL DB, with data collected from
>> KafkaSpout, and it works. However, it inserts into the DB each time it gets
>> a tuple, one row per insert operation. I am concerned about whether this
>> way of consuming is fast enough and whether it adds overhead.
>>
>> Do I need to make bulk copy?
>>
>>
>> thanks
>>
>>
>> Alec
>>
>
>
>
> --
> With best regards, Irek Khasyanov.
>
