Re: Position in Kafka Stream

2014-05-28 Thread Danijel Schiavuzzi
Yes, the Trident Kafka spouts give you the same metrics. Take a look at the code to find out what's available. On Wed, May 28, 2014 at 3:55 AM, Tyson Norris tnor...@adobe.com wrote: Do the Trident variants of the Kafka spouts do something similar? Thanks Tyson On May 27, 2014, at 3:19 PM, Harsha

Re: kafka-storm-starter, trying to add memcached state and trident topology (word count)

2014-05-28 Thread Romain Leroux
Basically any IRichSpout is OK... it's just that you get a non-transactional topology https://github.com/nathanmarz/storm/wiki/Trident-spouts https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/TridentTopology.java#L113 In the end if I comment out the
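Roughly, wiring a plain IRichSpout into Trident looks like the sketch below (MyPlainSpout is a hypothetical IRichSpout implementation, names are placeholders); TridentTopology.newStream accepts an IRichSpout, but the resulting stream is non-transactional:

    import backtype.storm.generated.StormTopology;
    import storm.trident.Stream;
    import storm.trident.TridentTopology;

    TridentTopology topology = new TridentTopology();
    // MyPlainSpout is a hypothetical IRichSpout; wrapping it this way
    // yields a non-transactional stream (no exactly-once guarantees).
    Stream stream = topology.newStream("plain-spout", new MyPlainSpout());
    StormTopology built = topology.build();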

Re: Position in Kafka Stream

2014-05-28 Thread Cody A. Ray
You can also use stormkafkamon to track this stuff. It's not good for historical analysis like graphite/ganglia, but it's good if you just want to see how things currently stand. The original: https://github.com/otoolep/stormkafkamon This didn't work for us without some updates (incompatibility

Re: PersistentAggregate

2014-05-28 Thread Cody A. Ray
You may find this helpful for understanding how persistentAggregate works internally. http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/ -Cody On Tue, May 27, 2014 at 1:53 PM, Danijel Schiavuzzi dani...@schiavuzzi.com wrote: The aggregations are done by

Re: Building Storm

2014-05-28 Thread Justin Workman
On Ubuntu 12.04 I have tried with Maven 3.0.4 and now the latest 3.2.1. On Tue, May 27, 2014 at 5:35 PM, P. Taylor Goetz ptgo...@gmail.com wrote: I'll do a couple tests, but for the most part it should just work on OSX, etc. (Storm releases are built on OSX). What version of maven are you

Re: Position in Kafka Stream

2014-05-28 Thread Tyson Norris
Thanks Cody - I tried the BrightTag fork and still have problems with Storm 0.9.1-incubating and Kafka 0.8.1. I get an error with my Trident topology (haven't tried non-Trident yet): (venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology TrendingTagTopology --spoutroot storm

Re: Building Storm

2014-05-28 Thread Przemek Grzędzielski
Hi Stormers, I made some progress on Mac OS X - the issue was the Java version. Maven version: 3.2.1. The Storm build yields the mentioned error when I run Java 8 (Oracle's JDK). After switching to Java 6 (Apple's JDK) the build succeeds. I'll try playing with JDK versions on Ubuntu 12.04 later

Re: Building Storm

2014-05-28 Thread Justin Workman
Yes, Oracle JDK 1.7.0_60. On Wed, May 28, 2014 at 9:17 AM, Harsha st...@harsha.io wrote: I assume you are using Oracle JDK. I tested on Ubuntu 12.04 with Maven 3.2.1, git 1.7.9, Java 1.7.0_55, Python 2.7.3, Ruby 1.8.7. On Wed, May 28, 2014, at 08:01 AM, Justin Workman wrote: On Ubuntu

Re: Position in Kafka Stream

2014-05-28 Thread Cody A. Ray
Right, it's trying to read your Kafka messages and parse them as JSON. See the error: simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0) If you want to use the BrightTag branch, you'll need to go a couple of commits back. Try this: git clone

Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
If I don't tell Trident to start consuming data from the beginning of the Kafka stream, where does it start from? If I were to do: tridentKafkaConfig.forceFromStart = true; then it would tell the spout to start consuming from the start of the stream. If that is not set, then where does it start

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Danijel Schiavuzzi
By default, the Kafka spout resumes consuming where it last left off. That offset is stored in ZooKeeper. You can set forceStartOffset to -2 to start consuming from the earliest available offset, or -1 to start consuming from the latest available offset. On Wednesday, May 28, 2014, Raphael Hsieh
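Roughly something like this with the plain storm-kafka spout (untested sketch; the ZooKeeper host, topic, zkRoot, and spout id are placeholders):

    import storm.kafka.BrokerHosts;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.ZkHosts;

    BrokerHosts hosts = new ZkHosts("zkhost:2181");
    SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-spout-id");
    // -2 = kafka.api.OffsetRequest.EarliestTime(), -1 = kafka.api.OffsetRequest.LatestTime()
    spoutConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
    spoutConfig.forceFromStart = true; // otherwise the offset stored in ZooKeeper wins
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);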

Re: zookeeper problem when use trident kafka spout

2014-05-28 Thread Danijel Schiavuzzi
I'm using the same Kafka transactional Trident spout in a production Trident topology and haven't experienced any problems with ZooKeeper. Could you post your ZooKeeper and any custom Storm configuration here? Bear in mind that continuous updating of ZooKeeper nodes is normal for the transactional

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
Would the Trident version of this be tridentKafkaConfig.startOffsetTime? On Wed, May 28, 2014 at 12:23 PM, Danijel Schiavuzzi dani...@schiavuzzi.com wrote: By default, the Kafka spout resumes consuming where it last left off. That offset is stored in ZooKeeper. You can set
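Yes, something along these lines should be the Trident equivalent (a sketch, not tested; hosts and topic are placeholders):

    import storm.kafka.BrokerHosts;
    import storm.kafka.ZkHosts;
    import storm.kafka.trident.OpaqueTridentKafkaSpout;
    import storm.kafka.trident.TridentKafkaConfig;

    BrokerHosts hosts = new ZkHosts("zkhost:2181");
    TridentKafkaConfig tridentConfig = new TridentKafkaConfig(hosts, "my-topic");
    // -1 = latest available offset, -2 = earliest available offset
    tridentConfig.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
    tridentConfig.forceFromStart = true; // without this, the offset stored in ZooKeeper is used
    OpaqueTridentKafkaSpout spout = new OpaqueTridentKafkaSpout(tridentConfig);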

Storm with Python

2014-05-28 Thread Ashu Goel
Does anyone have a good example program/instructions for using Python with Storm? I can't seem to find anything concrete online. Thanks, Ashu Goel

Python bolt writing to files

2014-05-28 Thread Dilpreet Singh
Hi, I'm writing the output of a Python bolt to a file with: f = open('/tmp/clusters.txt', 'a') f.write(json.dumps(self.clusters) + 'l\n') f.close() It works fine when testing locally; however, no file is created after the topology is submitted to the cluster. Any ideas on how to save the output

Re: Python bolt writing to files

2014-05-28 Thread Nathan Leung
Are you sure the bolt ran properly? If so, on which machine are you looking for the file? On Wed, May 28, 2014 at 4:30 PM, Dilpreet Singh dilpreet...@gmail.com wrote: Hi, I'm writing the output of a Python bolt to a file with: f = open('/tmp/clusters.txt', 'a')

Re: Storm with Python

2014-05-28 Thread Dilpreet Singh
https://github.com/apache/incubator-storm/tree/master/examples/storm-starter The WordCountTopology contains an example Python bolt. Regards, Dilpreet On Thu, May 29, 2014 at 1:59 AM, Ashu Goel a...@shopkick.com wrote: Does anyone have a good example program/instructions of using Python with
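In that example the Python bolt is hooked in through the multilang protocol: a Java ShellBolt subclass launches the Python script (which uses storm-starter's storm.py helper). A rough sketch of the wrapper, following the WordCountTopology pattern:

    import backtype.storm.task.ShellBolt;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import java.util.Map;

    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            // launches splitsentence.py from multilang/resources via the multilang protocol
            super("python", "splitsentence.py");
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }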

All tuples are going to same worker

2014-05-28 Thread Shaikh Riyaz
Hi All, We are running a Storm cluster with the following servers: one Nimbus and six supervisors with 2 workers each, running on ports 6700 and 6701. All tuples are going to only one supervisor, and only to one worker (6701) running on that supervisor. We have one KafkaSpout and 6 bolts processing the data.

Re: logging 'failed' tuples in mastercoord-bg0

2014-05-28 Thread P. Taylor Goetz
Silent replays are usually a sign of batches timing out. By default Storm uses a timeout value of thirty seconds. Try upping that value and setting TOPOLOGY_MAX_SPOUT_PENDING to a very low value like 1. In Trident that controls how many batches can be in flight at a time. -Taylor On May 28,
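For example, in the topology config (values and topology name here are illustrative; "topology" is assumed to be your TridentTopology):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;

    Config conf = new Config();
    conf.setMessageTimeoutSecs(120); // topology.message.timeout.secs, default 30
    conf.setMaxSpoutPending(1);      // topology.max.spout.pending: max in-flight batches for Trident
    StormSubmitter.submitTopology("my-trident-topology", conf, topology.build());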

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
This is still not working for me. I've set the offset to -1 and it is still backfilling data. Is there any documentation on the start offsets that I could take a look at? Or even documentation on kafka.api.OffsetRequest.LatestTime()? On Wed, May 28, 2014 at 1:01 PM, Raphael Hsieh

Re: All tuples are going to same worker

2014-05-28 Thread P. Taylor Goetz
Fields grouping uses a mod-hash function to determine which task to send a tuple to. It sounds like there's not enough variety in the field values you are grouping on, so they are all getting sent to the same task. Without seeing your code and data I can't tell for sure. -Taylor On May 28,
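A minimal sketch of the pattern (component names, the "userId" field, and CountBolt are hypothetical): the task index is roughly hash(grouped field values) mod (number of tasks), so a low-variety field sends nearly everything to one task:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", kafkaSpout, 1);
    // All tuples with the same "userId" go to the same CountBolt task;
    // if userId takes few distinct values, one task (and worker) gets most tuples.
    builder.setBolt("count-bolt", new CountBolt(), 12)
           .fieldsGrouping("kafka-spout", new Fields("userId"));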

Re: logging 'failed' tuples in mastercoord-bg0

2014-05-28 Thread Raphael Hsieh
Thanks for your help, Taylor. Do you think you could point me to some documentation on where I can set those values in Storm Trident? I can't seem to find anything or figure that out. Thanks On Wed, May 28, 2014 at 2:58 PM, P. Taylor Goetz ptgo...@gmail.com wrote: Silent replays are usually a

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Shaikh Riyaz
I think you can use kafkaConfig.forceFromStart = *false*; We have implemented this and it's working fine. Regards, Riyaz On Thu, May 29, 2014 at 1:02 AM, Raphael Hsieh raffihs...@gmail.com wrote: This is still not working for me. I've set the offset to -1 and it is still backfilling data.

Change the python path for storm cluster

2014-05-28 Thread Dilpreet Singh
Hi, I'm getting import errors on various Python modules, probably because I'm using the Python installed at /usr/bin while Storm is using a Python installed at some other location. Is there any way to change the Python location for the Storm cluster? Changing $PATH didn't help. Thanks. Regards,
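One possible workaround (a sketch; ClusterBolt, cluster_bolt.py, and the interpreter path are hypothetical): since a Python bolt is launched through a ShellBolt wrapper, the wrapper can point at an absolute interpreter path instead of whatever "python" resolves to on the worker hosts:

    public static class ClusterBolt extends ShellBolt implements IRichBolt {
        public ClusterBolt() {
            // Use the interpreter that has the required modules installed,
            // rather than the bare "python" resolved from the worker's PATH.
            super("/usr/bin/python", "cluster_bolt.py");
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("clusters"));
        }
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }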

Re: Different ZooKeeper Cluster for storm and kafka ?

2014-05-28 Thread Raphael Hsieh
Never mind, I figured this out. Thanks On Wed, May 28, 2014 at 3:59 PM, Raphael Hsieh raffihs...@gmail.com wrote: Hi, I believe it is possible to have my Storm topology run on a different ZooKeeper cluster than the source of my data (in this case, Kafka). I cannot seem to find documentation