[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85983568
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutBatchMetadata.java
 ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Wraps transaction batch information
+ */
+public class KafkaTridentSpoutBatchMetadata<K, V> implements Serializable {
+private static final Logger LOG = LoggerFactory.getLogger(KafkaTridentSpoutBatchMetadata.class);
+
+private TopicPartition topicPartition;  // topic partition of this batch
+private long firstOffset;   // first offset of this batch
+private long lastOffset;    // last offset of this batch
+
+public KafkaTridentSpoutBatchMetadata(TopicPartition topicPartition, long firstOffset, long lastOffset) {
+this.topicPartition = topicPartition;
+this.firstOffset = firstOffset;
+this.lastOffset = lastOffset;
+}
+
+public KafkaTridentSpoutBatchMetadata(TopicPartition topicPartition, ConsumerRecords<K, V> consumerRecords, KafkaTridentSpoutBatchMetadata<K, V> lastBatch) {
+this.topicPartition = topicPartition;
+
+List<ConsumerRecord<K, V>> records = consumerRecords.records(topicPartition);
+
+if (records != null && !records.isEmpty()) {
+firstOffset = records.get(0).offset();
+lastOffset = records.get(records.size() - 1).offset();
+} else {
+if (lastBatch != null) {
+firstOffset = lastBatch.firstOffset;
+lastOffset = lastBatch.lastOffset;
+}
+}
+LOG.debug("Created {}", this);
--- End diff --

Logging "this" calls the toString() method overridden in this class, which 
prints the first and last offsets.
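
A minimal, self-contained sketch of that behavior follows. `BatchMetadataSketch` is a hypothetical stand-in for the class under review, its `toString()` format is an assumption rather than the PR's actual output, and the `format` helper only imitates how a `{}` placeholder is filled so the example runs without a logging dependency.

```java
public class BatchMetadataSketch {
    // Hypothetical: a plain String replaces Kafka's TopicPartition to keep the sketch dependency-free.
    private final String topicPartition;
    private final long firstOffset;
    private final long lastOffset;

    public BatchMetadataSketch(String topicPartition, long firstOffset, long lastOffset) {
        this.topicPartition = topicPartition;
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
    }

    // The override is what makes a logged instance readable.
    @Override
    public String toString() {
        return "BatchMetadataSketch{topicPartition=" + topicPartition
                + ", firstOffset=" + firstOffset
                + ", lastOffset=" + lastOffset + "}";
    }

    // Stand-in for LOG.debug("Created {}", arg): the placeholder is replaced by arg.toString().
    public static String format(String pattern, Object arg) {
        return pattern.replace("{}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        BatchMetadataSketch batch = new BatchMetadataSketch("test-trident-0", 5L, 9L);
        System.out.println(format("Created {}", batch));
    }
}
```

Without the override, the same call would print the default `Object.toString()` form (class name plus hash code), which carries no offset information.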


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85983348
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTupleBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderNamedTopics;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutManager;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
+import org.apache.storm.trident.Stream;
+import org.apache.storm.trident.TridentState;
+import org.apache.storm.trident.TridentTopology;
+import org.apache.storm.trident.operation.builtin.Count;
+import org.apache.storm.trident.operation.builtin.Debug;
+import org.apache.storm.trident.testing.Split;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+public class TridentKafkaClientWordCountNamedTopics extends 
TridentKafkaWordCount {
+public TridentKafkaClientWordCountNamedTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+protected TridentState addTridentState(TridentTopology 
tridentTopology) {
+final Stream spoutStream = tridentTopology.newStream("spout1", 
createOpaqueKafkaSpoutNew()).parallelismHint(1);
+
+return spoutStream.each(spoutStream.getOutputFields(), new 
Debug(true))
+.each(new Fields("str"), new Split(), new Fields("word"))
+.groupBy(new Fields("word"))
+.persistentAggregate(new DebugMemoryMapState.Factory(), 
new Count(), new Fields("count"));
+}
+
+private KafkaTridentSpoutOpaque 
createOpaqueKafkaSpoutNew() {
+return new KafkaTridentSpoutOpaque(getKafkaTridentManager());
+}
+
+private KafkaTridentSpoutManager 
getKafkaTridentManager() {
+return new 
KafkaTridentSpoutManager<>(getKafkaSpoutConfig(getKafkaSpoutStreams()));
+}
+
+private KafkaSpoutConfig 
getKafkaSpoutConfig(KafkaSpoutStreams kafkaSpoutStreams) {
+return new KafkaSpoutConfig.Builder(getKafkaConsumerProps(), kafkaSpoutStreams, getTuplesBuilder(), 
getRetryService())
+.setOffsetCommitPeriodMs(10_000)
+.setFirstPollOffsetStrategy(EARLIEST)
+.setMaxUncommittedOffsets(250)
+.build();
+}
+
+protected Map getKafkaConsumerProps() {
+Map props = new HashMap<>();
+props.put(KafkaSpoutConfig.Consumer.BOOTSTRAP_SERVERS, 
"127.0.0.1:9092");
+props.put(KafkaSpoutConfig.Consumer.GROUP_ID, 
"kafkaSpoutTestGroup");
+props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER, 
"org.apache.kafka.common.serialization.StringDeserializer");
+props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER, 
"org.apache.kafka.common.serialization.StringDeserializer");
+props.put("max.partition.fetch.bytes", 200);
+return props;
+}
+
+

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85983373
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.java
 ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.consumer.OffsetAndMetadata;
+import org.apache.kafka.common.TopicPartition;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.trident.operation.TridentCollector;
+import org.apache.storm.trident.spout.IOpaquePartitionedTridentSpout;
+import org.apache.storm.trident.topology.TransactionAttempt;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_LATEST;
+
+public class KafkaTridentSpoutEmitter implements 
IOpaquePartitionedTridentSpout.Emitter>, 
Serializable {
+private static final Logger LOG = 
LoggerFactory.getLogger(KafkaTridentSpoutEmitter.class);
+
+// Kafka
+private final KafkaConsumer kafkaConsumer;
+
+// Bookkeeping
+private final KafkaTridentSpoutManager kafkaManager;
+// Declare some KafkaTridentSpoutManager references for convenience
+private final KafkaSpoutTuplesBuilder tuplesBuilder;
+private final long pollTimeoutMs;
+private final KafkaSpoutConfig.FirstPollOffsetStrategy 
firstPollOffsetStrategy;
+
+public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager 
kafkaManager) {
+this.kafkaManager = kafkaManager;
+this.kafkaManager.subscribeKafkaConsumer();
+
+//must subscribeKafkaConsumer before this line
+kafkaConsumer = kafkaManager.getKafkaConsumer();
+
+tuplesBuilder = kafkaManager.getTuplesBuilder();
+final KafkaSpoutConfig kafkaSpoutConfig = 
kafkaManager.getKafkaSpoutConfig();
+pollTimeoutMs = kafkaSpoutConfig.getPollTimeoutMs();
+firstPollOffsetStrategy = 
kafkaSpoutConfig.getFirstPollOffsetStrategy();
+LOG.debug("Created {}", this);
+}
+
+@Override
+public KafkaTridentSpoutBatchMetadata 
emitPartitionBatch(TransactionAttempt tx, TridentCollector collector,
+KafkaTridentSpoutTopicPartition partitionTs, 
KafkaTridentSpoutBatchMetadata lastBatch) {
+LOG.debug("Emitting batch: [transaction = {}], [partition = {}], 
[collector = {}], [lastBatchMetadata = {}]",
+tx, partitionTs, collector, lastBatch);
+
+final TopicPartition topicPartition = 
partitionTs.getTopicPartition();
+KafkaTridentSpoutBatchMetadata currentBatch = lastBatch;
+Collection pausedTopicPartitions = 
Collections.EMPTY_SET;
+
+try {
+// pause other topic partitions to only poll from current 
topic partition
+pausedTopicPartitions = pauseTopicPartitions(topicPartition);
+
+seek(topicPartition, 

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85981903
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTupleBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderNamedTopics;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutManager;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
+import org.apache.storm.trident.Stream;
+import org.apache.storm.trident.TridentState;
+import org.apache.storm.trident.TridentTopology;
+import org.apache.storm.trident.operation.builtin.Count;
+import org.apache.storm.trident.operation.builtin.Debug;
+import org.apache.storm.trident.testing.Split;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+public class TridentKafkaClientWordCountNamedTopics extends 
TridentKafkaWordCount {
+public TridentKafkaClientWordCountNamedTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+protected TridentState addTridentState(TridentTopology 
tridentTopology) {
+final Stream spoutStream = tridentTopology.newStream("spout1", 
createOpaqueKafkaSpoutNew()).parallelismHint(1);
+
+return spoutStream.each(spoutStream.getOutputFields(), new 
Debug(true))
+.each(new Fields("str"), new Split(), new Fields("word"))
+.groupBy(new Fields("word"))
+.persistentAggregate(new DebugMemoryMapState.Factory(), 
new Count(), new Fields("count"));
+}
+
+private KafkaTridentSpoutOpaque 
createOpaqueKafkaSpoutNew() {
+return new KafkaTridentSpoutOpaque(getKafkaTridentManager());
--- End diff --

Partially done in the refactored examples in this 
[PR](https://github.com/apache/storm/pull/1757).

I removed some redundant "factory methods". However, the code that creates 
the "dependency" objects that need to be passed in is more than one or two 
lines. I believe that a method with a meaningful name that creates and 
initializes these "dependency" objects makes the code much more cohesive and 
easier to read. Furthermore, this class is extended for wildcard topics, and 
some of these methods are overridden. 

I will be happy to write a more copy-and-paste-friendly example in the docs 
if you feel it's appropriate. Please let me know.
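
The design argument above can be sketched as follows. This is only an illustration under stated assumptions: `Subscription`, `NamedTopicsExample`, `WildcardTopicsExample`, and `createSubscription` are hypothetical names and not Storm or Kafka APIs, and the named topic `test-trident` is assumed. Only the wildcard pattern is taken from the diff under review.

```java
import java.util.regex.Pattern;

public class FactoryMethodSketch {
    /** Hypothetical stand-in for a topic subscription; not a Storm type. */
    interface Subscription {
        boolean matches(String topic);
    }

    static class NamedTopicsExample {
        // Named factory method: the name documents the step, and subclasses can replace only this step.
        protected Subscription createSubscription() {
            return topic -> topic.equals("test-trident"); // assumed topic name
        }

        // Assembly logic shared by every variant; it never needs to be copied into subclasses.
        public String describe(String topic) {
            return topic + (createSubscription().matches(topic) ? " -> consumed" : " -> ignored");
        }
    }

    static class WildcardTopicsExample extends NamedTopicsExample {
        private static final Pattern TOPIC_WILDCARD_PATTERN = Pattern.compile("test-trident(-1)?");

        // Only the subscription step is overridden; describe() is inherited unchanged.
        @Override
        protected Subscription createSubscription() {
            return topic -> TOPIC_WILDCARD_PATTERN.matcher(topic).matches();
        }
    }

    public static void main(String[] args) {
        System.out.println(new NamedTopicsExample().describe("test-trident-1"));
        System.out.println(new WildcardTopicsExample().describe("test-trident-1"));
    }
}
```

If everything were merged into a single method, the wildcard variant would have to duplicate the whole assembly instead of overriding one step, which is the trade-off being weighed here.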




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85980768
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/DebugMemoryMapState.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.storm.task.IMetricsContext;
+import org.apache.storm.topology.FailedException;
+import org.apache.storm.trident.state.CombinerValueUpdater;
+import org.apache.storm.trident.state.State;
+import org.apache.storm.trident.state.StateFactory;
+import org.apache.storm.trident.state.ValueUpdater;
+import org.apache.storm.trident.testing.MemoryMapState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+public class DebugMemoryMapState extends MemoryMapState {
+private static final Logger LOG = 
LoggerFactory.getLogger(DebugMemoryMapState.class);
+
+private int updateCount = 0;
+
+public DebugMemoryMapState(String id) {
+super(id);
+}
+
+public List multiUpdate(List keys, List 
updaters) {
+print(keys, updaters);
+if ((updateCount++ % 5) == 0) {
+LOG.error("Throwing FailedException");
+throw new FailedException("Enforced State Update Fail. On 
retrial should replay the exact same batch.");
+}
+return super.multiUpdate(keys, updaters);
+}
+
+private void print(List keys, List 
updaters) {
+for (int i = 0; i < keys.size(); i++) {
+ValueUpdater valueUpdater = updaters.get(i);
+Object arg = ((CombinerValueUpdater) valueUpdater).getArg();
+LOG.debug("updateCount = {}, keys = {} => updaterArgs = {}", 
updateCount, keys.get(i), arg);
--- End diff --

Done. Refactored examples in this 
[PR](https://github.com/apache/storm/pull/1757)




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-11-01 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85980640
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountWildcardTopics.java
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.storm.kafka.spout.KafkaSpoutStream;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsWildcardTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderWildcardTopics;
+import org.apache.storm.tuple.Fields;
+
+import java.util.regex.Pattern;
+
+public class TridentKafkaClientWordCountWildcardTopics extends 
TridentKafkaClientWordCountNamedTopics {
+private static final String TOPIC_WILDCARD_PATTERN = 
"test-trident(-1)?";
+
+public TridentKafkaClientWordCountWildcardTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+public static void main(String[] args) throws Exception {
+final String[] zkBrokerUrl = parseUrl(args);
--- End diff --

Agree. Refactored examples in this 
[PR](https://github.com/apache/storm/pull/1757)




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-29 Thread hmcl
Github user hmcl commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r85643212
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.java
 ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.consumer.OffsetAndMetadata;
+import org.apache.kafka.common.TopicPartition;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.trident.operation.TridentCollector;
+import org.apache.storm.trident.spout.IOpaquePartitionedTridentSpout;
+import org.apache.storm.trident.topology.TransactionAttempt;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_LATEST;
+
+public class KafkaTridentSpoutEmitter implements 
IOpaquePartitionedTridentSpout.Emitter>, 
Serializable {
+private static final Logger LOG = 
LoggerFactory.getLogger(KafkaTridentSpoutEmitter.class);
+
+// Kafka
+private final KafkaConsumer kafkaConsumer;
+
+// Bookkeeping
+private final KafkaTridentSpoutManager kafkaManager;
+// Declare some KafkaTridentSpoutManager references for convenience
+private final KafkaSpoutTuplesBuilder tuplesBuilder;
+private final long pollTimeoutMs;
+private final KafkaSpoutConfig.FirstPollOffsetStrategy 
firstPollOffsetStrategy;
+
+public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager 
kafkaManager) {
+this.kafkaManager = kafkaManager;
+this.kafkaManager.subscribeKafkaConsumer();
+
+//must subscribeKafkaConsumer before this line
+kafkaConsumer = kafkaManager.getKafkaConsumer();
+
+tuplesBuilder = kafkaManager.getTuplesBuilder();
+final KafkaSpoutConfig kafkaSpoutConfig = 
kafkaManager.getKafkaSpoutConfig();
+pollTimeoutMs = kafkaSpoutConfig.getPollTimeoutMs();
+firstPollOffsetStrategy = 
kafkaSpoutConfig.getFirstPollOffsetStrategy();
+LOG.debug("Created {}", this);
+}
+
+@Override
+public KafkaTridentSpoutBatchMetadata 
emitPartitionBatch(TransactionAttempt tx, TridentCollector collector,
+KafkaTridentSpoutTopicPartition partitionTs, 
KafkaTridentSpoutBatchMetadata lastBatch) {
+LOG.debug("Emitting batch: [transaction = {}], [partition = {}], 
[collector = {}], [lastBatchMetadata = {}]",
+tx, partitionTs, collector, lastBatch);
+
+final TopicPartition topicPartition = 
partitionTs.getTopicPartition();
+KafkaTridentSpoutBatchMetadata currentBatch = lastBatch;
+Collection pausedTopicPartitions = 
Collections.EMPTY_SET;
+
+try {
+// pause other topic partitions to only poll from current 
topic partition
+pausedTopicPartitions = pauseTopicPartitions(topicPartition);
+
+seek(topicPartition, 

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83286131
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTupleBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderNamedTopics;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutManager;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
+import org.apache.storm.trident.Stream;
+import org.apache.storm.trident.TridentState;
+import org.apache.storm.trident.TridentTopology;
+import org.apache.storm.trident.operation.builtin.Count;
+import org.apache.storm.trident.operation.builtin.Debug;
+import org.apache.storm.trident.testing.Split;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+public class TridentKafkaClientWordCountNamedTopics extends 
TridentKafkaWordCount {
+public TridentKafkaClientWordCountNamedTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+protected TridentState addTridentState(TridentTopology 
tridentTopology) {
+final Stream spoutStream = tridentTopology.newStream("spout1", 
createOpaqueKafkaSpoutNew()).parallelismHint(1);
+
+return spoutStream.each(spoutStream.getOutputFields(), new 
Debug(true))
+.each(new Fields("str"), new Split(), new Fields("word"))
+.groupBy(new Fields("word"))
+.persistentAggregate(new DebugMemoryMapState.Factory(), 
new Count(), new Fields("count"));
+}
+
+private KafkaTridentSpoutOpaque 
createOpaqueKafkaSpoutNew() {
+return new KafkaTridentSpoutOpaque(getKafkaTridentManager());
--- End diff --

Can we merge this into a single method, so that it shows the series of 
steps in creating a KafkaTrident topology? It has a few redirections, with 
one method calling another, which can be confusing for users looking for an 
example.




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83312306
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.java
 ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.consumer.OffsetAndMetadata;
+import org.apache.kafka.common.TopicPartition;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.trident.operation.TridentCollector;
+import org.apache.storm.trident.spout.IOpaquePartitionedTridentSpout;
+import org.apache.storm.trident.topology.TransactionAttempt;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_LATEST;
+
+public class KafkaTridentSpoutEmitter implements 
IOpaquePartitionedTridentSpout.Emitter>, 
Serializable {
+private static final Logger LOG = 
LoggerFactory.getLogger(KafkaTridentSpoutEmitter.class);
+
+// Kafka
+private final KafkaConsumer kafkaConsumer;
+
+// Bookkeeping
+private final KafkaTridentSpoutManager kafkaManager;
+// Declare some KafkaTridentSpoutManager references for convenience
+private final KafkaSpoutTuplesBuilder tuplesBuilder;
+private final long pollTimeoutMs;
+private final KafkaSpoutConfig.FirstPollOffsetStrategy 
firstPollOffsetStrategy;
+
+public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager 
kafkaManager) {
+this.kafkaManager = kafkaManager;
+this.kafkaManager.subscribeKafkaConsumer();
+
+//must subscribeKafkaConsumer before this line
+kafkaConsumer = kafkaManager.getKafkaConsumer();
+
+tuplesBuilder = kafkaManager.getTuplesBuilder();
+final KafkaSpoutConfig kafkaSpoutConfig = 
kafkaManager.getKafkaSpoutConfig();
+pollTimeoutMs = kafkaSpoutConfig.getPollTimeoutMs();
+firstPollOffsetStrategy = 
kafkaSpoutConfig.getFirstPollOffsetStrategy();
+LOG.debug("Created {}", this);
+}
+
+@Override
+public KafkaTridentSpoutBatchMetadata 
emitPartitionBatch(TransactionAttempt tx, TridentCollector collector,
+KafkaTridentSpoutTopicPartition partitionTs, 
KafkaTridentSpoutBatchMetadata lastBatch) {
+LOG.debug("Emitting batch: [transaction = {}], [partition = {}], 
[collector = {}], [lastBatchMetadata = {}]",
+tx, partitionTs, collector, lastBatch);
+
+final TopicPartition topicPartition = 
partitionTs.getTopicPartition();
+KafkaTridentSpoutBatchMetadata currentBatch = lastBatch;
+Collection pausedTopicPartitions = 
Collections.EMPTY_SET;
+
+try {
+// pause other topic partitions to only poll from current 
topic partition
+pausedTopicPartitions = pauseTopicPartitions(topicPartition);
+
+seek(topicPartition, 

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83285410
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/DebugMemoryMapState.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.storm.task.IMetricsContext;
+import org.apache.storm.topology.FailedException;
+import org.apache.storm.trident.state.CombinerValueUpdater;
+import org.apache.storm.trident.state.State;
+import org.apache.storm.trident.state.StateFactory;
+import org.apache.storm.trident.state.ValueUpdater;
+import org.apache.storm.trident.testing.MemoryMapState;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+public class DebugMemoryMapState extends MemoryMapState {
+private static final Logger LOG = 
LoggerFactory.getLogger(DebugMemoryMapState.class);
+
+private int updateCount = 0;
+
+public DebugMemoryMapState(String id) {
+super(id);
+}
+
+public List multiUpdate(List keys, List 
updaters) {
+print(keys, updaters);
+if ((updateCount++ % 5) == 0) {
+LOG.error("Throwing FailedException");
+throw new FailedException("Enforced State Update Fail. On 
retrial should replay the exact same batch.");
+}
+return super.multiUpdate(keys, updaters);
+}
+
+private void print(List keys, List 
updaters) {
+for (int i = 0; i < keys.size(); i++) {
+ValueUpdater valueUpdater = updaters.get(i);
+Object arg = ((CombinerValueUpdater) valueUpdater).getArg();
+LOG.debug("updateCount = {}, keys = {} => updaterArgs = {}", 
updateCount, keys.get(i), arg);
--- End diff --

Should this just log at info level? Since this is a debug state, why require
the extra step of enabling debug logging for this topology?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83312172
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutBatchMetadata.java
 ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.common.TopicPartition;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Wraps transaction batch information
+ */
+public class KafkaTridentSpoutBatchMetadata implements Serializable {
+private static final Logger LOG = 
LoggerFactory.getLogger(KafkaTridentSpoutBatchMetadata.class);
+
+private TopicPartition topicPartition;  // topic partition of this 
batch
+private long firstOffset;   // first offset of this batch
+private long lastOffset;// last offset of this batch
+
+public KafkaTridentSpoutBatchMetadata(TopicPartition topicPartition, 
long firstOffset, long lastOffset) {
+this.topicPartition = topicPartition;
+this.firstOffset = firstOffset;
+this.lastOffset = lastOffset;
+}
+
+public KafkaTridentSpoutBatchMetadata(TopicPartition topicPartition, 
ConsumerRecords consumerRecords, KafkaTridentSpoutBatchMetadata 
lastBatch) {
+this.topicPartition = topicPartition;
+
+List> records = 
consumerRecords.records(topicPartition);
+
+if (records != null && !records.isEmpty()) {
+firstOffset = records.get(0).offset();
+lastOffset = records.get(records.size() - 1).offset();
+} else {
+if (lastBatch != null) {
+firstOffset = lastBatch.firstOffset;
+lastOffset = lastBatch.lastOffset;
+}
+}
+LOG.debug("Created {}", this);
--- End diff --

It would probably be useful to log the first and last offsets of the batch.
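
A minimal sketch of that suggestion, assuming the metadata class gains a
toString() override so that the existing LOG.debug("Created {}", this) call
prints both offsets. BatchMetadataSketch and its plain topic/partition fields
are hypothetical stand-ins chosen so the example compiles without Kafka on the
classpath; they are not code from the PR.

```java
import java.io.Serializable;

// Hypothetical stand-in for KafkaTridentSpoutBatchMetadata.
// A plain topic name and partition number replace Kafka's TopicPartition.
final class BatchMetadataSketch implements Serializable {
    private final String topic;
    private final int partition;
    private final long firstOffset;  // first offset of this batch
    private final long lastOffset;   // last offset of this batch

    BatchMetadataSketch(String topic, int partition, long firstOffset, long lastOffset) {
        this.topic = topic;
        this.partition = partition;
        this.firstOffset = firstOffset;
        this.lastOffset = lastOffset;
    }

    // With this override, logging the object itself (for example via
    // LOG.debug("Created {}", this)) includes the first and last offsets.
    @Override
    public String toString() {
        return "BatchMetadataSketch{topicPartition=" + topic + "-" + partition
                + ", firstOffset=" + firstOffset
                + ", lastOffset=" + lastOffset + "}";
    }
}
```

Putting the offsets in toString() keeps the log call unchanged; passing them
as explicit arguments to the log statement would work equally well.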




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83289318
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountNamedTopics.java
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
+import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTupleBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderNamedTopics;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutManager;
+import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
+import org.apache.storm.trident.Stream;
+import org.apache.storm.trident.TridentState;
+import org.apache.storm.trident.TridentTopology;
+import org.apache.storm.trident.operation.builtin.Count;
+import org.apache.storm.trident.operation.builtin.Debug;
+import org.apache.storm.trident.testing.Split;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Values;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+
+public class TridentKafkaClientWordCountNamedTopics extends 
TridentKafkaWordCount {
+public TridentKafkaClientWordCountNamedTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+protected TridentState addTridentState(TridentTopology 
tridentTopology) {
+final Stream spoutStream = tridentTopology.newStream("spout1", 
createOpaqueKafkaSpoutNew()).parallelismHint(1);
+
+return spoutStream.each(spoutStream.getOutputFields(), new 
Debug(true))
+.each(new Fields("str"), new Split(), new Fields("word"))
+.groupBy(new Fields("word"))
+.persistentAggregate(new DebugMemoryMapState.Factory(), 
new Count(), new Fields("count"));
+}
+
+private KafkaTridentSpoutOpaque 
createOpaqueKafkaSpoutNew() {
+return new KafkaTridentSpoutOpaque(getKafkaTridentManager());
+}
+
+private KafkaTridentSpoutManager 
getKafkaTridentManager() {
+return new 
KafkaTridentSpoutManager<>(getKafkaSpoutConfig(getKafkaSpoutStreams()));
+}
+
+private KafkaSpoutConfig 
getKafkaSpoutConfig(KafkaSpoutStreams kafkaSpoutStreams) {
+return new KafkaSpoutConfig.Builder(getKafkaConsumerProps(), kafkaSpoutStreams, getTuplesBuilder(), 
getRetryService())
+.setOffsetCommitPeriodMs(10_000)
+.setFirstPollOffsetStrategy(EARLIEST)
+.setMaxUncommittedOffsets(250)
+.build();
+}
+
+protected Map getKafkaConsumerProps() {
+Map props = new HashMap<>();
+props.put(KafkaSpoutConfig.Consumer.BOOTSTRAP_SERVERS, 
"127.0.0.1:9092");
+props.put(KafkaSpoutConfig.Consumer.GROUP_ID, 
"kafkaSpoutTestGroup");
+props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER, 
"org.apache.kafka.common.serialization.StringDeserializer");
+props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER, 
"org.apache.kafka.common.serialization.StringDeserializer");
+props.put("max.partition.fetch.bytes", 200);
+return props;
+}
+
+

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83290282
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.java
 ---
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.kafka.spout.trident;
+
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.clients.consumer.OffsetAndMetadata;
+import org.apache.kafka.common.TopicPartition;
+import org.apache.storm.kafka.spout.KafkaSpoutConfig;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.trident.operation.TridentCollector;
+import org.apache.storm.trident.spout.IOpaquePartitionedTridentSpout;
+import org.apache.storm.trident.topology.TransactionAttempt;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST;
+import static 
org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_LATEST;
+
+public class KafkaTridentSpoutEmitter implements 
IOpaquePartitionedTridentSpout.Emitter>, 
Serializable {
+private static final Logger LOG = 
LoggerFactory.getLogger(KafkaTridentSpoutEmitter.class);
+
+// Kafka
+private final KafkaConsumer kafkaConsumer;
+
+// Bookkeeping
+private final KafkaTridentSpoutManager kafkaManager;
+// Declare some KafkaTridentSpoutManager references for convenience
+private final KafkaSpoutTuplesBuilder tuplesBuilder;
+private final long pollTimeoutMs;
+private final KafkaSpoutConfig.FirstPollOffsetStrategy 
firstPollOffsetStrategy;
+
+public KafkaTridentSpoutEmitter(KafkaTridentSpoutManager 
kafkaManager) {
+this.kafkaManager = kafkaManager;
+this.kafkaManager.subscribeKafkaConsumer();
+
+//must subscribeKafkaConsumer before this line
+kafkaConsumer = kafkaManager.getKafkaConsumer();
+
+tuplesBuilder = kafkaManager.getTuplesBuilder();
+final KafkaSpoutConfig kafkaSpoutConfig = 
kafkaManager.getKafkaSpoutConfig();
+pollTimeoutMs = kafkaSpoutConfig.getPollTimeoutMs();
+firstPollOffsetStrategy = 
kafkaSpoutConfig.getFirstPollOffsetStrategy();
+LOG.debug("Created {}", this);
+}
+
+@Override
+public KafkaTridentSpoutBatchMetadata 
emitPartitionBatch(TransactionAttempt tx, TridentCollector collector,
+KafkaTridentSpoutTopicPartition partitionTs, 
KafkaTridentSpoutBatchMetadata lastBatch) {
+LOG.debug("Emitting batch: [transaction = {}], [partition = {}], 
[collector = {}], [lastBatchMetadata = {}]",
+tx, partitionTs, collector, lastBatch);
+
+final TopicPartition topicPartition = 
partitionTs.getTopicPartition();
+KafkaTridentSpoutBatchMetadata currentBatch = lastBatch;
+Collection pausedTopicPartitions = 
Collections.EMPTY_SET;
+
+try {
+// pause other topic partitions to only poll from current 
topic partition
+pausedTopicPartitions = pauseTopicPartitions(topicPartition);
+
+seek(topicPartition, 

[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-10-13 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1687#discussion_r83280245
  
--- Diff: 
examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentKafkaClientWordCountWildcardTopics.java
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ *   or more contributor license agreements.  See the NOTICE file
+ *   distributed with this work for additional information
+ *   regarding copyright ownership.  The ASF licenses this file
+ *   to you under the Apache License, Version 2.0 (the
+ *   "License"); you may not use this file except in compliance
+ *   with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.storm.starter.trident;
+
+import org.apache.storm.kafka.spout.KafkaSpoutStream;
+import org.apache.storm.kafka.spout.KafkaSpoutStreams;
+import org.apache.storm.kafka.spout.KafkaSpoutStreamsWildcardTopics;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilder;
+import org.apache.storm.kafka.spout.KafkaSpoutTuplesBuilderWildcardTopics;
+import org.apache.storm.tuple.Fields;
+
+import java.util.regex.Pattern;
+
+public class TridentKafkaClientWordCountWildcardTopics extends 
TridentKafkaClientWordCountNamedTopics {
+private static final String TOPIC_WILDCARD_PATTERN = 
"test-trident(-1)?";
+
+public TridentKafkaClientWordCountWildcardTopics(String zkUrl, String 
brokerUrl) {
+super(zkUrl, brokerUrl);
+}
+
+public static void main(String[] args) throws Exception {
+final String[] zkBrokerUrl = parseUrl(args);
--- End diff --

Do we still need the ZooKeeper config? This topology uses the new KafkaSpout, right?
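
For context, the new Kafka consumer API locates the cluster through
bootstrap.servers and takes no ZooKeeper setting. The sketch below shows a
consumer property map along those lines. NewConsumerPropsSketch and the
brokerUrl parameter are hypothetical names for illustration; the property keys
are the standard Kafka consumer config names, and the group id matches the one
used in the example topology.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR.
final class NewConsumerPropsSketch {
    // Builds properties for the new Kafka consumer API.
    // brokerUrl is an assumed parameter such as "127.0.0.1:9092".
    static Map<String, Object> consumerProps(String brokerUrl) {
        Map<String, Object> props = new HashMap<>();
        // Brokers are contacted directly; no ZooKeeper entry is set.
        props.put("bootstrap.servers", brokerUrl);
        props.put("group.id", "kafkaSpoutTestGroup");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```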




[GitHub] storm pull request #1687: Apache master storm 1694 top storm 2097

2016-09-16 Thread hmcl
GitHub user hmcl opened a pull request:

https://github.com/apache/storm/pull/1687

Apache master storm 1694 top storm 2097

The Kafka Trident implementation is built on top of the Trident logging
improvement patch because the two are related, which makes the patch easier
to merge. There is already a separate PR for STORM-2097.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hmcl/storm-apache 
Apache_master_STORM-1694_top_STORM-2097

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1687


commit 71465dc1fe7bb21c43e4b57bac7010105facd947
Author: Hugo Louro 
Date:   2016-06-21T16:28:09Z

STORM-2097: Improve logging in trident core and examples
 - Improve logging in trident core, MasterBatchCoordinator,  and examples
 - Added DebugMemoryMapState and test main for new Kafka client API

commit a2d678d800daf24b87226593d731cc43d63caa72
Author: Hugo Louro 
Date:   2016-06-21T16:35:16Z

STORM-1694: Kafka Spout Trident Implementation Using New Kafka Consumer API
 - Kafka New Client - Opaque Transactional Trident Spout Implementation
 - Implementation supporting multiple named topics and wildcard topics



