STORM-2423 for 1.1.0

2017-03-17 Thread Roshan Naik
All,
Found a critical issue with JoinBolt that would like to get a fix into 1.1.0.

https://issues.apache.org/jira/browse/STORM-2423


-roshan






[GitHub] storm pull request #2019: STORM-2423 - Join Bolt should use explicit instead...

2017-03-17 Thread roshannaik
GitHub user roshannaik opened a pull request:

https://github.com/apache/storm/pull/2019

STORM-2423 - Join Bolt should use explicit instead of default window 
anchoring of emitted tuples

Default anchoring will anchor each emitted tuple to every tuple in current 
window. This requires a very large numbers of ACKs from any downstream bolt. If 
topology.debug is enabled, it also worsens the load on the system significantly.
Letting the topo run in this mode (in particular with max.spout.pending 
disabled), could lead to the worker running out of memory and crashing.

*Fix* Join Bolt should avoid using default window anchoring, and explicitly 
anchor each emitted tuple with the exact matching tuples form each inputs 
streams. This reduces the complexity of the tuple trees and consequently the 
reduces burden on the ACKing & messaging subsystems.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/roshannaik/storm STORM-2423

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2019


commit a3cd13474846756a7edf3061fd60a947bec078f8
Author: Roshan Naik 
Date:   2017-03-18T02:46:28Z

STORM-2423 - Join Bolt : Use explicit instead of default window anchoring 
of emitted tuples




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768060
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/bolt/EventHubBolt.java
 ---
@@ -84,11 +85,47 @@ public void prepare(Map config, TopologyContext context,
@Override
public void execute(Tuple tuple) {
try {
-   
sender.send(boltConfig.getEventDataFormat().serialize(tuple));
+   EventData sendEvent = new 
EventData(boltConfig.getEventDataFormat().serialize(tuple));
+   if(boltConfig.getPartitionMode() && sender!=null)
+   sender.sendSync(sendEvent);
+   else if(boltConfig.getPartitionMode() && sender==null)
+   throw new EventHubException("Sender is null");
+   else if(!boltConfig.getPartitionMode() && 
ehClient!=null)
+   ehClient.sendSync(sendEvent);
+   else if(!boltConfig.getPartitionMode() && 
ehClient==null)
+   throw new EventHubException("ehclient is null");
collector.ack(tuple);
-   } catch (EventHubException ex) {
+   } catch (EventHubException ex ) {
collector.reportError(ex);
collector.fail(tuple);
+   }catch (ServiceBusException e){
+   collector.reportError(e);
+   collector.fail(tuple);
+   }
+   }
+
+   @Override
+   public void cleanup() {
+   if(sender != null) {
+   try {
+   sender.close().whenComplete((voidargs,error)->{
--- End diff --

FYI, 1.x-branch still is on JDK 1.7 make sure you open another PR for 
1.x-branch with JDK7 changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768022
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/bolt/EventHubBolt.java
 ---
@@ -41,7 +42,8 @@
.getLogger(EventHubBolt.class);
 
protected OutputCollector collector;
-   protected EventHubSender sender;
+   protected PartitionSender sender=null;
--- End diff --

Don't need to assign null


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768166
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
+if(error!=null){
+  logger.error("Exception during receiver close 
phase"+error.toString());
+}
+ehClient.closeSync();
+  }catch (Exception e){
+logger.error("Exception during ehclient close 
phase"+e.toString());
+  }
+}).get();
+  }catch (InterruptedException e){
+logger.error("Exception occured during close phase"+e.toString());
+  }catch (ExecutionException e){
--- End diff --

same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768041
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/bolt/EventHubBolt.java
 ---
@@ -84,11 +85,47 @@ public void prepare(Map config, TopologyContext context,
@Override
public void execute(Tuple tuple) {
try {
-   
sender.send(boltConfig.getEventDataFormat().serialize(tuple));
+   EventData sendEvent = new 
EventData(boltConfig.getEventDataFormat().serialize(tuple));
+   if(boltConfig.getPartitionMode() && sender!=null)
--- End diff --

can you please add braces around if conditions. This goes to all of the 
code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768115
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
--- End diff --

spaces around the braces


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768140
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
--- End diff --

space after try


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768148
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
+if(error!=null){
--- End diff --

space after if and before brace


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768160
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
+if(error!=null){
+  logger.error("Exception during receiver close 
phase"+error.toString());
+}
+ehClient.closeSync();
+  }catch (Exception e){
+logger.error("Exception during ehclient close 
phase"+e.toString());
+  }
+}).get();
+  }catch (InterruptedException e){
--- End diff --

same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768154
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
+if(error!=null){
+  logger.error("Exception during receiver close 
phase"+error.toString());
+}
+ehClient.closeSync();
+  }catch (Exception e){
--- End diff --

same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768176
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubReceiverImpl.java
 ---
@@ -65,77 +57,81 @@ public EventHubReceiverImpl(EventHubSpoutConfig config, 
String partitionId) {
   }
 
   @Override
-  public void open(IEventHubFilter filter) throws EventHubException {
-logger.info("creating eventhub receiver: partitionId=" + partitionId + 
-   ", filterString=" + filter.getFilterString());
+  public void open(String offset) throws EventHubException {
+logger.info("creating eventhub receiver: partitionId=" + partitionId +
+", offset=" + offset);
 long start = System.currentTimeMillis();
-receiver = new ResilientEventHubReceiver(connectionString, entityName,
-   partitionId, consumerGroupName, defaultCredits, filter);
-receiver.initialize();
-
+try {
+  ehClient = 
EventHubClient.createFromConnectionStringSync(connectionString);
+  receiver = ehClient.createEpochReceiverSync(
+  consumerGroupName,
+  partitionId,
+  offset,
+  false,
+  1);
+}catch (Exception e){
+  logger.info("Exception in creating EventhubClient"+e.toString());
+}
 long end = System.currentTimeMillis();
 logger.info("created eventhub receiver, time taken(ms): " + 
(end-start));
   }
 
   @Override
-  public void close() {
+  public void close(){
 if(receiver != null) {
-  receiver.close();
+  try {
+receiver.close().whenComplete((voidargs,error)->{
+  try{
+if(error!=null){
+  logger.error("Exception during receiver close 
phase"+error.toString());
+}
+ehClient.closeSync();
+  }catch (Exception e){
+logger.error("Exception during ehclient close 
phase"+e.toString());
+  }
+}).get();
+  }catch (InterruptedException e){
+logger.error("Exception occured during close phase"+e.toString());
+  }catch (ExecutionException e){
+logger.error("Exception occured during close phase"+e.toString());
+  }
   logger.info("closed eventhub receiver: partitionId=" + partitionId );
   receiver = null;
+  ehClient =  null;
 }
   }
-  
+
+
   @Override
   public boolean isOpen() {
 return (receiver != null);
   }
 
   @Override
-  public EventData receive(long timeoutInMilliseconds) {
+  public EventDataWrap receive() {
 long start = System.currentTimeMillis();
-Message message = receiver.receive(timeoutInMilliseconds);
+Iterable receivedEvents=null;
+/*Get one message at a time for backward compatibility behaviour*/
+try {
+  receivedEvents = receiver.receiveSync(1);
+}catch (ServiceBusException e){
--- End diff --

same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1998: Eventhub2

2017-03-17 Thread harshach
Github user harshach commented on a diff in the pull request:

https://github.com/apache/storm/pull/1998#discussion_r106768224
  
--- Diff: 
external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/PartitionManager.java
 ---
@@ -74,8 +74,8 @@ public void ack(String offset) {
   @Override
   public void fail(String offset) {
 logger.warn("fail on " + offset);
--- End diff --

this might trigger lot of log writing if the spout starts failing. May be 
good idea to make it debug


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2018: STORM-2416: Improve Release Packaging to Reduce Fi...

2017-03-17 Thread ptgoetz
GitHub user ptgoetz opened a pull request:

https://github.com/apache/storm/pull/2018

STORM-2416: Improve Release Packaging to Reduce File Size

This brings the binary distribution file size down from > 200MB to ~79MB.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ptgoetz/storm STORM-2416

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2018


commit d6c8298d8a596589442cfa049039ff90baa33f5f
Author: P. Taylor Goetz 
Date:   2017-03-16T19:14:26Z

STORM-2416: break out storm-jms-examples

commit 1b514cbd0d029f7ff4bc32b1061350fede89beab
Author: P. Taylor Goetz 
Date:   2017-03-16T20:47:20Z

STORM-2416: move flux-examples and refactor storm-mqtt module

commit ff93e07f0d63c4942ee398cd795006984ff8e4c7
Author: P. Taylor Goetz 
Date:   2017-03-17T20:04:53Z

STORM-2416: normalize provided.scope across poms; move storm-perf to 
examples; update packaging

commit cbf064c12fa84369da765563be26077c5edf2312
Author: P. Taylor Goetz 
Date:   2017-03-17T20:42:33Z

STORM-2416: fix typos; stop shading Druid connector




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2017: STORM-2416: Release Packaging Improvements

2017-03-17 Thread ptgoetz
Github user ptgoetz closed the pull request at:

https://github.com/apache/storm/pull/2017


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2017: STORM-2416: Release Packaging Improvements

2017-03-17 Thread ptgoetz
GitHub user ptgoetz opened a pull request:

https://github.com/apache/storm/pull/2017

STORM-2416: Release Packaging Improvements

This reduces the size of the binary distribution from > 200MB to ~79MB.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ptgoetz/storm STORM-2416

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2017


commit b02c6a2edfdbf32766aeeaaef94a95bde64ecfe5
Author: Wissman, Matthew 
Date:   2016-12-29T15:29:39Z

STORM-2095 remove any remaining files when deleting blobstore directory

commit 89829354b71678d581da886c855c98adf5c48d4d
Author: Jungtaek Lim 
Date:   2017-01-03T14:14:22Z

Merge branch 'STORM-2095-1.x-merge' into 1.x-branch

commit 4d882dc0ae781e8d66dba49cd0825268eb4cff2a
Author: Jungtaek Lim 
Date:   2017-01-03T14:14:38Z

STORM-2095: CHANGELOG

commit fa59dfe1301d5a875dc991d16a75f44df1c9bb72
Author: Jungtaek Lim 
Date:   2017-01-03T14:26:09Z

Merge branch 'STORM-2266-1.x' of https://github.com/satishd/storm into 
STORM-2266-1.x-merge

commit 0d40546a9019f68330bb2e82523db35ad55f41ed
Author: Jungtaek Lim 
Date:   2017-01-03T14:30:54Z

STORM-2266: CHANGELOG

commit d1ffee2a29194afdd74ab6de4b401c72fe4760ee
Author: Jungtaek Lim 
Date:   2016-11-15T03:55:44Z

STORM-2200 [Storm SQL] Drop Aggregate & Join support on Trident mode

* Dropping aggregate & join support based on discussion
  * also addresses documentation
* These implementation will be reused when we introduce them again

commit 4db9472c4f7ba05b9bd7169e88eeaa875cf44178
Author: Jungtaek Lim 
Date:   2017-01-03T22:39:35Z

Merge branch 'STORM-2200-1.x-merge' into 1.x-branch

commit bd93b7f7ba020f6ca4290404a2ceeb95381c1efc
Author: Jungtaek Lim 
Date:   2017-01-03T22:39:57Z

STORM-2200: CHANGELOG

commit 032f1f33af93c99c27c53a6fb0cbe83f8d5fdb8b
Author: chenyuzhao 
Date:   2017-01-04T02:57:01Z

kafka spout consume from latest when zk offset bigger than latest offset

commit 209e2cd504f1e7078451b919ae7b765bcf2fe4a2
Author: Sanket 
Date:   2016-12-21T21:08:12Z

Provide Socket time out for nimbus thrift client

commit 6184e560d49542f157c397e371fe187f527c6c3f
Author: Sanket 
Date:   2016-12-21T22:29:02Z

increasing timeout as tumbling window test is taking more than 10 minutes 
to run?

commit b432d7dcafce646dd28e9107381b5838db98bdf7
Author: Sanket 
Date:   2016-12-22T15:45:42Z

realized tumbling window test is failing from way before my PR and has 
nothing to do with mine so switching it back to 10 minutes which should be 
sufficient

commit 204dd8765fdebf7093c2dadc62567bf9a924d89f
Author: Jungtaek Lim 
Date:   2017-01-04T08:44:12Z

Merge branch 'STORM-2254-1.x-merge' into 1.x-branch

commit 3129bfa37a51bce9923b17f32499cadbd9e5addb
Author: Jungtaek Lim 
Date:   2017-01-04T08:57:30Z

STORM-2254: CHANGELOG

commit bd2f6cdc00c4f3b5d89d484faaa7fd29727c1afc
Author: Jungtaek Lim 
Date:   2017-01-03T04:27:08Z

STORM-2267 Use user's local maven repo. directory to local repo.

* use user's maven local repository if possible
* if it's not available, just use 'local-repo' (it will be created if not 
exists)

commit c4bf9f854235aec82d60b46f3e914ed3bdf1ec98
Author: Jungtaek Lim 
Date:   2017-01-04T09:28:29Z

Merge branch 'STORM-2267-1.x-merge' into 1.x-branch

commit 00d20742066584567362489507ecb9ddc69e0287
Author: Jungtaek Lim 
Date:   2017-01-04T09:28:53Z

STORM-2267: CHANGELOG

commit 4c3c4962ea591705c385ee86fff159c09f22ad02
Author: Hugo Louro 
Date:   2017-01-04T22:27:35Z

PMMLPredictorBolt - Handle duplicate output and predicted fields

commit 133d71518a30f496f1affcf6cd3d0c08f32257a2
Author: Jungtaek Lim 
Date:   2017-01-05T02:17:32Z

Merge branch '1.x-branch' of https://github.com/ambud/storm into 
STORM-2204-1.x-merge

commit f75f3e3e3f1c86d1a0cce402940645155fe817b7
Author: Jungtaek Lim 
Date:   2017-01-05T02:20:30Z

STORM-2204: CHANGELOG

commit cf41f31d8ac0c9d34ec808752b0b74c043fdbc08
Author: chenyuzhao 
Date:   2017-01-05T02:32:26Z

format code

commit e4cff9ba5ca5015e3230211798d5e0dfacbf382f
Author: Jungtaek Lim 
Date:   2017-01-05T04:05:11Z

STORM-2276 Remove twitter4j usages due to license issue (JSON.org is 
catalog X)

commit 7e7b2631a8405e2adf3942d20f41701446cd5de0
Author: 

Re: Too many threads per bolt instance (as seen with jconsole images)

2017-03-17 Thread Roshan Naik
Can you provide a jstack dump of all the threads in one worker ?
-roshan 


On 3/16/17, 8:40 PM, "S G"  wrote:

I did not say 3 thds per spout/bolt.
I said "When 60 bolts are run per node, there are 1200 total threads/node"
That gets me 20 threads/bolt.
And I am not creating any new threads inside my bolts.


On Thu, Mar 16, 2017 at 6:12 PM, Roshan Naik  wrote:

> I assume you mean ..3 thds per spout/bolt is on the high side ?
> Currently there is an executor thd, a xsfer thd and flusher thd. The plan
> (in the redesign – STORM-2284 subtask 2) is to have 1 thread per 
spout/bolt.
> Also have some (untested) thoughts on reducing the number of remaining
> threads in the worker.
> -roshan
>
>
> On 3/16/17, 5:07 PM, "S G"  wrote:
>
> Thanks for sharing the doc Roshan.
> It is very informative.
>
> I think number of threads per bolt is on the high side (When 60 bolts
> are
> run per node, there are 1200 total threads/node).
>
> Some of these would be essential for the worker's book-keeping but
> still it
> seems we can get a lot of performance boost if we can somehow reduce
> these
> threads and the associated context switching between them.
>
>
> On Thu, Mar 16, 2017 at 12:53 PM, Roshan Naik 
> wrote:
>
> > Typically  there are 3 threads per spout or bolt instance (unless 
the
> > spout or bolt is spawning its own threads). There are the acker
> bolts,
> > event loggers and system bolt running there too.
> >
> > Then there are several more per worker. See a summary of it in
> section 3.1
> > of this document
> >
> > https://docs.google.com/document/d/1EzeHL3d7EE-
> > RyyBEpN7CwRmWz3oqjbbKiVVAlzFp2Nc/edit?usp=sharing
> >
> >
> >
> >
> > Get Outlook for iOS
> >
> >
> >
> >
> > On Thu, Mar 16, 2017 at 12:16 PM -0700, "S G" <
> sg.online.em...@gmail.com<
> > mailto:sg.online.em...@gmail.com>> wrote:
> >
> > Hi,
> >
> > I am trying to make sense of number of threads seen in JConsole.
> > It seems like a very high number of threads are launched per bolt
> thread.
> >
> >
> > [cid:ii_j0crtbtm0_15ad8874d0924ae2]
> > ​
> > Experiment 1
> > topology.workers.val=16
> > spout.parallelism.val=1
> > bolt1.parallelism.val=900
> > bolt2.parallelism.val=160
> > observation:
> > Threads seen per node in jconsole = 1200
> > Bolt threads per node = (900 + 160)/16 = 66
> > Threads per bolt = 1200/66 = 18
> >
> >
> > Experiment 2
> > topology.workers.val=16
> > spout.parallelism.val=1
> > bolt1.parallelism.val=16
> > bolt2.parallelism.val=16
> > observation:
> > Threads seen per node in jconsole = 61
> > Bolt threads per node = (16+16)/16 = 2
> > Threads per bolt = 61/2 = 30
> >
> > There are no other topologies running in my cluster.
> > There are no other spouts/bolts running in my cluster except the 
ones
> > mentioned above.
> > I am running only one worker process per machine.
> >
> >
> > So the question is how many threads per bolt are launched by storm?
> > I am not interested in the exact number, but concerned about the 
high
> > number of extra threads (18+ thread) for running a single bolt.
> >
> > Can we limit or optimize it somehow?
> > Or if all of them are required, it would be good to document them
> > somewhere.
> >
> > Thanks
> > SG
> >
> >
> >
> >
>
>
>




[GitHub] storm issue #2016: STORM-2422: Reduce the size of a serialized trident topol...

2017-03-17 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2016
  
@arunmahadevan yes this is showing up in a real topology (part of 
monitoring and alerting at Yahoo).  And yes some of these topologies are really 
large.  They are machine generated from a number of different user provided 
configurations, which is why it can grow to be so large.  Having fewer big 
topologies instead of many small topologies makes them more efficient because 
they only have to read/process the metrics once.  There are ways to work around 
it on the user side, but this felt like a cleaner solution overall.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #2016: STORM-2422: Reduce the size of a serialized triden...

2017-03-17 Thread revans2
GitHub user revans2 opened a pull request:

https://github.com/apache/storm/pull/2016

STORM-2422: Reduce the size of a serialized trident topology

For now I only plan to do this for master.  If someone else runs into this 
issue I am happy to backport it to other versions of storm.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/revans2/incubator-storm STORM-2422

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2016


commit 525f14f64b112cb6e73788a0e97dad56048c876d
Author: Robert (Bobby) Evans 
Date:   2017-03-17T18:25:27Z

STORM-2422: Reduce the size of a serialized trident topology




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1924: STORM-2343: New Kafka spout can stop emitting tupl...

2017-03-17 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/1924#discussion_r106689294
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java
 ---
@@ -270,17 +272,40 @@ public void setWaitingToEmit(ConsumerRecords 
consumerRecords) {
 
 //  poll =
 private ConsumerRecords pollKafkaBroker() {
-doSeekRetriableTopicPartitions();
-if (refreshSubscriptionTimer.isExpiredResetOnTrue()) {
-kafkaSpoutConfig.getSubscription().refreshAssignment();
+Set retriableTopicPartitions = 
doSeekRetriableTopicPartitions();
+Set partitionsToPause = 
getPartitionsToPauseToEnforceUncommittedOffsetsLimit(retriableTopicPartitions);
--- End diff --

I wasn't sure if calling resume() after every poll() might be expensive, 
but it appears to just set a flag in the consumer. I'll change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1924: STORM-2343: New Kafka spout can stop emitting tupl...

2017-03-17 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/1924#discussion_r106687999
  
--- Diff: 
external/storm-kafka-client/src/test/java/org/apache/storm/kafka/spout/builders/SingleTopicKafkaSpoutConfiguration.java
 ---
@@ -41,31 +41,22 @@ public static Config getConfig() {
 
 public static StormTopology getTopologyKafkaSpout(int port) {
 final TopologyBuilder tp = new TopologyBuilder();
-tp.setSpout("kafka_spout", new 
KafkaSpout<>(getKafkaSpoutConfig(port)), 1);
+tp.setSpout("kafka_spout", new 
KafkaSpout<>(getKafkaSpoutConfigBuilder(port).build()), 1);
--- End diff --

The SingleTopicKafkaSpoutConfiguration class was beginning to turn into a 
mess of telescoping constructors, and it was a bit silly given that we already 
have a nice builder interface underlying that class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1924: STORM-2343: New Kafka spout can stop emitting tupl...

2017-03-17 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/1924#discussion_r106687191
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryExponentialBackoff.java
 ---
@@ -54,21 +55,28 @@
 private static class RetryEntryTimeStampComparator implements 
Serializable, Comparator {
 @Override
 public int compare(RetrySchedule entry1, RetrySchedule entry2) {
-return 
Long.valueOf(entry1.nextRetryTimeNanos()).compareTo(entry2.nextRetryTimeNanos());
+int result = 
Long.valueOf(entry1.nextRetryTimeNanos()).compareTo(entry2.nextRetryTimeNanos());
+
+if(result == 0) {
+//TreeSet uses compareTo instead of equals() for the Set 
contract
+//Ensure that we can save two retry schedules with the 
same timestamp
--- End diff --

It's probably impossible in real life if System.nanoTime has sufficiently 
high resolution (not guaranteed to be any better than 
System.currentTimeMillis), but it can happen in tests that use simulated time. 
There's still potential for collisions if two retry schedules happen to have 
the same hashCode and timestamp, but that's very unlikely. I mostly made the 
change for the benefit of the tests, but it doesn't hurt to reduce the chance 
of collision in general.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1924: STORM-2343: New Kafka spout can stop emitting tupl...

2017-03-17 Thread srdo
Github user srdo commented on a diff in the pull request:

https://github.com/apache/storm/pull/1924#discussion_r106685548
  
--- Diff: 
external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java
 ---
@@ -268,12 +268,12 @@ private Builder(Builder builder, 
SerializableDeserializer keyDes, Class
 
 /**
  * The maximum number of records a poll will return.
- * Will only work with Kafka 0.10.0 and above.
+ * This is limited by maxUncommittedOffsets, since it doesn't make 
sense to allow larger polls than the spout is allowed to emit.
+ * Please note that when there are retriable tuples on a 
partition, maxPollRecords is an upper bound for how far the spout will read 
past the last committed offset on that partition.
+ * It is recommended that users set maxUncommittedOffsets and 
maxPollRecords to be equal.
--- End diff --

@hmcl I agree that it is not ideal, but there's an issue with letting 
maxUncommittedOffsets be larger than maxPollRecords.

From an earlier response:
"If maxPollRecords is less than maxUncommittedOffsets, there's a risk of 
the spout getting stuck on some tuples for a while when it is retrying tuples.
Say there are 10 retriable tuples following the last committed offset, and 
maxUncommittedOffsets is 10. If maxPollRecords is 5 and the first 5 retriable 
tuples are reemitted in the first batch, the next 5 tuples can't be emitted 
until (some of) the first 5 are acked. This is because the spout will seek the 
consumer back to the last committed offset any time there are failed tuples, 
which will lead to it getting the first 5 tuples out of the consumer, checking 
that they are emitted, and skipping them. This will repeat until the last 
committed offset moves. If there are other partitions with tuples available, 
those tuples may get emitted, but the "blocked" partition won't progress until 
some tuples are acked on it."

How about we fix this by making doSeekRetriableTopicPartitions seek to the 
lowest retriable offset per partition for the partitions with failed tuples, 
instead of seeking to the last committed/committable offset? It seems like 
seeking to the last committed offset is likely to have some bad interactions 
with maxUncommittedOffsets and maxPollRecords. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #2014: STORM-2038: Disable symlinks with a config option (1.x)

2017-03-17 Thread revans2
Github user revans2 commented on the issue:

https://github.com/apache/storm/pull/2014
  
@HeartSaVioR could you also take a look at #2015 it is almost identical, 
except some white space changes in nimbus.clj made it so the cherry-pick was 
not 100% clean.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm issue #2005: STORM-2414 Skip checking ACL when clearing already remove...

2017-03-17 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/storm/pull/2005
  
Looking into the logic deeply and realize `deleteBlob` is not that simple 
as it seems especially with H/A... Investigating more on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---