[jira] [Closed] (STORM-2915) How can I get the failure count in a Bolt when I use the Kafka Spout

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim closed STORM-2915.
---
Resolution: Invalid

The issue looks like a question, which would be better posted to the user mailing list.

> How can I get the failure count in a Bolt when I use the Kafka Spout
> --------------------------------------------------------------------
>
> Key: STORM-2915
> URL: https://issues.apache.org/jira/browse/STORM-2915
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-kafka-client
>Affects Versions: 1.0.2, 1.1.0, 1.0.3, 1.0.4, 1.1.1, 1.0.5
>Reporter: Gergo Hong
>Priority: Minor
>
> I want to get the number of failed tuples in a bolt; how can I get it?
> When a tuple fails it is retried, and I see this code:
> {code:java}
> if (!isScheduled || retryService.isReady(msgId)) {
>     final String stream = tuple instanceof KafkaTuple ? ((KafkaTuple) tuple).getStream() : Utils.DEFAULT_STREAM_ID;
>     if (!isAtLeastOnceProcessing()) {
>         if (kafkaSpoutConfig.isTupleTrackingEnforced()) {
>             collector.emit(stream, tuple, msgId);
>             LOG.trace("Emitted tuple [{}] for record [{}] with msgId [{}]", tuple, record, msgId);
>         } else {
>             collector.emit(stream, tuple);
>             LOG.trace("Emitted tuple [{}] for record [{}]", tuple, record);
>         }
>     } else {
>         emitted.add(msgId);
>         offsetManagers.get(tp).addToEmitMsgs(msgId.offset());
>         if (isScheduled) { // Was scheduled for retry and re-emitted, so remove from schedule.
>             retryService.remove(msgId);
>         }
>         collector.emit(stream, tuple, msgId);
>         tupleListener.onEmit(tuple, msgId);
>         LOG.trace("Emitted tuple [{}] for record [{}] with msgId [{}]", tuple, record, msgId);
>     }
>     return true;
> }
> {code}
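
A note for anyone finding this issue later: fail callbacks are delivered to the spout (via ISpout.fail), not to bolts, so a failure count has to be exported from the spout side, for example through Storm's metrics API. Below is a minimal sketch assuming the Storm 2.x signatures; the subclass and the metric name are hypothetical, not part of storm-kafka-client:

{code:java}
import java.util.Map;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.metric.api.CountMetric;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;

// Hypothetical subclass that counts fail() callbacks and exposes the count
// through Storm's metrics API, where a registered metrics consumer can read it.
public class FailCountingKafkaSpout<K, V> extends KafkaSpout<K, V> {
    private transient CountMetric failCount;

    public FailCountingKafkaSpout(KafkaSpoutConfig<K, V> config) {
        super(config);
    }

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        // Report the counter every 60 seconds to any configured metrics consumer.
        failCount = context.registerMetric("kafka-spout-fail-count", new CountMetric(), 60);
        super.open(conf, context, collector);
    }

    @Override
    public void fail(Object messageId) {
        failCount.incr();      // record the failure
        super.fail(messageId); // let the spout schedule the retry as usual
    }
}
{code}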



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2945) Nail down and document how to support background emits in Spouts and Bolts

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-2945:

Issue Type: Documentation  (was: Bug)

> Nail down and document how to support background emits in Spouts and Bolts
> --
>
> Key: STORM-2945
> URL: https://issues.apache.org/jira/browse/STORM-2945
> Project: Apache Storm
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2963) Updates to Performance.md

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-2963:

Issue Type: Documentation  (was: Improvement)

> Updates to Performance.md 
> --
>
> Key: STORM-2963
> URL: https://issues.apache.org/jira/browse/STORM-2963
> Project: Apache Storm
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2963) Updates to Performance.md

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-2963:

Fix Version/s: (was: 2.0.0)

> Updates to Performance.md 
> --
>
> Key: STORM-2963
> URL: https://issues.apache.org/jira/browse/STORM-2963
> Project: Apache Storm
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3061) Upgrade Dependencies before 2.x release

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-3061:

Affects Version/s: 2.0.0

> Upgrade Dependencies before 2.x release
> ---
>
> Key: STORM-3061
> URL: https://issues.apache.org/jira/browse/STORM-3061
> Project: Apache Storm
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Storm has a lot of dependencies.  It would be great to upgrade many of them 
> to newer versions ahead of a 2.x release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3081) Storm kafka client not consuming messages properly

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-3081:

Component/s: (was: storm-kafka)

> Storm kafka client not consuming messages properly
> --
>
> Key: STORM-3081
> URL: https://issues.apache.org/jira/browse/STORM-3081
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 1.2.1
>Reporter: Kush Khandelwal
>Priority: Blocker
>
> A single thread pushes messages to Kafka serially, and we consume them on the Storm side using storm-kafka-client.
> After a few requests, the consumer stops consuming from the topic, blocking the thread that waits for the response from Storm.
> We added another consumer with a different consumer group, and there the messages are read properly, so we know the problem is on the storm-kafka-client consumer's side.
> The Kafka spout config is written like this:
> {code:java}
> KafkaSpoutConfig kafkaSpoutConfig = KafkaSpoutConfig
>     .builder(kafkaProperties.getProperty("bootstrap.servers"), stormProperties.getProperty("TOPIC"))
>     .setFirstPollOffsetStrategy(KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST)
>     .setRecordTranslator(new MessageDeserializer(), arguments)
>     .build();
> {code}
> I can't figure out the issue. Can someone please help me out?
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3110) Supervisor does not kill all worker processes in secure mode in case of user mismatch

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-3110:

Issue Type: Bug  (was: Improvement)

> Supervisor does not kill all worker processes in secure mode in case of user 
> mismatch
> -
>
> Key: STORM-3110
> URL: https://issues.apache.org/jira/browse/STORM-3110
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.3
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While running in secure mode, the supervisor sets the worker user (in the worker's local state) to the user that launched the topology.
>  
> {code:java}
> SET worker-user 4d67a6be-4c80-4622-96af-f94706d58553 foo
> {code}
>  
> However, the worker OS process does not actually run as the user "foo" (it runs as the storm user instead) unless {{supervisor.run.worker.as.user}} is also set.
> If the supervisor's assignment changes, the supervisor in some cases checks whether all processes are dead by matching "pid + user". If the worker is running as a different user (say, storm), the supervisor wrongly concludes that the worker process is dead.
> Later, when the supervisor tries to launch a worker on the same port, it throws a bind exception:
> {code}
> o.a.s.m.n.Server main [INFO] Create Netty Server Netty-server-localhost-6700, buffer_size: 5242880, maxWorkers: 1
> o.a.s.d.worker main [ERROR] Error on initialization of server mk-worker
> org.apache.storm.shade.org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6700
>     at org.apache.storm.shade.org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-1.2.0.3.1.0.0-501.jar:1.2.0.3.1.0.0-501]
> {code}
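
As an editorial illustration of the "pid + user" check described above (this is not the actual supervisor code; the class, method, and /proc-based lookup are hypothetical):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// A worker counts as alive only if a process with the recorded pid exists
// AND its owner matches the worker user recorded in local state.
public final class WorkerLiveness {
    static boolean isWorkerAlive(long pid, String expectedUser) throws IOException {
        Path proc = Paths.get("/proc", Long.toString(pid));
        if (!Files.exists(proc)) {
            return false; // no such process
        }
        String owner = Files.getOwner(proc).getName();
        // If local state says "foo" but the process actually runs as "storm"
        // (supervisor.run.worker.as.user not set), this returns false and the
        // supervisor wrongly treats the worker as dead, leaving the port bound.
        return expectedUser.equals(owner);
    }
}
{code}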



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (STORM-3110) Supervisor does not kill all worker processes in secure mode in case of user mismatch

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved STORM-3110.
-
   Resolution: Fixed
Fix Version/s: 1.2.3
   2.0.0

Thanks [~arunmahadevan], I merged this into master and the 1.x branch.

For the master branch I squashed the commits and pushed. The commit is 
https://github.com/apache/storm/commit/82deb62dd3a6f58e5569244517bc93647fcab2a2

> Supervisor does not kill all worker processes in secure mode in case of user 
> mismatch
> -
>
> Key: STORM-3110
> URL: https://issues.apache.org/jira/browse/STORM-3110
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.3
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While running in secure mode, the supervisor sets the worker user (in the worker's local state) to the user that launched the topology.
>  
> {code:java}
> SET worker-user 4d67a6be-4c80-4622-96af-f94706d58553 foo
> {code}
>  
> However, the worker OS process does not actually run as the user "foo" (it runs as the storm user instead) unless {{supervisor.run.worker.as.user}} is also set.
> If the supervisor's assignment changes, the supervisor in some cases checks whether all processes are dead by matching "pid + user". If the worker is running as a different user (say, storm), the supervisor wrongly concludes that the worker process is dead.
> Later, when the supervisor tries to launch a worker on the same port, it throws a bind exception:
> {code}
> o.a.s.m.n.Server main [INFO] Create Netty Server Netty-server-localhost-6700, buffer_size: 5242880, maxWorkers: 1
> o.a.s.d.worker main [ERROR] Error on initialization of server mk-worker
> org.apache.storm.shade.org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:6700
>     at org.apache.storm.shade.org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-1.2.0.3.1.0.0-501.jar:1.2.0.3.1.0.0-501]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3122) FNFE due to race condition between "async localizer" and "update blob" timer thread

2018-06-24 Thread Jungtaek Lim (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-3122:

Priority: Critical  (was: Major)

> FNFE due to race condition between "async localizer" and "update blob" timer 
> thread
> ---
>
> Key: STORM-3122
> URL: https://issues.apache.org/jira/browse/STORM-3122
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.x
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's a race condition between the "async localizer" and the "update blob" timer thread.
> When a worker is shutting down, the reference count for a blob drops to 0 and the supervisor removes the actual blob file. There's also an "update blob" timer thread that tries to keep blobs updated for downloaded topologies. While updating a topology it reads some of the already-downloaded blob files, assuming they were downloaded earlier; that assumption is broken by the async localizer.
> [~arunmahadevan] suggested an approach to fix this: "updateBlobsForTopology" can simply catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. That looks like the simplest fix, so I'll provide a patch based on the suggestion.
> By the way, this doesn't apply to the master branch, since there all blobs are synced separately (no need to read stormconf to enumerate topology-related blobs) and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3122) FNFE due to race condition between "async localizer" and "update blob" timer thread

2018-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-3122:
--
Labels: pull-request-available  (was: )

> FNFE due to race condition between "async localizer" and "update blob" timer 
> thread
> ---
>
> Key: STORM-3122
> URL: https://issues.apache.org/jira/browse/STORM-3122
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.x
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>
> There's a race condition between the "async localizer" and the "update blob" timer thread.
> When a worker is shutting down, the reference count for a blob drops to 0 and the supervisor removes the actual blob file. There's also an "update blob" timer thread that tries to keep blobs updated for downloaded topologies. While updating a topology it reads some of the already-downloaded blob files, assuming they were downloaded earlier; that assumption is broken by the async localizer.
> [~arunmahadevan] suggested an approach to fix this: "updateBlobsForTopology" can simply catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. That looks like the simplest fix, so I'll provide a patch based on the suggestion.
> By the way, this doesn't apply to the master branch, since there all blobs are synced separately (no need to read stormconf to enumerate topology-related blobs) and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3122) FNFE due to race condition between "async localizer" and "update blob" timer thread

2018-06-24 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created STORM-3122:
---

 Summary: FNFE due to race condition between "async localizer" and 
"update blob" timer thread
 Key: STORM-3122
 URL: https://issues.apache.org/jira/browse/STORM-3122
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 1.x
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim


There's a race condition between the "async localizer" and the "update blob" timer thread.

When a worker is shutting down, the reference count for a blob drops to 0 and the supervisor removes the actual blob file. There's also an "update blob" timer thread that tries to keep blobs updated for downloaded topologies. While updating a topology it reads some of the already-downloaded blob files, assuming they were downloaded earlier; that assumption is broken by the async localizer.

[~arunmahadevan] suggested an approach to fix this: "updateBlobsForTopology" can simply catch the FileNotFoundException and skip updating the blobs when it can't find the stormconf. That looks like the simplest fix, so I'll provide a patch based on the suggestion.

By the way, this doesn't apply to the master branch, since there all blobs are synced separately (no need to read stormconf to enumerate topology-related blobs) and the update logic is already fault-tolerant (it skips to the next sync when it can't pull a blob).
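
The shape of the suggested fix, as a sketch; readSupervisorStormConf and updateBlobs are hypothetical stand-ins for the 1.x internals, not the real method names:

{code:java}
import java.io.FileNotFoundException;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BlobUpdateSketch {
    private static final Logger LOG = LoggerFactory.getLogger(BlobUpdateSketch.class);

    void updateBlobsForTopology(String topologyId) {
        try {
            Map<String, Object> stormConf = readSupervisorStormConf(topologyId);
            updateBlobs(topologyId, stormConf);
        } catch (FileNotFoundException e) {
            // The async localizer already deleted the topology's files after the
            // last worker shut down; skip this round and let the next timer tick
            // handle whatever topologies are still downloaded.
            LOG.warn("stormconf for {} not found, skipping blob update", topologyId, e);
        }
    }

    // Hypothetical helper: reads the downloaded stormconf for the topology.
    private Map<String, Object> readSupervisorStormConf(String topologyId) throws FileNotFoundException {
        throw new FileNotFoundException("stormconf for " + topologyId);
    }

    // Hypothetical helper: checks the enumerated blobs for updates.
    private void updateBlobs(String topologyId, Map<String, Object> stormConf) {
    }
}
{code}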



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3121) Fix flaky metrics tests in storm-core

2018-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-3121:
--
Labels: pull-request-available  (was: )

> Fix flaky metrics tests in storm-core
> -
>
> Key: STORM-3121
> URL: https://issues.apache.org/jira/browse/STORM-3121
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>
> The tests are flaky, but only rarely fail. I've only seen them fail on Travis 
> when Travis is under load.
> Example failures:
> {code}
> classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
> expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__))
>   actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
>   at: test_runner.clj:105
> {code}
> {code}
> classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
> expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__))
>   actual: (not (clojure.core/= [1 1] [1 0]))
>   at: test_runner.clj:105
> {code}
> The problem is that the tests increment metrics counters in the executor 
> async loops, then expect the counters to end up in exact metrics buckets. The 
> creation of a bucket is triggered by the metrics timer. The timer is included 
> in time simulation and LocalCluster.waitForIdle, but the executor async loop 
> isn't. There isn't any guarantee that the executor async loop gets to run 
> when the test does a sequence like
> {code}
> Time.advanceClusterTime
> cluster.waitForIdle
> {code}
> because the waitForIdle check doesn't know about the executor async loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3120) Clean up leftover null checks in Time, ensure idle threads get to run when cluster time is advanced

2018-06-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-3120:
--
Labels: pull-request-available  (was: )

> Clean up leftover null checks in Time, ensure idle threads get to run when 
> cluster time is advanced
> ---
>
> Key: STORM-3120
> URL: https://issues.apache.org/jira/browse/STORM-3120
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3121) Fix flaky metrics tests in storm-core

2018-06-24 Thread Stig Rohde Døssing (JIRA)
Stig Rohde Døssing created STORM-3121:
-

 Summary: Fix flaky metrics tests in storm-core
 Key: STORM-3121
 URL: https://issues.apache.org/jira/browse/STORM-3121
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-core
Affects Versions: 2.0.0
Reporter: Stig Rohde Døssing
Assignee: Stig Rohde Døssing


The tests are flaky, but only rarely fail. I've only seen them fail on Travis 
when Travis is under load.

Example failures:
{code}
classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" "my-custom-metric") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
  at: test_runner.clj:105
{code}
{code}
classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
expected: (clojure.core/= [1 1] (clojure.core/subvec (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "myspout" "__emit-count/default") 0 N__3207__auto__))
  actual: (not (clojure.core/= [1 1] [1 0]))
  at: test_runner.clj:105
{code}

The problem is that the tests increment metrics counters in the executor async 
loops, then expect the counters to end up in exact metrics buckets. The 
creation of a bucket is triggered by the metrics timer. The timer is included 
in time simulation and LocalCluster.waitForIdle, but the executor async loop 
isn't. There isn't any guarantee that the executor async loop gets to run when 
the test does a sequence like
{code}
Time.advanceClusterTime
cluster.waitForIdle
{code}
because the waitForIdle check doesn't know about the executor async loop.
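
For illustration, the racy pattern looks roughly like this (names taken from the report; signatures are approximate, so treat this as a sketch rather than a working test):

{code:java}
import org.apache.storm.LocalCluster;

public class FlakyPatternSketch {
    public static void main(String[] args) throws Exception {
        try (LocalCluster cluster = new LocalCluster.Builder()
                .withSimulatedTime()
                .build()) {
            // ... submit a topology and emit some tuples ...
            cluster.advanceClusterTime(11); // metrics timer fires and creates a bucket
            cluster.waitForIdle();          // timer threads are idle, but the executor
                                            // async loop may not have run yet, so its
                                            // counter increments can still be pending
                                            // when the test asserts on bucket contents
        }
    }
}
{code}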



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3120) Clean up leftover null checks in Time, ensure idle threads get to run when cluster time is advanced

2018-06-24 Thread Stig Rohde Døssing (JIRA)
Stig Rohde Døssing created STORM-3120:
-

 Summary: Clean up leftover null checks in Time, ensure idle 
threads get to run when cluster time is advanced
 Key: STORM-3120
 URL: https://issues.apache.org/jira/browse/STORM-3120
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Stig Rohde Døssing
Assignee: Stig Rohde Døssing






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)