[jira] [Created] (STORM-3272) supervisor can fail to delete topology stormdist folder
Aaron Gresch created STORM-3272: --- Summary: supervisor can fail to delete topology stormdist folder Key: STORM-3272 URL: https://issues.apache.org/jira/browse/STORM-3272 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch Upon investigating supervisor restarts, I saw messages such as: 2018-10-19 23:01:38.816 o.a.s.u.Utils Thread-20731 [INFO] UNNAMED:rmdir of /home/y/var/storm/supervisor/stormdist/XXX-124-1539730805 failed - Directory not empty Looking further, the directory contained a symlink that pointed at a missing file. Looking at the worker-launcher code, it appears the dangling symlink is seen as a file that does not exist, so the symlink is not processed, which prevents deletion of the directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
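The failure mode is easy to reproduce from the JVM side. A minimal sketch (the real worker-launcher is native code, not Java; class name and paths here are hypothetical): a dangling symlink reports "does not exist" to a default existence check, which follows the link, while the directory entry itself is still present, so the containing directory cannot be removed until the entry is deleted by its own name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class DanglingSymlinkDemo {
    // True if the directory entry exists, even when it is a symlink
    // whose target is missing (a "dangling" symlink).
    static boolean entryExists(Path p) {
        return Files.exists(p, LinkOption.NOFOLLOW_LINKS);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stormdist-demo");
        Path link = dir.resolve("resource");
        Files.createSymbolicLink(link, dir.resolve("missing-target"));

        // A naive check follows the link and reports "missing", so cleanup
        // code relying on it skips the entry ...
        System.out.println("follows link: " + Files.exists(link));   // false
        // ... while the directory entry is still there, so rmdir fails.
        System.out.println("entry itself: " + entryExists(link));    // true

        // Deleting by the entry (not the target) succeeds and unblocks rmdir.
        Files.delete(link);
        Files.delete(dir);
    }
}
```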
[jira] [Created] (STORM-3271) Launch storm workers in docker containers
Ethan Li created STORM-3271: --- Summary: Launch storm workers in docker containers Key: STORM-3271 URL: https://issues.apache.org/jira/browse/STORM-3271 Project: Apache Storm Issue Type: New Feature Reporter: Ethan Li Assignee: Ethan Li FYI I am working on adding docker support for storm. Daemons will run on host machines and workers will be launched inside docker containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-971) Storm-Kafka: Emit metric for messages lost due to kafka retention
[ https://issues.apache.org/jira/browse/STORM-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-971: - Labels: pull-request-available (was: ) > Storm-Kafka: Emit metric for messages lost due to kafka retention > - > > Key: STORM-971 > URL: https://issues.apache.org/jira/browse/STORM-971 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka >Reporter: Abhishek Agarwal >Assignee: Abhishek Agarwal >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 2.0.0 > > > In case of TopicOffsetOutOfRange exception, it is useful to know just how > many unread messages were lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1662) Reduce map lookups in send_to_eventlogger
[ https://issues.apache.org/jira/browse/STORM-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1662: -- Labels: pull-request-available (was: ) > Reduce map lookups in send_to_eventlogger > - > > Key: STORM-1662 > URL: https://issues.apache.org/jira/browse/STORM-1662 > Project: Apache Storm > Issue Type: Bug >Reporter: Arun Mahadevan >Assignee: Arun Mahadevan >Priority: Major > Labels: pull-request-available > > Reducing map lookups in send_to_eventlogger can improve performance when > a spout emits in a tight loop. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1707) Improve supervisor latency by removing 2-min wait
[ https://issues.apache.org/jira/browse/STORM-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1707: -- Labels: pull-request-available (was: ) > Improve supervisor latency by removing 2-min wait > - > > Key: STORM-1707 > URL: https://issues.apache.org/jira/browse/STORM-1707 > Project: Apache Storm > Issue Type: Improvement >Reporter: Paul Poulosky >Assignee: Paul Poulosky >Priority: Major > Labels: pull-request-available > > After launching workers, the supervisor waits up to 2 minutes synchronously > for the workers to be "launched". > We should remove this, and instead keep track of launch time, making the > "killer" function smart enough to determine the difference between a worker > that's still launching, one that's timed out, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
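The tracking the ticket proposes can be sketched as follows (class, method, and state names are hypothetical, not the real supervisor API): record a launch timestamp per worker, and let the "killer" classify each worker on demand instead of blocking for up to two minutes after launch.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkerLaunchTracker {
    enum State { STILL_LAUNCHING, LAUNCH_TIMED_OUT, RUNNING }

    private final long launchTimeoutMs;
    private final Map<String, Long> launchTimes = new HashMap<>();

    WorkerLaunchTracker(long launchTimeoutMs) {
        this.launchTimeoutMs = launchTimeoutMs;
    }

    void recordLaunch(String workerId, long nowMs) {
        launchTimes.put(workerId, nowMs);
    }

    // Instead of waiting synchronously after launch, classify each worker
    // when the killer runs, so it can tell a slow launch from a dead worker.
    State classify(String workerId, boolean hasHeartbeat, long nowMs) {
        if (hasHeartbeat) return State.RUNNING;
        long launched = launchTimes.getOrDefault(workerId, nowMs);
        return (nowMs - launched < launchTimeoutMs)
                ? State.STILL_LAUNCHING
                : State.LAUNCH_TIMED_OUT;
    }

    public static void main(String[] args) {
        WorkerLaunchTracker t = new WorkerLaunchTracker(120_000);
        t.recordLaunch("worker-1", 0);
        System.out.println(t.classify("worker-1", false, 60_000));   // STILL_LAUNCHING
        System.out.println(t.classify("worker-1", false, 180_000));  // LAUNCH_TIMED_OUT
        System.out.println(t.classify("worker-1", true, 180_000));   // RUNNING
    }
}
```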
[jira] [Updated] (STORM-1885) python script for squashing and merging prs
[ https://issues.apache.org/jira/browse/STORM-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1885: -- Labels: pull-request-available (was: ) > python script for squashing and merging prs > --- > > Key: STORM-1885 > URL: https://issues.apache.org/jira/browse/STORM-1885 > Project: Apache Storm > Issue Type: Task >Reporter: Sriharsha Chintalapani >Assignee: Sriharsha Chintalapani >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1674) Idle KafkaSpout consumes more bandwidth than needed
[ https://issues.apache.org/jira/browse/STORM-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1674: -- Labels: pull-request-available (was: ) > Idle KafkaSpout consumes more bandwidth than needed > --- > > Key: STORM-1674 > URL: https://issues.apache.org/jira/browse/STORM-1674 > Project: Apache Storm > Issue Type: Bug > Components: storm-kafka >Affects Versions: 0.9.3, 0.10.1, 1.0.1 >Reporter: Robert Hastings >Assignee: Robert Hastings >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0, 1.0.2, 1.1.0 > > > Discovered 30 megabits of traffic flowing between a set of KafkaSpouts > and our kafka servers even though no Kafka messages were moving. > Using the wireshark kafka dissector, we were able to see that > each FetchRequest had maxWait set to 1 > and minBytes set to 0. When minBytes is set to 0 the kafka server > responds immediately when there are no messages. In turn the KafkaSpout > polls without any delay, causing a constant stream of FetchRequest/ > FetchResponse messages. A non-KafkaSpout client showed a similar FetchRequest > pattern with two key differences: > 1) minBytes was 1 > 2) maxWait was 100 > With these FetchRequest parameters and no messages flowing, > the kafka server delays the FetchResponse by 100 ms. This reduced > the network traffic from megabits to the low kilobits. It also > reduced the CPU utilization of our kafka server from 140% to 2%. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
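Back-of-envelope arithmetic for why the fetch parameters matter on an idle topic (numbers illustrative, not measured): with minBytes=0 the broker answers empty fetches immediately, so the poll loop spins at roughly one round trip per round-trip latency, whereas minBytes=1 with maxWait=100 parks each fetch on the broker for 100 ms, cutting the idle request rate by about two orders of magnitude.

```java
public class FetchPollMath {
    // Rough requests/second for one idle consumer connection, assuming the
    // broker holds each fetch for serverWaitMs before answering empty.
    static double requestsPerSecond(int serverWaitMs) {
        return 1000.0 / Math.max(serverWaitMs, 1);
    }

    // Idle-traffic estimate in bits/second given request+response overhead.
    static double idleBitsPerSecond(int serverWaitMs, int bytesPerRoundTrip) {
        return requestsPerSecond(serverWaitMs) * bytesPerRoundTrip * 8;
    }

    public static void main(String[] args) {
        // minBytes=0, maxWait=1: broker replies immediately to empty fetches.
        System.out.printf("immediate reply : %.0f req/s%n", requestsPerSecond(1));
        // minBytes=1, maxWait=100: broker parks the fetch for 100 ms.
        System.out.printf("100 ms long poll: %.0f req/s%n", requestsPerSecond(100));
    }
}
```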
[jira] [Updated] (STORM-1957) Support Storm JDBC batch insert
[ https://issues.apache.org/jira/browse/STORM-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1957: -- Labels: pull-request-available (was: ) > Support Storm JDBC batch insert > --- > > Key: STORM-1957 > URL: https://issues.apache.org/jira/browse/STORM-1957 > Project: Apache Storm > Issue Type: New Feature > Components: storm-jdbc >Affects Versions: 0.10.0, 1.0.0, 1.0.1 >Reporter: darion yaphet >Assignee: darion yaphet >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Batch insert support groups SQL statements into a batch and submits them in > one call. It can reduce the amount of communication, improving performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
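The batching idea can be sketched without a database (illustrative only; in storm-jdbc the flush step would go through JDBC's PreparedStatement.addBatch()/executeBatch(), which the flush callback stands in for here): accumulate records and hand them off in groups so many inserts share one round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the storm-jdbc API: buffers records and invokes
// the flush callback once the batch size is reached.
public class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> pending = new ArrayList<>();

    public BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    public void add(T record) {
        pending.add(record);
        if (pending.size() >= batchSize) flush();
    }

    public void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending));  // one "round trip" per batch
        pending.clear();
    }

    public static void main(String[] args) {
        BatchBuffer<String> buf =
                new BatchBuffer<>(3, batch -> System.out.println("flush " + batch));
        for (int i = 0; i < 7; i++) buf.add("row" + i);
        buf.flush();  // flush the final partial batch
    }
}
```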
[jira] [Updated] (STORM-1875) Separate Jedis/JedisCluster Config
[ https://issues.apache.org/jira/browse/STORM-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1875: -- Labels: pull-request-available (was: ) > Separate Jedis/JedisCluster Config > -- > > Key: STORM-1875 > URL: https://issues.apache.org/jira/browse/STORM-1875 > Project: Apache Storm > Issue Type: Improvement > Components: storm-redis >Affects Versions: 1.0.0, 1.0.1, 1.0.2 >Reporter: darion yaphet >Assignee: darion yaphet >Priority: Major > Labels: pull-request-available > > Separate Jedis/JedisCluster to provide users the full set of operations for > each environment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2373) HDFS Spout should support multiple ignore extensions
[ https://issues.apache.org/jira/browse/STORM-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2373: -- Labels: pull-request-available (was: ) > HDFS Spout should support multiple ignore extensions > > > Key: STORM-2373 > URL: https://issues.apache.org/jira/browse/STORM-2373 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Reporter: Sachin Pasalkar >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Currently the HDFS spout supports only .ignore or a single user-provided > extension. It should support multiple extensions to be ignored. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1015) Store Kafka offsets with Kafka's consumer offset management api
[ https://issues.apache.org/jira/browse/STORM-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1015: -- Labels: consumer kafka offset pull-request-available (was: consumer kafka offset) > Store Kafka offsets with Kafka's consumer offset management api > --- > > Key: STORM-1015 > URL: https://issues.apache.org/jira/browse/STORM-1015 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka >Affects Versions: 1.0.0 >Reporter: Hang Sun >Assignee: Hang Sun >Priority: Minor > Labels: consumer, kafka, offset, pull-request-available > Original Estimate: 72h > Remaining Estimate: 72h > > The current Kafka spout stores the offsets (and some other state) inside ZK in > its proprietary format. This does not work well with other Kafka offset > monitoring tools such as Burrow, KafkaOffsetMonitor etc. In addition, the > performance does not scale well compared with offsets managed by Kafka's > built-in offset management api. I have added a new option for Kafka to store > the same data using Kafka's built-in offset management capability. The change > is completely backward compatible with the current ZK storage option. The > feature can be turned on by a single configuration option. Hope this will > help people who want to explore the option of using Kafka's built-in offset > management api. > References: > https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka > https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI > -thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1368) For secure cluster, heapdump file lacks group read permissions for UI download
[ https://issues.apache.org/jira/browse/STORM-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1368: -- Labels: pull-request-available (was: ) > For secure cluster, heapdump file lacks group read permissions for UI > download > - > > Key: STORM-1368 > URL: https://issues.apache.org/jira/browse/STORM-1368 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Environment: Secure Storm (runAsUser) >Reporter: Zhuo Liu >Assignee: Zhuo Liu >Priority: Minor > Labels: pull-request-available > > In secure Storm, jstack, GC and other log files have read permission for > group. However, a heapdump generated from OOM or user dynamic profiling has no > group read permission, because the JVM hard-codes it this way. > HTTP ERROR: 500 > Problem accessing > /download/Penguin-151202-234121-19-1449099695%2F6701%2Frecording-28103-20151203-184734.bin. > Reason: > Server Error > We need to fix this to enable users to download heapdumps from the UI. > -rw--- 1 zhuol gstorm 3664982 Dec 3 19:37 > /home/y/var/storm/workers-artifacts/wc2-38-1449171210/6702/java_pid24691.hprof > -rw-r- 1 zhuol gstorm7597 Dec 3 19:37 > /home/y/var/storm/workers-artifacts/wc2-38-14491712 > {code} > // create binary file, rewriting existing file if required > int os::create_binary_file(const char* path, bool rewrite_existing) { > int oflags = O_WRONLY | O_CREAT; > if (!rewrite_existing) { > oflags |= O_EXCL; > } > return ::open64(path, oflags, S_IREAD | S_IWRITE); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
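Since the JVM creates the dump with S_IREAD | S_IWRITE (owner-only, 0600), one possible fixup is a daemon-side step that adds group read after the dump is written. A hedged sketch, assuming a POSIX filesystem (helper name hypothetical, not the actual Storm patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.HashSet;
import java.util.Set;

public class GroupReadFixup {
    // Add group-read to a file created owner-only, so a UI process in the
    // same group can serve it for download.
    static void addGroupRead(Path p) throws IOException {
        Set<PosixFilePermission> perms =
                new HashSet<>(Files.getPosixFilePermissions(p));
        perms.add(PosixFilePermission.GROUP_READ);
        Files.setPosixFilePermissions(p, perms);
    }

    public static void main(String[] args) throws IOException {
        Path dump = Files.createTempFile("java_pid", ".hprof");
        // Simulate the JVM's owner-only mode (0600).
        Files.setPosixFilePermissions(dump, PosixFilePermissions.fromString("rw-------"));
        addGroupRead(dump);
        // Now rw-r-----: group members (e.g. the UI user) can read it.
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dump)));
        Files.delete(dump);
    }
}
```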
[jira] [Updated] (STORM-1057) Add throughput metric to spout/bolt and display them on web ui
[ https://issues.apache.org/jira/browse/STORM-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1057: -- Labels: pull-request-available (was: ) > Add throughput metric to spout/bolt and display them on web ui > -- > > Key: STORM-1057 > URL: https://issues.apache.org/jira/browse/STORM-1057 > Project: Apache Storm > Issue Type: New Feature > Components: storm-core >Reporter: Li Wang >Assignee: Li Wang >Priority: Major > Labels: pull-request-available > Original Estimate: 168h > Time Spent: 1h 50m > Remaining Estimate: 166h 10m > > Throughput is a fundamental metric for reasoning about the performance > bottleneck of a topology. Displaying the throughputs of components and tasks > on the web UI could greatly help the user identify the performance > bottleneck and check whether the workload among components and tasks > is balanced. > What to do: > 1. Measure the throughput of each spout/bolt. > 2. Display the throughput metrics on the web UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-828) HdfsBolt takes a lot of configuration, need good defaults
[ https://issues.apache.org/jira/browse/STORM-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-828: - Labels: pull-request-available (was: ) > HdfsBolt takes a lot of configuration, need good defaults > - > > Key: STORM-828 > URL: https://issues.apache.org/jira/browse/STORM-828 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Reporter: Robert Joseph Evans >Priority: Major > Labels: pull-request-available > > The following is code from > https://github.com/apache/storm/blob/master/external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java > representing the amount of configuration required to use the HdfsBolt. > {code} > // sync the filesystem after every 1k tuples > SyncPolicy syncPolicy = new CountSyncPolicy(1000); > // rotate files every 1 min > FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, > TimedRotationPolicy.TimeUnit.MINUTES); > FileNameFormat fileNameFormat = new DefaultFileNameFormat() > .withPath("/tmp/foo/") > .withExtension(".txt"); > RecordFormat format = new DelimitedRecordFormat() > .withFieldDelimiter("|"); > Yaml yaml = new Yaml(); > InputStream in = new FileInputStream(args[1]); > Map yamlConf = (Map) yaml.load(in); > in.close(); > config.put("hdfs.config", yamlConf); > HdfsBolt bolt = new HdfsBolt() > .withConfigKey("hdfs.config") > .withFsUrl(args[0]) > .withFileNameFormat(fileNameFormat) > .withRecordFormat(format) > .withRotationPolicy(rotationPolicy) > .withSyncPolicy(syncPolicy) > .addRotationAction(new > MoveFileAction().toDestination("/tmp/dest2/")); > {code} > This is way too much. If it were just an example showing all of the > possibilities that would be OK but of the 8 lines used in the construction of > the bolt, 5 of them are required or the bolt will blow up at run time. We > should provide reasonable defaults for everything that can have a reasonable > default. 
And required parameters should be passed in through the > constructor, not as builder arguments. I realize we need to maintain > backwards compatibility so we may need some new Bolt definitions. > {code} > HdfsTSVBolt bolt = new HdfsTSVBolt(outputDir); > {code} > If someone wanted to sync every 100 records instead of every 1000 we could do > {code} > TSVFileBolt bolt = new TSVFileBolt(outputDir).withSyncPolicy(new > CountSyncPolicy(100)) > {code} > I would like to see a base HdfsFileBolt that requires a record format, and an > output directory. It would have defaults for everything else. Then we could > have a TSVFileBolt and CSVFileBolt subclass it and ideally SequenceFileBolt > as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
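The constructor-plus-defaults shape proposed above can be sketched as follows (all names here are hypothetical, not the storm-hdfs API): required arguments go in the constructor so they cannot be forgotten, while everything else gets a sensible default that fluent "with..." methods can override.

```java
// Hypothetical sketch of the proposal, not the real HdfsBolt.
public class HdfsFileBoltSketch {
    interface SyncPolicy { boolean shouldSync(long tupleCount); }

    private final String outputDir;  // required: fails fast at construction
    private SyncPolicy syncPolicy = n -> n > 0 && n % 1000 == 0;  // default: every 1k tuples
    private String fieldDelimiter = "\t";                         // default: TSV

    public HdfsFileBoltSketch(String outputDir) {
        if (outputDir == null) {
            throw new IllegalArgumentException("outputDir is required");
        }
        this.outputDir = outputDir;
    }

    public HdfsFileBoltSketch withSyncPolicy(SyncPolicy p) { this.syncPolicy = p; return this; }
    public HdfsFileBoltSketch withFieldDelimiter(String d) { this.fieldDelimiter = d; return this; }

    public String outputDir() { return outputDir; }
    public String fieldDelimiter() { return fieldDelimiter; }
    public boolean shouldSync(long tupleCount) { return syncPolicy.shouldSync(tupleCount); }

    public static void main(String[] args) {
        // The common case is one line; overriding the sync interval is one call.
        HdfsFileBoltSketch bolt = new HdfsFileBoltSketch("/tmp/dest")
                .withSyncPolicy(n -> n > 0 && n % 100 == 0);
        System.out.println(bolt.shouldSync(100));  // true
    }
}
```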
[jira] [Updated] (STORM-1065) storm-kafka : kafka-partition can not find leader in zookeeper
[ https://issues.apache.org/jira/browse/STORM-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1065: -- Labels: pull-request-available (was: ) > storm-kafka : kafka-partition can not find leader in zookeeper > --- > > Key: STORM-1065 > URL: https://issues.apache.org/jira/browse/STORM-1065 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka >Reporter: dongxinwang >Priority: Minor > Labels: pull-request-available > > If the Kafka cluster is not consistent with the ZooKeeper data, a partition > cannot find its leader in ZooKeeper. In storm-kafka, this throws the runtime > exception "No leader found for partition", which is not friendly. > Suggestion: > If there is no leader for a partition in ZooKeeper, don't add the partition to > the GlobalPartitionInformation object instead of throwing a runtime exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-997) Add support for user specified UGI - (UserGroupInformation) for storm hdfs connector
[ https://issues.apache.org/jira/browse/STORM-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-997: - Labels: pull-request-available (was: ) > Add support for user specified UGI - (UserGroupInformation) for storm hdfs > connector > > > Key: STORM-997 > URL: https://issues.apache.org/jira/browse/STORM-997 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-hdfs >Reporter: Priyank Shah >Assignee: Priyank Shah >Priority: Major > Labels: pull-request-available > > In a non-secure environment, the Storm HDFS component that provides interaction > with HDFS currently does so as the user (storm) with which the worker process > was started. We want to allow the component to interact with HDFS as the user > provided, instead of the user running the worker process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1506) It's better to be Integer about port of STORM_ZOOKEEPER_PORT_ZOOKEEPER_PORT
[ https://issues.apache.org/jira/browse/STORM-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1506: -- Labels: pull-request-available (was: ) > It's better to be Integer about port of > STORM_ZOOKEEPER_PORT_ZOOKEEPER_PORT > --- > > Key: STORM-1506 > URL: https://issues.apache.org/jira/browse/STORM-1506 > Project: Apache Storm > Issue Type: Wish >Reporter: John Fang >Assignee: John Fang >Priority: Minor > Labels: pull-request-available > > It's better to replace Object with Integer for the port of > STORM_ZOOKEEPER_PORT_ZOOKEEPER_PORT -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2384) Add a log statement when spout skips calling nextTuple.
[ https://issues.apache.org/jira/browse/STORM-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2384: -- Labels: easyfix logging pull-request-available (was: easyfix logging) > Add a log statement when spout skips calling nextTuple. > --- > > Key: STORM-2384 > URL: https://issues.apache.org/jira/browse/STORM-2384 > Project: Apache Storm > Issue Type: Improvement > Components: storm-core >Reporter: Abhishek >Priority: Minor > Labels: easyfix, logging, pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > I have come across threads where people ask questions about a topology being > stuck because the spout isn't emitting anything. Having spent considerable > time debugging this myself, I think adding a log statement for the case when > spout skips calling nextTuple() because maxSpoutPending is reached or because > throttling is on could save many developer hours. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1039) Storm Web UI gives 500 error: Remove commons-codec shading, commons-codec is used by hadoop authentication library which is used during ui authentication in secured envir
[ https://issues.apache.org/jira/browse/STORM-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1039: -- Labels: pull-request-available (was: ) > Storm Web UI gives 500 error: Remove commons-codec shading, commons-codec is > used by hadoop authentication library which is used during ui authentication > in secured environment. > - > > Key: STORM-1039 > URL: https://issues.apache.org/jira/browse/STORM-1039 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Reporter: Priyank Shah >Assignee: Priyank Shah >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings
[ https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2358: -- Labels: newbie pull-request-available (was: newbie) > Update storm hdfs spout to remove specific implementation handlings > --- > > Key: STORM-2358 > URL: https://issues.apache.org/jira/browse/STORM-2358 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Affects Versions: 1.x >Reporter: Sachin Pasalkar >Priority: Major > Labels: newbie, pull-request-available > Attachments: AbstractFileReader.java, FileReader.java, > HDFSSpout.java, SequenceFileReader.java, TextFileReader.java > > Time Spent: 4h 40m > Remaining Estimate: 0h > > I was looking at the storm hdfs spout code in the 1.x branch, and found the > improvements below can be made. > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader public so > that it can be used in generics. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as a > String. It would be great to have a class > readerType, so we would not use Class.forName in multiple places; it would > also help with the point below. > 3. HdfsSpout also needs to provide outFields, which are declared as > constants in each reader (e.g. SequenceFileReader). We can have an abstract > API in AbstractFileReader which returns them to the user, to make it generic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1469) Unable to deploy large topologies on apache storm
[ https://issues.apache.org/jira/browse/STORM-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1469: -- Labels: pull-request-available (was: ) > Unable to deploy large topologies on apache storm > - > > Key: STORM-1469 > URL: https://issues.apache.org/jira/browse/STORM-1469 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Affects Versions: 1.0.0, 2.0.0 >Reporter: Rudra Sharma >Assignee: Kishor Patil >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When deploying to a nimbus a topology which is larger than 17MB in size, we get > an exception. In storm 0.9.3 this could be mitigated by using the following > config in storm.yaml to increase the buffer size to handle the topology > size. i.e. 50MB would be > nimbus.thrift.max_buffer_size: 5000 > This configuration does not resolve the issue in the master branch of storm > and we cannot deploy topologies which are large in size. 
> Here is the log on the client side when attempting to deploy to the nimbus > node: > java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException > at > backtype.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:251) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:272) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:155) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > com.trustwave.siem.storm.topology.deployer.TopologyDeployer.deploy(TopologyDeployer.java:149) > [siem-ng-storm-deployer-cloud.jar:] > at > com.trustwave.siem.storm.topology.deployer.TopologyDeployer.main(TopologyDeployer.java:87) > [siem-ng-storm-deployer-cloud.jar:] > Caused by: org.apache.thrift7.transport.TTransportException > at > org.apache.thrift7.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at org.apache.thrift7.transport.TTransport.readAll(TTransport.java:86) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > org.apache.thrift7.transport.TFramedTransport.readFrame(TFramedTransport.java:129) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > org.apache.thrift7.transport.TFramedTransport.read(TFramedTransport.java:101) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at org.apache.thrift7.transport.TTransport.readAll(TTransport.java:86) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > org.apache.thrift7.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > org.apache.thrift7.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > org.apache.thrift7.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > 
org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:77) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > backtype.storm.generated.Nimbus$Client.recv_submitTopology(Nimbus.java:238) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > backtype.storm.generated.Nimbus$Client.submitTopology(Nimbus.java:222) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > at > backtype.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:237) > ~[storm-core-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT] > ... 4 more > Here is the log on the server side (nimbus.log): > 2016-01-13 10:48:07.206 o.a.s.d.nimbus [INFO] Cleaning inbox ... deleted: > stormjar-c8666220-fa19-426b-a7e4-c62dfb57f1f0.jar > 2016-01-13 10:55:09.823 o.a.s.d.nimbus [INFO] Uploading file from client to > /var/storm-data/nimbus/inbox/stormjar-80ecdf05-6a25-4281-8c78-10062ac5e396.jar > 2016-01-13 10:55:11.910 o.a.s.d.nimbus [INFO] Finished uploading file from > client: > /var/storm-data/nimbus/inbox/stormjar-80ecdf05-6a25-4281-8c78-10062ac5e396.jar > 2016-01-13 10:55:12.084 o.a.t.s.AbstractNonblockingServer$FrameBuffer [WARN] > Exception while invoking! > org.apache.thrift7.transport.TTransportException: Frame size (17435758) > larger than max length (16384000)! > at > org.apache.thrift7.transport.TFramedTransport.readFrame(TFramedTransport.java:137) > at > org.apache.thrift7.transport.TFramedTransport.read(TFramedTransport.java:101) > at org.apache.thrift7.transport.TTransport.readAll(TTransport.java:86) > at > org.apache.thrift7.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) > at >
[jira] [Updated] (STORM-1345) Thrift, nimbus, zookeeper, supervisor and worker changes to support update topology.
[ https://issues.apache.org/jira/browse/STORM-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1345: -- Labels: pull-request-available (was: ) > Thrift, nimbus, zookeeper, supervisor and worker changes to support update > topology. > > > Key: STORM-1345 > URL: https://issues.apache.org/jira/browse/STORM-1345 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Reporter: Parth Brahmbhatt >Assignee: Parth Brahmbhatt >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-433) Give users visibility to the depth of queues at each bolt
[ https://issues.apache.org/jira/browse/STORM-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-433: - Labels: pull-request-available (was: ) > Give users visibility to the depth of queues at each bolt > - > > Key: STORM-433 > URL: https://issues.apache.org/jira/browse/STORM-433 > Project: Apache Storm > Issue Type: Wish > Components: storm-core >Reporter: Dane Hammer >Assignee: Abhishek Agarwal >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > I envision being able to browse the Storm UI and see where queues of tuples > are backing up. > Today if I see latencies increasing at a bolt, it may not be due to anything > specific to that bolt, but that it is backed up behind an overwhelmed bolt > (which has too low of parallelism or too high of latency). > I would expect this could use sampling like the metrics reported to the UI > today, and just retrieve data from netty about the state of the queues. I > wouldn't imagine supporting zeromq on the first pass. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2201) Dynamic scheduler configuration loader
[ https://issues.apache.org/jira/browse/STORM-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2201: -- Labels: pull-request-available (was: ) > Dynamic scheduler configuration loader > -- > > Key: STORM-2201 > URL: https://issues.apache.org/jira/browse/STORM-2201 > Project: Apache Storm > Issue Type: Improvement >Reporter: Paul Poulosky >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 168h > Time Spent: 13.5h > Remaining Estimate: 154.5h > > It would be useful if scheduler configuration for multitenant or resource > aware scheduler could be loaded and updated dynamically through a local file > change or through an update to a configuration archive in an Artifactory > repository. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2380) worker.childopts with whitespace inside one param will be split into pieces
[ https://issues.apache.org/jira/browse/STORM-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2380: -- Labels: pull-request-available (was: ) > worker.childopts with whitespace inside one param will be split into pieces > --- > > Key: STORM-2380 > URL: https://issues.apache.org/jira/browse/STORM-2380 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Affects Versions: 2.0.0, 1.0.3, 1.x >Reporter: Howard Lee >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > worker.childopts params with whitespace inside, like -XX:OnError="pstack %p > >~/pstack%p.log", will be split into pieces, because the supervisor uses > string.split("\\s+") to split params. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
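The bug and one possible quote-aware alternative can be sketched as follows (illustrative only, not the actual Storm patch; the example option string is simplified from the ticket's): a plain whitespace split tears a quoted value into several tokens, while a tokenizer that tracks quote state keeps one JVM flag per token.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildOptsSplit {
    // Naive split, as described in the ticket: quoted values break apart.
    static String[] naiveSplit(String opts) {
        return opts.trim().split("\\s+");
    }

    // Minimal quote-aware tokenizer sketch: whitespace separates tokens
    // only outside double quotes, and the quotes themselves are dropped.
    static List<String> quoteAwareSplit(String opts) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : opts.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String opts = "-Xmx1g -XX:OnError=\"pstack %p > /tmp/pstack%p.log\"";
        System.out.println(naiveSplit(opts).length);      // 5: the quoted value is torn apart
        System.out.println(quoteAwareSplit(opts).size()); // 2: one JVM flag per token
    }
}
```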
[jira] [Updated] (STORM-1129) Storm should use topology name instead of ids for url in storm UI.
[ https://issues.apache.org/jira/browse/STORM-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1129: -- Labels: pull-request-available (was: ) > Storm should use topology name instead of ids for url in storm UI. > -- > > Key: STORM-1129 > URL: https://issues.apache.org/jira/browse/STORM-1129 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Reporter: Priyank Shah >Assignee: Priyank Shah >Priority: Major > Labels: pull-request-available > > Currently, in storm UI details about a topology can be viewed at a URL which > has a topology id as a query parameter. When a topology is updated and > redeployed a new id is assigned by storm and existing URL(and any bookmarks > relying on it) for the topology do not work since the id has changed. We > should change it so that topology name is used instead of id. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1901) Avro Integration for Storm-Kafka
[ https://issues.apache.org/jira/browse/STORM-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1901: -- Labels: pull-request-available (was: ) > Avro Integration for Storm-Kafka > > > Key: STORM-1901 > URL: https://issues.apache.org/jira/browse/STORM-1901 > Project: Apache Storm > Issue Type: Improvement >Reporter: Xin Wang >Assignee: Xin Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-979) [storm-elasticsearch] Introduces BaseQueryFunction to query to ES while using Trident
[ https://issues.apache.org/jira/browse/STORM-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-979: - Labels: pull-request-available (was: ) > [storm-elasticsearch] Introduces BaseQueryFunction to query to ES while using > Trident > - > > Key: STORM-979 > URL: https://issues.apache.org/jira/browse/STORM-979 > Project: Apache Storm > Issue Type: Improvement > Components: storm-elasticsearch >Reporter: Jungtaek Lim >Assignee: Subhankar Biswas >Priority: Major > Labels: pull-request-available > > storm-elasticsearch has features for storing documents, not for querying. > It would be better to have a BaseQueryFunction for querying ES and emitting > matched documents, as other external modules do. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2194) ReportErrorAndDie doesn't always die
[ https://issues.apache.org/jira/browse/STORM-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2194: -- Labels: pull-request-available (was: ) > ReportErrorAndDie doesn't always die > > > Key: STORM-2194 > URL: https://issues.apache.org/jira/browse/STORM-2194 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Affects Versions: 2.0.0, 1.0.2 >Reporter: Craig Hawco >Assignee: Paul Poulosky >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0, 1.0.4, 1.1.1, 1.2.0 > > Attachments: scrubbed-thread-dump.txt > > Time Spent: 2h 40m > Remaining Estimate: 0h > >
> I've been trying for some time to track down a cause of some of our issues with exceptions leaving Storm workers in a zombified state. I believe I've isolated the bug to the behaviour of :report-error-and-die/reportErrorAndDie in the executor. Essentially:
{code}
:report-error-and-die (fn [error]
                        (try
                          ((:report-error <>) error)
                          (catch Exception e
                            (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                        (if (or (exception-cause? InterruptedException error)
                                (exception-cause? java.io.InterruptedIOException error))
                          (log-message "Got interrupted excpetion shutting thread down...")
                          ((:suicide-fn <>))))
{code}
> has the grouping of the if statement slightly wrong. It shouldn't log OR die on InterruptedException/InterruptedIOException; it should log under that condition, and ALWAYS die. Basically:
{code}
:report-error-and-die (fn [error]
                        (try
                          ((:report-error <>) error)
                          (catch Exception e
                            (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                        (if (or (exception-cause? InterruptedException error)
                                (exception-cause? java.io.InterruptedIOException error))
                          (log-message "Got interrupted excpetion shutting thread down..."))
                        ((:suicide-fn <>)))
{code}
> After digging into the Java port of this code, it looks like a different bug was introduced while porting. This is how it was initially ported:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
        || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
    LOG.info("Got interrupted exception shutting thread down...");
    suicideFn.run();
}
{code}
> and STORM-2142 changed this to:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
        || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
    LOG.info("Got interrupted exception shutting thread down...");
} else {
    suicideFn.run();
}
{code}
> However, I believe the correct port is as described above:
{code}
if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
        || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
    LOG.info("Got interrupted exception shutting thread down...");
}
suicideFn.run();
{code}
> I'll look into providing patches for the 1.x and 2.x branches shortly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
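The control flow the ticket argues for can be condensed into a short Python sketch (hypothetical names, not the actual Storm code): log a quieter message for interrupt-like errors, but run the suicide callback unconditionally.

```python
# Sketch of the intended flow: reporting failures are swallowed, the
# interrupt case only changes the log message, and suicide_fn always runs.
def report_error_and_die(error, report_error, suicide_fn, log):
    try:
        report_error(error)
    except Exception:
        log("Error while reporting error to cluster, proceeding with shutdown")
    if isinstance(error, (InterruptedError, KeyboardInterrupt)):
        log("Got interrupted exception shutting thread down...")
    # No 'else' branch: the worker must die either way.
    suicide_fn()
```

The STORM-2142 version put `suicide_fn()` in an `else`, which is exactly the "log OR die" grouping the report calls out.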
[jira] [Updated] (STORM-1931) Share mapper and selector in Storm-Kafka
[ https://issues.apache.org/jira/browse/STORM-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1931: -- Labels: pull-request-available (was: ) > Share mapper and selector in Storm-Kafka > > > Key: STORM-1931 > URL: https://issues.apache.org/jira/browse/STORM-1931 > Project: Apache Storm > Issue Type: Improvement > Components: storm-kafka >Affects Versions: 1.0.0, 1.0.1 >Reporter: darion yaphet >Assignee: darion yaphet >Priority: Major > Labels: pull-request-available > > Storm Kafka's mapper and selector and Storm Kafka trident's mapper and > selector are the same. I will try to merge them into one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1992) Deploy multilang-javascript code as node package
[ https://issues.apache.org/jira/browse/STORM-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1992: -- Labels: pull-request-available (was: ) > Deploy multilang-javascript code as node package > > > Key: STORM-1992 > URL: https://issues.apache.org/jira/browse/STORM-1992 > Project: Apache Storm > Issue Type: New Feature > Components: storm-multilang >Affects Versions: 1.0.1 >Reporter: Marc Zbyszynski >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now that storm includes Flux, it is easier than ever to deploy a topology > with javascript components. If the Bolt and Spout base classes defined in > storm/multi-lang/javascript were available in a node.js package on > https://www.npmjs.com/ it would allow node.js storm users to take advantage > of node's built-in package manager to develop their own bolts and spouts. > It would be relatively trivial to add some maven tasks to > storm/multi-lang/javascript/pom.xml to take the storm.js resource and package > it in a node module and submit it to npm. > This could be added to the pom as a separate profile so it wouldn't impact > the normal storm build process. > This integration will also make it easier to add unit tests for storm.js > For additional background see this discussion: > http://mail-archives.apache.org/mod_mbox/storm-dev/201607.mbox/%3CCAD8EKPHc6O1LCnoQUUoYoDuMQ3uSaNpD5gR4onK%2B0EL5_qcZ3Q%40mail.gmail.com%3E > If this sounds like a worthwhile addition to the project I would be happy to > submit a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1600) Do not report errors when the worker shutdown is in progress
[ https://issues.apache.org/jira/browse/STORM-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1600: -- Labels: pull-request-available (was: ) > Do not report errors when the worker shutdown is in progress > > > Key: STORM-1600 > URL: https://issues.apache.org/jira/browse/STORM-1600 > Project: Apache Storm > Issue Type: Improvement >Reporter: Abhishek Agarwal >Assignee: Abhishek Agarwal >Priority: Major > Labels: pull-request-available > > Usually in a worker, an uncaught exception in an executor thread leads to > process exit. Process exit is not instantaneous; it first triggers the > shutdown. The shutdown initiation usually results in network connections > (e.g. zookeeper, hdfs) closing in other threads, causing further exceptions. These > threads end up reporting their exceptions as well. This confuses the user, who > can see these errors on the UI but not the actual root cause of the shutdown, > hidden beneath the new errors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
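The proposed behavior can be sketched as a small gate in front of error reporting (names are illustrative, not Storm's API): once shutdown is in progress, secondary exceptions from closing connections are kept out of the cluster error report so the root cause stays visible.

```python
# Sketch: errors raised after shutdown begins are logged locally instead of
# being reported to the cluster, so the UI keeps showing the root cause.
class ErrorReporter:
    def __init__(self):
        self.shutting_down = False
        self.reported = []    # what would reach the UI
        self.suppressed = []  # logged locally only

    def report(self, error):
        if self.shutting_down:
            self.suppressed.append(error)
        else:
            self.reported.append(error)
```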
[jira] [Updated] (STORM-1675) Ability to submit multiple jars from client to topology
[ https://issues.apache.org/jira/browse/STORM-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1675: -- Labels: pull-request-available (was: ) > Ability to submit multiple jars from client to topology > --- > > Key: STORM-1675 > URL: https://issues.apache.org/jira/browse/STORM-1675 > Project: Apache Storm > Issue Type: New Feature > Components: storm-core >Reporter: Abhishek Agarwal >Assignee: Abhishek Agarwal >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Though submitting a shaded jar works in most cases, in some cases the ability > to submit multiple jars may be very helpful, e.g. for the storm-sql project. It > currently packs the classes into one jar and submits it as a single topology > jar. A generic capability may be helpful for other tools such as storm-sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1736) Change KafkaTestBroker.buildKafkaConfig to new KafkaConfig api.
[ https://issues.apache.org/jira/browse/STORM-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1736: -- Labels: pull-request-available (was: ) > Change KafkaTestBroker.buildKafkaConfig to new KafkaConfig api. > --- > > Key: STORM-1736 > URL: https://issues.apache.org/jira/browse/STORM-1736 > Project: Apache Storm > Issue Type: Bug >Reporter: Sriharsha Chintalapani >Assignee: Sriharsha Chintalapani >Priority: Major > Labels: pull-request-available > > The 1.x branch is failing with the following error: > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile > (default-testCompile) on project storm-kafka: Compilation failure > [ERROR] > /Users/harsha/code/harshach/incubator-storm/external/storm-kafka/src/test/org/apache/storm/kafka/KafkaTestBroker.java:[70,16] > constructor KafkaConfig in class kafka.server.KafkaConfig cannot be applied > to given types; > [ERROR] required: java.util.Map,boolean > [ERROR] found: java.util.Properties > [ERROR] reason: actual and formal argument lists differ in length > [ERROR] -> [Help 1] > [ERROR] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-822) Kafka Spout New Consumer API
[ https://issues.apache.org/jira/browse/STORM-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-822: - Labels: pull-request-available (was: ) > Kafka Spout New Consumer API > > > Key: STORM-822 > URL: https://issues.apache.org/jira/browse/STORM-822 > Project: Apache Storm > Issue Type: Story > Components: storm-kafka-client >Reporter: Thomas Becker >Assignee: Hugo Louro >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0, 2.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1712) make storage plugin for transactional state
[ https://issues.apache.org/jira/browse/STORM-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1712: -- Labels: pull-request-available (was: ) > make storage plugin for transactional state > --- > > Key: STORM-1712 > URL: https://issues.apache.org/jira/browse/STORM-1712 > Project: Apache Storm > Issue Type: Improvement >Reporter: John Fang >Assignee: John Fang >Priority: Major > Labels: pull-request-available > > As we know, the transactional state must be stored in ZooKeeper when we run a Trident > topology. In fact, I have packaged TransactionalState as a plugin. We > still store the state in ZooKeeper by default, but you can store the > transactional state elsewhere by setting a different plugin; we now also > support storing transactional state in HBase, and so on. I want to hear your > opinion. If it sounds OK, I would be pleased to create the PR and merge the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
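The plugin seam being proposed can be sketched as follows (class and config-key names are hypothetical, not the actual Storm API): transactional state goes through a storage interface, with a ZooKeeper-like backend as the default and alternatives selectable via configuration.

```python
# Sketch of a pluggable transactional-state store with a config-selected
# backend. Names here are illustrative only.
class StateStorage:
    def put(self, path, value):
        raise NotImplementedError

    def get(self, path):
        raise NotImplementedError

class InMemoryStorage(StateStorage):
    """Stand-in for the default ZooKeeper backend."""
    def __init__(self):
        self.data = {}

    def put(self, path, value):
        self.data[path] = value

    def get(self, path):
        return self.data.get(path)

def make_state_storage(conf):
    # In the real proposal, the plugin class would come from topology config
    # (e.g. an HBase-backed implementation instead of the ZooKeeper default).
    cls = conf.get("transactional.state.storage.plugin", InMemoryStorage)
    return cls()
```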
[jira] [Updated] (STORM-2355) Storm-HDFS: inotify support
[ https://issues.apache.org/jira/browse/STORM-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2355: -- Labels: pull-request-available (was: ) > Storm-HDFS: inotify support > --- > > Key: STORM-2355 > URL: https://issues.apache.org/jira/browse/STORM-2355 > Project: Apache Storm > Issue Type: New Feature > Components: storm-hdfs >Reporter: Tibor Kiss >Assignee: Tibor Kiss >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > This is a proposal to implement inotify based watch dir monitoring in the > Storm-HDFS Spout. > *Motivation* > Storm-HDFS's HdfsSpout currently polls the Spout’s input directory using > Hadoop's {{FileSystem.listFiles()}}. This operation is expensive since it > returns the block locations and all stat information of the files inside the > watch directory. Moreover, HdfsSpout currently uses only one element of the > returned Path list, which is inefficient as the rest of the entries are thrown > away without processing. > The proposed design provides greater efficiency through the inotify interface > and also enables easier extension of the original ({{listFiles()}} based) > monitoring with buffering (see the Further work section below). > *High level design* > The goal is to leverage the [HDFS inotify > API|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html] > to monitor new file arrivals in HdfsSpout's input directory. > The inotify based monitoring is an addition to the original > {{FileSystem.listFiles()}} based implementation; the default behavior of the > spout will be unchanged by this modification. > To unify the two monitoring methods and enable buffering, an iterator based > class ({{HdfsDirectoryMonitor}}) is created. > To retain backward compatibility, HdfsSpout's default monitoring behavior > is unchanged; inotify based monitoring can be enabled through a parameter. 
> As inotify requires administrative privileges (see the Caveat section below), a > fallback mechanism will be implemented in HdfsSpout to use the original > {{listFiles()}} based monitoring if initialization fails for inotify based > monitoring. > *Implementation details* > As inotify provides only a delta of the filesystem events from a given Tx Id > (of the Hdfs Edit Log), it is required to do a {{FileSystem.listFiles()}} based > collection during the Spout's initialization to ensure that any left-over > files are processed. > The inotify based implementation uses HdfsAdmin's > [{{DFSInotifyEventInputStream.poll()}}|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html#poll--] > method to fetch and buffer the list of new files created since the provided > Tx Id into the {{newFileList}} buffer. > During the {{HdfsSpout.nextTuple()}} call, one element is taken from the > {{newFileList}} buffer and processed by the spout. > The {{newFileList}} buffer is extended with the result of the > {{DFSInotifyEventInputStream.poll(lastTxId)}} call in every nextTuple() call. > Since HdfsSpout is able to create its own {{HdfsAdmin()}} instance, there > will be no need for the user to do additional initialization for the spout > even if inotify is enabled. > *Caveat* > HDFS inotify is currently available to the hdfs administrator user only, but > there is an ongoing discussion in the Hadoop community about extending its support to > other users. See: HDFS-8940 > *Further work* > 1) The number of calls to {{DFSInotifyEventInputStream.poll(lastTxId)}} could > be further reduced if the locking directory is moved away from the input > directory. With the current design, updates on the lock dir are also included > in the {{newFileList}} buffer, hence the buffer will never get completely > empty. 
> 2) The original {{listFiles()}} based solution could be improved through > {{HdfsDirectoryMonitor}} to buffer and use all the returned items from the > work directory, similarly to inotify based monitoring. Such an improvement would > reduce the number of calls made to the namenode. > These improvements are currently not part of this ticket. > *Error scenarios* > - Inability to create an HdfsAdmin instance (e.g. lack of privileges): >The spout falls back to the original {{listFiles()}} based method. > - Namenode's edit log is not yet open for write during {{HdfsSpout.open()}}: >The initialization will be postponed to the {{HdfsSpout.nextTuple()}} > call(s). > - Hdfs gets disconnected while the topology is running: >HdfsSpout reports an error and retries in the next {{nextTuple()}} > call. >No data will be skipped as the update will be requested from the last > known Tx Id. > > *Testing related changes* > The {{TestHdfsSpout}} testcase should be parametrized to
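The buffering and fallback flow described in the proposal can be condensed into a short sketch (hypothetical names, not the actual HdfsSpout code): an inotify-style poll fills the newFileList buffer, each nextTuple-equivalent call consumes one entry, and a permission failure at setup falls back to full listings.

```python
# Sketch of the proposed flow: poll() supplies deltas into the newFileList
# buffer; a PermissionError at setup (inotify needs HDFS admin rights)
# switches to listFiles()-style full listings. All names are illustrative.
class DirectoryMonitor:
    def __init__(self, inotify_factory, list_files):
        self.new_files = []            # the newFileList buffer
        self.list_files = list_files
        try:
            self.poll = inotify_factory()        # may require admin rights
            self.new_files.extend(list_files())  # pick up left-over files once
        except PermissionError:
            self.poll = None                     # fallback mode

    def next_file(self):
        if self.poll is not None:
            self.new_files.extend(self.poll())   # delta since last tx id
        elif not self.new_files:
            self.new_files.extend(self.list_files())
        return self.new_files.pop(0) if self.new_files else None
```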
[jira] [Updated] (STORM-904) move storm bin commands to java and provide appropriate bindings for windows and linux
[ https://issues.apache.org/jira/browse/STORM-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-904: - Labels: pull-request-available (was: ) > move storm bin commands to java and provide appropriate bindings for windows > and linux > -- > > Key: STORM-904 > URL: https://issues.apache.org/jira/browse/STORM-904 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Reporter: Sriharsha Chintalapani >Assignee: Priyank Shah >Priority: Major > Labels: pull-request-available > > Currently we have a python implementation and a .cmd implementation for windows. It is > becoming increasingly difficult to keep up both versions. Let's move all the main > code for starting daemons etc. to java and provide wrapper scripts in shell > and batch for linux and windows respectively. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-947) replace all `backtype.storm.scheduler.ExecutorDetails` with `backtype.storm.generated.ExecutorInfo `
[ https://issues.apache.org/jira/browse/STORM-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-947: - Labels: pull-request-available (was: ) > replace all `backtype.storm.scheduler.ExecutorDetails` with > `backtype.storm.generated.ExecutorInfo ` > -- > > Key: STORM-947 > URL: https://issues.apache.org/jira/browse/STORM-947 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 1.0.0 >Reporter: caofangkun >Assignee: caofangkun >Priority: Minor > Labels: pull-request-available > > replace all > {code} > backtype.storm.scheduler.ExecutorDetails > {code} > with > {code} > backtype.storm.generated.ExecutorInfo > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2339) Python code format cleanup in storm.py
[ https://issues.apache.org/jira/browse/STORM-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2339: -- Labels: pull-request-available (was: ) > Python code format cleanup in storm.py > -- > > Key: STORM-2339 > URL: https://issues.apache.org/jira/browse/STORM-2339 > Project: Apache Storm > Issue Type: Improvement > Components: storm-core >Affects Versions: 0.10.0, 1.0.0, 2.0.0 >Reporter: Tibor Kiss >Assignee: Tibor Kiss >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > {{bin/storm.py}} has multiple stylistic shortcomings: > - the PEP8 standard is not followed > - the python interpreter is hard-wired to /usr/bin/python > - unnecessary global statements are issued before reading globals > These issues shadow error reporting in modern IDEs (such as PyCharm). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
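On the third point, a minimal illustration of when `global` is actually required (the names below are made up for the example, not taken from storm.py):

```python
# `global` is only needed when a function assigns to a module-level name;
# reading one needs no declaration. Redundant `global` lines before mere
# reads are exactly what modern IDEs flag. Names here are illustrative.
STORM_CONF_DIR = "/etc/storm"

def read_conf_dir():
    return STORM_CONF_DIR      # no `global` needed for a read

def set_conf_dir(path):
    global STORM_CONF_DIR      # required: we rebind the module-level name
    STORM_CONF_DIR = path
```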
[jira] [Updated] (STORM-2286) Storm Rebalance command should support arbitrary component parallelism
[ https://issues.apache.org/jira/browse/STORM-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2286: -- Labels: pull-request-available (was: ) > Storm Rebalance command should support arbitrary component parallelism > -- > > Key: STORM-2286 > URL: https://issues.apache.org/jira/browse/STORM-2286 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Affects Versions: 0.9.3, 0.9.6, 0.10.1, 1.0.1 >Reporter: Yuzhao Chen >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For legacy reasons, the config TOPOLOGY-TASKS is considered first when scheduling a > topology: for a component, if the user doesn't specify TOPOLOGY-TASKS, storm just > overrides it to equal the component's parallelism hint, and schedules based on > TOPOLOGY-TASKS later on. > This works in most cases, but not for the Rebalance command. When doing a > Rebalance, the StormBase :component->executors attribute is overridden > in Zookeeper and used to partition component tasks into executors; as said > above, TOPOLOGY-TASKS is taken as the real number of tasks for a component. > Something goes weird here: if we request a bigger executor number for a component > during rebalance, it just doesn't work, because the smaller TOPOLOGY-TASKS > (unchanged since first submission) is partitioned into the bigger number of > executors read from ZooKeeper as overridden by the Rebalance command; for a > smaller executor number, it works fine. > Storm now supports a command like [storm rebalance > topology-name [-w wait-time-secs] [-n new-num-workers] [-e > component=parallelism]*], which indicates that users can override a component's > parallelism freely. I think it's more sensible to support this, and the > previous restriction is meaningless. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
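The mismatch described above can be sketched in a few lines (illustrative code, not Storm's scheduler): the task count is fixed at submission, tasks are partitioned across executors, so asking for more executors than tasks just leaves the extra executors empty.

```python
# Sketch: partition a fixed set of task ids across executors the way a
# round-robin scheduler might. More executors than tasks yields empty ones.
def partition_tasks(num_tasks, num_executors):
    tasks = list(range(num_tasks))
    return [tasks[i::num_executors] for i in range(num_executors)]

# With 4 tasks, 2 executors keeps every executor busy; rebalancing the same
# component to 8 executors leaves 4 of them with nothing to run.
```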
[jira] [Updated] (STORM-2508) storm-solr enhancement: update solrj to 5.5, support custom SolrClientFactory and commit operation
[ https://issues.apache.org/jira/browse/STORM-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-2508: -- Labels: pull-request-available (was: ) > storm-solr enhancement: update solrj to 5.5, support custom SolrClientFactory > and commit operation > -- > > Key: STORM-2508 > URL: https://issues.apache.org/jira/browse/STORM-2508 > Project: Apache Storm > Issue Type: Improvement > Components: storm-solr >Reporter: Yelei Wu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > The SolrCloud in my organization is protected by SSL + > Basic Auth, so I need to provide a customized SolrClient instance to > SolrUpdateBolt. Likewise, the RestJsonSchemaBuilder cannot access the Schema API > without auth. > In addition, for "near real-time search", soft commit is preferred, so it is > better to expose an interface for the user to customize the commit operation. > I think these features will also be useful for other people who need to use a > custom SolrClient implementation and control the details of the commit operation. > So, finally, I plan an enhancement for storm-solr: > 1. update solrj and related dependencies to 5.5; > 2. improve SolrConfig to support custom SolrClientFactory and CommitCallBack; > 3. provide a SchemaBuilder implementation which uses a custom SolrClient to > request the Schema API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-1929) Check when create topology
[ https://issues.apache.org/jira/browse/STORM-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-1929: -- Labels: pull-request-available (was: ) > Check when create topology > -- > > Key: STORM-1929 > URL: https://issues.apache.org/jira/browse/STORM-1929 > Project: Apache Storm > Issue Type: Improvement > Components: storm-core >Affects Versions: 1.0.0, 1.0.1 >Reporter: darion yaphet >Assignee: darion yaphet >Priority: Major > Labels: pull-request-available > > Add some checks when creating a topology: > 1. Spout and Bolt ids shouldn't conflict. > 2. createTopology's spout and bolt sets shouldn't be empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
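The two checks can be sketched as a small validation helper (hypothetical, not TopologyBuilder's actual API):

```python
# Sketch: reject conflicting component ids and empty topologies up front,
# before submission, instead of failing later at runtime.
def validate_topology(spout_ids, bolt_ids):
    if not spout_ids and not bolt_ids:
        raise ValueError("topology must declare at least one spout or bolt")
    all_ids = list(spout_ids) + list(bolt_ids)
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("spout and bolt ids must not conflict")
```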
[jira] [Updated] (STORM-440) NimbusClient throws NPE if Config.STORM_THRIFT_TRANSPORT_PLUGIN is not set
[ https://issues.apache.org/jira/browse/STORM-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-440: - Labels: pull-request-available (was: ) > NimbusClient throws NPE if Config.STORM_THRIFT_TRANSPORT_PLUGIN is not set > -- > > Key: STORM-440 > URL: https://issues.apache.org/jira/browse/STORM-440 > Project: Apache Storm > Issue Type: Bug > Components: storm-core >Affects Versions: 0.9.2-incubating >Reporter: Bryan Baugher >Priority: Minor > Labels: pull-request-available > > We just upgraded from 0.8.2 to 0.9.2 and noticed that when constructing a > NimbusClient, if Config.STORM_THRIFT_TRANSPORT_PLUGIN is not specified, then > AuthUtils [1] throws an NPE. > [1] - > https://github.com/bbaugher/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/security/auth/AuthUtils.java#L73-L74 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
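The failure mode has a simple shape: an unset config key comes back as null/None, and dereferencing it later blows up far from the real cause. A Python stand-in (the helper is hypothetical; only the config key mirrors the ticket) shows the explicit guard that turns the NPE into an actionable message:

```python
# Sketch: guard the lookup so a missing transport plugin setting fails with
# a clear error instead of a null dereference deep inside AuthUtils.
def get_transport_plugin(conf):
    plugin = conf.get("storm.thrift.transport")
    if plugin is None:
        raise ValueError("storm.thrift.transport is not set in the config")
    return plugin
```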
[jira] [Updated] (STORM-3269) storm-client and storm-server indirectly depend on storm-core
[ https://issues.apache.org/jira/browse/STORM-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated STORM-3269: -- Labels: pull-request-available (was: ) > storm-client and storm-server indirectly depend on storm-core > - > > Key: STORM-3269 > URL: https://issues.apache.org/jira/browse/STORM-3269 > Project: Apache Storm > Issue Type: Bug > Components: storm-client, storm-core, storm-server >Affects Versions: 2.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > Labels: pull-request-available > > When trying to get the version information for nimbus it looks for > storm-core, which requires the storm-core class to be on the classpath. We > need to fix this, because VersionInfo is in storm-client so it is possible > for someone who uses it from storm-client to load the wrong thing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (STORM-3263) Memory and CPU guarantee columns do not sort correctly in Owner Summary
[ https://issues.apache.org/jira/browse/STORM-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved STORM-3263. Resolution: Fixed Assignee: Jacob Tolar Fix Version/s: 2.0.0 Thanks [~jtolar], I merged this into master. > Memory and CPU guarantee columns do not sort correctly in Owner Summary > --- > > Key: STORM-3263 > URL: https://issues.apache.org/jira/browse/STORM-3263 > Project: Apache Storm > Issue Type: Improvement >Reporter: Jacob Tolar >Assignee: Jacob Tolar >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Attachments: after-fix.png, screenshot-1.png > > Time Spent: 10m > Remaining Estimate: 0h > > If there is a user running with no guarantees, the memory and CPU guarantee > columns show up as "N/A". I think that breaks the DataTables auto detection > of the column type and it falls back to string type. Then it does string sort > instead of numeric sort which is incorrect for these columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
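The sort bug can be reproduced outside DataTables: once an "N/A" forces a column to string type, values compare lexicographically and "1000" sorts before "900". A numeric key with a sentinel for non-numbers (one way to express the fix, not necessarily what the merged patch does) restores the expected order:

```python
# Column values as DataTables would see them after "N/A" forces string type.
guarantees = ["900", "N/A", "1000"]

string_sorted = sorted(guarantees)          # lexicographic (the bug)
# → ['1000', '900', 'N/A']

def numeric_key(value):
    try:
        return (0, float(value))
    except ValueError:
        return (1, 0.0)                     # push "N/A" to the end

numeric_sorted = sorted(guarantees, key=numeric_key)
# → ['900', '1000', 'N/A']
```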
[jira] [Resolved] (STORM-3268) Try to make the integration test more stable
[ https://issues.apache.org/jira/browse/STORM-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing resolved STORM-3268. --- Resolution: Fixed Fix Version/s: 2.0.1 > Try to make the integration test more stable > > > Key: STORM-3268 > URL: https://issues.apache.org/jira/browse/STORM-3268 > Project: Apache Storm > Issue Type: Task > Components: integration-test >Affects Versions: 2.0.0 >Reporter: Stig Rohde Døssing >Assignee: Stig Rohde Døssing >Priority: Major > Labels: pull-request-available > Fix For: 2.0.1 > > Time Spent: 10m > Remaining Estimate: 0h > > The integration test is still flaky, and most of the time it's not because of > bugs in Storm. Try to make it more stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-3270) Build Storm with JDK 11, excluding incompatible modules
Stig Rohde Døssing created STORM-3270: - Summary: Build Storm with JDK 11, excluding incompatible modules Key: STORM-3270 URL: https://issues.apache.org/jira/browse/STORM-3270 Project: Apache Storm Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Stig Rohde Døssing Assignee: Stig Rohde Døssing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3258) prevent flooding supervisor log with blobstore location
[ https://issues.apache.org/jira/browse/STORM-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated STORM-3258: --- Fix Version/s: (was: 2.0.1) 2.0.0 > prevent flooding supervisor log with blobstore location > --- > > Key: STORM-3258 > URL: https://issues.apache.org/jira/browse/STORM-3258 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Trivial > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This line repeats over and over in supervisor logs, and really does not > provide any useful information. We should at least change it to debug. > [https://github.com/apache/storm/blob/master/external/storm-hdfs-blobstore/src/main/java/org/apache/storm/hdfs/blobstore/HdfsBlobStoreImpl.java#L128] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3249) Nimbus Shutdown Faster
[ https://issues.apache.org/jira/browse/STORM-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated STORM-3249: --- Fix Version/s: (was: 2.0.1) 2.0.0 > Nimbus Shutdown Faster > -- > > Key: STORM-3249 > URL: https://issues.apache.org/jira/browse/STORM-3249 > Project: Apache Storm > Issue Type: Bug > Components: storm-server >Affects Versions: 2.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Nimbus takes forever to shut down in 2.x. It would really be nice to fix > that (and it is really our own fault that it takes so long). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (STORM-3262) Nimbus REST API reports leader before it gains leadership
[ https://issues.apache.org/jira/browse/STORM-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved STORM-3262. Resolution: Fixed Fix Version/s: 2.0.0 Thanks [~agresch], I merged this into master. > Nimbus REST API reports leader before it gains leadership > - > > Key: STORM-3262 > URL: https://issues.apache.org/jira/browse/STORM-3262 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > At times when nimbus restarts, the REST API returns Leader before leadership > is acquired. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (STORM-3250) Clean up old Pull Requests
[ https://issues.apache.org/jira/browse/STORM-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing resolved STORM-3250. --- Resolution: Fixed > Clean up old Pull Requests > -- > > Key: STORM-3250 > URL: https://issues.apache.org/jira/browse/STORM-3250 > Project: Apache Storm > Issue Type: Task >Reporter: Derek Dagit >Assignee: Derek Dagit >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Close [pull > requests|https://github.com/apache/storm/pulls?utf8=%E2%9C%93=is%3Apr+is%3Aopen+updated%3A%3C2018-01-01] > that have not been updated in 2018. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-3269) storm-client and storm-server indirectly depend on storm-core
Robert Joseph Evans created STORM-3269: -- Summary: storm-client and storm-server indirectly depend on storm-core Key: STORM-3269 URL: https://issues.apache.org/jira/browse/STORM-3269 Project: Apache Storm Issue Type: Bug Components: storm-client, storm-core, storm-server Affects Versions: 2.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans When trying to get the version information for nimbus it looks for storm-core, which requires the storm-core class to be on the classpath. We need to fix this, because VersionInfo is in storm-client so it is possible for someone who uses it from storm-client to load the wrong thing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)