[GitHub] samza pull request #883: SAMZA-2072: Update guava to 23.0

2019-01-15 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/883 SAMZA-2072: Update guava to 23.0 Startpoint is relying on an old version of guava, which should be updated to 23.0 for the newer api. You can merge this pull request into a Git repository

[GitHub] samza pull request #881: SAMZA-2068: Separating container launch logic into ...

2019-01-15 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/881 SAMZA-2068: Separating container launch logic into util class The container launch logic needs to be invoked for beam-runner to run beam containers. This is a small refactoring

[GitHub] samza pull request #867: SAMZA-2048: Add guide to run Beam wordcount example

2018-12-20 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/867

[GitHub] samza pull request #867: SAMZA-2048: Add guide to run Beam wordcount example

2018-12-20 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/867 SAMZA-2048: Add guide to run Beam wordcount example Use the maven archetype to generate the example project for beam wordcount examples. Add the steps to set it up and run the examples. You can

[GitHub] samza pull request #805: SAMZA-1972: Make Operator Timer metrics calculation...

2018-11-13 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/805 SAMZA-1972: Make Operator Timer metrics calculation configurable This patch introduces two changes: 1. Make the timer metrics in OperatorImpl optional, and disabled by default. Adding

[GitHub] samza pull request #704: SAMZA-1911: Add documentation for quick start

2018-10-10 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/704 SAMZA-1911: Add documentation for quick start md file for quick start. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xinyuiscool/samza

[GitHub] samza pull request #658: SAMZA-1907: Add metrics to monitor watermarks

2018-09-25 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/658 SAMZA-1907: Add metrics to monitor watermarks Add initial metric to monitor the aggregated watermark time. You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] samza pull request #595: SAMZA-1796: PassthroughJobCoordinator doesn't creat...

2018-08-01 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/595 SAMZA-1796: PassthroughJobCoordinator doesn't create changelog streams Currently only the ClusterBasedJobCoordinator and ZkJobCoordinator are creating changelog streams. The Passthrough one

[GitHub] samza pull request #588: SAMZA-1768: Handle corrupted OFFSET file

2018-07-27 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/588 SAMZA-1768: Handle corrupted OFFSET file This patch addresses the following tickets: SAMZA-1778: SIGSEGV when reading properties (metrics) on a closed RocksDB store SAMZA-1777

[GitHub] samza pull request #566: SAMZA-1762: Fix Memory leak in the Timer Registry M...

2018-06-26 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/566 SAMZA-1762: Fix Memory leak in the Timer Registry Map Found a memory leak in the SystemTimerScheduler, which does not remove the timers from scheduledFutures after the timers are fired
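The kind of leak described in SAMZA-1762 can be illustrated with a minimal sketch. It is not Samza's SystemTimerScheduler: the class `TimerRegistrySketch` and its methods `setTimer`, `deleteTimer`, `pendingTimers` and `shutdown` are hypothetical names, and only the map name `scheduledFutures` comes from the message above. The sketch assumes timer keys are unique among pending timers. The point is that the scheduled task removes its own entry from the map when it fires, so the map holds only timers that are still pending.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; this is not Samza's SystemTimerScheduler.
final class TimerRegistrySketch {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  // Pending timers by key. Keys are assumed unique among pending timers.
  private final Map<String, ScheduledFuture<?>> scheduledFutures = new HashMap<>();

  void setTimer(String key, long delayMs, Runnable callback) {
    synchronized (scheduledFutures) {
      ScheduledFuture<?> future = executor.schedule(() -> {
        // Drop the entry when the timer fires. Without this step the map
        // keeps one completed future per timer and grows without bound.
        synchronized (scheduledFutures) {
          scheduledFutures.remove(key);
        }
        callback.run();
      }, delayMs, TimeUnit.MILLISECONDS);
      scheduledFutures.put(key, future);
    }
  }

  // Cancels a pending timer; returns false if it already fired or never existed.
  boolean deleteTimer(String key) {
    synchronized (scheduledFutures) {
      ScheduledFuture<?> future = scheduledFutures.remove(key);
      return future != null && future.cancel(false);
    }
  }

  int pendingTimers() {
    synchronized (scheduledFutures) {
      return scheduledFutures.size();
    }
  }

  void shutdown() {
    executor.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    TimerRegistrySketch registry = new TimerRegistrySketch();
    CountDownLatch fired = new CountDownLatch(3);
    for (int i = 0; i < 3; i++) {
      registry.setTimer("timer-" + i, 10, fired::countDown);
    }
    fired.await();
    System.out.println("pending after firing: " + registry.pendingTimers());
    registry.shutdown();
  }
}
```

Holding the same lock in `setTimer` and in the fired task means a timer with a very short delay cannot remove its entry before the entry has been added.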

[GitHub] samza pull request #505: SAMZA-1702: Prepare 0.14.1 release on the 0.14.1 br...

2018-05-25 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/505

[GitHub] samza pull request #516: Remove the iterable interface from KeyValueSnapshot

2018-05-10 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/516 Remove the iterable interface from KeyValueSnapshot The iterable interface makes it hard for the users to close it after using. You can merge this pull request into a Git repository by running

[GitHub] samza pull request #510: SAMZA-1705: Switch to use snapshot in iterable impl...

2018-05-08 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/510 SAMZA-1705: Switch to use snapshot in iterable impl of RocksDb We should use the rocksDb.snapshot() method to keep the snapshot and create a new iterator with it every time. The perf shows

[GitHub] samza pull request #508: SAMZA-1704: Fix compatibility issues with scala 2.1...

2018-05-07 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/508 SAMZA-1704: Fix compatibility issues with scala 2.12 Need to add override keyword for overriding a method in scala 2.12. You can merge this pull request into a Git repository by running

[GitHub] samza pull request #507: SAMZA-1703: Disable flaky test TestEmbeddedTaggedRa...

2018-05-07 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/507 SAMZA-1703: Disable flaky test TestEmbeddedTaggedRateLimiter.testAcquireWithTimeout You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] samza pull request #506: SAMZA-1702: Prepare 0.14.1 release on the master br...

2018-05-07 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/506 SAMZA-1702: Prepare 0.14.1 release on the master branch You can merge this pull request into a Git repository by running: $ git pull https://github.com/xinyuiscool/samza SAMZA-1702-master

[GitHub] samza pull request #505: SAMZA-1702: Prepare 0.14.1 release on the 0.14.1 br...

2018-05-07 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/505 SAMZA-1702: Prepare 0.14.1 release on the 0.14.1 branch You can merge this pull request into a Git repository by running: $ git pull https://github.com/xinyuiscool/samza SAMZA-1702

[GitHub] samza pull request #492: SAMZA-1691: Support get iterable from KeyValueStore

2018-04-27 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/492 SAMZA-1691: Support get iterable from KeyValueStore Right now for KeyValueStore we have a range query to return an iterator. For usage in BEAM, we need an iterable which will 1) create

[GitHub] samza pull request #469: SAMZA-1645: A few issues found by BEAM stress test

2018-04-11 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/469 SAMZA-1645: A few issues found by BEAM stress test 1. Revert the priority set to intermediate streams. 2. Fix a watermark propagation condition You can merge this pull request into a Git

[GitHub] samza pull request #456: SAMZA-1627: Watermark broadcast enhancements

2018-03-26 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/456 SAMZA-1627: Watermark broadcast enhancements Currently each upstream task needs to broadcast to every single partition of intermediate streams in order to aggregate watermarks in the consumers

[GitHub] samza pull request #444: SAMZA-1615: Fix a couple of issues in ControlMessag...

2018-03-12 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/444 SAMZA-1615: Fix a couple of issues in ControlMessageSender Two issues I found during testing: 1) medaDataCache.getSystemStreamMetadata(): if we pass in partitionOnly to be true

[GitHub] samza pull request #419: SAMZA-1498: Support arbitrary system clock timer in...

2018-02-06 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/419 SAMZA-1498: Support arbitrary system clock timer in operators This patch adds the capability to register arbitrary timers for both high-level and low-level api. For high-level

[GitHub] samza pull request #415: SAMZA-1578: Fix watermark bug found by BEAM tests

2018-01-26 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/415 SAMZA-1578: Fix watermark bug found by BEAM tests The problem is that getOutputWatermark() does not return the real outputWatermark. This caused problems in user-overridden watermark functions. You can

[GitHub] samza pull request #410: SAMZA-1557: Broadcast operator

2018-01-19 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/410 SAMZA-1557: Broadcast operator This patch adds Broadcast operator that allows broadcasting messages to all tasks. It's the counterpart of the Samza broadcast stream in low level api

[GitHub] samza pull request #402: SAMZA-1553: Add log4j for latest Kafka build

2018-01-09 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/402 SAMZA-1553: Add log4j for latest Kafka build Add it so Samza compiles with the latest kafka. You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] samza pull request #400: SAMZA-1550: Update master to use 0.14.1-SNAPSHOT ve...

2018-01-03 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/400 SAMZA-1550: Update master to use 0.14.1-SNAPSHOT version Update master to use 0.14.1-SNAPSHOT version. You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] samza pull request #399: [SAMZA-1550]: replace snapshot with release version...

2018-01-03 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/399

[GitHub] samza pull request #396: SAMZA-1550: Doc for 0.14.0 release

2018-01-02 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/396 SAMZA-1550: Doc for 0.14.0 release Docs update for both master and 0.14.0 branch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xinyuiscool

[GitHub] samza pull request #385: SAMZA-1534: Fix the visualization in job graph with...

2017-12-12 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/385 SAMZA-1534: Fix the visualization in job graph with the new PartitionBy Op It seems the stream and the partitionBy op have the same id. So in rendering I added the stream as the id for the node

[GitHub] samza pull request #381: SAMZA-1512: Documentation on the multi-stage batch ...

2017-12-08 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/381 SAMZA-1512: Documentation on the multi-stage batch processing Documentation to explain how partitionBy(), checkpoint and state work in batch. You can merge this pull request into a Git

[GitHub] samza pull request #370: SAMZA-1516: Another round of issues found by BEAM t...

2017-11-28 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/370 SAMZA-1516: Another round of issues found by BEAM tests A couple more fixes: 1. Fix a bug in identifying input streams for an operator. 2. For partitionBy, set the partitionKey to 0L when key

[GitHub] samza pull request #364: SAMZA-1505: Fix CheckpointTool writing only one ssp...

2017-11-20 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/364 SAMZA-1505: Fix CheckpointTool writing only one ssp per task Currently when using CheckpointTool to write checkpoints, it only writes a checkpoint of a single ssp per task. By debugging the code

[GitHub] samza pull request #361: SAMZA-1504: Allow user to register container-level ...

2017-11-15 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/361 SAMZA-1504: Allow user to register container-level metrics This change allows users to register metrics on a per-container basis. Tested in the beam runner and works as expected

[GitHub] samza pull request #345: SAMZA-1477: Fix issues found by BEAM tests

2017-10-31 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/345 SAMZA-1477: Fix issues found by BEAM tests A bunch of issues were found by BEAM tests, which include: 1) WatermarkFunction needs to be able to return output after processWatermark

[GitHub] samza pull request #328: SAMZA-1457: Set retention for internal streams for ...

2017-10-16 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/328 SAMZA-1457: Set retention for internal streams for Batch application For intermediate streams, checkpoint and changelog, we need to set a short retention period for batch. You can merge

[GitHub] samza pull request #307: SAMZA-1434: Fix issues found in Hadoop

2017-09-29 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/307 SAMZA-1434: Fix issues found in Hadoop Fix the following bugs found when running Samza on Hadoop: 1. HDFS allows output partitions to be 0 (empty folder) 2. Add null check

[GitHub] samza pull request #297: SAMZA-1417: Clear and recreate intermediate and met...

2017-09-25 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/297

[GitHub] samza pull request #297: SAMZA-1417: Clear and recreate intermediate and met...

2017-09-19 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/297 SAMZA-1417: Clear and recreate intermediate and metadata streams for batch processing For each run of a batch application, we need to clear the internal streams from the previous run

[GitHub] samza pull request #225: SAMZA-1321: Aggregate and propagate end-of-stream a...

2017-09-18 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/225

[GitHub] samza pull request #236: SAMZA-1321: Propagate end-of-stream and watermark m...

2017-09-18 Thread xinyuiscool
GitHub user xinyuiscool closed the pull request at: https://github.com/apache/samza/pull/236

[GitHub] samza pull request #292: SAMZA-1415: Add clearStream API in SystemAdmin and ...

2017-09-06 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/292 SAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs The patch does the following: 1) add clearStream() API in SystemAdmin. Currently it's only supported in Kafka

[GitHub] samza pull request #277: SAMZA-1386: Inline End-of-stream and Watermark logi...

2017-08-17 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/277 SAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl This patch contains the following changes: 1. Refactor watermark and end-of-stream logic. The aggregation/handling has

[GitHub] samza pull request #236: SAMZA-1321: Propagate end-of-stream and watermark m...

2017-06-28 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/236 SAMZA-1321: Propagate end-of-stream and watermark messages The patch completes the end-of-stream workflow across a multi-stage pipeline. It also contains the initial commit for supporting watermarks

[GitHub] samza pull request #225: SAMZA-1321: Propagate end-of-stream messages

2017-06-14 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/225 SAMZA-1321: Propagate end-of-stream messages The patch completes the end-of-stream propagation across intermediate streams. It does the following: 1) EndOfStreamManager aggregates

[GitHub] samza pull request #207: SAMZA-1312: Add Control Messages and Intermediate S...

2017-05-30 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/207 SAMZA-1312: Add Control Messages and Intermediate Stream Serde In this patch, we add the control message types, which include: * EndOfStreamMessage * WatermarkMessage To support

[GitHub] samza pull request #189: SAMZA-1289: Default id generator if not configured

2017-05-12 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/189 SAMZA-1289: Default id generator if not configured Right now in standalone deployment we require the user to provide an id generator. Since most of the time the users can simply use the UUID

[GitHub] samza pull request #188: SAMZA-1288: Add null check for sink OutputStream

2017-05-11 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/188 SAMZA-1288: Add null check for sink OutputStream The logic to generate json for the Sink operator does not check whether the output stream is null. This causes a null pointer exception. You can merge

[GitHub] samza pull request #184: SAMZA-1283: Expose the buffered-message-size metric

2017-05-10 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/184 SAMZA-1283: Expose the buffered-message-size metric Regardless of whether we enable size limit for the consumer buffer, this metric helps to see what's the buffer size and make configuring size

[GitHub] samza pull request #172: SAMZA-1273: Make StreamConfig.getStreamIds() public

2017-05-08 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/172 SAMZA-1273: Make StreamConfig.getStreamIds() public Making StreamConfig.getStreamIds() public so config provider can scan through all the configured streams and expand some properties if needed

[GitHub] samza pull request #168: SAMZA-1267: ApplicationRunner#getLocalRunner return...

2017-05-05 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/168 SAMZA-1267: ApplicationRunner#getLocalRunner returns null Remove ApplicationRunner#getLocalRunner and clean up any usage examples. You can merge this pull request into a Git repository

[GitHub] samza pull request #154: SAMZA-1246: ApplicationRunner.stats() should include...

2017-05-01 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/154 SAMZA-1246: ApplicationRunner.stats() should include exception in case of failure Currently ApplicationRunner.stats() only returns the enum representing the status. It also needs to include

[GitHub] samza pull request #145: SAMZA-1245: Make stream samza.physical.name config ...

2017-04-27 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/145 SAMZA-1245: Make stream samza.physical.name config name string public For certain systems such as HDFS, the physical stream name might need to be finalized during the config generation. In order

[GitHub] samza pull request #127: SAMZA-1204: Visualize StreamGraph and ExecutionPlan

2017-04-18 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/127 SAMZA-1204: Visualize StreamGraph and ExecutionPlan First look: https://xinyuiscool.github.io/visualizer/plan.html. This is based on the example graph JSON generated in TestJobGraphJsonGenerator

[GitHub] samza pull request #117: SAMZA-1132: LocalApplicationRunner for StreamApplic...

2017-04-06 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/117 SAMZA-1132: LocalApplicationRunner for StreamApplication LocalApplicationRunner runs the StreamApplication locally on every node that the application is deployed to. LocalRunner.start

[GitHub] samza pull request #110: SAMZA-1178: Generate JSON from StreamPlan

2017-04-04 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/110 SAMZA-1178: Generate JSON from StreamPlan As the first step to visualize the StreamGraph/Plan, this patch generates a json representation of it. For the example StreamGraph

[GitHub] samza pull request #109: Samza 1186: Rename Processor to Job

2017-04-03 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/109 Samza 1186: Rename Processor to Job Now that we have the top level Samza application, and each stage is called a job, the previously introduced "processor" naming should be rena

[GitHub] samza pull request #100: SAMZA-1172: Fix for the topological sort to handle ...

2017-03-28 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/100 SAMZA-1172: Fix for the topological sort to handle single-node loop In the processor graph, the topological sort missed adding to the visited set during graph traversal. This caused wrong graph

[GitHub] samza pull request #98: SAMZA-1171: Rewrite config in ApplicationRunnerMain ...

2017-03-27 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/98 SAMZA-1171: Rewrite config in ApplicationRunnerMain when creating ApplicationRunner The config needs to be rewritten before passing down to the ApplicationRunner. This is a bug

[GitHub] samza pull request #94: SAMZA-1137: Instantiate ApplicationRunner in SamzaCo...

2017-03-20 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/94 SAMZA-1137: Instantiate ApplicationRunner in SamzaContainer Create an ApplicationRunner in SamzaContainer to provide StreamSpecs for fluent API. You can merge this pull request into a Git

[GitHub] samza pull request #88: SAMZA-1131: RemoteApplicationRunner for cluster-base...

2017-03-16 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/88 SAMZA-1131: RemoteApplicationRunner for cluster-based Samza applications RemoteApplicationRunner starts the Samza StreamApplication on the remote cluster, e.g. Yarn. It uses ExecutionPlanner

[GitHub] samza pull request #79: Samza 1123: Create intermediate stream in partitionB...

2017-03-08 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/79 Samza 1123: Create intermediate stream in partitionBy() operator For partitionBy() operator, Samza generates an intermediate stream with id based on operator name and id, and system based

[GitHub] samza pull request #75: SAMZA-1067: Physical execution graph and planner for...

2017-03-03 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/75 SAMZA-1067: Physical execution graph and planner for fluent API Initial commit for the physical graph and plan. The commit includes: 1) Physical ProcessorGraph, where each processor

[GitHub] samza pull request #41: SAMZA-1078: Add my gpg key to KEYS

2017-01-10 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/41 SAMZA-1078: Add my gpg key to KEYS You can merge this pull request into a Git repository by running: $ git pull https://github.com/xinyuiscool/samza KEYS Alternatively you can review

[GitHub] samza pull request #37: SAMZA-1069: Fix Deadlock between KafkaSystemProducer...

2016-12-23 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request: https://github.com/apache/samza/pull/37 SAMZA-1069: Fix Deadlock between KafkaSystemProducer and KafkaProducer Moving the producer.close() and sources.flush() outside the lock so it won't have a race condition with the kafka network