[GitHub] samza pull request #886: SAMZA-2074: Read configuration from coordinator str...
GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/886

    SAMZA-2074: Read configuration from coordinator stream in checkpoint tool.

    Currently, each run of `CheckpointTool` requires the `TaskModel` from the `JobModel` to update the checkpoints. `JobModel` generation involves reading the `SystemStreamMetadata` and `SystemStreamPartitionMetadata` of the job's input streams. Post Samza 1.0, this would require the entire configuration bag stored in the coordinator stream.

    As a followup to `SAMZA-2059`, this PR changes the `CheckpointTool` to read the configuration from the coordinator stream and use it to generate the `JobModel`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza read_config_from_coordinator_stream_for_checkpoint_tool

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/886.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #886

commit 910cfacac4278376b9c4ceb3c43407e19298a8b4
Author: Shanthoosh Venkataraman
Date: 2019-01-15T18:04:31Z

    Read configuration from coordinator stream in checkpoint tool.

---
[GitHub] samza pull request #885: Type system for Samza SQL and Support for types in ...
GitHub user srinipunuru opened a pull request:

    https://github.com/apache/samza/pull/885

    Type system for Samza SQL and Support for types in UDFS

    This checkin adds:

    1. A type system for Samza SQL. Previously Samza SQL was using Calcite's relational type system. We need an intermediate type system that is specific to Samza SQL so that we can support Beam SQL in the future. This intermediate type system also allows us to provide typing for Samza SQL UDFs.
    2. Java annotations for Samza SQL that allow us to discover the Samza SQL UDFs easily and also let users configure the name of the UDF and whether it is disabled.
    3. Initial support for adding types in Samza SQL UDFs. Right now we are not using these types for validation. A future checkin will add that capability.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srinipunuru/samza sql-schema.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/885.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #885

commit a3af25702f38a06bc38f2fbf1886cf2d2e179762
Author: Srinivasulu Punuru
Date: 2019-01-16T21:16:11Z

    Support for types in UDFS

---
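The annotation-based discovery mentioned in the pull request description above could look roughly like the following. This is only a sketch of the general technique: the annotation `SketchUdf`, its `name` and `enabled` attributes, the two example classes, and the `discover` helper are all hypothetical names made up for illustration, and are not the annotations or classes added by the pull request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

public class UdfAnnotationSketch {
    // Hypothetical annotation (an assumption for this sketch, not the PR's API).
    // RUNTIME retention is required so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface SketchUdf {
        String name();                  // name the UDF is registered under
        boolean enabled() default true; // lets a user switch the UDF off
    }

    @SketchUdf(name = "to_upper")
    public static class ToUpper {
        public String execute(String in) {
            return in.toUpperCase();
        }
    }

    @SketchUdf(name = "legacy_fn", enabled = false)
    public static class LegacyFn {
        public String execute(String in) {
            return in;
        }
    }

    // Registers only the candidates that carry the annotation and are enabled,
    // keyed by the name configured on the annotation.
    public static Map<String, Class<?>> discover(Class<?>... candidates) {
        Map<String, Class<?>> registry = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            SketchUdf meta = candidate.getAnnotation(SketchUdf.class);
            if (meta != null && meta.enabled()) {
                registry.put(meta.name(), candidate);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        // String.class has no annotation and LegacyFn is disabled,
        // so only to_upper is registered.
        System.out.println(discover(ToUpper.class, LegacyFn.class, String.class).keySet());
        // prints [to_upper]
    }
}
```

Keeping the name and the enabled flag on the annotation means that registration needs no separate configuration file. Whether the pull request resolves these attributes the same way is not stated in the description.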
[GitHub] samza pull request #884: SAMZA-2073: Do not commit the task offsets when shu...
Github user asfgit closed the pull request at:

    https://github.com/apache/samza/pull/884

---
Re: InMemorySystemDescriptor ignores serde
Hi Tom,

InMemorySystem is a system that is supposed to support only NoOpSerde, since all the associated streams for this system are maintained in memory. In addition, if your test uses Samza's Test Framework, it will override any explicit serde configs specified for streams to NoOp. You are expected to supply deserialized objects to the in-memory system.

In addition, in your email you mentioned:

{unformat}
I had still specified in my config:

streams.in-0.samza.msg.serde=integer

Apparently, that *was* respected by some part of the system because integers were deserialized properly! Removing this configuration value results in my operator receiving a byte array since the in-memory system only uses NoOpSerde.
{unformat}

Can you send me a snippet of the test you were trying to fix so that I can understand the problem better?

Thanks,
Sanil

On Tue, 8 Jan 2019 at 17:28, Tom Davis wrote:

> I am in the process of updating a project to 1.0 and spent today debugging a rather odd test failure. When using input/output streams with IntegerSerde, things worked fine -- however, using LongSerde, every message value was 0! I eventually found that InMemorySystemDescriptor#getInputDescriptor ignores the serde passed to it. However, I had still specified in my config:
>
> streams.in-0.samza.msg.serde=integer
>
> Apparently that *was* respected by some part of the system because integers were deserialized properly! Removing this configuration value results in my operator receiving a byte array since the in-memory system only uses NoOpSerde.
>
> This behavior appears inconsistent with the previous version of Samza. The old `getInputStream` was passed a serde that was always used, but since the new version receives a Descriptor that has already discarded the serde, I am forced into assuming NoOpSerde everywhere, at least for testing purposes.
>
> Not the end of the world, but it does introduce an inconsistency between the in-memory system and any other -- one that requires a fair bit of domain knowledge to avoid.
>
> As always, thanks for the great project!
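The symptom reported in the quoted message (correct values with IntegerSerde, every value 0 with LongSerde while the config still says `integer`) would be consistent with a width mismatch between the bytes written and the bytes read. The thread does not confirm this. The sketch below assumes fixed-width big-endian encodings, which is the default byte order of `java.nio.ByteBuffer`. The class and its `encodeLong`, `encodeInt`, and `decodeAsInt` helpers are hypothetical stand-ins, not Samza's serde classes.

```java
import java.nio.ByteBuffer;

public class SerdeMismatchSketch {
    // Hypothetical stand-in for a long serializer: 8 bytes, big-endian.
    public static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Hypothetical stand-in for an integer serializer: 4 bytes, big-endian.
    public static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Hypothetical stand-in for an integer deserializer: reads only the
    // first 4 bytes of whatever it is given.
    public static int decodeAsInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        // Matching widths round-trip correctly.
        System.out.println(decodeAsInt(encodeInt(42)));   // prints 42

        // A small long has its value in the low 4 bytes; an integer reader
        // sees only the high 4 bytes, which are all zero.
        System.out.println(decodeAsInt(encodeLong(42L))); // prints 0
    }
}
```

Under these assumptions, any long below 2^32 decodes to 0 when read as an integer, which matches the "every message value was 0" observation. It does not by itself show which component applied the `integer` serde.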