[GitHub] samza pull request #886: SAMZA-2074: Read configuration from coordinator str...

2019-01-16 Thread shanthoosh
GitHub user shanthoosh opened a pull request:

https://github.com/apache/samza/pull/886

SAMZA-2074: Read configuration from coordinator stream in checkpoint tool.

 Currently, each run of `CheckpointTool` requires the `TaskModel` from 
the `JobModel` to update the checkpoints. `JobModel` generation involves 
reading the `SystemStreamMetadata` and `SystemStreamPartitionMetadata` of the 
job's input streams. Since Samza 1.0, this requires the entire 
configuration bag stored in the coordinator stream. 

As a follow-up to `SAMZA-2059`, this PR changes the `CheckpointTool` to read 
the configuration from the coordinator stream and use it to generate the `JobModel`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shanthoosh/samza read_config_from_coordinator_stream_for_checkpoint_tool

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #886


commit 910cfacac4278376b9c4ceb3c43407e19298a8b4
Author: Shanthoosh Venkataraman 
Date:   2019-01-15T18:04:31Z

Read configuration from coordinator stream in checkpoint tool.




---


[GitHub] samza pull request #885: Type system for Samza SQL and Support for types in ...

2019-01-16 Thread srinipunuru
GitHub user srinipunuru opened a pull request:

https://github.com/apache/samza/pull/885

Type system for Samza SQL and Support for types in UDFS

This check-in adds:
1. A type system for Samza SQL. Previously, Samza SQL used Calcite's 
relational type system. We need an intermediate type system specific to 
Samza SQL so that we can support Beam SQL in the future. This intermediate type 
system also lets us provide typing for Samza SQL UDFs.
2. Java annotations for Samza SQL that let us discover Samza SQL UDFs 
easily and let users configure the name of a UDF and whether it is 
disabled. 
3. Initial support for adding types to Samza SQL UDFs. Right now we do not 
use these types for validation; a future check-in will add that capability.
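
The annotation-based discovery described in item 2 can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from this PR: the `SqlUdf` annotation, its `name` and `enabled` attributes, the two example UDF classes, and the `discover` helper are all hypothetical names chosen for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

public class UdfDiscoverySketch {

  /** Hypothetical class-level annotation; the real annotation in the PR may differ. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  public @interface SqlUdf {
    String name();                   // name the UDF is registered under
    boolean enabled() default true;  // set to false to disable the UDF
  }

  /** Hypothetical enabled UDF. */
  @SqlUdf(name = "to_upper")
  public static class ToUpperUdf {
    public String execute(String value) {
      return value.toUpperCase();
    }
  }

  /** Hypothetical UDF that has been switched off through the annotation. */
  @SqlUdf(name = "legacy_trim", enabled = false)
  public static class LegacyTrimUdf {
    public String execute(String value) {
      return value.trim();
    }
  }

  /** Registers every candidate class that carries an enabled SqlUdf annotation. */
  public static Map<String, Class<?>> discover(Class<?>... candidates) {
    Map<String, Class<?>> registry = new LinkedHashMap<>();
    for (Class<?> candidate : candidates) {
      SqlUdf meta = candidate.getAnnotation(SqlUdf.class);
      if (meta != null && meta.enabled()) {
        registry.put(meta.name(), candidate);
      }
    }
    return registry;
  }

  public static void main(String[] args) {
    // String.class has no annotation and LegacyTrimUdf is disabled, so both are skipped.
    System.out.println(discover(ToUpperUdf.class, LegacyTrimUdf.class, String.class).keySet());
  }
}
```

Keeping the name and the enabled flag on the class itself means a UDF can be renamed or switched off without touching a separate registration list.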


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srinipunuru/samza sql-schema.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #885


commit a3af25702f38a06bc38f2fbf1886cf2d2e179762
Author: Srinivasulu Punuru 
Date:   2019-01-16T21:16:11Z

Support for types in UDFS




---


[GitHub] samza pull request #884: SAMZA-2073: Do not commit the task offsets when shu...

2019-01-16 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/samza/pull/884


---


Re: InMemorySystemDescriptor ignores serde

2019-01-16 Thread Sanil Jain
Hi Tom,

InMemorySystem is supposed to support only NoOpSerde, since all the
streams associated with this system are maintained in memory. In
addition, if your test uses Samza's test framework, the framework will
override any explicit serde configs specified for streams with NoOpSerde.


You are expected to supply deserialized objects to the in-memory system.


You also mentioned in your email:


{unformat}

I had still specified in my config:

streams.in-0.samza.msg.serde=integer


Apparently, that *was* respected by some part of the system because
integers were deserialized properly! Removing this configuration value
results in my operator receiving a byte array since the in-memory system
only uses NoOpSerde.

{unformat}


Can you send me a snippet of test you were trying to fix so that I can
understand the problem better?


Thanks

Sanil

On Tue, 8 Jan 2019 at 17:28, Tom Davis  wrote:

> I am in the process of updating a project to 1.0 and spent today debugging a
> rather odd test failure. When using input/output streams with IntegerSerde,
> things worked fine -- however, using LongSerde, every message value was 0! I
> eventually found that InMemorySystemDescriptor#getInputDescriptor ignores the
> serde passed to it. However, I had still specified in my config:
>
> streams.in-0.samza.msg.serde=integer
>
> Apparently that *was* respected by some part of the system because integers
> were deserialized properly! Removing this configuration value results in my
> operator receiving a byte array since the in-memory system only uses
> NoOpSerde.
>
> This behavior appears inconsistent with the previous version of Samza. The old
> `getInputStream` was passed a serde that was always used, but since the new
> version receives a Descriptor that has already discarded the serde, I am
> forced into assuming NoOpSerde everywhere, at least for testing purposes.
>
> Not the end of the world, but it does introduce an inconsistency between the
> in-memory system and any other -- one that requires a fair bit of domain
> knowledge to avoid.
>
> As always, thanks for the great project!
>
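
One possible explanation for the all-zero values seen with LongSerde while `streams.in-0.samza.msg.serde=integer` was still configured: if the long serde writes eight big-endian bytes and the integer serde decodes only the first four, every small long decodes to 0. The sketch below demonstrates that effect with `java.nio.ByteBuffer` alone. The byte layout is an assumption about the serdes, not something confirmed in this thread, and the helper names are hypothetical.

```java
import java.nio.ByteBuffer;

public class SerdeMismatchSketch {

  // Assumed long encoding: 8 bytes, big-endian (the ByteBuffer default order).
  public static byte[] longToBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Assumed integer encoding: 4 bytes, big-endian.
  public static byte[] intToBytes(int value) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
  }

  // Assumed integer decoding: reads only the first 4 bytes of the payload.
  public static int bytesToInt(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getInt();
  }

  public static void main(String[] args) {
    // Matching serdes round-trip the value as expected.
    System.out.println(bytesToInt(intToBytes(42)));   // prints 42

    // A long written as 8 bytes but decoded as an int: the first four
    // bytes are the high-order half of the long, which is 0 for small values.
    System.out.println(bytesToInt(longToBytes(42L))); // prints 0
  }
}
```

If this is what happened, supplying already-deserialized objects to the in-memory system and dropping the serde config avoids the mismatch entirely.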