Re: InMemorySystemDescriptor ignores serde

2019-01-17 Thread Xinyu Liu
Hi, Tom,

First, your observation about current InMemorySystem is exactly right and
thanks for raising this issue to the community!

The current InMemorySystem came up with a tight coupling with the Samza
test framework, which I believe put quite a lot of limitations on its uses,
e.g. using InMemorySystem in prototyping. Currently we are working on ways
to improve it so it can be used by normal user code. For your case, it
seems to me the inconsistency of using serde caused the confusion. You
point of being more consistent to support Serde in InMemory stream sounds
reasonable to me. I have the same impression that InMemory can be treated
the same way as other input streams. The initial rollout doesn't have this
feature and I create the ticket
https://issues.apache.org/jira/browse/SAMZA-2075 to track this. We will see
whether we can get it there in the next release.

Thanks,
Xinyu





On Thu, Jan 17, 2019 at 6:11 AM Tom Davis  wrote:

> Hey Sanil, thanks for the reply. I eventually figured that not
> supporting serdes for in-memory streams was an intentional restriction,
> I was just pointing out that it is inconsistent with earlier versions
> since it was relatively easy to supply stream serdes directly before the
> Descriptor API.
>
> I can't really send a test case along because it's all in Clojure and
> uses a Clojure-based API wrapper I wrote for interacting with Samza. In
> theory, the easiest test would be one where a Config contains the
> property I mentioned; with that, you should be able to run a simple
> pipeline that shows -- despite the NoOpSerde forced by
> InMemorySystemDescriptor -- the input is serialized using that serde.
>
> Anyway, I'm not sure if it's worth the trouble. I get why you'd forgo
> serialization for the in-memory system, it was just a handy way to test
> my entire pipeline which contains a few non-trivial custom serdes.
>
>
> Sanil Jain  writes:
>
> > Hi Tom,
> >
> > InMemorySystem is a system that is supposed to only support NoOpSerde
> since
> > all the associated steams for this system are maintained in memory. In
> > addition to this, if your test is using the Samza's Test Framework, it
> will
> > override any explicit serde configs specified for streams to NoOp.
> >
> >
> > You are expected to supply deserialized objects to the in-memory system.
> >
> >
> > In addition to that in your email you mentioned:
> >
> >
> > {unformat}
> >
> > I had still specified in my config:
> >
> > streams.in-0.samza.msg.serde=integer
> >
> >
> > Apparently, that *was* respected by some part of the system because
> > integers were
> > deserialized properly! Removing this configuration value results in my
> > operator
> > receiving a byte array since the in-memory system only uses NoOpSerde.
> >
> > {unformat}
> >
> >
> > Can you send me a snippet of test you were trying to fix so that I can
> > understand the problem better?
> >
> >
> > Thanks
> >
> > Sanil
> >
> > On Tue, 8 Jan 2019 at 17:28, Tom Davis  wrote:
> >
> >> I am in the process of updating a project to 1.0 and spent today
> debugging
> >> a
> >> rather odd test failure. When using input/output streams with
> IntegerSerde,
> >> things worked fine -- however, using LongSerde, every message value was
> 0!
> >> I
> >> eventually found that InMemorySystemDescriptor#getInputDescriptor
> ignores
> >> the
> >> serde passed to it. However, I had still specified in my config:
> >>
> >> streams.in-0.samza.msg.serde=integer
> >>
> >> Apparently that *was* respected by some part of the system because
> >> integers were
> >> deserialized properly! Removing this configuration value results in my
> >> operator
> >> receiving a byte array since the in-memory system only uses NoOpSerde.
> >>
> >> This behavior appears inconsistent with the previous version of Samza.
> The
> >> old
> >> `getInputStream` was passed a serde that was always used, but since the
> new
> >> version receives a Descriptor that has already discarded the serde, I am
> >> forced
> >> into assuming NoOpSerde everywhere, at least for testing purposes.
> >>
> >> Not the end of the world, but it does introduce an inconsistency between
> >> the
> >> in-memory system and any other -- one that requires a fair bit of domain
> >> knowledge to avoid.
> >>
> >> As always, thanks for the great project!
> >>
>


[GitHub] samza pull request #887: SAMZA-2076 RocksDbTableDescriptor should use Long t...

2019-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/887


---


[GitHub] samza pull request #751: Samza 1965: SAMZA 1.0 DOCUMENTATION FOR TEST Framew...

2019-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/751


---


[GitHub] samza pull request #880: SAMZA-2066 Refactor remote tables to separate vario...

2019-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/880


---


[GitHub] samza pull request #887: SAMZA-2076 RocksDbTableDescriptor should use Long t...

2019-01-17 Thread weisong44
GitHub user weisong44 opened a pull request:

https://github.com/apache/samza/pull/887

SAMZA-2076 RocksDbTableDescriptor should use Long type for TTL

Samza uses millisec as config value, while in rocksDB it's defined as 
int32. It's currently defined as integer in RocksDbTableDescriptor, the range 
isn't large enough to match, and it should be of Long type.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/weisong44/samza SAMZA-2076

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #887


commit 3f7ed71f8f5d6efe3567f039734ac96bb12679b6
Author: Wei Song 
Date:   2018-07-25T01:34:06Z

Added self to committer list

commit 5cbf9af9c12b96c5773a37274538f8baea8347e1
Author: Wei Song 
Date:   2018-08-03T20:38:41Z

Merge remote-tracking branch 'upstream/master'

commit a15a7c9a231adcc1b441351963e15a7855ea5879
Author: Wei Song 
Date:   2018-08-06T18:49:30Z

Merge remote-tracking branch 'upstream/master'

commit aae0f380e49950bc36a18e34d08041d554fc13a7
Author: Wei Song 
Date:   2018-08-16T17:50:45Z

Merge remote-tracking branch 'upstream/master'

commit 0440f75fcd16061d8ff87e775d4a1e731936ece0
Author: Wei Song 
Date:   2018-08-20T18:42:56Z

Merge remote-tracking branch 'upstream/master'

commit 4782c61df5eb046a57c0401ab94c1d88dc6ba0a7
Author: Wei Song 
Date:   2018-08-21T19:47:29Z

Merge remote-tracking branch 'upstream/master'

commit f28b491de20cf18357f9f92b780fb3eba52f6301
Author: Wei Song 
Date:   2018-08-23T22:22:40Z

Merge remote-tracking branch 'upstream/master'

commit df2f8d7b6ebc714a42376d98a45db20343f0a0fd
Author: Wei Song 
Date:   2018-08-27T16:10:59Z

Merge remote-tracking branch 'upstream/master'

commit de708f5e93bcdd352137db8c570469dd6a657ee5
Author: Wei Song 
Date:   2018-09-05T19:17:04Z

Merge remote-tracking branch 'upstream/master'

commit 5156239155e0a010242fd5f7d96de1892494e376
Author: Wei Song 
Date:   2018-09-11T22:27:45Z

Merge remote-tracking branch 'upstream/master'

commit eca0020411ef3997b947fb8ec7c30a400bd796df
Author: Wei Song 
Date:   2018-09-18T04:33:54Z

Merge remote-tracking branch 'upstream/master'

commit 239a0950539899ea14d1bd40dd462172f477b735
Author: Wei Song 
Date:   2018-09-19T20:37:32Z

Merge remote-tracking branch 'upstream/master'

commit 41299b5b03ccd6135715a55e39210d2e8cd15fe3
Author: Wei Song 
Date:   2018-09-20T16:41:17Z

Merge remote-tracking branch 'upstream/master'

commit a6c94add7724c0f1a41873e2c2050a9088ec0850
Author: Wei Song 
Date:   2018-09-20T17:39:31Z

Merge remote-tracking branch 'upstream/master'

commit 1c6a2eae475127e79078ddd378f95f5bb5a8bd45
Author: Wei Song 
Date:   2018-09-21T21:44:40Z

Merge remote-tracking branch 'upstream/master'

commit 8ee784415a329468972e894d96194a8240071ff4
Author: Wei Song 
Date:   2018-09-25T00:31:24Z

Merge remote-tracking branch 'upstream/master'

commit e19b4dc9a56dd6d9f18d4fa461c9349d7c29c5e2
Author: Wei Song 
Date:   2018-09-25T22:09:11Z

Merge remote-tracking branch 'upstream/master'

commit ec7d84095142895ce83d3a33120138520d07d872
Author: Wei Song 
Date:   2018-09-26T19:30:01Z

Merge remote-tracking branch 'upstream/master'

commit 242d8442b7d0ba458c4b94dce52cebce14d79217
Author: Wei Song 
Date:   2018-09-26T21:15:41Z

Merge remote-tracking branch 'upstream/master'

commit c85604e0b1b977eda236e249baf40a718087cea7
Author: Wei Song 
Date:   2018-09-27T17:16:46Z

Merge remote-tracking branch 'upstream/master'

commit 1e5de45a84e8a55353174ac8afb86ab4d5ed44d0
Author: Wei Song 
Date:   2018-09-29T05:45:06Z

Merge remote-tracking branch 'upstream/master'

commit f5731b10b8772d318e34b0cd0687361ffa158dcd
Author: Wei Song 
Date:   2018-10-01T20:33:10Z

Merge remote-tracking branch 'upstream/master'

commit 7706ab1f8f04cb3d90de23eed5d7dbb3d3b8bf3f
Author: Wei Song 
Date:   2018-10-02T17:05:30Z

Merge remote-tracking branch 'upstream/master'

commit f748050528cadb9c7126232c24c4a9d66942bf6c
Author: Wei Song 
Date:   2018-10-03T18:49:16Z

Merge remote-tracking branch 'upstream/master'

commit 05822f0a4001ab7723898ea5f4e61505ef1fe819
Author: Wei Song 
Date:   2018-10-04T22:52:17Z

Merge remote-tracking branch 'upstream/master'

commit 097958c897fb2a1ef48e2203b1c2433fe2ecd8d0
Author: Wei Song 
Date:   2018-10-05T22:17:18Z

Merge remote-tracking branch 'upstream/master'

commit a56c28dc09ada7c0783aa0c0b601c36bd63d989f
Author: Wei Song 
Date:   2018-10-11T17:06:44Z

Merge remote-tracking branch 'upstream/master'

commit 2c679c3915c96a703c6f528d504ea99063e82401
Author: Wei Song 
Date:   2018-10-11T21:58:19Z

Merge remote-tracking branch 'upstream/master'

commit a06e8ec2088fe0a2737624957077d6260b73e1c6
Author: Wei Song 
Date:   

[GitHub] samza pull request #882: DiagnosticsAppender for log4j2

2019-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/882


---


[GitHub] samza pull request #871: SAMZA-2056 : Adding a TaskMode in the TaskModel

2019-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/871


---


Re: InMemorySystemDescriptor ignores serde

2019-01-17 Thread Tom Davis

Hey Sanil, thanks for the reply. I eventually figured that not
supporting serdes for in-memory streams was an intentional restriction,
I was just pointing out that it is inconsistent with earlier versions
since it was relatively easy to supply stream serdes directly before the
Descriptor API.

I can't really send a test case along because it's all in Clojure and
uses a Clojure-based API wrapper I wrote for interacting with Samza. In
theory, the easiest test would be one where a Config contains the
property I mentioned; with that, you should be able to run a simple
pipeline that shows -- despite the NoOpSerde forced by
InMemorySystemDescriptor -- the input is serialized using that serde.

Anyway, I'm not sure if it's worth the trouble. I get why you'd forgo
serialization for the in-memory system, it was just a handy way to test
my entire pipeline which contains a few non-trivial custom serdes.


Sanil Jain  writes:


Hi Tom,

InMemorySystem is a system that is supposed to only support NoOpSerde since
all the associated steams for this system are maintained in memory. In
addition to this, if your test is using the Samza's Test Framework, it will
override any explicit serde configs specified for streams to NoOp.


You are expected to supply deserialized objects to the in-memory system.


In addition to that in your email you mentioned:


{unformat}

I had still specified in my config:

streams.in-0.samza.msg.serde=integer


Apparently, that *was* respected by some part of the system because
integers were
deserialized properly! Removing this configuration value results in my
operator
receiving a byte array since the in-memory system only uses NoOpSerde.

{unformat}


Can you send me a snippet of test you were trying to fix so that I can
understand the problem better?


Thanks

Sanil

On Tue, 8 Jan 2019 at 17:28, Tom Davis  wrote:


I am in the process of updating a project to 1.0 and spent today debugging
a
rather odd test failure. When using input/output streams with IntegerSerde,
things worked fine -- however, using LongSerde, every message value was 0!
I
eventually found that InMemorySystemDescriptor#getInputDescriptor ignores
the
serde passed to it. However, I had still specified in my config:

streams.in-0.samza.msg.serde=integer

Apparently that *was* respected by some part of the system because
integers were
deserialized properly! Removing this configuration value results in my
operator
receiving a byte array since the in-memory system only uses NoOpSerde.

This behavior appears inconsistent with the previous version of Samza. The
old
`getInputStream` was passed a serde that was always used, but since the new
version receives a Descriptor that has already discarded the serde, I am
forced
into assuming NoOpSerde everywhere, at least for testing purposes.

Not the end of the world, but it does introduce an inconsistency between
the
in-memory system and any other -- one that requires a fair bit of domain
knowledge to avoid.

As always, thanks for the great project!



[GitHub] samza pull request #888: SAMZA-2078: Add zookeeper session metrics to ZkJobC...

2019-01-17 Thread shanthoosh
GitHub user shanthoosh opened a pull request:

https://github.com/apache/samza/pull/888

SAMZA-2078: Add zookeeper session metrics to ZkJobCoordinator.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shanthoosh/samza SAMZA-2078

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #888


commit ee796d57b4fe69856b1918a50f8c0393b3d6c87e
Author: Shanthoosh Venkataraman 
Date:   2019-01-18T00:32:53Z

SAMZA-2078: Add zookeeper session metrics to ZkJobCoordinator.




---