Re: Proposal: samza metrics to use tags

2016-03-30 Thread Yi Pan
Hi, Vadim, nice suggestion. I guess we can do it by making the internal metric name a structured object w/ tag + name, and allowing the actual metric reporter implementations to format the serialized metric name according to the target metrics DB. Would you please open a JIRA for

Re: Review Request 44935: SAMZA-898 - TestSamzaTaskManager incorrectly shares mock state across tests that cause failures when test ordering changes

2016-03-28 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44935/#review125808 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On March

Re: Review Request 44604: split deployment logic

2016-03-25 Thread Yi Pan (Data Infrastructure)
deployment as well. Please add the online documentation for this feature. - Yi Pan (Data Infrastructure) On March 15, 2016, 5:33 p.m., Boris Shkolnik wrote: > > --- > This is an automatically generated e-mail. To reply, visit

Re: Review Request 44604: split deployment logic

2016-03-25 Thread Yi Pan (Data Infrastructure)
de is almost a duplicate of the code in ProcessJobFactory. Can we fold this into config.get()? Or into some utility class? - Yi Pan (Data Infrastructure) On March 15, 2016, 5:33 p.m., Boris Shkolnik wrote: > > --- > This is an automatically gener

Review Request 45324: SAMZA-914: Initial draft for Java programming APIs on operators supporting DAGs

2016-03-24 Thread Yi Pan (Data Infrastructure)
://reviews.apache.org/r/45324/diff/ Testing --- Locally built via ./gradlew clean build Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 45144: SAMZA-906 Host Affinity - Minimize task reassignment when container count changes

2016-03-24 Thread Yi Pan (Data Infrastructure)
/45144/#comment188120> nit: the key in the example shows containerId = 1 and the values show containerId = 139. They should be the same, right? - Yi Pan (Data Infrastructure) On March 24, 2016, 9:33 p.m., Jake Maes wrote: > > ---

Re: Picking up checkpoint when upgrade to 0.10.0 from 0.9?

2016-03-23 Thread Yi Pan
> org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory > > task.checkpoint.system=kafka > > task.checkpoint.replication.factor=3 > > > > On Tue, Mar 22, 2016 at 1:33 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Yuanchi, > > > > Did you check y

Re: Jackson null pointer when upgrading to Samza 0.10.0

2016-03-23 Thread Yi Pan
to see if that's >> related since our current version is 2.6.0. >> >> Thanks! >> Yuanchi >> >> On Wed, Mar 23, 2016 at 2:57 PM, Yi Pan <nickpa...@gmail.com> wrote: >> >>> Hi, Yuanchi, >>> >>> Is this related w/ the issue you repor

Re: Jackson null pointer when upgrading to Samza 0.10.0

2016-03-23 Thread Yi Pan
Hi, Yuanchi, Is this related to the issue you reported earlier regarding "problem picking up checkpoint after upgrade" in another thread? I assume that you are using the official Samza 0.10 release? That uses Jackson version 1.8.5 by default. How did you change it in your own build/package to

Review Request 45189: SAMZA-908: fix a missing dependency in samza-sql-calcite

2016-03-22 Thread Yi Pan (Data Infrastructure)
/45189/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 44772: SAMZA-893 Fix the host affinity expiration logic bug introduced in SAMZA-867

2016-03-22 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44772/#review124924 --- Ship it! LGTM! Thanks! - Yi Pan (Data Infrastructure

Re: Picking up checkpoint when upgrade to 0.10.0 from 0.9?

2016-03-22 Thread Yi Pan
Hi, Yuanchi, Did you check your configuration of task.checkpoint.system? What config values did you use in 0.9, and what is the current configuration in 0.10? If you can share your config before and after the upgrade, plus the container log from 0.10, we can be more helpful. Thanks! -Yi On Tue,

Re: Review Request 44952: SAMZA-854: Zookeeper dependency upgraded to version 3.4.8

2016-03-21 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44952/#review124627 --- Ship it! LGTM! Thanks! - Yi Pan (Data Infrastructure

Re: Review Request 44405: SAMZA-882 - Detect partition count changes in input streams

2016-03-14 Thread Yi Pan (Data Infrastructure)
the purpose of this section of code within if? - Yi Pan (Data Infrastructure) On March 4, 2016, 8:14 p.m., Navina Ramesh wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://re

Re: Samza with Rocksdb hanged forever in KeyValueStorageEngine.restore()

2016-03-14 Thread Yi Pan
Hi, Weinan, We have not seen this hanging issue at LinkedIn. But it would be good to track that. We will keep an eye on issue-696 as well. Thanks for reporting! -Yi On Sun, Mar 13, 2016 at 8:03 PM, Weinan Zhao wrote: > Hi all, > > I've encountered Samza on yarn couldn't

Re: Reporting deserialization error in StreamTask

2016-03-11 Thread Yi Pan
Hi, Jack, There have been requests similar to yours, like SAMZA-427. As part of the fix for SAMZA-59, we also added metrics that report the count of deserialization errors. If you are asking about reporting the actual message that caused the error, that requires a different approach. Options are: 1) write

Re: Need some clarifications - Newbie to samza

2016-03-11 Thread Yi Pan
Hi, Rohit, On Thu, Mar 10, 2016 at 9:53 PM, Rohit Bansal wrote: > 1) Is it possible to analyze the metrics (graphs of messages , disk space > and other details for performance / status check) ? In Application master > in YARN UI, it contains only config information and

Re: RocksDB TTL not working

2016-03-11 Thread Yi Pan
Hi, Rohit, RocksDB TTL is best-effort: https://github.com/facebook/rocksdb/wiki/Time-to-Live. Expired records are removed only during compaction. I wonder whether compaction ever happened in your test case. How long did you wait to check whether the records were gone? -Yi On Thu,
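A toy in-memory model (not RocksDB's implementation) of the best-effort TTL semantics described above. In this simplification, reads never check expiry and only a compaction pass drops expired entries, which is why a record can still be visible after its TTL has passed. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of best-effort TTL (NOT RocksDB code): expired entries remain
// readable until a compaction pass physically removes them.
class TtlStoreModel {
    private static final class Entry {
        final String value;
        final long writtenAtMs;

        Entry(String value, long writtenAtMs) {
            this.value = value;
            this.writtenAtMs = writtenAtMs;
        }
    }

    private final long ttlMs;
    private final Map<String, Entry> data = new HashMap<>();

    TtlStoreModel(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    void put(String key, String value, long nowMs) {
        data.put(key, new Entry(value, nowMs));
    }

    // Reads do not look at expiry, so an expired entry can still be returned.
    String get(String key) {
        Entry e = data.get(key);
        return e == null ? null : e.value;
    }

    // Only compaction drops entries older than the TTL.
    void compact(long nowMs) {
        data.values().removeIf(e -> nowMs - e.writtenAtMs > ttlMs);
    }
}
```

In a test like the one in the thread, checking for deletion without any compaction having run would, under this model, still find the record.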

Re: Review Request 43350: SAMZA-867 Fix job restart/shutdown in the event of a node outage.

2016-03-07 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43350/#review122363 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Feb. 24

Re: Review Request 41068: SAMZA-813: Add Seek functionality to KeyValueStoreIterator

2016-03-07 Thread Yi Pan (Data Infrastructure)
068/#comment184192> I recommend dropping this seekToLast() for now. Based on your use case description, to achieve backward traversal you would need both seekToLast() and previous() in the iterator. Adding only seekToLast() won't be sufficient. - Yi Pan (Data Infrastructure) On Feb. 26, 2016

Review Request 44399: SAMZA-884: upgrade Jackson to 1.9.13 in hello-samza latest

2016-03-04 Thread Yi Pan (Data Infrastructure)
/ Testing --- Locally built and deployed w/ ./gradlew and mvn Thanks, Yi Pan (Data Infrastructure)

Re: Lag metric

2016-03-03 Thread Yi Pan
Hi, Vadim, Glad that you figured it out. Please let us know if you have further questions. -Yi On Wed, Mar 2, 2016 at 6:57 PM, Vadim Chekan wrote: > Never mind, I forgot to "listen" to new metrics in my reporter, so those > metrics which were created immediately worked

Re: ThreadJobFactory in production

2016-03-03 Thread Yi Pan
orking well, so we are thinking to update that patch later this week > so it can be added to the main project. > > HTH, > > Jose Luis Barrueta > > On Wed, Mar 2, 2016 at 2:11 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Robert, > > >

Re: Review Request 44293: SAMZA-883 Improve logging for container handling and kafka abdication

2016-03-02 Thread Yi Pan (Data Infrastructure)
issue. samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala (line 259) <https://reviews.apache.org/r/44293/#comment183537> Were SREs alerted on the number of WARNs? If yes, we should reduce it to INFO. - Yi Pan (Data Infrastructure) On March 2, 2016, 9:

Re: ThreadJobFactory in production

2016-03-02 Thread Yi Pan
Hi, Robert, The main reason that ThreadJobFactory and ProcessJobFactory are not considered "production-ready" is that there is only one container for the job and all tasks are assigned to that single container. Hence, it is not easy to scale beyond a single host. As Rick mentioned, Netflix has

Re: Review Request 43732: Implemented AvroDataFileHdfsWriter

2016-03-01 Thread Yi Pan (Data Infrastructure)
uploaded. Could you try to rebase and upload again? Thanks! gradle/dependency-versions.gradle (line 23) <https://reviews.apache.org/r/43732/#comment183340> Please rebase w/ the latest master branch. - Yi Pan (Data Infrastructure) On Feb. 25, 2016, 7:39 p.m., Edi Bice

Re: [DISCUSS] Moving to github/pull-request for code review and check-in

2016-02-25 Thread Yi Pan
Thanks for all the +1s! I have created https://issues.apache.org/jira/browse/SAMZA-880 to track it. Thanks! -Yi On Wed, Feb 24, 2016 at 5:31 PM, Boris Shkolnik <bor...@gmail.com> wrote: > +1 for pull requests. > > On Thu, Feb 18, 2016 at 3:53 PM, Yi Pan <nickpa...@gmail.co

Re: Upgrading Samza from 0.9.1 to 0.10.0

2016-02-25 Thread Yi Pan
Hi, Jose, Glad that you have already figured out the problem! Cheers! -Yi On Wed, Feb 24, 2016 at 2:50 PM, José Barrueta wrote: > Hi all, > > Just to let you know, we already figured it out the problem, issue was that > we had a dependency version conflict that result in

Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed several RocksDb config

2016-02-23 Thread Yi Pan (Data Infrastructure)
e AvroDataFileWriter? Can we move it to another patch? - Yi Pan (Data Infrastructure) On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote: > > --- > This is an automatically generated e-ma

Re: Question about Stream-Stream join

2016-02-23 Thread Yi Pan
Hi, Chad, The following code lets you find the stream name: {code} envelope.getSystemStreamPartition().getSystemStream().getStream(); {code} Thanks! -Yi On Tue, Feb 23, 2016 at 8:14 AM, Chad Scribner wrote: > Hi All, > > First time emailing the list. I just
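A minimal, self-contained sketch of how a stream-stream join task might branch on the stream name obtained above. So that it runs without Samza on the classpath, the stream name is passed in as a plain String; the stream names "orders" and "payments", the class name, and the in-memory maps are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-stream join keyed by record key. In a Samza StreamTask the
// streamName argument would come from
// envelope.getSystemStreamPartition().getSystemStream().getStream().
class StreamJoinSketch {
    static final String LEFT = "orders";    // hypothetical stream name
    static final String RIGHT = "payments"; // hypothetical stream name

    private final Map<String, String> leftByKey = new HashMap<>();
    private final Map<String, String> rightByKey = new HashMap<>();

    /** Buffers the record; returns "left|right" once both sides exist for the key, else null. */
    String process(String streamName, String key, String value) {
        if (LEFT.equals(streamName)) {
            leftByKey.put(key, value);
        } else if (RIGHT.equals(streamName)) {
            rightByKey.put(key, value);
        } else {
            throw new IllegalArgumentException("Unexpected stream: " + streamName);
        }
        String left = leftByKey.get(key);
        String right = rightByKey.get(key);
        return (left != null && right != null) ? left + "|" + right : null;
    }
}
```

This sketch buffers both sides forever; a real job would more likely keep each side in a Samza KV store and expire old entries.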

Re: [DISCUSS] Moving to github/pull-request for code review and check-in

2016-02-19 Thread Yi Pan
;> +1 - Thanks for bringing this up, Yi. I've done it both ways and feel > >>>> pull requests are much easier. > >>>> > >>>> Sent from my iPhone > >>>> > >>>>> On Feb 18, 2016, at 4:25 PM, Navina Ramesh > >>>

Re: Review Request 43766: SAMZA-872: little change in Logging docs

2016-02-19 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43766/#review119907 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Feb. 19

[DISCUSS] Moving to github/pull-request for code review and check-in

2016-02-18 Thread Yi Pan
Hi, all, I want to start the discussion on our code review/commit process. I feel that our code review and check-in process is a little bit cumbersome: - developers need to create RBs and attach diffs to JIRA - committers need to review RBs, download the diff and apply it, then push. It would be much

Re: Review Request 43053: allow warning instead of fail in case of invalid num of partitions in the checkpoint partition

2016-02-18 Thread Yi Pan (Data Infrastructure)
.e. wrong system configuration) - Yi Pan (Data Infrastructure) On Feb. 18, 2016, 8:04 p.m., Boris Shkolnik wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https:

Re: Review Request 43589: Avoid unnecessary flushes in CachedStore

2016-02-18 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43589/#review119654 --- Ship it! lgtm! Thanks! - Yi Pan (Data Infrastructure

Review Request 43550: SAMZA-836: fix unit test failure w/ FlushOptions() in rocksdbjni-3.13.1

2016-02-12 Thread Yi Pan (Data Infrastructure)
://reviews.apache.org/r/43550/diff/ Testing --- Ran the full set of unit tests on both Mac and Linux, and both passed. Thanks, Yi Pan (Data Infrastructure)

Re: Allow user-specified class-loader

2016-02-11 Thread Yi Pan
O is the requirement to build a jar to run a > job > > > during > > > development. I think it would be possible to define these classes in > Java > > > and > > > have them call into a Clojure API but that's basically what I'm trying > to > > > avoid > >

Re: Zombie writers protection

2016-02-10 Thread Yi Pan
Hi, Rick and John, Thanks for the great discussion! As Jacob said, we also realized the possible drawbacks of relying solely on YARN for process liveness detection, and that's why SAMZA-871 was opened. Please help to comment on the JIRA so that we can track the discussion and move the design

Re: Review Request 42484: hello-samza 0.10.0 changes for CDH 5.4 distribution

2016-02-09 Thread Yi Pan (Data Infrastructure)
> On Jan. 19, 2016, 4:38 a.m., Yi Pan (Data Infrastructure) wrote: > > pom.xml, line 289 > > <https://reviews.apache.org/r/42484/diff/1/?file=1200969#file1200969line289> > > > > We have observed an issue with YARN 2.6.0 AM client that would not > > r

Re: java.lang.ClassNotFoundException: org.codehaus.jackson.annotate.JsonClass

2016-02-09 Thread Yi Pan
Hi, Avi, Could you try pinning the Jackson version to 1.9.13 in your build first? It looks like an incompatible Jackson version issue, as you have noticed in your classpath. It would be good if you could share your maven pom.xml as well. Thanks! -Yi On Mon, Feb 8, 2016 at 1:12 PM, Avi Flax

Review Request 43404: SAMZA-851: update Hello-Samza w/ CDH tutorial documentation

2016-02-09 Thread Yi Pan (Data Infrastructure)
--- Thanks, Yi Pan (Data Infrastructure)

Re: HTTP-based Elasticsearch system producer and reusable task

2016-02-09 Thread Yi Pan
t end to end latency > stats >(total pipeline latency = commit time - event time). > > #3 is easily solvable with a additional plugin options. #1 and #2 require > changing the system producer API. > > Roger > > On Tue, Feb 9, 2016 at 10:56 AM, Yi Pan <nickpa...@gmail.c

Re: Allow user-specified class-loader

2016-02-09 Thread Yi Pan
> > On 9 February 2016 at 22:25, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Andy, > > > > I think that you are looking for the feature in

Re: Review Request 43074: SAMZA-866 Refactor and fix Container allocation logic.

2016-02-09 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43074/#review118603 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Feb. 3

Re: Review Request 43350: SAMZA-867 Fix job restart/shutdown in the event of a node outage.

2016-02-09 Thread Yi Pan (Data Infrastructure)
method SamzaAppState.isValidContainerId(containerId) here. - Yi Pan (Data Infrastructure) On Feb. 10, 2016, 2:40 a.m., Jake Maes wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.

Re: Allow user-specified class-loader

2016-02-09 Thread Yi Pan
Hi, Andy, I think that you are looking for the feature in SAMZA-697. Or are you looking for something even more specific? On Tue, Feb 9, 2016 at 10:22 PM, Andy Chambers < andy.chamb...@fundingcircle.com> wrote: > Hey Folks, > I'm trying to build some tooling to make writing jobs in Clojure a

Re: Review Request 43053: allow warning instead of fail in case of invalid num of partitions in the checkpoint partition

2016-02-09 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43053/#review118598 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Feb. 5

Re: Review Request 43053: allow warning instead of fail in case of invalid num of partitions in the checkpoint partition

2016-02-09 Thread Yi Pan (Data Infrastructure)
/JobConfig.scala (line 46) <https://reviews.apache.org/r/43053/#comment179879> One more nit that I forgot: it would be nice to document why we have to add this config: a) Kafka topic auto-creation cannot be easily turned off; b) Kafka delete topic does not work. - Yi Pan (Data Infrastructure) On

Re: Review Request 43350: SAMZA-867 Fix job restart/shutdown in the event of a node outage.

2016-02-09 Thread Yi Pan (Data Infrastructure)
43074? - Yi Pan (Data Infrastructure) On Feb. 10, 2016, 2:40 a.m., Jake Maes wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache

Re: Download page seems out of date

2016-02-08 Thread Yi Pan
Hi, Avi, Thanks for pointing out the online doc problem! I messed up the latest online doc when fixing SAMZA-782. It should be fixed now! Please check the latest online doc again and let us know if you see any further problems w/ it. Thanks a lot! -Yi On Mon, Feb 8, 2016 at 12:39 PM, Avi Flax

Help to vote on Samza talks in HadoopSummit

2016-02-05 Thread Yi Pan
Hi, all Samza lovers, We recently submitted a talk on Samza to this year's Hadoop Summit. If you are interested in seeing this talk at the conference, please help vote for the talk:

Re: Need Help with Samza

2016-02-04 Thread Yi Pan
Hi, Ramesh, Could you share the exact stack trace and the job configuration, especially the Kafka producer configuration for the changelog system? Thanks! -Yi On Thu, Feb 4, 2016 at 11:56 AM, Ramesh Bhojan wrote: > Dear team @ Samza, > I would really appreciate some

Re: ChangeLog Question for TTL rocksDB stores

2016-01-28 Thread Yi Pan
Hi, David, The "compaction" mentioned together w/ TTL refers to RocksDB's compaction, not to compaction of the Kafka-based changelog topic. Currently, TTL is not applied to the Kafka-based changelog topic. SAMZA-677 has been opened for this. -Yi On Thu, Jan 28, 2016 at 11:36 AM, David Garcia

Re: Custom System Consumer filling up memory

2016-01-26 Thread Yi Pan
d as a check inside the method > BlockingEnvelopeMap.put(...) - it is better to delay a bit than halt/crash > the whole consumer due to memory limitation. > > > Rgds, > Marcelo > > > From: Yi Pan <nickpa...@gmail.com> > To: dev@samza.apache.org; Marcelo Romaniuc <mroma...
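A minimal sketch of the "delay rather than crash" idea quoted above, assuming a fixed byte budget for buffered messages. It is a standalone class with hypothetical names, not a patch to BlockingEnvelopeMap, whose internals are not shown in this thread. put() waits while adding a message would exceed the budget, and poll() frees budget and wakes waiting writers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical byte-budgeted buffer: the fetching thread blocks instead of
// letting buffered messages grow until the container runs out of memory.
class ByteBudgetBuffer {
    private final long maxBytes;
    private long bufferedBytes = 0;
    private final Deque<byte[]> queue = new ArrayDeque<>();

    ByteBudgetBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized void put(byte[] message) throws InterruptedException {
        // Wait while this message would exceed the budget, but always admit a
        // message into an empty buffer so an oversized message cannot deadlock.
        while (!queue.isEmpty() && bufferedBytes + message.length > maxBytes) {
            wait();
        }
        queue.addLast(message);
        bufferedBytes += message.length;
    }

    synchronized byte[] poll() {
        byte[] m = queue.pollFirst();
        if (m != null) {
            bufferedBytes -= m.length;
            notifyAll(); // wake writers waiting for budget
        }
        return m;
    }

    synchronized long bufferedBytes() {
        return bufferedBytes;
    }
}
```

The empty-buffer exception is a design choice: without it, a single message larger than the budget would block forever.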

Re: Question on zero downtime deployment of Samza jobs

2016-01-25 Thread Yi Pan
;> email Peter (CC'ed) or me about any questions. FYI: the real solution would >> be to implement standby containers. This solution is an attempt to do the >> same. >> >> On Thu, Jan 14, 2016 at 10:17 AM, Yi Pan <nickpa...@gmail.com> wrote: >> >>> It

Re: Review Request 42619: move public kv api from samza-kv to samza-api package

2016-01-25 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/42619/#review116220 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Jan. 21

Re: Review Request 41068: SAMZA-813: Add Seek functionality to KeyValueStoreIterator

2016-01-19 Thread Yi Pan (Data Infrastructure)
> On Dec. 8, 2015, 10:55 p.m., Yi Pan (Data Infrastructure) wrote: > > @Amit, thanks for putting up this patch. I have two high-level comments: > > * I would prefer to keep the return type of seek() as an iterator. The > > pattern of the current API looks fine to me: al

Re: Data consistency and check-pointing

2016-01-18 Thread Yi Pan
Hi, Michael, Your use case sounds very much like "customized checkpointing" to me. We have similar cases at LinkedIn, and the following is the solution in production: 1) disable Samza auto-checkpointing by setting commit_ms to -1; 2) explicitly call TaskCoordinator.commit() in sync with closing
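A minimal model of the customized-checkpointing pattern described above, assuming the task writes records into files and should only checkpoint offsets whose records are already in a closed file. The class, the records-per-file rule, and the method names are hypothetical; the comment marks where, per the advice above, TaskCoordinator.commit() would be called in a real task.

```java
// Hypothetical model: auto-commit is off, and the checkpoint only advances
// when the current output file is closed, so it never runs ahead of data
// that has actually been written out.
class FileAlignedCheckpointing {
    private final int recordsPerFile;
    private int recordsInOpenFile = 0;
    private String lastProcessedOffset = null;
    private String committedOffset = null;

    FileAlignedCheckpointing(int recordsPerFile) {
        this.recordsPerFile = recordsPerFile;
    }

    void process(String offset) {
        lastProcessedOffset = offset;
        recordsInOpenFile++;
        if (recordsInOpenFile == recordsPerFile) {
            closeFileAndCommit();
        }
    }

    private void closeFileAndCommit() {
        // (flush and close the output file here)
        recordsInOpenFile = 0;
        // In a Samza task, this is the point where TaskCoordinator.commit()
        // would be called.
        committedOffset = lastProcessedOffset;
    }

    String committedOffset() {
        return committedOffset;
    }
}
```

After a restart, processing would resume from the last committed offset, so records in a file that was never closed are simply reprocessed.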

Re: Review Request 42484: hello-samza 0.10.0 changes for CDH 5.4 distribution

2016-01-18 Thread Yi Pan (Data Infrastructure)
484/#comment175966> Samza 0.10.0 is officially released. Hence, this change should be made in the hello-samza master branch, not latest. The latest branch tracks and stays in sync with Samza trunk, which is the under-development branch. - Yi Pan (Data Infrastructure) On Jan. 19, 2016, 4:16 a.

Re: Review Request 42484: hello-samza 0.10.0 changes for CDH 5.4 distribution

2016-01-18 Thread Yi Pan (Data Infrastructure)
484/#comment175968> We have observed an issue with the YARN 2.6.0 AM client that would not refresh the token (YARN-3103). Hence, we have updated the minimum required YARN version for Samza 0.10.0 to YARN 2.6.0. Does CDH 5.4 not have this issue? - Yi Pan (Data Infrastructure) On Jan. 19, 2016, 4:

Re: Same error when using an ubuntu (15.10) image

2016-01-15 Thread Yi Pan
roject, changed the directory into the > project and ran ‘./gradlew’. Since it’s stated ‘./gradlew’ and the file is > present in the project, that’s what I assumed. > > Cheers > Christian > -- > > > > Christian Kniep | Release Engineer > > www.gaikai.com > &

Re: Same error when using an ubuntu (15.10) image

2016-01-15 Thread Yi Pan
or walking in my shoes a bit. Let's say afterwards you would have a > howto that serve java-rockies like me as well. :) > > Cheers > Christian > > On Fri, Jan 15, 2016 at 6:45 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Christian, > > > > > > The

Re: Same error when using an ubuntu (15.10) image

2016-01-15 Thread Yi Pan
but it seems: Habemus Samza! >> >> On Fri, Jan 15, 2016 at 8:22 PM, Christian Kniep <ckn...@gaikai.com> >> wrote: >> >>> I downloaded the tgz from http://samza.apache.org/startup/download/. >>> >>> I'll give a t

Re: Samza Join example

2016-01-14 Thread Yi Pan
Hi, Stanislav, That's awesome! It would be great to have this integrated w/ the Samza tutorial. Would you mind creating a tutorial page for the join job implementation in Samza? Thanks a lot! -Yi On Thu, Jan 14, 2016 at 7:28 AM, Stanislav Los wrote: > If anyone

Re: Same error when using an ubuntu (15.10) image

2016-01-14 Thread Yi Pan
Hi, Christian, Which local directory are you running ./gradlew publishToMavenLocal from? The command needs to be executed from the directory under which the Samza project is checked out, *not* where the hello-samza project is checked out. -Yi On Thu, Jan 14, 2016 at 7:53 AM, Christian Kniep

Re: Question on zero downtime deployment of Samza jobs

2016-01-14 Thread Yi Pan
It might be a mailing list restriction. Could you try sending it to my personal email (i.e. nickpa...@gmail.com)? On Thu, Jan 14, 2016 at 10:15 AM, Chinmay Soman <chinmay.cere...@gmail.com> wrote: > It shows as attached in my sent email. That's weird. > > On Thu, Jan 14, 2016 at 10:14 AM,

Re: Production deployement

2016-01-13 Thread Yi Pan
Hi, Alex, I apologize for the late reply. Let me try to give some feedback/comments below: On Thu, Jan 7, 2016 at 3:59 PM, Alexander Filipchik wrote: > > 1) What is the best way to handle partial outages. Let's say my yarn > cluster is deployed on amazon among 3

Re: Java Object in RocksDb

2016-01-07 Thread Yi Pan
Hi, Leo, The Samza RocksDB store stores keys and values on disk as byte arrays. You can define your own key and msg serde via the configuration ( http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html#stores-key-serde,
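To illustrate the byte-array point above, here is a hand-written serde for a hypothetical value type. The two-method interface is declared locally so the sketch compiles without Samza; it mirrors the toBytes/fromBytes shape of Samza's org.apache.samza.serializers.Serde, but the class names and the wire format (8-byte id, 4-byte length, UTF-8 name) are assumptions chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Local stand-in with the same two methods as Samza's Serde<T>,
// declared here so the sketch compiles on its own.
interface SimpleSerde<T> {
    byte[] toBytes(T object);

    T fromBytes(byte[] bytes);
}

// Hypothetical value type stored in the KV store.
final class UserProfile {
    final long id;
    final String name;

    UserProfile(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Encoding: 8-byte id, 4-byte name length, then the UTF-8 name bytes.
class UserProfileSerde implements SimpleSerde<UserProfile> {
    @Override
    public byte[] toBytes(UserProfile p) {
        byte[] name = p.name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + 4 + name.length)
                .putLong(p.id)
                .putInt(name.length)
                .put(name)
                .array();
    }

    @Override
    public UserProfile fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long id = buf.getLong();
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        return new UserProfile(id, new String(name, StandardCharsets.UTF_8));
    }
}
```

In a job, a serde like this would still need a SerdeFactory registered under serializers.registry.<name>.class and referenced from stores.<store>.key.serde or stores.<store>.msg.serde.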

Samza 0.10 released

2016-01-04 Thread Yi Pan
Hi, all, In case you missed the announcement of the Samza 0.10 release before Christmas, please check it out here: https://blogs.apache.org/samza/ and help spread the word on Twitter. And happy new year to everyone! Thanks! -Yi

Re: Review Request 41663: SAMZA-843 : Slow start of Samza jobs with large number of containers

2016-01-04 Thread Yi Pan (Data Infrastructure)
/JobCoordinator.scala <https://reviews.apache.org/r/41663/#comment173181> I am a bit confused here. If the refreshJobModel() function actually returns a different jobModel than what's already set in jobModelRef, isn't the whole point to replace it w/ the new one? - Yi Pan (Data Infrastr

Re: Review Request 41663: SAMZA-843 : Slow start of Samza jobs with large number of containers

2016-01-04 Thread Yi Pan (Data Infrastructure)
> On Jan. 4, 2016, 11:11 p.m., Yi Pan (Data Infrastructure) wrote: > > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala, > > line 204 > > <https://reviews.apache.org/r/41663/diff/2-3/?file=1181044#file1181044line204> > >

Re: Batch processing stream-stream joins

2016-01-04 Thread Yi Pan
Hi, Rick, Please refer to the whole discussion on SAMZA-552, which targets exactly the issues that you are considering. We have been working on the design and prototype since last year. The work was paused for the last few months due to other priorities. We are planning to resume the work this

Review Request 41636: SAMZA-848: merge changes from master to latest

2015-12-22 Thread Yi Pan (Data Infrastructure)
-gradle.txt PRE-CREATION README.md 0a71cf634a6757d8f5cfe4ca1d9cef1e45695c8f src/main/config/wikipedia-parser.properties 8f1086f051ed9ebc5df57dc3897a5fb1497d8c18 Diff: https://reviews.apache.org/r/41636/diff/ Testing --- All build and local deployment succeeded. Thanks, Yi Pan (Data

Re: KafkaSystemProducer partitioning

2015-12-21 Thread Yi Pan
are writing > some upstream Samza jobs. The only way to get it write in a particular > partitioning scheme is then to write a different KafkaSystemFactory right ? > Or perhaps patch the existing one ? I don't see a reason why it has to > always use the default partitioning.. > > On 17

Review Request 41580: SAMZA-831: Update Samza 0.10 online documentation

2015-12-18 Thread Yi Pan (Data Infrastructure)
--- Generated the site locally and tested it successfully. Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 41576: SAMZA-831: preparing 0.10.0 release candidate

2015-12-18 Thread Yi Pan (Data Infrastructure)
gradle.properties b18c0cb62aec7592e8bfb1f2aa83c6b8eada867f Diff: https://reviews.apache.org/r/41576/diff/ Testing (updated) --- Locally published all sites and traversed successfully. Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 41576: SAMZA-831: preparing 0.10.0 release candidate

2015-12-18 Thread Yi Pan (Data Infrastructure)
--- Locally published all sites and traversed successfully. Thanks, Yi Pan (Data Infrastructure)

Re: SystemConsumer questions

2015-12-17 Thread Yi Pan
Hi, Ivan, Sorry for the late reply. Could you explain what state you have to maintain in the SystemConsumer rather than in the KV-store and checkpoint topics? Samza's SystemConsumer is designed as a "pump" that simply pumps messages into Samza StreamTasks, where the main stateful processing is executed. Why and

Re: Statefull system consumer

2015-12-17 Thread Yi Pan
Hi, Anton, It seems to me that the best option would probably be to use the row number as the IncomingMessageEnvelope's offset. Then, when Samza commits the checkpoint, it will commit the row number as the offset. When the Samza job restarts, the row number would be read from the checkpoint topic and
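A minimal sketch of the row-number-as-offset idea above, using an in-memory list as the "table". Offsets are Strings, as Samza offsets are, and restarting from a checkpointed row resumes at the next row. All names here are hypothetical. In a real SystemConsumer/SystemAdmin pair, the step from "last checkpointed row" to "next row to read" has to live somewhere; it is folded into one method here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer over an in-memory "table" where each envelope's
// offset is its row number, so a checkpoint is just the last row processed.
class RowNumberConsumer {
    static final class Envelope {
        final String offset; // row number as a String, like a Samza offset
        final String row;

        Envelope(String offset, String row) {
            this.offset = offset;
            this.row = row;
        }
    }

    private final List<String> table;

    RowNumberConsumer(List<String> table) {
        this.table = table;
    }

    /** Returns rows after the checkpointed offset; null means start from row 0. */
    List<Envelope> readFrom(String checkpointedOffset) {
        int start = checkpointedOffset == null ? 0 : Integer.parseInt(checkpointedOffset) + 1;
        List<Envelope> out = new ArrayList<>();
        for (int i = start; i < table.size(); i++) {
            out.add(new Envelope(Integer.toString(i), table.get(i)));
        }
        return out;
    }
}
```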

Re: KafkaSystemProducer partitioning

2015-12-17 Thread Yi Pan
Hi, Michal, Sorry for the late reply. Actually, you are right that the "partition.class" configuration is not used in Samza to determine the outgoing partition. In Samza, the partition is determined by the following code section: {code} val topicName = envelope.getSystemStream.getStream val partitions:
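The quoted code is cut off, so it does not show how the partition is finally computed. As an assumption for illustration only, here is the common hash-modulo scheme: the partition key's hashCode() modulo the partition count. The helper name is hypothetical, and this is not a copy of Samza's KafkaSystemProducer. Math.floorMod keeps negative hash codes inside [0, numPartitions).

```java
// Hypothetical helper illustrating hash-modulo partition selection.
final class KeyPartitioner {
    private KeyPartitioner() {}

    static int partitionFor(Object partitionKey, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod (unlike %) never returns a negative result, even when
        // hashCode() is negative, including Integer.MIN_VALUE.
        return Math.floorMod(partitionKey.hashCode(), numPartitions);
    }
}
```

If Samza's producer does hash the envelope's partition key in a way like this, a job would influence placement through the partition key it sets on OutgoingMessageEnvelope rather than through a Kafka partitioner setting.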

Re: Configuring RocksDB SST file size

2015-12-17 Thread Yi Pan
Hi, Kishore, Could you open a JIRA for this small SST files issue? It is good to track it s.t. we won't forget this one. Thanks! -Yi On Thu, Dec 17, 2015 at 4:16 AM, Kishore N C wrote: > Hi Tao, > > > I am not sure what do you mean by ulimit issues > > When so many small

Re: Random connection errors

2015-12-14 Thread Yi Pan
Hi, Kishore, First, which version of Samza are you running? If you can attach the log and config of your container (I assume the log you attached here is a container log?), it would be very helpful. Thanks a lot! -Yi On Mon, Dec 14, 2015 at 5:07 AM, Kishore N C

Re: Review Request 41071: SAMZA-843: fix heap usage increase caused by container timer change

2015-12-08 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/41071/#review109387 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Dec. 8

Re: Review Request 41106: SAMZA-833: ProcessJob mishandling containers

2015-12-08 Thread Yi Pan (Data Infrastructure)
object: import org.apache.samza.config.JobConfig.Config2Job - Yi Pan (Data Infrastructure) On Dec. 8, 2015, 10:49 p.m., Tao Feng wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://re

Re: Passing Java system properties to YARN containers of Samza Job

2015-12-08 Thread Yi Pan
Hi, Gordon, Try passing --config X.Y.Z=value on the command line when running run-job.sh. On Tue, Dec 8, 2015 at 5:57 PM, Gordon Tai wrote: > Hi Rad, > > Our Samza jobs run on on-premiere clusters, on top of YARN and Kafka. So > IAM roles won't be an option either. > > BR, > > On 9

[VOTE] Apache Samza 0.10.0 RC1

2015-12-07 Thread Yi Pan
[ ] -1 disapprove (and reason why) +1 from my side for the release. Yi Pan nickpa...@gmail.com

Re: table-stream join

2015-12-07 Thread Yi Pan
Hi, Bart, Your question is more like "is Kafka reliable against failures?" As for the reliability of the changelog, Samza is designed to be as reliable as the underlying messaging layer. In the case of Kafka, there are configurations in the Kafka producer that users can tune to make sure of

Re: Review Request 40934: SAMZA-827: Handle null offsets when writing state store OFFSET file

2015-12-03 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40934/#review108910 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On Dec. 3

Re: Executing Samza jobs natively in Kubernetes

2015-12-02 Thread Yi Pan
Hi, Elias, Thanks a lot for putting up the patch for running a simple job in Kubernetes! As Kartik mentioned, that is well aligned w/ our goal to make Samza job launching easier. I am glad that we actually share a lot of common ideas arrived at independently. Let me try to give my opinions on this: 1.

Re: Review Request 40624: SAMZA-775: netflix patch for memory size based throttling.

2015-11-24 Thread Yi Pan (Data Infrastructure)
/TestKafkaSystemConsumer.scala 23fa9398e89edeeabfd3ee754c27a8f3bec417b0 Diff: https://reviews.apache.org/r/40624/diff/ Testing --- ./gradlew clean build passed. run ./bin/integration-tests.sh passed as well. Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 40624: SAMZA-775: netflix patch for memory size based throttling.

2015-11-24 Thread Yi Pan (Data Infrastructure)
o move this condition to KafkaConfig as a method - boolean > > isFetchLimitByBytesEnabled(); Make sense. Will do. - Yi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/40624/#review107823 --

Re: Review Request 40624: SAMZA-775: netflix patch for memory size based throttling.

2015-11-24 Thread Yi Pan (Data Infrastructure)
/TestKafkaSystemConsumer.scala 23fa9398e89edeeabfd3ee754c27a8f3bec417b0 Diff: https://reviews.apache.org/r/40624/diff/ Testing (updated) --- ./gradlew clean build passed. run ./bin/integration-tests.sh passed as well. Thanks, Yi Pan (Data Infrastructure)

Review Request 40624: SAMZA-775: netflix patch for memory size based throttling.

2015-11-23 Thread Yi Pan (Data Infrastructure)
23fa9398e89edeeabfd3ee754c27a8f3bec417b0 Diff: https://reviews.apache.org/r/40624/diff/ Testing --- ./gradlew clean build passed. Thanks, Yi Pan (Data Infrastructure)

Review Request 40622: SAMZA-822: add job.coordinator.system to integration tests

2015-11-23 Thread Yi Pan (Data Infrastructure)
-test/src/main/config/perf/kafka-read-write-performance.properties 112caba4f3b5cc05570ca4f746e0bf4ecf9a21c7 Diff: https://reviews.apache.org/r/40622/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-20 Thread Yi Pan (Data Infrastructure)
> On Nov. 20, 2015, 6:53 a.m., Yi Pan (Data Infrastructure) wrote: > > samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala, > > line 84 > > <https://reviews.apache.org/r/40525/diff/1/?file=1133805#file1133805line84&g

Re: Review Request 40457: SAMZA-788 - coordinator stream configuration should not guess the system names

2015-11-19 Thread Yi Pan (Data Infrastructure)
/apache/samza/test/integration/TestStatefulTask.scala (line 72) <https://reviews.apache.org/r/40457/#comment166220> The patch to SAMZA-754 already added this to StreamTaskTestUtils. - Yi Pan (Data Infrastructure) On Nov. 18, 2015, 10:13 p.m., Navina Ramesh

Re: Review Request 40485: SAMZA-767 yarn.queue option is not used anywhere

2015-11-19 Thread Yi Pan (Data Infrastructure)
/apache/samza/job/yarn/YarnJob.scala (line 76) <https://reviews.apache.org/r/40485/#comment166228> nit: better to follow the convention that each input argument to submitApplication() starts on a new line and is aligned. - Yi Pan (Data Infrastructure) On Nov. 19, 2015, 2:56 p.m., Aleksandar Pej

Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread Yi Pan (Data Infrastructure)
/TestRocksDbKeyValueStore.scala (line 84) <https://reviews.apache.org/r/40525/#comment166391> To make sure that it is flush() that writes to disk, not close(), you may want to keep this db open and open another db in read-only mode to verify that the read-only db sees the data. - Yi Pan

Re: Sporadic errors in JobRunner

2015-11-18 Thread Yi Pan
Hi, Rick, I think that you are running into SAMZA-754. I have a RB available for it already. I will upload the patch and it would be good if you can try the patch to see whether that solves your problem. -Yi On Tue, Nov 17, 2015 at 12:01 PM, Rick Mangi wrote: > Hi, getting

Re: Can't get all stored values via range iterator

2015-11-18 Thread Yi Pan
was calling > next() on a range iterator twice :(. > After removing the duplicate call everything works as expected. > > Thank you! > > Alex > > On Mon, Nov 16, 2015 at 10:45 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Alexander, > > >
