Re: Samza and sliding window

2015-07-20 Thread Yi Pan
Hi, Shekar, It would also be helpful if you could post your job configuration on pastebin so that I can test the same config. Thanks! -Yi On Mon, Jul 20, 2015 at 11:11 AM, Shekar Tippur ctip...@gmail.com wrote: Yi, Thanks a lot. - Shekar

Re: Samza and sliding window

2015-07-17 Thread Yi Pan
Hi, Shekar, If possible, could you share your code somewhere? I can try to dig into it this weekend. Thanks! -Yi On Fri, Jul 17, 2015 at 1:31 PM, Shekar Tippur ctip...@gmail.com wrote: Any takers on this please? - Shekar

Re: Handling task process failure

2015-07-16 Thread Yi Pan
Hi, Dmitry, There isn't a single best way for all scenarios, IMO. For example, if the exception is critical and the application cannot afford to ignore the failure, leaving the exception uncaught is appropriate: it fails the container and allows the application to restart from the previous
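The trade-off described here can be sketched as a per-message error policy. This is a minimal plain-Python sketch, not the Samza `StreamTask` API: the `CriticalError` class and the `process_all` helper are hypothetical, and skipping non-critical failures is an assumed policy. A critical error is re-raised (standing in for an uncaught exception that fails the container), while any other error is logged and the message is skipped.

```python
import logging

logger = logging.getLogger("task")


class CriticalError(Exception):
    """Hypothetical marker for failures the application cannot afford to ignore."""


def process_all(messages, handler):
    """Apply `handler` to each message; re-raise critical errors, skip others.

    Returns (processed, skipped). A re-raised exception stands in for an
    uncaught exception that would fail the container.
    """
    processed = skipped = 0
    for msg in messages:
        try:
            handler(msg)
            processed += 1
        except CriticalError:
            # Leave it uncaught: fail fast and let the job restart.
            raise
        except Exception:
            # Non-critical: log it and move on to the next message.
            logger.warning("skipping bad message %r", msg, exc_info=True)
            skipped += 1
    return processed, skipped
```

Which exceptions count as critical is an application decision; the sketch only shows where that decision plugs in.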

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
Hi, Garry, Just want to chime in with our experience at LinkedIn. At LinkedIn, we have a lot of aggregation/transformation stream processing jobs that fall into the transformation category. That's also the motivation for us to develop the SQL layer on top of streams to allow easy programming

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
Hi, Jay, Given all the user concerns and the board's disagreement on sub-projects, I support your 5th option as well. As you said, even if the end goal is the same, it might help to pave a smooth path forward. One thing I have learned over the years is that what we planned for may not be the final

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
Hi, Chris, Thanks for sending out this concrete set of points here. I agree w/ all but have a slightly different point of view on 8). My view on this is: instead of sunsetting Samza as a TLP, can we re-charter the scope of Samza to be the home for running stream processing as a service? My main motivation

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
, 2015 at 7:29 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Chris, Thanks for sending out this concrete set of points here. I agree w/ all but have a slightly different point of view on 8). My view on this is: instead of sunsetting Samza as a TLP, can we re-charter the scope of Samza to be the home for running

Review Request 36398: SAMZA-714: update release doc in 0.9.1

2015-07-10 Thread Yi Pan (Data Infrastructure)
/36398/diff/ Testing --- Verified w/ RB #36384 on samza-hello-samza Thanks, Yi Pan (Data Infrastructure)

Review Request 36405: SAMZA-714: Doc publish for 0.9.1 release

2015-07-10 Thread Yi Pan (Data Infrastructure)
://reviews.apache.org/r/36405/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: [VOTE] Apache Samza 0.9.1 RC1

2015-07-08 Thread Yi Pan
Hi, all, If there is no objection, I plan to close this vote as passed today. So far, counting my own +1, we have: RC1: +1 (binding) x5 and +1 (non-binding) x2 Thanks! -Yi On Tue, Jul 7, 2015 at 11:10 AM, Yi Pan nickpa...@gmail.com wrote: Hi, all, Is the vote done? We

Re: Review Request 35676: Checkpoint migration

2015-07-08 Thread Yi Pan (Data Infrastructure)
review comment. Thanks! samza-kafka/src/main/scala/old/checkpoint/KafkaCheckpointManager.scala (line 236) https://reviews.apache.org/r/35676/#comment144132 nit: reading both changelog partition mapping and checkpoint? - Yi Pan (Data Infrastructure) On July 8, 2015, 1:41 a.m., Naveen

Re: Powered by page update

2015-07-08 Thread Yi Pan
Hey, all, Reviving this thread. It would be really nice if we can update the Powered-by page when releasing 0.9.1. Thanks a lot! -Yi On Tue, Jun 16, 2015 at 5:31 PM, Chris Riccomini criccom...@apache.org wrote: Hey all, I'm seeing a lot of new faces on the mailing list, which is really

Re: [VOTE] Apache Samza 0.9.1 RC1

2015-07-07 Thread Yi Pan
Hi, all, Is the vote done? We have 4 binding and 2 non-binding +1 votes so far. Thanks! -Yi On Mon, Jul 6, 2015 at 12:45 PM, Martin Kleppmann mar...@kleppmann.com wrote: +1 (binding) on RC1. Verified sig, built, tested with hello-samza. On 2 Jul 2015, at 19:22, Yi Pan nickpa

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
other systems. But I think I may actually be misunderstanding your proposal... -Jay On Mon, Jul 6, 2015 at 11:30 AM, Yi Pan nickpa...@gmail.com wrote: Hi, Martin, Great to hear your voice! I will just try to focus on your questions regarding the w/o-YARN part. {quote} For example

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
it is necessarily a massive change and would give more flexibility for the variety of cases. -Jay On Thu, Jul 2, 2015 at 3:38 PM, Yi Pan nickpa...@gmail.com wrote: @Guozhang, yes, that's what I meant. From Kafka consumers' point of view, it pretty much boils down to answering the following

Re: Samza and sliding window

2015-07-02 Thread Yi Pan
Hi, Shekar, Sorry I was not able to follow up w/ you in time. It is great that you found the configuration problem and made it work! As for the exception on the iterator, could you send us the log w/ the exception? Thanks! -Yi On Thu, Jul 2, 2015 at 4:36 PM, Shekar Tippur ctip...@gmail.com

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
Hi, all, Thanks Chris for sending out this proposal and Jay for sharing the extremely illustrative prototype code. I have been thinking it over many times and want to list out my personal opinions below: 1. Generally, I agree with most of the people here on the mailing list on two points:

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
the actual resource assignment, process restart, etc, right? Is the additional value add of the JobCoordinator just partition management? -Jay On Thu, Jul 2, 2015 at 11:32 AM, Yi Pan nickpa...@gmail.com wrote: Hi, all, Thanks Chris for sending out this proposal and Jay for sharing

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
all that, my main point is simple: I am proposing that we need a pluggable partition management component, decoupled from the framework to do resource assignment, process restart, etc. On Thu, Jul 2, 2015 at 2:35 PM, Yi Pan nickpa...@gmail.com wrote: @Jay, yes, the current function

Re: Samza and sliding window

2015-06-29 Thread Yi Pan
Hi, Shekar, First, I would like to clarify what you meant by sliding window: is it defined as windows with size N and advance step size of 1 (which means that windows overlap and each input message would contribute to multiple counts in different windows)? Or windows with size N and advance step
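The two window definitions in the question differ in how many windows each message contributes to. A minimal sketch, assuming windows are measured in message positions (or time ticks) and start at multiples of the step; `window_starts` is a hypothetical helper. With step 1 a message lands in N overlapping windows; with step N it lands in exactly one (a tumbling window).

```python
def window_starts(t, size, step):
    """Return the start of every window [start, start + size) that contains
    position t, where windows begin at 0, step, 2*step, ...
    """
    # Smallest k with k*step + size > t, i.e. the first window still covering t.
    first = max(0, (t - size) // step + 1)
    starts = []
    k = first
    while k * step <= t:  # a window must start at or before t to contain it
        starts.append(k * step)
        k += 1
    return starts


# Sliding (step 1): position 5 is counted in three windows of size 3.
print(window_starts(5, size=3, step=1))  # [3, 4, 5]
# Tumbling (step = size): position 5 is counted in exactly one window.
print(window_starts(5, size=3, step=3))  # [3]
```

The per-message cost of the step-1 definition grows with N, which is why the distinction matters for how counts are maintained.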

Re: Hopping and tumbling windows in streaming SQL

2015-06-29 Thread Yi Pan
Hey, Julian, That's awesome! I read through all the examples and it is really easy to express most of our use cases now! Thanks a lot! I have just a few additional points here: Q5. Aligned tumbling window TUMBLE does not have an align argument, so you need to use HOP. SELECT STREAM
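The quoted answer expresses an aligned tumbling window with HOP, using a slide equal to the window size. The arithmetic for assigning a row to such a window can be sketched as below; `aligned_window` is a hypothetical helper, not a Calcite or Samza function, and the alignment instant is an assumed parameter.

```python
from datetime import datetime, timedelta


def aligned_window(ts, size, align):
    """Return (start, end) of the tumbling window of length `size` that contains
    `ts`, with boundaries shifted so that `align` falls on a window boundary.
    """
    # timedelta % timedelta is non-negative for a positive size, so this also
    # works for timestamps earlier than `align`.
    offset = (ts - align) % size
    start = ts - offset
    return start, start + size


# Hourly windows aligned to 15 minutes past the hour.
start, end = aligned_window(
    datetime(2015, 6, 29, 10, 17),
    size=timedelta(hours=1),
    align=datetime(2015, 6, 29, 0, 15),
)
print(start.time(), end.time())  # 10:15:00 11:15:00
```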

[VOTE] Apache Samza 0.9.1 RC1

2015-06-28 Thread Yi Pan
Hey all, This is a call for a vote on a release of Apache Samza 0.9.1. This is a bug-fix release against 0.9.0. The release candidate can be downloaded from here: http://people.apache.org/~nickpan47/samza-0.9.1-rc1/ The release candidate is signed with pgp key 911402D8, which is included in

Re: [SAMZA-690] Changelog topic creation should not be in the container code

2015-06-25 Thread Yi Pan
Hi, Robert, Thanks for digging into this. I am embedding my answers below: On Thu, Jun 25, 2015 at 7:40 AM, Robert Zuljevic r.zulje...@levi9.com wrote: 1. Is checkpoint topic referred to in the description coordinator stream/topic? In the master branch, checkpoint topic is

Re: Installing Samza w/o internet connection

2015-06-25 Thread Yi Pan
Hi, Amos, I assume that you are referring to preparing the build environment for the Samza source code. As Milinda said, to set up the build environment, you will need a) an Internet connection to download required packages from Maven; b) a cached collection of the required packages on your local machine.

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-25 Thread Yi Pan
. After completing the vote, you can release the artifacts to the public repository by clicking the release button. :) Thanks, Fang, Yan yanfang...@gmail.com On Mon, Jun 22, 2015 at 5:30 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Yan, Thanks for pointing that out! Actually I saw that last

Re: Review Request 35723: SAMZA-720: fix bootstrap hangs when container number 1

2015-06-22 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35723/#review88820 --- Ship it! +1. LGTM. Thanks for the quick fix, Yan. - Yi Pan (Data

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-22 Thread Yi Pan
, 2015 at 5:25 PM, Yan Fang yanfang...@gmail.com wrote: Hi Yi Pan, Is there any document about how to publish the maven staging link? -- Yes. Check the last part of https://github.com/apache/samza/blob/master/RELEASE.md . Not sure if you have seen this. I should have pointed

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-22 Thread Yi Pan
On Fri, Jun 19, 2015 at 10:03 AM, Yi Pan nickpa...@gmail.com wrote: +1. Ran the Samza failure test suite overnight and it succeeded. On Wed, Jun 17, 2015 at 5:54 PM, Guozhang Wang wangg...@gmail.com wrote: Hey all, This is a call for a vote

Re: Review Request 35397: Fix SAMZA-697

2015-06-19 Thread Yi Pan (Data Infrastructure)
{ task.process(envelope, collector, coordinator) } } ... - Yi Pan (Data Infrastructure) On June 18, 2015, 6:42 p.m., Guozhang Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r

Re: Measuring Samza Job Throughput

2015-06-17 Thread Yi Pan
Hi, Milinda, Tao @LinkedIn has done some Samza benchmark test using a standard word-count task. You may want to reach out to him for some detailed ideas on how to set up the perf tests. Best! -Yi On Wed, Jun 17, 2015 at 11:25 AM, Milinda Pathirage mpath...@umail.iu.edu wrote: Thank you all

Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
Hi, Shekar, 0.9.1 is a bug-fix-only release; no new features have been added. New features are expected in 0.10.0. Thanks! On Tue, Jun 16, 2015 at 10:59 AM, Shekar Tippur ctip...@gmail.com wrote: Wang, I have not caught up but can you please highlight if there are any feature additions as well?

Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
+1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against

Re: Review Request 35325: SAMZA-698: update Samza and Spark Streaming Comparison

2015-06-12 Thread Yi Pan (Data Infrastructure)
to claim exactly-once under the assumption that the system is healthy, - Yi Pan (Data Infrastructure) On June 12, 2015, 11:54 p.m., Yan Fang wrote: --- This is an automatically generated e-mail. To reply, visit: https

Confluent wiki pages are down

2015-06-12 Thread Yi Pan
Hi, all, Just FYI that the cwiki links are down now. I have filed an infra ticket for that: INFRA-9806 - Cwiki site down for Samza https://issues.apache.org/jira/browse/INFRA-9806 -Yi

Re: Containers stuck in event loop

2015-06-02 Thread Yi Pan
Hi, Davide, Which version of Samza are you using now? Did you check SAMZA-608? It seems to me that you may be experiencing that bug. We are including this fix in the upcoming release soon. Regards! -Yi On Tue, Jun 2, 2015 at 12:44 AM, Davide Simoncelli netcelli@gmail.com wrote: Hello,

Re: ProcessJobFactory parent process

2015-06-01 Thread Yi Pan
, Lukas Steiblys lu...@doubledutch.me wrote: Yes, I think switching to ThreadJobFactory is a good solution. I think the reasons why I switched to ProcessJobFactory earlier no longer hold true. Thanks. Lukas -Original Message- From: Yi Pan Sent: Friday, May 29

Re: Review Request 34746: Adding new CoordinatorStreamMessage SetContainerHostMapping and LocalityManager (SAMZA-618)

2015-06-01 Thread Yi Pan (Data Infrastructure)
On May 30, 2015, 8:58 a.m., Yi Pan (Data Infrastructure) wrote: samza-core/src/main/java/org/apache/samza/container/LocalityManager.java, line 62 https://reviews.apache.org/r/34746/diff/2/?file=974783#file974783line62 This would be invoked twice by checkpointManager

Re: Review Request 34746: Adding new CoordinatorStreamMessage SetContainerHostMapping and LocalityManager (SAMZA-618)

2015-06-01 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34746/#review86125 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On May 30

Re: [2/2] samza git commit: Yi's TopologyBuilder RB 34500

2015-06-01 Thread Yi Pan
Hi, Milinda, That was a mistake. I have reverted the check-in. I am still working on that. Thanks! -Yi On Mon, Jun 1, 2015 at 9:34 PM, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi Navina, Did we decide to push this patch to the samza-sql branch? I thought Yi was still working

Re: Review Request 34500: SAMZA-552 Operator API change: builder and simplified operator classes

2015-05-29 Thread Yi Pan (Data Infrastructure)
completely remove that. Yi Pan (Data Infrastructure) wrote: Thank you both for the good points here. @Navina, yes, the basic idea for the topology builder is exactly what you mentioned and the model you illustrated is much simpler and very attractive. The issue I saw

Re: ProcessJobFactory parent process

2015-05-29 Thread Yi Pan
Hi, Lukas, I assume that when you say the job crashes, you are referring to the child process running the container, not the parent process? If so, we have actually been discussing adding container health-check/failure-detection in the JobCoordinator. SAMZA-680 would be a good place to start

Re: ProcessJobFactory parent process

2015-05-29 Thread Yi Pan
at 12:59 PM, Lukas Steiblys lu...@doubledutch.me wrote: Yes, I'm talking about the child process crashing. I'd like the parent to die as well if the child crashes so Docker can understand that the process failed and restart the container. Lukas -Original Message- From: Yi Pan Sent
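The requirement here, that the parent exits when the child fails so Docker notices, is a general supervision pattern. A minimal plain-Python sketch of such a wrapper, not Samza's ProcessJobFactory: `run_child` is a hypothetical helper that waits for the container command and returns its exit status so the wrapper can exit with the same code.

```python
import subprocess
import sys


def run_child(cmd):
    """Run `cmd` as a child process, wait for it, and return its exit code.

    A non-zero code means the child failed; exiting the parent with the same
    code lets a supervisor such as Docker see the failure and restart.
    """
    completed = subprocess.run(cmd)
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wrapper.py <container command> [args...]
    sys.exit(run_child(sys.argv[1:]))
```

On POSIX a child killed by a signal reports a negative return code, which still makes the wrapper exit with a non-zero status.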

Review Request 34664: SAMZA-552 Operator API change: builder and simplified operator classes

2015-05-26 Thread Yi Pan (Data Infrastructure)
commit of the following: commit fad81106901e494d3950eeaafaeefef482ac0125 Author: Yi Pan (Data Infrastructure) yi...@linkedin.com Date: Mon May 25 23:40:00 2015 -0700 SAMZA-650 window message store and window store implementation commit 58c2eeebf4bb0975f70aeba733379e1104f3a7de Author: Yi Pan

Re: Do we want to release the 0.9.1 now?

2015-05-22 Thread Yi Pan
above and if you can give a +1 to move forward quickly with 0.9.1 release, that would be great! Thanks a lot! -Yi On Thu, May 21, 2015 at 4:21 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Jakob, Thanks a lot for the thorough check-through. I agree w/ your point that those bug fixes

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Hi, Yan, I am voting to start it now. Guozhang has already signed up to follow the release process that Chris wrote up. There will be an announcement soon. Thanks! -Yi On Thu, May 21, 2015 at 2:21 PM, Yan Fang yanfang...@gmail.com wrote: Hi guys, Just ask, are there any other bugs that we

Review Request 34574: SAMZA-608; don't hange on serde errors in system consumers

2015-05-21 Thread Yi Pan (Data Infrastructure)
/SystemConsumers.scala 125d37602e2c0a9da75674f37580a1ac02f94796 samza-core/src/test/scala/org/apache/samza/system/TestSystemConsumers.scala 3fdc781c1275f928f4b51b01869e1122502a2c08 Diff: https://reviews.apache.org/r/34574/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Pan (Data Infrastructure) * a09b1ff - SAMZA-646: Remove support for JDK6 (5 weeks ago) Jakob Homan * ffa84c0 - SAMZA-608; don't hange on serde errors in system consumers (5 weeks ago) Yi Pan * 3eb15a0 - SAMZA-629: add instructions for upgrading websites when releasing new version (6 weeks ago

Library version conflict issues

2015-05-20 Thread Yi Pan
Hi, all, Just curious about one thing: - Samza as a platform brings in a set of dependency libraries - Applications developed on Samza may bring in other libraries that conflict w/ the Samza libraries (we have got one use case that requires jackson 1.4.2 which conflicts with jackson 1.8.5 that
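A first step with conflicts like the jackson 1.4.2 vs 1.8.5 case is listing every library that the platform and the application declare at different versions. A minimal sketch, assuming both dependency sets are available as name-to-version maps (a hypothetical input format; a real build would derive them from the build tool's dependency report). `find_conflicts` is a hypothetical helper.

```python
def find_conflicts(platform_deps, app_deps):
    """Return {library: (platform_version, app_version)} for every library that
    both sides declare, but at different versions."""
    shared = platform_deps.keys() & app_deps.keys()
    return {
        lib: (platform_deps[lib], app_deps[lib])
        for lib in sorted(shared)
        if platform_deps[lib] != app_deps[lib]
    }


# Versions from the jackson example; "example-lib" is a made-up placeholder.
platform = {"jackson": "1.8.5", "example-lib": "2.0"}
application = {"jackson": "1.4.2", "example-lib": "2.0"}
print(find_conflicts(platform, application))  # {'jackson': ('1.8.5', '1.4.2')}
```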

Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-15 Thread Yi Pan (Data Infrastructure)
/#review83937 --- On May 15, 2015, 2:16 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34206

Re: Review Request 33735: RocksDB TTL support

2015-05-14 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33735/#review83847 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On May 13

Re: Log rotation on Samza/yarn logs

2015-05-14 Thread Yi Pan
Hi, Shekar, Are you having a problem w/ retention of too many old log files on disk? I did a quick search online to see whether there is any configuration for DailyRollingFileAppender and couldn't find any. The closest thing is this one:
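If the problem is indeed too many old rotated files, and the search above turned up no appender setting to cap them, one workaround is a periodic cleanup outside log4j. A minimal sketch, assuming rotated files share a common name prefix; `prune_old_logs`, the prefix and the keep count are hypothetical and would need to match the actual Samza/YARN log layout.

```python
import os


def prune_old_logs(log_dir, prefix, keep):
    """Delete all but the `keep` most recently modified files in `log_dir`
    whose names start with `prefix`. Returns the names that were deleted."""
    with os.scandir(log_dir) as it:
        candidates = [e for e in it if e.is_file() and e.name.startswith(prefix)]
    # Newest first, by modification time.
    candidates.sort(key=lambda e: e.stat().st_mtime, reverse=True)
    removed = []
    for entry in candidates[keep:]:
        os.remove(entry.path)
        removed.append(entry.name)
    return removed
```

Run from cron or a similar scheduler, this keeps disk usage bounded without changing the appender.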

Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
/UserCallbacksSqlTask.java PRE-CREATION Diff: https://reviews.apache.org/r/34206/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
--- Thanks, Yi Pan (Data Infrastructure)

Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java PRE-CREATION Diff: https://reviews.apache.org/r/34206/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 34009: SAMZA-552 window store implementation

2015-05-13 Thread Yi Pan (Data Infrastructure)
. - Yi --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34009/#review83667 --- On May 13, 2015, 5:36 p.m., Yi Pan (Data Infrastructure) wrote

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)
--- On May 9, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34009

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)
://reviews.apache.org/r/34009/#review83395 --- On May 9, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail. To reply, visit: https

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)
, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34009/ --- (Updated May 9, 2015, 1

Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-08 Thread Yi Pan (Data Infrastructure)
/StreamSqlTask.java PRE-CREATION samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java PRE-CREATION Diff: https://reviews.apache.org/r/34009/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-08 Thread Yi Pan (Data Infrastructure)
/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java PRE-CREATION Diff: https://reviews.apache.org/r/34009/diff/ Testing (updated) --- ./gradlew clean build passed Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-07 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33749/#review82821 --- On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-07 Thread Yi Pan (Data Infrastructure)
. To reply, visit: https://reviews.apache.org/r/33749/#review82824 --- On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail

Re: Review Request 33488: SAMZA-657

2015-05-07 Thread Yi Pan (Data Infrastructure)
/samza/test/integration/join/Emitter.java https://reviews.apache.org/r/33488/#comment133628 nit: There are still many trailing white spaces. We should remove them. - Yi Pan (Data Infrastructure) On April 27, 2015, 7:59 p.m., Guozhang Wang wrote

Re: Review Request 33761: Fix SAMZA-658

2015-05-06 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33761/#review82777 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On May 6, 2015

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-06 Thread Yi Pan (Data Infrastructure)
-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala https://reviews.apache.org/r/33453/#comment133550 Should be default here. - Yi Pan (Data Infrastructure) On May 6, 2015, 6:22 a.m., Navina Ramesh wrote

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-06 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33749/#review82674 --- On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-06 Thread Yi Pan (Data Infrastructure)
--- On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33749

Re: What next for streaming SQL?

2015-05-05 Thread Yi Pan
Hi, Julian, Great! I am looking forward to it. Could you help answer my question regarding the sliding windows in the previous email? Thanks a lot! -Yi On Tue, May 5, 2015 at 10:46 AM, Julian Hyde jul...@hydromatic.net wrote: On May 4, 2015, at 10:52 AM, Yi Pan nickpa...@gmail.com

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yi Pan
Hi, Andreas, Are you describing a use case where the *same* copy of data is shared among all tasks? That will depend on a lot of factors: 1. Is your data size huge? 2. Can your data be partitioned to work with a single partition of the input stream? 3. Do you have a means to bootstrap the data from a

Re: Questions regarding Samza in production

2015-05-05 Thread Yi Pan
Hi, Jose, Good to know that you chose Samza! I will embed my answers inline below: On Mon, May 4, 2015 at 5:02 PM, José Barrueta j...@stormpath.com wrote: - I assume caching will help a lot with serialization/deserialization of the Value, but have you guys used the value to be of type

Re: What next for streaming SQL?

2015-05-04 Thread Yi Pan
Hi, Julian, Thanks for the reply. I want to add a few more points here: {quote} Once you have computed that boundary and stored it in your data structure you can keep on adding rows until you see one rowtime 11:00:00 or higher. {quote} The above is not true when the incoming messages in the

Re: Review Request 33761: Fix SAMZA-658

2015-05-04 Thread Yi Pan (Data Infrastructure)
/KeyValueStorageEngine.scala https://reviews.apache.org/r/33761/#comment133095 nit: same here. - Yi Pan (Data Infrastructure) On May 1, 2015, 6:43 p.m., Guozhang Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https

Re: Review Request 33146: New KeyValueStore Features

2015-05-04 Thread Yi Pan (Data Infrastructure)
getAll() vs many get() calls that directly hit RocksDB APIs? - Yi Pan (Data Infrastructure) On May 4, 2015, 4:27 a.m., Mohamed Mahmoud (El-Geish) wrote: --- This is an automatically generated e-mail. To reply, visit: https

Re: Review Request 33146: New KeyValueStore Features

2015-05-04 Thread Yi Pan (Data Infrastructure)
On May 4, 2015, 8:14 p.m., Yi Pan (Data Infrastructure) wrote: samza-test/src/main/scala/org/apache/samza/test/performance/TestKeyValuePerformance.scala, line 320 https://reviews.apache.org/r/33146/diff/5-6/?file=943969#file943969line320 This is a bit confusing to me: why do we

Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-04-28 Thread Yi Pan (Data Infrastructure)
. - Yi Pan (Data Infrastructure) On April 22, 2015, 9:54 p.m., Navina Ramesh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33453

Re: Review Request 33146: New KeyValueStore Features

2015-04-26 Thread Yi Pan (Data Infrastructure)
On April 24, 2015, 5:01 p.m., Yi Pan (Data Infrastructure) wrote: Ship It! Mohamed Mahmoud (El-Geish) wrote: I don't have access to commit. Can you please grant me access or commit for me? Thanks! Hi, Mohamed, I was trying to go through all the tests w/ your patch. After all

Re: Questions about partitioning

2015-04-24 Thread Yi Pan
Hi, Susan, Welcome to Samza! First I will try to answer your question about partition assignment in Samza. The assignment from stream partitions to Samza tasks is determined by the SystemStreamPartitionGrouper. The default implementation includes two assignment methods: 1 task per system stream
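A grouper is essentially a mapping from input partitions to tasks. A simplified plain-Python model of two strategies: one task per (system, stream, partition) triple, and, as an assumed second strategy for comparison, one task per partition number shared across streams. The function names are hypothetical; this is an illustration, not Samza's SystemStreamPartitionGrouper interface.

```python
from collections import defaultdict


def group_by_ssp(ssps):
    """One task per (system, stream, partition) triple."""
    return {ssp: {ssp} for ssp in ssps}


def group_by_partition_id(ssps):
    """One task per partition number: partition k of every input stream is
    handled by the same task (useful when co-partitioned streams are joined)."""
    groups = defaultdict(set)
    for system, stream, partition in ssps:
        groups[partition].add((system, stream, partition))
    return dict(groups)


inputs = [("kafka", "orders", 0), ("kafka", "orders", 1), ("kafka", "shipments", 0)]
print(len(group_by_ssp(inputs)))           # 3 tasks
print(len(group_by_partition_id(inputs)))  # 2 tasks
```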

Re: Review Request 33146: New KeyValueStore Features

2015-04-24 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33146/#review81497 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On April 24

Re: Review Request 33146: New KeyValueStore Features

2015-04-21 Thread Yi Pan (Data Infrastructure)
/TestKeyValueStores.scala https://reviews.apache.org/r/33146/#comment131257 nit: prefer not to re-order the methods if not necessary. - Yi Pan (Data Infrastructure) On April 16, 2015, 10:43 a.m., Mohamed Mahmoud (El-Geish) wrote

Re: Review Request 33146: New KeyValueStore Features

2015-04-21 Thread Yi Pan (Data Infrastructure)
On April 21, 2015, 6:49 p.m., Yi Pan (Data Infrastructure) wrote: samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java, line 33 https://reviews.apache.org/r/33146/diff/2/?file=931566#file931566line33 The signature of close() and flush() functions from

Re: Review Request 33219: [SAMZA-649] Create samza-sql-calcite module for Calcite SQL front end

2015-04-15 Thread Yi Pan (Data Infrastructure)
On April 15, 2015, 6:20 p.m., Yi Pan (Data Infrastructure) wrote: samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/schema/AvroSchemaConverter.java, line 37 https://reviews.apache.org/r/33219/diff/1/?file=930371#file930371line37 I assume that this class is used

Re: Review Request 33219: [SAMZA-649] Create samza-sql-calcite module for Calcite SQL front end

2015-04-15 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33219/#review80237 --- Ship it! +1 - Yi Pan (Data Infrastructure) On April 15, 2015, 2

Re: Review Request 33170: Renamed samza-sql to samza-sql-core

2015-04-14 Thread Yi Pan (Data Infrastructure)
-calcite samza-sql-core/src/test/java/org/apache/samza/sql/test/metadata/TestAvroSchemaConverter.java https://reviews.apache.org/r/33170/#comment129816 Same here. - Yi Pan (Data Infrastructure) On April 14, 2015, 3:14 p.m., Milinda Pathirage wrote

Re: Updating samza-sql branch to Java 1.7

2015-04-14 Thread Yi Pan
Merged master to samza-sql. On Tue, Apr 14, 2015 at 2:57 PM, Jakob Homan jgho...@gmail.com wrote: Yes, I removed the tests for JDK6 yesterday. We're 1.7 or above now for development. On 14 April 2015 at 12:47, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi Devs, Calcite dropped

Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-14 Thread Yi Pan (Data Infrastructure)
. samza-sql/src/main/java/org/apache/samza/sql/operators/scan/ProjectableFilterableStreamScanSpec.java https://reviews.apache.org/r/33142/#comment129692 These Calcite-specific classes should be moved out of the samza-sql-core module. - Yi Pan (Data Infrastructure) On April 13, 2015, 9:04 p.m., Yi

Review Request 33142: [SAMZA-561] Review in progress

2015-04-13 Thread Yi Pan (Data Infrastructure)
samza-test/src/main/python/tests/sql_tests.py PRE-CREATION samza-test/src/main/resources/orders.avsc PRE-CREATION samza-test/src/main/resources/orders.json PRE-CREATION Diff: https://reviews.apache.org/r/33142/diff/ Testing --- Thanks, Yi Pan (Data Infrastructure)

Re: Joining Avro records

2015-04-09 Thread Yi Pan
Hi, Roger, Good question on that. I am actually not aware of any automatic way of doing this in Avro. I have tried to add generic Schema and Data interface in samza-sql branch to address the morphing of the schemas from input streams to the output streams. The basic idea is to have wrapper Schema

Re: Stream SQL Query Planner Update

2015-04-06 Thread Yi Pan
Hi, Milinda, Great! Thanks for making the excellent progress in this! I will try to follow up with the patch today. Thanks! -Yi On Mon, Apr 6, 2015 at 11:00 AM, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi All, I have attached a patch to SAMZA-561 (

Review Request 32872: SAMZA-571: add test to capture RecordTooLargeException

2015-04-06 Thread Yi Pan (Data Infrastructure)
d66b3bd070a4cef4b1d3dded1d79a33cbe3fa09b Diff: https://reviews.apache.org/r/32872/diff/ Testing --- Passed local test suite Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 32407: SAMZA-571: add suppression interface for uncaught exceptions

2015-03-24 Thread Yi Pan (Data Infrastructure)
samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 54b4df84f47f818d62ac0361196567ad1f430fde Diff: https://reviews.apache.org/r/32407/diff/ Testing (updated) --- Unit tests added. Pass with ./bin/check-all.sh Thanks, Yi Pan (Data Infrastructure)

Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Yi Pan (Data Infrastructure)
/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java https://reviews.apache.org/r/32006/#comment123934 nit: SerdeObject - Yi Pan (Data Infrastructure) On March 13, 2015, 12:57 a.m., Chris Riccomini wrote

Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)
in the cache, getOffsets would raise exception? And how do we capture that case? - Yi Pan (Data Infrastructure) On March 13, 2015, 7:56 p.m., Chris Riccomini wrote: --- This is an automatically generated e-mail. To reply, visit: https

Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32006/#review76430 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On March 13

Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32052/#review76442 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On March 13

Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32052/#review76437 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On March 13

Re: Review Request 31909: SAMZA-590

2015-03-11 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31909/#review76120 --- Ship it! Ship It! - Yi Pan (Data Infrastructure) On March 10

Re: A question regarding to the default semantic meaning of join

2015-03-09 Thread Yi Pan
to their different windows are not equivalent. Julian On Fri, Mar 6, 2015 at 4:28 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Julian, I am writing down some detailed examples of join and need your further help in understanding the semantic meaning of the following example: SELECT id, value

A question regarding to the default semantic meaning of join

2015-03-06 Thread Yi Pan
Hi, Julian, I am writing down some detailed examples of join and need your further help in understanding the semantic meaning of the following example: SELECT id, value, cost FROM Orders OVER (ROWS 3 PRECEDING) JOIN Shipments OVER (ROWS 3 PRECEDING) ON Orders.id = Shipments.id In this example,
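One possible reading of this query, assumed here and not confirmed by the thread, is a symmetric stream-stream join in which each stream keeps only its current row plus the 3 preceding rows (the standard SQL meaning of ROWS 3 PRECEDING), and a result is emitted when an arriving row matches a row still in the other stream's window. `windowed_join` is a hypothetical helper for illustrating that interpretation.

```python
from collections import deque


def windowed_join(events, rows_preceding=3):
    """Join two interleaved streams on id under a row-based window.

    `events` is a sequence of (side, id, payload) with side in
    {"orders", "shipments"}. Each side keeps its last rows_preceding + 1 rows.
    Yields (id, order_payload, shipment_payload) when an arriving row matches
    a row currently held in the other side's window.
    """
    windows = {
        "orders": deque(maxlen=rows_preceding + 1),
        "shipments": deque(maxlen=rows_preceding + 1),
    }
    for side, key, payload in events:
        other = "shipments" if side == "orders" else "orders"
        windows[side].append((key, payload))  # oldest row is evicted automatically
        for other_key, other_payload in windows[other]:
            if other_key == key:
                if side == "orders":
                    yield key, payload, other_payload
                else:
                    yield key, other_payload, payload


events = [("orders", 1, "o1"), ("shipments", 1, "s1"), ("shipments", 2, "s2")]
print(list(windowed_join(events)))  # [(1, 'o1', 's1')]
```

Under this reading, a shipment that arrives after its order has been pushed out of the 4-row window produces no result, which is one of the semantic questions the example raises.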

Re: Handling defaults and windowed aggregates in stream queries

2015-03-06 Thread Yi Pan
and in our case it will be inside the query plan to operator router conversion phase. Thanks Milinda On Mon, Mar 2, 2015 at 2:31 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Milinda, +1 on your default window idea. One question: what's the difference

Re: Handling defaults and windowed aggregates in stream queries

2015-03-02 Thread Yi Pan
move the window out from Project. I’ll see how we can do this. Also I’ll go ahead and implement default windows. We can change it later if Julian or someone from Calcite comes up with a better suggestion. Thanks Milinda On Sun, Mar 1, 2015 at 8:23 PM, Yi Pan nickpa...@gmail.com wrote: Hi
