Re: Terminology: Tumbling and sliding windows

2015-02-17 Thread Yi Pan
+1 on consolidating the terminology as well. Azure's definition looks good to me. On Tue, Feb 17, 2015 at 9:13 AM, Chris Riccomini criccom...@apache.org wrote: Hey Julian, +1 I'm not sure if we actually *are* using the right terminology, but I agree that Azure's terminology is what we should

A question regarding to the default semantic meaning of join

2015-03-06 Thread Yi Pan
Hi, Julian, I am writing down some detailed examples of join and need your further help in understanding the semantic meaning of the following example: SELECT id, value, cost FROM Orders OVER (ROWS 3 PRECEDING) JOIN Shipments OVER (ROWS 3 PRECEDING) ON Orders.id = Shipments.id In this example,

Re: A question regarding to the default semantic meaning of join

2015-03-09 Thread Yi Pan
to their different windows are not equivalent. Julian On Fri, Mar 6, 2015 at 4:28 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Julian, I am writing down some detailed examples of join and need your further help in understanding the semantic meaning of the following example: SELECT id, value

Re: Handling defaults and windowed aggregates in stream queries

2015-03-06 Thread Yi Pan
and in our case it will be inside the query plan to operator router conversion phase. Thanks Milinda On Mon, Mar 2, 2015 at 2:31 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Milinda, +1 on your default window idea. One question: what's the difference

Re: Handling defaults and windowed aggregates in stream queries

2015-03-02 Thread Yi Pan
move the window out from Project. I’ll see how we can do this. Also I’ll go ahead and implement default windows. We can change it later if Julian or someone from Calcite comes up with a better suggestion. Thanks Milinda On Sun, Mar 1, 2015 at 8:23 PM, Yi Pan nickpa...@gmail.com wrote: Hi

Re: Handling defaults and windowed aggregates in stream queries

2015-03-01 Thread Yi Pan
Hi, Milinda, Sorry to reply late on this. Here are some of my comments: 1) In Calcite's model, it seems that there is no stream-to-relation conversion step. In the first example where the window specification is missing, I like your solution to add the default LogicalNowWindow operator s.t. it

Re: Reprocessing and windowing

2015-02-23 Thread Yi Pan
Hey, Geoffry, We have started some work in SAMZA-552 to create a window operator API in samza, as part of effort to implement support for a high-level language. I will probably be able to have something to share in a few days and would love to get feedbacks regarding to the window operator.

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
of work and 90% of the functionality is identical in a streaming and non-streaming system. Lastly, building a stack based on extended standard SQL does not preclude adding other high-level languages on top of the algebra at a later date. Julian On Jan 28, 2015, at 5:36 PM, Yi Pan nickpa

Re: Streaming SQL - object models, ASTs and algebras

2015-01-29 Thread Yi Pan
Hyde jul...@hydromatic.net wrote: On Jan 29, 2015, at 3:04 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Julian, Thanks for sharing your idea! It is interesting and well organized. Let me try to summarize the main difference between yours and the current proposal are: - removing

Re: Questions about partitioning

2015-04-24 Thread Yi Pan
Hi, Susan, Welcome to Samza! First I will try to answer your question about partition assignment in Samza. The assignment from stream partition to Samza tasks is determined by the SystemStreamPartitionGrouper. The default implementation include two assignment methods: 1 task per system stream

Re: What next for streaming SQL?

2015-05-04 Thread Yi Pan
Hi, Julian, Thanks for the reply. I want to add a few more points here: {quote} Once you have computed that boundary and stored it in your data structure you can keep on adding rows until you see one rowtime 11:00:00 or higher. {quote} The above is not true when the incoming messages in the

Re: What next for streaming SQL?

2015-05-05 Thread Yi Pan
Hi, Julian, Great! I am looking forward to it. Could you help to answer my question regarding to the sliding windows in the previous email? Thanks a lot! -Yi On Tue, May 5, 2015 at 10:46 AM, Julian Hyde jul...@hydromatic.net wrote: On May 4, 2015, at 10:52 AM, Yi Pan nickpa...@gmail.com

Re: Local state in Samza - sharing data between tasks

2015-05-05 Thread Yi Pan
Hi, Andreas, Are you describing a use case where the *same* copy of data is shared among all tasks? That will depend on a lot factors: 1. is your data size huge? 2. Can your data be partitioned to work with a single partition of input stream? 3. Do you have a means to bootstrap the data from a

Re: Questions regarding Samza in production

2015-05-05 Thread Yi Pan
Hi, Jose, Good to know that you chose Samza! I will embed my answers inline below: On Mon, May 4, 2015 at 5:02 PM, José Barrueta j...@stormpath.com wrote: - I assume caching will help a lot with serialization/deserialization of the Value, but have you guys used the value to be of type

Re: Log rotation on Samza/yarn logs

2015-05-14 Thread Yi Pan
Hi, Shekar, Are you having a problem w/ retention of too many old log files on disk? I did a quick search online to see whether there is any configuration for DailyRollingFileAppender and couldn't find any. The closest thing is this one:

Re: Updating samza-sql branch to Java 1.7

2015-04-14 Thread Yi Pan
Merged master to samza-sql. On Tue, Apr 14, 2015 at 2:57 PM, Jakob Homan jgho...@gmail.com wrote: Yes, I removed the tests for JDK6 yesterday. We're 1.7 or above now for development. On 14 April 2015 at 12:47, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi Devs, Calcite dropped

Re: Stream SQL Query Planner Update

2015-04-06 Thread Yi Pan
Hi, Milinda, Great! Thanks for making the excellent progress in this! I will try to follow up with the patch today. Thanks! -Yi On Mon, Apr 6, 2015 at 11:00 AM, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi All, I have attached a patch to SAMZA-561 (

Re: Joining Avro records

2015-04-09 Thread Yi Pan
Hi, Roger, Good question on that. I am actually not aware of any automatic way of doing this in Avro. I have tried to add generic Schema and Data interface in samza-sql branch to address the morphing of the schemas from input streams to the output streams. The basic idea is to have wrapper Schema

Library version conflict issues

2015-05-20 Thread Yi Pan
Hi, all, Just curious about one thing: - Samza as a platform brings in a set of dependency libraries - Applications developed in Samza may bring in other libraries that conflicts w/ the Samza libraries (we have got one use case that requires jackson 1.4.2 which conflicts with jackson 1.8.5 that

Re: Containers stuck in event loop

2015-06-02 Thread Yi Pan
Hi, Davide, Which version of Samza are you using now? Did you check SAMZA-608? It seems to me that you may be experiencing that bug. We are including this fix in the upcoming release soon. Regards! -Yi On Tue, Jun 2, 2015 at 12:44 AM, Davide Simoncelli netcelli@gmail.com wrote: Hello,

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-22 Thread Yi Pan
, 2015 at 5:25 PM, Yan Fang yanfang...@gmail.com wrote: Hi Yi Pan, Is there any document regarding to how to publish the maven staging link? -- Yes. Check the last part of the https://github.com/apache/samza/blob/master/RELEASE.md . Not sure if you have seen this. I should have pointed

Re: [SAMZA-690] Changelog topic creation should not be in the container code

2015-06-25 Thread Yi Pan
Hi, Robert, Thanks for digging into this. I am embedding my answers below: On Thu, Jun 25, 2015 at 7:40 AM, Robert Zuljevic r.zulje...@levi9.com wrote: 1. Is checkpoint topic referred to in the description coordinator stream/topic? In the master branch, checkpoint topic is

Re: Installing Samza w/o internet connection

2015-06-25 Thread Yi Pan
Hi, Amos, I assume that you are referring to preparing the build environment for Samza source code. As Milinda said, to set up the build environment, you will need a) an Internet connection to download required packages from Maven; b) a cached collection of required package on your local machine.

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-25 Thread Yi Pan
. After completing the vote, you can release the artifacts to the public repository by clicking the release button. :) Thanks, Fang, Yan yanfang...@gmail.com On Mon, Jun 22, 2015 at 5:30 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Yan, Thanks for point out that! Actually I saw that last

Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-22 Thread Yi Pan
On Fri, Jun 19, 2015 at 10:03 AM, Yi Pan nickpa...@gmail.com wrote: +1. Ran the Samza failure test suite and succeeded over night. On Wed, Jun 17, 2015 at 5:54 PM, Guozhang Wang wangg...@gmail.com wrote: Hey all, This is a call for a vote

Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
Hi, Shekar, This 0.9.1 is a bug-fix only release. No features added yet. New features are expected in 0.10.0. Thanks! On Tue, Jun 16, 2015 at 10:59 AM, Shekar Tippur ctip...@gmail.com wrote: Wang, I have not caught up but can you please highlight if there are any feature additions as well?

Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
+1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against

Re: Measuring Samza Job Throughput

2015-06-17 Thread Yi Pan
Hi, Milinda, Tao @LinkedIn has done some Samza benchmark test using a standard word-count task. You may want to reach out to him for some detailed ideas on how to set up the perf tests. Best! -Yi On Wed, Jun 17, 2015 at 11:25 AM, Milinda Pathirage mpath...@umail.iu.edu wrote: Thank you all

Confluent wiki pages are down

2015-06-12 Thread Yi Pan
Hi, all, Just FYI that the cwiki links are down now. I have filed an infra ticket for that: INFRA-9806 - Cwiki site down for Samza https://issues.apache.org/jira/browse/INFRA-9806 -Yi

Re: ProcessJobFactory parent process

2015-06-01 Thread Yi Pan
, Lukas Steiblys lu...@doubledutch.me wrote: Yes, I think switching to ThreadJobFactory is a good solution. I think the reasons why I switched to ProcessJobFactory earlier no longer hold true. Thanks. Lukas -Original Message- From: Yi Pan Sent: Friday, May 29

Re: ProcessJobFactory parent process

2015-05-29 Thread Yi Pan
Hi, Lukas, I assume that when you say the job crashes, you were referring to the child process running the container, not the parent process? If yes, we were actually talking about adding container health-check/failure-detection in the JobCoordinator. SAMZA-680 would be the good place to start

Re: ProcessJobFactory parent process

2015-05-29 Thread Yi Pan
at 12:59 PM, Lukas Steiblys lu...@doubledutch.me wrote: Yes, I'm talking about the child process crashing. I'd like the parent to die as well if the child crashes so Docker can understand that the process failed and restart the container. Lukas -Original Message- From: Yi Pan Sent

Re: [2/2] samza git commit: Yi's TopologyBuilder RB 34500

2015-06-01 Thread Yi Pan
Hi, Milinda, That was an accidental mistake. I have reverted the check-in. I am still working on that. Thanks! -Yi On Mon, Jun 1, 2015 at 9:34 PM, Milinda Pathirage mpath...@umail.iu.edu wrote: Hi Navina, Did we decided to push this patch to samza-sql branch. I thought Yi is still working

Re: Samza and sliding window

2015-06-29 Thread Yi Pan
Hi, Shekar, First, I would like to clarify what you meant by sliding window: is it defined as windows with size N and advance step size of 1 (which means that windows overlap and each input message would contribute to multiple counts in different windows)? Or windows with size N and advance step
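
The distinction Yi asks about (how far the window advances relative to its size) can be sketched in a few lines. This is only an illustration, not Samza code: the class and method names are made up, windows are assumed to be count-based and to start at position 0, and the step is left as a parameter because the snippet cuts off before the second alternative is spelled out.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {
    // Returns the start positions of every count-based window that contains
    // the message at position `index`. Window k is assumed to cover
    // positions [k * step, k * step + size - 1]. All names are illustrative.
    static List<Long> windowStartsFor(long index, long size, long step) {
        if (index < 0 || size <= 0 || step <= 0) {
            throw new IllegalArgumentException("need index >= 0, size > 0, step > 0");
        }
        List<Long> starts = new ArrayList<>();
        // Latest window that starts at or before `index`.
        long last = (index / step) * step;
        // Walk backwards while the window still reaches `index`.
        for (long start = last; start >= 0 && start + size > index; start -= step) {
            starts.add(0, start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Size 3, step 1: overlapping windows, one message counts several times.
        System.out.println(windowStartsFor(5, 3, 1));
        // Size 3, step 3: the step equals the size, so windows do not overlap.
        System.out.println(windowStartsFor(5, 3, 3));
    }
}
```

With size 3 and step 1, the message at position 5 falls into the three windows starting at 3, 4 and 5; with step 3 it falls only into the window starting at 3.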

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
it is necessarily a massive change and would give more flexibility for the variety of cases. -Jay On Thu, Jul 2, 2015 at 3:38 PM, Yi Pan nickpa...@gmail.com wrote: @Guozhang, yes, that's what I meant. From Kafka consumers' point of view, it pretty much boils down to answer the following

Re: Samza and sliding window

2015-07-02 Thread Yi Pan
Hi, Shekar, Sorry I was not able to follow up w/ you in time. It is great that you have found the configure problem and made it work! As for the exception on the iterator, could you send us the log w/ the exception? Thanks! -Yi On Thu, Jul 2, 2015 at 4:36 PM, Shekar Tippur ctip...@gmail.com

Re: Thoughts and obesrvations on Samza

2015-07-06 Thread Yi Pan
other systems. But I think I may actually be misunderstanding your proposal... -Jay On Mon, Jul 6, 2015 at 11:30 AM, Yi Pan nickpa...@gmail.com wrote: Hi, Martin, Great to hear your voice! I will just try to focus on your questions regarding to w/o YARN part. {quote} For example

[VOTE] Apache Samza 0.9.1 RC1

2015-06-28 Thread Yi Pan
Hey all, This is a call for a vote on a release of Apache Samza 0.9.1. This is a bug-fix release against 0.9.0. The release candidate can be downloaded from here: http://people.apache.org/~nickpan47/samza-0.9.1-rc1/ The release candidate is signed with pgp key 911402D8, which is included in

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
Hi, all, Thanks Chris for sending out this proposal and Jay for sharing the extremely illustrative prototype code. I have been thinking it over many times and want to list out my personal opinions below: 1. Generally, I agree with most of the people here on the mailing list on two points:

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
the actual resource assignment, process restart, etc, right? Is the additional value add of the JobCoordinator just partition management? -Jay On Thu, Jul 2, 2015 at 11:32 AM, Yi Pan nickpa...@gmail.com wrote: Hi, all, Thanks Chris for sending out this proposal and Jay for sharing

Re: Thoughts and obesrvations on Samza

2015-07-02 Thread Yi Pan
all that, my main point is simple: I am proposing that we need a pluggable partition management component, decoupled from the framework to do resource assignment, process restart, etc. On Thu, Jul 2, 2015 at 2:35 PM, Yi Pan nickpa...@gmail.com wrote: @Jay, yes, the current function

Re: Do we want to release the 0.9.1 now?

2015-05-22 Thread Yi Pan
above and if you can give a +1 to move forward quickly with 0.9.1 release, that would be great! Thanks a lot! -Yi On Thu, May 21, 2015 at 4:21 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Jakob, Thanks a lot for the thorough check-through. I agree w/ your point that those bug fixes

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Hi, Yan, I am voting to start it now. Guozhang has already signed up to follow the release process that Chris wrote up. There will be an announcement soon. Thanks! -Yi On Thu, May 21, 2015 at 2:21 PM, Yan Fang yanfang...@gmail.com wrote: Hi guys, Just ask, are there any other bugs that we

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Pan (Data Infrastructure) * a09b1ff - SAMZA-646: Remove support for JDK6 (5 weeks ago) Jakob Homan * ffa84c0 - SAMZA-608; don't hange on serde errors in system consumers (5 weeks ago) Yi Pan * 3eb15a0 - SAMZA-629: add instructions for upgrading websites when releasing new version (6 weeks ago

Re: [VOTE] Apache Samza 0.9.1 RC1

2015-07-07 Thread Yi Pan
Hi, all, Is the vote done? We have got 4 binding and 2 un-binding votes for +1 so far. Thanks! -Yi On Mon, Jul 6, 2015 at 12:45 PM, Martin Kleppmann mar...@kleppmann.com wrote: +1 (binding) on RC1. Verified sig, built, tested with hello-samza. On 2 Jul 2015, at 19:22, Yi Pan nickpa

Re: [Discuss/Vote] upgrade to Yarn 2.6.0

2015-08-17 Thread Yi Pan
Hi, Yan, Thanks for rolling the ball! +1 from me to upgrade the minimum supported version to YARN 2.6, assuming that we are going to fix SAMZA-750 together. On Mon, Aug 17, 2015 at 4:41 PM, Yan Fang yanfang...@gmail.com wrote: Hi guys, we have been discussing upgrading to Yarn 2.6.0

Re: KIP-28 kafka processor

2015-08-18 Thread Yi Pan
Hi, Chris and Jay, Thanks for the reminder. I plan to follow up this week. Cheers! -Yi On Sun, Aug 16, 2015 at 12:27 PM, Jay Kreps jay.kr...@gmail.com wrote: +1 Any feedback would be appreciated! -Jay On Sat, Aug 15, 2015 at 3:55 PM, Chris Riccomini criccom...@apache.org wrote: Hey

Re: [Discuss/Vote] upgrade to Yarn 2.6.0

2015-08-24 Thread Yi Pan
. Thanks, Fang, Yan yanfang...@gmail.com On Thu, Aug 20, 2015 at 4:48 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Selina, Samza 0.9.1 on YARN 2.6 is the proved working solution. Best, -Yi On Thu, Aug 20, 2015 at 12:28 PM, Selina Tech swucaree...@gmail.com wrote: Hi

Re: [Discuss/Vote] upgrade to Yarn 2.6.0

2015-08-20 Thread Yi Pan
Hi, Selina, Samza 0.9.1 on YARN 2.6 is the proved working solution. Best, -Yi On Thu, Aug 20, 2015 at 12:28 PM, Selina Tech swucaree...@gmail.com wrote: Hi, Yi: If I use Samza0.9.1 and Yarn2.6.0, Will the system be failed? Sincerely, Selina On Wed, Aug 19, 2015 at 1:58 PM, Yi Pan

Re: Hopping and tumbling windows in streaming SQL

2015-06-29 Thread Yi Pan
Hey, Julian, That's awesome! I read through all the examples and it is really easy to express most of our use cases now! Thanks a lot! I have just a few additional points here: Q5. Aligned tumbling window TUMBLE does not have an align argument, so you need to use HOP. SELECT STREAM

Re: Review Request 36815: SAMZA-741 Support for versioning with Elasticsearch Producer

2015-07-29 Thread Yi Pan
Hi, Roger, I am testing the patch now. Will update the JIRA soon. Thanks! -Yi On Wed, Jul 29, 2015 at 12:11 PM, Roger Hoover roger.hoo...@gmail.com wrote: Thank you, Dan. I think we're ready to merge. Can one of the Samza committers please take a look? On Wed, Jul 29, 2015 at 11:31 AM,

Re: [DISCUSS] Release 0.10.0

2015-07-30 Thread Yi Pan
...@gmail.com wrote: Thanks, Yi. I propose that we also include SAMZA-741 for Elasticsearch versioning support with the new ES producer. I think it's very close to being merged. Roger On Tue, Jul 28, 2015 at 10:08 PM, Yi Pan nickpa...@gmail.com

Re: [DISCUSS] Release 0.10.0

2015-07-30 Thread Yi Pan
on Linux boxes - SAMZA-723, stream appender deadlock issue Thanks! -Yi On Thu, Jul 30, 2015 at 4:52 PM, Yi Pan nickpa...@gmail.com wrote: Hi, all, Thanks a lot for helping out to select the features planned in 0.10.0. Based on the above discussion, I am proposing to move the following

Re: [DISCUSS] Release 0.10.0

2015-07-30 Thread Yi Pan
excluding SAMZA-723 from the current release ? Doesn't this break the existing StreamAppender functionality in 0.9? Thanks! Navina On Thu, Jul 30, 2015 at 4:55 PM, Yi Pan nickpa...@gmail.com wrote: Sorry, hit the send button too fast. Let me correct the summary section: 29/32 tickets

Re: Coordinator URL always 127.0.0.1

2015-07-30 Thread Yi Pan
here? How does the container communicate status to the AM? -Tommy From: Yi Pan [nickpa...@gmail.com] Sent: Thursday, July 30, 2015 6:48 PM To: dev@samza.apache.org Subject: Re: Coordinator URL always 127.0.0.1 Hi, Tommy, I think that it might

Re: [DISCUSS] Release 0.10.0

2015-07-30 Thread Yi Pan
at 5:24 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Navina, The 29/30 tickets are to be excluded from 0.10.0 (i.e. moved to 0.11.0), the three tickets are either to be included in 0.10.0, or won't fix. -Yi On Thu, Jul 30, 2015 at 5:10 PM, Navina Ramesh nram...@linkedin.com.invalid wrote

Re: Coordinator URL always 127.0.0.1

2015-07-30 Thread Yi Pan
with the RM. So, the NM might still be using getLocalhost.getAddress(). I don't know of any other way to programmatically fetch the machine's hostname (apart from some hacky shell commands). Cheers, Navina On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan nickpa...@gmail.com wrote

Re: [DISCUSS] Release 0.10.0

2015-07-31 Thread Yi Pan
of exclusions! SAMZA-723 (StreamAppender bug) and SAMZA-747 (rocksdb) should be in 0.10.0. I think we don't have an ETA on the fix for SAMZA-747? Thanks! Navina On Thu, Jul 30, 2015 at 5:26 PM, Yi Pan nickpa...@gmail.com wrote: Uh... wrong math all the day today

[DISCUSS] Release 0.10.0

2015-07-28 Thread Yi Pan
Hi, all, I want to start the discussion on the release schedule for 0.10.0. There are a few important features that we plan to release in 0.10.0 and I want to start this thread s.t. we can agree on what to include in 0.10.0 release. There are the following main features added in 0.10.0: -

Re: Coordinator URL always 127.0.0.1

2015-07-30 Thread Yi Pan
Hi, Tommy, I think that it might be a commonly asked question regarding to multiple IPs on a single host. A common trick w/o changing code is (copied from SO: http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip ) {code} 1. Find

Re: Missing a change log offset for SystemStreamPartition

2015-08-11 Thread Yi Pan
supposed to load the DB to be able to use it from the consuming job? Is RocksDB the tool to use or should I use any other technique? Thanks, Jordi -Mensaje original- De: Yi Pan [mailto:nickpa...@gmail.com] Enviado el: martes, 11 de agosto de 2015 3:27 Para: dev

Re: question on commit on changelog

2015-08-04 Thread Yi Pan
Hi, Chen, So, is your goal to improve the throughput to the changelog topic or reduce the size of the changelog topic? If you are targeting for later and your KV-store truly is of the size of the input log, I don't see how it is possible. In a lot of use cases, users will only need to retain the

Re: Missing a change log offset for SystemStreamPartition

2015-08-10 Thread Yi Pan
Hi, Jordi, Agree with Yan. More specifically, your class definition should be something like: {code} public class testStore implements StreamTask, InitableTask { ... } {code} On Mon, Aug 10, 2015 at 6:08 PM, Yan Fang yanfang...@gmail.com wrote: Hi Jordi, I think, you need to implement the

Re: Kill All Jobs

2015-08-14 Thread Yi Pan
Hi, Jordi, Thanks a lot! I have added you to the contributor and assigned the bug to you. Once tested, we will commit it to samza-shell module. Cheers! -Yi On Fri, Aug 14, 2015 at 2:45 AM, Jordi Blasi Uribarri jbl...@nextel.es wrote: As I said I am new to this procedure, but I guess I have

Re: Samza and sliding window

2015-07-23 Thread Yi Pan
Yeah, that's why I added some test code in the window() to call store.all() and iterate through. I traced into it in my local environment and verified that the iterator is functioning with store.all(). -Yi On Thu, Jul 23, 2015 at 4:26 PM, Shekar Tippur ctip...@gmail.com wrote: Yi, In my

Re: Samza and sliding window

2015-07-22 Thread Yi Pan
Hi, Shekar, Here it is: http://pastebin.com/fKGpHwW6 -Yi On Wed, Jul 22, 2015 at 8:05 AM, Shekar Tippur ctip...@gmail.com wrote: Yi, I am not sure I see an attachment. Is it possible to paste that on pastebin? Shekar On Jul 21, 2015 4:27 PM, Yi Pan nickpa...@gmail.com wrote: Hi

Re: kafka producer failed

2015-07-24 Thread Yi Pan
Hi, Selina, Your question is not clear. {quote} When the messages was send to Kafka by KafkaProducer, It always failed when the message more than 3000 - 4000 messages. {quote} What's failing? The error stack shows errors on the consumer side and you were referring to failures to produce to

Re: Samza and sliding window

2015-07-21 Thread Yi Pan
is the config: http://pastebin.com/mCALEACs - Shekar On Mon, Jul 20, 2015 at 12:27 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Shekar, It would also be helpful if you can post your job configuration on the pastebin s.t. I can test the same config. Thanks! -Yi On Mon, Jul 20

Re: Handling task process failure

2015-07-16 Thread Yi Pan
Hi, Dmitry, There isn't the best way for all scenarios, IMO. For example, if the exception is critical and the application can not afford to ignore the failure, throw the exception uncaught is proper, which would fail the container and allows the application to restart from the previous

Re: Samza and sliding window

2015-07-17 Thread Yi Pan
Hi, Shekar, If possible, could you share your code somewhere? I can try to dig into it this weekend. Thanks! -Yi On Fri, Jul 17, 2015 at 1:31 PM, Shekar Tippur ctip...@gmail.com wrote: Any takers on this please? - Shekar

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
Hi, Chris, Thanks for sending out this concrete set of points here. I agree w/ all but have a slight different point view on 8). My view on this is: instead of sunset Samza as TLP, can we re-charter the scope of Samza to be the home for running streaming process as a service? My main motivation

Re: Thoughts and obesrvations on Samza

2015-07-12 Thread Yi Pan
, 2015 at 7:29 PM, Yi Pan nickpa...@gmail.com wrote: Hi, Chris, Thanks for sending out this concrete set of points here. I agree w/ all but have a slight different point view on 8). My view on this is: instead of sunset Samza as TLP, can we re-charter the scope of Samza to be the home for running

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
Hi, Garry, Just want to chime in to state our experience in LinkedIn. In LinkedIn, we have a lot of aggregation/transformation stream processing jobs that falls into the transformation category. That's also the motivation for us to develop the SQL layer on top of streams to allow easy programming

Re: Thoughts and obesrvations on Samza

2015-07-13 Thread Yi Pan
Hi, Jay, Given all the user concerns, the board disagreement on sub-projects, I am supporting your 5th option as well. As you said, even the end goal is the same, it might help to pave a smooth path forward. One thing I learned over the years is that what we planned for may not be the final

Re: Samza and sliding window

2015-07-20 Thread Yi Pan
Hi, Shekar, It would also be helpful if you can post your job configuration on the pastebin s.t. I can test the same config. Thanks! -Yi On Mon, Jul 20, 2015 at 11:11 AM, Shekar Tippur ctip...@gmail.com wrote: Yi, Thanks a lot. - Shekar

Re: Re: Samza processing reference data

2015-10-28 Thread Yi Pan
Hi, Chen, On Wed, Oct 28, 2015 at 4:05 AM, Yan Fang <yanfangw...@163.com> wrote: > > > * Is there a tentative date for 0.10.0 release? > I think it's coming out soon. @Yi Pan , he should know more about that. > There is a bit delay on the release date due to a rec

Re: Unable to submit Samza job into YARN RM

2015-11-08 Thread Yi Pan
Hi, Raja, Please watch SAMZA-727. There is a patch available already. Unfortunately, no one seems to have the time to test it and verifies it yet. If you want to pick up the patch and test/verify it, it would be great! Thanks! -Yi On Sun, Nov 8, 2015 at 12:24 AM, Raja.Aravapalli

Re: Problems upgrading Job

2015-11-12 Thread Yi Pan
es, that’s the output from JobRunner. I also tried setting a job.id to > see if this was an issue migrating from an old task checkpoint topic but I > got the same result. > > Would you like me to open a jira ticket? > > Thanks, > > Rick > > > > > On Nov 12, 2015,

Re: Problems upgrading Job

2015-11-12 Thread Yi Pan
Hi, Rick, Did you get the fix in SAMZA-723 in your test? And could you confirm that the errors are from JobRunner log? -Yi On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi wrote: > Hi, > > I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the > latest).

Re: Samza hdfs

2015-11-10 Thread Yi Pan
Hi, Aram, Welcome to the community! :) We are planning to release 0.10 in this month and YES HDFS producer is included in this release! -Yi On Tue, Nov 10, 2015 at 3:18 AM, Aram Mkrtchyan < aram.mkrtch...@picsart.com.invalid> wrote: > Hi guys, > > We are considering to move from spark

Re: Checkpoint tool not working

2015-10-29 Thread Yi Pan
Hi, Lukas, Which version of checkpoint-tool are you using? -Yi On Thu, Oct 29, 2015 at 5:39 PM, Lukas Steiblys wrote: > Hello, > > I’m trying to write the checkpoints for a Samza task supplying these > arguments to the checkpoint tool: > > bin/checkpoint-tool.sh >

Re: Does Samza create partitions automatically when sending messages?

2015-11-03 Thread Yi Pan
Hi, John, Unfortunately, Samza currently does not handle creation of output topic w/ user specified partitions. The construction of OutgoingMessageEnvelope only guarantees that the outgoing messages are partitioned by the specific key. Hence, the messages with the same partition key are always in
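
The point about partition keys can be illustrated with a toy partitioner. This is a sketch under stated assumptions: the hash function below is hypothetical and is NOT the partitioner Kafka or Samza actually uses; it only shows why a deterministic function of the key sends equal keys to the same partition, provided the partition count of the topic stays fixed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionKeySketch {
    // Hypothetical partitioner, for illustration only. Any deterministic
    // function of the key bytes has the same property: equal keys map to
    // the same partition as long as numPartitions does not change.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result in [0, numPartitions) for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int first = partitionFor("user-42", 8);
        int second = partitionFor("user-42", 8);
        System.out.println(first == second); // same key, same partition
    }
}
```

Note that the partition count is an input here; the sketch says nothing about who creates the topic or with how many partitions, which is the part the message says Samza does not handle.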

Re: KafkaCheckPointManager is too slow

2015-11-03 Thread Yi Pan
r_xxx_1 PartitionCount:1 > ReplicationFactor:3 Configs:segment.bytes=26214400,cleanup.policy=compact > > Topic: __samza_checkpoint_ver_1_for_xxx_1 Partition: 0 Leader: 66 Replicas: > 66,24,65 Isr: 24,65,66 > > So, the problem is not log-compaction. > > > > On Tue, Nov

Re: Detecting "done" on a bounded input dataset

2015-10-14 Thread Yi Pan
Hi, Kishore, First I want some clarification on your use case. 1) Scenario 1: you still want the Samza jobs continuously running, while simply want to detect the end of a certain stream. On detection, do you need to unsubscribe from the stream? Or you are still OK receiving more messages from the

Re: SamzaContainer NullPointerException for read byte[] topic from Kafka

2015-10-15 Thread Yi Pan
Hi, Selina, Your stack trace showed that the exception was thrown at line 50 in your task code. Could you point out which line is it? It would be helpful if you can add some log info regarding to the message you receive in the process() vs the message you read from Kafka console consumer.

Re: [REPORT] Apache Samza

2015-10-09 Thread Yi Pan
> > ## Issues: > - There are no issues requiring board attention at this time > > ## PMC changes: > > - Currently 10 PMC members. > - No new PMC members added in the last 3 months > - Last PMC addition was Yi Pan at Mon Jun 15 2015 > > ## LDAP changes: > > - Curr

Re: [VOTE] Apache Samza 0.9.1 RC1

2015-07-08 Thread Yi Pan
Hi, all, If there is no objection, I plan to close this vote as passed today. So far, counting the vote +1 from myself, we have got: RC1: +1 (binding) x 5 and +1 (non-binding) x2 Thanks! -Yi On Tue, Jul 7, 2015 at 11:10 AM, Yi Pan nickpa...@gmail.com wrote: Hi, all, Is the vote done? We

Re: Powered by page update

2015-07-08 Thread Yi Pan
Hey, all, Reviving this thread. It would be really nice if we can update the Powered-by page when releasing 0.9.1. Thanks a lot! -Yi On Tue, Jun 16, 2015 at 5:31 PM, Chris Riccomini criccom...@apache.org wrote: Hey all, I'm seeing a lot of new faces on the mailing list, which is really

Re: Sporadic errors in JobRunner

2015-11-18 Thread Yi Pan
Hi, Rick, I think that you are running into SAMZA-754. I have a RB available for it already. I will upload the patch and it would be good if you can try the patch to see whether that solves your problem. -Yi On Tue, Nov 17, 2015 at 12:01 PM, Rick Mangi wrote: > Hi, getting

Re: Can't get all stored values via range iterator

2015-11-18 Thread Yi Pan
was calling > next() on a range iterator twice :(. > After removing the duplicate call everything works as expected. > > Thank you! > > Alex > > On Mon, Nov 16, 2015 at 10:45 PM, Yi Pan <nickpa...@gmail.com> wrote: > > > Hi, Alexander, > > >
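
The bug Alex describes, calling next() twice per loop pass, is easy to reproduce outside Samza. The sketch below is one plausible shape of it, not the code from the thread (which the snippet does not show): a java.util.TreeMap sub-map stands in for the store's range iterator, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RangeIteratorSketch {
    // Buggy variant: next() is called twice per pass, so the key comes from
    // one entry and the value from the following one, and half the entries
    // are skipped. With an odd number of entries the second next() would
    // throw NoSuchElementException.
    static List<String> readWithDoubleNext(TreeMap<String, String> store, String from, String to) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = store.subMap(from, to).entrySet().iterator();
        while (it.hasNext()) {
            String key = it.next().getKey();
            String value = it.next().getValue(); // BUG: advances the iterator again
            out.add(key + "=" + value);
        }
        return out;
    }

    // Fixed variant: call next() once and read both fields from that entry.
    static List<String> readWithSingleNext(TreeMap<String, String> store, String from, String to) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = store.subMap(from, to).entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            out.add(entry.getKey() + "=" + entry.getValue());
        }
        return out;
    }

    static TreeMap<String, String> sampleStore() {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("a1", "v1");
        store.put("a2", "v2");
        store.put("a3", "v3");
        store.put("a4", "v4");
        store.put("b1", "outside the range"); // excluded by subMap("a", "b")
        return store;
    }

    public static void main(String[] args) {
        System.out.println(readWithDoubleNext(sampleStore(), "a", "b"));
        System.out.println(readWithSingleNext(sampleStore(), "a", "b"));
    }
}
```

On the four in-range entries the buggy loop yields only two mismatched pairs (a1=v2, a3=v4), while the fixed loop returns all four, which matches the symptom in the thread title.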

Re: Can't get all stored values via range iterator

2015-11-16 Thread Yi Pan
Hi, Alexander, Sorry to reply late on this one. I embedded my questions and comments in-between the lines: On Sun, Nov 15, 2015 at 7:07 PM, Alexander Filipchik wrote: > > nodeIterator = store.range( > String.join(".", nodeId, String.valueOf(Character.MIN_VALUE)),

Re: how to run samza container for hello-samza?

2015-08-30 Thread Yi Pan
get the two messages from two streams in StreamTask? (seems like IncomingMessageEnvelope that process() provides only represents one message from one stream, where would it take in the two messages sent to the same partition?) Thank you! Connie On Sun, Aug 30, 2015 at 8:34 PM, Yi Pan nickpa

Re: SAMZA build failing!!!

2015-08-26 Thread Yi Pan
Hi, Raja, Yeah, unfortunately, you are hitting another known issue: SAMZA-676 and https://github.com/facebook/rocksdb/issues/606 MacOS build works. And if you definitely need to build on Linux now, please use Samza 0.9.1. We are actively working on resolving this asap and apologize for the

Re: One task sending payload to multiple output streams

2015-09-02 Thread Yi Pan
Hi, Elangovan, Could you confirm how many containers in your job? And how is the outgoing messages partitioned on? Most likely, this is related to the choice on the outgoing message partition key, which is the only deciding factor for which partition of a topic the message is sent to. -Yi On

Re: question on commit on changelog

2015-08-25 Thread Yi Pan
explanation above), how do we support at least once processing? For example, what if the task commits offsets to checkpoint topic but the producer doesn't send all data in its buffer to the changelog topic and the task crashed? Chen On Tue, Aug 4, 2015 at 12:54 PM, Yi Pan nickpa...@gmail.com wrote

Re: Runtime Execution Model

2015-09-14 Thread Yi Pan
Hi, Bruno, The number of containers are configurable in YarnJobFactory via yarn.container.count. Each container is a single threaded model and you can run multiple tasks in a single container. At maximum, you can have as many containers as the number of tasks in this config to achieve 1 task /

Re: Runtime Execution Model

2015-09-14 Thread Yi Pan
> Hi Yi, > > Does a single task consume from a single partition or it consumes from > more/all partitions? > > Thanks > Bruno > > > On 14 Sep 2015, at 23:22, Yi Pan <nickpa...@gmail.com> wrote: > > > > Hi, Bruno, > > > >

Re: Runtime Execution Model

2015-09-16 Thread Yi Pan
in the current code. Yeah, in short term, I think that we can enhance the current ProcessJobFactory to do this. In the long term, it may even be good to make ProcessJob as the standard Samza process model in all environments (i.e. YARN/Mesos/standalone). > > Thanks, > > Fang, Yan >

Re: Runtime Execution Model

2015-09-16 Thread Yi Pan
per-task > single-threaded > programming model for the users > Do we already have this, or need to add that? This I think can be > done in current ProcessJob. We can have the same number of threads as the > tasks. > > Thanks, > > Fang, Yan > yanfang...@gmai

Re: process killing

2015-09-28 Thread Yi Pan
3431699703_0002_01_01 on node: host: kfk-samza01:36066 > #containers=0 available=2048 used=0 with event: KILL > 2015-09-28 11:22:23,169 INFO [ResourceManager Event Processor] > scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(115)) - > Application application_14434316
