Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Jakob Homan
On 21 May 2015 at 23:38, Yi Pan wrote: > if you can give a +1 to > move forward quickly with 0.9.1 release, that would be great! Done. Great job. Thanks.

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Hi, Jakob and Yan, I have back-ported SAMZA-608, SAMZA-658, and SAMZA-616 to the 0.9.1 branch. The 0.9.1 release will now include these bugfixes: SAMZA-608, SAMZA-616, SAMZA-658, SAMZA-662. I think that would be a good minimum list of bugfixes. I have attached the back-ported fixes to all JIRAs mentioned a

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yan Fang
Agreed to back-port those bug fixes. On the branch/tag methodology, I think we can start in the next release because we have already opened the 0.9.1 branch and applied the patch to it. Thanks, Fang, Yan yanfang...@gmail.com On Thu, May 21, 2015 at 4:21 PM, Yi Pan wrote: > Hi, Jakob, > > Thanks a lot

Re: Samza YarnJobFactory support for https

2015-05-21 Thread José Barrueta
Hi Yan, Happy to contribute something back to this amazing project! I'll go through the guide and submit a patch tomorrow! Best, José Luis On Thu, May 21, 2015 at 11:17 PM, Yan Fang wrote: > Hi José, > > Thank you. If you can contribute a patch for this fix (SAMZA-688 >

Re: Samza YarnJobFactory support for https

2015-05-21 Thread Yan Fang
Hi José, Thank you. If you can contribute a patch for this fix (SAMZA-688), it would be very helpful. And here is the guide for contributing. Cheers, Fang, Yan yanfang...@

Review Request 34585: SAMZA-681: create a unit test harness to easily test samza tasks

2015-05-21 Thread Luis De Pombo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34585/ --- Review request for samza. Repository: samza Description --- SAMZA-681: c

Re: Samza YarnJobFactory support for https

2015-05-21 Thread Yi Pan
Hi, Jose, Thanks a lot! I have opened a JIRA to support that: SAMZA-688. -Yi On Thu, May 21, 2015 at 8:03 PM, José Barrueta wrote: > Hi all, > > Once we figured out the problem we were able to easily come up with a > solution for this. > > Basically, we want to be able to set the `yarn.pac

Samza YarnJobFactory support for https

2015-05-21 Thread José Barrueta
Hi all, Once we figured out the problem, we were able to easily come up with a solution for this. Basically, we want to be able to set the `yarn.package.path` property to look for an artifact over `https`. When we did this, we ran into this exception: Exception in thread "main" java.io.IOExcepti

Review Request 34574: SAMZA-608; don't hang on serde errors in system consumers

2015-05-21 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34574/ --- Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Guozhang Wan

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Hi, Jakob, Thanks a lot for the thorough check-through. I agree w/ your point that those bug fixes are important and should be back ported. @Yan and others, what are your opinions? I will back port SAMZA-608 now. On the branch/tag methodology, I would prefer to make the change in the next releas

Re: Review Request 34011: Add support for a Graphite Metrics Reporter

2015-05-21 Thread Chinmay Soman
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34011/#review84718 --- docs/learn/documentation/versioned/container/metrics.md

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Jakob Homan
Currently we have individual branches for each of the point releases (0.8.1, 0.9.0, 0.9.1, etc). It may be better to have a single 0.9 branch and then demarcate the actual releases through git tags. This way the branch becomes more stable over time as bug fixes are applied (in a magical world whe

Re: Samza job throughput much lower than Kafka throughput

2015-05-21 Thread Guozhang Wang
Hi George, How is the incoming traffic to your source topic? Is it more than 130K? Guozhang On Thu, May 21, 2015 at 11:44 AM, George Li wrote: > Hi Roger, > > These parameters don't seem to affect throughput much, probably because my > test job just reads from Kafka and doesn't write to it? > >

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yan Fang
Sounds good! Thanks, Fang, Yan yanfang...@gmail.com On Thu, May 21, 2015 at 2:24 PM, Yi Pan wrote: > Hi, Yan, > > I am voting to start it now. Guozhang has already signed up to follow the > release process that Chris wrote up. There will be an announcement soon. > > Thanks! > > -Yi > > On Thu,

Re: Do we want to release the 0.9.1 now?

2015-05-21 Thread Yi Pan
Hi, Yan, I am voting to start it now. Guozhang has already signed up to follow the release process that Chris wrote up. There will be an announcement soon. Thanks! -Yi On Thu, May 21, 2015 at 2:21 PM, Yan Fang wrote: > Hi guys, > > Just asking: are there any other bugs that we want to back-port

Do we want to release the 0.9.1 now?

2015-05-21 Thread Yan Fang
Hi guys, Just asking: are there any other bugs that we want to back-port to 0.9.1 besides SAMZA-662? If not, I think we can prepare the 0.9.1 release and ask for the vote. Cheers, Fang, Yan yanfang...@gmail.com

Review Request 34564: SAMZA-401: add utilization metrics for the event loop

2015-05-21 Thread Luis De Pombo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34564/ --- Review request for samza. Repository: samza Description --- SAMZA-401: a

Re: Number of partitions

2015-05-21 Thread Michael Ravits
Hey Garry, Thanks for the good advice. I'm definitely going to read the Confluent blog post. I actually went back and re-read the section on containers before reading your reply and realized I was mixing up jobs and containers. Thanks, Michael On Thu, May 21, 2015 at 10:50 PM, Garry Turki

RE: Number of partitions

2015-05-21 Thread Garry Turkington
Hi, The other variable to think about here is the task to container mapping. Each job will indeed have 1 task per input partition in the underlying topic but you can then spread those 500 instances across multiple containers in your Yarn grid: http://samza.apache.org/learn/documentation/0.9/co
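The task-to-container mapping described above can be illustrated with a small sketch. It assumes one task per input partition and spreads the tasks round-robin over a fixed number of containers. The function name `assign_tasks` and the container count of 8 are hypothetical, and the round-robin rule is an assumption chosen for illustration; it is not taken from Samza's own grouping code.

```python
def assign_tasks(num_partitions, num_containers):
    """Spread one task per partition over containers, round-robin.

    Illustrative only: the grouping rule here is an assumption,
    not Samza's actual implementation.
    """
    if num_containers < 1:
        raise ValueError("need at least one container")
    containers = [[] for _ in range(num_containers)]
    for partition in range(num_partitions):
        # Task for partition p lands in container p mod num_containers.
        containers[partition % num_containers].append(partition)
    return containers


# 500 partitions (so 500 tasks) spread over a hypothetical 8 containers.
layout = assign_tasks(500, 8)
sizes = [len(tasks) for tasks in layout]
print(sizes)
```

With this rule the per-container task counts differ by at most one, so adding containers lowers the number of tasks each process has to run without changing the partition count.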

Re: Number of partitions

2015-05-21 Thread Lukas Steiblys
Each job will get all the partitions and each task (500 of them) within the job will get 1 partition. So there will be 500 processes working through the log. I'd try to figure out what your scaling needs are for the next 2-3 years and then calculate your resource requirements accordingly (how

Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store

2015-05-21 Thread Navina Ramesh
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33419/#review84612 --- Apart from some questions, everything looks good to me! docs/learn

Re: Samza job throughput much lower than Kafka throughput

2015-05-21 Thread George Li
Hi Roger, These parameters don't seem to affect throughput much, probably because my test job just reads from Kafka and doesn't write to it? Thanks, George From: Roger Hoover To: "dev@samza.apache.org" , Date: 21/05/2015 12:04 PM Subject:Re: Samza job throughput much lower t

Re: Samza job throughput much lower than Kafka throughput

2015-05-21 Thread George Li
Hi Guozhang, That was introduced when I was trying to figure out odd handshakes between Kafka pulls. I turned it off, and the throughput increased by about 5%. The Kafka reader program I ran as a baseline is from the Kafka repository, i.e., $KAFKA_BIN/kafka-consumer-perf-test.sh --zookeeper $ZOOKEEPER --

Re: Yarn jobs in accepted state

2015-05-21 Thread Navina Ramesh
Hi Shekar, Actually, I am not able to view it even after signing in. Like Yan suggested, please try to share it elsewhere on the internet. Would Google Drive work? Thanks! Navina On 5/21/15, 11:21 AM, "Yan Fang" wrote: >Hi Shekar, > >This website requires signing in. Could you paste it to another more ope

Re: Yarn jobs in accepted state

2015-05-21 Thread Yan Fang
Hi Shekar, This website requires signing in. Could you paste it to another, more open place? There are a lot if you just google it. Sorry for being a little picky. Also, when you see a lot of jobs in the accepted state, what does the log say? Thanks, Fang, Yan yanfang...@gmail.com On Wed, May 20, 2

Re: Number of partitions

2015-05-21 Thread Michael Ravits
Well, since the number of partitions can't be changed after the system starts running, I wanted to have the flexibility to grow a lot without stopping for an upgrade. I just wonder what would be a tolerable number for Samza. For example, if I'd start with 5 jobs, each will get 100 partitions. Is this reas

Re: Number of partitions

2015-05-21 Thread Lukas Steiblys
500 is a bit extreme unless you're planning on running the job on some 200 machines and trying to exploit their full power. I personally run 4 in production for our system processing 100 messages/s and there's plenty of room to grow. Lukas On Thursday, May 21, 2015, Michael Ravits wrote: > Hi, > >

Re: Samza job throughput much lower than Kafka throughput

2015-05-21 Thread Roger Hoover
Oops. Sent too soon. I mean: producer.batch.size=262144 producer.linger.ms=5 producer.compression.type=lz4 On Thu, May 21, 2015 at 9:00 AM, Roger Hoover wrote: > Hi George, > > You might also try tweaking the producer settings. > > producer.batch.size=262144 > producer.linger
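As a rough sketch of how the three corrected producer settings might end up in a job's properties file: the snippet below prefixes each key with `systems.<system-name>.producer.`. Both that prefix convention and the system name `kafka` are assumptions here, since the message gives only the short `producer.*` form, and `render_producer_config` is a hypothetical helper.

```python
def render_producer_config(system_name, settings):
    """Render producer settings as key=value lines, one per setting.

    The 'systems.<name>.producer.' prefix is an assumption about how
    the short 'producer.*' keys from the message map into a job config.
    """
    lines = []
    for key in sorted(settings):
        lines.append(f"systems.{system_name}.producer.{key}={settings[key]}")
    return "\n".join(lines)


# Values copied from the message; 262144 bytes is 256 KiB.
settings = {
    "batch.size": 262144,
    "linger.ms": 5,
    "compression.type": "lz4",
}
fragment = render_producer_config("kafka", settings)
print(fragment)
```

Every line uses `=` as the separator, matching the form given in the correction.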

Re: Samza job throughput much lower than Kafka throughput

2015-05-21 Thread Roger Hoover
Hi George, You might also try tweaking the producer settings. producer.batch.size=262144 producer.linger.ms=5 producer.compression.type: lz4 On Wed, May 20, 2015 at 9:30 PM, Guozhang Wang wrote: > Hi George, > > Is there any reason you need to set the following configs? > > sys

Review Request 34539: SAMZA-401: add utilization metrics for the event loop

2015-05-21 Thread Luis De Pombo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34539/ --- Review request for samza. Repository: samza Description --- SAMZA-401: a

Number of partitions

2015-05-21 Thread Michael Ravits
Hi, I wonder what considerations I need to account for in regard to the number of partitions in input topics for Samza. When testing with a 500-partition topic with one Samza job I noticed the startup time to be very long. Are there any problems that might occur when dealing with this nu

Re: Review Request 34011: Add support for a Graphite Metrics Reporter

2015-05-21 Thread Luis De Pombo
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34011/ --- (Updated May 21, 2015, 7:54 a.m.) Review request for samza. Repository: samza