Hi,
Wondering what naming conventions people are using for topics in Kafka.
When there's re-partitioning involved, you can end up with multiple topics
that have the exact same data but are partitioned differently. How do you
name them?
Thanks,
Roger
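One possible convention for the question above, sketched in Java: encode the dataset, the partition key, and the partition count in the topic name. The `<dataset>.by-<key>.p<count>` layout and the `TopicNames` helper are assumptions made for illustration, not an established standard.

```java
import java.util.Locale;

// Hypothetical helper: builds topic names as <dataset>.by-<key>.p<partitions>.
final class TopicNames {
    private TopicNames() {}

    static String format(String dataset, String partitionKey, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        String name = String.format(Locale.ROOT, "%s.by-%s.p%d",
                dataset, partitionKey, partitions);
        // Kafka topic names may only contain ASCII letters, digits, '.', '_' and '-'.
        if (!name.matches("[a-zA-Z0-9._-]+")) {
            throw new IllegalArgumentException("illegal topic name: " + name);
        }
        return name;
    }
}
```

With this scheme, two topics holding the same data but partitioned differently differ only in the suffix, for example `page-views.by-user-id.p16` and `page-views.by-session-id.p16`.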
Thanks, guys. I was also playing around with including partition count and
even the partition key in the topic name. My thought was that topics may
have the same data and number of partitions but only differ by partition
key. After a while, the naming does get crazy (too long and ugly). We
that
need to be put in to the YARN NM classpath.
Cheers,
Chris
On Tue, Mar 24, 2015 at 2:22 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi all,
I'm new to YARN and trying to have YARN download the Samza job tarball (
https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html
Do I need to bring up sshd on my laptop or can the tests be made to not ssh?
On Wed, Mar 25, 2015 at 4:27 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi,
I wanted to see if I could run the integration tests on the 0.9.0 branch
on my Mac.
I cloned the 0.9.0 branch from the github mirror
Hi,
I wanted to see if I could run the integration tests on the 0.9.0 branch on
my Mac.
I cloned the 0.9.0 branch from the github mirror, built everything
(./gradlew clean build), and tried to run the integration tests.
./bin/integration-tests.sh /tmp/roger
I get an error when the test script
Ah, thanks for the great explanation. Any particular reason that the
job(s) you described should not be Samza jobs?
We've started experimenting with such jobs for Druid and Elasticsearch.
For Elasticsearch, the Samza job containers join the Elasticsearch cluster
as transport nodes and use the
for 72 hours with torture test running. No consistency/data loss issues!
I did find an issue with the checker integration test, but I think it's
best left for 0.10.0, so I'll open a JIRA to track that.
On Mon, Mar 30, 2015 at 10:49 AM, Roger Hoover
Hi Felix,
1,3. We're experimenting with both Druid and Elasticsearch for this. We're
using Samza to enrich user activity and system performance events then
index them in Druid +/or Elasticsearch depending on the use case.
2. These are internal BI/Operations applications
4. We're still getting up
On Fri, Feb 20, 2015 at 9:04 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Jay,
Sorry, I didn't explain it very well. I'm talking about a stream-table
join where the table comes from a compacted topic that is used to populate
a local data store. As the stream events are processed
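A minimal sketch of the stream-table join described here, assuming string keys and values and an in-memory map standing in for the local data store. The class and method names are hypothetical and are not Samza APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of a stream-table join backed by a local store.
final class StreamTableJoin {
    // Local store populated from the compacted topic: latest value per key.
    private final Map<String, String> table = new HashMap<>();

    // Apply one record from the compacted (table) topic.
    // A null value is treated as a tombstone and deletes the key.
    void onTableUpdate(String key, String value) {
        if (value == null) {
            table.remove(key);
        } else {
            table.put(key, value);
        }
    }

    // Join one stream event against whatever the table holds right now.
    Optional<String> onStreamEvent(String key, String event) {
        String row = table.get(key);
        return row == null ? Optional.empty() : Optional.of(event + "|" + row);
    }
}
```

The join result depends on which table updates have been applied before each stream event arrives, so the relative ordering of the two inputs matters.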
Hi Geoffry,
You might find the Google Millwheel paper and recent talk relevant. That
system supports windows based on event time as well as reprocessing.
Sent from my iPhone
On Feb 23, 2015, at 4:49 PM, Geoffry Sumter vit...@gmail.com wrote:
Hey everyone,
I've been thinking about
Hi all,
I'm new to YARN and trying to have YARN download the Samza job tarball (
https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html).
From the log, it seems that the download failed. I've tested that the file
is available via curl. The error message is:
mention anything about the number of partitions when
doing so, anyways maybe it helps.
Renato M.
[1] https://www.mail-archive.com/users@kafka.apache.org/msg11976.html
2015-03-19 5:43 GMT+01:00 Roger Hoover roger.hoo...@gmail.com:
Thanks, guys. I was also playing around with including
partitioning. In that case, push or
pull is the same, yeah?
Thanks,
Roger
On Thu, Apr 2, 2015 at 3:21 PM, Roger Hoover roger.hoo...@gmail.com wrote:
Chinmay,
Thanks for your input.
I'm not understanding what the difference is. With the design that Felix
laid out, the co-located Kafka consumer
();
Class is here:
https://github.com/Quantiply/rico/blob/master/avro-serde/src/main/java/com/quantiply/avro/Join.java
Cheers,
Roger
On Thu, Apr 9, 2015 at 12:54 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Yi Pan,
Thanks for your response. I'm thinking that I'll iterate over the fields
the log from the broker as well.
On Tue, Apr 28, 2015 at 3:31 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi,
I need some help figuring out what's going on.
I'm running Kafka 0.8.2.1 and Samza 0.9.0 on YARN. All the topics have
replication factor of 2.
I'm bouncing the Kafka
Hi,
I need some help figuring out what's going on.
I'm running Kafka 0.8.2.1 and Samza 0.9.0 on YARN. All the topics have
replication factor of 2.
I'm bouncing the Kafka broker using SIGTERM (with
controlled.shutdown.enable=true). I see the Samza job log this message and
then hang (does not
/apache/samza/system/kafka/KafkaSystemProducer.scala#L143
,
otherwise, it will throw SamzaException to quit the job. So maybe some
Kafka experts in this mailing list or Kafka mailing list can help
Fang, Yan
yanfang...@gmail.com
On Tue, Apr 28, 2015 at 5:35 PM, Roger Hoover roger.hoo
AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi,
I'm trying to deploy a job to a small YARN cluster. How do I tell the
launcher script where to find the Resource Manager? I tried creating a
yarn-site.xml and setting HADOOP_CONF_DIR environment variable but it
doesn't find my config
Hi,
I'm trying to deploy a job to a small YARN cluster. How do I tell the
launcher script where to find the Resource Manager? I tried creating a
yarn-site.xml and setting HADOOP_CONF_DIR environment variable but it
doesn't find my config.
2015-04-14 22:02:45 ClientHelper [INFO] trying to connect
!
-Yi
On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi Milinda and others,
This is an Avro question but since you guys are working on Avro support
for
stream SQL, I thought I'd ask you for help.
If I have two records of type A and B as below and want
Hi Milinda and others,
This is an Avro question but since you guys are working on Avro support for
stream SQL, I thought I'd ask you for help.
If I have two records of type A and B as below and want to join them
similar to SELECT * in SQL to produce a record of type AB, is there a
simple way
Hi Warren,
Yes, I think Hello Samza is the template project to work from. I believe
that the slow message rate that you are seeing is because it's subscribed
to the Wikipedia IRC stream which may only generate a few events per
second.
That said, some of the example configuration for the
Ah, this seems to work. I saw that YarnJob.scala was referencing __package
to launch the AM itself.
yarn.am.opts=-Xmx768m -XX:+UseSerialGC
-Dlog4j.configuration=file://$(pwd)/__package/lib/log4j-am.xml
On Tue, Jun 23, 2015 at 12:40 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi,
I want
Hi,
I want the App Master to log at INFO level and the container to log at
ERROR. Is there a way to configure the AM to use a different log4j config
file?
I'm trying to set yarn.am.opts but couldn't get it to work with
system properties.
yarn.am.opts=-Xmx768m -XX:+UseSerialGC
, it will be more helpful
because many logs in chooser are trace level.
Thanks,
Fang, Yan
yanfang...@gmail.com
On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
I need some help. I have a job which bootstraps one stream and then is
supposed to read from two. When I run
partitioned bootstrapped topics?
Thanks,
Roger
On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi Yan,
I've uploaded a file with TRACE level logging here:
http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
I really appreciate your help as this is a critical issue
Hi all,
Do you think we could get this bootstrapping bug fixed before 0.9.1
release? It seems like a critical bug.
https://issues.apache.org/jira/browse/SAMZA-720
Thanks,
Roger
On Sat, Jun 20, 2015 at 10:38 PM, Yan Fang yanfang...@gmail.com wrote:
Agree. I will test it this weekend.
18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
I need some help. I have a job which bootstraps one stream and then is
supposed to read from two. When I run it on our YARN cluster with a single
container, it works correctly. When I tried it with 5 containers, it gets
hung
I need some help. I have a job which bootstraps one stream and then is
supposed to read from two. When I run it on our YARN cluster with a single
container, it works correctly. When I tried it with 5 containers, it gets
hung after consuming the bootstrap topic. I ran it with the grid script on
. It seems the fix for
SAMZA-720 is pretty localized and I am OK to push it into 0.9.1. I will be
working on back porting those changes to 0.9.1 later today and fix all the
release related issues.
Thanks!
-Yi
On Mon, Jun 22, 2015 at 10:30 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Yan
Oops. Sent too soon. I mean:
producer.batch.size=262144
producer.linger.ms=5
producer.compression.type=lz4
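For reference, the three settings above map onto the Kafka producer options `batch.size`, `linger.ms`, and `compression.type`. The sketch below builds them as plain `java.util.Properties`. It assumes the `producer.` prefix shown in the email is stripped before the values reach the Kafka producer, and `ProducerSettings` is a hypothetical name.

```java
import java.util.Properties;

// Hypothetical holder for the tuned producer settings from the email.
final class ProducerSettings {
    private ProducerSettings() {}

    static Properties kafkaProducerProps() {
        Properties props = new Properties();
        // Allow larger batches per partition (bytes).
        props.setProperty("batch.size", "262144");
        // Wait up to 5 ms for more records before sending a batch.
        props.setProperty("linger.ms", "5");
        // Compress each batch with lz4.
        props.setProperty("compression.type", "lz4");
        return props;
    }
}
```

Larger batches and a small linger time trade a few milliseconds of latency for fewer, bigger requests, which is usually what helps throughput.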
On Thu, May 21, 2015 at 9:00 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi George,
You might also try tweaking the producer settings.
producer.batch.size=262144
Metamorphosis...nice. :)
This has been a great discussion. As a user of Samza who's recently
integrated it into a relatively large organization, I just want to add
support to a few points already made.
The biggest hurdles to adoption of Samza as it currently exists that I've
experienced are:
1)
We're using 2.4.0 in production. Are there any major incompatibilities to
watch out for when upgrading to 2.6.0?
Thanks,
Roger
On Mon, Aug 17, 2015 at 4:41 PM, Yan Fang yanfang...@gmail.com wrote:
Hi guys,
we have been discussing upgrading to Yarn 2.6.0 (SAMZA-536
in since he did all the upgrade and may
have
more insights.
@Jon, could you help to comment on this?
Thanks!
-Yi
On Wed, Aug 19, 2015 at 9:12 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
We're using 2.4.0 in production. Are there any major
incompatibilities to watch out
Hi Yan,
My (uneducated) guess is that the performance gains come from batching. I
don't know if the new producer ever batches by destination broker. If not
and it only batches by (broker,topic,partition) then I doubt that one vs
two producers will affect performance as they send to different
You also may want to check if the cleaner thread in the broker is still
alive (using jstack). I've run into this issue and used the fix mentioned
in the ticket to get compaction working again.
https://issues.apache.org/jira/browse/KAFKA-1641
I'd just like to mention that a possible workaround
. To reply, visit:
https://reviews.apache.org/r/36815/#review93413
---
On July 29, 2015, 6:22 a.m., Roger Hoover wrote:
---
This is an automatically generated e-mail. To reply, visit:
https
Thanks, Yi!
On Wed, Jul 29, 2015 at 12:16 PM, Yi Pan nickpa...@gmail.com wrote:
Hi, Roger,
I am testing the patch now. Will update the JIRA soon.
Thanks!
-Yi
On Wed, Jul 29, 2015 at 12:11 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
Thank you, Dan. I think we're ready to merge
be true for other log lines added.
On July 29th, 2015, 6:22 p.m. UTC, *Roger Hoover* wrote:
Good idea. Thanks.
BTW, it didn't work like this: Logger.info("Failed to index message in
ElasticSearch.", itemResp.getFailure()) so I did this:
LOGGER.error("Failed to index document in Elasticsearch
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36815/#review93249
---
On July 28, 2015, 6:15 a.m., Roger Hoover wrote:
---
This is an automatically
: Is it guaranteed that there is no DeleteResponse here?
It would be good to at least log a warn if we get an unexpected response
here.
Roger Hoover wrote:
It is guaranteed that you will not get a DeleteResponse back because the
producer currently only allows IndexRequests. In the future
Thanks, Yi.
I propose that we also include SAMZA-741 for Elasticsearch versioning
support with the new ES producer. I think it's very close to being merged.
Roger
On Tue, Jul 28, 2015 at 10:08 PM, Yi Pan nickpa...@gmail.com wrote:
Hi, all,
I want to start the discussion on the release
, 2015, 5:17 a.m., Roger Hoover wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36815/
---
(Updated July 29, 2015, 5:17 a.m
---
Refactored DefaultIndexRequestFactory to make it easier to subclass and
customize to handle version and version_type parameters.
Thanks,
Roger Hoover
. Thanks for the reply.
-Jordan
On Thu, Jul 23, 2015 at 9:32 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Hi Jordan,
I ran into a similar issue when using snappy compression and the new
producer. If you disable compression or switch to lz4 or gzip, does the
issue go away?
Cheers
were created and indexed.
Thanks,
Roger Hoover
overhead, but also keep all the methods consistent for better
readability. What do you think?
Roger Hoover wrote:
Sounds good. I only baulked on it the first time because I'm not that
skilled with Scala type declarations yet. :) I can make this work
I take it back. It seems it [can't
/
Testing
---
Refactored DefaultIndexRequestFactory to make it easier to subclass and
customize to handle version and version_type parameters.
Thanks,
Roger Hoover
Hi Jordan,
I ran into a similar issue when using snappy compression and the new
producer. If you disable compression or switch to lz4 or gzip, does the
issue go away?
Cheers,
Roger
On Wed, Jul 22, 2015 at 11:54 PM, Jordan Shaw jor...@pubnub.com wrote:
Hey Everyone,
I'm getting an:
Hi Dan and Samza devs,
I have a use case for which I need to set an external version on
Elasticsearch documents. Versioning (
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning)
lets you prevent duplicate messages from temporarily overwriting new
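The behavior of external versioning can be illustrated with a small in-memory model: a write is applied only when its version is strictly greater than the version already stored, so a late-arriving duplicate with an older version is rejected. This is a sketch of the semantics only. `ExternalVersionStore` is a hypothetical class, not the Elasticsearch client API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of external versioning semantics.
final class ExternalVersionStore {
    private static final class Doc {
        final long version;
        final String body;

        Doc(long version, String body) {
            this.version = version;
            this.body = body;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();

    // Returns true if the write was applied, false if it was rejected as stale.
    boolean index(String id, long version, String body) {
        Doc current = docs.get(id);
        if (current != null && version <= current.version) {
            return false; // older or duplicate version: keep the stored document
        }
        docs.put(id, new Doc(version, body));
        return true;
    }

    // Returns the stored body, or null if the id is unknown.
    String get(String id) {
        Doc doc = docs.get(id);
        return doc == null ? null : doc.body;
    }
}
```

A natural choice for the version in a streaming job is a monotonically increasing value carried by the message itself, such as an event timestamp, though which field to use is up to the application.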
Hi all,
I've started using the new Elasticsearch System Producer (many thanks,
Dan!) and decided to add some metrics to it.
The JIRA ticket and review request links are here:
https://issues.apache.org/jira/browse/SAMZA-733
https://reviews.apache.org/r/36473/
Cheers,
Roger
correctly count how many Elasticsearch documents
were created and indexed.
Thanks,
Roger Hoover
://reviews.apache.org/r/36473/#review91670
---
On July 14, 2015, 6:12 a.m., Roger Hoover wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36473/#review91670
---
On July 14, 2015, 6:12 a.m., Roger Hoover wrote
? This can simplify things a little.
Roger Hoover wrote:
I don't see how it simplifies things because I have to implement all the
methods in the Scala trait. I'm having trouble getting the newGauge
signatures to match.
```
public class ElasticsearchSystemProducerMetrics
and that the metrics correctly count how many Elasticsearch documents
were created and indexed.
Thanks,
Roger Hoover
for Elasticsearch producer appear in JMX and the metrics
stream and that the metrics correctly count how many Elasticsearch documents
were created and indexed.
Thanks,
Roger Hoover
Hi Samza devs,
I ran into an issue with Samza 0.9.1 where I had a serialization exception
thrown in the MetricsSnapshotReporter. It's very hard to find because
nothing is logged and the metrics just stop getting scheduled. Samza
should catch all exceptions in that thread, log them, and suppress
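A symptom like this is typical of periodic tasks run on a `java.util.concurrent.ScheduledExecutorService`: when a task throws, later executions are silently suppressed. Whether the reporter is scheduled that way is an assumption here. The sketch below shows the general remedy of wrapping the task so that every exception is handed to a handler (for example, a logger) instead of escaping. `SafeRunnable` is a hypothetical helper, not Samza code.

```java
import java.util.function.Consumer;

// Hypothetical wrapper that keeps a periodic task from dying on an exception.
final class SafeRunnable {
    private SafeRunnable() {}

    static Runnable wrap(Runnable task, Consumer<Exception> onError) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                // Report the failure and swallow it so the scheduler keeps
                // running the task on its next tick.
                onError.accept(e);
            }
        };
    }
}
```

The wrapped task can then be passed to `scheduleAtFixedRate` or `scheduleWithFixedDelay` in place of the original one.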
I tried it once with 0.9.1 and it didn't work for me either. I didn't have
time to examine it more carefully at the time.
Roger
On Thu, Oct 29, 2015 at 10:05 PM, Lukas Steiblys
wrote:
> I'm using Samza 0.9.1.
>
> Lukas
>
> On 10/29/15, Yi Pan wrote:
Great. Thanks, Yi.
On Mon, Oct 5, 2015 at 10:25 AM, Yi Pan <nickpa...@gmail.com> wrote:
> Hi, Roger,
>
>
> On Sat, Oct 3, 2015 at 11:13 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > As previously discussed, the biggest request I
> > have is b
the advantage of putting all that complex
stuff behind a clean api that the clients are already going to be
implementing for their consumer, so the added functionality for stream
processing beyond a consumer becomes very minor.
-Jay
On Tue, Jul 7, 2015 at 10:49 AM, Roger Hoover roger.hoo...@gmail.com
in terms of host placement since there is already a configurable
partition movement timeout and task-by-task state reuse with a check on
state validity.
-Jay
On Fri, Jul 10, 2015 at 8:34 AM, Roger Hoover roger.hoo...@gmail.com
wrote:
That would be great to let Kafka do as much heavy lifting
Hi Selina,
If you want to use Confluent's schema registry for Avro, then I have an
example in this repo:
https://github.com/theduderog/hello-samza-confluent
Cheers,
Roger
On Tue, Nov 17, 2015 at 12:32 AM, Selina Tech wrote:
> Dear All:
> Do you know where I can
Thanks for sharing!
Tao, did you use YARN to run 15 containers or is there a way to have them
statically divide up the tasks?
On Mon, Aug 24, 2015 at 12:03 PM, Ed Yakabosky
eyakabo...@linkedin.com.invalid wrote:
Hi Samza open source,
I want to share that Tao Feng
Hi Yi,
Thank you for sharing this update and perspective. I tend to agree that
for simple, stateless cases, things could be easier and hopefully KStreams
may help with that. I also appreciate a lot of features that Samza already
supports for operations. As previously discussed, the biggest
Elias,
I would also love to be able to deploy Samza on Kubernetes with dynamic
task management. Thanks for sharing this. It may be a good interim
solution.
Roger
On Sun, Nov 29, 2015 at 11:18 AM, Elias Levy
wrote:
> I've been exploring Samza for stream
Awesome. Thanks.
On Sun, Nov 29, 2015 at 3:25 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:
> Roger,
>
> You are welcome. If you want to experiment, you can use my hello samza
> <https://hub.docker.com/r/elevy/hello-samza/> Docker image.
>
> On Sun, N
and configure
> options do not change, I would vote to replace the implementation w/
> HTTP-based ElasticsearchSystemProducer.
>
> Thanks for putting this new additions up!
>
> -Yi
>
> On Tue, Feb 9, 2016 at 10:39 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
Hi Jeremiah,
There's currently no way to do that. I think the best way to modify the
existing ElasticsearchSystemProducer would be to add a config option for a
callback to let you customize this behavior. Basically, a pluggable
listener (
LOGGER.info("Failed to index document in Elasticsearch: " +
> itemResp.getFailureMessage());
>} else {
> hasFatalError = true;
> LOGGER.error("Failed to index document in Elasticsearch: " +
> itemResp.getFailureMessage());
>}
>
> - jeremiah
>
+1 - Thanks for bringing this up, Yi. I've done it both ways and feel pull
requests are much easier.
Sent from my iPhone
> On Feb 18, 2016, at 4:25 PM, Navina Ramesh
> wrote:
>
> +1
>
> Haven't tried any contribution with pull requests. But sounds simpler than
/rico/blob/master/samza-elasticsearch/src/main/java/com/quantiply/elasticsearch/HTTPBulkLoader.java#L237-L272
> Thanks!
>
> -Yi
>
> On Tue, Feb 9, 2016 at 4:19 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Hi Yi,
> >
> > It could be merged into
Jose,
It would be great if you could share it. I'm interested in trying to use
it as well.
Thanks,
Roger
On Wed, Mar 2, 2016 at 2:31 PM, José Barrueta wrote:
> Hi guys,
>
> At Stormpath, we made a custom samza 10 version merging SAMZA-41 into it,
> it's working well, so