Re: Maximum number of producers per topic per broker

2016-07-23 Thread Ewen Cheslack-Postava
There's no strict limit on the number of producers. If you're hitting some CPU limit, perhaps you are simply overloading the broker? 6 or 700 brokers doesn't sound that bad, but if they are producing too much data then of course eventually the broker will become overwhelmed. How much total data is

Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-23 Thread Ewen Cheslack-Postava
I'd suggest using the new consumer instead of the old consumer. We've refined the implementation such that even with auto-commit you should get at least once processing in the worst case (and when there aren't failures, exactly once). The 0.10.0.0 release should get all of these semantics right.

Re: Topic naming convention and common message envelope.

2016-07-23 Thread Ewen Cheslack-Postava
On Tue, Jul 19, 2016 at 12:48 AM, Denis Mikhaylov wrote: > Hi, I plan to use Kafka for event-based integration between services. And > I have two questions that bother me a lot: > > 1) What topic naming convention do you use? > There's no strict convention, but using '.' or

Re: Deploying new connector to existing Kafka cluster

2016-07-23 Thread Ewen Cheslack-Postava
You're right that today you need to distribute jars manually today -- we don't have a built-in distribution mechanism, we just depend on what's on the classpath. Once you've got the jars installed, to make the jars accessible you'll need to do a rolling bounce with updated classpaths. We know

Re: Understanding Consumer Pooling vs Streaming

2016-07-23 Thread Ewen Cheslack-Postava
They implement generally the same consumer group functionality, but the new consumer (your option 1) is more modern, will be supported for a long time (whereas option 2 will eventually be deprecated and removed), and has a better implementation. The new consumer takes into account a lot of lessons

Re: Opportunity to contribute in Apache Kafka

2016-07-23 Thread Ewen Cheslack-Postava
Hey Shubham, I'd highly recommend a couple of newbie bugs just to get familiarized ( https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%22%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20key%20DESC ) After getting familiarized with the project

Re: Kafka Active Segment List Diagram Typo?

2016-07-23 Thread Ewen Cheslack-Postava
Yes, that's just a bug in the image -- the second log segment should hold messages in the range indicated in the left side of the image. -Ewen On Sun, Jul 3, 2016 at 10:03 AM, Adam Cardenas wrote: > Good day Kafka users, > > Was looking over the current Kafka docs; >

Re: consumer.subscribe(Pattern p , ..) method fails with Authorizer

2016-07-23 Thread Ewen Cheslack-Postava
Manikumar, Yeah, that seems bad. Seems like maybe instead of moving to server-side processing we should make the metadata request limit results to topics the principal is authorized for? I suspect this is important anyway since generally it seems we don't want to reveal errors when there's

Re: 0.9 client persistently high CPU usage

2016-07-23 Thread Ewen Cheslack-Postava
That exception indicates that another thread is interrupting the consumer thread. Is there something else in the process that could be causing that interruption? The -1 broker ID actually isn't unusual. Since broker IDs should be positive, this is just a simple approach to identifying bootstrap

Re: How to know the producer of one topic?

2016-07-23 Thread Ewen Cheslack-Postava
Unfortunately there's no ID for the producer of messages -- the client ID is included when the request is sent, but it isn't recorded on disk. You *might* be able to dig out the producer of bad messages from the Kafka logs, but there's nothing in the stored data that would lead you directly to the

Re: Find partition offsets in a kerberized kafka cluster

2016-07-23 Thread Ewen Cheslack-Postava
The GetOffsetShell utility still uses the SimpleConsumer, so I don't think there's a way to use it with Kerberos. The new consumer doesn't expose all the APIs that SimpleConsumer does, so I don't think the tool can be converted to the new consumer yet. -Ewen On Wed, Jul 6, 2016 at 11:02 AM,

Re: kafka-connect-hdfs failure due to corrupt WAL file

2016-08-02 Thread Ewen Cheslack-Postava
If you're not worried about duplicates, then yes, you can delete the WAL to recover. By the way, we're aware of some issues in that code that we're addressing here: https://github.com/confluentinc/kafka-connect-hdfs/pull/96 -Ewen On Tue, Aug 2, 2016 at 3:12 PM, Prabhu V

Re: Consumer poll in 0.9.0.1 hanging

2016-08-02 Thread Ewen Cheslack-Postava
It seems like we definitely shouldn't block indefinitely, but what is probably happening is that the consumer is fetching metadata, not finding the topic, then getting hung up somewhere. It probably won't hang indefinitely -- there is a periodic refresh of metadata, defaulting to every 5 minutes,

Re: How many TCP connections Java producers opens to feed data to broker?

2016-08-02 Thread Ewen Cheslack-Postava
One connection to each broker. You can get very high throughput even with a single TCP connection. The producer handles batching internally so you don't send small requests if you have high data rates. -Ewen On Wed, Jul 27, 2016 at 5:41 AM, Vladimir Picka wrote: > Hello,

Re: KafkaConsumer blocks indefinitely when server settings are wrong

2016-08-02 Thread Ewen Cheslack-Postava
This is unfortunate, but a known issue. See https://issues.apache.org/jira/browse/KAFKA-1894 The producer suffers from a similar issue with its initial metadata fetch on the first send(). -Ewen On Thu, Jul 28, 2016 at 12:46 PM, Oleg Zhurakousky < ozhurakou...@hortonworks.com> wrote: > Also,

Re: Same partition number of different Kafka topcs

2016-08-02 Thread Ewen Cheslack-Postava
Jack, The partition is always selected by the client -- if it weren't the brokers would need to forward requests since different partitions are handled by different brokers. The only "default Kafka partitioner" is the one that you could consider "standardized" by the Java client implementation.

Re: Kafka REST proxy performance vs Akka HTTP

2016-08-04 Thread Ewen Cheslack-Postava
On Thu, Aug 4, 2016 at 2:11 AM, vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > Hi all, > I am familiar with Akka HTTP and have built a prototype of an application > that ingest data to kafka topic with Akka HTTP. > My app is using a new Kafka Producer shared among multiple Akka

Re: Runtime JARs for 0.10 Producer and Consumer

2016-08-04 Thread Ewen Cheslack-Postava
Alex, Normally you'd just depend on kafka-clients 0.10.0.0 and let any transitive dependencies get pulled in automatically. This snippet from the Kafka build files shows the libraries that clients depend on: dependencies { compile libs.lz4 compile libs.snappy compile libs.slf4jApi So

Re: Kafka Connect HdfsSink and the Schema Registry

2016-06-19 Thread Ewen Cheslack-Postava
t; > } > > When the connector got a message and did a lookup it didn't have the > "namespace" field and the lookup failed. I then posted a new version of > the schema without the "namespace" field and it worked. > > -Dave > > Dave Tauzell | Senior Software

Re: Fail fast producer/consumer when no connection to Kafka brokers cluster

2016-06-19 Thread Ewen Cheslack-Postava
You can adjust request.timeout.ms, which is shared between both new producer and new consumer. I don't think its quite what you want, but probably the closest that exists across both clients. There's not much more than that -- when you say "when the connection to the entire broker cluster is lost"

Re: Automatic Offset Committing

2017-01-23 Thread Ewen Cheslack-Postava
topic test* > > > On 24 Jan. 2017 3:25 pm, "Ewen Cheslack-Postava" <e...@confluent.io> > wrote: > > > The new consumer only supports committing offsets to Kafka. (It doesn't > > even have connection info to ZooKeeper, which is a general trend in Kafk

Re: min hardware requirement

2017-01-23 Thread Ewen Cheslack-Postava
Smaller servers/instances work fine for tests, as long as the workload is scaled down as well. Most memory on a Kafka broker will end up dedicated to page cache. For, e.g., 1GB of RAM just consider that you probably won't be leaving much room to cache the data so your performance may suffer a bit.

Re: Does offsetsForTimes use createtime of logsegment file?

2017-01-23 Thread Ewen Cheslack-Postava
> On Fri, Jan 6, 2017 at 1:26 PM, Ewen Cheslack-Postava <e...@confluent.io> > wrote: > > > It would return the earlier one, offset 0. > > > > -Ewen > > > > On Thu, Jan 5, 2017 at 10:15 PM, Vignesh <vignesh.v...@gmail.com> wrote: > > > >

Re: Kafka Protocol : about clients and number of TCP connections

2017-01-23 Thread Ewen Cheslack-Postava
The only other connections to brokers would be to the bootstrap brokers in order to collect cluster metadata. -Ewen On Wed, Jan 18, 2017 at 3:48 AM, Paolo Patierno wrote: > Hello, > > > I'd like to know the number of connections that Kafka clients establish. I > mean ... >

Re: Query on MirrorMaker Replication - Bi-directional/Failover replication

2017-01-23 Thread Ewen Cheslack-Postava
o replication. >> [Query]In order to lookup the offset deltas before initiating the >> consumers on the original cluster, is there any recommended >> mechanism/tooling that can be leveraged? >> > There isn't tooling for this, and the intent in this step is to le

Re: Kafka Connect: Paused connector but still processing data

2017-01-23 Thread Ewen Cheslack-Postava
There was this issue: https://issues.apache.org/jira/browse/KAFKA-4527 which was a test failure that had to do with updating the status as soon as the request to pause the connector was received rather than after it was processed. The corresponding PR fixed that (and will be released in 0.10.2.0).

Re: Automatic Offset Committing

2017-01-23 Thread Ewen Cheslack-Postava
The new consumer only supports committing offsets to Kafka. (It doesn't even have connection info to ZooKeeper, which is a general trend in Kafka clients -- all details of ZooKeeper are being hidden away from clients, even administrative functions like creating topics.) -Ewen On Thu, Jan 19,

Re: Kafka JDBC connector vs Sqoop

2017-01-30 Thread Ewen Cheslack-Postava
For MySQL you would either want to use Debezium's connector (which can handle bulk dump + incremental CDC, but requires direct access to the binlog) or the JDBC connector (does an initial bulk dump + incremental queries, but has limitations compared to a "true" CDC solution). Sqoop and the JDBC

Re: using kafka log compaction withour key

2017-01-30 Thread Ewen Cheslack-Postava
The log compaction functionality uses the key to determine which records to deduplicate. You can think of it (very roughly) as deleting entries from a hash map as the value for each key is overwritten. This functionality doesn't have much of a point unless you include keys in your records. -Ewen

Re: special characters in kafka log

2017-01-30 Thread Ewen Cheslack-Postava
Not sure what special characters you are referring to, but for data in the key and value fields in Kafka, it handles arbitrary binary data. "Special characters" aren't special because Kafka doesn't even inspect the data it is handling: clients tell it the length of the data and then it copies that

Re: kafka connect architecture

2017-01-30 Thread Ewen Cheslack-Postava
On Mon, Jan 30, 2017 at 8:24 AM, Koert Kuipers wrote: > i have been playing with kafka connect in standalone and distributed mode. > > i like standalone because: > * i get to configure it using a file. this is easy for automated deployment > (chef, puppet, etc.). configuration

Re: Upgrade questions

2017-01-30 Thread Ewen Cheslack-Postava
Note that the documentation that you linked to for upgrades specifically lists configs that you need to be careful to adjust in your server.properties. In fact, the server.properties shipped with Kafka is meant for testing only. There are some configs in the example server.properties that are not

Re: kafka_2.10-0.8.1 simple consumer retrieves junk data in the message

2017-01-30 Thread Ewen Cheslack-Postava
What are the 26 additional bytes? That sounds like a header that a decoder/deserializer is handling with the high level consumer. What class are you using to deserialize the messages with the high level consumer? -Ewen On Fri, Jan 27, 2017 at 10:19 AM, Anjani Gupta

Re: When publishing to non existing topic, TimeoutException is thrown instead of UnknownTopicOrPartitionException

2017-01-30 Thread Ewen Cheslack-Postava
Stevo, Agreed that this seems broken if we're just timing out trying to fetch metadata if we should be able to tell that the topic will never be created. Clients can't explicitly tell whether auto topic creation is on. Implicit indication via the error code seems like a good idea. My only

Re: Ideal value for Kafka Connect Distributed tasks.max configuration setting?

2017-01-30 Thread Ewen Cheslack-Postava
On Fri, Jan 27, 2017 at 10:49 AM, Phillip Mann wrote: > I am looking to product ionize and deploy my Kafka Connect application. > However, there are two questions I have about the tasks.max setting which > is required and of high importance but details are vague for what to >

[ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-22 Thread Ewen Cheslack-Postava
Mahadevan, Ashish Singh, Balint Molnar, Ben Stopford, Bernard Leach, Bill Bejeck, Colin P. Mccabe, Damian Guy, Dan Norwood, Dana Powers, dasl, Derrick Or, Dong Lin, Dustin Cote, Edoardo Comar, Edward Ribeiro, Elias Levy, Emanuele Cesena, Eno Thereska, Ewen Cheslack-Postava, Flavio Junqueira, fpj

Re: [VOTE] 0.10.2.0 RC2

2017-02-18 Thread Ewen Cheslack-Postava
from the source and ran the quickstart successfully on Ubuntu, Mac, > Windows (64 bit). > > Thank you Ewen for running the release. > > --Vahid > > > > From: Ewen Cheslack-Postava <e...@confluent.io> > To: d...@kafka.apache.org, "users@k

[VOTE] 0.10.2.0 RC2

2017-02-14 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers, This is the third candidate for release of Apache Kafka 0.10.2.0. This is a minor version release of Apache Kafka. It includes 19 new KIPs. See the release notes and release plan (https://cwiki.apache.org/conf

[VOTE] 0.10.2.0 RC1

2017-02-10 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers, This is RC1 for release of Apache Kafka 0.10.2.0. This is a minor version release of Apache Kafka. It includes 19 new KIPs. See the release notes and release plan (https://cwiki.apache.org/ confluence/display/KAFKA/Release+Plan+0.10.2.0) for

Re: Kafka Connect requestTaskReconfiguration

2017-01-15 Thread Ewen Cheslack-Postava
This is currently expected. Internally the Connect cluster uses the same rebalancing process as consumer groups which means it has similar limitations -- all tasks must stop just as you would need to stop consuming from all partitions and commit offsets during a consumer group rebalance. There's

Re: Schema Registry in Staging Environment

2016-09-27 Thread Ewen Cheslack-Postava
Lawrence, There are two common ways to approach registration of schemas. The first is to just rely on auto-registration that the serializers do (I'm assuming you're using the Java clients & serializers here, or an equivalent implementation in another language). In this case you can generally just

Re: Kafka Connect Hdfs Sink not sinking

2016-11-07 Thread Ewen Cheslack-Postava
://docs.confluent.io/3.0.0/connect/intro.html#quickstart) I am unable > to get any kafka-connect sink to properly function. All reportedly hangs > while attempting to consume. > > > connect-console-sink, connect-file-sink, etc... > > ____ > From: E

Re: no luck with kafka-connect on secure cluster

2016-11-26 Thread Ewen Cheslack-Postava
Koert, I think what you're seeing is that there are actually 3 different ways Connect can interact with Kafka. For both standalone and distributed mode, you have producers and consumers that are part of the source and sink connector implementations, respectively. Security for these are configured

Re: Convert log file in Avro json encoding to Avro binary encoding and send into Kafka

2016-11-26 Thread Ewen Cheslack-Postava
Avro JSON encoding is a wire-level format. The AvroConverter accepts Java runtime data (e.g. primitive types like Strings & Integers, Maps, Arrays, and Connect Structs). The component that most closely matches your needs is Confluent's REST proxy, which supports the Avro JSON encoding when

Re: Upgrading from kafka-0.8.1.1 to kafka-0.9.0.1

2016-11-13 Thread Ewen Cheslack-Postava
The errors you're seeing sound like an issue where you updated the artifact but didn't recompile against the newer Scala version. Did you recompile or just replace the Kafka jar with a newer one? -Ewen On Wed, Nov 9, 2016 at 4:31 PM, Divyajothi Baskaran wrote: > Hi, >

Re: Kafka Connect Hdfs Sink not sinking

2016-11-01 Thread Ewen Cheslack-Postava
Are you writing new data into the topic that the HDFS sink is trying to read data from? This line [2016-10-28 10:56:48,408] TRACE hdfs-sink-0 polling consumer with timeout 58820 ms (org.apache.kafka.connect.runtime.WorkerSinkTask:221) indicates it's going to wait for about 60s until some data

Re: sharing storage topics between different distributed connect clusters

2016-11-26 Thread Ewen Cheslack-Postava
If you want independent clusters based on the same Kafka cluster, they need independent values for config/offset/status topics. The Kafka Connect framework doesn't provide its own sort of namespacing or anything, so if you used the same topics in the same cluster, the values between different

Re: Error "Unknown magic byte" occurred while deserializing Avro message

2016-11-26 Thread Ewen Cheslack-Postava
Are you sure you have not produced any other data into that topic, e.g. perhaps you were testing the regular kafka-console-producer before? This would cause it to fail on the non-Avro messages (as Dayong says, because the initial magic byte mismatches). Can you try starting the consumer first

Re: Kafka Consumer not consuming new messages randomly

2016-11-26 Thread Ewen Cheslack-Postava
The REST proxy cannot guarantee that if there are messages in Kafka it will definitely return them. There will always be some latency between the request to the REST proxy and fetching data from Kafka, and because of the way the Kafka protocol works this could be delayed by the fetch timeout. The

Re: Custom Partitioner explanation?

2016-11-26 Thread Ewen Cheslack-Postava
Yes, that's correct. For reference you can just take a look at the DefaultPartitioner which does nearly same (with additional logic to do round robin when there isn't a key): https://github.com/ apache/kafka/blob/trunk/clients/src/main/java/org/

Re: Minor documentation error

2016-11-26 Thread Ewen Cheslack-Postava
I think you're seeing one of the confusing name changes between old and new consumers. A quick grep suggests that you are correct that the parameter for the old consumer is fetch.wait.max.ms, but the parameter for the new consumer is fetch.max.wait.ms. Since the link you gave is for the new

Re: Kafka Mirror viable hardware

2016-11-26 Thread Ewen Cheslack-Postava
Kevin, Generally you're right that mirroring, whether with MirrorMaker or Confluent's Replicator, shouldn't be too expensive wrt CPU. However, do be aware that in both cases, if you are using compression you'll need to decompress and recompress due to the way they copy data. This could possibly

Re: Balancing based on lag

2016-11-26 Thread Ewen Cheslack-Postava
Jens, Sorry, I'm very late to this thread but figure it might be worth following up since I think this is a cool feature of the new consumer but isn't well known. You actually have *quite* a bit of flexibility in controlling how partition assignment happens. The key hook is the

Re: Reg: DefaultParititioner in Kafka

2016-11-26 Thread Ewen Cheslack-Postava
When a key is available, you generally include it because you want all messages with the same key to always end up in the same partition. This allows all messages with the same key to be processed by the same consumer (e.g. allowing you to aggregate all data for a single user if you key on user

Re: NotEnoughReplication

2016-12-10 Thread Ewen Cheslack-Postava
This error doesn't necessarily mean that a broker is down, it can also mean that too many replicas for that topic partition have fallen behind the leader. This indicates your replication is lagging for some reason. You'll want to be monitoring some of the metrics listed here:

Re: Kafka supported on AIX OS?

2016-12-10 Thread Ewen Cheslack-Postava
As documented here http://kafka.apache.org/documentation#os Linux and Solaris have been tested, with Linux being the most common platform and the one regularly tested within the project itself. Since AIX is a Unix it'll probably work fine there, and I believe IBM at least provided documentation if

Re: How to disable auto commit for SimpleConsumer kafka 0.8.1

2016-12-10 Thread Ewen Cheslack-Postava
The simple consumer doesn't do auto-commit. It really only issues individual low-level Kafka protocol request types, so `commitOffsets` is the only way it should write offsets. Is it possible it crashed after the request was sent but before finishing reading the response? Side-note: I know you

Re: Best approach to frequently restarting consumer process

2016-12-10 Thread Ewen Cheslack-Postava
Consumer groups aren't going to handle 'let it crash' particularly well (and really any session-based services, but particularly consumer groups since a single failure affects the entire group). That said, 'let it crash' doesn't necessarily have to mean 'don't try to clean up at all'. The consumer

Re: Error starting kafka server.

2016-12-10 Thread Ewen Cheslack-Postava
Sai, Attachments to the mailing list get filtered, you'll need to paste the relevant info into your email. -Ewen On Fri, Dec 9, 2016 at 1:18 AM, Sai Karthi Golagani < skgolagani...@fishbowl.com> wrote: > Hi team, > > I’ve recently setup kafka server on a edge node and zk on 3 separate >

Re: Configuration for low latency and low cpu utilization? java/librdkafka

2016-12-10 Thread Ewen Cheslack-Postava
On the producer side, there's not much you can do to reduce CPU usage if you want low latency and don't have enough throughput to buffer multiple messages -- you're going to end up sending 1 record at a time in order to achieve your desired latency. Note, however, that the producer is thread safe,

Re: Running mirror maker between two different version of kafka

2016-12-10 Thread Ewen Cheslack-Postava
It's tough to read that stacktrace, but if I understand what you mean by "running the kafka mirroring in destination cluster which is 0.10.1.0 version of kafka", then the problem is that you cannot use 0.10.1.0 mirror maker with an 0.8.1. cluster. MirrorMaker is both a producer and consumer, so

Re: Efficient Kafka batch processing

2016-12-10 Thread Ewen Cheslack-Postava
You may actually want this implemented in a Streams app eventually, there is a KIP being discussed to support this type of incremental batch processing in Streams: https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams However, for now the

Re: Upgrading from 0.10.0.1 to 0.10.1.0

2016-12-10 Thread Ewen Cheslack-Postava
Hagen, What does "new consumer doesn't like the old brokers" mean exactly? When upgrading MM, remember that it uses the clients internally so the same compatibility rules apply: you need to upgrade both sets of brokers before you can start using the new version of MM. -Ewen On Thu, Dec 8, 2016

Re: Kafka Connect gets into a rebalance loop

2016-12-17 Thread Ewen Cheslack-Postava
The message > Wasn't unable to resume work after last rebalance means that you previous iterations of the rebalance were somehow behind/out of sync with other members of the group, i.e. they had not read up to the same point in the config topic so it wouldn't be safe for this worker (or possibly

Re: __consumer_offsets topic acks

2016-12-17 Thread Ewen Cheslack-Postava
The default is -1 which means all replicas need to replicate the committed data before the ack will be sent to the consumer. See the offsets.commit.required.acks setting for the broker. min.insync.replicas applies to the offsets topic as well, but defaults to 1. You may want to increase this

Re: What does GetOffsetShell result represent

2016-12-17 Thread Ewen Cheslack-Postava
The tool writes output in the format: :: So in the case of your example with --time -1 that returned test-window-stream:0:724, it is saying that test-window-stream has partition 0 with a valid log segment which has the first offset = 724. Note that --time -1 is a special code for "only give the

Re: Kafka connect distributed mode not distributing the work

2016-12-17 Thread Ewen Cheslack-Postava
Hi Manjunath, I think you're seeing a case of this issue: https://issues.apache. org/jira/browse/KAFKA-4553 where the way round robin assignment works with an even # of workers and connectors that generate only 1 task generates uneven work assignments because connectors aren't really equivalent

Re: Producer connect timeouts

2016-12-17 Thread Ewen Cheslack-Postava
Without having dug back into the code to check, this sounds right. Connection management just fires off a request to connect and then subsequent poll() calls will handle any successful/failed connections. The timeouts wrt requests are handled somewhat differently (the connection request isn't

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Ewen Cheslack-Postava
; > >>> > > > > >>> > > -Original Message- > > >>> > > From: Stephen Powis [mailto:spo...@salesforce.com] > > >>> > > Sent: Thursday, January 12, 2017 8:34 AM > > >>> > > To: users@kafka.a

Re: java.lang.OutOfMemoryError: Java heap space while running kafka-consumer-perf-test.sh

2017-01-13 Thread Ewen Cheslack-Postava
Perhaps the default heap options aren't sufficient for your particular use of the tool. The consumer perf test defaults to 512MB. I'd simply try increasing the max heap usage: KAFKA_HEAP_OPTS="-Xmx1024M" to bump it up a bit. -Ewen On Wed, Jan 11, 2017 at 2:59 PM, Check Peck

Re: Kafka as a data ingest

2017-01-09 Thread Ewen Cheslack-Postava
> However, I'm trying to figure out if I can use Kafka to read Hadoop file. The question is a bit unclear as to whether you mean "use Kafka to send data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka topic". But in both cases, Kafka Connect provides a good option. The more

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-09 Thread Ewen Cheslack-Postava
I can't speak to the exact details of why fds would be kept open longer in that specific case, but are you aware that the recommendation for production clusters for open fd limits is much higher? It's been suggested to be 100,000 as a starting point for quite awhile:

Re: Kafka as a data ingest

2017-01-10 Thread Ewen Cheslack-Postava
hink so. In this case, Kafka connect implement has no advantages to read > single big file unless you also use mapreduce. > > Sent from my iPhone > > On Jan 10, 2017, at 02:41, Ewen Cheslack-Postava <e...@confluent.io> > wrote: > > >> However, I'm tryi

Re: ConnectStandalone with no starting connector properties

2016-12-04 Thread Ewen Cheslack-Postava
Micah, Sure, we'd be happy to commit a patch that removes this restriction. In practice, for most folks its a bit simpler to generate the config file in whatever deployment system they are using than to start the standalone server separately and make an HTTP request to trigger creation of the

Re: FW: mirror and schema topics

2016-12-04 Thread Ewen Cheslack-Postava
Can you give more details about how you're setting up your mirror? It sounds like you're simply missing the __schemas topic, but it's hard to determine the problem without more details about your mirroring setup. -Ewen On Wed, Nov 30, 2016 at 12:03 PM, Berryman, Eric

Re: Expected client producer/consumer CPU utilization when idle

2016-12-04 Thread Ewen Cheslack-Postava
If completely idle, producers shouldn't need to do anything beyond very infrequent metadata updates (once ever few minutes). Consumers, however, will have some ongoing work -- they will always issues fetch requests (to get more data) and heartbeats (to indicate they are still alive). But these

Re: compaction + delete not working for me

2017-01-06 Thread Ewen Cheslack-Postava
On Fri, Jan 6, 2017 at 3:57 AM, Mike Gould wrote: > Hi > > I'm trying to configure log compaction + deletion as per KIP-71 in kafka > 0.10.1 but so far haven't had any luck. My tests show more than 50% > duplicate keys when reading from the beginning even several minutes

Re: One big kafka connect cluster or many small ones?

2017-01-06 Thread Ewen Cheslack-Postava
ented somewhere on the confluent website. I couldn’t find > it. > > On 6 January 2017 at 3:42:45 pm, Ewen Cheslack-Postava (e...@confluent.io) > wrote: > > On Thu, Jan 5, 2017 at 7:19 PM, Stephane Maarek < > steph...@simplemachines.com.au> wrote: > >> Thanks a lo

Re: Does offsetsForTimes use createtime of logsegment file?

2017-01-06 Thread Ewen Cheslack-Postava
= 1 > Message3, Timestamp = T1, Offset = 2 > > > Would offsetForTimestamp(T1) return offset for earliest message with > timestamp T1 (i.e. Offset 0 in above example) ? > > > -Vignesh. > > On Thu, Jan 5, 2017 at 8:19 PM, Ewen Cheslack-Postava <e...@confluent.io> >

Re: Metric meaning

2017-01-05 Thread Ewen Cheslack-Postava
There's not currently anything more detaild than what is included in http://kafka.apache.org/documentation/#monitoring There's some work trying to automate the generation of that documentation ( https://issues.apache.org/jira/browse/KAFKA-3480). That combined with some addition to give longer

Re: One big kafka connect cluster or many small ones?

2017-01-05 Thread Ewen Cheslack-Postava
On Thu, Jan 5, 2017 at 3:12 PM, Stephane Maarek < steph...@simplemachines.com.au> wrote: > Hi, > > We like to operate in micro-services (dockerize and ship everything on ecs) > and I was wondering which approach was preferred. > We have one kafka cluster, one zookeeper cluster, etc, but when it

Re: Query on MirrorMaker Replication - Bi-directional/Failover replication

2017-01-05 Thread Ewen Cheslack-Postava
On Thu, Jan 5, 2017 at 3:07 AM, Greenhorn Techie wrote: > Hi, > > We are planning to setup MirrorMaker based Kafka replication for DR > purposes. The base requirement is to have a DR replication from primary > (site1) to DR site (site2)using MirrorMaker, > > However,

Re: Interesting error message du jour

2016-12-30 Thread Ewen Cheslack-Postava
Jon, This looks the same as https://issues.apache.org/jira/browse/KAFKA-4563, although for a different invalid transition. The temporary fix suggested there is to simply convert the exception to log a warning, which should be a pretty trivial patch against trunk. It seems there are some

Re: how to ingest a database with a Kafka Connect cluster in parallel?

2017-01-03 Thread Ewen Cheslack-Postava
ctor? > > Have a nice day. > > Best regards, > Yang > > > 2017-01-03 20:55 GMT+01:00 Ewen Cheslack-Postava <e...@confluent.io>: > > > The unit of parallelism in connect is a task. It's only listing one task, > > so you only have one process copying data.

Re: MirrorMaker - Topics Identification and Replication

2017-01-03 Thread Ewen Cheslack-Postava
Yes, the consumer will pick up the new topics when it refreshes metadata (defaults to every 5 min) and start subscribing to the new topics. -Ewen On Tue, Jan 3, 2017 at 3:07 PM, Greenhorn Techie wrote: > Hi, > > I am new to Kafka and as well as MirrorMaker. So

Re: Kafka Connect Consumer reading messages from Kafka recursively

2017-01-03 Thread Ewen Cheslack-Postava
the > __consumer_offsets topic and it doesn't have anything in it. Should I > provide write permissions to this topic for my Kafka client user? I am > running my consumer using a different user than Kafka user. > > Thanks, > Sri > > On Tue, Jan 3, 2017 at 3:40 PM,

Re: Kafka Connect Consumer reading messages from Kafka recursively

2017-01-03 Thread Ewen Cheslack-Postava
offset checker to see if any offsets are committed for the group. Also, is there anything in the logs that might indicate a problem with the consumer committing offsets? -Ewen > Thanks, > Sri > > On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <e...@confluent.io> > wro

Re: About Kafka Consumer : synchronous and blocking ?

2017-01-03 Thread Ewen Cheslack-Postava
would be fine, just be careful about committing offsets properly! -Ewen > > Thanks > Paolo > > Get Outlook for Android<https://aka.ms/ghei36> > > ____ > From: Ewen Cheslack-Postava <e...@confluent.io> > Sent: Tuesday, January

Re: Kafka Connect gets into a rebalance loop

2017-01-04 Thread Ewen Cheslack-Postava
.S. The version of Kafka Connect I'm running is > {"version":"0.10.0.0-cp1","commit":"7aeb2e89dbc741f6"} > On Sat, Dec 17, 2016 at 7:55 PM, Ewen Cheslack-Postava <e...@confluent.io> > wrote: > > > The message > > > > >

Re: Consumer Rebalancing Question

2017-01-04 Thread Ewen Cheslack-Postava
The coordinator will immediately move the group into a rebalance if it needs it. The reason LeaveGroupRequest was added was to avoid having to wait for the session timeout before completing a rebalance. So aside from the latency of cleanup/committing offests/rejoining after a heartbeat, rolling

Re: Reg: Need info on Kafka Brokers

2017-01-03 Thread Ewen Cheslack-Postava
Unfortunately, I don't think it has been open sourced (it doesn't seem to be available on https://github.com/paypal). -Ewen On Tue, Jan 3, 2017 at 5:54 PM, Jhon Davis wrote: > Found an interesting Kafka monitoring tool but no information on whether > it's not open

Re: Why does consumer.subscribe(Pattern) require a ConsumerRebalanceListener?

2017-01-03 Thread Ewen Cheslack-Postava
Tbh, I can't remember the exact details around the discussion of the addition of this API, but I think this was to minimize API bloat. It's easy to end up with 83 overloads of methods to handle all the different combinations of parameters, but just a couple of shorthand overrides cover the vast

Re: Processing time series data in order

2016-12-29 Thread Ewen Cheslack-Postava
The best you can do to ensure ordering today is to set: acks = all retries = Integer.MAX_VALUE max.block.ms = Long.MAX_VALUE max.in.flight.requests.per.connection = 1 This ensures there's only one outstanding produce request (batch of messages) at a time, it will be retried indefinitely on

Re: kafka streams and broadcast topic

2017-01-02 Thread Ewen Cheslack-Postava
I think what you're describing could be handled in KStreams by a "global" KTable. This functionality is currently being discussed/voted on in a KIP discussion: https://cwiki.apache.org/confluence/pages/viewpa ge.action?pageId=67633649 The list of interests would be a global KTable (shared globally

Re: About Kafka Consumer : synchronous and blocking ?

2017-01-03 Thread Ewen Cheslack-Postava
That's correct. Aside from commitAsync, all the consumer methods will block, although note that some are just local operations that affect subsequent method calls (e.g. seek() just sets some state locally). In fact, the only call that I think you'd need to actually worry about blocking is poll().

Re: how to ingest a database with a Kafka Connect cluster in parallel?

2017-01-03 Thread Ewen Cheslack-Postava
The unit of parallelism in connect is a task. It's only listing one task, so you only have one process copying data. The connector can consume data from within a single *database* in parallel, but each *table* must be handled by a single task. Since your table whitelist only includes a single

Re: Kafka Connect Consumer reading messages from Kafka recursively

2017-01-03 Thread Ewen Cheslack-Postava
On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla wrote: > Hi, > > I am using Kafka/Kafka Connect to track certain events happening in my > application. This is how I have implemented it - > 1. My application is opening a KafkaProducer every time this event happens > and

Re: Schema reg avro version

2017-01-03 Thread Ewen Cheslack-Postava
It just hasn't been update recently. It isn't just the schema-registry that needs to be updated since other components use the library as well and we'd need to avoid potential classpath conflicts, but it should be straightforward to update. -Ewen On Tue, Jan 3, 2017 at 8:45 AM, Scott Ferguson

Re: Consumer Rebalancing Question

2017-01-05 Thread Ewen Cheslack-Postava
p. This happens once and then the "issue" is resolved without any additional interruptions. -Ewen On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote: > I see... doesn't that cause flapping though? > > On Wed, Jan 4, 2017 at 8:22 PM, Ewen

Re: Is this a bug or just unintuitive behavior?

2017-01-05 Thread Ewen Cheslack-Postava
The basic issue here is just that the auto.offset.reset defaults to latest, right? That's not a very good setting for a mirroring tool and this seems like something we might just want to change the default for. It's debatable whether it would even need a KIP. We have other settings in MM where we

<    1   2   3   4   >