I may be wrong, but try Flume (http://flume.apache.org); I'm just not
sure if it has a Hive source.
On 28 April 2015 at 15:09, Svante Karlsson svante.karls...@csi.se wrote:
What's the best way of exporting contents (avro encoded) from hive queries
to kafka?
Kind of camus, the other way
@Ewen
No I did not use compression in my measurements.
I must agree with @Roshan – it's hard to imagine anything more intuitive
and easy to use for atomic batching than the old sync batch API. Also, it's fast.
Coupled with a separate producer instance per
broker:port:topic:partition it works very well. I would be glad if it finds
its way into the new
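For context, the old sync batch call being praised looks roughly like this — a sketch against the 0.8 javaapi producer, with broker address and topic assumed:

import java.util.Arrays;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SyncBatchSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");  // assumed address
        props.put("producer.type", "sync");                 // blocking sends
        props.put("request.required.acks", "1");
        props.put("serializer.class", "kafka.serializer.DefaultEncoder");
        Producer<byte[], byte[]> producer = new Producer<>(new ProducerConfig(props));
        // One blocking call for the whole batch: it returns normally or throws
        // if the batch could not be sent after retries.
        producer.send(Arrays.asList(
                new KeyedMessage<byte[], byte[]>("my-topic", "a".getBytes()),
                new KeyedMessage<byte[], byte[]>("my-topic", "b".getBytes())));
        producer.close();
    }
}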
Does increasing PartitionFetchInfo.fetchSize help?
Speaking of the Kafka API, it looks like throwing an exception would be less
confusing when fetchSize is not large enough to fetch at least one message at
the requested offset.
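To illustrate, a minimal sketch of a simple-consumer fetch where fetchSize matters — 0.8 javaapi; host, topic, and sizes are assumed:

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class FetchSizeSketch {
    public static void main(String[] args) {
        // Hypothetical broker, topic, and sizes for illustration.
        SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "fetch-check");
        FetchRequest req = new FetchRequestBuilder()
                .clientId("fetch-check")
                // The last argument is fetchSize: it must cover at least one whole
                // message at this offset, or the fetch quietly returns nothing.
                .addFetch("my-topic", 0, 0L, 1024 * 1024)
                .build();
        FetchResponse resp = consumer.fetch(req);
        int count = 0;
        for (MessageAndOffset ignored : resp.messageSet("my-topic", 0)) {
            count++;
        }
        System.out.println("fetched " + count + " messages");
        consumer.close();
    }
}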
2015-04-28 21:12 GMT+03:00 Laran Evans laran.ev...@nominum.com:
I’ve got a simple
Hey guys,
The locking argument is correct for very small records (< 50 bytes);
batching will help here because for small records locking becomes the big
bottleneck. I think these use cases are rare but not unreasonable.
Overall I'd emphasize that the new producer is way faster at virtually all
I have just posted the following question on Stack Overflow. Could you
answer it?
I would like to use the Kafka high-level consumer API, and at the same time I
would like to disable auto commit of offsets. I tried to achieve this
through the following steps.
1) auto.commit.enable
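In case it helps, a minimal sketch of those steps against the 0.8 high-level consumer — ZooKeeper address, group, and topic are assumed:

import java.util.Collections;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumed address
        props.put("group.id", "my-group");                // assumed group
        props.put("auto.commit.enable", "false");         // step 1: turn off auto commit
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        // ... consume from the returned streams, then commit explicitly
        // once the messages have really been processed:
        connector.commitOffsets();
    }
}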
I am trying to set up a cluster where messages should never be lost once they
are published. Say I have 3 brokers, I configure the replicas to
be 3 as well, and I take the max failures to be 1; then I can achieve the above
requirement. But when I post a message, how do I prevent Kafka from
What's the best way of exporting contents (Avro encoded) from Hive queries
to Kafka?
Kind of Camus, the other way around
best regards
svante
Hi Zakee,
Thanks for your reply.
message.send.max.retries=3
retry.backoff.ms=100
topic.metadata.refresh.interval.ms=600000
These are my properties.
Regards,
Madhukar
On Tue, Apr 28, 2015 at 3:26 AM, Zakee kzak...@netzero.net wrote:
What values do you have for the below properties? Or are
Hi
I have had ZK and Kafka (kafka_2.8.0-0.8.1) set up and running nicely for a week or so. I
decided to stop the ZK and Kafka brokers and restart them; since stopping
ZK I can't start it again! It gives me a fatal exception that is related to
one of my test topics, multinode1partition4reptopic!?
Hi Ewen,
Thanks for the response. I agree with you; in some cases we should use the
bootstrap servers.
If you have logs at debug level, are you seeing this message in between the
connection attempts:
Give up sending metadata request since no node is available
Yes, this log came for a couple
You can use the min.insync.replicas topic-level configuration in this case. It
must be used with acks=-1, which is a producer config.
http://kafka.apache.org/documentation.html#topic-config
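A minimal sketch of the producer side, assuming the 0.8.2 Java producer and hypothetical broker addresses (the topic itself would be created with min.insync.replicas=2):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableSendSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // assumed
        props.put("acks", "-1"); // wait for all in-sync replicas to acknowledge
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        // With min.insync.replicas=2 on the topic, this send fails loudly
        // (instead of silently losing data) when fewer than 2 replicas are in sync.
        producer.send(new ProducerRecord<byte[], byte[]>("my-topic", "payload".getBytes())).get();
        producer.close();
    }
}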
Aditya
From: Gomathivinayagam Muthuvinayagam
Yes, if you set the offset storage to Kafka, the high-level consumer will
use Kafka for all offset-related operations.
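For reference, the relevant consumer properties would look like this — a sketch assuming the 0.8.2 high-level consumer (dual.commit.enabled only matters while migrating off ZooKeeper storage):

offsets.storage=kafka
dual.commit.enabled=false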
Jiangjie (Becket) Qin
On 4/27/15, 7:03 PM, Gomathivinayagam Muthuvinayagam
sankarm...@gmail.com wrote:
I am trying to commit offset requests in a background thread. I am able
1. We’re using version 0.8.1.1.
2. No failures in the consumer logs
3. We’re using the ConsumerOffsetChecker to see what partitions are assigned to
the consumer group and what their offsets are. 8 of the 12 processes have each
been assigned two partitions and they’re keeping up with the topic. The
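For reference, we invoke the checker roughly like this (flags as in the 0.8.1 tooling; the group name here is illustrative):

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group our-group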
@Joel,
If flush() works for this use case it may be an acceptable starting point
(although not as clean as a native batched sync). I am not yet clear
about some aspects of flush's batch semantics and its suitability for this
mode of operation. Allow me to explore it with you folks...
1) flush()
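To make the mode of operation concrete, here is a rough sketch of the batch-then-flush pattern I have in mind, assuming flush() lands in the new producer as proposed (names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class FlushBatchSketch {
    // Sends a batch and blocks until every record is acknowledged or has failed.
    static void sendBatch(KafkaProducer<byte[], byte[]> producer, List<byte[]> batch) throws Exception {
        List<Future<RecordMetadata>> results = new ArrayList<>();
        for (byte[] payload : batch) {
            results.add(producer.send(new ProducerRecord<byte[], byte[]>("my-topic", payload)));
        }
        producer.flush(); // wait for all in-flight sends to complete
        for (Future<RecordMetadata> f : results) {
            f.get();      // surfaces any per-record failure; note this is sync,
                          // not atomic: earlier records may already be committed
        }
    }
}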
I’m sorry, I forgot to specify that these processes are in the same consumer
group.
Thanks,
Dave
On 4/28/15, 1:15 PM, Aditya Auradkar aaurad...@linkedin.com.INVALID wrote:
Hi Dave,
The simple consumer doesn't do any state management across consumer instances.
So I'm not sure how you are
Also note that the metadata for the topic is missing. I tried creating a few
more topics and they all have the same issue.
Using the Kafka console producer on the topic, I see these error messages
indicating the missing metadata:
WARN Error while fetching metadata [{TopicMetadata for topic my-topic -
Couple of questions:
- What version of the consumer API are you using?
- Are you seeing any rebalance failures in the consumer logs?
- How do you determine that some partitions are unassigned? Just confirming
that you have partitions that are not being consumed from as opposed to
consumer
Kind of what you need but not quite:
Sqoop2 is capable of getting data from HDFS to Kafka.
AFAIK it doesn't support Hive queries, but feel free to open a JIRA for
Sqoop :)
Gwen
On Tue, Apr 28, 2015 at 4:09 AM, Svante Karlsson svante.karls...@csi.se
wrote:
What's the best way of exporting
Ok, all of that makes sense. The only way to possibly recover from that
state is either for K2 to come back up, allowing the metadata refresh to
eventually succeed, or to eventually try some other node in the cluster.
Reusing the bootstrap nodes is one possibility. Another would be for the
client to
Hi,
I wonder if anyone has a good example of how to write Spark RDDs into
Kafka. Specifically, my question is whether there is an advantage to sending a
list of messages each time over sending one message at a time.
Sample code for sending one message at a time:
dStream.foreachRDD(rdd => {
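In case a sketch helps, here is the same pattern via the Java API, creating one producer per partition and sending records individually — the new producer batches internally, so single-message sends still go out in batches on the wire. Topic, broker address, and a Spark version whose foreachRDD/foreachPartition accept VoidFunction lambdas are assumed:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.spark.streaming.api.java.JavaDStream;

public class StreamToKafkaSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // assumed address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    static void writeToKafka(JavaDStream<String> dStream) {
        dStream.foreachRDD(rdd -> {
            rdd.foreachPartition(records -> {
                // One producer per partition/task; it batches sends internally.
                KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps());
                while (records.hasNext()) {
                    producer.send(new ProducerRecord<>("my-topic", records.next()));
                }
                producer.close(); // flushes any buffered records
            });
        });
    }
}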