Re: writing a single record to Kafka ...

2018-09-11 Thread Mahesh Vangala
Thanks, Lukasz.
Appreciate your advice.

*--*
*Mahesh Vangala*
*(Ph) 443-326-1957*
*(web) mvangala.com*


On Tue, Sep 11, 2018 at 6:19 PM Lukasz Cwik wrote:

> A PCollection is a bag of elements. A PCollection can be empty, contain only
> one element, or contain many. It is up to you to choose how many elements
> the upstream transforms emit into the PCollection.
>
> If you limit the PCollection that you apply KafkaIO to so that it contains
> only one element, you will have achieved your goal.
>
> On Tue, Sep 11, 2018 at 3:11 PM Mahesh Vangala wrote:
>
>> Hello -
>>
>> I'd like to write a single record to a Kafka topic through Beam.
>> However, I only see examples that work with PCollections.
>> Any thoughts on how I can approach this?
>> Thank you.
>>
>> Regards,
>> Mahesh
>>
>> *--*
>> *Mahesh Vangala*
>> *(Ph) 443-326-1957*
>> *(web) mvangala.com*
>>
>
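
A minimal sketch of the approach Lukasz describes, assuming the Java SDK's
Create transform and KafkaIO.write; the broker address, topic name, and record
contents below are placeholders, not values from this thread:

import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.kafka.KafkaIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.Create
import org.apache.beam.sdk.values.KV
import org.apache.kafka.common.serialization.StringSerializer

val pipeline = Pipeline.create(PipelineOptionsFactory.create())

// Create.of with a single element yields a PCollection of exactly one record,
// so KafkaIO publishes exactly one message.
pipeline
  .apply(Create.of(KV.of("my-key", "my-value")))
  .apply(KafkaIO.write[String, String]()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeySerializer(classOf[StringSerializer])
    .withValueSerializer(classOf[StringSerializer]))

pipeline.run().waitUntilFinish()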


writing a single record to Kafka ...

2018-09-11 Thread Mahesh Vangala
Hello -

I'd like to write a single record to a Kafka topic through Beam.
However, I only see examples that work with PCollections.
Any thoughts on how I can approach this?
Thank you.

Regards,
Mahesh

*--*
*Mahesh Vangala*
*(Ph) 443-326-1957*
*(web) mvangala.com*


Re: Problem with KafkaIO

2018-09-11 Thread Raghu Angadi
Specifically, I am interested in whether you have any thread running
'consumerPollLoop()' [1]. There should always be one (if a worker is
assigned one of the partitions). It is possible that the Kafka client itself
hasn't recovered from the group coordinator error (though unlikely).

https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L570
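
For reference, a hedged sketch (not from this thread) of what such a check
could look like if run inside the worker JVM; in practice you would capture a
stack dump of the worker and search it for consumerPollLoop:

import scala.collection.JavaConverters._

// List live JVM threads and report whether any is inside consumerPollLoop().
val traces = Thread.getAllStackTraces.asScala
val polling = traces.filter { case (_, stack) =>
  stack.exists(_.getMethodName == "consumerPollLoop")
}
if (polling.isEmpty)
  println("No thread is running consumerPollLoop() -- the reader may be stuck")
else
  polling.keys.foreach(t => println(s"consumerPollLoop running on: ${t.getName}"))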

On Tue, Sep 11, 2018 at 12:31 PM Raghu Angadi wrote:

> Hi Eduardo,
>
> In case of any error, the pipeline should keep trying to fetch. I don't
> know about this particular error. Do you see any others afterwards in the
> log?
> A couple of things you could try if the logs are not useful:
>  - log in to one of the VMs and get a stacktrace of the Java worker (look
> for a container called java-streaming)
>  - file a support bug or Stack Overflow question with the job ID so that
> the Dataflow on-call can take a look.
>
> Raghu.
>
>
> On Tue, Sep 11, 2018 at 12:10 PM Eduardo Soldera <
> eduardo.sold...@arquivei.com.br> wrote:
>
>> Hi,
>> We have an Apache Beam pipeline running in Google Dataflow using KafkaIO.
>> Suddenly the pipeline stopped fetching Kafka messages altogether, while
>> workers from our other pipelines continued to receive them.
>>
>> At the moment it stopped we got these messages:
>>
>> I  [Consumer clientId=consumer-1, groupId=genericPipe] Error sending fetch 
>> request (sessionId=1396189203, epoch=2431598) to node 3: 
>> org.apache.kafka.common.errors.DisconnectException.
>> I  [Consumer clientId=consumer-1, groupId=genericPipe] Group coordinator 
>> 10.0.52.70:9093 (id: 2147483646 rack: null) is unavailable or invalid, will 
>> attempt rediscovery
>> I  [Consumer clientId=consumer-1, groupId=genericPipe] Discovered group 
>> coordinator 10.0.52.70:9093 (id: 2147483646 rack: null)
>>
>> And then the pipeline stopped reading the messages.
>>
>> This is the KafkaIO setup we have:
>>
>> KafkaIO.read[String,String]()
>>   .withBootstrapServers(server)
>>   .withTopic(topic)
>>   .withKeyDeserializer(classOf[StringDeserializer])
>>   .withValueDeserializer(classOf[StringDeserializer])
>>   .updateConsumerProperties(properties)
>>   .commitOffsetsInFinalize()
>>   .withoutMetadata()
>>
>> Any help will be much appreciated.
>>
>> Best regards,
>> --
>> Eduardo Soldera Garcia
>> Data Engineer
>> (16) 3509- | www.arquivei.com.br
>>
>


Re: Problem with KafkaIO

2018-09-11 Thread Raghu Angadi
Hi Eduardo,

In case of any error, the pipeline should keep trying to fetch. I don't
know about this particular error. Do you see any others afterwards in the
log?
A couple of things you could try if the logs are not useful:
 - log in to one of the VMs and get a stacktrace of the Java worker (look
for a container called java-streaming)
 - file a support bug or Stack Overflow question with the job ID so that
the Dataflow on-call can take a look.

Raghu.


On Tue, Sep 11, 2018 at 12:10 PM Eduardo Soldera <
eduardo.sold...@arquivei.com.br> wrote:

> Hi,
> We have an Apache Beam pipeline running in Google Dataflow using KafkaIO.
> Suddenly the pipeline stopped fetching Kafka messages altogether, while
> workers from our other pipelines continued to receive them.
>
> At the moment it stopped we got these messages:
>
> I  [Consumer clientId=consumer-1, groupId=genericPipe] Error sending fetch 
> request (sessionId=1396189203, epoch=2431598) to node 3: 
> org.apache.kafka.common.errors.DisconnectException.
> I  [Consumer clientId=consumer-1, groupId=genericPipe] Group coordinator 
> 10.0.52.70:9093 (id: 2147483646 rack: null) is unavailable or invalid, will 
> attempt rediscovery
> I  [Consumer clientId=consumer-1, groupId=genericPipe] Discovered group 
> coordinator 10.0.52.70:9093 (id: 2147483646 rack: null)
>
> And then the pipeline stopped reading the messages.
>
> This is the KafkaIO setup we have:
>
> KafkaIO.read[String,String]()
>   .withBootstrapServers(server)
>   .withTopic(topic)
>   .withKeyDeserializer(classOf[StringDeserializer])
>   .withValueDeserializer(classOf[StringDeserializer])
>   .updateConsumerProperties(properties)
>   .commitOffsetsInFinalize()
>   .withoutMetadata()
>
> Any help will be much appreciated.
>
> Best regards,
> --
> Eduardo Soldera Garcia
> Data Engineer
> (16) 3509- | www.arquivei.com.br
>


Problem with KafkaIO

2018-09-11 Thread Eduardo Soldera
Hi,
We have an Apache Beam pipeline running in Google Dataflow using KafkaIO.
Suddenly the pipeline stopped fetching Kafka messages altogether, while
workers from our other pipelines continued to receive them.

At the moment it stopped we got these messages:

I  [Consumer clientId=consumer-1, groupId=genericPipe] Error sending
fetch request (sessionId=1396189203, epoch=2431598) to node 3:
org.apache.kafka.common.errors.DisconnectException.
I  [Consumer clientId=consumer-1, groupId=genericPipe] Group
coordinator 10.0.52.70:9093 (id: 2147483646 rack: null) is unavailable
or invalid, will attempt rediscovery
I  [Consumer clientId=consumer-1, groupId=genericPipe] Discovered
group coordinator 10.0.52.70:9093 (id: 2147483646 rack: null)

And then the pipeline stopped reading the messages.

This is the KafkaIO setup we have:

KafkaIO.read[String,String]()
  .withBootstrapServers(server)                        // Kafka broker list
  .withTopic(topic)                                    // topic to consume
  .withKeyDeserializer(classOf[StringDeserializer])    // keys as UTF-8 strings
  .withValueDeserializer(classOf[StringDeserializer])  // values as UTF-8 strings
  .updateConsumerProperties(properties)                // extra consumer config
  .commitOffsetsInFinalize()                           // commit offsets when checkpoints finalize
  .withoutMetadata()                                   // emit plain KV pairs, dropping Kafka metadata

Any help will be much appreciated.

Best regards,
-- 
Eduardo Soldera Garcia
Data Engineer
(16) 3509- | www.arquivei.com.br



Re: Acknowledging Pubsub messages in Flink Runner

2018-09-11 Thread Maximilian Michels

Hey Encho,

The Flink Runner acknowledges messages through PubSubIO's 
`CheckpointMark#finalizeCheckpoint()` method.


The Flink Runner wraps the PubSubIO source via the 
UnboundedSourceWrapper. When Flink takes a checkpoint of the running 
Beam streaming job, the wrapper will retrieve the CheckpointMarks from 
the PubSubIO source.


When the checkpoint is completed, a callback (`notifyCheckpointComplete()`)
informs the wrapper, which then calls `finalizeCheckpoint()` on all the
generated CheckpointMarks.
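
A minimal sketch of that contract, assuming only Beam's public
UnboundedSource.CheckpointMark interface; the ack callback is hypothetical,
standing in for whatever PubsubIO does internally:

import java.io.IOException
import org.apache.beam.sdk.io.UnboundedSource

// Acknowledgement happens only when the runner finalizes the checkpoint:
// finalizeCheckpoint() is invoked after the checkpoint is durable.
class AckingCheckpointMark(messageIds: List[String], ack: Seq[String] => Unit)
    extends UnboundedSource.CheckpointMark {

  @throws[IOException]
  override def finalizeCheckpoint(): Unit = ack(messageIds)
}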


Hope that helps with debugging your problem. I don't have an explanation for
why this doesn't work for the last records in your PubSub queue. It
shouldn't make a difference to how the Flink Runner does checkpointing.


Best,
Max

On 10.09.18 18:17, Encho Mishinev wrote:

Hello,

I am using the Flink runner with Apache Beam 2.6.0. I was wondering if there
is information on when exactly the runner acknowledges a Pub/Sub message
when reading from PubsubIO?


My problem is that whenever there are a few messages left in a
subscription, my streaming job never really seems to acknowledge them
all. For example, if a subscription has 100,000,000 messages in total,
the job will go through about 99,990,000 and then keep reading the last
few thousand, seemingly never acknowledging them.


Some clarity on when the acknowledgement happens in the pipeline might 
help me debug this problem.


Thanks!