RE: kafkaIO Consumer Rebalance with Spark Runner

2019-01-30 Thread linrick
Dear Alexey, I have tried the following settings: Map<String, Object> map = ImmutableMap.<String, Object>builder() .put("topic", (Object)"kafkasink2") .put("group.id", (Object)"test-consumer-group") .put("partition.assignment.strategy", (Object)"org.apache.kafka.clients.consumer.RangeAssignor")
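The settings above amount to a plain consumer-properties map. A minimal standalone sketch (note: "topic" is not a Kafka consumer property — in Beam's KafkaIO the topic is normally supplied separately, e.g. via `withTopic(...)`, so putting it in the consumer map is at best ignored; the class and method names here are illustrative, not from the thread):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaConsumerProps {
    // Consumer properties discussed in the thread. In a Beam 2.x pipeline
    // these would typically be handed to the KafkaIO read transform's
    // consumer-properties hook; verify the exact method name against
    // your SDK version.
    static Map<String, Object> build() {
        Map<String, Object> props = new HashMap<>();
        props.put("group.id", "test-consumer-group");
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }
}
```

Keeping `group.id` stable across restarts is what lets the broker-side group coordinator rebalance partitions among consumers.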

RE: kafkaIO Consumer Rebalance with Spark Runner

2019-01-28 Thread linrick
Dear Raghu, I added the line “PCollection reshuffled = windowKV.apply(Reshuffle.viaRandomKey());” to my program. I tried to limit the streaming rate to 100,000 records/sec to decrease the processing time. The following settings are used in my project. 1. One topic / 2 partitions
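Reshuffle.viaRandomKey() counteracts task skew by keying each record randomly so downstream work spreads evenly across workers. A standalone sketch of that idea in plain Java, outside Beam (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ReshuffleSketch {
    // Assigns each record a random shard key so that work distributes
    // roughly evenly over numShards workers, mimicking what
    // Reshuffle.viaRandomKey() does inside a Beam pipeline.
    static Map<Integer, List<String>> shardByRandomKey(
            List<String> records, int numShards, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, List<String>> shards = new HashMap<>();
        for (String r : records) {
            int key = rnd.nextInt(numShards);
            shards.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        return shards;
    }
}
```

With two Kafka partitions but skewed partition load, this kind of re-keying is why the reshuffle evens out Spark task durations.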

kafkaIO Consumer Rebalance with Spark Runner

2019-01-24 Thread linrick
Dear all, I am using the kafkaIO sdk in my project (Beam with Spark runner). The task-skew problem is shown in the attached figure. [inline image: task-skew figure] My running environment is: OS: Ubuntu 14.04.4 LTS The versions of related tools are: java version: "1.8.0_151" Beam

RE: A windows/trigger Question with kafkaIO over Spark Runner

2018-07-31 Thread linrick
Spark 2.3.1 (local mode) Kafka: 2.11-0.10.1.1 scala: 2.11.8 java: 1.8 ====== My programs (KafkaToKafka.java and StarterPipeline.java) are shown on GitHub: https://github.com/LinRick/beamkafkaIO The configuration setting of the Kafka broker is:

RE: A windows/trigger Question with kafkaIO over Spark Runner

2018-07-31 Thread linrick
In my running program: == Beam 2.4.0 (Direct runner and Spark runner) Spark 2.3.1 (local mode) Kafka: 2.11-0.10.1.1 scala: 2.11.8 java: 1.8 == My programs (KafkaToKafka.java and StarterPipeline.java) are shown on GitHub, a

RE: A windows/trigger Question with kafkaIO over Spark Runner

2018-07-30 Thread linrick
My programs (KafkaToKafka.java and StarterPipeline.java) are shown on GitHub: https://github.com/LinRick/beamkafkaIO The configuration setting of the Kafka broker is: == /kafka_broker/bin/kafka-producer-perf-test.sh \ --num-records 1000 \ --re

A windows/trigger Question with kafkaIO over Spark Runner

2018-07-30 Thread linrick
== My programs (KafkaToKafka.java and StarterPipeline.java) are shown on GitHub: https://github.com/LinRick/beamkafkaIO The configuration setting of the Kafka broker is: == /kafka_broker/bin/kafka-producer-perf-test.sh \ --num-records 1000

RE: kafkaIO Run with Spark Runner: "streaming-job-executor-0"

2018-06-20 Thread linrick
Hi, I inserted a line in my code as follows: … .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))) .triggering(AfterWatermark.pastEndOfWindow() .withLateFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.ZERO)))
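The 1-second fixed windows in the snippet place each element by truncating its event timestamp to a window boundary. The assignment arithmetic can be sketched as follows (a sketch of how fixed windowing buckets timestamps, not Beam's actual implementation class):

```java
public class WindowMath {
    // Start (in ms) of the fixed window of size windowSizeMs that
    // contains timestampMs. floorMod keeps the bucketing correct for
    // timestamps before the epoch as well.
    static long windowStartMs(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }
}
```

The AfterWatermark trigger in the quoted code then fires each window's pane when the watermark passes that window's end, with late firings delayed by Duration.ZERO, i.e. immediately per late element.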

RE: kafkaIO Run with Spark Runner: "streaming-job-executor-0"

2018-06-20 Thread linrick
Dear all, The related versions in my running env: Beam 2.4.0 (Spark runner with Standalone mode) Spark 2.0.0 Kafka: 2.11-0.10.1.1, and the pom.xml settings as mentioned earlier. When we run the following code: args=new String[]{"--runner=SparkRunner","--sparkMaster=spark://ubuntu8:7077"};

RE: kafkaIO Run with Spark Runner: "streaming-job-executor-0"

2018-06-14 Thread linrick
Hi all, I have changed the versions of these tools to: Beam 2.4.0 (Spark runner with Standalone mode) Spark 2.0.0 Kafka: 2.11-0.10.1.1 pom.xml: org.apache.beam:beam-sdks-java-core:2.4.0, org.apache.beam:beam-runners-spark:2.4.0, org.apache.beam:beam-sdks-java-io-kafka

kafkaIO Run with Spark Runner: "streaming-job-executor-0"

2018-06-13 Thread linrick
Dear all, I am using the kafkaIO in my project (Beam 2.0.0 with Spark runner). My running environment is: OS: Ubuntu 14.04.3 LTS The versions of these tools are: JAVA: JDK 1.8 Beam 2.0.0 (Spark runner with Standalone mode) Spark 1.6.0 Standalone mode: one driver node: ubuntu7; one master

RE: The problem of kafkaIO sdk for data latency

2018-03-04 Thread linrick
Hi Raghu, I changed my Beam version from 2.0.0 to 2.3.0, and it now works well. Thanks for your help. Now, I have another question about data size in my window. I am trying to keep a fixed data size per window (i.e., a fixed 100 records/window). However, I sometimes see that the number of
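Fixing the number of elements per pane is usually a job for an element-count trigger (in Beam, something along the lines of `AfterPane.elementCountAtLeast(100)` inside a repeating trigger) rather than a time-based window alone, since arrival rate varies. The batching behaviour being asked for, sketched outside Beam with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class CountBatcher {
    // Groups a record stream into fixed-size batches of batchSize,
    // analogous to firing a pane after every 100 elements. The final
    // batch may be smaller if the stream ends before filling it, which
    // matches the "sometimes fewer than 100" behaviour seen with
    // time-based windows.
    static List<List<Integer>> batches(List<Integer> records, int batchSize) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            int end = Math.min(i + batchSize, records.size());
            out.add(new ArrayList<>(records.subList(i, end)));
        }
        return out;
    }
}
```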

The problem of kafkaIO sdk for data latency

2018-02-28 Thread linrick
Dear all, I am using the kafkaIO sdk in my project (Beam 2.0.0 with Direct runner). With this sdk, there is a data-latency issue, described in the following. The data come from Kafka at a fixed rate: 100 records/sec. I create a fixed window

RE: Regarding Beam Slack Channel

2017-12-01 Thread linrick
Hi Can I receive this invitation, too? Thanks Rick -Original Message- From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] Sent: Friday, December 01, 2017 12:53 PM To: user@beam.apache.org Subject: Re: Regarding Beam Slack Channel Invite sent as well. Regards JB On 11/30/2017 07:19

Can we transform PCollection to ArrayList?

2017-10-31 Thread linrick
Hi all, I am Rick. I would like to transform data from a PCollection to an ArrayList, and I am not sure if this is possible. My running environment is: Java 1.8 and Beam 2.1.0. My java code is: ArrayList myList = new ArrayList(); PipelineOptions options = PipelineOptionsFactory.create(); Pipeline
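A PCollection is a deferred, possibly unbounded, distributed dataset, so it cannot be cast to an ArrayList directly; it has to be materialized, typically by writing to an IO sink (or asserting on its contents in tests). The closest plain-Java analogy is collecting a lazy Stream into a concrete list (an analogy only, not a Beam API):

```java
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MaterializeExample {
    // A Stream, like a PCollection, describes data that is not yet held
    // in memory; collect(...) forces evaluation into an in-memory
    // ArrayList. In Beam the equivalent "forcing" step is running the
    // pipeline against a sink transform.
    static ArrayList<Integer> materialize(Stream<Integer> deferred) {
        return deferred.collect(Collectors.toCollection(ArrayList::new));
    }
}
```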

RE: How to use ConsoleIO SDK

2017-10-23 Thread linrick
Hi, If it's all right with you, I wanna try it. Thanks Sincerely Rick -Original Message- From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] Sent: Friday, October 20, 2017 1:32 PM To: user@beam.apache.org Subject: Re: How to use ConsoleIO SDK Hi, I started to work on the ConsoleIO

How to use ConsoleIO SDK

2017-10-19 Thread linrick
Dear sir, I have a question about how to use the Beam java sdk ConsoleIO. My objective (highlighted in yellow in the original message) is to write the PCollection "data" to the console, and then use it (type: RDD?) as another variable for other work. If any further information is needed, I am glad to be informed