user, wrote:
> Do you still have the same issue? I tried to follow your setup.sh to
> reproduce this but somehow I am stuck at the word_len step. I saw you also
> tried to use `print(kafka_kv)` to debug it. I am not sure about your
> current status.
>
> On Fri, May 10, 2024 at 9:
) on minikube and a docker image
named beam-python-example:1.17, created using the following Dockerfile.
The full details can be checked at
https://github.com/jaehyeon-kim/beam-demos/blob/feature/beam-deploy/beam-deploy/beam/Dockerfile
The Java SDK is used for the SDK harness of the Kafka IO's expansion service.
>
> On Sun, May 5, 2024 at 5:10 AM Jaehyeon Kim wrote:
>
>> Hi XQ
>>
>> Thanks for checking it out. SDF chaining seems to work, as I created my
>> pipeline by converting a pipeline built with the Java SDK. The
>> source of the Java p
makes me
> wonder whether the chained SDFs work.
>
> @Chamikara Jayalath Any thoughts?
>
> On Fri, May 3, 2024 at 2:45 AM Jaehyeon Kim wrote:
>
>> Hello,
>>
>> I am building a pipeline using two SDFs that are chained. The first
>> function (DirectoryWatch
but the issue is that the first SDF stops as soon as it returns the first
list of files.
The source of the pipeline can be found in
- First SDF:
https://github.com/jaehyeon-kim/beam-demos/blob/feature/beam-pipeline/beam-pipelines/chapter7/directory_watch.py
- Second SDF:
https://github.com/jaehyeon-kim
Here is an example from a book that I'm reading now; it may be
applicable.
JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
PYTHON - ord(id[0]) % 100
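For what it's worth, the two one-liners above can be checked side by side in plain Python. A minimal sketch; `record_id` and the helper names are hypothetical, and note that Python's built-in hash() is salted per process for strings, so the explicit first-character form is the reproducible one:

```python
# Sketch of the keying idea above: map an arbitrary string id onto one of
# 100 buckets so elements can be grouped/flushed per bucket.

def bucket_for(record_id: str, num_buckets: int = 100) -> int:
    """Python analogue of the book's example: ord(id[0]) % 100."""
    return ord(record_id[0]) % num_buckets

def bucket_for_java_style(record_id: str, num_buckets: int = 100) -> int:
    """Closer to the Java variant: a non-negative hash modulo 100.
    Caution: str hash() is salted per interpreter run, so results are
    stable within a process but not across restarts."""
    return (hash(record_id) & 0x7FFFFFFF) % num_buckets
```

Either function can then serve as the key in a keyed buffer-and-flush pattern like the one suggested above.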
On Sat, 13 Apr 2024 at 06:12, George Dekermenjian wrote:
> How about just keeping a buffer and flushing it after 100
> messages?
Hello,
I saw an example of bundling a Beam pipeline along with all its dependencies,
artifacts, etc. in the Spark runner documentation (
https://beam.apache.org/documentation/runners/spark/), and the bundled jar
can be used to run a job without a job server. Does it apply to the Flink
Runner as well? Also,
Hello,
I have questions about watermark generation and triggers with ReadFromKafka
(Python SDK).
1. It only supports the following simple timestamp policies:
- ProcessingTime
- CreateTime
- LogAppendTime
Is there any way to assign the event time from a timestamp in a Kafka message?
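One pattern I'm aware of (a sketch only, assuming the message payload carries an epoch-millis timestamp; ReadFromKafka does not guarantee this element shape) is to re-stamp elements after the read so downstream windowing uses the message's own time:

```python
# Sketch: re-stamp elements read from Kafka with an event time taken from
# the message itself. The (value, epoch_millis) element shape is a
# hypothetical assumption for this example.

def to_beam_seconds(epoch_millis: int) -> float:
    """Beam element timestamps are expressed in seconds since the epoch."""
    return epoch_millis / 1000.0

def with_event_time(element):
    # Lazy import so the pure helper above works without Beam installed.
    import apache_beam as beam

    value, ts_ms = element
    return beam.window.TimestampedValue(value, to_beam_seconds(ts_ms))
```

In a pipeline this would sit directly after ReadFromKafka, e.g. `... | beam.Map(with_event_time)`.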
> The Java SDK
nctions.java
> for the supported functions.
>
> On Sat, Mar 23, 2024 at 5:50 AM Jaehyeon Kim wrote:
>
>> Hello,
>>
>> I found a blog article about pattern matching with Beam SQL -
>> https://beam.apache.org/blog/pattern-match-beam-sql/. All of the PRs and
>
Hello,
I found a blog article about pattern matching with Beam SQL -
https://beam.apache.org/blog/pattern-match-beam-sql/. All of the PRs and
commits that are included in the post are merged.
On the other hand, the Beam Calcite SQL overview page indicates
MATCH_RECOGNIZE is not supported.
> 4. The key I think is to explicitly specify the output types for
> TestStream like this: TestStream(coder=coders.StrUtf8Coder())
> *.with_output_types(str)*
>
> These at least work for me.
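A minimal sketch of point 4 above, assuming the Beam Python testing API; the element values are hypothetical and the imports are deferred so the fragment stands alone:

```python
def build_test_stream():
    # Sketch of the hint above: give TestStream an explicit coder AND
    # explicit output types so the runner can infer the coder downstream.
    import apache_beam  # noqa: F401  (lazy import: sketch only)
    from apache_beam.coders import coders
    from apache_beam.testing.test_stream import TestStream

    stream = (
        TestStream(coder=coders.StrUtf8Coder())
        .add_elements(["hello", "world"])
        .advance_watermark_to_infinity()
    )
    return stream.with_output_types(str)
```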
>
> On Thu, Mar 14, 2024 at 4:37 PM Jaehyeon Kim wrote:
>
>> Hello,
Hello,
The pipeline runs on the host, while host.docker.internal would only be
resolved inside containers that run with the host network mode. I guess the
pipeline isn't reachable via host.docker.internal and fails to run.
If everything before ReadFromKafka works successfully, a docker
Hello,
I am trying a simple word count pipeline in a streaming environment using
TestStream (Python SDK). While it works with the DirectRunner, it fails on
the FlinkRunner with the following error. It looks like a type casting
issue.
Traceback (most recent call last):
File
raise self._runtime_exception
RuntimeError: Pipeline sql-transform_4f61dc80-9aeb-4e85-9bfb-3df1b6e3db48
failed in state FAILED: java.lang.IllegalStateException: No container
running for id
c03fe103bd0c2498b66945608709e6488dabbae9ba359c1e7600c1a9a9c63800
On Tue, 12 Mar 2024 at 15:07, Jaehyeon Kim wrote:
Hello,
I have a simple pipeline that transforms data with *SqlTransform*. I use
the *FlinkRunner* and, when I don't specify the *flink_master* option and
use an embedded Flink cluster, it works fine. However, if I use a local
Flink cluster and specify the *flink_master* option to
> That being said, it should work just fine with distinct jars on an
> arbitrary setup as well (and the fact that it works locally hints to
> something being up with that setup). Not sure what your flink master
> configuration is like, but maybe it's a docker-in-docker issue?
>
I've been teaching myself Beam and here is an example pipeline that uses
Kafka IO (read and write). I hope it helps.
*Prerequisites*
1. Kafka runs on Docker and its external listener is exposed on port 29092
(i.e. its bootstrap server address can be specified as localhost:29092)
2. The following
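The prerequisites above are truncated here, but assuming Kafka reachable at localhost:29092 per prerequisite 1, a read-then-write pipeline might be sketched as below (the topic names are made up for the example, and the Beam imports are deferred so the snippet is self-contained):

```python
BOOTSTRAP_SERVERS = "localhost:29092"  # per prerequisite 1 above

def build_pipeline(argv=None):
    # Sketch only: reads (key, value) records from one topic and echoes
    # them to another. `input-topic` / `output-topic` are hypothetical.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka, WriteToKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    pipeline = beam.Pipeline(options=PipelineOptions(argv, streaming=True))
    (
        pipeline
        | "Read" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": BOOTSTRAP_SERVERS},
            topics=["input-topic"],
        )
        | "Write" >> WriteToKafka(
            producer_config={"bootstrap.servers": BOOTSTRAP_SERVERS},
            topic="output-topic",
        )
    )
    return pipeline
```

Note that both Kafka transforms run through the Java expansion service, so a Java SDK harness container is started alongside the Python one.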
>
> Ning.
>
> On Wed, Mar 6, 2024 at 9:50 PM Jaehyeon Kim wrote:
>
>> Hello,
>>
>> I'm playing with the interactive runner on a notebook and the flink
>> runner is used as the underlying runner. I wonder if it can read messages
>> from Kafka. I check
Hello,
I'm playing with the interactive runner on a notebook and the flink runner
is used as the underlying runner. I wonder if it can read messages from
Kafka. I checked the example notebook
It may be a configuration issue. It works if I don't specify *flink_master*,
in which case an embedded Flink cluster is used.
On Thu, 7 Mar 2024 at 12:47, Jaehyeon Kim wrote:
> It works fine if I only use Kafka read/write as I only see a single
> container - two transforms (read and write) but a
d see why it failed to bring up the
> containers (or, if they started, look in the container logs to see why
> they died).
>
> On Wed, Mar 6, 2024 at 5:28 PM Jaehyeon Kim wrote:
> >
> > I am not using the python local runner but the flink runner. A flink
> cluster is started
I am not using the python local runner but the flink runner. A flink
cluster is started locally.
On Thu, 7 Mar 2024 at 12:13, Robert Bradshaw via user
wrote:
> Streaming portable pipelines are not yet supported on the Python local
> runner.
>
> On Wed, Mar 6, 2024 at 5:03 PM Jaehyeo
Hello,
I use the Python SDK and my pipeline reads messages from Kafka and
transforms them via SQL. I see two containers created, but it seems they
don't communicate with each other, so the Flink task manager keeps killing
the containers. The Flink cluster runs locally. Is there a way to
or
> defining window on table is cool).
>
> Best
>
> Wiśniowski Piotr
> On 4.03.2024 05:27, Jaehyeon Kim wrote:
>
> Hello,
>
> I just tried a simple tumbling window but failed with the following error
>
> RuntimeError: org.apache.beam.sdk.extensions.sql.i
Hello,
beam_sql magic doesn't seem to have an option to specify an SQL dialect
while the underlying SqlTransform has the dialect argument. Is there a way
to specify an SQL dialect on a notebook?
Cheers,
Jaehyeon
Hello,
I just tried a simple tumbling window but failed with the following error
RuntimeError: org.apache.beam.sdk.extensions.sql.impl.ParseException:
Unable to parse query
WITH cte AS (
SELECT TO_TIMESTAMP(event_datetime) AS ts FROM PCOLLECTION
)
SELECT
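The query is cut off above, but for reference a tumbling window in Beam Calcite SQL is normally expressed with TUMBLE in the GROUP BY clause. A hedged sketch of what the full statement might look like (only `event_datetime` comes from the original; the aggregate and window columns are hypothetical):

```python
# Sketch: a complete tumbling-window query of the shape the error message
# truncates above, as it would be passed to SqlTransform.
TUMBLE_QUERY = """
WITH cte AS (
  SELECT TO_TIMESTAMP(event_datetime) AS ts FROM PCOLLECTION
)
SELECT
  COUNT(*) AS cnt,
  TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start
FROM cte
GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE)
"""

def apply_sql(pcoll):
    # Lazy import: sketch only. Applied as pcoll | SqlTransform(TUMBLE_QUERY)
    from apache_beam.transforms.sql import SqlTransform
    return pcoll | SqlTransform(TUMBLE_QUERY)
```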
Hello,
Sorry for sending a late request. Can you please invite me to the
slack channel as well?
Cheers,
Jaehyeon
On Tue, 27 Feb 2024 at 03:44, Valentyn Tymofieiev via user <
user@beam.apache.org> wrote:
> Hi, done for all requests since my last message.
>
> On Mon, Feb 26, 2024 at 12:05 AM
wrote:
> I made this a few years ago to help people like yourself.
>
> https://github.com/sambvfx/beam-flink-k8s
>
> Hopefully it's insightful and I'm happy to accept any MRs to update any
> outdated information or to flesh it out more.
>
> On Thu, Feb 22, 2024 at 3:4
Hello,
I'm playing with the beam portable runner to read/write data from Kafka. I
see a spark runner example on Kubernetes (
https://beam.apache.org/documentation/runners/spark/#kubernetes) but the
flink runner section doesn't include such an example.
Is there a resource that I can learn from?