Re: Request to join slack channel

2024-02-20 Thread Daniel Chen via user
Hey Valentyn, can you add me to the slack channel as well?

On Tue, Feb 20, 2024 at 3:23 PM Valentyn Tymofieiev via user
<user@beam.apache.org> wrote:

> Hi Lydian,
>
> According to https://infra.apache.org/slack.html, invitations by link
> have been disabled. I submitted an invitation for your email address.
>
> Thanks,
> Valentyn
>
> On Tue, Feb 20, 2024 at 9:33 AM Lydian Lee wrote:
>
>> Hi,
>>
>> Can I get an invitation to join the Slack channel? The ASF Slack seems
>> to require an invitation in order to join. Thanks
>>
>


Re: Seeking Assistance to Resolve Issues/bug with Flink Runner on Kubernetes

2023-08-14 Thread Daniel Chen via user
Not the OP, but is it possible to join the slack channel without an
apache.org email address? I tried joining slack previously for support and
gave up because it looked like it wasn't possible.

On Mon, Aug 14, 2023 at 10:58 AM Kenneth Knowles wrote:

> There is a Slack channel linked from
> https://beam.apache.org/community/contact-us/: it is #beam on
> the-asf.slack.com
>
> (you find this via beam.apache.org -> Community -> Contact Us)
>
> It sounds like an issue with running a multi-language pipeline on the
> portable Flink runner (something I am not equipped to help with in
> detail).
>
> Kenn
>
> On Wed, Aug 9, 2023 at 2:51 PM kapil singh wrote:
>
>> Hey,
>>
>> I've been grappling with this issue for the past five days and, despite
>> my continuous efforts, I haven't found a resolution. Additionally, I've
>> been unable to locate a Slack channel for Beam where I might seek
>> assistance.
>>
>> The issue:
>>
>> RuntimeError: Pipeline construction environment and pipeline runtime
>> environment are not compatible. If you use a custom container image, check
>> that the Python interpreter minor version and the Apache Beam version in
>> your image match the versions used at pipeline construction time.
>> Submission environment: beam:version:sdk_base:apache/beam_java8_sdk:2.48.0.
>> Runtime environment:
>> beam:version:sdk_base:apache/beam_python3.8_sdk:2.48.0.
>>
>>
>> Here is what I am trying to do:
>>
>> I am running the job from a Kubernetes container that hits the job
>> server, which then talks to the job manager and task manager. The task
>> manager and job manager run in one container.
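
A multi-language pipeline is typically submitted to a Flink job server with
options along the lines below. This is a minimal sketch, not the poster's
actual submission code; the endpoint names and ports (a job server at
flink-jobserver:8099, the worker pool at beam-worker-pool:50000) are
assumptions that must match the Services in the cluster.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch only: hypothetical endpoints, to be replaced with real Services.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=flink-jobserver:8099",   # Beam job server gRPC endpoint
    "--environment_type=EXTERNAL",
    # The Python SDK harness runs in an external worker pool next to the
    # taskmanager; 50000 is the Beam worker pool's default port.
    "--environment_config=beam-worker-pool:50000",
])

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create(["smoke", "test"]) | beam.Map(print)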
>>
>> Here is my custom Dockerfile (image name: custom-flink):
>>
>> # Start from the base Flink image
>> FROM apache/flink:1.16-java11
>> ARG FLINK_VERSION=1.16
>> ARG KAFKA_VERSION=2.8.0
>>
>> # Install Python 3.8 and its build dependencies, followed by pyflink
>> RUN set -ex; \
>>     apt-get update && \
>>     apt-get install -y build-essential libssl-dev zlib1g-dev libbz2-dev \
>>         libffi-dev lzma liblzma-dev && \
>>     wget https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz && \
>>     tar -xvf Python-3.8.0.tgz && \
>>     cd Python-3.8.0 && \
>>     ./configure --without-tests --enable-shared && \
>>     make -j4 && \
>>     make install && \
>>     ldconfig /usr/local/lib && \
>>     cd .. && rm -f Python-3.8.0.tgz && rm -rf Python-3.8.0 && \
>>     ln -s /usr/local/bin/python3.8 /usr/local/bin/python && \
>>     ln -s /usr/local/bin/pip3.8 /usr/local/bin/pip && \
>>     apt-get clean && \
>>     rm -rf /var/lib/apt/lists/* && \
>>     python -m pip install --upgrade pip; \
>>     pip install apache-flink==${FLINK_VERSION}; \
>>     pip install kafka-python
>>
>> RUN pip install --no-cache-dir apache-beam[gcp]==2.48.0
>>
>> # Copy files from the official Python SDK image, including script/dependencies.
>> COPY --from=apache/beam_python3.8_sdk:2.48.0 /opt/apache/beam/ /opt/apache/beam/
>>
>> # Java SDK
>> COPY --from=apache/beam_java11_sdk:2.48.0 /opt/apache/beam/ /opt/apache/beam_java/
>>
>> RUN apt-get update && apt-get install -y python3-venv && \
>>     rm -rf /var/lib/apt/lists/*
>>
>> # Give permissions to the /opt/apache/beam-venv directory
>> RUN mkdir -p /opt/apache/beam-venv && chown -R : /opt/apache/beam-venv
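
An image like the one above would typically be built and made visible to
the cluster before deploying. A sketch: the tag matches the manifest below,
and the kind command is an assumption that only applies to a local kind
cluster.

docker build -t custom-flink:latest .
kind load docker-image custom-flink:latest   # only for a local kind cluster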
>>
>> Here is my deployment file for the job manager and task manager, plus
>> the worker pool and job server:
>>
>>
>> apiVersion: v1
>> kind: Service
>> metadata:
>>   name: flink-jobmanager
>>   namespace: flink
>> spec:
>>   type: ClusterIP
>>   ports:
>>   - name: rpc
>>     port: 6123
>>   - name: blob-server
>>     port: 6124
>>   - name: webui
>>     port: 8081
>>   selector:
>>     app: flink
>>     component: jobmanager
>> ---
>> apiVersion: v1
>> kind: Service
>> metadata:
>>   name: beam-worker-pool
>>   namespace: flink
>> spec:
>>   selector:
>>     app: flink
>>     component: taskmanager
>>   ports:
>>   - protocol: TCP
>>     port: 5
>>     targetPort: 5
>>     name: pool
>> ---
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-jobmanager
>>   namespace: flink
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: jobmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: jobmanager
>>     spec:
>>       containers:
>>       - name: jobmanager
>>         image: custom-flink:latest
>>         imagePullPolicy: IfNotPresent
>>         args: ["jobmanager"]
>>         ports:
>>         - containerPort: 6123
>>           name: rpc
>>         - containerPort: 6124
>>           name: blob-server
>>         - containerPort: 8081
>>           name: webui
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6123
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf
>>         - name: flink-staging
>>           mountPath: /tmp/beam-artifact-staging
>>         securityContext:
>>           runAsUser: 
>>         resources:
>>           requests:
>>             memory: "1Gi"
>>             cpu: "1"
>>           limits:
>>             memory: "1Gi"
>>             cpu: "1"
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       - name: flink-staging
>>         persistentVolumeClaim:
>>           claimName: 
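
The beam-worker-pool Service above selects the taskmanager pods, which
implies the Beam worker pool runs as a sidecar container there. A minimal
sketch of such a sidecar follows; this is an illustration, not the poster's
actual config, and the image, port, and mount are assumptions:

      - name: beam-worker-pool
        image: apache/beam_python3.8_sdk:2.48.0
        args: ["--worker_pool"]   # starts the SDK container in worker-pool mode
        ports:
        - containerPort: 50000    # Beam worker pool default port
          name: pool
        volumeMounts:
        - name: flink-staging
          mountPath: /tmp/beam-artifact-staging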

Exception handling in ReadFromTextWithFilename?

2022-12-07 Thread Daniel Chen via user
Hi friends,

I encountered an issue with the Beam Python SDK (2.43.0) recently where I
was using ReadFromTextWithFilename on a Google Cloud Storage (GCS) bucket
containing roughly 95k gzip-compressed CSV files. One of the files was
truncated in transit, so the job ran for a couple of hours before raising
an exception like "zlib.error: Error -3 while decompressing data: incorrect
header check" from within the apache_beam.io.filesystem module. The
exception didn't indicate the filename of the truncated file, and from
looking through the library, I couldn't find any mechanism to handle the
exception or to return additional context that would have allowed me to
remediate the situation.

Is there an example of how to handle this situation? Ideally, the library
would return a PCollection of the filenames that encountered errors while
being read, or something similar, for further processing, rather than
causing the whole job to crash.
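
As far as I know, ReadFromTextWithFilename does not expose a dead-letter
output in 2.43.0, but the usual workaround is to match and read the files
yourself so per-file errors can be caught and tagged. A rough sketch, with
the bucket path and output tag names made up for illustration; note it
reads each file fully into memory, which is fine for modest CSV shards:

import apache_beam as beam
from apache_beam.io import fileio

class ReadFileOrTagError(beam.DoFn):
    """Emits (filename, line) pairs; routes unreadable files (for example
    a truncated .gz) to a 'failed' output instead of crashing the job."""
    def process(self, readable_file):
        path = readable_file.metadata.path
        try:
            # read_utf8() picks the decompression from the file extension,
            # so a truncated gzip raises inside this try block. Because the
            # whole file is read before the loop, a failed file emits no lines.
            for line in readable_file.read_utf8().splitlines():
                yield (path, line)
        except Exception as exc:
            yield beam.pvalue.TaggedOutput('failed', (path, repr(exc)))

with beam.Pipeline() as p:
    results = (
        p
        | fileio.MatchFiles('gs://my-bucket/data/*.csv.gz')  # hypothetical path
        | fileio.ReadMatches()
        | beam.ParDo(ReadFileOrTagError()).with_outputs('failed', main='lines'))
    results.lines | 'GoodLines' >> beam.Map(print)
    results.failed | 'BadFiles' >> beam.Map(print)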