What is the error message now? You can easily open a shell in your Docker container and check that everything is installed correctly by running:

docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE
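
Once you have a shell in the container, a small script along these lines might help confirm which modules the interpreter can actually locate. This is only a rough sketch: the helper name find_missing is made up for illustration, and the module names in the list are simply the ones mentioned in this thread, so adjust them to whatever your pipeline imports.

```python
import importlib.util


def find_missing(module_names):
    """Return the names from module_names that cannot be located on sys.path."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a module's __spec__ is unset.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed names: 'mypackage' and 'modules' are taken from this thread.
    for name in find_missing(["mypackage", "mypackage.launcher", "modules"]):
        print(f"NOT importable: {name}")
```

If you run it from the working directory of the image, any name it prints is one the container's Python cannot import, which should narrow down whether the problem is in the image itself or in what the launcher refers to.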

On Sun, Jun 16, 2024 at 5:18 AM Sofia’s World <[email protected]> wrote:

> Valentyn, many thanks... i actually spotted the reference in the setup file.
> However, after correcting it, i am still at square 1 where somehow my
> runtime environment does not see it.. so i added some debugging to my
> Dockerfile to check if i forgot to copy something,
> and here's the output, where i can see that mypackage has been copied
>
> here's my directory structure
>
> ---- mypackage
>        __init__.py
>        obbutils.py
>        launcher.py
>      __init__.py
>      dataflow_tester.py
>      setup_dftester.py (copied to setup.py)
>
> i can see the directory structure has been maintained when i copy my files
> to docker as i added some debug to my dockerfile
>
> Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
> Step #0 - "dftester-image": ---> cda378f70a9e
> Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
> Step #0 - "dftester-image": ---> 9a43da08b013
> Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
> Step #0 - "dftester-image": ---> 5a6bf71df052
> Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
> Step #0 - "dftester-image": ---> 82cfe1f1f9ed
> Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
> Step #0 - "dftester-image": ---> d86497b791d0
> Step #0 - "dftester-image": Step 10/23 : COPY __init__.py ${WORKDIR}/__init__.py
> Step #0 - "dftester-image": ---> 337d149d64c7
> Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing workdir'
> Step #0 - "dftester-image": ---> Running in 9d97d8a64319
> Step #0 - "dftester-image": ----- listing workdir
> Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
> Step #0 - "dftester-image": ---> bc9a6a2aa462
> Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
> Step #0 - "dftester-image": ---> Running in cf164108f9d6
> Step #0 - "dftester-image": total 24
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 135 Jun 16 08:57 dataflow_tester.py
> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 mypackage
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 64 Jun 16 08:57 requirements.txt
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 736 Jun 16 08:57 setup.py
> Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
> Step #0 - "dftester-image": ---> eb1a080b7948
> Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules -----'
> Step #0 - "dftester-image": ---> Running in 884f03dd81d6
> Step #0 - "dftester-image": --- listing modules -----
> Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
> Step #0 - "dftester-image": ---> 9f6f7e27bd2f
> Step #0 - "dftester-image": Step 14/23 : RUN ls -la ${WORKDIR}/mypackage
> Step #0 - "dftester-image": ---> Running in bd74ade37010
> Step #0 - "dftester-image": total 16
> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57 launcher.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 607 Jun 16 08:57 obb_utils.py
> Step #0 - "dftester-image": Removing intermediate container bd74ade37010
>
> i have this in my setup.py
>
> REQUIRED_PACKAGES = [
>     'openbb',
>     "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
>     'sendgrid',
>     'pandas_datareader',
>     'vaderSentiment',
>     'numpy',
>     'bs4',
>     'lxml',
>     'pandas_datareader',
>     'beautifulsoup4',
>     'xlrd',
>     'openpyxl'
> ]
>
> setuptools.setup(
>     name='mypackage',
>     version='0.0.1',
>     description='Shres Runner Package.',
>     install_requires=REQUIRED_PACKAGES,
>     packages=setuptools.find_packages()
> )
>
> and this is my dataflow_tester.py
>
> from mypackage import launcher
> import logging
> if __name__ == '__main__':
>     logging.getLogger().setLevel(logging.INFO)
>     launcher.run()
>
> i have compared my setup vs
> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
> and all looks the same (apart from my copying the __init__.py from the
> directory where the main file (dataflow_tester.py) resides)
>
> Would you know how else i can debug what is going on and why my
> mypackage subdirectory is not being seen?
>
> Kind regards
> Marco
>
> On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <[email protected]> wrote:
>
>> Your pipeline launcher refers to a package named 'modules', but this
>> package is not available in the runtime environment.
>>
>> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]> wrote:
>>
>>> Sorry, i cheered too early
>>> i can successfully build the image, however at runtime the code always
>>> fails with this exception and i cannot figure out why
>>>
>>> i mimicked the sample directory structure
>>>
>>> ---- mypackage
>>>        __init__.py
>>>        dftester.py
>>>        obb_utils.py
>>>      dataflow_tester_main.py
>>>
>>> this is the content of my dataflow_tester_main.py
>>>
>>> from mypackage import dftester
>>> import logging
>>> if __name__ == '__main__':
>>>     logging.getLogger().setLevel(logging.INFO)
>>>     dftester.run()
>>>
>>> and this is my dockerfile
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> and at the bottom of this email is my exception
>>> I am puzzled about where the error is coming from as i have almost copied
>>> this sample
>>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>>
>>> thanks and regards
>>> Marco
>>>
>>> Traceback (most recent call last):
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>>     _load_main_session(semi_persistent_directory)
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>>     pickler.load_session(session_file)
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>>     return desired_pickle_lib.load_session(file_path)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>>     return dill.load_session(file_path)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>>     module = unpickler.load()
>>>              ^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>>     obj = StockUnpickler.load(self)
>>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>>     return StockUnpickler.find_class(self, module, name)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> ModuleNotFoundError: No module named 'modules'
>>>
>>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]> wrote:
>>>
>>>> Many thanks Hu, worked like a charm
>>>>
>>>> few qq
>>>> so in my reqs.txt i should put all beam requirements PLUS my own?
>>>>
>>>> and in the setup.py, shall i just declare
>>>>
>>>> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>>>>
>>>> thanks and kind regards
>>>> Marco
>>>>
>>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>>
>>>>> Any reason to use this?
>>>>>
>>>>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0 pandas-datareader==0.9.0
>>>>>
>>>>> It is typically recommended to use the latest Beam and build the
>>>>> docker image using the requirements released for each Beam release, for example,
>>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>>
>>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]> wrote:
>>>>>
>>>>>> Sure, apologies, it crossed my mind it would have been useful to
>>>>>> refer to it
>>>>>>
>>>>>> so this is the docker file
>>>>>>
>>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>>
>>>>>> I was using a setup.py as well, but then i commented out the usage in
>>>>>> the dockerfile after checking some flex templates which said it is not
>>>>>> needed
>>>>>>
>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>>
>>>>>> thanks in advance
>>>>>> Marco
>>>>>>
>>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>>
>>>>>>> Can you share your Dockerfile?
>>>>>>>
>>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]> wrote:
>>>>>>>
>>>>>>>> thanks all, it seemed to work but now i am getting a different
>>>>>>>> problem, having issues in building pyarrow...
>>>>>>>>
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":   File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":     section = defn.get("tool", {})[tool_name]
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":               ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>>>>>>>>
>>>>>>>> It is somehow getting messed up with a toml?
>>>>>>>>
>>>>>>>> Could anyone advise?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Marco
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>>> is a great example.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>>> version installed in the docker image of your flex template. So, you'd
>>>>>>>>>> have to build your flex template from a base image with Python 3.11.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello
>>>>>>>>>>> no i am running my pipeline on GCP directly via a flex
>>>>>>>>>>> template, configured using a Dockerfile
>>>>>>>>>>> Any chance to do something in the Dockerfile to force the
>>>>>>>>>>> version at runtime?
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Are you running your pipeline from the python 3.11
>>>>>>>>>>>> environment? If you are running from a python 3.11 environment and don't
>>>>>>>>>>>> use a custom docker container image, DataflowRunner (assuming Apache Beam on
>>>>>>>>>>>> GCP means Apache Beam on DataflowRunner) will use Python 3.11.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anand
>>>>>>>>>>>>
