It looks like “mypackage” was not built or installed correctly in your image. Please check and confirm.

Utkarsh
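One way to confirm this from inside the container (a minimal sketch, not taken from the Beam documentation): the build log quoted below shows that the mypackage directory was copied into the image, but a copied directory is not necessarily importable by the worker's Python. The snippet uses the standard-library importlib.util.find_spec to report whether a top-level module can be found and where it would load from. The helper name locate_module is hypothetical, and it is assumed the script is run with the same Python interpreter the worker uses.

```python
import importlib.util


def locate_module(name):
    """Return where a top-level module would load from, or None if not found.

    `locate_module` is a hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Regular packages report their __init__.py in `origin`; namespace
    # packages have no origin, so fall back to their search locations.
    if spec.origin is not None:
        return spec.origin
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    # Module names taken from the thread; None means "not importable here".
    for name in ("apache_beam", "mypackage"):
        print(name, "->", locate_module(name))
```

If mypackage comes back as None, or resolves only when the script is started from the build directory, then installing the package into the image (for example with a pip install step that uses the setup.py) is worth checking against the pipeline_with_dependencies sample.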
On Sun, Jun 16, 2024 at 12:48 PM Sofia’s World <[email protected]> wrote:

> Error is same...- see bottom -
> i have tried to ssh in the container and the directory is setup as
> expected...... so not quite sure where the issue is
> i will try to start from the pipeline with dependencies sample and work
> out from there w.o bothering the list
>
> thanks again for following up
> Marco
>
> Could not load main session. Inspect which external dependencies are used
> in the main module of your pipeline. Verify that corresponding packages are
> installed in the pipeline runtime environment and their installed versions
> match the versions used in pipeline submission environment. For more
> information, see:
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>     _load_main_session(semi_persistent_directory)
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>     pickler.load_session(session_file)
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>     return desired_pickle_lib.load_session(file_path)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>     return dill.load_session(file_path)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>     module = unpickler.load()
>              ^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>     obj = StockUnpickler.load(self)
>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 827, in _import_module
>     return getattr(__import__(module, None, None, [obj]), obj)
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ModuleNotFoundError: No module named 'mypackage'
>
> On Sun, 16 Jun 2024, 14:50 XQ Hu via user, <[email protected]> wrote:
>
>> What is the error message now?
>> You can easily ssh to your docker container and check everything is
>> installed correctly by:
>> docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE
>>
>>
>> On Sun, Jun 16, 2024 at 5:18 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Valentin, many thanks... i actually spotted the reference in the setup
>>> file
>>> However, after correcting it, i am still at square 1 where somehow my
>>> runtime environment does not see it.. so i added some debugging to my
>>> Dockerfile to check if i forgot to copy something,
>>> and here's the output, where i can see the mypackage has been copied
>>>
>>> here's my directory structure
>>>
>>> ---- mypackage
>>>        __init__.py
>>>        obb_utils.py
>>>        launcher.py
>>> __init__.py
>>> dataflow_tester.py
>>> setup_dftester.py (copied to setup.py)
>>>
>>> i can see the directory structure has been maintained when i copy my
>>> files to docker as i added some debug to my dockerfile
>>>
>>> Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
>>> Step #0 - "dftester-image": ---> cda378f70a9e
>>> Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
>>> Step #0 - "dftester-image": ---> 9a43da08b013
>>> Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
>>> Step #0 - "dftester-image": ---> 5a6bf71df052
>>> Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
>>> Step #0 - "dftester-image": ---> 82cfe1f1f9ed
>>> Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
>>> Step #0 - "dftester-image": ---> d86497b791d0
>>> Step #0 - "dftester-image": Step 10/23 : COPY __init__.py ${WORKDIR}/__init__.py
>>> Step #0 - "dftester-image": ---> 337d149d64c7
>>> Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing workdir'
>>> Step #0 - "dftester-image": ---> Running in 9d97d8a64319
>>> Step #0 - "dftester-image": ----- listing workdir
>>> Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
>>> Step #0 - "dftester-image": ---> bc9a6a2aa462
>>> Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
>>> Step #0 - "dftester-image": ---> Running in cf164108f9d6
>>> Step #0 - "dftester-image": total 24
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 135 Jun 16 08:57 dataflow_tester.py
>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 mypackage
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 64 Jun 16 08:57 requirements.txt
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 736 Jun 16 08:57 setup.py
>>> Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
>>> Step #0 - "dftester-image": ---> eb1a080b7948
>>> Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules -----'
>>> Step #0 - "dftester-image": ---> Running in 884f03dd81d6
>>> Step #0 - "dftester-image": --- listing modules -----
>>> Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
>>> Step #0 - "dftester-image": ---> 9f6f7e27bd2f
>>> Step #0 - "dftester-image": Step 14/23 : RUN ls -la ${WORKDIR}/mypackage
>>> Step #0 - "dftester-image": ---> Running in bd74ade37010
>>> Step #0 - "dftester-image": total 16
>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 0 Jun 16 08:57 __init__.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57 launcher.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 607 Jun 16 08:57 obb_utils.py
>>> Step #0 - "dftester-image": Removing intermediate container bd74ade37010
>>>
>>>
>>> i have this in my setup.py
>>>
>>> REQUIRED_PACKAGES = [
>>>     'openbb',
>>>     "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
>>>     'sendgrid',
>>>     'pandas_datareader',
>>>     'vaderSentiment',
>>>     'numpy',
>>>     'bs4',
>>>     'lxml',
>>>     'pandas_datareader',
>>>     'beautifulsoup4',
>>>     'xlrd',
>>>     'openpyxl'
>>> ]
>>>
>>>
>>> setuptools.setup(
>>>     name='mypackage',
>>>     version='0.0.1',
>>>     description='Shres Runner Package.',
>>>     install_requires=REQUIRED_PACKAGES,
>>>     packages=setuptools.find_packages()
>>> )
>>>
>>>
>>> and this is my dataflow_tester.py
>>>
>>> from mypackage import launcher
>>> import logging
>>> if __name__ == '__main__':
>>>     logging.getLogger().setLevel(logging.INFO)
>>>     launcher.run()
>>>
>>>
>>>
>>> have compared my setup vs
>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>> and all looks the same (apart from my copying the __init__.py from the
>>> directory where the main file (dataflow_tester.py) resides)
>>>
>>> Would you know how else can i debug what is going on and why my
>>> mypackage subdirectory is not being seen?
>>>
>>> Kind regards
>>> Marco
>>>
>>>
>>> On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user
>>> <[email protected]> wrote:
>>>
>>>> Your pipeline launcher refers to a package named 'modules', but this
>>>> package is not available in the runtime environment.
>>>>
>>>> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]>
>>>> wrote:
>>>>
>>>>> Sorry, i cheered up too early
>>>>> i can successfully build the image however, at runtime the code fails
>>>>> always with this exception and i cannot figure out why
>>>>>
>>>>> i mimicked the sample directory structure
>>>>>
>>>>>
>>>>> ---- mypackage
>>>>>      --- __init__.py
>>>>>          dftester.py
>>>>>          obb_utils.py
>>>>>
>>>>> dataflow_tester_main.py
>>>>>
>>>>> this is the content of my dataflow_tester_main.py
>>>>>
>>>>> from mypackage import dftester
>>>>> import logging
>>>>> if __name__ == '__main__':
>>>>>     logging.getLogger().setLevel(logging.INFO)
>>>>>     dftester.run()
>>>>>
>>>>>
>>>>> and this is my dockerfile
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>>>>
>>>>> and at the bottom of this email my exception
>>>>> I am puzzled on where the error is coming from as i have almost copied
>>>>> this sample
>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>>>>
>>>>> thanks and regards
>>>>> Marco
>>>>>
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>>>>     _load_main_session(semi_persistent_directory)
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>>>>     pickler.load_session(session_file)
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>>>>     return desired_pickle_lib.load_session(file_path)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>>>>     return dill.load_session(file_path)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>>>>     module = unpickler.load()
>>>>>              ^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>>>>     obj = StockUnpickler.load(self)
>>>>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>>>>     return StockUnpickler.find_class(self, module, name)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> ModuleNotFoundError: No module named 'modules'
>>>>>
>>>>>
>>>>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Many thanks Hu, worked like a charm
>>>>>>
>>>>>> few qq
>>>>>> so in my reqs.txt i should put all beam requirements PLUS my own?
>>>>>>
>>>>>> and in the setup.py, shall i just declare
>>>>>>
>>>>>> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>>>>>>
>>>>>> thanks and kind regards
>>>>>> Marco
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>>>>
>>>>>>> Any reason to use this?
>>>>>>>
>>>>>>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0 pandas-datareader==0.9.0
>>>>>>>
>>>>>>> It is typically recommended to use the latest Beam and build the
>>>>>>> docker image using the requirements released for each Beam, for example,
>>>>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>>>>
>>>>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sure, apologies, it crossed my mind it would have been useful to
>>>>>>>> refer to it
>>>>>>>>
>>>>>>>> so this is the docker file
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>>>>
>>>>>>>> I was using a setup.py as well, but then i commented out the usage
>>>>>>>> in the dockerfile after checking some flex templates which said it is
>>>>>>>> not needed
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>> Marco
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Can you share your Dockerfile?
>>>>>>>>>
>>>>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> thanks all, it seemed to work but now i am getting a different
>>>>>>>>>> problem, having issues in building pyarrow...
>>>>>>>>>>
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": Traceback (most recent call last):
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": section = defn.get("tool", {})[tool_name]
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": KeyError: 'setuptools_scm'
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image": running bdist_wheel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It is somehow getting messed up with a toml ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Could anyone advise?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> Marco
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>>>>> is a great example.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>>>>> version installed in the docker image of your flex template. So,
>>>>>>>>>>>> you'd have to build your flex template from a base image with
>>>>>>>>>>>> Python 3.11.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello
>>>>>>>>>>>>> no i am running my pipeline on GCP directly via a flex
>>>>>>>>>>>>> template, configured using a Docker file
>>>>>>>>>>>>> Any chances to do something in the Dockerfile to force the
>>>>>>>>>>>>> version at runtime?
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you running your pipeline from the python 3.11
>>>>>>>>>>>>>> environment? If you are running from a python 3.11 environment
>>>>>>>>>>>>>> and don't use a custom docker container image, DataflowRunner
>>>>>>>>>>>>>> (assuming Apache Beam on GCP means Apache Beam on
>>>>>>>>>>>>>> DataflowRunner) will use Python 3.11.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Anand
