What is the error message now?
You can easily open a shell in your Docker container and check that
everything is installed correctly with:
docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE
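
Once inside the container, a small script along these lines can confirm
whether Python can actually locate your package. This is only a sketch: the
helper name check_importable is made up for illustration, and the module
names are simply the ones mentioned in this thread.

```python
import importlib.util


def check_importable(names):
    """Return {name: bool} saying whether each module can be located."""
    results = {}
    for name in names:
        try:
            results[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError: a parent package of a dotted name is missing.
            # ValueError: an already-imported module has no usable __spec__.
            results[name] = False
    return results


if __name__ == "__main__":
    # 'mypackage' and 'modules' are the names taken from this thread.
    names = ["mypackage", "mypackage.launcher", "modules"]
    for name, found in check_importable(names).items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If mypackage is reported as not found, the package was probably copied into
the image but never installed or put on the Python path.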


On Sun, Jun 16, 2024 at 5:18 AM Sofia’s World <[email protected]> wrote:

> Valentyn, many thanks... I actually spotted the reference in the setup file.
> However, after correcting it, I am still at square one: somehow my
> runtime environment does not see it. So I added some debugging to my
> Dockerfile to check whether I forgot to copy something,
> and here's the output, where I can see that mypackage has been copied.
>
> Here's my directory structure:
>
> ---- mypackage
>        __init__.py
>        obb_utils.py
>        launcher.py
> __init__.py
> dataflow_tester.py
> setup_dftester.py (copied to setup.py)
>
> I can see that the directory structure is maintained when I copy my files
> to Docker, as I added some debug output to my Dockerfile:
>
> Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
> Step #0 - "dftester-image":  ---> cda378f70a9e
> Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
> Step #0 - "dftester-image":  ---> 9a43da08b013
> Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
> Step #0 - "dftester-image":  ---> 5a6bf71df052
> Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
> Step #0 - "dftester-image":  ---> 82cfe1f1f9ed
> Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
> Step #0 - "dftester-image":  ---> d86497b791d0
> Step #0 - "dftester-image": Step 10/23 : COPY __init__.py ${WORKDIR}/__init__.py
> Step #0 - "dftester-image":  ---> 337d149d64c7
> Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing workdir'
> Step #0 - "dftester-image":  ---> Running in 9d97d8a64319
> Step #0 - "dftester-image": ----- listing workdir
> Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
> Step #0 - "dftester-image":  ---> bc9a6a2aa462
> Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
> Step #0 - "dftester-image":  ---> Running in cf164108f9d6
> Step #0 - "dftester-image": total 24
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
> Step #0 - "dftester-image": -rw-r--r-- 1 root root    0 Jun 16 08:57 __init__.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root  135 Jun 16 08:57 dataflow_tester.py
> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 mypackage
> Step #0 - "dftester-image": -rw-r--r-- 1 root root   64 Jun 16 08:57 requirements.txt
> Step #0 - "dftester-image": -rw-r--r-- 1 root root  736 Jun 16 08:57 setup.py
> Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
> Step #0 - "dftester-image":  ---> eb1a080b7948
> Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules -----'
> Step #0 - "dftester-image":  ---> Running in 884f03dd81d6
> Step #0 - "dftester-image": --- listing modules -----
> Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
> Step #0 - "dftester-image":  ---> 9f6f7e27bd2f
> Step #0 - "dftester-image": Step 14/23 : RUN ls -la  ${WORKDIR}/mypackage
> Step #0 - "dftester-image":  ---> Running in bd74ade37010
> Step #0 - "dftester-image": total 16
> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
> Step #0 - "dftester-image": -rw-r--r-- 1 root root    0 Jun 16 08:57 __init__.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57 launcher.py
> Step #0 - "dftester-image": -rw-r--r-- 1 root root  607 Jun 16 08:57 obb_utils.py
> Step #0 - "dftester-image": Removing intermediate container bd74ade37010
>
>
> I have this in my setup.py:
>
> REQUIRED_PACKAGES = [
>     'openbb',
>     "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
>     'sendgrid',
>     'pandas_datareader',
>     'vaderSentiment',
>     'numpy',
>     'bs4',
>     'lxml',
>     'pandas_datareader',
>     'beautifulsoup4',
>     'xlrd',
>     'openpyxl'
>     ]
>
>
> setuptools.setup(
>     name='mypackage',
>     version='0.0.1',
>     description='Shres Runner Package.',
>     install_requires=REQUIRED_PACKAGES,
>     packages=setuptools.find_packages()
>     )
>
>
> and this is my dataflow_tester.py
>
> from mypackage import launcher
> import logging
> if __name__ == '__main__':
>   logging.getLogger().setLevel(logging.INFO)
>   launcher.run()
>
>
>
> I have compared my setup with
> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
> and it all looks the same (apart from my copying the __init__.py from the
> directory where the main file, dataflow_tester.py, resides).
>
> Would you know how else I can debug what is going on and why my
> mypackage subdirectory is not being seen?
>
> Kind regards
>  Marco
>
>
>
>
> On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <
> [email protected]> wrote:
>
>> Your pipeline launcher refers to a package named 'modules', but this
>> package is not available in the runtime environment.
>>
>> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Sorry, I cheered too early.
>>> I can successfully build the image; however, at runtime the code always
>>> fails with this exception and I cannot figure out why.
>>>
>>> I mimicked the sample directory structure:
>>>
>>>
>>> ---- mypackage
>>>    --- __init__.py
>>>        dftester.py
>>>        obb_utils.py
>>>
>>> dataflow_tester_main.py
>>>
>>> this is the content of my dataflow_tester_main.py
>>>
>>> from mypackage import dftester
>>> import logging
>>> if __name__ == '__main__':
>>>   logging.getLogger().setLevel(logging.INFO)
>>>   dftester.run()
>>>
>>>
>>> and this is my dockerfile
>>>
>>>
>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>>
>>> and at the bottom of this email is my exception.
>>> I am puzzled about where the error is coming from, as I have almost
>>> copied this sample:
>>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>>
>>> thanks and regards
>>>  Marco
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Traceback (most recent call last):
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>>     _load_main_session(semi_persistent_directory)
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>>     pickler.load_session(session_file)
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>>     return desired_pickle_lib.load_session(file_path)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>>     return dill.load_session(file_path)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>>     module = unpickler.load()
>>>              ^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>>     obj = StockUnpickler.load(self)
>>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>>     return StockUnpickler.find_class(self, module, name)
>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> ModuleNotFoundError: No module named 'modules'
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]>
>>> wrote:
>>>
>>>> Many thanks Hu, worked like a charm
>>>>
>>>> A few quick questions:
>>>> so in my requirements.txt, should I put all the Beam requirements PLUS
>>>> my own?
>>>>
>>>> And in the setup.py, shall I just declare
>>>>
>>>> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>>>>
>>>> thanks and kind regards
>>>> Marco
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>>
>>>>> Any reason to use this?
>>>>>
>>>>> RUN pip install avro-python3 pyarrow==0.15.1 apache-beam[gcp]==2.30.0
>>>>>  pandas-datareader==0.9.0
>>>>>
>>>>> It is typically recommended to use the latest Beam and build the
>>>>> docker image using the requirements released for each Beam, for example,
>>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>>
>>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Sure, apologies, it crossed my mind that it would have been useful to
>>>>>> refer to it.
>>>>>>
>>>>>> So this is the Dockerfile:
>>>>>>
>>>>>>
>>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>>
>>>>>> I was using a setup.py as well, but then I commented out its usage in
>>>>>> the Dockerfile after checking some flex templates which said it is not
>>>>>> needed:
>>>>>>
>>>>>>
>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>>
>>>>>> thanks in advance
>>>>>>  Marco
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>>
>>>>>>> Can you share your Dockerfile?
>>>>>>>
>>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks all, it seemed to work, but now I am getting a different
>>>>>>>> problem: issues building pyarrow...
>>>>>>>>
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":    <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":    WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":    Traceback (most recent call last):
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":      File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":        section = defn.get("tool", {})[tool_name]
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":                  ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":    KeyError: 'setuptools_scm'
>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":    running bdist_wheel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Is it somehow getting confused by a TOML file?
>>>>>>>>
>>>>>>>>
>>>>>>>> Could anyone advise?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>  Marco
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>>> is a great example.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>>> version installed in the Docker image of your flex template. So
>>>>>>>>>> you'd have to build your flex template from a base image with
>>>>>>>>>> Python 3.11.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>> no, I am running my pipeline on GCP directly via a flex
>>>>>>>>>>> template, configured using a Dockerfile.
>>>>>>>>>>> Is there any chance to do something in the Dockerfile to force the
>>>>>>>>>>> version at runtime?
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Are you running your pipeline from a Python 3.11
>>>>>>>>>>>> environment? If you are running from a Python 3.11 environment and
>>>>>>>>>>>> don't use a custom Docker container image, DataflowRunner (assuming
>>>>>>>>>>>> Apache Beam on GCP means Apache Beam on DataflowRunner) will use
>>>>>>>>>>>> Python 3.11.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Anand
>>>>>>>>>>>>
>>>>>>>>>>>
