Your “mypackage” is built incorrectly. Please check and confirm that.
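
As a rough sketch of one way to confirm this (not taken from the Beam docs; the helper name `locate_package` is made up for illustration): run something like the following inside the container. My assumption is that the worker only sees packages that are actually installed (for example via `pip install .`), not ones that merely sit in the working directory, so the lookup below ignores the current directory on purpose.

```python
import importlib.util
import os
import sys


def locate_package(name, ignore_cwd=True):
    """Return where `name` would be imported from, or None if it is not importable.

    With ignore_cwd=True the current directory is dropped from sys.path for the
    lookup, so a package that was only copied next to the script (and never
    installed) is reported as missing. `locate_package` is a hypothetical helper.
    """
    saved_path = list(sys.path)
    try:
        if ignore_cwd:
            cwd = os.path.realpath(os.getcwd())
            # '' also means "current directory", so drop empty entries as well.
            sys.path[:] = [
                p for p in sys.path
                if p and os.path.realpath(p) != cwd
            ]
        spec = importlib.util.find_spec(name)
    finally:
        # Always restore the original search path.
        sys.path[:] = saved_path
    if spec is None:
        return None
    # Regular packages have an origin (their __init__.py); namespace packages do not.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    for pkg in ("mypackage", "apache_beam"):
        print(pkg, "->", locate_package(pkg))
```

If `mypackage` prints `None` here while the directory is visible with `ls`, that would suggest the files were copied but the package was never installed into the image's Python environment.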

Utkarsh

On Sun, Jun 16, 2024 at 12:48 PM Sofia’s World <[email protected]> wrote:

> The error is the same (see bottom).
> I have tried to ssh into the container and the directory is set up as
> expected, so I am not quite sure where the issue is.
> I will try to start from the pipeline-with-dependencies sample and work
> out from there without bothering the list.
>
> thanks again for following up
>  Marco
>
> Could not load main session. Inspect which external dependencies are used
> in the main module of your pipeline. Verify that corresponding packages are
> installed in the pipeline runtime environment and their installed versions
> match the versions used in pipeline submission environment. For more
> information, see:
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>     _load_main_session(semi_persistent_directory)
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>     pickler.load_session(session_file)
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>     return desired_pickle_lib.load_session(file_path)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>     return dill.load_session(file_path)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>     module = unpickler.load()
>              ^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>     obj = StockUnpickler.load(self)
>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 827, in _import_module
>     return getattr(__import__(module, None, None, [obj]), obj)
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ModuleNotFoundError: No module named 'mypackage'
>
>
>
> On Sun, 16 Jun 2024, 14:50 XQ Hu via user, <[email protected]> wrote:
>
>> What is the error message now?
>> You can easily ssh to your docker container and check everything is
>> installed correctly by:
>> docker run --rm -it --entrypoint=/bin/bash $CUSTOM_CONTAINER_IMAGE
>>
>>
>> On Sun, Jun 16, 2024 at 5:18 AM Sofia’s World <[email protected]>
>> wrote:
>>
>>> Valentyn, many thanks... I actually spotted the reference in the setup
>>> file.
>>> However, after correcting it, I am still at square one: somehow my
>>> runtime environment does not see it. So I added some debugging to my
>>> Dockerfile to check whether I forgot to copy something,
>>> and here's the output, where I can see that mypackage has been copied.
>>>
>>> here's my directory structure
>>>
>>> ---- mypackage
>>>        __init__.py
>>>        obb_utils.py
>>>        launcher.py
>>> __init__.py
>>> dataflow_tester.py
>>> setup_dftester.py (copied to setup.py)
>>>
>>> i can see the directory structure has been maintained when i copy my
>>> files to docker as i added some debug to my dockerfile
>>>
>>> Step #0 - "dftester-image": Removing intermediate container 4c4e763289d2
>>> Step #0 - "dftester-image":  ---> cda378f70a9e
>>> Step #0 - "dftester-image": Step 6/23 : COPY requirements.txt .
>>> Step #0 - "dftester-image":  ---> 9a43da08b013
>>> Step #0 - "dftester-image": Step 7/23 : COPY setup_dftester.py setup.py
>>> Step #0 - "dftester-image":  ---> 5a6bf71df052
>>> Step #0 - "dftester-image": Step 8/23 : COPY dataflow_tester.py .
>>> Step #0 - "dftester-image":  ---> 82cfe1f1f9ed
>>> Step #0 - "dftester-image": Step 9/23 : COPY mypackage mypackage
>>> Step #0 - "dftester-image":  ---> d86497b791d0
>>> Step #0 - "dftester-image": Step 10/23 : COPY __init__.py ${WORKDIR}/__init__.py
>>> Step #0 - "dftester-image":  ---> 337d149d64c7
>>> Step #0 - "dftester-image": Step 11/23 : RUN echo '----- listing workdir'
>>> Step #0 - "dftester-image":  ---> Running in 9d97d8a64319
>>> Step #0 - "dftester-image": ----- listing workdir
>>> Step #0 - "dftester-image": Removing intermediate container 9d97d8a64319
>>> Step #0 - "dftester-image":  ---> bc9a6a2aa462
>>> Step #0 - "dftester-image": Step 12/23 : RUN ls -la ${WORKDIR}
>>> Step #0 - "dftester-image":  ---> Running in cf164108f9d6
>>> Step #0 - "dftester-image": total 24
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 .
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root    0 Jun 16 08:57 __init__.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root  135 Jun 16 08:57 dataflow_tester.py
>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 mypackage
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root   64 Jun 16 08:57 requirements.txt
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root  736 Jun 16 08:57 setup.py
>>> Step #0 - "dftester-image": Removing intermediate container cf164108f9d6
>>> Step #0 - "dftester-image":  ---> eb1a080b7948
>>> Step #0 - "dftester-image": Step 13/23 : RUN echo '--- listing modules -----'
>>> Step #0 - "dftester-image":  ---> Running in 884f03dd81d6
>>> Step #0 - "dftester-image": --- listing modules -----
>>> Step #0 - "dftester-image": Removing intermediate container 884f03dd81d6
>>> Step #0 - "dftester-image":  ---> 9f6f7e27bd2f
>>> Step #0 - "dftester-image": Step 14/23 : RUN ls -la  ${WORKDIR}/mypackage
>>> Step #0 - "dftester-image":  ---> Running in bd74ade37010
>>> Step #0 - "dftester-image": total 16
>>> Step #0 - "dftester-image": drwxr-xr-x 2 root root 4096 Jun 16 08:59 .
>>> Step #0 - "dftester-image": drwxr-xr-x 1 root root 4096 Jun 16 08:59 ..
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root    0 Jun 16 08:57 __init__.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root 1442 Jun 16 08:57 launcher.py
>>> Step #0 - "dftester-image": -rw-r--r-- 1 root root  607 Jun 16 08:57 obb_utils.py
>>> Step #0 - "dftester-image": Removing intermediate container bd74ade37010
>>>
>>>
>>> i have this in my setup.py
>>>
>>> REQUIRED_PACKAGES = [
>>>     'openbb',
>>>     "apache-beam[gcp]",  # Must match the version in `Dockerfile`.
>>>     'sendgrid',
>>>     'pandas_datareader',
>>>     'vaderSentiment',
>>>     'numpy',
>>>     'bs4',
>>>     'lxml',
>>>     'pandas_datareader',
>>>     'beautifulsoup4',
>>>     'xlrd',
>>>     'openpyxl'
>>>     ]
>>>
>>>
>>> setuptools.setup(
>>>     name='mypackage',
>>>     version='0.0.1',
>>>     description='Shres Runner Package.',
>>>     install_requires=REQUIRED_PACKAGES,
>>>     packages=setuptools.find_packages()
>>>     )
>>>
>>>
>>> and this is my dataflow_tester.py
>>>
>>> from mypackage import launcher
>>> import logging
>>> if __name__ == '__main__':
>>>   logging.getLogger().setLevel(logging.INFO)
>>>   launcher.run()
>>>
>>>
>>>
>>> I have compared my setup with
>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>> and it all looks the same (apart from my copying the __init__.py from the
>>> directory where the main file (dataflow_tester.py) resides).
>>>
>>> Would you know how else I can debug what is going on and why my
>>> mypackage subdirectory is not being seen?
>>>
>>> Kind regards
>>>  Marco
>>>
>>>
>>>
>>>
>>> On Sat, Jun 15, 2024 at 7:27 PM Valentyn Tymofieiev via user <
>>> [email protected]> wrote:
>>>
>>>> Your pipeline launcher refers to a package named 'modules', but this
>>>> package is not available in the runtime environment.
>>>>
>>>> On Sat, Jun 15, 2024 at 11:17 AM Sofia’s World <[email protected]>
>>>> wrote:
>>>>
>>>>> Sorry, i cheered up too early
>>>>> i can successfully build the image however, at runtime the code fails
>>>>> always with this exception and i cannot figure out why
>>>>>
>>>>> i mimicked the sample directory structure
>>>>>
>>>>>
>>>>> ---- mypackage
>>>>>    --- __init__.py
>>>>>        dftester.py
>>>>>        obb_utils.py
>>>>>
>>>>> dataflow_tester_main.py
>>>>>
>>>>> this is the content of my dataflow_tester_main.py
>>>>>
>>>>> from mypackage import dftester
>>>>> import logging
>>>>> if __name__ == '__main__':
>>>>>   logging.getLogger().setLevel(logging.INFO)
>>>>>   dftester.run()
>>>>>
>>>>>
>>>>> and this is my dockerfile
>>>>>
>>>>>
>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/Dockerfile_tester
>>>>>
>>>>> and at the bottom of this email is my exception.
>>>>> I am puzzled about where the error is coming from, as I have almost copied
>>>>> this sample:
>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/dataflow/flex-templates/pipeline_with_dependencies/main.py
>>>>>
>>>>> thanks and regards
>>>>>  Marco
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 115, in create_harness
>>>>>     _load_main_session(semi_persistent_directory)
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 354, in _load_main_session
>>>>>     pickler.load_session(session_file)
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/pickler.py", line 65, in load_session
>>>>>     return desired_pickle_lib.load_session(file_path)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/apache_beam/internal/dill_pickler.py", line 446, in load_session
>>>>>     return dill.load_session(file_path)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 368, in load_session
>>>>>     module = unpickler.load()
>>>>>              ^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
>>>>>     obj = StockUnpickler.load(self)
>>>>>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>   File "/usr/local/lib/python3.11/site-packages/dill/_dill.py", line 462, in find_class
>>>>>     return StockUnpickler.find_class(self, module, name)
>>>>>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> ModuleNotFoundError: No module named 'modules'
>>>>>
>>>>> On Fri, Jun 14, 2024 at 5:52 AM Sofia’s World <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Many thanks Hu, worked like a charm
>>>>>>
>>>>>> few qq
>>>>>> so in my reqs.txt i should put all beam requirements PLUS my own?
>>>>>>
>>>>>> and in the setup.py, shall i just declare
>>>>>>
>>>>>> "apache-beam[gcp]==2.54.0",  # Must match the version in `Dockerfile`.
>>>>>>
>>>>>> thanks and kind regards
>>>>>> Marco
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 12, 2024 at 1:48 PM XQ Hu <[email protected]> wrote:
>>>>>>
>>>>>>> Any reason to use this?
>>>>>>>
>>>>>>> RUN pip install avro-python3 pyarrow==0.15.1
>>>>>>> apache-beam[gcp]==2.30.0  pandas-datareader==0.9.0
>>>>>>>
>>>>>>> It is typically recommended to use the latest Beam and build the
>>>>>>> docker image using the requirements released for each Beam, for example,
>>>>>>> https://github.com/apache/beam/blob/release-2.56.0/sdks/python/container/py311/base_image_requirements.txt
>>>>>>>
>>>>>>> On Wed, Jun 12, 2024 at 1:31 AM Sofia’s World <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sure, apologies, it had crossed my mind that it would have been useful to
>>>>>>>> refer to it.
>>>>>>>>
>>>>>>>> so this is the docker file
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/mmistroni/GCP_Experiments/edit/master/dataflow/shareloader/Dockerfile_tester
>>>>>>>>
>>>>>>>> I was using a setup.py as well, but then I commented out the usage
>>>>>>>> in the Dockerfile after checking some flex templates which said it is
>>>>>>>> not needed.
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/mmistroni/GCP_Experiments/blob/master/dataflow/shareloader/setup_dftester.py
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>  Marco
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2024 at 10:54 PM XQ Hu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Can you share your Dockerfile?
>>>>>>>>>
>>>>>>>>> On Tue, Jun 11, 2024 at 4:43 PM Sofia’s World <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> thanks all,  it seemed to work but now i am getting a different
>>>>>>>>>> problem, having issues in building pyarrow...
>>>>>>>>>>
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":       <string>:36: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":       WARNING setuptools_scm.pyproject_reading toml section missing 'pyproject.toml does not contain a tool.setuptools_scm section'
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":       Traceback (most recent call last):
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":         File "/tmp/pip-build-env-meihcxsp/overlay/lib/python3.11/site-packages/setuptools_scm/_integration/pyproject_reading.py", line 36, in read_pyproject
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":           section = defn.get("tool", {})[tool_name]
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":                     ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":       KeyError: 'setuptools_scm'
>>>>>>>>>> Step #0 - "build-shareloader-template": Step #4 - "dftester-image":       running bdist_wheel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It is somehow getting messed up with a toml ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Could anyone advise?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>>  Marco
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 11, 2024 at 1:00 AM XQ Hu via user <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/pipeline_with_dependencies
>>>>>>>>>>> is a great example.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2024 at 4:28 PM Valentyn Tymofieiev via user <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In this case the Python version will be defined by the Python
>>>>>>>>>>>> version installed in the Docker image of your flex template. So
>>>>>>>>>>>> you'd have to build your flex template from a base image with
>>>>>>>>>>>> Python 3.11.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 10, 2024 at 12:50 PM Sofia’s World <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>> No, I am running my pipeline on GCP directly via a flex
>>>>>>>>>>>>> template, configured using a Dockerfile.
>>>>>>>>>>>>> Is there any chance to do something in the Dockerfile to force the
>>>>>>>>>>>>> version at runtime?
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 10, 2024 at 7:24 PM Anand Inguva via user <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you running your pipeline from a Python 3.11
>>>>>>>>>>>>>> environment? If you are running from a Python 3.11 environment and
>>>>>>>>>>>>>> don't use a custom Docker container image, DataflowRunner (assuming
>>>>>>>>>>>>>> Apache Beam on GCP means Apache Beam on DataflowRunner) will use
>>>>>>>>>>>>>> Python 3.11.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Anand
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
