Re: ImportError: __import__ not found on python job
hm, interesting, are we using pybind anywhere? I didn't see any references
to it. I can give it a try on python 3.8 too though.

On Wed, Dec 8, 2021 at 9:19 AM Brian Hulette wrote:

> A google search for "__import__ not found" turned up an issue filed with
> pybind [1]. I can't deduce a root cause from the discussion there, but it
> looks like they didn't experience the issue in Python 3.8 - it could be
> interesting to see if your problem goes away there.
>
> It looks like +Charles Chen added the __import__('re') workaround in [2],
> maybe he remembers what was going on?
>
> [1] https://github.com/pybind/pybind11/issues/2557
> [2] https://github.com/apache/beam/pull/5071
>
> On Wed, Dec 8, 2021 at 5:30 AM Steve Niemitz wrote:
>> [...]
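For what it's worth, the exact message "ImportError: __import__ not found" is the one CPython's import machinery raises when the `import` statement executes in a frame whose builtins namespace has no `__import__`. This sketch only reproduces the error message in plain Python by rebuilding the function with a stripped-down globals dict; it is not a claim about what Dataflow's worker is actually doing to the function's environment:

```python
import types


def tokenize(x):
    import re
    return re.findall('Hello', x)


# Rebuild tokenize around the same code object, but with a globals dict
# whose __builtins__ is an empty dict. The IMPORT_NAME opcode looks up
# __import__ in the frame's builtins and raises ImportError when it is
# missing -- producing the same message seen in the Dataflow traceback.
broken = types.FunctionType(tokenize.__code__, {'__builtins__': {}})

print(tokenize('Hello World, Hello You'))  # works: ['Hello', 'Hello']
try:
    broken('Hello World')
except ImportError as e:
    print(e)  # __import__ not found
```

If something similar happens on the worker (a pickled function restored with an incomplete builtins namespace), it would explain the error message, though not by itself why the `__import__('re')` workaround succeeds.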
Re: ImportError: __import__ not found on python job
Yeah, I can't imagine this is a "normal" problem.

I'm on linux w/ py 3.7. My script does have a __name__ == '__main__' block.

On Wed, Dec 8, 2021 at 12:38 AM Ning Kang wrote:

> I tried a pipeline:
>
> p = beam.Pipeline(DataflowRunner(), options=options)
> text = p | beam.Create(['Hello World, Hello You'])
>
> def tokenize(x):
>     import re
>     return re.findall('Hello', x)
>
> flatten = text | 'Split' >> (beam.FlatMap(tokenize).with_output_types(str))
> pipeline_result = p.run()
>
> Didn't run into the issue.
>
> What OS and Python version are you using? Does your script come with a
> `if __name__ == '__main__': `?
>
> On Tue, Dec 7, 2021 at 6:58 PM Steve Niemitz wrote:
>> [...]
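Outside of any runner, the `tokenize` function from the quoted pipeline behaves as expected; a quick local check (plain Python, no Beam involved, so it says nothing about the pickled-and-shipped copy Dataflow executes):

```python
def tokenize(x):
    import re
    return re.findall('Hello', x)


# Locally the function-level import works fine; the failure only shows up
# once the function has been serialized and restored on a remote worker.
print(tokenize('Hello World, Hello You'))  # ['Hello', 'Hello']
```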
ImportError: __import__ not found on python job
I have a fairly simple python word count job (although the packaging is a
little more complicated) that I'm trying to run. (note: I'm explicitly NOT
using save_main_session.)

In it is a method to tokenize the incoming text to words, and I used
something similar to how the wordcount example worked.

def tokenize(row):
    import re
    return re.findall(r'[A-Za-z\']+', row.text)

which is then used as the function for a FlatMap:

| 'Split' >> (beam.FlatMap(tokenize).with_output_types(str))

However, if I run this job on dataflow (2.33), the python runner fails with
a bizarre error:

INFO:apache_beam.runners.dataflow.dataflow_runner:2021-12-07T20:59:59.704Z:
JOB_MESSAGE_ERROR: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 572, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/tmp/tmpq_8l154y/wordcount_test.py", line 75, in tokenize
ImportError: __import__ not found

I was able to find an example in the streaming wordcount snippet that did
something similar, but very strange [1]:

| 'ExtractWords' >> beam.FlatMap(
    lambda x: __import__('re').findall(r'[A-Za-z\']+', x))

For whatever reason this actually fixed the issue in my job as well. I
can't for the life of me understand why this works, or why the normal
import fails. Someone else must have run into this same issue though for
that streaming wordcount example to be like that. Any ideas what's going
on here?

[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py#L692
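In terms of what the workaround computes, the two forms are interchangeable: the `__import__('re')` builtin returns the `re` module object, so calling `.findall` on it gives the same result as an ordinary `import re`. The difference is only in *how* the module is reached at runtime, which a quick local check confirms:

```python
import re

pattern = r'[A-Za-z\']+'
text = 'Hello World, Hello You'

# __import__('re') returns the re module object, so the snippet's lambda is
# behaviorally identical to an ordinary import plus re.findall.
workaround = __import__('re').findall(pattern, text)
normal = re.findall(pattern, text)

print(workaround)  # ['Hello', 'World', 'Hello', 'You']
assert workaround == normal
```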