Hi Matt,

So I had a look at the integration test, and it does not cover the use case I wrote about. There, the "Get file names" transform is configured with a single hardcoded path, whereas my issue concerns paths passed to "Get filename from field". I've created a new issue: <https://github.com/apache/hop/issues/3243>.
cheers
Fabian

> On 12.09.2023 at 19:01, Matt Casters wrote:
>
> That "fix" caused an out of memory error, so I wouldn't rely on it too much
> for larger volumes of files.
> But this is why we added the integration test. Listing thousands of files,
> IIRC. I couldn't possibly comment beyond that ;-)
>
> Take care,
>
> Matt
>
> On Tue 12 Sep 2023 at 17:18, Fabian Peters wrote:
>> Hi Matt,
>>
>> Well, I've since applied the recursion-based fix again, and the pipeline
>> started working as expected. Was anything else changed in the logic that
>> would ensure that multiple rows get passed into the transform? This was the
>> original problem: only the first row was acted upon. (The problem only
>> occurs if the path to the directory is set via "Get filename from field".)
>>
>> cheers
>>
>> Fabian
>>
>>> On 12.09.2023 at 16:37, Matt Casters wrote:
>>>
>>> It's surprising, since we have a successful test running with "Get File
>>> Names" on the Beam direct runner.
>>>
>>> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>>>
>>> I think that the main thing is to have permissions on the gs:// location
>>> you want to get files from.
>>>
>>> Cheers,
>>>
>>> Matt
>>>
>>>
>>> On Wed 6 Sep 2023 at 09:05, Fabian Peters wrote:
>>>> Good morning all!
>>>>
>>>> Not having worked with Hop for a couple of months, I downloaded the 2.5.0
>>>> version and found that an existing pipeline failed to work as expected.
>>>> This is due to the "Get file names" transform returning only a single row
>>>> for each row passed to "Get filename from field". I ran into the same
>>>> issue <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues>
>>>> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I
>>>> provided turned out to sometimes cause a stack overflow
>>>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues>
>>>> and was reverted. (No hard feelings…)
>>>>
>>>> Is there another way to make this work on Beam/Dataflow? Or is there an
>>>> alternative approach I can use to get all files in a GCS path, short of
>>>> using their HTTP API?
>>>>
>>>> Besides this: great work on the Dataflow template handling – works like a
>>>> charm now!
>>>>
>>>> cheers
>>>>
>>>> Fabian
