Hi Matt,

I've since re-applied the recursion-based fix, and the pipeline started working as expected. Was anything else changed in the logic that would ensure that multiple rows get passed into the transform? That was the original problem: only the first row was acted upon. (The problem only occurs if the path to the directory is set via "Get filename from field".)

cheers
Fabian

> On 12.09.2023 at 16:37, Matt Casters <[email protected]> wrote:
>
> It's surprising since we have a successful test running with "Get File Names"
> on the Beam direct runner.
>
> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>
> I think that the main thing is to have permissions on the gs:// location you
> want to get files from.
>
> Cheers,
>
> Matt
>
>
> On Wed, 6 Sep 2023 at 09:05, Fabian Peters <[email protected]> wrote:
>> Good morning all!
>>
>> Not having worked with Hop for a couple of months I downloaded the 2.5.0
>> version and found that an existing pipeline failed to work as expected. This
>> is due to the "Get file names" transform returning only a single row for
>> each row passed to "Get filename from field". I ran into the same issue
>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues>
>> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I
>> provided turned out to sometimes cause a stack overflow
>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues>
>> and was reverted. (No hard feelings…)
>>
>> Is there another way to make this work on Beam/Dataflow? Or is there an
>> alternative approach I can use to get all files in a GCS path, short of
>> using their HTTP API?
>>
>> Besides this: Great work on the Dataflow template handling – works like a
>> charm now!
>>
>> cheers
>>
>> Fabian
