That "fix" caused an out-of-memory error, so I wouldn't rely on it too much for larger volumes of files. But this is why we added the integration test. Listing thousands of files, IIRC. I couldn't possibly comment beyond that ;-)

Take care,
Matt

On Tue, 12 Sep 2023 at 17:18, Fabian Peters <[email protected]> wrote:

> Hi Matt,
>
> Well, I've since applied the recursion-based fix again and the pipeline
> started working as expected. Was anything else changed in the logic that
> would ensure that multiple rows get passed into the transform? This was the
> original problem, that only the first row was acted upon. (The problem only
> occurs if the path to the directory is set via "Get filename from field".)
>
> cheers
>
> Fabian
>
> On 12.09.2023 at 16:37, Matt Casters <[email protected]> wrote:
>
> It's surprising since we have a successful test running with "Get File
> Names" on the Beam direct runner.
>
> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>
> I think that the main thing is to have permissions on the gs:// location
> you want to get files from.
>
> Cheers,
>
> Matt
>
>
> On Wed, 6 Sep 2023 at 09:05, Fabian Peters <[email protected]> wrote:
>
>> Good morning all!
>>
>> Not having worked with Hop for a couple of months I downloaded the 2.5.0
>> version and found that an existing pipeline failed to work as expected.
>> This is due to the "Get file names" transform returning only a single row
>> for each row passed to "Get filename from field". I ran into the same issue
>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues>
>> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I
>> provided turned out to sometimes cause a stack overflow
>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues>
>> and was reverted. (No hard feelings…)
>>
>> Is there another way to make this work on Beam/Dataflow? Or is there an
>> alternative approach I can use to get all files in a GCS path, short of
>> using their HTTP API?
>>
>> Besides this: Great work on the Dataflow template handling – works like a
>> charm now!
>>
>> cheers
>>
>> Fabian
>>
