Hi Fabian,

I'm not exactly sure whether the issue you had is related to what I encountered in 2.5.0, but I've attached a sample pipeline to your ticket 3243 [1] that runs without any issues in 2.6.0. Could you please give this sample a try (in 2.6.0) and let us know whether it works for you?
[1] https://github.com/apache/hop/issues/3243

Regards,
Bart

On Wed, Sep 20, 2023 at 3:46 PM Fabian Peters <[email protected]> wrote:

> Hi Matt,
>
> So I had a look at the integration test, and it does not cover the use case I wrote about. There, the "Get file names" transform is configured with a single hardcoded path, whereas my issue concerns paths passed to "Get filename from field". I've created a new issue <https://github.com/apache/hop/issues/3243>.
>
> cheers
>
> Fabian
>
> On 12.09.2023 at 19:01, Matt Casters <[email protected]> wrote:
>
> That "fix" caused an out of memory error, so I wouldn't rely on it too much for larger volumes of files.
> But this is why we added the integration test. Listing thousands of files, IIRC. I couldn't possibly comment beyond that ;-)
>
> Take care,
>
> Matt
>
> On Tue, 12 Sep 2023 at 17:18, Fabian Peters <[email protected]> wrote:
>
>> Hi Matt,
>>
>> Well, I've since applied the recursion-based fix again, and the pipeline started working as expected. Was anything else changed in the logic that would ensure that multiple rows get passed into the transform? This was the original problem: only the first row was acted upon. (The problem only occurs if the path to the directory is set via "Get filename from field".)
>>
>> cheers
>>
>> Fabian
>>
>> On 12.09.2023 at 16:37, Matt Casters <[email protected]> wrote:
>>
>> It's surprising, since we have a successful test running with "Get File Names" on the Beam direct runner.
>>
>> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>>
>> I think the main thing is to have permissions on the gs:// location you want to get files from.
>>
>> Cheers,
>>
>> Matt
>>
>> On Wed, 6 Sep 2023 at 09:05, Fabian Peters <[email protected]> wrote:
>>
>>> Good morning all!
>>>
>>> Not having worked with Hop for a couple of months, I downloaded the 2.5.0 version and found that an existing pipeline failed to work as expected. This is due to the "Get file names" transform returning only a single row for each row passed to "Get filename from field". I ran into the same issue <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I provided turned out to sometimes cause a stack overflow <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues> and was reverted. (No hard feelings…)
>>>
>>> Is there another way to make this work on Beam/Dataflow? Or is there an alternative approach I can use to get all files in a GCS path, short of using their HTTP API?
>>>
>>> Besides this: Great work on the Dataflow template handling – works like a charm now!
>>>
>>> cheers
>>>
>>> Fabian
