Hi Matt,

So I had a look at the integration test and it does not cover the use case I 
wrote about. There, the "Get file names" transform is configured with a single 
hardcoded path, whereas my issue concerns paths passed to "Get filename from 
field". I've created a new issue <https://github.com/apache/hop/issues/3243>.
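
To illustrate the behaviour I would expect, here is a minimal sketch in plain
Python. It is not Hop code: it walks a local directory instead of a gs://
location, and the function name list_files_for_paths is made up for this
example. Every incoming path yields one output row per file found beneath it,
and the traversal uses an explicit stack rather than recursion, so deep
directory trees cannot overflow the call stack.

```python
import os
from typing import Iterable, Iterator, Tuple


def list_files_for_paths(paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (input_path, file_path) for every file below every input path.

    Hypothetical stand-in for "paths arrive as rows, file names leave as rows".
    """
    for root in paths:           # every incoming row is handled, not just the first
        pending = [root]         # explicit stack replaces recursive calls
        while pending:
            current = pending.pop()
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)   # visit subdirectory later
                    else:
                        yield root, entry.path       # one output row per file
```

Because the function is a generator, results are produced one at a time
instead of being collected into one large list first.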

cheers

Fabian

> On 12.09.2023 at 19:01, Matt Casters <[email protected]> wrote:
> 
> That "fix" caused an out of memory error so I wouldn't rely on it too much 
> for larger volumes of files.
> But this is why we added the integration test. Listing thousands of files 
> IIRC. I couldn't possibly comment beyond that ;-)
> 
> Take care, 
> 
> Matt
> 
> On Tue, 12 Sep 2023 at 17:18, Fabian Peters <[email protected]> wrote:
>> Hi Matt,
>> 
>> Well, I've since applied the recursion-based fix again and the pipeline 
>> started working as expected. Was anything else changed in the logic that 
>> would ensure that multiple rows get passed into the transform? This was the 
>> original problem, that only the first row was acted upon. (The problem only 
>> occurs if the path to the directory is set via "Get filename from field".)
>> 
>> cheers
>> 
>> Fabian
>> 
>>> On 12.09.2023 at 16:37, Matt Casters <[email protected]> wrote:
>>> 
>>> It's surprising since we have a successful test running with "Get File 
>>> Names" on the Beam direct runner.
>>> 
>>> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>>> 
>>> I think that the main thing is to have permissions on the gs:// location 
>>> you want to get files from. 
>>> 
>>> Cheers,
>>> 
>>> Matt
>>> 
>>> 
>>> On Wed, 6 Sep 2023 at 09:05, Fabian Peters <[email protected]> wrote:
>>>> Good morning all!
>>>> 
>>>> Not having worked with Hop for a couple of months, I downloaded the 2.5.0 
>>>> version and found that an existing pipeline failed to work as expected. 
>>>> This is due to the "Get file names" transform returning only a single row 
>>>> for each row passed to "Get filename from field". I ran into the same issue 
>>>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues> 
>>>> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I 
>>>> provided turned out to sometimes cause a stack overflow 
>>>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues> 
>>>> and was reverted. (No hard feelings…)
>>>> 
>>>> Is there another way to make this work on Beam/Dataflow? Or is there an 
>>>> alternative approach I can use to get all files in a GCS path, short of 
>>>> using their HTTP API?
>>>> 
>>>> Besides this: Great work on the Dataflow template handling – works like a 
>>>> charm now!
>>>> 
>>>> cheers
>>>> 
>>>> Fabian
>> 
