Hi Fabian,

Not exactly sure if the issue you had is related to what I encountered in
2.5.0, but I've attached a sample pipeline to your ticket 3243 [1] that
runs without any issues in 2.6.0.
Can you please give this sample a try (in 2.6.0) and let us know if that
works for you?

[1] https://github.com/apache/hop/issues/3243

Regards,
Bart

On Wed, Sep 20, 2023 at 3:46 PM Fabian Peters <[email protected]> wrote:

> Hi Matt,
>
> So I had a look at the integration test and it does not cover the use case
> I wrote about. There, the "Get file names" transform is configured with a
> single hardcoded path, whereas my issue concerns paths passed to "Get
> filename from field". I've created a new issue
> <https://github.com/apache/hop/issues/3243>.
>
> cheers
>
> Fabian
>
> On 12 Sep 2023, at 19:01, Matt Casters <[email protected]> wrote:
>
> That "fix" caused an out of memory error so I wouldn't rely on it too much
> for larger volumes of files.
> But this is why we added the integration test. Listing thousands of files
> IIRC. I couldn't possibly comment beyond that ;-)
>
> Take care,
>
> Matt
>
> On Tue, 12 Sep 2023 at 17:18, Fabian Peters <[email protected]> wrote:
>
>> Hi Matt,
>>
>> Well, I've since applied the recursion-based fix again and the pipeline
>> started working as expected. Was anything else changed in the logic that
>> would ensure that multiple rows get passed into the transform? This was the
>> original problem, that only the first row was acted upon. (The problem only
>> occurs if the path to the directory is set via "Get filename from field".)
>>
>> cheers
>>
>> Fabian
>>
>> On 12 Sep 2023, at 16:37, Matt Casters <[email protected]> wrote:
>>
>> It's surprising since we have a successful test running with "Get File
>> Names" on the Beam direct runner.
>>
>>
>> https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/lastCompletedBuild/testReport/(root)/beam_directrunner/0010_get_file_names/
>>
>> I think that the main thing is to have permissions on the gs:// location
>> you want to get files from.
>>
>> Cheers,
>>
>> Matt
>>
>>
>> On Wed, 6 Sep 2023 at 09:05, Fabian Peters <[email protected]> wrote:
>>
>>> Good morning all!
>>>
>>> Not having worked with Hop for a couple of months, I downloaded the 2.5.0
>>> version and found that an existing pipeline failed to work as expected.
>>> This is due to the "Get file names" transform returning only a single row
>>> for each row passed to "Get filename from field". I ran into the same issue
>>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4191?filter=allissues>
>>> last year, but the fix <https://github.com/apache/hop/pull/1674/files> I
>>> provided turned out to sometimes cause a stack overflow
>>> <https://issues.apache.org/jira/projects/HOP/issues/HOP-4528?filter=allissues>
>>> and was reverted. (No hard feelings…)
>>>
>>> Is there another way to make this work on Beam/Dataflow? Or is there an
>>> alternative approach I can use to get all files in a GCS path, short of
>>> using their HTTP API?
>>>
>>> Besides this: Great work on the Dataflow template handling – works like
>>> a charm now!
>>>
>>> cheers
>>>
>>> Fabian
>>>
>>
>>
>
