Hi Tatsuo,

Thank you for the thorough testing and performance profiling!

> BTW, I have tested RPR with large partition/reduced frame (up to 100k
> rows). Following query took over 3 minutes on my laptop.
>

The profiling results are very insightful and help us understand where
the time is spent in extreme cases.


> I don't try to say we should enhance nfa modules in the case when
> there is a large reduced frame, because I don't think this is a
> realistic use case and I doubt it's worth to fight against unrealistic
> scenario.
>

I agree with your assessment. Let me clarify:

The pattern itself (START UP+) is realistic - it's a common pattern for
detecting upward trends. However, a run of 100k+ consecutive matches
without interruption is not realistic. Your UP{,3} example (94ms for a
100k-row partition) demonstrates that the current implementation performs
very well for realistic use cases.


> I think this is more realistic use case than former. If I were
> correct, we don't need to work on current nfa code to squeeze better
> performance for unrealistic 100k reduced frame case. I maybe wrong
> and I would love to hear from others especially, Henson, Vik, Jacob.


I agree that we should prioritize realistic use cases. That said, there are
potential optimizations that could help:

1. Anchored pattern absorption (see my earlier message:
   https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com)

2. Alt-pruning: In patterns like "A* | B*", once the higher-priority A
   branch has a confirmed match, the B branch can be pruned immediately.
   Even if B could continue extending to a longer match, it can never be
   selected due to lexical-order semantics; A will always win. This
   proactive pruning respects SQL standard semantics while reducing
   unnecessary state expansion.
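To make the alt-pruning idea concrete, here is a toy sketch in Python. It is
not the PostgreSQL NFA code; it models a pattern of the shape "A+ | B+"
(using + rather than * to sidestep the empty-match subtlety) and simply
stops expanding B-branch states as soon as the higher-priority A branch has
a confirmed match, counting how many state expansions were performed:

```python
# Toy model of "alt-pruning" for a pattern like A+ | B+ under
# SQL-standard lexical-order semantics: the leftmost alternative that
# matches wins, so once the A branch has a confirmed match, pending
# B-branch states can be discarded instead of extended row by row.
# This is an illustration only, not the actual patch implementation.

def match_with_pruning(rows, pred_a, pred_b):
    """Return (winning_branch, match_length, states_expanded)."""
    states_expanded = 0
    a_len = b_len = 0
    a_alive = b_alive = True
    for row in rows:
        # Extend the higher-priority A branch greedily.
        if a_alive and pred_a(row):
            a_len += 1
            states_expanded += 1
        else:
            a_alive = False
        if a_len > 0:
            # Alt-pruning: A has a confirmed match, so B can never be
            # selected; stop expanding its states immediately.
            b_alive = False
        # Extend the B branch only while A has no match yet.
        if b_alive and pred_b(row):
            b_len += 1
            states_expanded += 1
        else:
            b_alive = False
        if not (a_alive or b_alive):
            break
    if a_len > 0:
        return ("A", a_len, states_expanded)
    if b_len > 0:
        return ("B", b_len, states_expanded)
    return (None, 0, states_expanded)
```

Without the pruning step, the B branch would keep consuming rows in
parallel even though lexical order guarantees it loses, which is exactly
the wasted state expansion the optimization targets.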

However, given the complexity of NFA internals, I believe we should take a
step-by-step approach:

1. First, stabilize the current RPR patch and prepare it for review
2. Then, consider optimizations as separate follow-up patches
3. Each optimization should be well-tested and reviewed independently

This approach reduces risk and makes review more manageable. The fact that
the current implementation handles even unrealistic 100k-row cases without
crashing (just slowly) shows it's already robust.

What do you think about this phased approach?

Best regards,
Henson

P.S. I discovered a crash bug that was introduced in the latest patch
refactoring. The issue occurs with nested alternation patterns like
(A+ | (A | B)+)*, where infinite recursion happens in nfa_advance_alt when
the inner BEGIN(+)'s skip jump is followed as an ALT branch pointer. I will
include the fix in the next patch update.
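For readers following along, the textbook guard against this kind of cycle
(not necessarily the exact fix I will post) is to track visited states while
expanding epsilon/ALT edges, so a back edge such as an inner BEGIN(+)'s skip
jump is followed at most once per closure. A minimal sketch, with a
hypothetical `edges` adjacency map standing in for the NFA's ALT/skip
pointers:

```python
# Generic cycle guard for epsilon-edge expansion in an NFA: a visited
# set guarantees termination even when the edge graph contains cycles,
# e.g. a skip jump pointing back into an alternation. Illustrative only;
# the real nfa_advance_alt fix may look quite different.

def epsilon_closure(start, edges):
    """Return all states reachable from `start` via epsilon edges.

    `edges` maps state -> list of epsilon-successor states and may
    contain cycles; the visited set makes each state expand only once.
    """
    visited = set()
    stack = [start]
    while stack:
        state = stack.pop()
        if state in visited:
            continue  # already expanded; breaks the recursion cycle
        visited.add(state)
        stack.extend(edges.get(state, []))
    return visited
```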
