Re: Test Script Recorder XML Regex Matching

sebb Sun, 05 Oct 2014 05:36:07 -0700

On 5 October 2014 13:26, Felix Schumacher
<[email protected]> wrote:
> On 05.10.2014 at 11:30, sebb wrote:
>
>> On 4 October 2014 19:41, Philippe Mouawad <[email protected]>
>> wrote:
>>>
>>> On Sat, Oct 4, 2014 at 2:10 PM, Felix Schumacher <
>>> [email protected]> wrote:
>>>
>>>> On 29.09.2014 at 22:32, Philippe Mouawad wrote:
>>>>
>>>>> Hi Felix,
>>>>>
>>>> Hi
>>>>
>>>>> I agree with sebb, patch is interesting.
>>>>>
>>>>> But it clearly needs to be documented (I think many users don't know
>>>>> about this feature, which is really interesting) as well as the code;
>>>>> reading the patch first, it wasn't clear to me what was intended.
>>>>>
>>>> I have added documentation to the patch and found two other things that
>>>> I changed in the same bug entry.
>>>>
>>>> The random order of applying the matchers seems a bit strange, so I
>>>> sorted the matchers first by their length and, if the matchers are the
>>>> same length, then by the name of their keys. So the set
>>>>   {'domain': 'example.com', 'server': 'www',  'regex': 'w.*' }
>>>> would be applied in the order ['domain', 'regex', 'server'], since
>>>> 'domain' has the longest matcher and 'regex' comes before 'server'
>>>> alphabetically (both matchers are the same length).
>>>>
>>> Isn't it better to order by longest value or regexp ?
>>> www is more specific than w.*
>>> So would be :
>>> domain, server , regex
>>
>> Or the code could try to match every variable and select the one that
>> produces the longest match.
>>
>> But rather than try and sort the regexes, which is always going to be
>> tricky to do "correctly" (whatever that means), maybe the user should
>> be given control of the matching order.
>>
>> For example, it is probably possible to match by order of appearance.
>>
>> It would certainly be possible to match the variables in sorted order by
>> name.
>> This would be a bit more awkward to use than changing the order of
>> variable definitions.
>
> I just wanted to give a simple algorithm for ordering, which I think is
> better than random ordering.
>
> Correctness will be hard to implement, when everyone has a different view on
> the correct ordering.
>
> I had thought of giving more control to the user by appending the variable
> names with something to sort by.
>
> For example extending the above example with variable names ['domain',
> 'server', 'regex'] the names could be
> changed to ['domain_3', 'server_1', 'regex_2'] to impose replacement in the
> order ['server', 'regex', 'domain'].
> But what should we do with the suffix '_\d+'? (A prefix could be used, too)
>
> We could look for a specially named variable like '_regex_order', which
> could hold a comma-separated list of the variable names in the desired
> order.
>
> The longer I think about it, the more I am inclined to take the simple
> ordering algorithm of length and then name. One can
> always make any regex longer by adding useless junk like
> '(?:WILLNOTBEFOUNDANYWAY)?' and in such a way influence
> the order.


No, the length of the regex is not useful.
More useful would be sorting by the length of the matched string.
Sorting by name is awkward to use, and anyway what about non-regexes
that happen to match the same text?

I don't think it's possible to automatically sort correctly by regex.
So we should allow the user to control the search order, as I already
suggested a short while ago.
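
To make the two options concrete, here is a rough sketch in Python. It is
not JMeter's code: the function names are made up for illustration, and the
only thing taken from this thread is the \b(VALUE)\b wrapping and Felix's
example set. The first function applies the variables in whatever order the
user lists them; the second tries every variable and keeps only the one that
produces the longest match.

```python
import re


def replace_in_order(text, variables):
    """Apply (name, pattern) pairs strictly in the order the user gave them."""
    for name, pattern in variables:
        # Wrap the value as \b(VALUE)\b, as described earlier in this thread.
        wrapped = r"\b(" + pattern + r")\b"
        # A function replacement avoids any escaping issues with "${...}".
        text = re.sub(wrapped, lambda m, n=name: "${" + n + "}", text)
    return text


def replace_longest_match(text, variables):
    """Try every variable and substitute only the longest first match."""
    best_name, best_match = None, None
    for name, pattern in variables:
        m = re.search(r"\b(" + pattern + r")\b", text)
        if m and (best_match is None
                  or len(m.group(1)) > len(best_match.group(1))):
            best_name, best_match = name, m
    if best_match is None:
        return text
    return (text[:best_match.start()] + "${" + best_name + "}"
            + text[best_match.end():])


if __name__ == "__main__":
    host = "www.example.com"
    user_order = [("domain", "example.com"), ("server", "www"), ("regex", "w.*")]
    print(replace_in_order(host, user_order))
    print(replace_longest_match(host, user_order))
```

With the user-chosen order domain, server, regex the greedy 'w.*' never gets
a chance to swallow the whole host name; with longest-match selection it
always wins. Note that the sketch does nothing to stop a later pattern from
matching inside an already inserted ${...} reference.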

> Felix
>
>>
>>>
>>>> If no one objects, I will submit it next week.
>>>>
>>>> Regards
>>>>   Felix
>>>>
>>>>> Thanks for contributing
>>>>> Regards
>>>>>
>>>>>
>>>>> On Monday, September 29, 2014, sebb <[email protected]> wrote:
>>>>>
>>>>>> On 29 September 2014 15:49, Felix Schumacher
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> On 29 September 2014 12:46:19 CEST, sebb <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 29 September 2014 11:24, Felix Schumacher
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 29.09.2014 at 11:56, sebb wrote:
>>>>>>>>>
>>>>>>>>>> On 28 September 2014 18:11, Felix Schumacher
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 22.09.2014 at 11:13, Marijn Wijbenga wrote:
>>>>>>>>>>>
>>>>>>>>>>> I've attached a jmeter project file and a html file that
>>>>>>>>>>> demonstrates the issue. In order to reproduce:
>>>>>>>>>>>
>>>>>>>>>>> 1. Load up xml-bug-test.jmx in jmeter.
>>>>>>>>>>> 2. Start the proxy (recorder)
>>>>>>>>>>> 3. Place xml-bug-test.html on a webserver somewhere (if on
>>>>>>>>>>>    localhost, do not forget to remove localhost from proxy
>>>>>>>>>>>    exclusion if applicable)
>>>>>>>>>>> 4. Navigate with a browser to this file (using the proxy)
>>>>>>>>>>> 5. Click both buttons in order.
>>>>>>>>>>>
>>>>>>>>>>> I could not post to a html file, hence the "test 2" button will
>>>>>>>>>>> post to Google. The page that loads has an error, but it still
>>>>>>>>>>> records the post request which is what we want to see.
>>>>>>>>>>>
>>>>>>>>>>> I also discovered that when I was using a "get" request instead
>>>>>>>>>>> (I've made that "test 1") then it doesn't match the first
>>>>>>>>>>> character (%). I think this is related.
>>>>>>>>>>>
>>>>>>>>>>> The project has a user defined variable called "TEST" with a
>>>>>>>>>>> value of ".*", I've ticked the box
>>>>>>>>>>>
>>>>>>>>>>> To see the results, in the recording controller the last two
>>>>>>>>>>> requests contain a parameter with these values:
>>>>>>>>>>>
>>>>>>>>>>> Test 1: %${TEST}
>>>>>>>>>>> Test 2: <${TEST}>
>>>>>>>>>>>
>>>>>>>>>>> Both should be just ${TEST} I believe.
>>>>>>>>>>>
>>>>>>>>>>> In the current implementation the regex will be matched against a
>>>>>>>>>>> pattern which looks like
>>>>>>>>>>>
>>>>>>>>>>>    \b(YOUR_VALUE)\b
>>>>>>>>>>>
>>>>>>>>>>> As % and < are boundary characters they are excluded from your
>>>>>>>>>>> pattern.
>>>>>>>>>> This is deliberate.
>>>>>>>>>> There were problems previously as partial values were being
>>>>>>>>>> unexpectedly matched.
>>>>>>>>>>
>>>>>>>>>> See https://issues.apache.org/bugzilla/show_bug.cgi?id=52678
>>>>>>>>>>
>>>>>>>>> I thought so. Maybe that would have been helped by adding more
>>>>>>>>> documentation, but then it is regex...
>>>>>>>>>
>>>>>>>>>>> I would consider this a bug, or at least documentation could be
>>>>>>>>>>> a bit more concise.
>>>>>>>>>>>
>>>>>>>>>> Patches welcome.
>>>>>>>>>>
>>>>>>>>> A patch was attached :)
>>>>>>>>>
>>>>>>>> I meant that we would welcome a patch for the documentation.
>>>>>>>> Or at least some indication of where the documentation needs to be
>>>>>>>> updated to clarify the current behaviour.
>>>>>>>>
>>>>>>> I will look into that.
>>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> What is your opinion on the option to detect parens and modify the
>>>>>>> regex behavior?
>>>>>>
>>>>>> Looks good to me.
>>>>>>
>>>>>> The parens are very unlikely to have been used in existing tests, so
>>>>>> the modified behaviour is unlikely to break anything.
>>>>>> But we should document it in the release notes just in case.
>>>>>>
>>>>>>   Felix
>>>>>>>>
>>>>>>>>>>> Attached is a patch against trunk, which checks whether the regex
>>>>>>>>>>> starts with '(' and ends with ')' and, if so, uses the regex as
>>>>>>>>>>> given instead of building its own version.
>>>>>>>>>>>
>>>>>>>>>> Please use Bugzilla for patches; it's easier to keep track of
>>>>>>>>>> them.
>>>>>>>>>>
>>>>>>>>> I have already done so yesterday, shortly after sending my mail. It
>>>>>>>>> is https://issues.apache.org/bugzilla/show_bug.cgi?id=57032
>>>>>>>>>
>>>>>>>>> What is missing from the patch is documentation. If the feature as
>>>>>>>>> such is ok, then I would add that to the existing documentation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>    Felix
>>>>>>>>>
>>>>>>>>>>> Also, see notes below.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: sebb [mailto:[email protected]]
>>>>>>>>>>> Sent: 21 September 2014 01:52
>>>>>>>>>>> To: JMeter Users List
>>>>>>>>>>> Subject: Re: Test Script Recorder XML Regex Matching
>>>>>>>>>>>
>>>>>>>>>>> On 19 September 2014 16:45, Marijn Wijbenga
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have an issue, which might well be a potential bug, where a
>>>>>>>>>>> posted value is not being matched by the Test Script Recorder's
>>>>>>>>>>> Regex Matching functionality.
>>>>>>>>>>>
>>>>>>>>>>> The request I'm recording has a post value containing XML (SAML
>>>>>>>>>>> token to be exact) which I'd like to replace with a variable
>>>>>>>>>>> automatically.
>>>>>>>>>>>
>>>>>>>>>>> What does the value look like?
>>>>>>>>>>> Does it have multiple lines?
>>>>>>>>>>>
>>>>>>>>>>> No, it did not have multiple lines. I did check if this was the
>>>>>>>>>>> case, but it wasn't
>>>>>>>>>>>
>>>>>>>>>>> For testing purposes I have configured a User Defined Variable
>>>>>>>>>>> (called TEST) with a value of "(?s)^.*$", I've tried "^.*$" and
>>>>>>>>>>> ".*" as well (all without double quotes).
>>>>>>>>>>>
>>>>>>>>>>> Only ".*" replaces the content with this: <${TEST}>
>>>>>>>>>>>
>>>>>>>>>>> That does not make sense.
>>>>>>>>>>> ".*" will match everything, including < and >, so the content
>>>>>>>>>>> would become ${TEST}
>>>>>>>>>>>
>>>>>>>>>>> I know. It doesn't really. Hence I think this might be a bug.
>>>>>>>>>>>
>>>>>>>>>>> I've tried other expressions as well and I'm able to match
>>>>>>>>>>> anything within the <> characters, but not those characters
>>>>>>>>>>> themselves.
>>>>>>>>>>>
>>>>>>>>>>> Again, that does not make sense.
>>>>>>>>>>>
>>>>>>>>>>> The weird thing is, that inside the outer <> characters there are
>>>>>>>>>>> other <> characters that are matched fine. It's just the first and
>>>>>>>>>>> last character.
>>>>>>>>>>>
>>>>>>>>>>> Has anyone else experienced the same thing, or is this a known
>>>>>>>>>>> issue?
>>>>>>>>>>>
>>>>>>>>>>> It is not a known issue, and may not even be an issue.
>>>>>>>>>>>
>>>>>>>>>>> Or should I post this in the developer's mailing list?
>>>>>>>>>>>
>>>>>>>>>>> No, the developers all follow this list.
>>>>>>>>>>>
>>>>>>>>>>> Great, please see attachment for an example.
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
