Re your 2nd question, you can make the filter slightly more robust:
[{test}splitregexp[\n]join[ ]splitregexp[<li.*?>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]
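
To show what each step of that filter contributes, here is a rough Python emulation of the same split-and-filter pipeline. It is only an illustration outside TiddlyWiki: the function name and sample text are made up, and TiddlyWiki's own handling of duplicate titles is not modelled.

```python
import re

def extract_li_by_splitting(text):
    """Hypothetical helper that mimics the filter steps; not TiddlyWiki code."""
    # splitregexp[\n] then join[ ]: collapse the text onto one line
    joined = " ".join(re.split(r"\n", text))
    # splitregexp[<li.*?>] then butfirst[1]: drop everything before the first <li ...>
    pieces = re.split(r"<li.*?>", joined)[1:]
    # splitregexp[</li>]: split each remaining piece at the closing tag
    parts = [part for piece in pieces for part in re.split(r"</li>", piece)]
    # butlast[1]: drop whatever trails the final </li>
    parts = parts[:-1]
    # regexp<realchars>: keep only entries with at least one non-whitespace character
    return [p for p in parts if re.search(r"[^\s]+", p)]

# Made-up sample, including one <li> with an attribute
sample = (
    "More text here\n"
    "<li>line 3</li>\n"
    '<li style="something">line 2</li>\n'
    "<li>line 1</li>\n"
    "More text there"
)
print(extract_li_by_splitting(sample))
```

The non-greedy `<li.*?>` is what lets the attribute version of the tag be treated the same as a bare `<li>`.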
Re your 1st question, I don't believe you can do this in a single filter.
It will probably take multiple lines, if it is possible at all, because
there are no core tools for grabbing the actual text you want, only for
splitting. People have done a lot with splitting, but it gets tedious.
If you had a regular expression filter that could split and return groups
(e.g. #2963) then you could simply search for and lift out the <li ...>
group and the content group in one regular expression.
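
As a sketch of what "lifting out groups" would look like, here is the idea in Python rather than in TiddlyWiki (no such core filter is assumed to exist). The pattern and the function name are my own illustration: group 1 is the opening <li ...> tag with any attributes, group 2 is the content.

```python
import re

# Group 1: the opening <li ...> tag, with any attributes.
# Group 2: the content up to the nearest </li>.
# re.DOTALL lets the content span several lines.
LI_PATTERN = re.compile(r"(<li\b[^>]*>)(.*?)</li>", re.DOTALL)

def lift_li_groups(text):
    """Hypothetical helper: return (opening_tag, content) pairs."""
    return [(m.group(1), m.group(2)) for m in LI_PATTERN.finditer(text)]

# Made-up sample with an attribute and a multi-line item
sample = (
    "More text here\n"
    "<li>line 3</li>\n"
    '<li style="something">line\n2</li>\n'
    "More text there"
)
for tag, content in lift_li_groups(sample):
    print(repr(tag), repr(content))
```

With both groups in hand you could rebuild the full tagged item or keep just the content, which covers both outputs asked for. Nested lists would still defeat a pattern like this.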
On Thursday, August 22, 2019 at 7:58:06 AM UTC-7, TonyM wrote:
>
> Mark - Wow,
>
> I will test it out tomorrow to see how far I can take it.
>
> I hope it works for multi-line tags
>
> My interest would be also the option to return
> <li>line 3</li>
> <li>line 2</li>
> <li>line 1</li>
> or
> line 3
> line 2
> line 1
> Keeping the valid tags can be useful as well.
>
> And also see how to handle the case where the list tag has a style, e.g.
> <li style="something">; it would be nice if we could return
> <li style="something">line 1</li>
> or
> line 1
>
> If so a lot can be done to extract useful content from html, even if just
> to summarise some content.
>
> Perhaps further resolution would help like <section
> name=extract>content</section>
>
> Or extract list items.
>
> Even without using html, a tiddler's text field could use html block and
> inline elements https://www.w3schools.com/html/html_blocks.asp to
> structure the content, and with such a regex macro we could extract parts
> of the tiddler text, such as a prepared extract from the content, an
> excerpt, config settings, or more.
>
> Regards
> Tony
>
>
> On Friday, August 23, 2019 at 12:22:47 AM UTC+10, Mark S. wrote:
>>
>>
>> There's that saying, "When all you have is a hammer, everything starts to
>> look like a nail."
>>
>> All we have is regex. It would be great to have some other tool for
>> extracting actual DOM-like structures the way you could with TW classic.
>> But we don't have it.
>>
>> Actually, the tool we have for regexp is also a bit lacking. There's no
>> tool for directly lifting desired target text. The new splitregexp only
>> splits; it doesn't return the text we want to find. Here's my version
>> that does most literally what you ask for:
>>
>> <$vars realchars="[^\s]+">
>> <$list filter="[{test}splitregexp[\n]join[ ]splitregexp[<li>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]">
>>
>> </$list>
>> </$vars>
>>
>> Input:
>>
>> More text here
>> <li>line 3</li>
>> <li>line 2</li>
>> <li>line 1</li>
>> More text there
>>
>> Output
>>
>>
>> line 3 <https://tiddlywiki.com/#line%203>
>> line 2 <https://tiddlywiki.com/#line%202>
>> line 1 <https://tiddlywiki.com/#line%201>
>>
>>
>>
>> Good luck!
>>
>> On Thursday, August 22, 2019 at 2:21:34 AM UTC-7, TonyM wrote:
>>>
>>> Jeremy,
>>>
>>> You are aware I do not want so much to parse it as locate the content
>>> between matching tags.
>>>
>>> Its intention is to access content delimited by html tags inside the
>>> text content.
>>>
>>> Perhaps we could use it to retrieve items between the section div tags
>>> or all instances of text between the li tags.
>>>
>>> Regards
>>> Tony
>>>
>>>