Re: [tw5] Re: Tiddlywiki and regexp

Mohammad Thu, 22 Aug 2019 09:23:56 -0700

Added to TW-Scripts!

Mark,
What does this part do?


<$vars realchars="[^\s]+">
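For reference, [^\s]+ is a regular expression that matches a run of one or more non-whitespace characters, so the final regexp<realchars> step appears to keep only the pieces that contain some real text. A minimal sketch in Python (used here only for illustration; TiddlyWiki itself uses JavaScript regular expressions, and the has_real_chars helper and the fragments list are made up for this example):

```python
import re

# Same pattern as the realchars variable in the thread.
REALCHARS = re.compile(r"[^\s]+")

def has_real_chars(item: str) -> bool:
    """Return True if the item contains at least one non-whitespace character."""
    return REALCHARS.search(item) is not None

# Fragments like those left over after splitting on <li> and </li>.
fragments = ["line 3", " ", "line 2", "", "line 1"]
kept = [f for f in fragments if has_real_chars(f)]
print(kept)  # ['line 3', 'line 2', 'line 1']
```

The whitespace-only and empty pieces are dropped, which is why the blank gaps between list items do not show up in the output.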

--Mohammad

On Thursday, August 22, 2019 at 8:38:20 PM UTC+4:30, Mark S. wrote:
>
> Re your 2nd question, you can make the filter slightly more robust:
>
> [{test}splitregexp[\n]join[ ]splitregexp[<li.*?>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]
>
> Re your 1st question, I don't believe you can do this in a single filter. 
> It will probably take multiple lines, if it is possible at all, because 
> there are no core tools for grabbing the actual text you want -- only for 
> splitting. People have done a lot with splitting, but it gets tedious.
>
> If you had a regular expression filter that could split and return groups 
> (e.g. #2963) then you could simply search for and lift out the <li ...> 
> group and the content group in one regular expression.
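To illustrate what returning groups could look like, here is a minimal sketch in Python. It is not the operator discussed in #2963 and not a TiddlyWiki API; the pattern, the extract_list_items helper, and the sample text are assumptions made for this example.

```python
import re
from typing import List, Tuple

# Group 1: the opening <li ...> tag (with any attributes).
# Group 2: the content up to the closing </li>.
# re.DOTALL lets the content span line breaks.
LI_PATTERN = re.compile(r"(<li\b[^>]*>)(.*?)</li>", re.DOTALL)

def extract_list_items(text: str) -> List[Tuple[str, str]]:
    """Return (opening tag, content) pairs for every <li> element found."""
    return LI_PATTERN.findall(text)

sample = (
    "More text here\n"
    "<li>line 3</li>\n"
    '<li style="something">line 2</li>\n'
    "More text there"
)
for tag, content in extract_list_items(sample):
    print(tag, "->", content)
```

With both groups available, the caller can choose between the bare content and the content re-wrapped in its original tag, without a chain of splits.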
>
> On Thursday, August 22, 2019 at 7:58:06 AM UTC-7, TonyM wrote:
>>
>> Mark - Wow,
>>
>> I will test it out tomorrow to see how far I can take it. 
>>
>> I hope it works for multi-line tags
>>
>> My interest would be also the option to return
>> <li>line 3</li>
>> <li>line 2</li>
>> <li>line 1</li>
>> or
>> line 3
>> line 2
>> line 1
>> Because keeping the valid tags can be made use of as well.
>>
>> Also, see how to handle the case where the list tag has a style, e.g. 
>> <li style="something">. It would be nice if we could return
>> <li style="something">line 1</li>
>> or
>> line 1
>>
>> If so, a lot can be done to extract useful content from html, even if just 
>> to summarise some content.
>>
>> Perhaps further resolution would help, like 
>> <section name=extract>content</section>
>>
>> Or extract list items.
>>
>> Even without using html, a tiddler's text field could use html block and 
>> inline elements (https://www.w3schools.com/html/html_blocks.asp) to 
>> structure the content. With such a regex macro we could extract parts of 
>> the tiddler text, such as a prepared extract, an excerpt, config settings, 
>> or more.
>>
>> Regards
>> Tony
>>
>>
>> On Friday, August 23, 2019 at 12:22:47 AM UTC+10, Mark S. wrote:
>>>
>>>
>>> There's that saying, "When all you have is a hammer, everything starts 
>>> to look like a nail."
>>>
>>> All we have is regex. It would be great to have some other tool for 
>>> extracting actual DOM-like structures the way you
>>> could with TW classic. But we don't have it.
>>>
>>> Actually, the tool we have for regexp is also a bit lacking. There's no 
>>> tool for directly lifting desired target text. The new splitregexp only 
>>> splits; it doesn't return the text we want to find. Here's my version 
>>> that does most literally what you ask for:
>>>
>>> <$vars realchars="[^\s]+">
>>> <$list filter="[{test}splitregexp[\n]join[ ]splitregexp[<li>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]">
>>>
>>> </$list>
>>> </$vars>
>>>
>>> Input:
>>>
>>> More text here
>>> <li>line 3</li>
>>> <li>line 2</li>
>>> <li>line 1</li>
>>> More text there
>>>
>>> Output:
>>>
>>>
>>> line 3 <https://tiddlywiki.com/#line%203>
>>> line 2 <https://tiddlywiki.com/#line%202>
>>> line 1 <https://tiddlywiki.com/#line%201>
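The split-and-trim chain in the filter above can be traced step by step outside TiddlyWiki. The following is a rough Python approximation, not TiddlyWiki code: split_each and extract_items are hypothetical helpers, and the sketch ignores any TiddlyWiki-specific list handling such as duplicate removal.

```python
import re
from typing import List

def split_each(items: List[str], pattern: str) -> List[str]:
    """Split every item on the pattern and flatten the pieces into one list."""
    pieces: List[str] = []
    for item in items:
        pieces.extend(re.split(pattern, item))
    return pieces

def extract_items(text: str) -> List[str]:
    items = split_each([text], r"\n")          # splitregexp[\n]
    items = [" ".join(items)]                  # join[ ]
    items = split_each(items, r"<li>")[1:]     # splitregexp[<li>], then butfirst[1]
    items = split_each(items, r"</li>")[:-1]   # splitregexp[</li>], then butlast[1]
    # regexp<realchars>: keep only pieces with a non-whitespace character
    return [i for i in items if re.search(r"[^\s]+", i)]

sample = (
    "More text here\n"
    "<li>line 3</li>\n"
    "<li>line 2</li>\n"
    "<li>line 1</li>\n"
    "More text there"
)
print(extract_items(sample))  # ['line 3', 'line 2', 'line 1']
```

butfirst[1] discards the text before the first <li>, butlast[1] discards the text after the last </li>, and the final regexp step removes the whitespace-only gaps between items.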
>>>
>>>
>>>
>>> Good luck!
>>>
>>> On Thursday, August 22, 2019 at 2:21:34 AM UTC-7, TonyM wrote:
>>>>
>>>> Jeremy,
>>>>
>>>> You are aware I do not want so much to parse it as locate the content 
>>>> between matching tags.
>>>>
>>>> Its intention is to access content delimited by html tags inside the 
>>>> text content.
>>>>
>>>> Perhaps we could use it to retrieve items between the section div tags 
>>>> or all instances of text between the li tags.
>>>>
>>>> Regards
>>>> Tony
>>>>
>>>>
