Re: [tw5] Re: Tiddlywiki and regexp

@TiddlyTweeter Tue, 17 Sep 2019 04:08:45 -0700

TonyM

It makes great sense to throw away unneeded text BETWEEN tags.


Unfortunately I could not get your version to work.

As far as I can see, it just re-adds the tags you have just stripped off, and 
also adds them to the text you need to excise.

Yes?
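To check this, I modelled the filter steps in Python. This is only a rough sketch: `tw_pipeline` is a made-up name, and it assumes each filter operator is applied to every item in turn, with the results flattened in order and TiddlyWiki's de-duplication of titles ignored.

```python
import re


# Hypothetical model of the TiddlyWiki filter
#   [{test data}splitregexp[\n]join[ ]splitregexp[<li.*?>]butfirst[1]
#    splitregexp[</li>]butlast[1]regexp<realchars>addprefix[<li>]addsuffix[</li>]]
# Assumptions: every operator is applied to each item in turn and the results
# are flattened in order; TiddlyWiki's de-duplication of titles is ignored.
def tw_pipeline(text: str) -> list[str]:
    items = re.split(r"\n", text)                                # splitregexp[\n]
    items = [" ".join(items)]                                    # join[ ]
    items = [p for s in items for p in re.split(r"<li.*?>", s)]  # splitregexp[<li.*?>]
    items = items[1:]                                            # butfirst[1]
    items = [p for s in items for p in re.split(r"</li>", s)]    # splitregexp[</li>]
    items = items[:-1]                                           # butlast[1]
    items = [s for s in items if re.search(r"[^\s]+", s)]        # regexp<realchars>
    return ["<li>" + s + "</li>" for s in items]                 # addprefix/addsuffix


# Test inputs copied from this thread.
MARK_INPUT = "More text here\n<li>line 3</li>\n<li>line 2</li>\n<li>line 1</li>\nMore text there"
TONY_INPUT = ("zfdtshwfthf\n<li>Content</li>\nsfghn\n<li>Content2</li>\n\n"
              "sfghsfgh\n<li>Content3</li>\nsxgfhfgsdh")

if __name__ == "__main__":
    print(tw_pipeline(MARK_INPUT))
    print(tw_pipeline(TONY_INPUT))
```

Under those assumptions, Mark's input gives just the three wrapped items, because the text between his tags is only whitespace and the regexp<realchars> step drops it. With Tony's test data the stray lines (" sfghn " and "  sfghsfgh ") survive, and addprefix/addsuffix wrap them in <li> tags as well.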

TT

On Sunday, 25 August 2019 05:50:46 UTC+2, TonyM wrote:
>
> Mark,
>
> Thanks for this. I have only just got round to testing it; a test tiddler 
> like the following does not work as I would expect:
> zfdtshwfthf
> <li>Content</li>
> sfghn
> <li>Content2</li>
>
> sfghsfgh
> <li>Content3</li>
> sxgfhfgsdh
>
> I would have hoped it would return:
> Content
> Content2
> Content3
>
>
> If it was to return only the content between the `<li> and </li>` and not 
> any other content from the test tiddler I could do this;
> \define output()
> <$vars realchars="[^\s]+">
> <$list 
> filter="[{test data}splitregexp[\n]join[ ]splitregexp[<li.*?>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>addprefix[<li>]addsuffix[</li>]]"
> >
>
> </$list>
> </$vars>
> \end
> <$wikify name=result text="<<output>>">
> <<result>>
> </$wikify>
> This would find all list items in the test tiddler (HTML copied from 
> somewhere) and create a new list containing only the list (li) items.
>
> Does that make sense?
>
> Regards
> Tony
>
> On Friday, August 23, 2019 at 2:08:20 AM UTC+10, Mark S. wrote:
>>
>> Re your 2nd question, you can make the filter slightly more robust:
>>
>> [{test}splitregexp[\n]join[ ]splitregexp[<li.*?>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]
>>
>> Re your 1st question, I don't believe you can do this in a single filter. 
>> It will probably take multiple lines, if it is possible at all, because 
>> there are no core tools for grabbing the actual text you want, only for 
>> splitting. People have done a lot with splitting, but it gets tedious.
>>
>> If you had a regular expression filter that could split and return groups 
>> (e.g. #2963) then you could simply search for and lift out the <li ...> 
>> group and the content group in one regular expression.
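For comparison, here is a minimal Python sketch of the group-returning approach described above, run outside TiddlyWiki purely for illustration: one regular expression whose first group is the opening <li ...> tag and whose second group is the content. `LI_PATTERN`, `extract_li` and `SAMPLE` are hypothetical names; this is not an existing TiddlyWiki operator.

```python
import re

# Hypothetical pattern: group 1 is the opening tag (with any attributes such as
# style="..."), group 2 is the content up to the first following </li>.
# \b stops <link> and similar tags from matching; re.DOTALL lets the content
# span several lines. Nested <li> elements are NOT handled.
LI_PATTERN = re.compile(r"(<li\b[^>]*>)(.*?)</li>", re.DOTALL)


def extract_li(text: str, keep_tags: bool = False) -> list[str]:
    """Return each <li> element's content, optionally with its original tags."""
    return [
        f"{open_tag}{content}</li>" if keep_tags else content
        for open_tag, content in LI_PATTERN.findall(text)
    ]


SAMPLE = 'More text here\n<li style="something">line 1</li>\n<li>line\n2</li>\nMore text there'

if __name__ == "__main__":
    print(extract_li(SAMPLE))
    print(extract_li(SAMPLE, keep_tags=True))
```

Because only the tagged elements are ever matched, text outside the tags never enters the result, so nothing has to be excised afterwards. With keep_tags=True the original opening tag, including its style attribute, is kept. Nested lists would need a real HTML parser rather than a regex.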
>>
>> On Thursday, August 22, 2019 at 7:58:06 AM UTC-7, TonyM wrote:
>>>
>>> Mark - Wow,
>>>
>>> I will test it out tomorrow to see how far I can take it. 
>>>
>>> I hope it works for multi-line tags.
>>>
>>> I would also be interested in the option to return either
>>> <li>line 3</li>
>>> <li>line 2</li>
>>> <li>line 1</li>
>>> or
>>> line 3
>>> line 2
>>> line 1
>>> Keeping the valid tags could be useful as well.
>>>
>>> I would also like to see how to handle list tags with a style, e.g. 
>>> <li style="something">. It would be nice if we could return
>>> <li style="something">line 1</li>
>>> or
>>> line 1
>>>
>>> If so, a lot could be done to extract useful content from HTML, even if 
>>> just to summarise it.
>>>
>>> Perhaps further resolution would help, e.g. targeting 
>>> <section name=extract>content</section>.
>>>
>>> Or extract list items.
>>>
>>> Even without using pasted HTML, a tiddler's text field could use HTML 
>>> block and inline elements https://www.w3schools.com/html/html_blocks.asp 
>>> to structure the content, and such a regex macro could then extract parts 
>>> of the tiddler text, such as a prepared extract from the content, an 
>>> excerpt, config settings or more.
>>>
>>> Regards
>>> Tony
>>>
>>>
>>> On Friday, August 23, 2019 at 12:22:47 AM UTC+10, Mark S. wrote:
>>>>
>>>>
>>>> There's that saying, "When all you have is a hammer, everything starts 
>>>> to look like a nail."
>>>>
>>>> All we have is regex. It would be great to have some other tool for 
>>>> extracting actual DOM-like structures the way you
>>>> could with TW classic. But we don't have it.
>>>>
>>>> Actually, the tool we have for regexp is also a bit lacking. There's no 
>>>> tool for directly lifting the desired target text. The new splitregexp 
>>>> only splits; it doesn't return the text we want to find. Here's my 
>>>> version, which does most literally what you ask for:
>>>>
>>>> <$vars realchars="[^\s]+">
>>>> <$list filter="[{test}splitregexp[\n]join[ ]splitregexp[<li>]butfirst[1]splitregexp[</li>]butlast[1]regexp<realchars>]">
>>>>
>>>> </$list>
>>>> </$vars>
>>>>
>>>> Input:
>>>>
>>>> More text here
>>>> <li>line 3</li>
>>>> <li>line 2</li>
>>>> <li>line 1</li>
>>>> More text there
>>>>
>>>> Output:
>>>>
>>>> line 3
>>>> line 2
>>>> line 1
>>>>
>>>> Good luck!
>>>>
>>>> On Thursday, August 22, 2019 at 2:21:34 AM UTC-7, TonyM wrote:
>>>>>
>>>>> Jeremy,
>>>>>
>>>>> As you are aware, I do not so much want to parse it as to locate the 
>>>>> content between matching tags.
>>>>>
>>>>> Its intention is to access content delimited by html tags inside the 
>>>>> text content.
>>>>>
>>>>> Perhaps we could use it to retrieve items between the section div tags 
>>>>> or all instances of text between the li tags.
>>>>>
>>>>> Regards
>>>>> Tony
>>>>>
>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"TiddlyWiki" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tiddlywiki/344b59db-6e88-4bc4-bbe5-b88ef421ede2%40googlegroups.com.
