Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-15 Thread Sonic Purity

Thank you to everyone who’s replied. My issue has been solved. For anyone 
else who may be interested, i report my findings below.

On Wednesday, September 15, 2021 at 3:26:11 AM UTC-7 listmei...@gmail.com 
wrote:

> You miswrote your lookahead-assertion.
>

Oh boy i sure did. Thank you for pointing that out.

> Try this instead:
>
> (?s).+?(?=|\Z)
>

^ This is spectacular, and what i’ll be using. After testing it, i made 
myself sit down and re-read the Grep Help to understand what each part of 
the expression is doing. I’d entirely missed the section on the (?s) 
ability to allow . to include \r as well. This knowledge alone will help 
improve a number of my other regular expressions. I’ve not used positional 
assertions like \Z in the past, hence they don’t come to mind—something 
else learned—thanks!
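A small Perl sketch of the (?s) point, for anyone following along. By default . stops at a line break; with (?s) it crosses line breaks, which makes it interchangeable with [\s\S] for this purpose. Perl's regex syntax is close to the PCRE flavor BBEdit uses for these constructs, but that closeness is an assumption here, and the sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-line sample; not taken from the thread.
my $text = "first line\nsecond line";

# Without (?s), "." does not match a line break, so the match stops at line 1.
my ($plain) = $text =~ /(.+)/;

# With (?s), "." also matches line breaks, so the match runs to the end.
my ($dotall) = $text =~ /(?s)(.+)/;

# [\s\S] matches any character regardless of (?s).
my ($class) = $text =~ /([\s\S]+)/;

print "plain:  [$plain]\n";
print "dotall: [$dotall]\n";
print "class:  [$class]\n";
```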

The Perl filters (original and Chris’ modification both tested) failed for 
me: each created the folder on the Desktop, but the folder stayed empty. The 
original document did remain intact. *But that’s OK* because i’ve not evolved 
to the point of doing exactly that yet. No one should spend any more time on 
this Perl filter on my behalf. This is both a learning experience and a 
practical getting-things-done activity for me, so it’s best for me to clunk 
along on training wheels with Text Factories full of Replace All clauses and 
likely AppleScript until i’m ready to learn more and get into something like 
Perl.

Appreciatively,
))Sonic((

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or need technical support, please email "supp...@barebones.com" rather than 
posting here. Follow @bbedit on Twitter: 
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to bbedit+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/bbedit/d11dfe55-f72c-41d0-bfc6-da74def5a02en%40googlegroups.com.

Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-15 Thread Christopher Stone

> On Sep 15, 2021, at 00:51, ctfishman  wrote:
> 
> I tried doing this with just a regular expression but couldn't figure out how.

Hey There,

Yeah, you couldn't automate the whole process with regex alone.

> I was however able to do it quite easily with a text filter...
> 
> --
> 
> #!/usr/bin/perl
> 
> # Read each line into a scaler, then print it back
> 
> my $fullstring;
> 
> while (<>) {
> $fullstring .= $_;
> print;
> }

Looks good, although I'd shortcut the above with:


#!/usr/bin/env perl -0777 -nsw

print;


Now the entire string is in $_ and ready to process.
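For readers unfamiliar with the switches: -0777 undefines Perl's input record separator, so the implicit loop added by -n reads the whole input as a single record. A rough equivalent inside a script is to localize $/ before reading. The sketch below assumes nothing about the real document; the slurp helper name and the sample data are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample input, read through an in-memory filehandle
# so the sketch runs without a real file.
my $data = "line one\nline two\nline three\n";

sub slurp {
    my ($fh) = @_;
    local $/;            # undefined record separator: read everything at once
    return scalar <$fh>;
}

open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";
my $whole = slurp($fh);
close($fh);

print length($whole), " characters read in one go\n";
```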


--
Best Regards,
Chris


Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-15 Thread Christopher Stone

> On Sep 14, 2021, at 16:57, Sonic Purity  wrote:
> Re-reading the Grep help file with BBEdit, i thought lookahead might help. I 
> tried:
> 
> ([\s\S]+?)(?)
> 

Hey There,

You miswrote your lookahead-assertion.

This:
> ([\s\S]+?)(?)
> 


Should look like this:


([\s\S]+?)(?=)


This is fine, except it will exclude your last chapter.


Try this instead:

(?s).+?(?=|\Z)
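A minimal Perl sketch of why the \Z alternative matters. It assumes the tag stripped from the patterns in this archive is <h2> (suggested by the @h2s array in the Perl filter elsewhere in the thread) and that each match starts at an opening <h2>. The miniature "novel" is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical miniature "novel"; the <h2> tag is an assumption.
my $novel = "<h2>One</h2>\n<p>a</p>\n<h2>Two</h2>\n<p>b</p>\n<h2>Three</h2>\n<p>c</p>\n";

# Lookahead only: each match must be followed by another <h2>,
# so the final chapter is never matched.
my @without_end = $novel =~ /(?s)<h2>.+?(?=<h2>)/g;

# Lookahead with \Z as an alternative: end of text also ends a chapter.
my @with_end = $novel =~ /(?s)<h2>.+?(?=<h2>|\Z)/g;

print scalar(@without_end), " chapters without \\Z\n";
print scalar(@with_end), " chapters with \\Z\n";
```

Note that \Z also matches just before a final line break, so the last chapter found this way may end without its trailing newline.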


--
Best Regards,
Chris


Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-14 Thread ctfishman

I tried doing this with just a regular expression but couldn't figure out 
how. I was however able to do it quite easily with a text filter. The 
following PERL example works for me to split the text and create and save 
an individual file for each chapter.

Save the following in your text filters folder and run it against your 
document.

--

#!/usr/bin/perl

use strict;
use warnings;

# Read each line into a scalar, then print it back

my $fullstring;

while (<>) {
    $fullstring .= $_;
    print;
}

# split the scalar into an array at each opening <h2> tag

my @h2s = split( /<h2>/, $fullstring );

# Delete the first item of the array, which will be empty because our text
# starts with "<h2>"

shift @h2s;

# add back the "<h2>" at the start of each array element
# which was removed when we did the split

foreach my $string (@h2s) {
    $string = "<h2>" . $string;
}

# Now the array contains each of your chapters, one per element.
# The following will create a new directory on your desktop called "Chapters"
# (if it doesn't exist already) and save a new document with the text from each
# chapter/array element. The original document will be the same as when it
# started, because we printed each line back out after we read it.

my $counter = 1;

# Perl's open() does not expand "~", so build the path from $ENV{HOME}

my $dir = "$ENV{HOME}/Desktop/Chapters";

system( 'mkdir', '-p', $dir ) == 0 or die "Cannot create $dir";

for (@h2s) {
    open( my $chapter, '>', "$dir/chapter$counter.html" )
        or die "Cannot write chapter$counter.html: $!";
    print $chapter $_;
    close($chapter);
    $counter++;
}
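A possible variation on the split step, sketched under the assumption that the chapter tag is <h2>: splitting on a zero-width lookahead keeps the opening tag attached to each piece, so nothing has to be added back afterwards. The sample string and the @pieces and @chapters names are illustrative, not part of the filter above.

```perl
use strict;
use warnings;

# Hypothetical sample with some text before the first chapter.
my $fullstring = "preamble\n<h2>One</h2>\n<p>a</p>\n<h2>Two</h2>\n<p>b</p>\n";

# Splitting on a zero-width lookahead cuts just before each <h2>
# without consuming it, so every piece keeps its own opening tag.
my @pieces = split /(?=<h2>)/, $fullstring;

# Keep only the pieces that actually start with <h2>; this drops any
# text that precedes the first chapter.
my @chapters = grep { /\A<h2>/ } @pieces;

print scalar(@chapters), " chapters\n";
```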



Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-14 Thread Tom Robinson

Misread that part.  Try this:

(?<=)([\s\S]+?)(?=)

Cheers


> On 2021-09-15, at 13:15, Sonic Purity  wrote:
> 
> Thank you, but all that does for me is select that one entire chapter 
> heading, not the entire chapter heading plus all the paragraphs of text 
> below. In other words in my example that string selects Exciting Chapter 
> Title Here, but nothing else.
> 
> On Tuesday, September 14, 2021 at 4:03:49 PM UTC-7 Tom Robinson wrote:
> Try this, using positive lookahead and lookbehind assertions:
> 
> (?<=).+(?=)
> 
> Cheers
> 
> 


Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-14 Thread Sonic Purity

Thank you, but all that does for me is select that one entire chapter 
heading, not the entire chapter heading plus all the paragraphs of text 
below. In other words in my example that string selects Exciting Chapter 
Title Here, but nothing else.

On Tuesday, September 14, 2021 at 4:03:49 PM UTC-7 Tom Robinson wrote:

> Try this, using positive lookahead and lookbehind assertions:
>
> (?<=).+(?=)
>
> Cheers
>
>


Re: Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-14 Thread Tom Robinson

Try this, using positive lookahead and lookbehind assertions:

(?<=).+(?=)

Cheers




Find followed by many lines of arbitrary HTML through next but exclude the second

2021-09-14 Thread Sonic Purity

My fiction writing workflow initially produces one HTML document with the 
entire novel’s content. Each chapter starts with <h2>Exciting Chapter Title 
Here</h2> then many paragraphs of story text with arbitrary HTML markup. I 
split each chapter into its own HTML page, containing everything from that 
first <h2> with the chapter title through the end of the chapter, which is 
always immediately before the subsequent opening <h2> for the following 
chapter (in the original un-split document).

Working manually, i’ve been using the Grep Find:

([\s\S]+?)

This works perfectly, other than it includes the <h2> at the start of the 
next chapter in the selection i’m about to cut or copy into a new HTML 
document. I manually back off the selection to include everything found 
minus that ending <h2>. I would like to better automate my workflow, but 
can’t with the need for this manual adjustment.

Re-reading the Grep help file with BBEdit, i thought lookahead might help. 
I tried:

([\s\S]+?)(?)

but that just finds the first <h2> and one character immediately following 
it. Noticing that BBEdit is highlighting the < for that second <h2>, i 
tried escaping it:

([\s\S]+?)(?\)

This throws a PCRE error: unrecognized character after (? or (?- (12)
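One plausible reading of the first symptom, sketched in Perl and assuming the tag stripped from this archive is <h2>: (?<h2>) is not a lookahead but an empty capture group named "h2", so the lazy [\s\S]+? is satisfied after a single character. Perl 5.10 and later accept the same (?<name>...) syntax as PCRE; the sample text and the placement of the outer parentheses are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample; the <h2> tag is an assumption.
my $text = "<h2>Title</h2>\n<p>story</p>\n<h2>Next</h2>\n";

# (?<h2>) is an empty capture group that happens to be named "h2";
# it is not a lookahead, so the lazy [\s\S]+? stops after one character.
my ($named) = $text =~ /(<h2>[\s\S]+?(?<h2>))/;

# (?=<h2>) is the lookahead: match up to, but not including, the next <h2>.
my ($ahead) = $text =~ /(<h2>[\s\S]+?(?=<h2>))/;

print "named group: [$named]\n";
print "lookahead:   [$ahead]\n";
```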

Can anyone suggest a search string that will accomplish my goal?

(BBEdit 11.6.8 running under macOS 10.12.6 Sierra.)

Thanks!

