Re: Advanced Grep querys

2019-05-01 Thread Christopher Stone
On 04/27/2019, at 09:01, Phil Emery <focusedcreat...@gmail.com> wrote:
> I assumed as BBEdit could detect the div (you click on the start of 
> Main-content and it selects everything till the close tag) that it could do 
> this.


Hey Phil,

Nyet.

As Patrick has explained, two different mechanisms are involved: balancing 
tags parses the HTML, while a Grep search only matches text patterns.

> The html is very well formatted as it's an export from a Hubspot website.

Ah, in that case the task should be pretty simple.

Find:

(^\h+)(

Since the HTML is well formatted, we can use the indentation level of the 
main-content div to find its closing tag.
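
The Find pattern above did not survive the list archive intact, so the following is only a minimal Python sketch of the same indentation idea, not the pattern Chris posted. It assumes the opening tag carries a quoted "main-content" attribute value and that the matching `</div>` sits on its own line at exactly the same indentation. `SAMPLE`, `MAIN_DIV`, and `extract_main_content` are hypothetical names, and `[ \t]` stands in for `\h`, which Python's `re` module does not provide.

```python
import re
from typing import Optional

# Hypothetical sample page; the real export's markup is not shown in the thread.
SAMPLE = """<html>
<body>
  <div id="header">Menu</div>
  <div id="main-content">
    <p>Keep me</p>
    <div class="inner">
      <p>And me</p>
    </div>
  </div>
  <div id="footer">Legal</div>
</body>
</html>
"""

# Group 1 captures the indentation in front of the opening tag. The match then
# runs lazily to the first </div> that starts a line with that same indentation,
# so more deeply indented (nested) closing tags are skipped.
MAIN_DIV = re.compile(
    r'^([ \t]*)<div[^>]*"main-content"[^>]*>.*?^\1</div>',
    re.MULTILINE | re.DOTALL,
)


def extract_main_content(html: str) -> Optional[str]:
    """Return the main-content div including its tags, or None if not found."""
    match = MAIN_DIV.search(html)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_main_content(SAMPLE))
```

This only works while the export keeps its indentation consistent; a single re-indented closing tag would make the match stop in the wrong place or fail.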


--
Best Regards,
Chris

-- 
This is the BBEdit Talk public discussion group. If you have a 
feature request or need technical support, please email
"supp...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: 
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to bbedit+unsubscr...@googlegroups.com.
To post to this group, send email to bbedit@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.


Re: Advanced Grep querys

2019-04-28 Thread Phil Emery
Right. Here is an example from a basic internal page. The pages are mostly the 
same structure-wise: everything that differs from one page to another is in 
the "main-content" div.

If Hubspot was honest, it would allow you to export your content as an XML 
file but I guess they don't want to make it easy for people to leave.


[Attached sample page, titled "Health First". The archive stripped its HTML 
markup, leaving only navigation link text, so it is omitted here.]

Re: Advanced Grep querys

2019-04-27 Thread Patrick Woolsey


> On Apr 27, 2019, at 10:01, Phil Emery wrote:
> 
> thanks. I assumed as BBEdit could detect the div (you click on the start of 
> Main-content and it selects everything till the close tag) that it could do 
> this. The html is very well formatted as it's an export from a Hubspot 
> website.


Sorry but balancing tags is a separate action that does require parsing the 
HTML, whereas grep searches are governed by the standard rules for regular 
expression matching, which (as Sam mentioned :-) are not well-suited to parsing 
HTML.

If however your files contain any distinct content immediately after the 
"main-content" div, it may be possible to construct a search which keys on that 
content, or perhaps there may be some other way to come to grips with the task 
-- either way, an actual sample file would be helpful. :-)
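
As a rough illustration of keying on the content that follows the div, here is a minimal Python sketch. The marker `<div id="footer">` is purely an assumption, since the thread never shows what actually follows "main-content" in the exported pages, and the `id="main-content"` form of the opening tag is assumed too. `NEXT_SECTION`, `MAIN_DIV`, and `extract_up_to_marker` are hypothetical names.

```python
import re
from typing import Optional

# Assumption: this markup appears right after the main-content div on every page.
NEXT_SECTION = '<div id="footer">'

# Lazily extend the match until a </div> that is immediately followed
# (ignoring whitespace) by the marker. The lookahead keeps the marker itself
# out of the match.
MAIN_DIV = re.compile(
    r'<div id="main-content">.*?</div>(?=\s*' + re.escape(NEXT_SECTION) + ")",
    re.DOTALL,
)


def extract_up_to_marker(html: str) -> Optional[str]:
    """Return the main-content div, or None if the div or the marker is missing."""
    match = MAIN_DIV.search(html)
    return match.group(0) if match else None
```

Because the closing tag is identified by what follows it rather than by counting tags, nested divs inside "main-content" do not end the match early.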


Regards,

 Patrick Woolsey
==
Bare Bones Software, Inc. 



Re: Advanced Grep querys

2019-04-27 Thread Phil Emery
Thanks. I assumed that because BBEdit can detect the div (you click on the 
start of main-content and it selects everything up to the close tag), Grep 
could do this too. The HTML is very well formatted, as it's an export from a 
Hubspot website.

On Friday, 26 April 2019 16:21:22 UTC-4, Sam Hathaway wrote:
>
> I’m not sure this can be made to be reliable. Regular expressions can’t 
> balance tags [...]



Re: Advanced Grep querys

2019-04-26 Thread Sam Hathaway

On 26 Apr 2019, at 16:38, Patrick Woolsey wrote:

> sometimes it can still be helpful to start with a simple solution and 
> work out from there.


Agreed!
-sam



Re: Advanced Grep querys

2019-04-26 Thread Patrick Woolsey
That is true and my apologies for not including a suitable 
caveat in my prior post, though sometimes it can still be 
helpful to start with a simple solution and work out from there. :-)


Regards,

  -- Patrick


On 4/26/19 at 4:21 PM, list.bbe...@munkynet.org (Sam Hathaway) wrote:


> I’m not sure this can be made to be reliable. Regular expressions can’t 
> balance tags [...]





Re: Advanced Grep querys

2019-04-26 Thread Sam Hathaway
I’m not sure this can be made to be reliable. Regular expressions 
can’t balance tags, so:


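```html
<div>
 Leave me out!
 <div id="main-content">
  Include me!
  <div>And me!</div>
  And also me!
 </div>
 But not me.
</div>
```

Will result in:

```html
<div id="main-content">
  Include me!
  <div>And me!</div>
```

If you make the pattern greedy, you’ll get:

```html
<div id="main-content">
  Include me!
  <div>And me!</div>
  And also me!
 </div>
 But not me.
</div>
```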

Just fine if you don’t have any DIVs inside your main-content DIV, but 
how likely is that?
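
The two failure modes can be reproduced with a short Python sketch. The markup in `PAGE` is an assumed reconstruction of the example (in particular the `id="main-content"` attribute form), and `LAZY` and `GREEDY` are hypothetical names for the non-greedy and greedy variants of the pattern.

```python
import re

# Assumed markup: a main-content div that contains one nested div.
PAGE = """<div>
 Leave me out!
 <div id="main-content">
  Include me!
  <div>And me!</div>
  And also me!
 </div>
 But not me.
</div>
"""

# Non-greedy: stops at the first </div>, which closes the nested div.
LAZY = re.compile(r'<div id="main-content">.*?</div>', re.DOTALL)
# Greedy: runs on to the last </div> in the document.
GREEDY = re.compile(r'<div id="main-content">.*</div>', re.DOTALL)

lazy_match = LAZY.search(PAGE).group(0)
greedy_match = GREEDY.search(PAGE).group(0)

if __name__ == "__main__":
    print(lazy_match)
    print("-----")
    print(greedy_match)
```

Neither variant can tell which `</div>` belongs to which opening tag; the non-greedy one cuts the content short and the greedy one swallows what follows it.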


Better off using a tool that’s designed to manipulate HTML. Some 
options 
[here](https://superuser.com/questions/528709/command-line-css-selector-tool/528728).
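
For comparison, here is a minimal sketch of the parser-based approach using only Python's standard `html.parser` module. It is not one of the tools from the linked page. It assumes the div is identified by `id="main-content"`; `MainContentExtractor` and `extract_main_content` are hypothetical names. The parser counts nested `<div>` tags, so it stops at the closing tag that actually balances the opening one. HTML comments inside the div are dropped in this sketch.

```python
from html.parser import HTMLParser
from typing import List


class MainContentExtractor(HTMLParser):
    """Collect the markup of the first <div> whose id equals target_id."""

    def __init__(self, target_id: str = "main-content") -> None:
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0  # number of open <div>s inside the target; 0 = outside
        self.done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Still outside: only the target div starts the capture.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # the balancing </div> was reached

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_main_content(html: str) -> str:
    """Return the target div with its tags, or an empty string if absent."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Run over each file in the folder, a function like this would leave only the "main-content" div regardless of how many divs are nested inside it.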


It’d be lovely if BBEdit could allow find/replace based on CSS 
selectors or XPath expressions in addition to text and regexps. But 
presumably that would be a large undertaking.


Just my 2¢.
-sam





Re: Advanced Grep querys

2019-04-26 Thread Patrick Woolsey

On 4/26/19 at 3:07 PM, focusedcreat...@gmail.com (Phil Emery) wrote:

> Ideally it would be great to delete all other content from each 
> existing file.


OK, thanks and in that case, you should be able to obtain the 
desired outcome by performing a multi-file search & replace with 
"Grep" enabled and patterns like these:


Find:  \A(?s).+?(<div id="main-content">(?s).+?</div>)(?s).+

Replace:   \1

and in short, here's how the patterns work:

The Find pattern has four parts:

- \A anchors the match at the start of the document.
- (?s).+? _non-greedily_ (the ?) matches one or more instances of any 
  character (the .+), _including_ line breaks (achieved by prepending 
  (?s) to the .).
- A single _sub-pattern_, whose contents are enclosed in parentheses 
  ( ), consists of the opening div, followed by one or more characters 
  in another non-greedy match across lines, and then a closing div.
- A final (?s).+ matches any characters remaining in the document, 
  including line breaks.


[NB: You'll need to adjust the exact form of the desired <div> to suit 
your content, i.e. depending on whether these sections are identified by 
'class', 'name', or 'id'.]

The Replace pattern then reinserts only the contents of the 
matched subpattern (consisting of the desired div, its contents, 
and the closing div), thus effectively deleting everything else.


As always, I recommend you try this procedure out on a few 
sample files or a cloned copy before applying it to your actual 
data, just to make sure it's doing what you expect/want. :-)
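
For anyone who wants to trial the same Find/Replace outside BBEdit, here is a minimal Python sketch. It assumes the div is written as `<div id="main-content">`; as the NB above says, the real form may differ. `FIND`, `REPLACE`, and `keep_main_content` are hypothetical names. Python's `re` module takes the dot-matches-line-breaks option as the `re.DOTALL` flag, which replaces the inline `(?s)` here.

```python
import re

# \A      : start of the document
# .+?     : non-greedy run up to the opening div
# ( ... ) : sub-pattern 1 = opening div, its contents, first closing div
# .+      : everything that remains
FIND = re.compile(r'\A.+?(<div id="main-content">.+?</div>).+', re.DOTALL)
REPLACE = r"\1"


def keep_main_content(html: str) -> str:
    """Replace the whole document with sub-pattern 1; unchanged if no match."""
    return FIND.sub(REPLACE, html, count=1)


if __name__ == "__main__":
    page = (
        "<html><body>\n"
        '<div id="nav">Menu</div>\n'
        '<div id="main-content">\n<p>Hello</p>\n</div>\n'
        "<p>Footer</p>\n"
        "</body></html>\n"
    )
    print(keep_main_content(page))
```

Like the BBEdit pattern, this stops at the first closing div after the opening tag, so it only gives the intended result when no other div is nested inside "main-content".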



Regards,

  Patrick Woolsey
==
Bare Bones Software, Inc. 



Re: Advanced Grep querys

2019-04-26 Thread Phil Emery
Ideally it would be great to delete all other content from each existing 
file.



On Friday, 26 April 2019 14:59:34 UTC-4, Patrick Woolsey wrote:
>
> On 4/26/19 at 2:40 PM, focused...@gmail.com  (Phil Emery) 
> wrote: 
>
> >I have a folder full of html pages (.html). Each file has a lot 
> >of stuff we don't need. Every page has content within a div 
> >called "main-content". The only stuff we need is within that div. 
> > 
> >I'm pretty sure that via some fancy Grep-ing it could delete 
> >everything out of those files EXCEPT what's contained within 
> >the "main-content" div. But I can't figure it out. 
>
>
> To help define the scope of the task: 
>
> Do you need to delete all the other content from each existing 
> file, or would you instead be satisfied to extract the contents 
> of every such div for use elsewhere? 
>
>
> Regards, 
>
>Patrick Woolsey 
> == 
> Bare Bones Software, Inc.  
>
>



Re: Advanced Grep querys

2019-04-26 Thread Patrick Woolsey

On 4/26/19 at 2:40 PM, focusedcreat...@gmail.com (Phil Emery) wrote:

> I have a folder full of html pages (.html). Each file has a lot 
> of stuff we don't need. Every page has content within a div 
> called "main-content". The only stuff we need is within that div.
>
> I'm pretty sure that via some fancy Grep-ing it could delete 
> everything out of those files EXCEPT what's contained within 
> the "main-content" div. But I can't figure it out.



To help define the scope of the task:

Do you need to delete all the other content from each existing 
file, or would you instead be satisfied to extract the contents 
of every such div for use elsewhere?



Regards,

  Patrick Woolsey
==
Bare Bones Software, Inc. 



Advanced Grep querys

2019-04-26 Thread Phil Emery
Hi

Long time BBEditer (v3) here.

In the past I've eventually gotten Grep to do some pretty cool things, but 
this is a bit out of my expertise -- not being a programmer.

I have a folder full of html pages (.html). Each file has a lot of stuff we 
don't need. Every page has content within a div called "main-content". The 
only stuff we need is within that div.

I'm pretty sure that via some fancy Grep-ing it could delete everything out 
of those files EXCEPT what's contained within the "main-content" div. But I 
can't figure it out.

I was hoping there was a "Grep builder" on the web somewhere but there 
doesn't seem to be one that I can find.

Any help would be greatly appreciated.
