Hi,

The sample text I have been parsing comes from the eCFR site. I take pages
like this:
http://www.ecfr.gov/cgi-bin/text-idx?node=pt14.4.413&rgn=div5
I copy all the regulation text into a tiddler as plain text and clean out
the "back to top" cruft. Then I cut and paste each section into a separate
tiddler using the "New Here" option.
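
The copy, clean-up and cut-and-paste steps could in principle be scripted ahead of import. Here is a minimal Python sketch, not part of TiddlyWiki or the text-slicer plugin. It assumes the page has been saved as plain text, that each section heading begins a line with "§" (e.g. "§ 413.1 Scope of part."), and that the cruft lines read "Back to Top". The `split_sections` helper and the sample text are made up for illustration:

```python
import re

# Assumption: a section heading starts a line with "§", e.g. "§ 413.1 Scope of part."
SECTION_START = re.compile(r"§\s*(\d+\.\d+)")
# Assumption: the navigation cruft sits on its own line as "Back to Top".
CRUFT = re.compile(r"\s*back to top\s*", re.IGNORECASE)


def split_sections(text: str) -> dict[str, str]:
    """Drop cruft lines and split text into {section number: section text}.

    "§" only starts a new section when it is the first character of a line,
    so in-line cross-references such as "see §413.2" do not cause a split.
    Anything before the first section (e.g. the part heading) is discarded.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if CRUFT.fullmatch(line):
            continue  # drop "back to top" navigation residue
        match = SECTION_START.match(line)  # match() only checks the line start
        if match:
            current = match.group(1)
            sections[current] = []
        if current is not None:
            sections[current].append(line)
    return {num: "\n".join(lines).strip() for num, lines in sections.items()}


# Made-up sample in the assumed layout.
sample = """PART 413
§ 413.1 Scope of part.
Placeholder text; see §413.2 for definitions.
Back to Top
§ 413.2 Definitions.
More placeholder text.
"""
for number, body in split_sections(sample).items():
    print(number, "->", body.splitlines()[0])
```

Each key could then serve as a tiddler title such as "§413.1".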

At that point, I have a root tiddler to which I can add a toc macro and a
transclusion through a template, so I can either display the full
regulation or easily search for and display a specific section.

Things are difficult in a few spots:
* Parsing it up in the first place ... it's quite labour-intensive
** it would be a nice feature to be able to text-slice using the section
symbol (but only when it starts a line)
** a tiddler per paragraph is not terribly useful in this case, especially
if the titles are just meaningless sequential IDs
** it is convenient in this case that, due to the section numbering, each
tiddler name is unique. I can see how a more flexible slicer could find it
tricky to keep tiddler names unique for other texts.

* ordering the resulting tiddlers is tricky ... because the titles sort as
text, §413.11 gets ordered before §413.2
** I have used some plugins to make a utility tiddler that parses tiddler
titles to extract the section and subsection, then adds section and
subsection fields to each tiddler to sort numerically on.
*** I have run into situations where I would like to stack up the sort
filters ... nsort(section)nsort(subsection) but it seems that only one gets
parsed.
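
Stacking a primary and a secondary numeric sort amounts to sorting on a tuple key. A minimal Python sketch shows why comparing the parts as integers fixes the §413.11-before-§413.2 problem. It assumes titles of the form "§413.11" with an optional "(a)"-style paragraph label; the `section_key` helper is hypothetical, not a TiddlyWiki API:

```python
import re

# Assumption: titles look like "§413.11", optionally followed by a
# parenthesised paragraph label such as "§413.11(a)".
TITLE = re.compile(r"§\s*(\d+)\.(\d+)(?:\((\w+)\))?")


def section_key(title: str) -> tuple[int, int, str]:
    """Return (part, section, paragraph) so the numbers compare as integers."""
    match = TITLE.search(title)
    if match is None:
        raise ValueError(f"unrecognised title: {title!r}")
    part, section, paragraph = match.groups()
    # Tuples compare element by element, which is the same as sorting on
    # part first, then section, then paragraph label.
    return int(part), int(section), paragraph or ""


titles = ["§413.11", "§413.2", "§413.2(b)", "§413.1", "§413.2(a)"]
print(sorted(titles))                   # string order: §413.11 lands before §413.2
print(sorted(titles, key=section_key))  # numeric order
```

This mirrors the approach of storing section and subsection in separate fields: once each part is compared as a number, the order follows the regulation's numbering.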

I dread updating my regulations TiddlyWiki when the regulations change.
Luckily rulemaking runs at a glacial pace and I haven't had to do so yet ...
but that will change eventually.

/Mike


On Tue, May 10, 2016 at 2:32 PM, Jeremy Ruston <[email protected]>
wrote:

> Hi Amanda
>
> Here's a completely random bit of text I just generated
>
>
> Terrific random text if I may say so.
>
> which matches the format of my real document. It has the headers + plain
> paragraphs with line breaks in-between. This was created in the Sublime
> Text editor. I then pasted it and split it. As expected, each paragraph was its
> own tiddler.
>
>
> The sample lacks <p> tags around the paragraphs. I tried pasting the text
> as it is into both a text/html tiddler and an ordinary wikitext tiddler. In
> both cases, the broken markup prevents things from working properly.
>
> I added <p> tags around the paragraphs (attached). Processing the result
> as text/html does have the expected output of a separate tiddler for each
> heading and for each paragraph.
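
Adding the <p> tags by hand is tedious for a long document. Here is a minimal Python sketch. It assumes headings are already HTML lines such as <h2>...</h2>, that paragraphs are plain text separated by blank lines, and that the paragraphs contain no inline markup of their own. `wrap_paragraphs` is a hypothetical helper:

```python
import html
import re

# Assumption: headings are already HTML lines such as "<h2>Title</h2>".
HEADING = re.compile(r"<h[1-6][\s>]", re.IGNORECASE)


def wrap_paragraphs(text: str) -> str:
    """Wrap each run of non-blank, non-heading lines in <p>...</p>.

    Lines within one paragraph are joined with spaces; "&" and "<" in the
    plain text are escaped so they cannot break the surrounding markup.
    """
    out: list[str] = []
    para: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far, if any.
        if para:
            out.append("<p>" + html.escape(" ".join(para), quote=False) + "</p>")
            para.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current paragraph
        elif HEADING.match(stripped):
            flush()
            out.append(stripped)  # keep heading markup unchanged
        else:
            para.append(stripped)
    flush()
    return "\n".join(out)


print(wrap_paragraphs("<h2>Intro</h2>\nFirst line\ncontinues.\n\nSecond & last."))
```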
>
> If that’s looking like it’s going to generate too many tiddlers for your
> texts, then the best approach may be to extend the text-slicer plugin with
> more options, so that we could have a tiddler for each heading plus its
> immediate text.
>
> I won’t have time to explore that for a while. The other option would be
> to preprocess your texts to merge contiguous paragraphs, putting a couple
> of <br>s in between.
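
The second option, merging contiguous paragraphs, can be done with a single substitution on the HTML. This is a minimal Python sketch. It assumes the paragraphs are simple <p> elements with no nested <p>, and `merge_paragraphs` is a hypothetical helper, not part of the text-slicer plugin:

```python
import re

# Assumption: paragraphs are simple <p>...</p> elements, optionally with
# attributes; "<pre>" and other tags are deliberately not matched.
ADJACENT_PARAGRAPHS = re.compile(r"</p>\s*<p(?:\s[^>]*)?>", re.IGNORECASE)


def merge_paragraphs(markup: str) -> str:
    """Merge runs of adjacent <p> elements into one, separated by <br><br>.

    A heading (or any other element) between two paragraphs breaks the run,
    so each heading ends up followed by a single paragraph block.
    """
    return ADJACENT_PARAGRAPHS.sub("<br><br>", markup)


doc = "<h2>One</h2><p>a</p>\n<p>b</p><h2>Two</h2><p>c</p>"
print(merge_paragraphs(doc))
```

After merging, each heading is followed by a single paragraph, so slicing should yield one paragraph tiddler per heading instead of one per paragraph.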
>
> Btw, I must say I'm overwhelmed and impressed by the community. I posted
> some questions in a few places online regarding other plugins. Everyone
> answered within 24 hours. It just blew me away. I'm used to the less
> friendly communities of major CMS software apps…
>
>
> Thank you — from my perspective the community is also what makes doing
> this such fun.
>
> Best wishes
>
> Jeremy
>
>
> Best,
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "TiddlyWiki" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tiddlywiki/vp62YKOsE54/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tiddlywiki.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tiddlywiki/9AA37EF5-5F6F-4EB1-9C89-7977D81BA573%40gmail.com
> <https://groups.google.com/d/msgid/tiddlywiki/9AA37EF5-5F6F-4EB1-9C89-7977D81BA573%40gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
> LG
>
> On Monday, May 9, 2016 at 11:58:06 AM UTC-4, Jeremy Ruston wrote:
>>
>> Hi LG
>>
>> Interesting, can you share an excerpt of some of your text?
>>
>> It sounds like you’d benefit from finer control over the tiddlerisation;
>> perhaps making a tiddler for each heading, rather than for each paragraph.
>>
>> Best wishes
>>
>> Jeremy
>>
>> On 8 May 2016, at 15:02, LG <[email protected]> wrote:
>>
>> Great tool! Thank you.
>>
>> I copied a Word document into Mammoth.js (on a WP site), then into a text editor to
>> remove all the paragraph tags. When I pasted the newly cleaned text over
>> into TW and sliced it, I ran into the million or so new tiddlers at the
>> paragraph level. Has there been any progress on chunking the texts at the
>> heading level?
>>
>> I have 13.5 years' worth of text I'd like to put into TW. Having every
>> single line/paragraph broken into a tiddler would surely break the system
>> (I generate about 300 pages of text every 4 months x 13.5 years).
>>
>> Thanks,
>>
>> LG
>>
>> On Saturday, August 1, 2015 at 8:31:06 AM UTC-4, Jeremy Ruston wrote:
>>>
>>> I've just pushed a new prerelease that includes an early cut of a tool
>>> to slice longer texts into individual tiddlers based on headings and lists.
>>> It's based on ideas that have come up in previous discussions about dealing
>>> with long, structured tiddlers.
>>>
>>> You can try it out at:
>>>
>>> http://tiddlywiki.com/prerelease/editions/text-slicer/index.html
>>>
>>> You'll need to carefully follow the instructions in the "HelloThere"
>>> tiddler:
>>>
>>> * Scroll down to the "Sample Text" tiddler and click on the "text
>>> slicer" icon
>>> * Click the "import" button in the resulting import listing
>>> * Open the tiddler "Sliced up Sample Text"
>>>
>>> You should see a copy of the original text, but you can explore the
>>> table of contents to see how it is composed of individual tiddlers that are
>>> threaded together by tags.
>>>
>>> I'd welcome any feedback on the tool. I would also love some help in
>>> finding a better sample text, something public domain that we can
>>> re-distribute.
>>>
>>> Best wishes
>>>
>>> Jeremy
>>>
>>>
>>> --
>>> Jeremy Ruston
>>> mailto:[email protected]
>>>
>>
>>
> <sample_text_for_slicer.txt>


-- 
Michael Wiktowy [email protected]
