Hi,

The sample text that I have been parsing up comes from the eCFR site. I take pages like this one:

http://www.ecfr.gov/cgi-bin/text-idx?node=pt14.4.413&rgn=div5

I copy all the regulation text into a tiddler as plain text and clean out the "back to top" cruft. Then I cut and paste each section into a separate tiddler using the "New Here" option.
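The manual cut-and-paste step could perhaps be scripted outside TiddlyWiki. Below is a minimal Python sketch. It assumes the pasted regulation text keeps each section heading (e.g. "§ 413.11 ...") at the start of its own line. `split_sections` and `section_key` are hypothetical helpers, not part of TiddlyWiki or the text-slicer plugin. The sketch splits only on a § that begins a line and titles each chunk by its section number. It also builds a numeric sort key so that §413.2 comes before §413.11.

```python
import re

# Hypothetical helper, not a TiddlyWiki API.
# A section heading is assumed to be a line that *begins* with "§",
# e.g. "§ 413.11 Scope."; the space after "§" is optional.
# Anchoring with ^ in MULTILINE mode means in-text cross-references
# such as "see § 413.2" do not start a new chunk.
SECTION_HEADING = re.compile(r"^§\s*(\d+)\.(\d+)", re.MULTILINE)


def split_sections(text):
    """Split plain regulation text into {title: body}, one entry per section.

    The title is built from the section number alone (e.g. "§413.11"),
    so no meaningless sequential IDs are needed.
    """
    matches = list(SECTION_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # Each chunk runs from its heading to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        title = f"§{m.group(1)}.{m.group(2)}"
        sections[title] = text[m.start():end].strip()
    return sections


def section_key(title):
    """Numeric sort key: "§413.11" -> (413, 11), so §413.2 sorts before §413.11."""
    first, second = re.match(r"§\s*(\d+)\.(\d+)", title).groups()
    return (int(first), int(second))


if __name__ == "__main__":
    sample = (
        "§ 413.1 Scope.\nThis part applies to ...\n"
        "§ 413.11 Review.\nSee § 413.2 for terms.\n"
        "§ 413.2 Definitions.\nIn this part ...\n"
    )
    parts = split_sections(sample)
    print(sorted(parts))                   # ['§413.1', '§413.11', '§413.2']
    print(sorted(parts, key=section_key))  # ['§413.1', '§413.2', '§413.11']
```

Because the key is a tuple of both numbers, a single sort orders on the first number and then on the second. That is the effect stacking two numeric sorts is after. Each title/body pair would still need to be turned into a tiddler, for instance by writing a JSON file of tiddler objects for TiddlyWiki's import. That step is not shown.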
At that point, I have a root tiddler. I can just add a toc macro and a transclusion through a template to it. That lets me either display the full regulation or easily search for and display a specific section.

Things are difficult in a few spots:

* Parsing it up in the first place ... it's quite labour-intensive
** it would be a nice feature to be able to text-slice on the section symbol (but only when it starts a line)
** a tiddler per paragraph is not terribly useful in this case, especially if the titles are just meaningless sequential IDs
** it is convenient in this case that, due to the section numbering, each tiddler name is unique. I can see how, for other texts, a more flexible slicer would have a harder time keeping tiddler names unique.
* Ordering the resulting tiddlers is tricky ... §413.11 gets ordered before §413.2
** I have used some plugins to make a utility tiddler that parses tiddler titles and extracts the section and subsection. It adds section and subsection fields to each tiddler so they can be sorted numerically.
*** I have run into situations where I would like to stack up the sort filters ... nsort(section)nsort(subsection) ... but it seems that only one gets parsed.

I dread updating my regulations TiddlyWiki when the regulations change. Luckily, rulemaking runs at a glacial pace and I haven't had to do so yet ... but that will change eventually.

/Mike

On Tue, May 10, 2016 at 2:32 PM, Jeremy Ruston <[email protected]> wrote:

> Hi Amanda
>
> Here's a completely random bit of text I just generated
>
> Terrific random text if I may say so.
>
> which matches the format of my real document. It has the headers + plain paragraphs with line breaks in-between. This was created in Sublime text editor. I then pasted it and split it. As expected, each paragraph was its own tiddler.
>
> The sample lacks <p> tags around the paragraphs. I tried pasting the text as it is into both a text/html tiddler and an ordinary wikitext tiddler.
> In both cases, the broken markup prevents things from working properly.
>
> I added <p> tags around the paragraphs (attached). Processing the result as text/html does produce the expected output of a separate tiddler for each heading and for each paragraph.
>
> If that’s looking like it’s going to generate too many tiddlers for your texts, then the best approach may be to extend the text-slicer plugin with more options, so that we could have a tiddler for each heading plus its immediate text.
>
> I won’t have time to explore that for a while. The other option would be to preprocess your texts to merge contiguous paragraphs, putting a couple of <br>s in between.
>
> Btw, I must say I'm overwhelmed and impressed by the community. I posted some questions in a few places online regarding other plugins. Everyone answered within 24 hours. It just blew me away. I'm used to the less friendly communities of major CMS software apps…
>
> Thank you. From my perspective the community is also what makes doing this such fun.
>
> Best wishes
>
> Jeremy
>
> Best,
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "TiddlyWiki" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/tiddlywiki/vp62YKOsE54/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tiddlywiki.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tiddlywiki/9AA37EF5-5F6F-4EB1-9C89-7977D81BA573%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>
> LG
>
> On Monday, May 9, 2016 at 11:58:06 AM UTC-4, Jeremy Ruston wrote:
>>
>> Hi LG
>>
>> Interesting, can you share an excerpt of some of your text?
>>
>> It sounds like you’d benefit from finer control over the tiddlerisation; perhaps making a tiddler for each heading, rather than for each paragraph.
>>
>> Best wishes
>>
>> Jeremy
>>
>> On 8 May 2016, at 15:02, LG <[email protected]> wrote:
>>
>> Great tool! Thank you.
>>
>> I copied the Word document into Mammoth.js (on a WP site), then into a text editor to remove all the paragraph tags. When I pasted the newly cleaned text into TW and sliced it, I ran into the million or so new tiddlers at the paragraph level. Has there been any progress on chunking the texts at the header level?
>>
>> I have 13.5 years' worth of text I'd like to put into TW. Having every single line/paragraph broken into a tiddler would surely break the system (I generate about 300 pages of text every 4 months x 13.5 years).
>>
>> Thanks,
>>
>> LG
>>
>> On Saturday, August 1, 2015 at 8:31:06 AM UTC-4, Jeremy Ruston wrote:
>>>
>>> I've just pushed a new prerelease that includes an early cut of a tool to slice longer texts into individual tiddlers based on headings and lists. It's based on ideas that have come up in previous discussions about dealing with long, structured tiddlers.
>>>
>>> You can try it out at:
>>>
>>> http://tiddlywiki.com/prerelease/editions/text-slicer/index.html
>>>
>>> You'll need to carefully follow the instructions in the "HelloThere" tiddler:
>>>
>>> * Scroll down to the "Sample Text" tiddler and click on the "text slicer" icon
>>> * Click the "import" button in the resulting import listing
>>> * Open the tiddler "Sliced up Sample Text"
>>>
>>> You should see a copy of the original text, but you can explore the table of contents to see how it is composed of individual tiddlers that are threaded together by tags.
>>>
>>> I'd welcome any feedback on the tool.
>>> I would also love some help in finding a better sample text, something public domain that we can redistribute.
>>>
>>> Best wishes
>>>
>>> Jeremy
>>>
>>> --
>>> Jeremy Ruston
>>> mailto:[email protected]
>
> <sample_text_for_slicer.txt>

--
Michael Wiktowy
[email protected]
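Jeremy's alternative in the quoted reply was to preprocess the text to merge contiguous paragraphs, with a couple of <br>s between them. It could be sketched roughly as follows. The sketch assumes two things the thread does not state. First, headings arrive as single-line HTML such as `<h2>Title</h2>`, which is an assumption about the Mammoth.js output. Second, paragraphs are separated by blank lines. `merge_paragraphs` is a hypothetical helper name. Whether the text-slicer then yields one tiddler per merged block is taken from Jeremy's description, not verified here.

```python
import re

# Assumption: a heading is a whole line of HTML like "<h2>Title</h2>".
HEADING = re.compile(r"^<h([1-6])\b[^>]*>.*</h\1>$", re.IGNORECASE)


def merge_paragraphs(text):
    """Hypothetical helper: wrap each run of paragraphs between headings in a
    single <p>, joining the original paragraphs with <br><br>.

    Paragraph text is passed through as-is (not HTML-escaped), on the
    assumption that it is already valid HTML.
    """
    out = []      # output lines
    paras = []    # finished paragraphs since the last heading
    current = []  # lines of the paragraph currently being read

    def end_paragraph():
        # A blank line (or a heading) ends the current paragraph;
        # its internal line breaks become spaces.
        if current:
            paras.append(" ".join(current))
            current.clear()

    def flush_run():
        # Emit all paragraphs collected since the last heading as one <p>.
        end_paragraph()
        if paras:
            out.append("<p>" + "<br><br>".join(paras) + "</p>")
            paras.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            flush_run()
            out.append(stripped)
        elif not stripped:
            end_paragraph()
        else:
            current.append(stripped)
    flush_run()
    return "\n".join(out)


if __name__ == "__main__":
    sample = "<h2>Intro</h2>\nFirst paragraph.\n\nSecond paragraph.\n<h2>Next</h2>\nThird."
    print(merge_paragraphs(sample))
    # <h2>Intro</h2>
    # <p>First paragraph.<br><br>Second paragraph.</p>
    # <h2>Next</h2>
    # <p>Third.</p>
```

The output would then be pasted into a text/html tiddler before slicing, as in Jeremy's description. Merging per heading trades the per-paragraph tiddlers for one block of text under each heading.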

