Barry---

You make a good point; I completely skipped ICON. I used Snobol for years, and then one day I wondered if there had been any updates to the language and ran into Icon and Unicon, but by that time Unicon already existed.

Basically I use the Icon subset of Unicon if the program is very large, but mostly I just use the string parsing in a single straight program without any function calls.

I never saw the ICON book. I think the online PDF books are great; normally I find what I need in one of them.

But it is true that if I had had to buy a book I would probably have stuck with Snobol.

I kind of like Snobol because it reminds me of assembly language programming.

I tell people about Snobol/Icon/Unicon all the time, because it is such a great tool, but no one is ever interested.

And I don't understand why, because if you need to manipulate text it is the only way to go.

I have been working on a text related project recently and tried to do some searches in VBA for Word.

That is really despicable. I just could not get at what I needed, because you can grab only characters, words or paragraphs, no LINES, because the lines are in a flow. Then you have to manipulate either selections or ranges, and once you have the ranges you have only very primitive string manipulation functions to deal with them.

I asked around, and no one I know does any VBA programming in Word; everyone uses VBA with Excel.

When I mentioned Snobol/Icon/Unicon to the people I was talking to about my problem, I got only blank stares until I got to Snobol. At that point everyone said they thought it was just a relic of computing history, and they were shocked that there were follow-on languages in that family.

I had to solve my problem outside of Word. In this case I did not solve it with Unicon, but with a concordance program. I have a wonderful concordance program from http://www.concordancesoftware.co.uk/ which did the trick, because the text was so malformed that it would have taken forever to write a program that would recognize all the cases.

But I still had to do a lot of hand manipulation, in order to create the files I needed from the concordance program output.

It seems to me no one has really solved the problem of text manipulation on the fly.

I have to have a certain size problem before it is worth writing a program in Unicon to solve the problem. I am constantly using multiple tools to process the data and massage it into a needed format on all the text that is not worthy of writing a Unicon program to process.

So it seems to me that I cannot be the only one doing this. I think there must be a huge hidden market out there for something that manipulates text which is more powerful than regular expressions and search, but less of a problem than programming in Unicon. I have seen several suites of special purpose tools (for example http://www.boxersoftware.com/textmonkey.htm ), but I have never seen anything that fills the gap between what is possible using spreadsheets together with word processing programs and what is possible using UNICON.

Perhaps if Unicon addressed this hidden market somehow, it could find its niche. Instead of program examples we need something like templates by which programs could be altered to do slightly different processing. The problem is that tools are too specific, but to get variation you need the complete generality of the programming language. There should be something between these extremes: adaptable tools that are flexible and changeable but still not fully general.

This is the idea of Domain Specific Languages but I don't think there is anything equivalent to that in the text manipulation world.

I was hoping that XML would fill the bill but it really only makes a bigger problem, because most of the texts do not have tags, and placing the tags in the text is just as big a problem as writing a text recognition program.

For instance, in my latest effort I tried producing the XML output of Word 2007 and then unzipping it to get the core document. But that XML had the text so split up that it was going to be a bigger problem than just dealing with the text in bulk.
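
To make the "split up" problem concrete, here is a rough sketch in Python (not Unicon, and not what I actually did) of pulling the text back together. It assumes the core document is the word/document.xml member of the .docx zip and that the text sits in w:t run elements inside w:p paragraph elements; the function name docx_paragraphs is just made up for illustration. Paragraphs nested inside other paragraphs (text boxes, for example) would show up more than once, so treat it as a starting point only.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside Word 2007 .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return one string per paragraph, joining the split-up text runs.

    `source` is a path or file-like object for a .docx (zip) file.
    The function name is hypothetical, chosen for this sketch.
    """
    with zipfile.ZipFile(source) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    # Each w:p is a paragraph; its text is scattered over many w:t runs.
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in root.iter(W + "p")]
```

This only recovers paragraphs, not lines, so it does not get around the flow problem I mentioned above; it just undoes the fragmentation of the runs.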

Once you get outside of Word by dumping a txt file then many times the text is so screwed up that it is difficult to parse.

So if the TXT is screwed up and the things searched are malformed, then building a text recognizer is complicated. So that is where the concordance program came in handy.

I wish that once I had the concordance program output I could then parse that. But it was not worth the time to write that program, even though the concordance program produces txt and html versions of the concordance.

Anyway I know I am waffling a bit, but it just seems that there is a middle ground of text manipulation for which the tools are missing.

I heard today that 161 exabytes (1 exabyte = 10^18 bytes) of data were produced this last year.

In 2003 it was 5 exabytes.

It seems that with all that data, there should be a market for a good text manipulation language.

Especially one that supported some sort of middle level tools for problems where writing a program is not possible, but there is a lot of text processing to be done by hand.

Perhaps there is some level between text and xml which we are missing out on, but where there is a significant amount of work performed.

I know that many times I am using multiple tools to get the final result I am seeking, where I am massaging the document many times with different tools and in multiple passes, each pass doing something a little different to it toward achieving the final result.

For instance, one trick I am sure you have all used is to do search and replace or insert into a text document to put in the hooks that I would search for with my text recognizer. Sometimes that is tricky with wildcards. An excellent tool in this regard is SR http://www.funduc.com/search_replace.htm for search and replace across multiple files.
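
The hook trick can be sketched in a few lines of Python (again, just an illustration, not one of my actual programs). The marker string @@ENTRY@@ and the pattern "a line starting with a number, a period and a space" are both assumptions made up for the example; in real text the pattern is the hard part.

```python
import re

# Hypothetical hook marker; pick something that cannot occur in the text.
HOOK = "@@ENTRY@@ "

# Hypothetical rule for where an entry starts: a line that begins with
# digits, a period and whitespace. The lookahead matches without consuming.
ENTRY_START = re.compile(r"^(?=\d+\.\s)", re.MULTILINE)


def insert_hooks(text):
    """First pass: put a hook in front of every line that starts an entry."""
    return ENTRY_START.sub(HOOK, text)


def split_on_hooks(text):
    """Second pass: the recognizer only has to look for the hook."""
    return [part.strip() for part in text.split(HOOK) if part.strip()]


sample = "1. first item\ncontinues here\n2. second item\n"
entries = split_on_hooks(insert_hooks(sample))
```

The point of doing it in two passes is that the second pass no longer needs to know anything about the messy original pattern, only about the hook.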

Kent Palmer


Date: Fri, 9 Mar 2007 05:21:39 -0600
From: [EMAIL PROTECTED]
To: unicon-group@lists.sourceforge.net
Subject: Re: [Unicon-group] Ruby Python vs. Icon/Unicon

bryan rasmussen <[EMAIL PROTECTED]> wrote:
> I'm a newbie with Unicon, I basically decided to start with it
> because, well I like learning new languages especially  ones with an
> easily perceived niche. I think the niche of Unicon as you say is text
> processing.
>
> I use XML a lot in my day to day, I'm not sure I understand the
> assertion that Unicon would be great for XML, since the main thing one
> needs for XML programming is easy tree manipulation, such as is
> provided with XSL-T.

There's an XML parser in the uni/xml directory of the CVS. I haven't
used it a lot, though it is likely I will be soon.

I agree with Kent Palmer that the syntax of Icon/Unicon isn't great,
but I'm not sure how much effect that has on popularity, considering
that languages with significant syntax difficulties, such as Perl and
C++, have been adopted anyway. Personally, my best hypothesis would be
that Icon hasn't become more popular mainly because you had to go buy
the book, 'The Icon Programming Language', and thus only the most
curious people ever tried Icon, for they had to go to some effort and
spend money doing it. Perl is an example of a language that became
popular in part because you could try it without buying a book, though
of course you might buy a book or two later. The copylefted Unicon
book as a PDF is a start towards a remedy, but you need more stuff
similar to the Perl manpage(s), the GNU info pages, on-line HTML
tutorials and documentation, and so forth.


--
Barry.SCHWARTZ ĉe chemoelectric punkto org  http://chemoelectric.org
              Free stuff / Senpagaj varoj:  http://crudfactory.com
'Democracies don't war; democracies are peaceful countries.' - Bush
(http://www.whitehouse.gov/news/releases/2005/12/20051219-2.html)


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Unicon-group mailing list
Unicon-group@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/unicon-group