Re: *Buffy's Not Included

2001-06-15 Thread Lucy McWilliam


On Thu, 14 Jun 2001, Leo Lapworth (on topic, for a change) wrote:

 XML - do it because you need it, not because of the Buzz.

Indeed ;-)


 XML is cool for handling complex (or varied) data and sharing
 this info with others (but if CSV will do, then use that!).

I'll soon be implementing an annotation/transformation tool to produce
large, *complicated* XML files containing information about microarray
experiments.  Would be grateful if anyone could suggest any useful
tools/modules.


L.
www.gen.cam.ac.uk/~flychip




Re: *Buffy's Not Included

2001-06-14 Thread Leo Lapworth

XML - do it because you need it, not because of the Buzz.

XML is cool for handling complex (or varied) data and sharing
this info with others (but if CSV will do, then use that!).

I'd suggest it VERY much depends on why you want to use it,
what is the ASCII data ?

If it was worth putting the data into XML and you were
worried about the speed of searching, you could always write a 
script (with one of the _many_ XML:: modules) to slurp keywords 
or whatever in from the XML so that you can search it in a DB and have
that point to a file rather than trawling all the XML files
for every search.

XML in a Nutshell is a very good book. 

Leo

- Who's only actually used XML for 2 projects and is proud of that fact.

On Thu, Jun 14, 2001 at 09:25:25AM +0100, Robert Thompson wrote:
 I was wondering what the pros and cons are of using XML to structure ASCII
 based data files.
 
 What are people's experiences of using XML - particularly where you may have
 to trawl through lots of files to get at the data you want?



RE: *Buffy's Not Included

2001-06-14 Thread Robert Thompson

 From: Leo Lapworth [mailto:[EMAIL PROTECTED]]
 
 XML - do it because you need it, not because of the Buzz.

Never was one for following the hype - just trying to work out if it's the
right tool or not.

Part of the problem is probably that I've heard snippets of what XML can
do/is good for, and I've been given these requirements and now I'm trying to
work out if it's The Right Way or Not.
 
 XML is cool for handling complex (or varied) data and sharing
 this info with others (but if CSV will do, then use that!).

Not sure if CSV will handle it easily...

 I'd suggest it VERY much depends on why you want to use it,
 what is the ASCII data ?

Basically it's the results from a spidering system. The results are put both
into a db and these files (one file per day).

The data will include a ranking score based on the search criteria, URI,
document summary (which will include HTML snippets - although it may not be
properly formed). There may also be some other data that needs to be saved.

CSV is an option - except that an awful lot of the data will need to be
escaped out before it goes into the file, and I would rather only have to do
that when it's rendered out to the browser.

I could produce my own file format from scratch - and write the tools to
look after it... eg

#URL#http://whatever.com
#SCORE#0
#SUMMARY#'broken html'/a bugger/P

Or I could use one of the XML modules to help me look after the files.
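
For illustration, a hand-rolled "#FIELD#value" format of the kind described
above could be read back with a few lines of core Perl. This is only a
sketch under stated assumptions: one field per line, a #URL# line starts a
new record, and field names are upper-case letters. The parse_records
routine and the sample data are hypothetical, not part of the actual system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse lines of the form "#FIELD#value" into a list of hash refs.
# Assumption: a "#URL#" line starts a new record; any other field is
# attached to the record most recently started.
sub parse_records {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        # The field name sits between the first two '#' characters;
        # everything after the second '#' is kept as the raw value.
        next unless $line =~ /^#([A-Z]+)#(.*)$/;
        my ($field, $value) = ($1, $2);
        push @records, {} if $field eq 'URL' or !@records;
        $records[-1]{$field} = $value;
    }
    return @records;
}

# Hypothetical sample data; the summary is deliberately left unescaped.
my $sample = join "\n",
    '#URL#http://whatever.com',
    '#SCORE#0',
    '#SUMMARY#<a href="x">broken html, with commas & "quotes"';

my @records = parse_records($sample);
print "$_->{URL} scored $_->{SCORE}\n" for @records;
```

Nothing in the value needs escaping here, which suits hand editing, but a
summary containing a newline would break the one-field-per-line assumption.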


The data from these files will primarily be displayed within an HTML page. A
perceived advantage of XML here (for someone who has barely scratched the
surface of what XML can do), is the ability to (relatively) easily take the
XML and spit it out to the browser - and yes I know it's never quite that
simple.

I'm also trying to future proof the system slightly - I think that by having
the data XML based it may make it easier to use in new and wonderful ways in
the future, without having to write all the tools from scratch.


 If it was worth putting the data into XML and you were
 worried about the speed of searching, you could always write a 
 script (with one of the _many_ XML:: modules) to slurp keywords 
 or whatever in from the XML so that you can search it in a DB and have
 that point to a file rather than trawling all the XML files
 for every search.

Unfortunately I'm not allowed to make use of the database - it is a
requirement that this particular functionality can cope with the db not
actually being there.

 
 XML in a Nutshell is a very good book. 

I'll look it up


Ta

Rob


---
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of IBNet
Plc. 

This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail.  Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and delete
this e-mail from your system. 

E-mail transmission cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost, destroyed, arrive late or
incomplete, or contain viruses. The sender therefore does not accept
liability for any errors or omissions in the contents of this message which
arise as a result of e-mail transmission. If verification is required please
request a hard-copy version. 




Re: *Buffy's Not Included

2001-06-14 Thread Roger Burton West

On or about Thu, Jun 14, 2001 at 10:01:33AM +0100, Robert Thompson typed:
CSV is an option - except that an awful lot of the data will need to be
escaped out before it goes into the file, and I would rather only have to do
that when it's rendered out to the browser.

DBD::CSV is your friend. Sits on top of Text::CSV_XS and gives you a
basic SQLish interface.

Roger



Re: *Buffy's Not Included

2001-06-14 Thread Philip Newton

Roger Burton West wrote:
 DBD::CSV is your friend.

I second that. DBD::CSV is yum. Also handles escaping of double quotes or
commas when inserting strings, etc.

Cheers,
Philip
-- 
Philip Newton [EMAIL PROTECTED]
All opinions are my own, not my employer's.
If you're not part of the solution, you're part of the precipitate.



RE: *Buffy's Not Included

2001-06-14 Thread Robert Thompson

 From: Philip Newton [mailto:[EMAIL PROTECTED]]
 Roger Burton West wrote:
  DBD::CSV is your friend.
 
 I second that. DBD::CSV is yum. Also handles escaping of 
 double quotes or
 commas when inserting strings, etc.

DBD::CSV seems to be popular and on reading the docs I can see why...

... did I mention that the powers that be want the ability to hand-modify
these files? And possibly by people who do not know what they are doing?

That's part of the reason why I would rather not have to worry about
escaping anything out in the data file itself - just in case somebody breaks
it.


Rob
-
paranoia's just a state of mind
-






Re: *Buffy's Not Included

2001-06-14 Thread David Cantrell

On Thu, Jun 14, 2001 at 11:19:10AM +0200, Philip Newton wrote:
 Roger Burton West wrote:
  DBD::CSV is your friend.
 
 I second that. DBD::CSV is yum. Also handles escaping of double quotes or
 commas when inserting strings, etc.

Of course, the plain ol' CSV modules handle all the appropriate escaping
too if you use them to build your CSV records.  You do do that, right,
and not try to write the files yourself?

-- 
David Cantrell | [EMAIL PROTECTED] | http://www.cantrell.org.uk/david/

  Good advice is always certain to be ignored,
  but that's no reason not to give it -- Agatha Christie



RE: *Buffy's Not Included

2001-06-14 Thread Ian Brayshaw

  From: Leo Lapworth [mailto:[EMAIL PROTECTED]]
 
  XML - do it because you need it, not because of the Buzz.

Amen.


I'm also trying to future proof the system slightly - I think that by 
having the data XML based it may make it easier to use in new and
wonderful ways in the future, without having to write all the tools from 
scratch.

I'm doing a lot of work at the moment with XML (still fairly rudimentary 
stuff, though) and for me the future proofing has been the deciding 
factor. We don't know what we need the system to be capable of in the medium 
to long term, and so we've gone down the XML path to keep our options open.

The way I look at it, if it makes the data more readable and easily managed 
(through XPath and XSLT, for example), then provided you don't suffer an 
unbearable performance hit, it's worth the effort.

Also, with something reasonably simple, it's probably a safe training ground 
for getting into XML without the pressure of a more critical project. It lets 
you find out what you like/dislike about XML without becoming a lemming.


XML in a Nutshell is a very good book

++


My £0.02.


Ian





Re: *Buffy's Not Included

2001-06-14 Thread Redvers Davies

 CSV is an option - except that an awful lot of the data will need to be
 escaped out before it goes into the file, and I would rather only have to do
 that when it's rendered out to the browser.

 Unfortunately I'm not allowed to make use of the database - it is a
 requirement that this particular functionality can cope with the db not
 actually being there.

Sounds like you need DBD::CSV.

DBD::CSV will allow you to operate your text files in the same way that
you operate your database.  You can literally say:

use DBI;

$dbh = DBI->connect("DBI:mysql:database=whatever", "user", "pass"); ## Your
# normal DB access.

$dbhcsv = DBI->connect("DBI:CSV:f_dir=/usr/data/csvstuff"); # Open the csv files

$sqlstring = "insert into mytable values (?,?,?,?,?,?)";

$sth1 = $dbh->prepare($sqlstring);
$sthcsv = $dbhcsv->prepare($sqlstring);

#
#   Then, any operations you do on the database, you can do on the
#   CSV file... [0]
#

$sth1->execute(@args);
$sthcsv->execute(@args);

Regards,


Red

[0] Although I wouldn't recommend using it for large select-type queries:
as with all plaintext files, it won't scale.  Appending data to this type of
file seems to scale nicely.  It also means that if you were using these files
to restore your database after it loses everything, restoration is as simple
as a read on one handle and a write on the other.



Re: *Buffy's Not Included

2001-06-14 Thread Redvers Davies

 Sounds like you need DBD::CSV.

Note to self... read all replies in a thread to make sure you don't 
provide redundant information.



RE: *Buffy's Not Included

2001-06-14 Thread Cross David - dcross

From: Robert Thompson [EMAIL PROTECTED]
Sent: Thursday, June 14, 2001 9:25 AM

 Sorry to go off topic...
 
 I was wondering what the pros and cons are of using XML to structure ASCII
 based data files.
 
 What are people's experiences of using XML - particularly where you may have
 to trawl through lots of files to get at the data you want?

Ta muchly in advance for any info or pointers

I notice that a new Perl/XML web page has just appeared:

http://www.xmlperl.com/

Dave...

-- 


The information contained in this communication is
confidential, is intended only for the use of the recipient
named above, and may be legally privileged. If the reader 
of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or
copying of this communication is strictly prohibited.  
If you have received this communication in error, please 
re-send this communication to the sender and delete the 
original message or any copy of it from your computer
system.



Re: *Buffy's Not Included

2001-06-14 Thread Dominic Mitchell

On Thu, Jun 14, 2001 at 10:44:50AM +0100, Jonathan Peterson wrote:
 At 10:01 14/06/01 +0100, you wrote:
 The data from these files will primarily be displayed within an HTML page. A
 perceived advantage of XML here (for someone who has barely scratched the
 surface of what XML can do), is the ability to (relatively) easily take the
 XML and spit it out to the browser - and yes I know it's never quite that
 simple.
 
 I'm biased because I'm now working in a highly SGML / XML based company, 
 and it's remarkable what they manage to do with XSLT.

Seeing as this is (allegedly) a Perl place, I'd like to point out that if
you're considering displaying XML on a web site, you should probably be
looking at http://axkit.org/ .  Matt Sergeant came and gave us a talk
a couple of months ago; it's a fine toy.

-Dom

-- 
| Semantico: creators of major online resources  |
|   URL: http://www.semantico.com/   |
|   Tel: +44 (1273) 72   |
|   Address: 33 Bond St., Brighton, Sussex, BN1 1RD, UK. |