RE: [Flashcoders] CDROM XML search

2008-02-21 Thread Merrill, Jason
The first questions to get out of the way are: which version of
ActionScript, and roughly how much data (in KB)?

Jason Merrill
Bank of America  
GTO LLD Solutions Design  Development 
eTools  Multimedia 

___
Flashcoders mailing list
Flashcoders@chattyfig.figleaf.com
http://chattyfig.figleaf.com/mailman/listinfo/flashcoders


[Flashcoders] CDROM XML search

2008-02-21 Thread Glen Pike

Hi,

   I have been asked to look at a search facility for a CDROM project.

   The customer is archiving magazines, one issue a month, with a decade
of issues per CD, and wants a simple search engine.


   The magazines will be archived as scanned images plus XML data 
containing page text content.


   Loading in an XML file and searching / filtering is pretty easy in 
principle, but I am guessing I may run into performance issues as the 
amount of data is scaled up.
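
For a sense of the baseline, here is a minimal sketch of the naive approach, written in Python rather than ActionScript: parse one issue's XML and scan every page for the term. The element and attribute names (`issue`, `page`, `number`) and the sample text are assumptions for illustration only; the real schema produced by the archiving tool is not known here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <page> element per scanned page, text as content.
SAMPLE_ISSUE = """
<issue number="1">
  <page number="1">Feeding the aardvark in winter</page>
  <page number="2">Anchovy recipes for beginners</page>
  <page number="3">More about the aardvark</page>
</issue>
"""

def naive_search(xml_text, term):
    """Return the page numbers whose text contains term (case-insensitive)."""
    root = ET.fromstring(xml_text)
    needle = term.lower()
    hits = []
    for page in root.iter("page"):
        text = (page.text or "").lower()
        if needle in text:
            hits.append(int(page.get("number")))
    return hits

print(naive_search(SAMPLE_ISSUE, "aardvark"))
```

Every search walks every page, so the cost grows in step with the amount of text, which is the scaling worry raised above.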


   Google is proving fairly useless today, so does anyone have much
experience of this, or any recommendations?


   Thanks

   Glen
--

Glen Pike
01736 759321
http://www.glenpike.co.uk


Re: [Flashcoders] CDROM XML search

2008-02-21 Thread Glen Pike

The system can use AS3 - as it is a CDROM.

I asked about the data size. At the moment a sample XML file for one
issue, generated by an automatic tool, is about 500k - gulp.

At one issue a month, that means roughly 6MB per year and 60MB per
decade at the moment.
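
The estimate can be written out as a short Python calculation. It assumes every issue is about the same size as the 500k sample and treats 1 MB as 1000 KB; both are simplifications.

```python
KB_PER_ISSUE = 500     # sample file size reported above
ISSUES_PER_YEAR = 12   # one magazine a month
YEARS_PER_CD = 10      # a decade per CD

def archive_size_kb(kb_per_issue, issues_per_year, years):
    """Total XML size in KB for the given number of years."""
    return kb_per_issue * issues_per_year * years

per_year_kb = archive_size_kb(KB_PER_ISSUE, ISSUES_PER_YEAR, 1)
per_decade_kb = archive_size_kb(KB_PER_ISSUE, ISSUES_PER_YEAR, YEARS_PER_CD)

print(per_year_kb / 1000, "MB per year")
print(per_decade_kb / 1000, "MB per decade")
```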

I have asked to see the file, because there may be a lot of rubbish that 
can be eliminated - I hope so..


Glen





RE: [Flashcoders] CDROM XML search

2008-02-21 Thread Rob Emenecker
Glen,

You might want to consider structuring the XML as a faux relational
database, split into three parts:
#1) full text
#2) keyword index and frequency
#3) titles, volume, number, etc.

During search, you only have to trawl through #2, which would have cross-ref
IDs for numbers 1 and 3. Then pull #1 for presentation purposes and possible
search term highlighting.
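
A rough sketch of the three-part layout, in Python for brevity. Plain dictionaries stand in for the three XML parts, and all IDs, titles and text are made up for illustration; this is one possible reading of the idea, not a finished design.

```python
# Part 1: full text, keyed by article ID (hypothetical IDs and content).
full_text = {
    "a1": "The aardvark is a nocturnal mammal.",
    "a2": "Anchovy paste and the hungry aardvark.",
}

# Part 3: titles, volume, number, keyed by the same article IDs.
metadata = {
    "a1": {"title": "Night Diggers", "volume": 1, "number": 3},
    "a2": {"title": "Kitchen Notes", "volume": 2, "number": 7},
}

# Part 2: keyword -> {article ID: frequency}. Only this part is searched.
keyword_index = {
    "aardvark": {"a1": 1, "a2": 1},
    "anchovy": {"a2": 1},
}

def search(term):
    """Look the term up in part 2, then join to parts 1 and 3 by ID."""
    postings = keyword_index.get(term.lower(), {})
    # Most frequent first; ties broken by article ID for stable output.
    ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
    return [
        {
            "id": art_id,
            "hits": freq,
            "title": metadata[art_id]["title"],
            "text": full_text[art_id],
        }
        for art_id, freq in ranked
    ]
```

The search itself never touches the full text; that is only fetched by ID for the articles that matched, for display and highlighting.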

...Rob




RE: [Flashcoders] CDROM XML search

2008-02-21 Thread Merrill, Jason
So will the file stay at 500k, or do you mean it could grow to 6MB or
60MB?  If it could grow, you definitely don't want all of this in one
file, perhaps not even at 500k.  I occasionally had trouble parsing 250k
files in AS2/FP8, but FP9 may be better at handling and parsing larger
XML files.  It would be really easy to test even before they give you
the actual file: just create a dummy XML file and try to load it and
read data from it.
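
One way to produce such a dummy file is a small script run outside Flash. The sketch below uses Python; the `issue`/`page` element names, the word list and the output file name are all placeholders, since the real schema is not known yet.

```python
import random
import xml.etree.ElementTree as ET

WORDS = ["aardvark", "anchovy", "bezier", "flash", "magazine", "archive"]

def make_dummy_issue(target_bytes, words_per_page=200, seed=0):
    """Build an <issue> document whose serialized size is at least target_bytes."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    root = ET.Element("issue")
    size = 0
    page_no = 0
    while size < target_bytes:
        page_no += 1
        page = ET.SubElement(root, "page", number=str(page_no))
        page.text = " ".join(rng.choice(WORDS) for _ in range(words_per_page))
        size += len(ET.tostring(page))
    return ET.tostring(root)  # bytes

if __name__ == "__main__":
    # Roughly the 500k sample size; change the factor to try 6MB or 60MB.
    with open("dummy_issue.xml", "wb") as fh:
        fh.write(make_dummy_issue(500 * 1000))
```

Loading the generated file in the target player and timing the parse gives a first indication of where the limits are.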

With that much data, though, you may want to split it up into separate
XML files and load them either as needed, or preload them one at a time
before doing a search.  You could categorize by year, or alphabetically,
or whatever makes sense based on the data they give you.  If you can
easily clean the files to strip out unneeded data, even better.  And if
you can get them to produce a cleaner file for you in the first place,
better still.
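
As an illustration of the split-by-year idea, here is a sketch in Python. It assumes a single `<archive>` of `<issue>` elements carrying `year` and `month` attributes, which is a guess at the layout; on the CD each chunk would be its own XML file, and the dictionary here just stands in for those files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical combined archive; real data would come from the customer.
SAMPLE = """<archive>
  <issue year="1998" month="01">aardvark feature</issue>
  <issue year="1998" month="02">anchovy special</issue>
  <issue year="1999" month="01">return of the aardvark</issue>
</archive>"""

def split_by_year(archive_xml):
    """Split one big <archive> into a dict of year -> smaller XML string."""
    root = ET.fromstring(archive_xml)
    by_year = defaultdict(list)
    for issue in root.iter("issue"):
        by_year[issue.get("year")].append(issue)
    chunks = {}
    for year, issues in by_year.items():
        year_root = ET.Element("archive", year=year)
        year_root.extend(issues)
        chunks[year] = ET.tostring(year_root, encoding="unicode")
    return chunks

def search_chunks(chunks, term):
    """Parse one year's chunk at a time, so only one is parsed at once."""
    needle = term.lower()
    hits = []
    for year in sorted(chunks):
        root = ET.fromstring(chunks[year])
        for issue in root.iter("issue"):
            if needle in (issue.text or "").lower():
                hits.append((year, issue.get("month")))
    return hits
```

Splitting keeps each parse small, but a search across the whole decade still has to visit every chunk.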

Lucky for you, since you said you can use AS3, searching using E4X
syntax is going to be a whole lot easier and likely faster.

Jason Merrill


Re: [Flashcoders] CDROM XML search

2008-02-21 Thread Glen Pike
Rob: A relational-style structure sounds like a good plan, thanks. I
will think about that.


Jason: A magazine produces around 500k of data, so a year would be 6MB 
and a decade around 60MB.  Your tip about keeping the files separate is 
handy though.


Looking on the web, some people seemed to recommend using Director or 
other wrappers to allow access to the file system.  Anyone know if 
this might be something to consider?


Glen




Re: [Flashcoders] CDROM XML search

2008-02-21 Thread Cory Petosky
Since it's a CD-ROM, you can get away with a lot of preprocessing on
the data. Do all the really heavy lifting before you deliver the
project, so that the live Flash app doesn't have to.

One straightforward thing you can do is parse the entire XML
collection, record every article each word appears in, and dump this
list (sorted by word) as a plaintext file. Something like:

aardvark: i302a27, i322a41, i412a2
anchovy: i210a9, i289a31
bezier: i123a4

Where each entry has the form i<issueNum>a<articleNum>. Then, at run
time, your Flash app can (relatively) quickly load your index file into
a huge sorted list. Finally, when a search term is entered, you can do a
quick binary search on the search term and find all relevant articles.
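
The run-time lookup can be sketched as follows. The logic is shown in Python for brevity; the Flash app would do the equivalent in ActionScript. It assumes the index file is already sorted by word and follows the `word: ref, ref` format shown above; the function names are illustrative.

```python
import re
from bisect import bisect_left

INDEX_TEXT = """\
aardvark: i302a27, i322a41, i412a2
anchovy: i210a9, i289a31
bezier: i123a4
"""

def load_index(text):
    """Parse 'word: ref, ref' lines into parallel lists, in file order."""
    words, refs = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        word, _, ref_part = line.partition(":")
        words.append(word.strip())
        refs.append([r.strip() for r in ref_part.split(",") if r.strip()])
    return words, refs

def lookup(words, refs, term):
    """Binary-search the sorted word list; return the article refs or []."""
    key = term.lower()
    pos = bisect_left(words, key)
    if pos < len(words) and words[pos] == key:
        return refs[pos]
    return []

def parse_ref(ref):
    """Split 'i302a27' into (issue number, article number)."""
    match = re.fullmatch(r"i(\d+)a(\d+)", ref)
    if match is None:
        raise ValueError("unexpected reference format: %r" % ref)
    return int(match.group(1)), int(match.group(2))
```

Binary search only finds exact whole-word matches; prefix or phrase searches would need extra work on top of this.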

You'll probably need to write your preprocessor in another language,
since Flash can't write to local files. You could conceivably write
your preprocessor using AIR and Actionscript.
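
A minimal sketch of such a preprocessor, here in Python. The `archive`/`issue`/`article` element names and the `number` attributes are assumptions about the XML, and the word splitting is deliberately crude (lowercase ASCII letters only, no stop-word filtering).

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input; the real files would be read from disk.
SAMPLE = """<archive>
  <issue number="302"><article number="27">The aardvark digs.</article></issue>
  <issue number="210"><article number="9">Anchovy and aardvark stew.</article></issue>
</archive>"""

def build_index(archive_xml):
    """Map each lowercase word to the sorted list of article refs containing it."""
    index = defaultdict(set)
    root = ET.fromstring(archive_xml)
    for issue in root.iter("issue"):
        for article in issue.iter("article"):
            ref = "i%sa%s" % (issue.get("number"), article.get("number"))
            for word in re.findall(r"[a-z]+", (article.text or "").lower()):
                index[word].add(ref)
    return {word: sorted(refs) for word, refs in index.items()}

def dump_index(index):
    """Serialize as 'word: ref, ref' lines, sorted by word."""
    lines = ["%s: %s" % (word, ", ".join(index[word])) for word in sorted(index)]
    return "\n".join(lines) + "\n"

print(dump_index(build_index(SAMPLE)))
```

The output would be written to a text file and shipped on the CD next to the XML, so the Flash app never has to build the index itself.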

Also note that I'm not guaranteeing that a simple list index is best
-- I'm just providing one implementation idea off the top of my head
that demonstrates the use of preprocessing the data, which I think you
must do regardless of your final indexing strategy.

-- 
Cory Petosky : Lead Developer : PUNY
1618 Central Ave NE Suite 130
Minneapolis, MN 55413
Office: 612.216.3924
Mobile: 240.422.9652
Fax: 612.605.9216
http://www.punyentertainment.com


RE: [Flashcoders] CDROM XML search

2008-02-21 Thread Rob Emenecker

> Looking on the web, some people seemed to recommend using Director
> or other wrappers to allow access to the file system.  Anyone
> know if this might be something to consider?

Yes. If you go this route there are several commercial database engines that
are available to you. However, the Flash Xtra for Director is not current
with the web Flash Player. Specifically it does not currently support AS3 or
certain components, and Flex is WAY OUT OF THE QUESTION.

If you are coding in AS2, then Director coupled with a database Xtra
(Valentina, Vizion, Arca, ADO, Datagrip, etc.) is what I would recommend.

...Rob
