Hi,

I have a situation where I have 10,000+ small text files in a rather deep
file system tree.  Ultimately, I need to attach these files to instances in
an ontology based on the presence/absence of various words in the files.

For example, if a file mentions the name of a particular database table,
that file should be attached to the instance that is the database table.

I believe the embedded Lucene engine can do this.

I was thinking that if I can process each file sequentially, I can make
each one an instance of a File class, where each instance has a name (the
file name) and a property that contains the contents of the file.  Then, if
I can trigger Lucene to index the resulting text strings, I will be able to
look for the keywords I'm interested in and CONSTRUCT the tagging
relationships (using pf:Match).
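To make that modeling step concrete, here is a rough plain-Python sketch of what I have in mind for the per-file processing (the class and property names — ex:File, ex:fileName, ex:contents — are placeholders I made up, and this is just to illustrate the shape of the data, not a SPARQLMotion script):

```python
import os

def collect_files(root):
    """Walk a directory tree and yield (relative path, contents) pairs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                yield os.path.relpath(path, root), f.read()

def as_turtle(files, base="http://example.org/files#"):
    """Render each file as an instance of a (hypothetical) File class
    in Turtle, with the file name and raw contents as properties."""
    lines = ["@prefix ex: <%s> ." % base]
    for rel, text in files:
        # Derive a crude local name from the relative path.
        node = "ex:" + rel.replace(os.sep, "_").replace(".", "_")
        lines.append("%s a ex:File ;" % node)
        lines.append('    ex:fileName "%s" ;' % rel)
        lines.append('    ex:contents """%s""" .' % text.replace('"', r'\"'))
    return "\n".join(lines)
```

The idea would be to load the resulting Turtle into a model, have Lucene index the ex:contents literals, and then query against them.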

However, I'm struggling with how to process all the files in a given
directory and how to trigger Lucene to index the text strings added to a
model.

SPARQLMotion supports importing a single text file but does not appear to
support multiple files/directory trees.

Any suggestions on how to process large numbers of files and trigger the
indexing process?

Thanks in advance,

Tim

-- 
You received this message because you are subscribed to the Google
Group "TopBraid Suite Users", the topics of which include Enterprise Vocabulary 
Network (EVN), TopBraid Composer,
TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
To post to this group, send email to
[email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/topbraid-users?hl=en
