Emiliano wrote:
>
> Andrew Hatton wrote:
> [...]
> > The rub comes when we allow users to attach PDFs to documents
> > using Nadmin's GUI tool, as this means that these PDFs get saved to the
> > blobs directory as a GUID.
> > The Swish file system index method can pick these up but will
> > give them GUID file names, making the search results very strange and
> > meaningless.
> > I notice that in the blobs table, the name and title of the attachment are
> > saved!
> > Is there any way of not saving as a GUID so my spider will
> > pick this information up? And if so, what are the consequences of doing
> > this?
> [...]
> If the URL isn't relevant information for your indexing, it'd be quite
> simple to create a perl script that reads the blob table and builds a
> directory full of symlinks with the real names into the blobs
> directory, and the spider can index that. If you do need the URL,
> that's pretty hard to resolve externally. A blob can be served by

I do not know Swish, but if it works like other indexers, then collecting
the text to index (along with its data, such as the URL) and putting it
into the database are handled by separate modules. So it is possible to
actively "push" your site into the database with your own scripts (in PHP,
for example), which get the information from the blobs table, etc.

In other words: not HTTP, not filesystem, but a self-implemented
collecting method.

Fery
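
The symlink idea quoted above could be sketched roughly as follows. This is only an illustration, written in Python rather than perl, and it assumes things the thread does not state: that each blob is stored as a file named by its GUID, and that the blobs table can be read as (guid, name) pairs. The function name `build_symlink_dir` and both directory arguments are hypothetical.

```python
import os
from pathlib import Path


def build_symlink_dir(rows, blobs_dir, links_dir):
    """Create one symlink per blob, named after the attachment's real name.

    rows: iterable of (guid, name) pairs, e.g. read from the blobs table
          (the column layout is an assumption, not taken from the thread).
    Returns the list of link paths that were created.
    """
    blobs_dir = Path(blobs_dir).resolve()
    links_dir = Path(links_dir)
    links_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for guid, name in rows:
        target = blobs_dir / guid
        if not target.is_file():
            continue  # table row without a file on disk: skip it
        # Keep only the last path component so a name cannot escape links_dir.
        safe_name = os.path.basename(name.replace("\\", "/")) or guid
        link = links_dir / safe_name
        if link.is_symlink() or link.exists():
            # Another attachment already uses this name: prefix the GUID.
            link = links_dir / f"{guid}_{safe_name}"
        if link.is_symlink() or link.exists():
            continue  # link is already there from an earlier run
        link.symlink_to(target)
        created.append(link)
    return created
```

The file system index method would then be pointed at the links directory instead of the blobs directory. Two things would still need checking: that the indexer is set up to follow symlinks, and that the links directory is rebuilt when attachments are added or removed.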
