I didn't see your original post, but we use a spider-based indexer to index the pages
instead of Verity or SQL. There are several you can buy, or you can use an ASP such as
Atomz, which is what we ended up doing because of all the features they had.
Lanny Udey
Associate Dean,
Learning and Information Technology
Hofstra University
[EMAIL PROTECTED]
>>> [EMAIL PROTECTED] Wednesday, March 21, 2001 >>>
You could try this to create an index of all objects currently published onto
pages.
It's what I've used in 1.01, and it directly accesses the sitecomposition table,
so it probably doesn't work in 1.5.
It also wouldn't work properly if you use any personalisation rules in your
containers, since the user that runs the indexer wouldn't have any options
set to show content.
It also needs cfx_pcregex, which you can download from the tag gallery, and I
suspect the SQL I've used may only work with Oracle.
Anyway, it may give you a start for your own indexing.
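Since the START WITH / CONNECT BY clause in the query below is Oracle syntax, here is a rough sketch of the same walk done in application code, for anyone on another database. It is in Python rather than ColdFusion purely for illustration. The row layout mirrors the columns the query uses, but the function name and the sample ids are made up, and it assumes you can fetch the three columns with a plain SELECT.

```python
from collections import defaultdict, deque

def pages_below(rows, root_id):
    """Return ids of all 'page' rows at or below root_id (breadth-first).

    rows: iterable of (compositionid, parentcompositionid, compositiontype)
    tuples, i.e. a plain SELECT with no hierarchy clause.
    """
    children = defaultdict(list)
    types = {}
    for comp_id, parent_id, comp_type in rows:
        children[parent_id].append(comp_id)
        types[comp_id] = comp_type

    found = []
    seen = set()
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        if node in seen:  # guard against accidental cycles in the data
            continue
        seen.add(node)
        # Filter on type only after the node has been reached, so that
        # sites and sections are still traversed on the way to their pages.
        if (types.get(node) or "").lower() == "page":
            found.append(node)
        queue.extend(children.get(node, []))
    return found


# Hypothetical sample data: one site, one section, three pages.
rows = [
    ("site1", None, "Site"),
    ("sec1", "site1", "Section"),
    ("page1", "sec1", "Page"),
    ("page2", "site1", "page"),
    ("page3", None, "Page"),  # belongs to a different tree
]
print(pages_below(rows, "site1"))
```

The order of the result differs from what Oracle returns, but that shouldn't matter for indexing since each page is fetched independently.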
Regards,
Andy
<!---
cf_indexsite.cfm
Takes a compositionid (site, section or page) and indexes the pages below it
Attributes:
  datasource    (REQUIRED) datasource that contains the sitemodel
  compositionid (REQUIRED) the objectid of the site, section or page to index
  collection    (OPTIONAL) the collection name to index to
                (defaults to ProvidentSiteIndex)
  purge         (OPTIONAL) boolean to indicate whether to purge the index first
                (defaults to FALSE)
--->
<cfparam name="attributes.datasource" type="string">
<cfparam name="attributes.compositionid" type="UUID">
<cfparam name="attributes.collection" type="string" default="ProvidentSiteIndex">
<cfparam name="attributes.purge" type="boolean" default="FALSE">
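<!--- get all the pages; START WITH / CONNECT BY is Oracle's hierarchical
      query syntax, and the WHERE filter is applied after the tree has been
      walked, so sections are traversed but only pages are returned --->
<cfquery name="GetPages" datasource="#attributes.datasource#">
  SELECT compositionid, compositionlabel, absoluteurl, relativeurl, absolutepath
  FROM sitecomposition
  WHERE LOWER(compositiontype) = 'page'
  START WITH compositionid = '#attributes.compositionid#'
  CONNECT BY parentcompositionid = PRIOR compositionid
</cfquery>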
<!--- set up a query to store the results to be indexed --->
<cfset qPages = QueryNew("objectid,title,url,content")>
<!--- loop through the pages --->
<cfloop query="GetPages">
  <!--- make sure we've got a url to the page; it is held in pageUrl
        rather than url so it can't be confused with the URL scope --->
  <cfif Len(Trim(absoluteurl))>
    <cfset pageUrl = absoluteurl>
  <cfelseif Len(Trim(relativeurl))>
    <cfset pageUrl = "http://" & cgi.server_name & relativeurl>
  <cfelse>
    <!--- no url stored: derive one from the file path by swapping
          the drive letter for the host name --->
    <cfset pageUrl = REReplace(absolutepath,"[[:alpha:]]:\\","http://" & cgi.server_name & "/")>
  </cfif>
  <cfset pageUrl = Replace(pageUrl,"\","/","ALL")>
  <cfset pageUrl = Replace(pageUrl," ","%20","ALL")>
  <!--- get the page --->
  <cftry>
    <cfhttp url="#pageUrl#" method="GET" resolveurl="false" useragent="PF Indexer"
      timeout="10" throwonerror="yes">
    <!--- rip out the whole head and any other tags --->
    <cfx_pcregex subject="#cfhttp.filecontent#" pattern="(?isU)<HEAD>.*</HEAD>"
      results="content" replace="" count="ALL">
    <cfx_pcregex subject="#content#" pattern="(?isU)<STYLE[^>]*>.*</STYLE>"
      results="content" replace="" count="ALL">
    <cfx_pcregex subject="#content#" pattern="(?isU)<SCRIPT[^>]*>.*</SCRIPT>"
      results="content" replace="" count="ALL">
    <cfx_pcregex subject="#content#" pattern="(?isU)<[^>]*>"
      results="content" replace=" " count="ALL">
    <!--- queue the stripped text for indexing --->
    <cfscript>
      QueryAddRow(qPages);
      QuerySetCell(qPages,"objectid",compositionid);
      QuerySetCell(qPages,"title",compositionlabel);
      QuerySetCell(qPages,"url",pageUrl);
      QuerySetCell(qPages,"content",content);
    </cfscript>
    Indexed <cfoutput>#pageUrl#</cfoutput><br>
    <cfcatch>
      <!--- report the failure and carry on with the next page --->
      Couldn't get <cfoutput>#pageUrl#</cfoutput><br>
      <cfa_dump var="#cfcatch#">
    </cfcatch>
  </cftry>
</cfloop>
<cfif attributes.purge>
<cfindex action="PURGE" collection="#attributes.collection#">
</cfif>
<!--- index the results --->
<cfindex action="UPDATE" collection="#attributes.collection#" key="objectid"
type="CUSTOM" title="title" query="qPages" body="content" custom1="url"
custom2="">
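For anyone who can't install cfx_pcregex, the tag-stripping step above is easy to reproduce wherever lazy regex matching is available. The following is only a sketch in Python (not ColdFusion), with a made-up function name and sample page. It applies the same four passes in the same order, and like the original its first pattern only matches a bare HEAD tag with no attributes.

```python
import re

# Same four passes as the cfx_pcregex calls: drop the <head>, <style> and
# <script> blocks entirely, then turn every remaining tag into a space.
# ".*?" stands in for the ungreedy (U) flag used in the PCRE patterns.
_BLOCKS = [
    re.compile(r"<head>.*?</head>", re.I | re.S),
    re.compile(r"<style[^>]*>.*?</style>", re.I | re.S),
    re.compile(r"<script[^>]*>.*?</script>", re.I | re.S),
]
_ANY_TAG = re.compile(r"<[^>]*>")

def strip_markup(html):
    """Return the text of an HTML page with head, style, script and tags removed."""
    for pattern in _BLOCKS:
        html = pattern.sub("", html)
    return _ANY_TAG.sub(" ", html)


# Hypothetical sample page.
sample = (
    "<html><HEAD><title>T</title></HEAD><body>"
    "<script type='x'>var a=1;</script>"
    "<p>Hello <b>world</b></p></body></html>"
)
print(strip_markup(sample).split())
```

Replacing tags with a space rather than nothing keeps words on either side of a tag from running together in the index.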