Hi,
I used the following approach to make raw files searchable with Lucene.
Forrest uses site.xml to pass documents to the Lucene index transformer, but site.xml does not list the raw files. In my case, the Javadocs for a component library are placed as raw HTML files and need to be searchable, and updating site.xml every time those files change is out of the question. Instead, I generate a new file, site-lucene.xml, which contains everything in site.xml plus one entry for each raw HTML file. The steps are as follows:
1. Write a batch file (UpdateLuceneSearchList.bat) that writes a recursive list of all the HTML files to jupd.txt, and place it in the root of the folder containing the raw HTML files.
Contents of UpdateLuceneSearchList.bat:
dir *.htm* /n /b /s >jupd.txt
2. Write a Java program (UpdateSite) that reads site.xml and jupd.txt and produces a new XML file, site-lucene.xml. The source is included at the end of this mail.
3. Update search.xmap so that the new site-lucene.xml is used as the input. Change
<map:match pattern="site.lucene">
<map:generate src="cocoon://abs-linkmap"/>
to
<map:match pattern="site.lucene">
<map:generate src="cocoon://abs-linkmap-lucene"/>
4. Add a match for abs-linkmap-lucene to the pipeline in linkmap.xmap:
<map:match pattern="abs-linkmap-lucene">
<map:generate src="" />
<map:transform type="xinclude"/>
<map:transform src="" />
<map:serialize type="xml" />
</map:match>
5. Comment out the following lines in site2book.xsl, because the entries generated in site-lucene.xml have no labels:
<!--
<xsl:when test="not(@label)">
</xsl:when>
-->
6. Create a batch file that calls UpdateLuceneSearchList.bat and then runs the Java program to update the index:
C:\neio\src\documentation\content\xdocs\globaljavadocs\jupd
java UpdateSite C:\neio\src\documentation\content\xdocs\globaljavadocs\jupd.txt C:\neio\src\documentation\content\xdocs\ C:\neio\src\documentation\content\xdocs\site.xml C:\neio\src\documentation\content\xdocs\site-lucene.xml
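The dir command in step 1 only works on Windows. As a hedged, portable sketch (assuming Java 8 or later; the class name ListRawHtml is hypothetical and not part of the setup above), a similar listing, one absolute path per line for every file whose extension starts with "htm", can be produced with java.nio.file.Files.walk. The output is meant to be fed to UpdateSite unchanged, provided the prefix argument matches the absolute paths it prints.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical portable replacement for UpdateLuceneSearchList.bat:
// roughly what "dir *.htm* /n /b /s" produces, one absolute path per line.
public class ListRawHtml {

    // True for names whose extension starts with "htm" (.htm, .html), ignoring case.
    public static boolean isHtml(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && name.startsWith("htm", dot + 1);
    }

    // Recursively collects matching regular files, sorted for a stable output order.
    public static List<String> listHtml(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                        .filter(ListRawHtml::isHtml)
                        .map(p -> p.toAbsolutePath().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ListRawHtml <root-folder> <output-list>");
            System.exit(1);
        }
        List<String> paths = listHtml(Paths.get(args[0]));
        // Write with the platform charset, which is what FileReader in UpdateSite reads.
        Files.write(Paths.get(args[1]), paths, Charset.defaultCharset());
    }
}
```

For example, java ListRawHtml C:\neio\src\documentation\content\xdocs\globaljavadocs C:\neio\src\documentation\content\xdocs\globaljavadocs\jupd.txt would take the place of the jupd call above. Sorting is a choice made here for reproducible output; dir lists files in directory order.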
This batch file can be scheduled, or run after each update of the raw files, to keep the index up to date. If this is of any help, I will be glad to update the search-related information in the Forrest documentation.
Thanks and regards,
Karthik.
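import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

/**
 * Copies site.xml up to its closing </site> tag, appends one <lib_ref>
 * entry per raw HTML file listed in jupd.txt, then closes </site> again.
 * Assumes the closing </site> tag is on a line of its own in site.xml.
 *
 * Usage: java UpdateSite <jupd.txt> <xdocs-prefix> <site.xml> <site-lucene.xml>
 */
public class UpdateSite {

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("Usage: java UpdateSite <jupd.txt> <xdocs-prefix> <site.xml> <site-lucene.xml>");
            System.exit(1);
        }
        String strJupdFile = args[0];        // list of raw HTML files (absolute paths)
        String strReplacePattern = args[1];  // prefix to strip, e.g. the xdocs folder
        String strSiteXml = args[2];         // existing site.xml
        String strDestSiteXml = args[3];     // site-lucene.xml to generate

        try (BufferedWriter out = new BufferedWriter(new FileWriter(strDestSiteXml))) {
            // Copy site.xml line by line, stopping at the closing </site> tag.
            // newLine() keeps the original line breaks, so attributes that were
            // split across lines stay separated by whitespace.
            try (BufferedReader in = new BufferedReader(new FileReader(strSiteXml))) {
                String str;
                while ((str = in.readLine()) != null && !str.trim().equals("</site>")) {
                    out.write(str);
                    out.newLine();
                }
            }
            // Append one <lib_ref> entry per file: strip the prefix, use forward slashes.
            try (BufferedReader in = new BufferedReader(new FileReader(strJupdFile))) {
                String str;
                while ((str = in.readLine()) != null) {
                    if (str.trim().isEmpty()) {
                        continue;  // skip blank lines in the listing
                    }
                    String href = replace(replace(str.trim(), strReplacePattern, ""), "\\", "/");
                    out.write("<lib_ref href=\"" + escapeXml(href) + "\"/>");
                    out.newLine();
                }
            }
            out.write("</site>");
            out.newLine();
            System.out.println("Site index updated successfully.");
        } catch (IOException e) {
            // Report the failure instead of silently ignoring it.
            System.err.println("Failed to update " + strDestSiteXml + ": " + e.getMessage());
            System.exit(1);
        }
    }

    /** Replaces every occurrence of pattern in source; returns "" for a null source. */
    public static String replace(String source, String pattern, String replace) {
        if (source == null) {
            return "";
        }
        if (pattern == null || pattern.isEmpty()) {
            return source;  // an empty pattern would otherwise loop forever
        }
        final int len = pattern.length();
        StringBuilder sb = new StringBuilder();
        int found;
        int start = 0;
        while ((found = source.indexOf(pattern, start)) != -1) {
            sb.append(source, start, found);
            sb.append(replace);
            start = found + len;
        }
        sb.append(source.substring(start));
        return sb.toString();
    }

    /** Escapes characters that are not allowed verbatim in a double-quoted XML attribute. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}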
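UpdateSite splices text, so it relies on the closing </site> tag sitting on a line of its own. A possible alternative, sketched here under stated assumptions rather than offered as a drop-in replacement, parses site.xml with the JDK's built-in DOM and JAXP transformer APIs and appends the entries to the root element. The class name UpdateSiteDom is hypothetical; it takes the same four arguments, assumes site.xml parses without fetching an external DTD, and strips the prefix only at the start of each path rather than everywhere as replace() does.

```java
import java.io.File;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical DOM-based variant of UpdateSite (same four arguments).
// The layout of site.xml does not matter, and the serializer escapes hrefs.
public class UpdateSiteDom {

    // Strips the leading xdocs prefix (if present) and switches to forward slashes.
    public static String toHref(String path, String prefix) {
        String rel = path.startsWith(prefix) ? path.substring(prefix.length()) : path;
        return rel.replace('\\', '/');
    }

    // Appends one <lib_ref href="..."/> per listed file to the root of site.xml.
    public static void merge(File siteXml, List<String> paths, String prefix, File dest)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // keep the namespace of <site>, if it declares one
        Document doc = dbf.newDocumentBuilder().parse(siteXml);
        Element root = doc.getDocumentElement();
        for (String line : paths) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue;  // skip blank lines in the listing
            }
            // Same namespace as <site>, so the new entries match the existing ones.
            Element ref = doc.createElementNS(root.getNamespaceURI(), "lib_ref");
            ref.setAttribute("href", toHref(path, prefix));
            root.appendChild(ref);
        }
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(dest));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("Usage: java UpdateSiteDom <jupd.txt> <xdocs-prefix> <site.xml> <site-lucene.xml>");
            System.exit(1);
        }
        // Read the listing with the platform charset, as FileReader does in UpdateSite.
        List<String> paths = Files.readAllLines(Paths.get(args[0]), Charset.defaultCharset());
        merge(new File(args[2]), paths, args[1], new File(args[3]));
        System.out.println("Site index updated successfully.");
    }
}
```

It would be invoked exactly like UpdateSite in step 6, with UpdateSiteDom as the class name. Working on the parsed tree trades a little speed for not depending on how site.xml happens to be formatted.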
