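Hi,

Did you make sure that the workspace configuration file (workspace.xml) is the same as the one you used with Jackrabbit 1.4.5? That is where the text extractor classes are configured, and those classes are responsible for extracting the text from your PDF and ODT files. If they are missing from the search index configuration, only the node properties get indexed and the binary content does not.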
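
For reference, the part of workspace.xml I mean looks roughly like the fragment below. This is only a sketch of what I would expect from a stock setup, not your actual file: the index path and the exact list of extractor classes are assumptions, so compare it against the workspace.xml that was in use under 1.4.5. Also keep in mind that workspace.xml is generated from the Workspace template in repository.xml when the workspace is first created, so an existing workspace keeps its own copy and does not pick up later changes to the template.

```xml
<!-- Sketch of the SearchIndex section in workspace.xml (values are assumptions). -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Location of the Lucene index for this workspace -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated text extractor classes; only the PDF and
       OpenOffice (ODT) extractors are listed here, add others as needed -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor"/>
</SearchIndex>
```

If the textFilterClasses parameter is present and lists the extractors you need, the files uploaded under 1.5 would still have to be re-indexed (for example by removing the index directory and restarting) before their content becomes searchable.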

regards
 marcel

Jop Zinkweg - Initworks B.V. wrote:
> Hello everyone,
>
> After upgrading from Jackrabbit 1.4(.5) to 1.5(.0), the behaviour of our
> search box changed.
>
> Previously we were able to search the 'binary' contents and additional
> properties of a jcr:content node using the following query:
>
> SELECT * FROM bos:correspondentie WHERE CONTAINS(.,'abc')
>
> After upgrading this query only returns 'old' files (uploaded while
> running 1.4) which have 'abc' in them (pdf / odt files).
>
> When searching for a value known to be in a 'meta' property both 'old'
> and 'new' (1.5) files are returned.
>
> After removing the index both the 'old' and 'new' files can only be
> found using their properties.
>
> This leads me to believe the indexing behaviour (and not the query
> behaviour) has changed between 1.4 and 1.5.
>
> We're running a vanilla 1.4 configuration, and looking at the 1.5
> vanilla config nothing has changed in respect to the searching/indexing
> default setup.
>
> Our node structure is as follows:
>
> * 'folder' = nt:folder
>   * 'file.odt' = nt:file
>     * 'jcr:content' = nt:unstructured (+ bos:correspondentie mixin
>       defining some properties)
>
> Has the indexing behaviour changed in 1.5, or am I looking at another
> problem entirely?
>
> Thanks in advance,
>
> Jop Zinkweg
