Hi Carl,

AFAIK the documentation still refers to Jackrabbit 1.x.x; see [1] for
details. Maybe [2] has the answer to your problem (explicitly setting
jcr:mimeType on your data node)?
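Following the hint in [2], here is a minimal sketch of choosing the value to store in jcr:mimeType before a file is saved. It assumes Java 9+ (for `Map.of`). The class name, the override table, and mapping JSP files to "text/plain" (presumably so that a plain-text extractor indexes words like 'include') are hypothetical choices, not Jackrabbit defaults. The commented lines show where the value would go using the standard JCR 2.0 `Node` API; `fileNode` and `binary` are hypothetical variables.

```java
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper: picks the value for jcr:mimeType on a jcr:content
 * (nt:resource) node. Mapping JSP to text/plain is an illustrative
 * assumption, not a Jackrabbit default.
 */
public class MimeTypeChooser {

    // Explicit overrides for extensions the JDK's built-in table may not know.
    private static final Map<String, String> OVERRIDES = Map.of(
            "jsp", "text/plain",
            "jspf", "text/plain");

    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String override = OVERRIDES.get(ext);
            if (override != null) {
                return override;
            }
        }
        // Fall back to the JDK's file-name map, which returns null when unknown.
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        return guessed != null ? guessed : "application/octet-stream";
    }

    public static void main(String[] args) {
        String mime = mimeTypeFor("header.jsp");
        System.out.println(mime); // text/plain
        // With a JCR session, the value would then be stored, e.g.:
        //   Node content = fileNode.addNode("jcr:content", "nt:resource");
        //   content.setProperty("jcr:mimeType", mime);
        //   content.setProperty("jcr:data", binary);
    }
}
```

Already-stored nodes would presumably need their jcr:mimeType updated (and be re-indexed) before a full-text query can match them.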
HTH,
Torsten

[1] https://issues.apache.org/jira/browse/JCR-1878
[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html

On 11.07.2012 20:16, Furst, Carl wrote:
> So after some investigation I'm at a loss as to which class to use for
> text extraction (i.e. what to set textFilterClasses to in the workspace.xml
> file). Which class is the default in 2.4.2? I think the wiki is
> incorrect: it states
> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
> default, but I don't see that class in the source code.
>
> Possible candidates are:
> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
> indexer)
> org.apache.jackrabbit.core.query.lucene.BlockingParser
> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>
> Any suggestions? I'll plug in the last two and see if things improve.
>
> Thanks,
> Carl Furst
>
> On 7/11/12 1:36 PM, "Furst, Carl"<[email protected]> wrote:
>
>> 2.4.2 - Thanks for the references. I'll check out Tika and try a test.
>>
>> Thanks,
>> Carl Furst
>>
>> On 7/3/12 5:19 AM, "Alex Parvulescu"<[email protected]> wrote:
>>
>>> Hi Carl,
>>>
>>> What version of Jackrabbit are you on?
>>>
>>> Next, are you sure you have the Tika extractors in the classpath? Maybe
>>> you are seeing something along the lines of [0].
>>>
>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>> A good place to start is the unit test collection available in Jackrabbit
>>> core [1].
>>>
>>> best,
>>> alex
>>>
>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>
>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<[email protected]> wrote:
>>>
>>>> So, given the below, I tried 'inclu*' and 'include*' and still got no
>>>> results, so I'm going to start looking into some of these possible
>>>> reasons why:
>>>>
>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>
>>>> Of course, it could just be that the parser is not parsing the '*'.
>>>>
>>>> Thanks again,
>>>>
>>>> Carl Furst
>>>>
>>>> On 6/27/12 1:59 PM, "Furst, Carl"<[email protected]> wrote:
>>>>
>>>>> Thanks Torsten,
>>>>>
>>>>> So even using JQOM would not help here. I'll read up more on Lucene and
>>>>> find out more. My main stumbling block was where the query was being
>>>>> executed: at the Derby level or at the Lucene level.
>>>>>
>>>>> This has cleared that part of it up for me as well.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Carl Furst
>>>>>
>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<[email protected]> wrote:
>>>>>
>>>>>> Hi Carl,
>>>>>>
>>>>>> by default the underlying Lucene implementation does not match leading
>>>>>> wildcards, for performance reasons. See also:
>>>>>>
>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>
>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>> results you were looking for.
>>>>>>
>>>>>> Sadly, I did not find any reference to this in the Jackrabbit
>>>>>> documentation.
>>>>>>
>>>>>> Took me quite a while to find that too.
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Torsten
>>>>>>
>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>> I'm probably missing something here, but everything I've read so far
>>>>>>> leads me to believe this should work.
>>>>>>>
>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file has
>>>>>>> a child node jcr:content of type nt:resource, which has a child
>>>>>>> property called jcr:data.
>>>>>>>
>>>>>>> There are many cases where the jcr:data column has the word 'include'
>>>>>>> in it. They are JSP files, so, yes, I know this word exists in several
>>>>>>> files.
>>>>>>>
>>>>>>> So here's the SQL I use:
>>>>>>>
>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>
>>>>>>> Here's the SQL that is returned from q.getStatement():
>>>>>>>
>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>
>>>>>>> Here is a sample text in jcr:data to search on:
>>>>>>>
>>>>>>> <%@ include file="..."
>>>>>>>
>>>>>>> ... More jsp here..
>>>>>>> <%/jsp:include...
>>>>>>>
>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need to add
>>>>>>> a "searchable" mixin or something?
>>>>>>>
>>>>>>> Any ideas why this is not being found?
>>>>>>>
>>>>>>> It used to be that the CND file for the Jackrabbit node types was
>>>>>>> readily available from Apache. Does anyone know where I can find the
>>>>>>> CND file for the Jackrabbit node types?
>>>>>>>
>>>>>>> jcr:content is unstructured, but I explicitly make the type nt:resource
>>>>>>> (otherwise the statement would not be parsed and the Query object would
>>>>>>> throw an error like "table not found", right? Because the type is a
>>>>>>> table). So the type is right and the field is right, but the search is
>>>>>>> not working.
>>>>>>>
>>>>>>> I'm using Jackrabbit without any special configuration.
>>>>>>> Just the WAR in a simple Tomcat deployment, so it's sitting on top of
>>>>>>> Derby and Lucene.
>>>>>>>
>>>>>>> Any help would be appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>> **********************************************************
>>>>>>>
>>>>>>> MLB.com: Where Baseball is Always On
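To tie Carl's original query (27.06.2012) to Torsten's note on leading wildcards, here is a minimal sketch that builds the same kind of JCR-SQL2 CONTAINS statement and refuses terms that start with '*' or '?', which, per Torsten's explanation, the default Lucene setup does not match. The class and method names are hypothetical. It escapes only single quotes (doubled, as in JCR-SQL2 string literals); characters that are special to the full-text search grammar are passed through unchanged.

```java
/**
 * Hypothetical helper: builds a JCR-SQL2 full-text query such as
 *   SELECT * FROM [nt:resource] WHERE CONTAINS([nt:resource].[jcr:data], 'include')
 * and rejects terms with a leading wildcard.
 */
public class ContainsQueryBuilder {

    // True if any whitespace-separated word in the term starts with * or ?.
    static boolean hasLeadingWildcard(String term) {
        for (String word : term.trim().split("\\s+")) {
            if (word.startsWith("*") || word.startsWith("?")) {
                return true;
            }
        }
        return false;
    }

    static String containsQuery(String nodeType, String property, String term) {
        if (hasLeadingWildcard(term)) {
            throw new IllegalArgumentException(
                    "leading wildcard not matched by default: " + term);
        }
        // JCR-SQL2 string literals escape a single quote by doubling it.
        String literal = "'" + term.replace("'", "''") + "'";
        return "SELECT * FROM [" + nodeType + "] WHERE CONTAINS(["
                + nodeType + "].[" + property + "], " + literal + ")";
    }

    public static void main(String[] args) {
        // Prints: SELECT * FROM [nt:resource] WHERE CONTAINS([nt:resource].[jcr:data], 'include')
        System.out.println(containsQuery("nt:resource", "jcr:data", "include"));
        // Trailing wildcards such as 'inclu*' pass the check.
        System.out.println(containsQuery("nt:resource", "jcr:data", "inclu*"));
        System.out.println(hasLeadingWildcard("*clude")); // true
    }
}
```

The resulting string would then be passed to `QueryManager.createQuery(statement, Query.JCR_SQL2)`. Note that this check only catches the leading-wildcard case; if the binary was never indexed as text (the jcr:mimeType issue above), even a plain 'include' will return no hits.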
