I even tried a simpler query based on findings with Luke.

In Luke I did the following:

_\:LOCAL_NAME: mc_art_contest.inc.html

which is the name of one of the stored nodes, and Luke reported one record found.

Then I tried

select * from [nt:file] where name = 'mc_art_contest.inc.html'

in JR, and

found: 0 nodes

was the result.

The problem is, I'm not sure where the bug is, but text searches are not working with a default Derby/Lucene/Jackrabbit deploy. I tried this as a servlet in the same container as the war, and I tried this as an RMI/JCA application. No luck.

So that is that, and it's been fun.

Thanks,
Carl Furst

On 7/11/12 4:17 PM, "Furst, Carl" <[email protected]> wrote:

>Thanks for the help Torsten,
>
>Unfortunately that didn't work. The output from my test is as follows:
>
>mimetype for node we are looking for is: text/html
>// Which was taken from the node, using the path. This is the text that is
>stored in jcr:mimeType
>
>text for node we are looking for is:
>FanFest Art Contest Winners</b></span><br>
>// this is a snippet of text from the document I was searching stored in
>jcr:data
>
>starting execute
>executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
>WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
>language JCR-SQL2
>//This is the query as extracted from the Query object
>
>And this is the result:
>
>found: 0 nodes
>executed test in 660 ms
>
>So something is not right…(SQL, maybe?). Maybe the node iterator isn't
>getting the right count of nodes? Could it be that over RMI it's possible
>to get the nodes but not the right count of nodes returned?
>
>Hmmm….
>
>Carl Furst
>
>On 7/11/12 2:32 PM, "Torsten Stolpmann" <[email protected]> wrote:
>
>>Hi Carl,
>>
>>AFAIK the documentation still refers to jackrabbit 1.x.x - see [1] for
>>details. Maybe [2] has the correct answer to your problem (explicitly
>>setting the jcr:mimeType for your data node)?
>>
>>HTH,
>>
>>Torsten
>>
>>[1] https://issues.apache.org/jira/browse/JCR-1878
>>[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
>>
>>On 11.07.2012 20:16, Furst, Carl wrote:
>>> So after some investigation I'm at a loss as to which class to use for
>>> text extraction (i.e. what to set textFilterClasses to in the
>>> workspace.xml file). Which class is the default in 2.4.2? I think the
>>> Wiki is incorrect. It states
>>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>>> default, but I don't see that class in the source code.
>>>
>>> Possible candidates are:
>>> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
>>> indexer)
>>> org.apache.jackrabbit.core.query.lucene.BlockingParser
>>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>>>
>>> Any suggestions? I'll plug in the last two and see if things improve.
>>>
>>> Thanks,
>>> Carl Furst
>>>
>>> On 7/11/12 1:36 PM, "Furst, Carl"<[email protected]> wrote:
>>>
>>>> 2.4.2 - Thanks for the references. I'll check out Tika and try a
>>>> test.
>>>>
>>>> Thanks,
>>>> Carl Furst
>>>>
>>>> On 7/3/12 5:19 AM, "Alex Parvulescu"<[email protected]> wrote:
>>>>
>>>>> Hi Carl,
>>>>>
>>>>> What version of jackrabbit are you on?
>>>>>
>>>>> Next, are you sure you have the tika extractors in the classpath?
>>>>> Maybe you are seeing something along the lines of [0].
>>>>>
>>>>> I would try to isolate the problem by taking tomcat out of the setup.
>>>>> Build a simple test, see how it works, then deploy on tomcat and verify.
>>>>> A good place to start is the unit test collection available in
>>>>> jackrabbit core [1].
>>>>>
>>>>> best,
>>>>> alex
>>>>>
>>>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>>>
>>>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<[email protected]> wrote:
>>>>>
>>>>>> So given the below I tried to use
>>>>>>
>>>>>> 'inclu*' and 'include*' and still no results, so I'm going to start
>>>>>> looking into whether some of these reasons explain why:
>>>>>>
>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>>>
>>>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>>>
>>>>>> Thanks again,
>>>>>>
>>>>>> Carl Furst
>>>>>>
>>>>>> On 6/27/12 1:59 PM, "Furst, Carl"<[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Torsten,
>>>>>>>
>>>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>>>> and find out more. My main stumbling block here was where the query
>>>>>>> was being executed: was it on the Derby level or the Lucene level?
>>>>>>>
>>>>>>> This has cleared that part of it up for me as well.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Carl,
>>>>>>>>
>>>>>>>> per default the underlying Lucene implementation does not match
>>>>>>>> leading wildcards for performance reasons. See also:
>>>>>>>>
>>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>>>
>>>>>>>> So just matching '*' will not work, but e.g.
>>>>>>>> 'i*' might give you the
>>>>>>>> results you were looking for.
>>>>>>>>
>>>>>>>> Sadly enough I did not find any reference to this in the
>>>>>>>> Jackrabbit documentation.
>>>>>>>>
>>>>>>>> Took me quite a while to find that too.
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Torsten
>>>>>>>>
>>>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>>>> I'm probably missing something here but everything I've read so
>>>>>>>>> far leads me to believe this should work.
>>>>>>>>>
>>>>>>>>> I have nodes in a repository of type nt:folder and nt:file.
>>>>>>>>> nt:file has a child node jcr:content of type nt:resource which
>>>>>>>>> has a child property called jcr:data.
>>>>>>>>>
>>>>>>>>> There are many cases where the jcr:data column has the word
>>>>>>>>> 'include' in it. They are jsp files so, yes, I know this word
>>>>>>>>> exists in several files.
>>>>>>>>>
>>>>>>>>> So here's the sql I use:
>>>>>>>>>
>>>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>>>
>>>>>>>>> Here's the sql that is returned from q.getStatement() :
>>>>>>>>>
>>>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>>>
>>>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>>>
>>>>>>>>> <%@ include file="..."
>>>>>>>>>
>>>>>>>>> ... More jsp here..
>>>>>>>>> <%/jsp:include...
>>>>>>>>>
>>>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need
>>>>>>>>> to add a "searchable" mixin or something?
>>>>>>>>>
>>>>>>>>> Any ideas why this is not being found?
>>>>>>>>>
>>>>>>>>> It used to be that the CND file for Jackrabbit node types was
>>>>>>>>> readily available from Apache. Does anyone know where I can find
>>>>>>>>> the CND file for Jackrabbit node types?
>>>>>>>>>
>>>>>>>>> jcr:content is unstructured, but I explicitly make the type
>>>>>>>>> nt:resource (otherwise the statement would not be parsed and the
>>>>>>>>> Query object would throw an error like "table not found," right?
>>>>>>>>> Because the type is a table). So the type is right. The field is
>>>>>>>>> right. The search is not working.
>>>>>>>>>
>>>>>>>>> I'm using Jackrabbit without any special configuration. Just the
>>>>>>>>> war in a simple tomcat deployment. So it's sitting on top of
>>>>>>>>> Derby and Lucene.
>>>>>>>>>
>>>>>>>>> Any help would be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Carl Furst
>>>>>>>>>
>>>>>>>>> **********************************************************
>>>>>>>>>
>>>>>>>>> MLB.com: Where Baseball is Always On
