Tried the same query using xpath: //mc_art_contest.inc.html
Worked! Just FYI.

Carl Furst

On 7/12/12 6:07 PM, "Furst, Carl" <[email protected]> wrote:

>I even tried a simpler query based on findings with Luke.
>
>In Luke I did the following:
>
>_\:LOCAL_NAME: mc_art_contest.inc.html
>
>Which is the name of one of the nodes stored.
>
>And Luke reported one record found.
>
>Then I tried
>
>select * from [nt:file] where name = 'mc_art_contest.inc.html'
>
>in JR, and
>
>found: 0 nodes
>
>was the result. The problem is, I'm not sure where the bug is, but text
>searches are not working with a default Derby/Lucene/Jackrabbit deploy. I
>tried this as a servlet in the same container as the war, and I tried this
>as an RMI/JCA application. No luck. So that is that, and it's been fun.
>
>Thanks,
>
>Carl Furst
>
>On 7/11/12 4:17 PM, "Furst, Carl" <[email protected]> wrote:
>
>>Thanks for the help Torsten,
>>
>>Unfortunately that didn't work. The output from my test is as follows:
>>
>>mimetype for node we are looking for is: text/html
>>// Which was taken from the node, using the path. This is the text that
>>// is stored in jcr:mimeType
>>
>>text for node we are looking for is:
>>FanFest Art Contest Winners</b></span><br>
>>// This is a snippet of text from the document I was searching, stored in
>>// jcr:data
>>
>>starting execute
>>executing current query with sqlSELECT [nt:resource].* FROM [nt:resource]
>>WHERE CONTAINS([nt:resource].[jcr:data], 'FanFest Art Contest') using
>>language JCR-SQL2
>>// This is the query as extracted from the Query object
>>
>>And this is the result:
>>
>>found: 0 nodes
>>executed test in 660 ms
>>
>>So something is not right... (SQL, maybe?). Maybe the node iterator isn't
>>getting the right count of nodes? Could it be that over RMI it's possible
>>to get the nodes but not the right count of nodes returned?
>>
>>Hmmm....
>>
>>Carl Furst
>>
>>On 7/11/12 2:32 PM, "Torsten Stolpmann" <[email protected]> wrote:
>>
>>>Hi Carl,
>>>
>>>AFAIK the documentation still refers to Jackrabbit 1.x.x - see [1] for
>>>details. Maybe [2] has the correct answer to your problem (explicitly
>>>setting the jcr:mimeType for your data node)?
>>>
>>>HTH,
>>>
>>>Torsten
>>>
>>>[1] https://issues.apache.org/jira/browse/JCR-1878
>>>[2] http://jackrabbit.510166.n4.nabble.com/textFilterClasses-deprecated-How-to-specify-extractors-td4534050.html
>>>
>>>On 11.07.2012 20:16, Furst, Carl wrote:
>>>> So after some investigation I'm at a loss as to which class to use for
>>>> text extraction (i.e. what to set textFilterClasses to in the
>>>> workspace.xml file). Which class is the default in 2.4.2? The Wiki, I
>>>> think, is incorrect. It states
>>>> org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter as the
>>>> default, but I don't see that class in the source code.
>>>>
>>>> Possible candidates are:
>>>> org.apache.jackrabbit.core.query.lucene.SearchIndex (regular search
>>>> indexer)
>>>> org.apache.jackrabbit.core.query.lucene.BlockingParser
>>>> org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField
>>>>
>>>> Any suggestions? I'll plug in the last two and see if things improve.
>>>>
>>>> Thanks,
>>>> Carl Furst
>>>>
>>>> On 7/11/12 1:36 PM, "Furst, Carl"<[email protected]> wrote:
>>>>
>>>>> 2.4.2 - Thanks for the references. I'll check out Tika and try a test.
>>>>>
>>>>> Thanks,
>>>>> Carl Furst
>>>>>
>>>>> On 7/3/12 5:19 AM, "Alex Parvulescu"<[email protected]> wrote:
>>>>>
>>>>>> Hi Carl,
>>>>>>
>>>>>> What version of Jackrabbit are you on?
>>>>>>
>>>>>> Next, are you sure you have the Tika extractors in the classpath?
>>>>>> Maybe you are seeing something along the lines of [0].
>>>>>>
>>>>>> I would try to isolate the problem by taking Tomcat out of the setup.
>>>>>> Build a simple test, see how it works, then deploy on Tomcat and verify.
>>>>>> A good place to start is the unit test collection available in
>>>>>> jackrabbit core [1].
>>>>>>
>>>>>> best,
>>>>>> alex
>>>>>>
>>>>>> [0] https://issues.apache.org/jira/browse/JCR-3287
>>>>>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/FulltextSQL2QueryTest.java?view=markup
>>>>>>
>>>>>> On Wed, Jun 27, 2012 at 8:06 PM, Furst, Carl<[email protected]> wrote:
>>>>>>
>>>>>>> So given the below I tried to use 'inclu*' and 'include*' and still
>>>>>>> got no results, so I'm going to start looking into whether some of
>>>>>>> these reasons explain why:
>>>>>>>
>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F
>>>>>>>
>>>>>>> Of course it could just be that the parser is not parsing the '*'.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>>
>>>>>>> Carl Furst
>>>>>>>
>>>>>>> On 6/27/12 1:59 PM, "Furst, Carl"<[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Torsten,
>>>>>>>>
>>>>>>>> So even using JQOM would not help here. I'll read up more on Lucene
>>>>>>>> and find out more. My main stumbling block here was where the query
>>>>>>>> was being executed: was it on the Derby level or the Lucene level?
>>>>>>>>
>>>>>>>> This has cleared that part of it up for me as well.
>>>>>>>>
>>>>>>>> Thanks again,
>>>>>>>>
>>>>>>>> Carl Furst
>>>>>>>>
>>>>>>>> On 6/27/12 1:50 PM, "Torsten Stolpmann"<[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Carl,
>>>>>>>>>
>>>>>>>>> by default the underlying Lucene implementation does not match
>>>>>>>>> leading wildcards, for performance reasons. See also:
>>>>>>>>>
>>>>>>>>> https://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F
>>>>>>>>>
>>>>>>>>> So just matching '*' will not work, but e.g. 'i*' might give you the
>>>>>>>>> results you were looking for.
>>>>>>>>>
>>>>>>>>> Sadly enough I did not find any reference to this in the Jackrabbit
>>>>>>>>> documentation.
>>>>>>>>>
>>>>>>>>> Took me quite a while to find that too.
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Torsten
>>>>>>>>>
>>>>>>>>> On 27.06.2012 17:19, Furst, Carl wrote:
>>>>>>>>>> I'm probably missing something here, but everything I've read so
>>>>>>>>>> far leads me to believe this should work.
>>>>>>>>>>
>>>>>>>>>> I have nodes in a repository of type nt:folder and nt:file. nt:file
>>>>>>>>>> has a child node jcr:content of type nt:resource, which has a child
>>>>>>>>>> property called jcr:data.
>>>>>>>>>>
>>>>>>>>>> There are many cases where the jcr:data column has the word
>>>>>>>>>> 'include' in it. They are JSP files so, yes, I know this word
>>>>>>>>>> exists in several files.
>>>>>>>>>>
>>>>>>>>>> So here's the SQL I use:
>>>>>>>>>>
>>>>>>>>>> select * from [nt:resource] where contains([jcr:data], 'include');
>>>>>>>>>>
>>>>>>>>>> Here's the SQL that is returned from q.getStatement():
>>>>>>>>>>
>>>>>>>>>> SELECT [nt:resource].* FROM [nt:resource] WHERE
>>>>>>>>>> CONTAINS([nt:resource].[jcr:data], 'include');
>>>>>>>>>>
>>>>>>>>>> Here is a sample text in jcr:data to search on.
>>>>>>>>>>
>>>>>>>>>> <%@ include file="..."
>>>>>>>>>>
>>>>>>>>>> ... More jsp here..
>>>>>>>>>> <%/jsp:include...
>>>>>>>>>>
>>>>>>>>>> Yet it doesn't find it. I feel I'm missing something. Do I need to
>>>>>>>>>> add a "searchable" mixin or something?
>>>>>>>>>>
>>>>>>>>>> Any ideas why this is not being found?
>>>>>>>>>>
>>>>>>>>>> It used to be that the CND file for Jackrabbit node types was
>>>>>>>>>> readily available from Apache. Does anyone know where I can find
>>>>>>>>>> the CND file for Jackrabbit node types?
>>>>>>>>>>
>>>>>>>>>> jcr:content is unstructured, but I explicitly make the type
>>>>>>>>>> nt:resource (otherwise the statement would not be parsed; the Query
>>>>>>>>>> object would throw an error like "table not found," right? Because
>>>>>>>>>> the type is a table). So the type is right. The field is right. The
>>>>>>>>>> search is not working.
>>>>>>>>>>
>>>>>>>>>> I'm using Jackrabbit without any special configuration. Just the
>>>>>>>>>> war in a simple Tomcat deployment. So it's sitting on top of Derby
>>>>>>>>>> and Lucene.
>>>>>>>>>>
>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Carl Furst
>>>>>>>>>>
>>>>>>>>>> **********************************************************
>>>>>>>>>>
>>>>>>>>>> MLB.com: Where Baseball is Always On
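
Editorial note on the queries in this thread: in JCR-SQL2, a bare `name` in a WHERE clause refers to a property called "name", not to the node's own name, which may be why the `name = '...'` query found 0 nodes while the XPath query matched. The JCR 2.0 query grammar provides LOCALNAME() (and NAME()) for matching on the node name. The sketch below only builds the statement strings, so it runs without a repository. The class `Sql2Statements` and its methods are hypothetical helpers, not part of Jackrabbit or the JCR API, and the statements have not been run against the repository discussed here. The leading-wildcard check reflects Torsten's remark about Lucene's default behaviour.

```java
/** Hypothetical helper for illustration; not part of Jackrabbit or the JCR API. */
final class Sql2Statements {
    private Sql2Statements() {}

    /** Wraps a value in a JCR-SQL2 string literal, doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a statement that matches nt:file nodes by local node name. */
    static String byLocalName(String nodeName) {
        return "SELECT * FROM [nt:file] AS f WHERE LOCALNAME(f) = " + quote(nodeName);
    }

    /** Builds a full-text statement over jcr:data of nt:resource nodes. */
    static String fullText(String term) {
        // Per the thread, Lucene does not match leading wildcards by default,
        // so reject such terms up front instead of silently getting no hits.
        if (term.startsWith("*") || term.startsWith("?")) {
            throw new IllegalArgumentException("leading wildcard not supported: " + term);
        }
        return "SELECT * FROM [nt:resource] AS r WHERE CONTAINS(r.[jcr:data], "
                + quote(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(byLocalName("mc_art_contest.inc.html"));
        System.out.println(fullText("FanFest Art Contest"));
    }
}
```

Either string could then be passed to `QueryManager.createQuery(statement, Query.JCR_SQL2)` from the standard `javax.jcr.query` API. Whether the full-text statement returns hits still depends on the text extraction setup (jcr:mimeType, Tika on the classpath) discussed above.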
