from:"openvictor Open"

Re: [POLL] How do you (like to) do logging with Solr

2011-05-17 Thread openvictor Open

Wow... Nobody is using the bundled Jetty? It was a good option for me
because I like to have separate processes for different things: a Tomcat
server for all the webapps on my server, a Jetty server with Solr, and a
Drools server. Was it a stupid idea from the beginning?

So my choice:

[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
[ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using
an ANT option
[X]  What's wrong with the "solr/example" Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!

Victor

2011/5/17 Shawn Heisey 

> On 5/16/2011 5:47 AM, Jan Høydahl wrote:
>
>> That's what happens if we ship solr.war without any pre-set logger binding
>> - it's the binding provided in your app-server's classpath which will be
>> used.
>>
>
> I use the jetty that's bundled in the example, but with my own directory
> structure that's a lot different, and a homegrown init.d script.  I haven't
> changed the binding in solr.war, but I have created a logging.properties
> file to reduce it to WARNING by default and configured
> java.util.logging.config.file in jetty.xml.
>
> If I understand what you've said above correctly, removing the binding in
> solr.war would make it inherit the binding in jetty/tomcat/whatever, is that
> right?  That sounds like an awesome plan to me.  The example jetty server
> can be configured instead of solr.war.  Once you've answered this, I can
> submit my vote.
>
> A semi-related question ... is there any way to get jetty to log the entire
> URL in its request log?  Almost every request we send is truncated.  Some of
> our request URLs are nearly 20K in size.  We've had to tune all the configs
> for that to work.  We are working on making them smaller, but that's not
> going to happen quickly.  I've done a lot of searching on this topic and
> come up empty.
>
> Thanks,
> Shawn
>
>
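Shawn's setup (a logging.properties that lowers the default level to WARNING, referenced through java.util.logging.config.file) can be sketched with plain JDK logging. The file contents below are an assumed example, since his actual file isn't shown, and JulWarningConfig is a hypothetical name. The sketch loads the properties programmatically only so it runs standalone; in a deployment the JVM reads the file named by -Djava.util.logging.config.file instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Minimal sketch: JDK logging configured so that only WARNING and above is logged. */
final class JulWarningConfig {
    // Assumed contents of a logging.properties file (not Shawn's actual file).
    static final String PROPERTIES =
            "handlers=java.util.logging.ConsoleHandler\n"
            + ".level=WARNING\n"
            + "java.util.logging.ConsoleHandler.level=WARNING\n";

    /** Replaces the current JDK logging configuration with PROPERTIES. */
    static void apply() {
        try {
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(PROPERTIES.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        apply();
        Logger log = Logger.getLogger("org.apache.solr.core.SolrCore");
        log.info("dropped: INFO is below the configured WARNING level");
        log.warning("printed: WARNING passes the filter");
        System.out.println(log.isLoggable(Level.INFO)); // false
    }
}
```

Loggers without their own level inherit the root level set by `.level`, which is why a single line is enough to quiet every Solr logger.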

Re: Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open

Hi Quentin, stay tuned to this thread; I will try to figure out how it works
and get input from other people.

Here is the link to my blog post that shows how to do it:

http://www.victorkabdebon.net/archives/16

Note that I used Tomcat + Solr, but it can easily be done with PHP. Also,
SolrJ in 1.4.1 didn't support the TermsComponent, so I had to find a way
around that problem, but the workaround is provided in the post.



2011/4/15 Quentin Proust 

> Hi Victor,
>
> I have the same questions about the new Suggest component.
> I can't really help you as I didn't really manage to understand how it
> worked.
> Sometimes, I had more results, sometimes less.
>
> Even so, I would really be interested in your resources using Terms and
> shingles to implement auto-complete.
> I am a French student myself, and it could help me improve the solution for
> one of my projects.
>
> Best regards,
> Quentin
>
> 2011/4/15 openvictor Open 
>
> > Hi everybody,
> >
> >
> > Recently I implemented an autocomplete mechanism for my website using a
> > custom TermsComponent. I was quite happy with that because it also
> > enabled a Google-like feature where complete sentences were suggested to
> > the user as they typed in the search field. I used shingles to search
> > against pieces of sentences.
> > (I have resources in French if somebody asks.)
> >
> > Then came Solr 3.1 and its new Suggest component. I have looked at the
> > documentation, but it's still unclear to me how exactly it works. So let
> > me ask some questions:
> >
> >
> >   - Are there performance improvements over the TermsComponent?
> >   - Is it able to autosuggest sentences and not only words? If yes, how?
> >   Should I keep my shingles?
> >   - What is this "threshold" value that I see? Is it a mandatory parameter?
> >   I want suggestions no matter what the term frequency in the documents
> >   is!
> >
> >
> > Thank you all. If I succeed, I will try to provide a tutorial on doing
> > this with jQuery UI autocomplete + the Suggest component, if anyone's
> > interested.
> > Best regards.
> >
> > Victor
> >
>
>
>
> --
> 
> Quentin Proust
> Email : q.pro...@gmail.com
> Tel : 06.78.81.15.94
> http://www.linkedin.com/in/quentinproust
> 
>

Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open

Hi everybody,


Recently I implemented an autocomplete mechanism for my website using a
custom TermsComponent. I was quite happy with that because it also enabled
a Google-like feature where complete sentences were suggested to the user
as they typed in the search field. I used shingles to search against pieces
of sentences.
(I have resources in French if somebody asks.)

Then came Solr 3.1 and its new Suggest component. I have looked at the
documentation, but it's still unclear to me how exactly it works. So let me
ask some questions:


   - Are there performance improvements over the TermsComponent?
   - Is it able to autosuggest sentences and not only words? If yes, how?
   Should I keep my shingles?
   - What is this "threshold" value that I see? Is it a mandatory parameter?
   I want suggestions no matter what the term frequency in the documents is!


Thank you all. If I succeed, I will try to provide a tutorial on doing this
with jQuery UI autocomplete + the Suggest component, if anyone's
interested.
Best regards.

Victor

Re: Solrj performance bottleneck

2011-04-04 Thread openvictor Open

Dear Rahul,

Stefan has the right solution. Autosuggest requests must be controlled both
in the JavaScript and in your backend. For JavaScript there are some really
nice tools for this, such as jQuery UI, whose autocomplete widget has a
tunable delay. It also has highlighting, you can add additional information,
etc. It is actually quite impressive. Here is the address:
http://jqueryui.com/demos/autocomplete/#remote-jsonp. It's open source, so
you can copy what they have done or study the method they used.
On the backend, limit the number of requests per second per IP or session,
and/or cache results. As for caching, Solr normally caches common requests,
but I don't know whether that applies to the TermsComponent.

Hope this helps you !

Victor

2011/4/4 Stefan Matheis 

> rahul,
>
> On Mon, Apr 4, 2011 at 4:18 PM, rahul  wrote:
> > if anybody has some suggestions/experience on how to leverage autosuggestion
> > without affecting search performance much, please do share them.
>
> we use javascript intervals for autosuggestion. regularly check the
> value of the monitored input field and if changed, trigger a new
> request. this will cover both cases, slow-typing users and also
> ten-finger-guys (who will type much faster). a new request for every
> added character is indeed too much, even if your backend is responding
> within a few ms.
>
> Regards
> Stefan
>
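Stefan's interval approach can be expressed, in Java for consistency with the other sketches here, as a small change detector: a timer calls it with the input field's current value, and a request is sent only when that value differs from the last one sent. The class name and the 300 ms tick mentioned in the comment are assumptions; in a browser the same logic would run from `setInterval`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical interval-based trigger: at most one request per tick, only on change. */
final class IntervalSuggestTrigger {
    private final Consumer<String> sendRequest; // stands in for the real suggest request
    private String lastSent = "";

    IntervalSuggestTrigger(Consumer<String> sendRequest) {
        this.sendRequest = sendRequest;
    }

    /** Called by a timer (for example every 300 ms) with the field's current value. */
    void onTick(String currentValue) {
        String value = currentValue == null ? "" : currentValue.trim();
        // Fire only when there is something to suggest for and it changed since the last request.
        if (!value.isEmpty() && !value.equals(lastSent)) {
            lastSent = value;
            sendRequest.accept(value);
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        IntervalSuggestTrigger trigger = new IntervalSuggestTrigger(sent::add);
        trigger.onTick("cat");  // "c", "ca", "cat" typed between two ticks: one request
        trigger.onTick("cat");  // unchanged: no request
        trigger.onTick("cats");
        System.out.println(sent); // [cat, cats]
    }
}
```

Fast typists thus produce one request per tick rather than one per keystroke, while slow typists still get a request for every change.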

Re: Searching all terms - SolrJ

2011-03-01 Thread openvictor Open

Great!

Thank you very much Chris, it will come in handy!

Best regards,
Victor

2011/3/1 Chris Hostetter 

>
> : Yes but I want to leave the choice to the user.
> :
> : He can either search all the terms or just some.
> :
> : Is there any more flexible solution ? Even if I have to code it by hand ?
>
> the declaration in the schema dictates the default.
>
> you can override the default at query time using the "q.op" param (ie:
> q.op=AND, q.op=OR) in the request.
>
> in SolrJ you would just call solrQuery.set("q.op","OR") on your SolrQuery
> object.
>
> -Hoss
>
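Hoss's q.op override, seen at the HTTP level: a minimal sketch that builds a select URL from the user's choice, so "all terms" maps to q.op=AND and "any term" to q.op=OR. It assumes the standard query parser and the example Jetty's base URL `http://localhost:8983/solr`; `QueryOperatorUrl` is a hypothetical name. With SolrJ, the equivalent is the `solrQuery.set("q.op", ...)` call Hoss mentions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: per-request choice between matching all terms or any term. */
final class QueryOperatorUrl {
    static String selectUrl(String baseUrl, String userQuery, boolean matchAllTerms) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(userQuery, "UTF-8")
                    // q.op overrides the schema's default operator for this request only
                    + "&q.op=" + (matchAllTerms ? "AND" : "OR");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr", "cat roof", true));
        // http://localhost:8983/solr/select?q=cat+roof&q.op=AND
    }
}
```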

Re: Searching all terms - SolrJ

2011-03-01 Thread openvictor Open

Yes, but I want to leave the choice to the user.

He can search for either all the terms or just some of them.

Is there a more flexible solution? Even if I have to code it by hand?



2011/3/1 Ahmet Arslan 

>
> --- On Wed, 3/2/11, openvictor Open  wrote:
>
> > From: openvictor Open 
> > Subject: Searching all terms - SolrJ
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, March 2, 2011, 12:20 AM
> > Dear all,
> >
> > First, I am sorry if this question has already been asked (I am sure it
> > was...), but I can't find the right option in SolrJ.
> >
> > I want to query only documents that contain ALL query terms.
> > Let me take an example: I have 4 documents that are simple sentences (they
> > have only one field: text):
> >
> > 1 : The cat is on the roof
> > 2 : The dog is on the roof
> > 3 : The cat is black
> > 4 : the cat is black and on the roof
> >
> > If I search "cat roof" I get docs 1, 2, 3 and 4.
> > In my case I would like to get only docs 1 and 4 (either cat or roof
> > doesn't appear in docs 2 and 3).
> >
> > Is there a simple way to do that automatically with SolrJ, or should I
> > write something like the following?
> > text:cat AND text:roof
> >
> > Thank you very much for your help!
>
> You can use  in your schema.xml
>
>
>
>

Searching all terms - SolrJ

2011-03-01 Thread openvictor Open

Dear all,

First, I am sorry if this question has already been asked (I am sure it
was...), but I can't find the right option in SolrJ.

I want to query only documents that contain ALL query terms.
Let me take an example: I have 4 documents that are simple sentences (they
have only one field: text):

1 : The cat is on the roof
2 : The dog is on the roof
3 : The cat is black
4 : the cat is black and on the roof

If I search "cat roof" I get docs 1, 2, 3 and 4.
In my case I would like to get only docs 1 and 4 (either cat or roof
doesn't appear in docs 2 and 3).

Is there a simple way to do that automatically with SolrJ, or should I
write something like the following?
text:cat AND text:roof

Thank you very much for your help!

Best regards,
Victor
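To make the expected behaviour concrete, here is a toy Java model of the four example documents. It is not Solr: it assumes naive lower-cased whitespace tokens with no stemming or stopwords. Matching any query term returns docs 1, 2, 3 and 4; requiring every term (what q.op=AND or the hand-written `text:cat AND text:roof` do) returns only docs 1 and 4. `allTermsQuery` is a hypothetical helper for building that hand-written query and does not escape Lucene special characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model (not Solr) of OR versus AND matching over the four example documents. */
final class AllTermsDemo {
    static final String[] DOCS = {
        "The cat is on the roof",
        "The dog is on the roof",
        "The cat is black",
        "the cat is black and on the roof"
    };

    // Naive analysis: lower-case, then split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    /** 1-based ids of matching docs; requireAll=true mimics q.op=AND, false mimics q.op=OR. */
    static List<Integer> match(String query, boolean requireAll) {
        Set<String> queryTerms = tokens(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.length; i++) {
            Set<String> common = new HashSet<>(queryTerms);
            common.retainAll(tokens(DOCS[i])); // query terms present in this document
            boolean matches = requireAll ? common.size() == queryTerms.size() : !common.isEmpty();
            if (matches) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    /** Hypothetical helper: builds "text:cat AND text:roof" by hand (no escaping). */
    static String allTermsQuery(String field, String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(match("cat roof", false)); // [1, 2, 3, 4]
        System.out.println(match("cat roof", true));  // [1, 4]
        System.out.println(allTermsQuery("text", "cat roof")); // text:cat AND text:roof
    }
}
```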

Re: Using terms and N-gram

2011-02-04 Thread openvictor Open

Hi Otis,

Good news: I finally made it work. As for Sematext, I am afraid I am too
poor to consider that solution :) (I am doing this for fun.)
Thank you anyway!

2011/2/4 Otis Gospodnetic 

> Hi,
>
> The main difference is that CommonGrams will take 2 adjacent words and put
> them together, while NGram* stuff will take a single word and chop it up in
> sequences of one or more characters/letters.
>
> If you are stuck with auto-complete stuff, consider
> http://sematext.com/products/autocomplete/index.html
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: openvictor Open 
> > To: solr-user@lucene.apache.org
> > Sent: Thu, February 3, 2011 10:15:47 AM
> > Subject: Re: Using terms and N-gram
> >
> > Thank you, I will do that and hopefully it will come in handy!
> >
> > But can someone explain the difference between CommonGramsFilterFactory
> > and NGramFilterFactory? (Maybe the solution is there.)
> >
> > Thank you all,
> > best  regards
> >
> > 2011/2/3 Grijesh 
> >
> > >
> > > Use analysis.jsp to see what is happening at index time and query time
> > > with your input data. You can use highlighting to see if a match is
> > > found.
> > >
> > > -
> > > Thanx:
> > > Grijesh
> > > http://lucidimagination.com
> > > --
> > > View this message in  context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
> > >  Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>

Re: Using terms and N-gram

2011-02-03 Thread openvictor Open

Okay, as suggested, shingles work perfectly well for what I need!
Thank you Erick.

2011/2/3 openvictor Open 

> Thank you for this input.
>
> It was silly of me to ask about n-grams, because I already knew about them.
> I think I was tired yesterday...
>
> Thank you Erick Erickson, once again you gave me a more than useful comment.
> Indeed, shingles seem to be the perfect fit for the work I want to do. I
> will try to implement that tonight and I will come back to report whether
> it's working.
>
> Regards,
> Victor
>
> 2011/2/3 Erick Erickson 
>
>> First, you'll get a lot of insight by defining something simple and looking
>> at the analysis page from solr admin. That's a very valuable page.
>>
>> To your question:
>> commongrams are "shingles" that work between stopwords and
>> other words. For instance, "this is some text" gets analyzed into
>> this, this_is, is, is_some, some, text. Note that the stopwords
>> are the only things that get combined with the text after.
>>
>> NGrams form on letters. It's too long to post the whole thing, but
>> the above phrase gets analyzed as
>> t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me... It splits a
>> single token into grams, whereas commongrams essentially combines tokens
>> when they're stopwords.
>>
>> Have you looked at "shingles"? See:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>> Best
>> Erick
>>
>>
>> On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open wrote:
>>
>> > Thank you, I will do that and hopefully it will come in handy!
>> >
>> > But can someone explain the difference between CommonGramsFilterFactory and
>> > NGramFilterFactory? (Maybe the solution is there.)
>> >
>> > Thank you all,
>> > best regards
>> >
>> > 2011/2/3 Grijesh 
>> >
>> > >
>> > > Use analysis.jsp to see what is happening at index time and query time
>> > > with your input data. You can use highlighting to see if a match is
>> > > found.
>> > >
>> > > -
>> > > Thanx:
>> > > Grijesh
>> > > http://lucidimagination.com
>> > >
>> >
>>
>
>

Re: Solr for finding similar word between two documents

2011-02-03 Thread openvictor Open

Rohan: what you want to do can be done with fairly little effort, using
common, basic structures like a HashMap, if your documents are of limited
size (up to a few MB).

Do you have any additional information about your problem so that we can
give you more useful input?
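The HashMap approach can be sketched in Java without Solr at all, assuming both documents fit in memory and that a naive split on non-alphanumeric characters is good enough tokenization. `SharedWords` is a hypothetical name: it counts the words in each text and keeps those present in both, mapped to the smaller of the two counts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: words shared by two in-memory documents. */
final class SharedWords {
    /** Word -> occurrence count, using a naive split on non-alphanumerics. */
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                m.merge(w, 1, Integer::sum);
            }
        }
        return m;
    }

    /** Words present in both texts, mapped to the smaller of their two counts, sorted. */
    static Map<String, Integer> shared(String a, String b) {
        Map<String, Integer> countsA = counts(a);
        Map<String, Integer> countsB = counts(b);
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            Integer inB = countsB.get(e.getKey());
            if (inB != null) {
                out.put(e.getKey(), Math.min(e.getValue(), inB));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shared("The cat is on the roof", "The dog is on the roof"));
        // {is=1, on=1, roof=1, the=2}
    }
}
```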

2011/2/3 Gora Mohanty 

> On Thu, Feb 3, 2011 at 11:32 PM, rohan rai  wrote:
> > Is there a way to use solr and get similar words between two document
> > (files).
> [...]
>
> This is *way* too vague to make any sense out of. Could you elaborate,
> as I could have sworn that what you seem to want is the essential
> function of a search engine.
>
> Regards,
> Gora
>

Re: Using terms and N-gram

2011-02-03 Thread openvictor Open

Thank you for this input.

It was silly of me to ask about n-grams, because I already knew about them.
I think I was tired yesterday...

Thank you Erick Erickson, once again you gave me a more than useful comment.
Indeed, shingles seem to be the perfect fit for the work I want to do. I
will try to implement that tonight and I will come back to report whether
it's working.

Regards,
Victor

2011/2/3 Erick Erickson 

> First, you'll get a lot of insight by defining something simple and looking
> at the analysis page from solr admin. That's a very valuable page.
>
> To your question:
> commongrams are "shingles" that work between stopwords and
> other words. For instance, "this is some text" gets analyzed into
> this, this_is, is, is_some, some, text. Note that the stopwords
> are the only things that get combined with the text after.
>
> NGrams form on letters. It's too long to post the whole thing, but
> the above phrase gets analyzed as
> t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me... It splits a
> single token into grams, whereas commongrams essentially combines tokens
> when they're stopwords.
>
> Have you looked at "shingles"? See:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
> Best
> Erick
>
>
> On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open wrote:
>
> > Thank you, I will do that and hopefully it will come in handy!
> >
> > But can someone explain the difference between CommonGramsFilterFactory and
> > NGramFilterFactory? (Maybe the solution is there.)
> >
> > Thank you all,
> > best regards
> >
> > 2011/2/3 Grijesh 
> >
> > >
> > > Use analysis.jsp to see what is happening at index time and query time
> > > with your input data. You can use highlighting to see if a match is
> > > found.
> > >
> > > -
> > > Thanx:
> > > Grijesh
> > > http://lucidimagination.com
> > >
> >
>
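Erick's comparison can be reproduced with a toy Java model (not Lucene code). `commonGrams` pairs each token with the next one when either of the two is in the common-words list, joining them with an underscore as, to our understanding, Lucene's CommonGramsFilter does; `charNGrams` emits every character gram of one token, shortest first, matching the order in Erick's listing. The common-words set {this, is} is an assumption chosen to reproduce his example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy models (not Lucene code) of CommonGrams-style bigrams and character n-grams. */
final class GramsDemo {
    /** Each token, followed by token_next when either token is a common word. */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i);
            out.add(cur);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(cur) || commonWords.contains(next)) {
                    out.add(cur + "_" + next);
                }
            }
        }
        return out;
    }

    /** All character grams of length min..max from one token, shortest first. */
    static List<String> charNGrams(String token, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int start = 0; start + n <= token.length(); start++) {
                out.add(token.substring(start, start + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("this", "is", "some", "text");
        Set<String> common = new HashSet<>(Arrays.asList("this", "is"));
        System.out.println(commonGrams(text, common)); // [this, this_is, is, is_some, some, text]
        for (String t : text) {
            System.out.println(t + " -> " + charNGrams(t, 1, 2)); // e.g. this -> [t, h, i, s, th, hi, is]
        }
    }
}
```

The contrast is visible in the output: CommonGrams works across tokens and leaves "some text" alone, while n-grams never cross a token boundary.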

Re: Using terms and N-gram

2011-02-03 Thread openvictor Open

Thank you, I will do that and hopefully it will come in handy!

But can someone explain the difference between CommonGramsFilterFactory and
NGramFilterFactory? (Maybe the solution is there.)

Thank you all,
best regards

2011/2/3 Grijesh 

>
> Use analysis.jsp to see what is happening at index time and query time with
> your input data. You can use highlighting to see if a match is found.
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
>

Re: Terms and termscomponent questions

2011-02-03 Thread openvictor Open

Dear Erick,

You were totally right: I didn't put any space between the words, which
caused Solr to concatenate them!
Everything is solved now. Thank you very much for your help!

Best regards,
Victor Kabdebon

2011/2/3 Erick Erickson 

> There are a couple of things going on here. First,
> WordDelimiterFilterFactory is splitting things up on letter/number
> boundaries. Take a look at:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> for a list of *some* of the available tokenizers. You may want to just use
> one of the others, or change the parameters to WordDelimiterFilterFactory
> to not split as it is.
>
> See the page: http://localhost:8983/solr/admin/analysis.jsp and check the
> "verbose" box to see what the effects of the various elements in your
> analysis chain are. This is a very important page for understanding the
> analysis part of the whole operation.
>
> Second, if you've been trying different things out, you may well have some
> old stuff in your index. When you delete documents, the terms are still in
> the index until an optimize. I'd advise starting with a clean slate for
> your experiments each time. The cheap way to do this is to stop your server
> and delete /data/index. Delete the index directory too, not just the
> contents. So it's possible your TermsComponent is returning data from
> previous attempts, because I sure don't see how the concatenated terms
> would be in this index given the definition you've posted.
>
> And if none of that works, well, we'll try something else ..
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open wrote:
>
> > Dear Erick,
> >
> > Thank you for your answer. Here is my fieldType definition; I took the
> > standard one because I don't need a better one for this field:
> >
> > 
> > 
> > 
> >  > words="stopwords.txt" enablePositionIncrements="true"/>
> >  > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1"/>
> > 
> >  > protected="protwords.txt"/>
> > 
> > 
> > 
> >  > ignoreCase="true" expand="true"/>
> >  > words="stopwords.txt" enablePositionIncrements="true"/>
> >  > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="1"/>
> > 
> >  > protected="protwords.txt"/>
> > 
> > 
> >
> > Now my field:
> >
> > 
> >
> > But I have a doubt now... Do I really put a space between words, or is it
> > just a comma? If I only put a comma, is the whole process affected?
> > What I don't really understand is that I find the separate words, but
> > also their concatenations (and again in one direction only). Let me
> > explain: if I have "man" "bear" "pig", I will find "manbearpig" and
> > "bearpig", but never "pigman" or any other combination in a different
> > order.
> >
> > Thank you very much
> > Best Regards,
> > Victor
> >
> > 2011/2/1 Erick Erickson 
> >
> > > Nope, this isn't what I'd expect. There are a couple of possibilities:
> > > 1> check out what WordDelimiterFilterFactory is doing, although
> > > if you're really sending spaces that's probably not it.
> > > 2> Let's see the  and  definitions for the field
> > > in question. type="text" doesn't say anything about analysis,
> > > and that's where I'd expect you're having trouble. In particular
> > > if your analysis chain uses KeywordTokenizerFactory for instance.
> > > 3> Look at the admin/schema browse page, look at your field and
> > > see what the actual tokens are. That'll tell you what TermsComponent
> > > is returning; perhaps the concatenation is happening somewhere
> > > else.
> > >
> > > Bottom line: Solr will not concatenate terms like this unless you tell it
> > > to, so I suspect you're telling it to, you just don't realize it...
> > >
> > > Best
> > > Erick
> > >
> > > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open wrote:
> > >
> > > > Dear Solr users,
> > > >
> > > > I am currently using Solr and the TermsComponent to build autosuggest
> > > > for my website.
> > > >
> > > > I have a field called p_field, indexed and stored, with type="text" in
> > > > schema.xml. Nothing out of the ordinary.
> > > > I feed Solr a set of words separated by a comma and a space, such as
> > > > (for two documents):
> > > >
> > > > Document 1:
> > > > word11, word12, word13. word14
> > > >
> > > > Document 2:
> > > > word21, word22, word23. word24
> > > >
> > > >
> > > > When I query my newly designed field with the prefix "word1", I get:
> > > > word11, word12, word13. word14 word11word12 word11word13 etc.
> > > > Is it normal to have concatenations of words indexed, and not only the
> > > > words themselves? Did I miss something about the TermsComponent?
> > > >
> > > > Thank you very much,
> > > > Best regards all,
> > > > Victor
> > > >
> > >
> >
>
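The fix Victor reports (missing spaces) fits Erick's explanation: a whitespace tokenizer keeps "man,bear,pig" as one token, and the index-time WordDelimiterFilterFactory, configured with catenateWords="1" in the definition above, then emits the parts plus their concatenation. The Java sketch below is a much-simplified model of that chain, not Lucene's WordDelimiterFilter: it only splits on non-alphanumeric characters and ignores case-change and letter/digit splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified model (not Lucene's WordDelimiterFilter) of whitespace + delimiter splitting. */
final class DelimiterDemo {
    /**
     * Whitespace-tokenizes the text, splits each token on runs of non-letter/non-digit
     * characters, lower-cases the parts and, if catenate is true, also emits the parts
     * of a multi-part token joined together.
     */
    static List<String> analyze(String text, boolean catenate) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            List<String> parts = new ArrayList<>();
            for (String part : token.split("[^\\p{L}\\p{N}]+")) {
                if (!part.isEmpty()) {
                    parts.add(part.toLowerCase(Locale.ROOT));
                }
            }
            out.addAll(parts);
            if (catenate && parts.size() > 1) {
                out.add(String.join("", parts));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("man,bear,pig", true));   // [man, bear, pig, manbearpig]
        System.out.println(analyze("man, bear, pig", true)); // [man, bear, pig]
    }
}
```

With spaces after the commas, each whitespace token has a single part, so nothing gets concatenated.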

Using terms and N-gram

2011-02-02 Thread openvictor Open

Dear all,

I am trying to implement an autocomplete system for search, but I am stuck
on some problems that I can't solve.

Here is my problem:
I index text like "the cat is black", and I want to generate all 1-grams to
8-grams of the text that is passed in:
the, cat, is, black, the cat, cat is, is black, etc.

To do that, I have defined the following fieldType in my schema:



  


  
  


  



Then the following field:



Then I fed Solr some phrases, and I was really surprised to see that it
didn't behave as expected.
I went to the schema browser to see the result for the very profound query:
"the cat is black and it rains"

The results are quite disappointing: first, 1-grams are not found. Some
2-grams are found, like "the_cat", "and_it", etc., but not what I expected.
Is there something I am missing here? (By the way, I also tried removing
mingramsize and maxgramsize, and even the words.)

Thank you,
Victor Kabdebon
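What Victor describes (every 1-gram to 8-gram of the text) is word shingling, which Erick suggests via solr.ShingleFilterFactory in his reply; its maxShingleSize and outputUnigrams attributes control the range. The toy Java generator below, not Lucene's ShingleFilter, shows the expected output order. The underscore-joined pairs in the results above ("the_cat", "and_it") look like CommonGrams output, which joins with "_", whereas ShingleFilter separates words with a space by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy word-shingle generator (not Lucene's ShingleFilter). */
final class ShingleDemo {
    /** Unigrams plus word shingles of size 2..maxSize, in position order, joined by spaces. */
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // At each position: the single word first, then longer shingles starting there.
            for (int size = 1; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "is", "black");
        System.out.println(shingles(tokens, 2));
        // [the, the cat, cat, cat is, is, is black, black]
        System.out.println(shingles(tokens, 8).size()); // 10: every contiguous run of 1 to 4 words
    }
}
```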

Re: Solr for noSQL

2011-02-01 Thread openvictor Open

Hi all, I don't know if it answers any of your questions, but if you are
interested in that, check out:

Lucandra (Cassandra + Lucene)



2011/2/1 Steven Noels 

> On Tue, Feb 1, 2011 at 11:52 AM, Upayavira  wrote:
>
>
> >
> > Apologies if my "nothing funky" sounded like you weren't doing cool
> > stuff.
>
>
> No offense whatsoever. I think my longer reply paints a more accurate picture
> of what Lily means in terms of "SOLR for NoSQL", and it was your reaction
> that triggered this additional explanation.
>
>
> > I was merely attempting to say that I very much doubt you were
> > doing anything funky like putting HBase underneath Solr as a replacement
> > of FSDirectory.
>
>
> There are some initiatives in the context of Cassandra IIRC, as well as a
> project which stores Lucene index files in HBase tables, but frankly they
> seem more like experiments, and I also think the nature of how Lucene/SOLR
> works and what HBase does on top of Hadoop FS are somewhat in conflict with
> each other. Too many layers of indirection will kill performance on every layer.
>
>
>
> > I was trying to imply that, likely your integration with
> > Solr was relatively conventional (interacting with its REST interface),
> >
>
>
> Yep. We figured that was the wiser road to walk: it leaves a clearly defined
> interface and room for improvement, as opposed to a too-low level of
> integration.
>
>
> > and the "funky" stuff that you are doing sits outside of that space.
> >
> > Hope that's a clearer (and more accurate?) attempt at what I was trying
> > to say.
> >
> > Upayavira (who finds the Lily project interesting, and would love to
> > find the time to play with it)
> >
>
> Anytime, Upayavira. Anytime! ;-)
>
> Steven.
> --
> Steven Noels
> http://outerthought.org/
> Scalable Smart Data
> Makers of Kauri, Daisy CMS and Lily
>

Re: Terms and termscomponent questions

2011-02-01 Thread openvictor Open

Dear Erick,

Thank you for your answer. Here is my fieldType definition; I took the
standard one because I don't need a better one for this field:



















Now my field:



But I have a doubt now... Do I really put a space between words, or is it
just a comma? If I only put a comma, is the whole process affected?
What I don't really understand is that I find the separate words, but also
their concatenations (and again in one direction only). Let me explain: if
I have "man" "bear" "pig", I will find "manbearpig" and "bearpig", but never
"pigman" or any other combination in a different order.

Thank you very much
Best Regards,
Victor

2011/2/1 Erick Erickson 

> Nope, this isn't what I'd expect. There are a couple of possibilities:
> 1> check out what WordDelimiterFilterFactory is doing, although
> if you're really sending spaces that's probably not it.
> 2> Let's see the  and  definitions for the field
> in question. type="text" doesn't say anything about analysis,
> and that's where I'd expect you're having trouble. In particular
> if your analysis chain uses KeywordTokenizerFactory for instance.
> 3> Look at the admin/schema browse page, look at your field and
> see what the actual tokens are. That'll tell you what TermsComponent
> is returning; perhaps the concatenation is happening somewhere
> else.
>
> Bottom line: Solr will not concatenate terms like this unless you tell it
> to, so I suspect you're telling it to, you just don't realize it...
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open wrote:
>
> > Dear Solr users,
> >
> > I am currently using Solr and the TermsComponent to build autosuggest for
> > my website.
> >
> > I have a field called p_field, indexed and stored, with type="text" in
> > schema.xml. Nothing out of the ordinary.
> > I feed Solr a set of words separated by a comma and a space, such as (for
> > two documents):
> >
> > Document 1:
> > word11, word12, word13. word14
> >
> > Document 2:
> > word21, word22, word23. word24
> >
> >
> > When I query my newly designed field with the prefix "word1", I get:
> > word11, word12, word13. word14 word11word12 word11word13 etc.
> > Is it normal to have concatenations of words indexed, and not only the
> > words themselves? Did I miss something about the TermsComponent?
> >
> > Thank you very much,
> > Best regards all,
> > Victor
> >
>

Terms and termscomponent questions

2011-01-31 Thread openvictor Open

Dear Solr users,

I am currently using Solr and the TermsComponent to build autosuggest for my
website.

I have a field called p_field, indexed and stored, with type="text" in
schema.xml. Nothing out of the ordinary.
I feed Solr a set of words separated by a comma and a space, such as (for
two documents):

Document 1:
word11, word12, word13. word14

Document 2:
word21, word22, word23. word24


When I query my newly designed field with the prefix "word1", I get:
word11, word12, word13. word14 word11word12 word11word13 etc.
Is it normal to have concatenations of words indexed, and not only the
words themselves? Did I miss something about the TermsComponent?

Thank you very much,
Best regards all,
Victor
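For reference, the prefix lookup described above corresponds to a TermsComponent request like the one built below. It assumes a handler registered at /terms (as in the Solr example solrconfig.xml, if your config keeps it) and reuses p_field from the question; terms.fl, terms.prefix and terms.limit are TermsComponent parameters, while `TermsPrefixUrl` is a hypothetical name. The component returns indexed terms, so any concatenations produced by the analysis chain show up here directly.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: TermsComponent prefix request for autosuggest. */
final class TermsPrefixUrl {
    /** Builds e.g. .../terms?terms.fl=p_field&terms.prefix=word1&terms.limit=10 */
    static String termsUrl(String baseUrl, String field, String prefix, int limit) {
        try {
            return baseUrl + "/terms"
                    + "?terms.fl=" + URLEncoder.encode(field, "UTF-8")     // field to read terms from
                    + "&terms.prefix=" + URLEncoder.encode(prefix, "UTF-8") // what the user typed
                    + "&terms.limit=" + limit;                               // max terms returned
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(termsUrl("http://localhost:8983/solr", "p_field", "word1", 10));
    }
}
```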
