Solr Autosuggest help

2011-02-26 Thread rahul
Hi,

I am using the Solr (1.4.1) autosuggest feature via TermsComponent.

Currently, if I type 'goo', Solr suggests words like 'google'.

But I would like to receive suggestions like 'google, google alerts, ...',
i.e., suggestions with both single and multiple terms.

I am not sure whether I need to use edge n-grams for that, e.g., indexing google
like 'go', 'oo', 'og', ... . But I think I don't need this, since I don't
want partial search. Please let me know if there is any way to do multiple
word suggestions.

Thanks in Advance. 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2580944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to handle special character in filter query

2011-02-26 Thread Savvas-Andreas Moysidis
Hello,

Regarding HTTP-specific characters (like spaces and &), you'll need to
URL-encode them if you are sending queries directly to Solr, but you don't
need to do so if you are using a Solr client such as SolrJ.
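
As a rough illustration of the encoding step, here is a minimal Python sketch of what a raw HTTP client has to do. The base URL is a hypothetical placeholder, and quoting the genre value as a phrase is my assumption about the intended query; a client library such as SolrJ performs the equivalent encoding internally.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical base URL; replace with your own Solr select handler.
BASE_URL = "http://localhost:8983/solr/select"

def build_select_url(query, filter_query):
    """Return a select URL with q and fq safely URL-encoded."""
    params = {"q": query, "fq": filter_query}
    # urlencode escapes spaces (as '+'), '&', ':' and '"' in each value
    return BASE_URL + "?" + urlencode(params)

# Quote the value so the space and '&' stay part of one field value
url = build_select_url("*:*", 'genre:"ACTION & ADVENTURE"')
print(url)
```

Decoding the query string again (for example with parse_qs) should give back the original fq value, which is a quick way to check that nothing was lost in transit.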

Regards,
- Savvas

On 26 February 2011 03:11, cyang2010 ysxsu...@hotmail.com wrote:

> How to handle special characters when constructing a filter query?
>
> For example, I want to do something like:
>
> http://.fq=genre:ACTION & ADVENTURE
>
> How do I handle the space and & in the filter query part?
>
> Thanks.







Text field not defined in Solr Schema?

2011-02-26 Thread McGibbney, Lewis John
Hello list,

I have recently been working on some JS (ajax solr) and when using Firebug I am 
alerted to an error within the JS file as below.
It immediately breaks on line 12 stating that 'doc.text' is undefined! Here is 
the code snippet.

10 AjaxSolr.theme.prototype.snippet = function (doc) {
11   var output = '';
12   if (doc.text.length > 300) {
13     output += doc.dateline + ' ' + doc.text.substring(0, 300);
14     output += '<span style="display:none;">' + doc.text.substring(300);
15     output += '</span> <a href="#" class="more">more</a>';
16   }
17   else {
18     output += doc.dateline + ' ' + doc.text;
19   }
20   return output;
21 };

I have been advised that the problem might stem from my schema not defining a 
text field, however as my implementation of Solr
is currently geared to index docs from a Nutch web crawl I am using the Nutch 
schema. A snippet of the schema is below

<schema name="nutch" version="1.1">
  <types>
    ...
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
      ...
  </types>
  <fields>
    ...
    <field name="content" type="text" stored="true" indexed="true"/>
  </fields>
</schema>

Can someone confirm whether I need to add something similar to the following?

  <fields>
    ...
    <field name="text" type="text" stored="true" indexed="true"/>
  </fields>

Then perform a fresh crawl and reindex so that the schema field is recognised 
by the JS snippet?

Also (apologies), from my reading on the Solr schema, I became intrigued
by two options for TextField, namely compressed and compressThreshold.
I understand that they are used together; however, can anyone please
explain what benefits compression adds and what integer value would be
appropriate for the latter option?

Any help would be great
Thank you Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 
2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education’s Outstanding Support for Early Career 
Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html


Re: Make syntax highlighter caseinsensitive

2011-02-26 Thread Tarjei Huse
On 02/25/2011 03:02 PM, Koji Sekiguchi wrote:
> (11/02/25 18:30), Tarjei Huse wrote:
>> Hi,
>> On 02/25/2011 02:06 AM, Koji Sekiguchi wrote:
>>> (11/02/24 20:18), Tarjei Huse wrote:
>>>> Hi,
>>>>
>>>> I have an index with two fields, body and caseInsensitiveBody.
>>>> Body is indexed and stored while caseInsensitiveBody is just indexed.
>>>>
>>>> The idea is that by not storing the caseInsensitiveBody I save some
>>>> space and gain some performance. So I query against the
>>>> caseInsensitiveBody and generate highlighting from the case sensitive
>>>> one.
>>>>
>>>> The problem is that as a result, I am missing highlighting terms. For
>>>> example, when I search for solr and get a match in caseInsensitiveBody
>>>> for solr, but it is Solr in the original document, no highlighting
>>>> is done.
>>>>
>>>> Is there a way around this? Currently I am using the following
>>>> highlighting params:
>>>>    'hl' => 'on',
>>>>    'hl.fl' => 'header,body',
>>>>    'hl.usePhraseHighlighter' => 'true',
>>>>    'hl.highlightMultiTerm' => 'true',
>>>>    'hl.fragsize' => 200,
>>>>    'hl.regex.pattern' => '[-\w ,/\n\"\']{20,200}',
>>>
>>> Tarjei,
>>>
>>> Maybe a silly question, but why don't you make the body field case
>>> insensitive and eliminate the caseInsensitiveBody field, and then
>>> query and highlight on just the body field?
>> Not silly. I need to support usage scenarios where case matters as well
>> as scenarios where case doesn't matter.
>>
>> The best part would be if I could use one field for this, store it and
>> handle case sensitivity in the query phase, but as I understand it, that
>> is not possible.
>
> Hi Tarjei,
>
> If I understand it correctly, you want to highlight in a case insensitive way.
> If so, it is easy. You have:
>
> body: indexed but not stored
> caseInsensitiveBody: indexed and stored
>
> and request hl.fl=caseInsensitiveBody ?

But I also want to be able to do it the other way around - i.e. I need
to keep both options open so that at runtime I can select whether I want to
do a query that is or is not case insensitive. That is why I'm storing
the non-lowercased version of the field - with that I do not lose
information.

Regards,
Tarjei

> Koji


-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Studying all files of Solr SRC

2011-02-26 Thread Anurag
Is there a detailed tutorial anywhere that covers all the Java files of
Apache Solr (under the src folder)?
I want to study them, as my aim is either to write code for my own
implementation or to modify the existing files to suit my purpose.

Actually, I want to add an advanced search to my Solr-based search engine. This
advanced search will include options like "at least half", "as many as
possible", "most", etc., which are linguistic operators. We can say that these
options will help the user introduce fuzziness into their search results.

For example, the user asks "show me all the documents which contain at least
half of the terms t1, t2, t3" or "show me all the documents which contain most
of the terms t1, t5, t7". Here "at least half" and "most" have been
given some weight. This advanced search is different from a normal boolean
search.

Thanks


-
Kumar Anurag



Re: Studying all files of Solr SRC

2011-02-26 Thread Markus Jelsma
DismaxQParser's mm parameter might help you out:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
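
As a rough illustration of how mm could approximate "at least half", the Python sketch below builds dismax request parameters and mimics the percentage rule. The rounding-down behaviour reflects my reading of the wiki page above, and both helper names are hypothetical; this is not Solr's own code.

```python
from urllib.parse import urlencode

def min_should_match(num_terms, percent):
    """Approximate how a positive percentage mm is resolved: rounded down."""
    return (num_terms * percent) // 100

def dismax_params(terms, mm):
    """Build request parameters for a dismax query with a minimum-match rule."""
    return urlencode({"defType": "dismax", "q": " ".join(terms), "mm": mm})

# Number of optional terms that must match out of three at mm=50%
print(min_should_match(3, 50))
print(dismax_params(["t1", "t2", "t3"], "50%"))
```

Under this rounding, 50% of three terms resolves to a single required term, so a strict "at least half" may need a different percentage or one of the conditional mm expressions described on the wiki page.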



Re: Text field not defined in Solr Schema?

2011-02-26 Thread Markus Jelsma
Yes, you need to add a field named "text" of type "text", or use "content"
instead of "text" in the JS snippet.
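
For illustration, the same snippet logic can be written defensively so that a missing text field falls back to content. The sketch below is in Python rather than the original JavaScript; the dict-shaped document, the fallback order and the trailing "..." marker are assumptions made for the example and are not part of AJAX Solr.

```python
def snippet(doc, limit=300):
    """Build a teaser from a Solr result document given as a dict."""
    # Prefer 'text', fall back to the Nutch schema's 'content', else empty
    body = doc.get("text") or doc.get("content") or ""
    prefix = doc.get("dateline", "")
    teaser = (prefix + " " + body[:limit]).strip()
    # Mark truncated bodies so the caller can render a "more" link
    return teaser + ("..." if len(body) > limit else "")

print(snippet({"content": "Nutch stores the page body in the content field."}))
```

The point of the guard is only that an undefined field no longer raises an error; which field should actually be displayed still depends on the schema.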



Re: Make syntax highlighter caseinsensitive

2011-02-26 Thread Koji Sekiguchi

> That is why I'm storing
> the non-lowercased version of the field - with that I do not lose
> information.

You do not lose information when you store the lowercased version of the field.

Koji
--
http://www.rondhuit.com/en/


loading XML docbook files into solr

2011-02-26 Thread Derek Werthmuller
I've been working on this for a while and seem to have hit a wall. The error
messages aren't complete enough to explain why importing a sample
DocBook document into Solr is not working.
I'm using the curl tool to post the XML file and receive a non-error response,
but the document count doesn't increase and the *:* query still returns no
results.
The DocBook document has an id attribute, and this is mapped to the uniqueKey
in the schema.xml file. But it seems this may still be the issue. It's not
clear how the field names map to the XML. Do they only map to attributes, or do
they map to elements? How do you differentiate?
Can field names in the schema.xml file contain XPath statements?

Are there other important sections of the solrconfig that could be keeping
this from working?

We want to maintain much of the document structure so we have more control
over the searching.

Here is what the DocBook XML looks like (I tried setting the uniqueKey to id
and to docid, but no go either way):

<book label="issuebriefs" id="proi">
  <docid>245</docid>
  <titleabbrev>Advancing Return on Investment Analysis for Government IT: A Public Value Framework </titleabbrev>
  <chapter>
    <title>Advancing Return on Investment Analysis for Government IT: A Public Value Framework</title>
    <para>
      <mediaobject>
        <imageobject>
          <imagedata fileref="/publications/annualreports/ar2006/images/public-value.jpg" format="jpg" contentdepth="157" contentwidth="216" align="left"/>
        </imageobject>
        <textobject>
          <phrase>Public Value Illustration</phrase>
        </textobject>
      </mediaobject>

..

Here is the section of the schema.xml:
   <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
   <field name="titleabbrev" type="text" indexed="true" stored="true" />
   <field name="title" type="text" indexed="true" stored="true" />

   <field name="para" type="text" indexed="true" stored="true" />
   <field name="ulink" type="string" indexed="true" stored="true" />
   <field name="listitem" type="text" indexed="true" stored="true" />

   <field name="all_text" type="text" indexed="true" stored="false" multiValued="true" />

   <copyField source="title" dest="all_text" />
   <copyField source="para" dest="all_text" />
   <copyField source="listitem" dest="all_text" />
   <copyField source="titleabbrev" dest="all_text" />

 </fields>

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>all_text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

</schema>

Load command results:

$ ./postfile.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">56</int></lst>
</response>
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">15</int></lst>
</response>


Thanks
Derek


RE: Text field not defined in Solr Schema?

2011-02-26 Thread McGibbney, Lewis John
Thank you Markus,

I am wondering if anyone can comment on the latter question I posted regarding
TextField or StrField with compression options. I understand the mechanics of
adding compressThreshold to the field type definition (1st part of my schema)
and adding individual options to the individual field definitions (2nd part of
my schema); my question is about what real benefits can be gained from this in
a 'small/medium' Solr use case.

Thank you Lewis



Re: loading XML docbook files into solr

2011-02-26 Thread Gora Mohanty
On Sat, Feb 26, 2011 at 9:10 PM, Derek Werthmuller
dwert...@ctg.albany.edu wrote:
> I've been working on this for a while and seem to have hit a wall. The error
> messages aren't complete enough to explain why importing a sample
> DocBook document into Solr is not working.
> I'm using the curl tool to post the XML file and receive a non-error response,
> but the document count doesn't increase and the *:* query still returns no
> results.
[...]

Which curl tool? The post.sh included with Solr? You refer to a postfile.sh
below.

Unless I am missing something, it seems like you are trying to post
a standard XML file to Solr. You cannot do that. There are two ways
to proceed:
* Reformat the XML into Solr's format. See the .xml documents in
  the example/exampledocs directory of your Solr distribution, or see, e.g.,
  http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
* Write a DataImportHandler script with an XPathEntityProcessor. Please
  see http://wiki.apache.org/solr/DataImportHandler
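
To make the first option more concrete, here is a minimal Python sketch that rewrites a DocBook book into Solr's <add><doc> update format. The sample document is a shortened stand-in for the one in the original post, the choice of elements to copy is an assumption based on the posted schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def docbook_to_solr_add(docbook_xml):
    """Convert one DocBook <book> into Solr's <add><doc> update format."""
    book = ET.fromstring(docbook_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name, value):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value.strip()

    # The uniqueKey is taken from the book's id attribute
    add_field("id", book.get("id", ""))
    # Copy the text of selected elements into same-named Solr fields
    for tag in ("titleabbrev", "title", "para", "listitem"):
        for elem in book.iter(tag):
            text = "".join(elem.itertext()).strip()
            if text:
                add_field(tag, text)
    return ET.tostring(add, encoding="unicode")

sample = """<book label="issuebriefs" id="proi">
  <titleabbrev>A Public Value Framework</titleabbrev>
  <chapter><title>A Public Value Framework</title>
    <para>Public value is assessed across several dimensions.</para>
  </chapter>
</book>"""
print(docbook_to_solr_add(sample))
```

The output can then be posted like the files in example/exampledocs, followed by a commit. Note that a real book with several para or listitem elements would produce repeated fields, which would presumably require those fields to be declared multiValued in the schema.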

> Load command results.
>
> $ ./postfile.sh
[...]

This is not the problem here, but the standard Solr post.sh takes filenames
to be posted as command-line arguments.

Regards,
Gora


Re: loading XML docbook files into solr

2011-02-26 Thread Sujit Pal
Hi Derek,

The XML files you post to Solr need to be in the correct Solr-specific
XML format.

One way to preserve the original structure would be to flatten the
document into field names indicating the position of the text, for
example:
book_titleabbrev: Advancing Return on Investment Analysis for Government IT:\
 A Public Value Framework
... etc.

But you will still have to parse your DocBook XML into the schema that you
want to use for Solr. I believe DIH also allows XSLT-based preprocessors,
so you don't have to write parsing code, but I haven't used them.
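
The flattening idea above could be sketched in Python roughly as follows. The underscore-joined naming scheme follows the book_titleabbrev example; the function name and the shortened sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def flatten(elem, prefix=""):
    """Yield (field_name, text) pairs, naming fields by their element path."""
    name = prefix + "_" + elem.tag if prefix else elem.tag
    # Only the text directly inside the element is used; tail text is ignored
    text = (elem.text or "").strip()
    if text:
        yield name, text
    for child in elem:
        yield from flatten(child, name)

sample = """<book id="proi">
  <titleabbrev>A Public Value Framework</titleabbrev>
  <chapter><title>A Public Value Framework</title></chapter>
</book>"""
fields = list(flatten(ET.fromstring(sample)))
print(fields)
```

Each resulting name (book_titleabbrev, book_chapter_title, and so on) would need a matching field or dynamic field in schema.xml.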

-sujit




Re: How to handle special character in filter query

2011-02-26 Thread Rosa (Anuncios)

Try this:

fq={!field f=category}<insert value, URL encoded of course, here>

or wrap the value in double quotes.
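
A minimal Python sketch of the first form, assuming the genre field and value from the original question (the helper name is hypothetical):

```python
from urllib.parse import quote

def field_filter(field, value):
    """Build an fq parameter using the {!field} query parser syntax."""
    raw = "{!field f=%s}%s" % (field, value)
    # safe="" forces every reserved character to be percent-encoded
    return "fq=" + quote(raw, safe="")

print(field_filter("genre", "ACTION & ADVENTURE"))
```

The whole parameter value, including the {!field ...} prefix, is encoded, so the space and & cannot be mistaken for URL separators.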

Regards

On 26/02/2011 04:11, cyang2010 wrote:

> How to handle special characters when constructing a filter query?
>
> For example, I want to do something like:
>
> http://.fq=genre:ACTION & ADVENTURE
>
> How do I handle the space and & in the filter query part?
>
> Thanks.








Blacklist keyword list on dataimporter

2011-02-26 Thread Rosa (Anuncios)

Hi,

Is there a way to drop documents at indexing time based on a blacklist
of keywords?

Something like stopwords.txt...

But in this case, when one of the keywords is detected in a specific field
during indexing, the whole doc would be skipped.


Regards




Re: Solr Autosuggest help

2011-02-26 Thread Ahmet Arslan
> I am using the Solr (1.4.1) autosuggest feature via TermsComponent.
>
> Currently, if I type 'goo', Solr suggests words like 'google'.
>
> But I would like to receive suggestions like 'google, google alerts, ...',
> i.e., suggestions with both single and multiple terms.
>
> I am not sure whether I need to use edge n-grams for that, e.g., indexing
> google like 'go', 'oo', 'og', ... . But I think I don't need this, since
> I don't want partial search. Please let me know if there is any way to do
> multiple word suggestions.

If you stick with TermsComponent, you need to add ShingleFilterFactory to
your index analyzer chain for that.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
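
To show what shingling adds to the index, here is a small Python sketch that generates word n-grams and then filters them by prefix, roughly as a terms.prefix lookup would. It only imitates the idea; the real filter's output order and options (such as maxShingleSize and outputUnigrams) are configured in the analyzer chain, and the function name here is hypothetical.

```python
def shingles(text, max_size=2):
    """Return word n-grams (shingles) of 1..max_size tokens, lowercased."""
    tokens = text.lower().split()
    result = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result

terms = shingles("Google Alerts help")
print(terms)
# Terms a prefix lookup for 'goo' could return as suggestions
print([t for t in terms if t.startswith("goo")])
```

Because 'google alerts' is now indexed as a single term, a prefix of 'goo' can match both the single word and the two-word phrase.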





RE: loading XML docbook files into solr

2011-02-26 Thread Derek Werthmuller
Thank you, this clarifies a lot.
 



Re: query results filter

2011-02-26 Thread Babak Farhang
Just stumbled on field collapsing (
http://wiki.apache.org/solr/FieldCollapsing ), which is apparently
slated for inclusion in the next release.
Looks like I should be able to achieve my unique field requirement w/
group.limit=1&group.main=true in the query string.
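
A minimal Python sketch of such a query string, assuming the parameter names from the FieldCollapsing wiki page and a field called checksum as described in my earlier message (the helper name is hypothetical):

```python
from urllib.parse import urlencode

def dedup_params(query, field):
    """Build query-string parameters that keep one document per field value."""
    return urlencode([
        ("q", query),
        ("group", "true"),
        ("group.field", field),   # collapse on this field
        ("group.limit", "1"),     # keep a single doc per group
        ("group.main", "true"),   # return a flat result list
    ])

print(dedup_params("*:*", "checksum"))
```

Whether this interacts correctly with pagination and sharding is exactly the open question below.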

With regard to the known limitation "Distributed search support for
result grouping has not yet been implemented", does it work
imperfectly with dist search, or does it fail?

-Babak

On Thu, Feb 24, 2011 at 10:20 PM, Babak Farhang farh...@gmail.com wrote:
> In my case, I want to filter out duplicate docs so that returned
> docs are unique w/ respect to a certain field (not the schema's unique
> field, of course): a duplicate doc here is one that has same value
> for a checksum field as one of the docs already in the results. It
> would be great if I could somehow express that w/ a query, but I don't
> think that would be possible.
>
> On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
>> Hmm, depending on what you are actually needing to do, can you do it with a
>> simple fq param to filter out what you want filtered out, instead of needing
>> to write custom Java as you are suggesting? It would be a lot easier to just
>> use an fq.
>>
>> How would you describe the documents you want to filter from the query
>> results page?  Can that description be represented by a Solr query you can
>> already represent using the lucene, dismax, or any other existing query? If
>> so, why not just use a negated fq describing what to omit from the results?
>>
>> From: Babak Farhang [farh...@gmail.com]
>> Sent: Thursday, February 24, 2011 6:58 PM
>> To: solr-user
>> Subject: query results filter
>>
>> Hi everyone,
>>
>> I have some existing solr cores that for one reason or another have
>> documents that I need to filter from the query results page.
>>
>> I would like to do this inside Solr instead of doing it on the
>> receiving end, in the client.  After searching the mailing list
>> archives and Solr wiki, it appears you do this by registering a custom
>> SearchHandler / SearchComponent with Solr.  Still, I don't quite
>> understand how this machinery fits together.  Any suggestions / ideas
>> / pointers much appreciated!
>>
>> Cheers,
>> -Babak
>>
>> ~~
>>
>> Ideally, I'd like to find / code a solution that does the following:
>>
>> 1. A request handler that works like the StandardRequestHandler but
>> which allows an optional DocFilter (say, modeled like the
>> java.io.FileFilter interface)
>> 2. Allows current pagination to work transparently.
>> 3. Works transparently with distributed/sharded queries.