Re: Solr and DateTimes - bug?

2011-09-13 Thread Nicklas Overgaard

Hi Mauricio,

Thanks for the suggestions :) I'm already running mono 2.10.5 so I 
should be safe.


And thanks to everybody for quick answers and friendly attitude.

Best regards,

Nicklas

On 2011-09-13 03:01, Mauricio Scheffer wrote:

Hi Nicklas,
Use a nullable DateTime type instead of MinValue. It's semantically more
correct, and SolrNet will do the right mapping.
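For instance, a minimal SolrNet mapping sketch (the class and property names here are made up for illustration, not taken from Nicklas' schema):

using System;
using SolrNet.Attributes;

public class Product
{
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    // Nullable: when EndDate is null the field is simply left out of the
    // document, instead of indexing DateTime.MinValue as a sentinel value.
    [SolrField("endDate")]
    public DateTime? EndDate { get; set; }
}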
I also heard that Mono had a bug in date parsing, it didn't behave just like
.NET :
https://github.com/mausch/SolrNet/commit/f3a76ea5535633f4b301e644e25eb2dc7f0cb7ef
IIRC this bug was fixed in Mono 2.10 or so, so make sure you're running the
latest version.
Finally, there's a specific mailing list for questions about SolrNet:
http://groups.google.com/group/solrnet

Cheers,
Mauricio



On Mon, Sep 12, 2011 at 7:54 AM, Nicklas Overgaard nick...@isharp.dk wrote:


I see. I'm using that date to flag that my entity has not yet ended. I
can just use another constant which Solr is capable of returning in the
correct format. The nice thing about DateTime.MinValue is that it's just
part of the .net framework :)

Hope that the issue is resolved at some point.

I'm wondering if it would be possible for you (or someone else) to fix the
issue with years from 1 to 999 being formatted incorrectly, and then
creating a new ticket for the issue with negative years?

Best regards,

Nicklas


On 2011-09-12 07:02, Chris Hostetter wrote:


: The XML output when performing a query via the solr interface is like this:
: <date name="endDate">1-01-01T00:00:00Z</date>

i think you mean: <date name="endDate">1-01-01T00:00:00Z</date>

:  So my question is: Is this a bug in the solr output engine, or
should mono
:  be able to parse the date as given from solr? I have not yet tried
it out
:  on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly 
concerned with, since most people don't deal with dates that far back

https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got side-tracked
by other things, since there are a couple of related issues with the
canonical ISO 8601 date format around year 0 that made it non-obvious
what the ideal solution was.

-Hoss







Re: question about Field Collapsing/ grouping

2011-09-13 Thread O. Klein
Isn't that what the parameter group.ngroups=true is for?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-about-Field-Collapsing-grouping-tp3331821p3332471.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: question about Field Collapsing/ grouping

2011-09-13 Thread Jayendra Patil
yup .. seems the group count feature is included now, as mentioned by Klein.

Regards,
Jayendra

On Tue, Sep 13, 2011 at 8:27 AM, O. Klein kl...@octoweb.nl wrote:
 Isn't that what the parameter group.ngroups=true is for?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/question-about-Field-Collapsing-grouping-tp3331821p3332471.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: can indexing information stored in db rather than filesystem?

2011-09-13 Thread kiran.bodigam
Thanks for your replies, guys.

As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
but I still want to store the index output (index files) in a DB table. Please
suggest what steps I need to follow to configure the DB with the Solr engine
(just as we set <dataDir>${solr.data.dir:}</dataDir> in solrconfig.xml, I would
like to give the path for the DB table).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-rather-than-filesystem-tp3319687p3332663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Markus Jelsma
I'm curious; what benefits do you think you'll get by storing the files in 
some DB table?

On Tuesday 13 September 2011 15:51:19 kiran.bodigam wrote:
 Thanks for your replies, guys.
 
 As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
 but I still want to store the index output (index files) in a DB table. Please
 suggest what steps I need to follow to configure the DB with the Solr engine
 (just as we set <dataDir>${solr.data.dir:}</dataDir> in solrconfig.xml, I would
 like to give the path for the DB table).
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-r
 ather-than-filesystem-tp3319687p3332663.html Sent from the Solr - User
 mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
I don't think you understand.  Solr does not have the code to do that.  It just 
isn't there, nor would I expect it would ever be there.

Solr is open source though.  You could look at the code and figure out how to 
do it (though why anyone would do that remains beyond my ability to 
understand).  As the saying goes:  Knock yourself out.

(Happy programmer's day to all.
http://en.wikipedia.org/wiki/Programmers'_Day ).

JRJ

-Original Message-
From: kiran.bodigam [mailto:kiran.bodi...@gmail.com] 
Sent: Tuesday, September 13, 2011 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information stored in db rather than filesystem?

Thanks for your replies, guys.

As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
but I still want to store the index output (index files) in a DB table. Please
suggest what steps I need to follow to configure the DB with the Solr engine
(just as we set <dataDir>${solr.data.dir:}</dataDir> in solrconfig.xml, I would
like to give the path for the DB table).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-indexing-information-stored-in-db-rather-than-filesystem-tp3319687p3332663.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Walter Underwood
On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:

 As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
 but I still want to store the index output (index files) in a DB table. Please
 suggest what steps I need to follow to configure the DB with the Solr engine.

The steps are:

1. write the Java code to do that
2. submit it as contrib, because it is such a bad idea that I doubt it will be 
added to the common code

wunder
--
Walter Underwood




using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I had queries breaking on me when there were spaces in the text I was
searching for. Originally I had :

fq=state_s:New York
and that would break, I found a work around by using:

fq={!raw f=state_s}New York


My problem now is doing this with an OR query,  this is what I have now, but
it doesn't work:


fq=({!raw f=country_s}United States OR {!raw f=city_s}New York


Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-13 Thread Pulkit Singhal
Hello Everyone,

I've been investigating and I understand that using the RegexTransformer is
an option that is open for identifying and extracting data to multiple
fields from a single rss value source ... But rather than hack together
something I once again wanted to check with the community: Is there another
option for navigating the HTML DOM tree using some well-tested transformer
or Tika or something?

Thanks!
- Pulkit

On Mon, Sep 12, 2011 at 1:45 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:

 Given an RSS raw feed source link such as the following:

 http://persistent.info/cgi-bin/feed-proxy?url=http%3A%2F%2Fwww.amazon.com%2Frss%2Ftag%2Fblu-ray%2Fnew%2Fref%3Dtag_rsh_hl_ersn

 I can easily get to the value of the description for an item like so:
 <field column="description" xpath="/rss/item/description" />

 But the content of description happens to be in HTML and sadly it is this
 HTML chunk that has some pretty decent information that I would like to
 import as well.
 1) For example it has the image for the item:
 <img src="http://ecx.images-amazon.com/images/I/51yyAAoYzKL._SL160_SS160_.jpg" ... />
 2) It has the price for the item:
 <span class="tgProductPrice">$13.99</span>
 And many other useful pieces of data that aren't in a proper rss format but
 they are simply thrown together inside the html chunk that is served as the
 value for the xpath=/rss/item/description

 So, how can I configure DIH to start importing this html information as
 well?
 Is Tika the way to go?
 Can someone give a brief example of what a config file with both Tika
 config and RSS config would/should look like?

 Thanks!
 - Pulkit



Re: using a function query with OR and spaces?

2011-09-13 Thread josh lucas
On Sep 13, 2011, at 8:37 AM, Jason Toy wrote:

 I had queries breaking on me when there were spaces in the text I was
 searching for. Originally I had :
 
 fq=state_s:New York
 and that would break, I found a work around by using:
 
 fq={!raw f=state_s}New York
 
 
 My problem now is doing this with an OR query,  this is what I have now, but
 it doesn't work:
 
 
 fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

Couldn't you do:

fq=(country_s:(United States) OR city_s:(New York))

I think that should work though you probably will need to surround the queries 
with quotes to get the exact phrase match.
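For example, with quotes that would presumably be:

fq=(country_s:"United States" OR city_s:"New York")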

Re: using a function query with OR and spaces?

2011-09-13 Thread Chris Hostetter
: Subject: using a function query with OR and spaces?

First off, what you are asking about is a filter query not a function 
query

https://wiki.apache.org/solr/CommonQueryParameters#fq

: I had queries breaking on me when there were spaces in the text I was
: searching for. Originally I had :
: 
: fq=state_s:New York
: and that would break, I found a work around by using:
: 
: fq={!raw f=state_s}New York

assuming the field is a StrField, the "raw" or "term" QParsers will work, 
or you can quote the value using something like fq=stats_s:"New York"

: My problem now is doing this with an OR query,  this is what I have now, but
: it doesn't work:
...
: fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

That's because:

a) local params (ie: the {! ...} syntax) must come at the start of a Solr 
param as an instruction of how to parse it.

b) the raw and term QParsers don't support *any* query markup/syntax 
(like OR modifiers).  If you want to build a complex query using 
multiple clauses that are constructed using specific QParsers, you need to 
build them up using multiple query params and/or the _query_ hook in the 
LuceneQParser...

fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term f=country_s}United 
States"

https://wiki.apache.org/solr/LocalParams
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/


-Hoss


Re: How to combine RSS w/ Tika when using Data Import Handler (DIH)

2011-09-13 Thread Chris Hostetter

: I've been investigating and I understand that using the RegexTransformer is
: an option that is open for identifying and extracting data to multiple
: fields from a single rss value source ... But rather than hack together
: something I once again wanted to check with the community: Is there another
: option for navigating the HTML DOM tree using some well-tested transformer
: or TIka or something?

I don't think so ... if it's a *really* well-formed feed, then the 
description will actually be xhtml nodes (with the appropriate 
namespace) that are already part of the Document's DOM.

But if it's just a blob of CDATA that happens to contain well-formed HTML, 
then I think a regex is currently your best option -- you'll probably want 
something tailor-made for the subtleties of the site whose RSS you're 
scraping anyway, since things like "are & chars in the URLs html-escaped?" 
are going to vary from site to site.

It would probably be possible to write a DIH Transformer based on 
something like tagsoup to actually produce a DOM from an arbitrary html 
string in an entity, so you could then treat it as a subentity and use the 
XPathEntityProcessor -- but i don't think i've seen anyone talk about 
doing anything like that before.

-Hoss


Highlight compounded word instead of part

2011-09-13 Thread O. Klein
I am using DictionaryCompoundWordTokenFilterFactory and want to highlight the
whole word instead of the part that matched the dictionary. 

So when the query "word" matches on "compoundedword", the whole word
"compoundedword" is highlighted, instead of just "word".

Any ideas?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlight-compounded-word-instead-of-part-tp225p225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr messing up the UK GBP (pound) symbol in response, even though Java environment variabe has file encoding is set to UTF 8....

2011-09-13 Thread Chris Hostetter

: Any idea why solr is unable to return the pound sign as-is?
: 
: I tried typing in £ 1 million in Solr admin GUI and got following response.
...
: <str name="q">£ 1 million</str>
...
: Here is my Java Properties I got also from admin interface:
...
: catalina.home =
: /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/

Looks like you are using tomcat, so I suspect you are getting bit by 
this...

https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
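The fix described on that wiki page is to make Tomcat decode request URIs as UTF-8; roughly (the port and other attributes are placeholders for whatever is already in your server.xml):

<!-- Tomcat server.xml: add URIEncoding to the HTTP connector -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>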

If that's not the problem, please try running the 
example/exampledocs/test_utf8.sh script against your Solr instance (you'll 
need to change the URL variable to match your host:port)


-Hoss

How to plug a new ANTLR grammar

2011-09-13 Thread Roman Chyla
Hi,

The standard lucene/solr parsing is nice but not really flexible. I
saw questions and discussion about ANTLR, but unfortunately never a
working grammar, so... maybe you find this useful:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene
objects, and the parser is not mixed with Java code. At first it
produces structures like this:
https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem. I don't know if I should use query parsing
framework in contrib.

It seems that the qParser in contrib can use different parser
generators (the default JavaCC, but also ANTLR). But I am confused and
I don't understand this new queryParser from contrib. It is really
very confusing to me. Is there any benefit in trying to plug the ANTLR
tree into it? Because looking at the AST pictures, it seems that with
a relatively simple tree walker we could build the same queries as the
current standard lucene query parser. And it would be much simpler and
flexible. Does it bring something new? I have a feeling I miss
something...

Many thanks for help,

  Roman


Re: How to search on specific file types ?

2011-09-13 Thread ahmad ajiloo
1- How can I put the file extension into my index? I'm using Nutch to
crawl web pages and sending Nutch's data to Solr for indexing, and I have
no idea how to put the file extension into my index.
2- please give me some help links about mime type. I'm new to Solr and don't
know anything about mime type. please note that I should index data of Nutch
and I couldn't find useful commands in Nutch tutorial for advanced indexing!
thank you very much


On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

 Some possibilities:

 1) Put the file extension into your index (that is what we did when we were
 testing indexing documents with Solr)
 2) Put a mime type for the document into your index.
 3) Put the whole file name / URL into your index, and match on part of the
 name.  This will give some false positives.

 JRJ

 -Original Message-
 From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
 Sent: Monday, September 12, 2011 5:58 AM
 To: solr-user@lucene.apache.org
 Subject: Fwd: How to search on specific file types ?

 Hello
 I want to search on articles. So need to find only specific files like doc,
 docx, and pdf.
 I don't need any html pages. Thus the result of our search should only
consist of doc, docx, and pdf files.
 can you help me?



Get field value in custom searchcomponent (solr 3.3)

2011-09-13 Thread Pablo Ricco
What is the best way to get a float field value from a docID?
I tried the following code, but when it runs it throws an exception ("For input
string: `??eI") at the line float lat = Float.parseFloat(tlat);

schema.xml:
...
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
    omitNorms="true" positionIncrementGap="0"/>
...
<field name="latitude" type="float" indexed="true" stored="true"
    multiValued="false" />

component.java:

@Override
public void process(ResponseBuilder rb) throws IOException {
    DocSet docs = rb.getResults().docSet;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    FieldCache.StringIndex slat =
        FieldCache.DEFAULT.getStringIndex(searcher.getReader(), "latitude");
    DocIterator iter = docs.iterator();
    while (iter.hasNext()) {
        int docID = iter.nextDoc();
        String tlat = slat.lookup[slat.order[docID]];
        if (tlat != null) {
            float lat = Float.parseFloat(tlat); // Exception!
        }
    }
}
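A possible cause: getStringIndex hands back the raw trie-encoded terms of a TrieFloatField, which won't parse as a float. A minimal sketch of an alternative, assuming the Lucene/Solr 3.x FieldCache API (since the field is also stored, searcher.doc(docID).get("latitude") would be a slower per-document fallback):

// sketch: ask the FieldCache for floats directly, using the numeric-aware parser
float[] lats = FieldCache.DEFAULT.getFloats(
        searcher.getReader(), "latitude",
        FieldCache.NUMERIC_UTILS_FLOAT_PARSER);

DocIterator iter = docs.iterator();
while (iter.hasNext()) {
    int docID = iter.nextDoc();
    float lat = lats[docID]; // no string parsing needed
}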

Thanks,
Pablo


Re: How to search on specific file types ?

2011-09-13 Thread Chris Hostetter

: 1- How can I put the file extension into my index? I'm using Nutch to
: crawling web pages and sending Nutch's data to Solr for indexing. and I have
: no idea to put the file extension to my index.
: 2- please give me some help links about mime type. I'm new to Solr and don't
: know anything about mime type. please note that I should index data of Nutch
: and I couldn't find useful commands in Nutch tutorial for advanced indexing!
: thank you very much

I think you need to ask on the nutch users' list about the type of schema 
nutch uses when indexing into Solr, whether it creates a specific field for 
file extension, and/or how you can modify the nutch indexer to create a 
field like that for you.

Assuming you get nutch to create a field named "extension", you can query 
solr for only docs that have a certain extension by adding it as an fq...

q=what i want&fq=extension:doc


-Hoss


Re: How to search on specific file types ?

2011-09-13 Thread Markus Jelsma

 1- How can I put the file extension into my index? I'm using Nutch to
 crawling web pages and sending Nutch's data to Solr for indexing. and I
 have no idea to put the file extension to my index.

To get the file extension in a separate field you can copyField the url and 
use Solr's char pattern replace filter to strip away everything up to the last 
dot, if there is any.
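For illustration, a rough schema.xml sketch of what that could look like (the field and type names are made up, and the exact pattern is an assumption):

<!-- hypothetical: copy the url into a field that keeps only the extension -->
<fieldType name="file_ext" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- keep only what follows the last dot -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^.*\.([A-Za-z0-9]+)$" replacement="$1" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="extension" type="file_ext" indexed="true" stored="true"/>
<copyField source="url" dest="extension"/>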

 2- please give me some help links about mime type. I'm new to Solr and
 don't know anything about mime type. please note that I should index data
 of Nutch and I couldn't find useful commands in Nutch tutorial for
 advanced indexing! thank you very much

Use Nutch's index-more plugin. By default it'll add two or three values to a 
multi-valued field (type): both the sub-types and the complete mime-type, if I'm 
not mistaken. There's a configuration directive to have it only index the 
complete mime-type.

 
 On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT 
jay.jae...@dot.wi.gov wrote:
  Some possibilities:
  
  1) Put the file extension into your index (that is what we did when we
  were testing indexing documents with Solr)
  2) Put a mime type for the document into your index.
  3) Put the whole file name / URL into your index, and match on part of
  the name.  This will give some false positives.
  
  JRJ
  
  -Original Message-
  From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
  Sent: Monday, September 12, 2011 5:58 AM
  To: solr-user@lucene.apache.org
  Subject: Fwd: How to search on specific file types ?
  
  Hello
  I want to search on articles. So need to find only specific files like
  doc, docx, and pdf.
  I don't need any html pages. Thus the result of our search should only
  consist of doc, docx, and pdf files.
  can you help me?


Re: using a function query with OR and spaces?

2011-09-13 Thread Jason Toy
I wrote the title wrong, it's a filter query, not a function query; thanks
for the correction.
The field is a string. I had tried fq=stats_s:"New York" before and that
did not work; I'm puzzled as to why this didn't work.
I tried out your (b) suggestion and that worked, thanks!

On Tue, Sep 13, 2011 at 9:00 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Subject: using a function query with OR and spaces?

 First off, what you are asking about is a filter query not a function
 query

 https://wiki.apache.org/solr/CommonQueryParameters#fq

 : I had queries breaking on me when there were spaces in the text I was
 : searching for. Originally I had :
 :
 : fq=state_s:New York
 : and that would break, I found a work around by using:
 :
 : fq={!raw f=state_s}New York

 assuming the field is a StrField, the "raw" or "term" QParsers will work,
 or you can quote the value using something like fq=stats_s:"New York"

 : My problem now is doing this with an OR query,  this is what I have now,
 but
 : it doesn't work:
...
 : fq=({!raw f=country_s}United States OR {!raw f=city_s}New York

 That's because:

 a) local params (ie: the {! ...} syntax) must come at the start of a Solr
 param as an instruction of how to parse it.

 b) the raw and term QParsers don't support *any* query markup/syntax
 (like OR modifiers).  If you want to build a complex query using
 multiple clauses that are constructed using specific QParsers, you need to
 build them up using multiple query params and/or the _query_ hook in the
 LuceneQParser...

 fq=_query_:"{!term f=state_s}New York" OR _query_:"{!term
 f=country_s}United States"

 https://wiki.apache.org/solr/LocalParams
 http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/


 -Hoss



Re: Adding Query Filter custom implementation to Solr's pipeline

2011-09-13 Thread Chris Hostetter

: If you do need to implement something truely custom, writing it as your 
: own QParser to trigger via an fq can be advantageous so it can cached 
: and re-used by many queries.

I forgot to mention a very cool new feature that is about to be released 
in Solr 3.4

You can now instruct Solr that an fq filter query should not be cached, 
in which case Solr will only consult it after executing the main query -- 
which can be handy if you have some filtering logic that is very expensive 
to compute for each document, and you only want to evaluate it for documents 
that have already been matched by the main query and all other filter 
queries.

Details are on the wiki...

https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
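For reference, the syntax described there looks roughly like this (the frange example is only illustrative):

fq={!cache=false}inStock:true
fq={!frange l=10 u=100 cache=false cost=150}mul(popularity,price)

The cost hint lets the most expensive non-cached filters run last, against the fewest candidate documents.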

-Hoss


Re: How to plug a new ANTLR grammar

2011-09-13 Thread Jason Toy
I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi,

 The standard lucene/solr parsing is nice but not really flexible. I
 saw questions and discussion about ANTLR, but unfortunately never a
 working grammar, so... maybe you find this useful:

 https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

 In the grammar, the parsing is completely abstracted from the Lucene
 objects, and the parser is not mixed with Java code. At first it
 produces structures like this:

 https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

 But now I have a problem. I don't know if I should use query parsing
 framework in contrib.

 It seems that the qParser in contrib can use different parser
 generators (the default JavaCC, but also ANTLR). But I am confused and
 I don't understand this new queryParser from contrib. It is really
 very confusing to me. Is there any benefit in trying to plug the ANTLR
 tree into it? Because looking at the AST pictures, it seems that with
 a relatively simple tree walker we could build the same queries as the
 current standard lucene query parser. And it would be much simpler and
 flexible. Does it bring something new? I have a feeling I miss
 something...

 Many thanks for help,

  Roman




-- 
- sent from my mobile
6176064373


Out of memory

2011-09-13 Thread Rohit
I have solr running on a machine with 18Gb RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are
given below.

 

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

 

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than the past 15 days, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:  http://about.me/rohitg http://about.me/rohitg

 



Lucene Grid question

2011-09-13 Thread sol myr
Hi,

I have a huge Lucene index, which I'd like to split between machines (Grid).

E.g. say I have a chain of book-stores, in different countries, and I'm aiming 
for the following:
- Each country has its own index file, on its own machine (e.g. books from 
Japan are indexed on machine japan1)
- Most users search only within their own country (e.g. search only the 
japan1 index)
- But sometimes, they might ask to search the entire chain (all countries), 
meaning some sort of map/reduce (=collect data from all countries).


The main challenge is the entire chain search, especially if I want 
reasonable ranking.

After some investigation (+great help from Hibernate Search forum), I've seen 
the following suggestions:


1) Implement a LuceneDirectory that transparently spreads across several 
machines.

I'm not sure how the Search would work - can I ask each index for *relevant* 
data only?
Or would I need to maintain one huge combined file, allowing random access 
for the Searcher?


2) Run an IndexReader on each machine.

They tell me each reader can report its relevant term-frequencies, and based on 
that I can fetch relevant results from each machine.
Apparently the ranking won't be perfect (for the overall result), but bearable.

Now, I'm not familiar with Lucene internals, and would really appreciate your 
views on it.
- Any good articles on Lucene Gridding?
- Any idea whether approach #1 makes any sense (IMHO it's not very sensible if 
I need to merge everything to a single huge file).
- Any good implementations (of either approaches)? So far I found Hibernate 
Search 4, and Solandra.


Thanks very much.



Using the contrib flexible query parser in Solr

2011-09-13 Thread Michael Ryan
Has anyone used the Flexible Query Parser 
(https://issues.apache.org/jira/browse/LUCENE-1567) in Solr?  I'm just starting 
to look at it for the first time and was wondering if it is something that can 
be dropped into Solr fairly easily, or if more extensive changes are needed.  I 
thought perhaps someone had already done this, but I couldn't find anything in 
the Solr bug tracker.

-Michael


Re: How to plug a new ANTLR grammar

2011-09-13 Thread Peter Keegan
Roman,

I'm not familiar with the contrib, but you can write your own Java code to
create Query objects from the tree produced by your lexer and parser
something like this:

StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(
    new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
StandardLuceneGrammarParser.query_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

void parseTree(Tree t) {

    // recursively walk the tree, visiting each node

    visit(node);

}

void visit(Tree node) {

    switch (node.getType()) {
    case StandardLuceneGrammarParser.AND:
        // Create a BooleanQuery, push it onto the stack
        ...
    }
}

I use the stack to build up the final Query from the queries produced in the
tree parsing.

Hope this helps.
Peter


On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy jason...@gmail.com wrote:

 I'd love to see the progress on this.

 On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com
 wrote:

  Hi,
 
  The standard lucene/solr parsing is nice but not really flexible. I
  saw questions and discussion about ANTLR, but unfortunately never a
  working grammar, so... maybe you find this useful:
 
 
 https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr
 
  In the grammar, the parsing is completely abstracted from the Lucene
  objects, and the parser is not mixed with Java code. At first it
  produces structures like this:
 
 
 https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html
 
  But now I have a problem. I don't know if I should use query parsing
  framework in contrib.
 
  It seems that the qParser in contrib can use different parser
  generators (the default JavaCC, but also ANTLR). But I am confused and
  I don't understand this new queryParser from contrib. It is really
  very confusing to me. Is there any benefit in trying to plug the ANTLR
  tree into it? Because looking at the AST pictures, it seems that with
  a relatively simple tree walker we could build the same queries as the
  current standard lucene query parser. And it would be much simpler and
  flexible. Does it bring something new? I have a feeling I miss
  something...
 
  Many thanks for help,
 
   Roman
 



 --
 - sent from my mobile
 6176064373



Re: OOM issue

2011-09-13 Thread Erick Erickson
Multiple webapps will not help you, they still rely on the same underlying
memory. In fact, it'll make matters worse since they won't share
resources.

So questions become:
1 Why do you have 10 cores? Putting 10 cores on the same machine
doesn't really do much. It can make lots of sense to put 10 cores on the
same machine for *indexing*, then replicate them out. But putting
10 cores on one machine in hopes of making better use of memory
isn't useful. It may be useful to just go to one core.

2 Indexing, reindexing and searching on a single machine is requiring a
lot from that machine. Really you should consider having a master/slave
setup.

3 But assuming more hardware of any sort isn't in the cards, sure, reduce
your cache sizes. Look at ramBufferSizeMB and make it small (a minimal
snippet follows below this list).

4 Consider indexing with Tika via SolrJ and only sending the finished
document to Solr.
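For item 3, the setting Erick mentions lives in solrconfig.xml; a minimal sketch (16 is just an illustrative value):

<!-- solrconfig.xml: cap the RAM used to buffer documents during indexing -->
<indexDefaults>
  <ramBufferSizeMB>16</ramBufferSizeMB>
</indexDefaults>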

Best
Erick

On Mon, Sep 12, 2011 at 5:42 AM, Manish Bafna manish.bafna...@gmail.com wrote:
 Reducing the cache sizes is definitely going to reduce heap usage.

 Can you run those xlsx files separately with Tika and see if you are getting
 the OOM issue.

 On Mon, Sep 12, 2011 at 3:09 PM, abhijit bashetti abhijitbashe...@gmail.com
 wrote:

 I am facing the OOM issue.

  Other than increasing the RAM, can we change some other parameters to
  avoid the OOM issue?


 such as minimizing the filter cache size , document cache size etc.

 Can you suggest me some other option to avoid the OOM issue?


 Thanks in advance!


 Regards,

 Abhijit




Re: Document row in solr Result

2011-09-13 Thread Erick Erickson
Not sure if it really applies, but consider the
QueryElevationComponent. It can force
the display of certain documents (identified by search term) to the
top of the results
list.

Best
Erick

On Mon, Sep 12, 2011 at 5:44 AM, Eric Grobler impalah...@googlemail.com wrote:
 Hi Pierre,

 Great idea, that will speed things up!

 Thank your very much.

 Regards
 Ericz


 On Mon, Sep 12, 2011 at 10:19 AM, Pierre GOSSE pierre.go...@arisem.com wrote:

 Hi Eric,

 If you want a query informing one customer of its product row at any given
 time, the easiest way is to filter on submission date greater than this
 customer's and return the result count. If you have 500 products with a
 later submission date, your row number is 501.
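For illustration, using the field names from Eric's example, such a count-only query might look like this (the date value is the product's own submission date, formatted as the field expects):

q=category:iphone&fq=submissiondate:{2011-08-11T17:22:00Z TO *}&rows=0

numFound then gives the number of newer entries, and the product's row is numFound + 1.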

 Hope this helps,

 Pierre


 -----Original Message-----
 From: Eric Grobler [mailto:impalah...@googlemail.com]
 Sent: Monday, September 12, 2011 11:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Document row in solr Result

 Hi Manish,

 Thank you for your time.

 For upselling reasons I want to inform the customer that:
 your product is on the last page of the search result. However, click here
 to put your product back on the first page...


 Here is an example:
 I have a phone with productid 635001 in the iphone category.
 When I sort this category by submissiondate this product will be near the
 end of the result (on row 9863 in this example).
 At the moment I have to scan nearly 1 rows in the client to determine
 the position of this product.
 Is there a more efficient way to find the position of a specific document
 in
 a resultset without returning the full result?

 q=category:iphone
 fl=productid
 sort=submissiondate desc
 rows=1

  row productid submissiondate
   1 656569    2011-09-12 08:12
   2 656468    2011-09-12 08:03
   3 656201    2011-09-11 23:41
 ...
 9863 635001    2011-08-11 17:22
 ...
 9922 634423    2011-08-10 21:51

 Regards
 Ericz

 On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna manish.bafna...@gmail.com
 wrote:

  You might not be able to find the row index.
  Can you post your query in detail. The kind of inputs and outputs you are
  expecting.
 
  On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler impalah...@googlemail.com
  wrote:
 
   Hi Manish,
  
   Thanks for your reply - but how will that return me the row index of
 the
   original query.
  
   Regards
   Ericz
  
   On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna 
 manish.bafna...@gmail.com
   wrote:
  
fq - filter query parameter searches within the results.
   
On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler 
  impalah...@googlemail.com
wrote:
   
 Hi Solr experts,

 If you have a site with products sorted by submission date, the
  product
of
 a
 customer might be on page 1 on the first day, and then move down to
   page
x
 as other customers submit newer entries.

 To find the row of a product you can of course run the query and
 loop
 through the result until you find the specific productid like:
 q=category:myproducttype
 fl=productid
 sort=submissiondate desc
 rows=1

 But is there perhaps a more efficient way to do this? Maybe a
 special
 syntax
 to search within the result.

 Thanks
 Ericz

   
  
 




RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
Nicely put.  ;^)

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, September 13, 2011 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: can indexing information stored in db rather than filesystem?

On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:

 As suggested, I agree that we are losing many of the benefits of Solr/Lucene,
 but I still want to store the index output (index files) in a DB table. Please
 suggest what steps I need to follow to configure the DB with the Solr engine.

The steps are:

1. write the Java code to do that
2. submit it as contrib, because it is such a bad idea that I doubt it will be 
added to the common code

wunder
--
Walter Underwood




RE: Out of memory

2011-09-13 Thread Jaeger, Jay - DOT
numDocs is not the number of documents in memory.  It is the number of 
documents currently in the index (which is kept on disk).  Same goes for 
maxDocs, except that it is a count of all of the documents that have ever been 
in the index since it was created or optimized (including deleted documents).

Your subject indicates that something is giving you some kind of Out of memory 
error.  We might better be able to help you if you provide more information 
about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have solr running on a machine with 18Gb RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are
given below.

 

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

 

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data more than the past 15 days, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me:  http://about.me/rohitg http://about.me/rohitg

 



Can index size increase when no updates/optimizes are happening?

2011-09-13 Thread Yury Kats
One of my users observed that the index size (in bytes)
increased over night. There was no indexing activity
at that time, only querying was taking place.

Running optimize brought the index size back down to
what it was when indexing finished the day before.

What could explain that?



Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
Hey all

I want to show all documents of a certain type. The documents should be 
ordered by the index-time document boost.

So I expected that this would work:

/select?debugQuery=onq=doctype:musicq.op=ORqt=standard

But in fact every document gets the same score:

0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
  1.0 = tf(termFreq(doctype:music)=1)
  0.7306 = idf(docFreq=37138, maxDocs=37138)
  1.0 = fieldNorm(field=doctype, doc=1)


So I am a bit confused now. When is the (index time) document boost evaluated? 
(My understanding was that during indexing the document field values are 
multiplied by the boost, and that during search this results in higher scores?)

Is there a better way to get a list of all documents (matching a simple where 
clause) sorted by documents boost?

Thanks for any hints.

Daniel




Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
This solution doesn't seem to be working for me.

I am using Solr trunk and I have the same question as Bernd with a small
twist: the field that should NOT be empty, happens to be a derived field
called price, see the config below:

<entity ...
  transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow"
  ...>

  <field column="description"
         xpath="/rss/channel/item/description"
         />

  <field column="price"
         regex=".*\$(\d*.\d*)"
         sourceColName="description"
         />
...
</entity>

I have also changed the sample script to check the price field instead of
the link field that was being used as an example in this thread earlier:

<script>
<![CDATA[
function skipRow(row) {
    var price = row.get( 'price' );
    if ( price == null || price == '' ) {
        row.put( '$skipRow', 'true' );
    }
    return row;
}
]]>
</script>

Does anyone have any thoughts on what I'm missing?
Thanks!
- Pulkit

On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling 
bernd.fehl...@uni-bielefeld.de wrote:

 Hi Gora,

 thanks a lot, very nice solution, works perfectly.
 I will dig more into ScriptTransformer, seems to be very powerful.

 Regards,
 Bernd

 Am 08.01.2011 14:38, schrieb Gora Mohanty:
  On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
  bernd.fehl...@uni-bielefeld.de wrote:
  Hello list,
 
  is it possible to load only selected documents with
 XPathEntityProcessor?
  While loading docs I want to drop/skip/ignore documents with missing
 URL.
 
  Example:
  <documents>
    <document>
      <title>first title</title>
      <id>identifier_01</id>
      <link>http://www.foo.com/path/bar.html</link>
    </document>
    <document>
      <title>second title</title>
      <id>identifier_02</id>
      <link></link>
    </document>
  </documents>
 
  The first document should be loaded, the second document should be
 ignored
  because it has an empty link (should also work for missing link field).
  [...]
 
  You can use a ScriptTransformer, along with $skipRow/$skipDoc.
  E.g., something like this for your data import configuration file:
 
  <dataConfig>
  <script><![CDATA[
    function skipRow(row) {
      var link = row.get( 'link' );
      if( link == null || link == '' ) {
        row.put( '$skipRow', 'true' );
      }
      return row;
    }
  ]]></script>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
      baseDir="/home/gora/test" fileName=".*xml" newerThan='NOW-3DAYS'
      recursive="true" rootEntity="false" dataSource="null">
      <entity name="top" processor="XPathEntityProcessor"
        forEach="/documents/document" url="${f.fileAbsolutePath}"
        transformer="script:skipRow">
        <field column="link" xpath="/documents/document/link"/>
        <field column="title" xpath="/documents/document/title"/>
        <field column="id" xpath="/documents/document/id"/>
      </entity>
    </entity>
  </document>
  </dataConfig>
 
  Regards,
  Gora



Re: DIH load only selected documents with XPathEntityProcessor

2011-09-13 Thread Pulkit Singhal
Oh, and I'm sure that I'm using Java 6 because the properties from the Solr
webpage spit out:

java.runtime.version = 1.6.0_26-b03-384-10M3425


On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:

 This solution doesn't seem to be working for me.

 I am using Solr trunk and I have the same question as Bernd with a small
 twist: the field that should NOT be empty, happens to be a derived field
 called price, see the config below:

  <entity ...
    transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow"
    ...>

  <field column="description"
         xpath="/rss/channel/item/description"
         />

  <field column="price"
         regex=".*\$(\d*.\d*)"
         sourceColName="description"
         />
  ...
  </entity>

  I have also changed the sample script to check the price field instead of
  the link field that was being used as an example in this thread earlier:


  <script>
  <![CDATA[
  function skipRow(row) {
      var price = row.get( 'price' );
      if ( price == null || price == '' ) {
          row.put( '$skipRow', 'true' );
      }
      return row;
  }
  ]]>
  </script>

 Does anyone have any thoughts on what I'm missing?
 Thanks!
 - Pulkit


 On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling 
 bernd.fehl...@uni-bielefeld.de wrote:

 Hi Gora,

 thanks a lot, very nice solution, works perfectly.
 I will dig more into ScriptTransformer, seems to be very powerful.

 Regards,
 Bernd

 Am 08.01.2011 14:38, schrieb Gora Mohanty:
  On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
  bernd.fehl...@uni-bielefeld.de wrote:
  Hello list,
 
  is it possible to load only selected documents with
 XPathEntityProcessor?
  While loading docs I want to drop/skip/ignore documents with missing
 URL.
 
  Example:
   <documents>
     <document>
       <title>first title</title>
       <id>identifier_01</id>
       <link>http://www.foo.com/path/bar.html</link>
     </document>
     <document>
       <title>second title</title>
       <id>identifier_02</id>
       <link></link>
     </document>
   </documents>
 
  The first document should be loaded, the second document should be
 ignored
  because it has an empty link (should also work for missing link field).
  [...]
 
  You can use a ScriptTransformer, along with $skipRow/$skipDoc.
  E.g., something like this for your data import configuration file:
 
   <dataConfig>
   <script><![CDATA[
     function skipRow(row) {
       var link = row.get( 'link' );
       if( link == null || link == '' ) {
         row.put( '$skipRow', 'true' );
       }
       return row;
     }
   ]]></script>
   <dataSource type="FileDataSource" />
   <document>
     <entity name="f" processor="FileListEntityProcessor"
       baseDir="/home/gora/test" fileName=".*xml" newerThan='NOW-3DAYS'
       recursive="true" rootEntity="false" dataSource="null">
       <entity name="top" processor="XPathEntityProcessor"
         forEach="/documents/document" url="${f.fileAbsolutePath}"
         transformer="script:skipRow">
         <field column="link" xpath="/documents/document/link"/>
         <field column="title" xpath="/documents/document/title"/>
         <field column="id" xpath="/documents/document/id"/>
       </entity>
     </entity>
   </document>
   </dataConfig>
 
  Regards,
  Gora





Managing solr machines (start/stop/status)

2011-09-13 Thread Jamie Johnson
I know this isn't a solr specific question but I was wondering what
folks do in regards to managing the machines in their solr cluster?
Are there any recommendations for how to start/stop/manage these
machines?  Any suggestions would be appreciated.


DIH skipping imports with skipDoc vs skipDoc

2011-09-13 Thread Pulkit Singhal
Hello,

1)  The documented explanation of skipDoc and skipRow is not enough
for me to discern the difference between them:
$skipDoc : Skip the current document . Do not add it to Solr. The
value can be String true/false
$skipRow : Skip the current row. The document will be added with rows
from other entities. The value can be String true/false
Can someone please elaborate and help me out with an example?

2) I am working off the Solr trunk (4.x) and nothing I do seems to
make the import for a given row/doc get skipped.
As proof I've added these tests to my data import xml and all the rows
are still getting indexed!!!
If anyone sees something wrong with my config please tell me.
Make sure to take note of the blatant use of row.put( '$skipDoc',
'true' ); and <field column="$skipDoc" template="true"/>
Yet stuff still gets imported, this is beyond me. Need a fresh pair of eyes :)

<dataConfig>
<dataSource type="URLDataSource" />
<script>
<![CDATA[
function skipRow(row) {
    row.put( '$skipDoc', 'true' );
    return row;
}
]]>
</script>
<document>
<entity name="amazon"
        pk="link"
        url="http://www.amazon.com/gp/rss/new-releases/apparel/1040660/ref=zg_bsnr_1040660_rsslink"
        processor="XPathEntityProcessor"
        forEach="/rss/channel | /rss/channel/item"
        transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow,TemplateTransformer">
    <field column="description"
           xpath="/rss/channel/item/description"
           />
    <field column="price"
           regex=".*\$(\d*.\d*)"
           sourceColName="description"
           />
    <field column="$skipDoc" template="true"/>
    <field column="link" xpath="/rss/channel/item/link" />
</entity>
</document>
</dataConfig>


Thanks!
- Pulkit


Re: select query does not find indexed pdf document

2011-09-13 Thread Michael Dockery
Thank you for your informative reply.

I would like to start simple by combining both filename and content 
  into the same default search field
   ...which my default schema xml calls  text
...
<defaultSearchField>text</defaultSearchField>
...

also:
-case and accent insensitive
-no splits on numb3rs
-no highlights 
-text processing same for index and search

however I would like:
- ngrams preferably (partial/prefix word/token search)


what schema mod's would be needed?

also what curl syntax to submit/index a pdf (with filename and content combined 
into the default search field)?




From: Bob Sandiford bob.sandif...@sirsidynix.com
To: Michael Dockery dockeryjava...@yahoo.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Monday, September 12, 2011 1:38 PM
Subject: RE: select query does not find indexed pdf document

Hi, Michael.

Well, the stock answer is, 'it depends'

For example - would you want to be able to search filename without searching 
file contents, or would you always search both of them together?  If both, then 
copy both the file name and the parsed file content from the pdf into a single 
search field, and you can set that up as the default search field.
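As a rough illustration of that single-field option (the field names below are assumptions, not taken from Michael's schema):

<!-- hypothetical: funnel both the file name and extracted content into one field -->
<field name="filename" type="text" indexed="true" stored="true"/>
<field name="content"  type="text" indexed="true" stored="true"/>
<field name="text"     type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="filename" dest="text"/>
<copyField source="content"  dest="text"/>

<defaultSearchField>text</defaultSearchField>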

Or - what kind of processing / normalizing do you want on this data?  Case 
insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. 
TheVeryIdea), do you want that split on the case changes?  (but then watch out 
for things like iPad)  If a 'word' contains numbers, do want them left 
together, or separated?  Do you want stemming (where searching for 'stemming' 
would also find 'stem', 'stemmed', that sort of thing?)  Is this always 
English, or are the other languages involved.  Do you want the text processing 
to be the same for indexing vs searching?  Do you want to be able to find hits 
based on the first few characters of a term?  (ngrams)

Do you want to be able to highlight text segments where the search terms were 
found?

probably you want to read up on the various tokenizers and filters that are 
available.  Do some prototyping and see how it looks.

Here's a starting point: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Basically, there is no 'one size fits all' here.  Part of the power of Solr / 
Lucene is its configurability to achieve the results your business case calls 
for.  Part of the drawback of Solr / Lucene - especially for new folks - is its 
configurability to achieve the results you business case calls for. :)

Anyone got anything else to suggest for Michael?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.comhttp://www.sirsidynix.com/

From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
Sent: Monday, September 12, 2011 1:18 PM
To: Bob Sandiford
Subject: Re: select query does not find indexed pdf document

thank you.  that worked.

Any tips for   very   very  basic setup of the schema xml?
   or is the default basic enough?

I basically only want to search search on
        filename   and    file contents


From: Bob Sandiford bob.sandif...@sirsidynix.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Michael 
Dockery dockeryjava...@yahoo.com
Sent: Monday, September 12, 2011 10:04 AM
Subject: RE: select query does not find indexed pdf document

Um - looks like you specified your id value as pdfy, which is reflected in 
the results from the *:* query, but your id query is searching for vpn, 
hence no matches...

What does this query yield?

http://www/SearchApp/select/?q=id:pdfy

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | 
bob.sandif...@sirsidynix.com
www.sirsidynix.com

 -Original Message-
 From: Michael Dockery 
 [mailto:dockeryjava...@yahoo.com]
 Sent: Monday, September 12, 2011 9:56 AM
 To: solr-user@lucene.apache.org
 Subject: Re: select query does not find indexed pdf document

 http://www/SearchApp/select/?q=id:vpn

 yields this:

   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">15</int>
       <lst name="params">
         <str name="q">id:vpn</str>
       </lst>
     </lst>
     <result name="response" numFound="0" start="0"/>
   </response>


 *

  http://www/SearchApp/select/?q=*:*

 yields this:

   <?xml version="1.0" encoding="UTF-8" ?>
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">16</int>
       <lst name="params">
         <str name="q">*.*</str>
       </lst>
     </lst>
     <result name="response" numFound="1" start="0">
       <doc>
         <str name="author">doc</str>
         <arr name="content_type">
           <str>application/pdf</str>
         </arr>
         <str name="id">pdfy</str>
         <date name="last_modified">2011-05-20T02:08:48Z</date>
         <arr name="title">
           <str>dmvpndeploy.pdf</str>
         </arr>
       </doc>
     </result>
   </response>


 From: Jan Høydahl jan@cominvent.com
 To: 

Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Chris Hostetter

: I want to show all documents with of a certain type. The documents 
: should be ordered by the index time document boost.

...

: But in fact every document gets the same score:
: 
: 0.7306 = (MATCH) fieldWeight(doctype:music in 1), product of:
:   1.0 = tf(termFreq(doctype:music)=1)
:   0.7306 = idf(docFreq=37138, maxDocs=37138)
:   1.0 = fieldNorm(field=doctype, doc=1)

Index boosts are folded into the fieldNorm.  By the looks of it, you are 
using omitNorms="true" on the doctype field.

: Is there a better way to get a list of all documents (matching a simple 
: where clause) sorted by documents boost?

fieldNorms are very coarse.  In my opinion, if you have a 
weighting you want to use to affect score sort, it's better to index 
that weight as a numeric field, and explicitly factor it into the score 
using a function query...

q={!boost b=yourWeightField v=$qq}&qq=doctype:music

More info...

https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b

-Hoss


where is the SOLR_HOME ?

2011-09-13 Thread ahmad ajiloo
Hi
In this page
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory)
it is said:
"Note: to use this filter, see solr/contrib/analysis-extras/README.txt for
instructions on which jars you need to add to your SOLR_HOME/lib"
I can't find SOLR_HOME/lib!
1- Is it apache-solr-3.3.0\example\solr? There is no directory named lib there.
I created the example/solr/lib directory, copied the jar files into it, and tested
these expressions in solrconfig.xml:
<lib dir="../../example/solr/lib" />
<lib dir="./lib" />
<lib dir="../../../example/solr/lib" /> (for more assurance!)
but it doesn't work and still gives the following errors!

2- or apache-solr-3.3.0\? There is no directory named lib there.
3- or apache-solr-3.3.0\example? There is a lib directory. I copied the 4
libraries that exist in solr/contrib/analysis-extras/
to apache-solr-3.3.0\example\lib, but some errors occur when loading the page
"http://localhost:8983/solr/admin":

I use Nutch to crawl the web and fetch web pages, and I send Nutch's data
to Solr for indexing. According to the Nutch tutorial
(http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch) I
should copy Nutch's schema.xml to Solr's conf directory.
So I added all of my required analyzers, like ICUNormalizer2FilterFactory, to
this new schema.xml.


this is schema.xml (the field types I added are marked with "ADDED" comments below):

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="nutch" version="1.3">
<types>
    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.StopFilterFactory"
                ignoreCase="true" words="stopwords.txt"/>
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory"
                protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

    <!-- ADDED -->
    <fieldType name="text_icu" class="solr.TextField"
        autoGeneratePhraseQueries="false">
        <analyzer>
            <tokenizer class="solr.ICUTokenizerFactory"/>
        </analyzer>
    </fieldType>
    <fieldType name="icu_sort_en" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.ICUCollationKeyFilterFactory"
                locale="en" strength="primary"/>
        </analyzer>
    </fieldType>
    <fieldType name="normalized" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf"
                mode="compose"/>
        </analyzer>
    </fieldType>
    <fieldType name="folded" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
        </analyzer>
    </fieldType>
    <fieldType name="transformed" class="solr.TextField">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.ICUTransformFilterFactory"
                id="Traditional-Simplified"/>
        </analyzer>
    </fieldType>
    <!-- end ADDED -->

    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"/>
        </analyzer>
    </fieldType>

    <!-- ADDED -->
    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <charFilter class="solr.PersianCharFilterFactory"/>
            <tokenizer class="solr.StandardTokenizerFactory"/>
        </analyzer>
    </fieldType>
    <fieldType name="text_fanormal" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer>
            <charFilter 

Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
Thanks, that helped!

On Sep 14, 2011, at 4:56 PM, Chris Hostetter wrote:

 
 
  fieldNorms are very coarse.  In my opinion, if you have a 
 weighting you want to use to affect score sort, it's better to index 
 that weight as a numeric field, and explicitly factor it into the score 
 using a function query...
 
    q={!boost b=yourWeightField v=$qq}&qq=doctype:music
 
 More info...
 
 https://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
 http://www.lucidimagination.com/blog/2011/06/20/solr-powered-isfdb-part-10/
 https://github.com/lucidimagination/isfdb-solr/commit/75f830caa1a11fd97ab48d6428096cf63f53cb3b
 
 -Hoss



Re: Document Boost not evaluated when using standard Query Type?

2011-09-13 Thread Daniel Pötzinger
 
  fieldNorms are very coarse.  In my opinion, if you have a 
 weighting you want to use to affect score sort, it's better to index 
 that weight as a numeric field, and explicitly factor it into the score 
 using a function query...

I see that in this use case this makes most sense - thanks.

But why are fieldNorms in general very coarse?

Thanks,
Daniel



DIH delta last_index_time

2011-09-13 Thread Maria Vazquez
Hi,
How do you handle the situation where the time on the server running Solr
doesn't match the time in the database?
I'm using the last_index_time saved by Solr in the delta query, checking it
against the lastModifiedDate field in the database, but the times are not in sync
so I might lose some changes.
Can we use something else other than last_index_time? Maybe something like
last_pk or something.
Thanks in advance.
Maria