Re: Show filename in search result using a FileListEntityProcessor
You should use file instead of fileName in the column attribute: <field column="file" name="fileName"/>. Don't forget to add 'fileName' to the schema.xml in the fields section: <field name="fileName" type="string" indexed="true" stored="true"/>. Have fun, Daniel Rijkhof On Mon, May 16, 2011 at 4:20 PM, Marcel Panse marcel.pa...@gmail.com wrote: Hi, thanks for the reply. I tried a couple of things, both in the tika-test entity and in the entity named 'f'. In the tika-test entity I tried: <field column="fileName" name="${f.fileName}"/>, <field column="fileName" name="${f.file}"/>, and even <field column="fileName" name="${f.fileAbsolutePath}"/>. I also tried doing things in the entity 'f' like: <field column="fileName" name="fileName"/> and <field column="fileName" name="file"/>. None of it works. I also added fileName to the schema's fields section, like: <field name="fileName" type="string" indexed="true" stored="true"/>. Doesn't help. Can anyone provide me with a working example? I'm pretty stuck here on something that seems really trivial and simple :-( On Sat, May 14, 2011 at 22:56, kbootz kbo...@caci.com wrote: There is a JIRA item (can't recall it atm) that addresses the issue with the docs. I'm running 3.1 and per your example you should be able to get it using ${f.file}. I *think* it should also be in the entity desc., but I'm also new and that's just how I access it. GL -- View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html Sent from the Solr - User mailing list archive at Nabble.com.
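Putting Daniel's advice together, a minimal data-config.xml sketch might look like the following. The baseDir, the file-name pattern, and the Tika sub-entity details are illustrative assumptions, not taken from the original messages:

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- 'file' is the implicit column FileListEntityProcessor exposes
           for the file name; map it onto the schema field 'fileName' -->
      <field column="file" name="fileName"/>
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key point from the thread is that the mapping lives on the outer 'f' entity and uses column="file", while the schema must declare a stored fileName field.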
Re: filter cache and negative filter query
: query that in fact returns the negative results. As a simple example, : I believe that, for a boolean field, -field:true is exactly the same as : +field:false, but the former is a negative query and the latter is a that's not strictly true in all cases... * if the field is multivalued=true, a doc may contain both false and true in the field, in which case it would match +field:false but it would not match -field:true * if the field is multivalued=false and required=false, a doc may contain no value at all, in which case it would match -field:true but it would not match +field:false You're totally right. But it was just an example. I just didn't think about specifying the field to be single-valued and required. I did some testing yesterday about how filters are cached, using the admin interface. I noticed that if I perform a facet.query on a boolean field, testing it to be true or false, it always seems to add two entries to the query cache. Maybe it also adds an entry to test for nonexistence of the value? And if I perform a facet.field on the same boolean field, three new entries are inserted into the filter cache. Maybe one for true, one for false, and one for nonexistence? I really don't know what it's exactly doing, but at first sight it doesn't look like very optimal behaviour... I'm testing on the 1.4.1 LucidWorks version of Solr, using the boolean field inStock of its example schema, with its example data.
Out of memory on sorting
Hi, We are moving to a multi-core Solr installation with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I am getting the expected results and performance, except where sorting is concerned. I have an index of 13217121 documents; now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system, and we will also have simultaneous users. How can I improve this performance? Rohit
Re: Out of memory on sorting
Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. For example, under firstSearcher: <lst><str name="q">solr rocks</str><str name="start">0</str><str name="rows">10</str><str name="sort">empID asc</str></lst> On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote: Hi, We are moving to a multi-core Solr installation with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I am getting the expected results and performance, except where sorting is concerned. I have an index of 13217121 documents; now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system, and we will also have simultaneous users. How can I improve this performance? Rohit
SOLR Custom datasource integration
Hi, We are trying to build an enterprise search solution using Solr; our data source is a database which is interfaced with JPA. The solution looks like: Solr Index <- JPA <- Oracle database. We need help finding out the best approach to integrate the Solr index with JPA. We tried out two approaches. Approach 1: 1) Populating SolrInputDocument with data from JPA. 2) Updating EmbeddedSolrServer with the captured data using the SolrJ API. Approach 2: 1) Customizing the dataimporthandler of HTTPSolrServer. 2) Retrieving data in the dataimporthandler using JPA entities. Functional requirements: 1) The solution should be performant for a huge magnitude of data. 2) It should be scalable. We have a few questions which will help us decide on a solution: Which is the better approach to meet our requirements? Is it a good idea to integrate with Lucene directly instead of using EmbeddedSolrServer + JPA? If the JVM crashes, will EmbeddedSolrServer content be lost on reboot? Can we get support from the Jasper Experts team? Can we buy it? How?
Re: Highlighting does not work when using !boost as a nested query
Hi, The query is generated dynamically and can be more or less complex depending on different parameters. I'm also not free to give many details of our implementation, but I'll give you the minimal query string that fails and the relevant pieces of the config. The query string is: /select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true where id is an int and text1 and text2 are of type text. hl.fl has proven to be necessary whenever I use dismax in an inner query. Otherwise, only text2 (the default field) is highlighted, and not both fields appearing in qf. For example, q={!dismax v=$qq}... does not require hl.fl to highlight both text1 and text2, but q=+_query_:"{!dismax v=$qq}"... only highlights text2, unless I specify hl.fl. The given query is probably not minimal in the sense that some of the dismax-related parameters can be omitted and the query still fails. But the one given always fails (and adding more complexity to it does not make it work, quite obviously). Unfortunately, hl.requireFieldMatch=false does not help.
Request handler config is the following: <requestHandler name="standard" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="echoParams">explicit</str> </lst> </requestHandler> Highlighter config is the following: <highlighting> <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true"> <lst name="defaults"> <int name="hl.fragsize">100</int> </lst> </fragmenter> <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"> <lst name="defaults"> <int name="hl.fragsize">70</int> <float name="hl.regex.slop">0.5</float> <str name="hl.regex.pattern">[-\w ,/\n\&quot;']{20,200}</str> </lst> </fragmenter> <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true"> <lst name="defaults"> <str name="hl.simple.pre">&lt;em&gt;</str> <str name="hl.simple.post">&lt;/em&gt;</str> </lst> </formatter> </highlighting> If there's any other information that could be useful, just ask. Thank you very much for your help, Juan On 16/05/2011, at 23:18, Chris Hostetter wrote: : As I said in my previous message, if I issue: : q=+field1:range +field2:value +_query_:"{!dismax v=$qq}" : highlighting works. I've just discovered the problem is not just with {!boost...}. If I just add a bf parameter to the previous query, highlighting also fails. : Anybody knows what can be happening? I'm really stuck on this problem... Just a hunch, but I suspect the problem has to do with the highlighter (or maybe it's the fragment generator?) trying to determine matches from query types it doesn't understand. I thought there was a query param you could use to tell the highlighter to use an alternate query string (that would be simpler) instead of the real query ... but I'm not seeing it in the docs. hl.requireFieldMatch=false might also help (not sure). In general it would probably be helpful for folks if you could post the *entire* request you are making (full query string and all request params) along with the solrconfig.xml sections that show how your request handler and highlighter are configured. -Hoss
How do I write/build query using qf parameter of dismax handler for my use case?
Hi, How do I write/build a Solr query using the dismax handler for my application-specific use case explained below? Snippet of the fields definition from schema.xml: <field name="documentid" type="string" indexed="true" stored="true" required="true"/> <field name="companyid" type="long" indexed="true" stored="true" required="true"/> <field name="textfield1" type="text" indexed="true" stored="false" required="true"/> <field name="textfield2" type="text" indexed="true" stored="false" required="true"/> <field name="textfield3" type="text" indexed="true" stored="false" required="true"/> <uniqueKey>documentid</uniqueKey> <defaultSearchField>textfield1</defaultSearchField> Now, I want to search for documents containing solr and struts in all 3 text fields (textfield1, textfield2, textfield3), but within companyid = 100. As you can see from the above statement, companyid=100 is common here, but the search keywords should be searched only in the 3 text fields (textfield1, textfield2, textfield3). I also understand that this can be written as shown below by qualifying all the 3 text fields explicitly: http://localhost/solr/select?q=companyid:100 textfield1:(solr AND struts) textfield2:(solr AND struts) textfield3:(solr AND struts) But how do I write/build a query using the qf parameter of the dismax query handler, so that I don't need to specify all the 3 fields explicitly? The wiki says: "For each word in the query string, dismax builds a DisjunctionMaxQuery object for that word across all of the fields in the qf param." NOTE: I'm using edismax as my default query type in my Search Handler. Regards, Gnanam
RE: Out of memory on sorting
Thanks for pointing me in the right direction; now I see the configuration for firstSearcher or newSearcher. The <str name="q"> needs to be configured in advance, but in my case the q is ever changing: users can actually search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. For example, under firstSearcher: <lst><str name="q">solr rocks</str><str name="start">0</str><str name="rows">10</str><str name="sort">empID asc</str></lst>
Field collapsing patch issues
Hi All! Kindly provide me links to suitable patches that can be applied to Solr versions 1.4.1 and 3.0 so that field collapsing works properly. Thanks in advance! Isha Garg
Re: How do I write/build query using qf parameter of dismax handler for my use case?
edismax supports the full query format of the lucene parser. But you can also search using filter queries, e.g.: qf=textfield1 textfield2 textfield3&fq=textfield1:(solr AND struts)&fq=textfield2:(solr AND struts)&fq=textfield3:(solr AND struts)&fq=companyid:100 - Thanx: Grijesh www.gettinhahead.co.in
Re: filter cache and negative filter query
lookups to work with an arbitrary query, you would either need to change the cache structure from Query=>DocSet to a mapping of Query=>[DocSet,inversionBit] and store the same cache value with two keys -- both the positive and the negative; or you keep the Well, I don't know how it's working right now, but I guess that, as the positive version is being stored, when you look a negative query up you already have a similar lookup problem: either you store two keys for the same value, or you just transform the negative query into a positive canonical one before looking it up. The same could be done in this case, with the difference that, yes, you need an inversion bit stored too. The double-lookup option sounds worse, though benchmarking should be done to know for sure. Would this optimization influence only memory usage, or are smaller sets also faster to intersect, for example? Well, in any case, saving memory allows using the additional memory to speed up the application, for example with bigger caches.
Re: Highlighting does not work when using !boost as a nested query
By the way, I was wrong when saying that using bf instead of !boost did not work either. I probably hit more than one problem at the same time when I first tested that. I've retested now and this works: /select?q=+id:12345^0.01 +_query_:"{!dismax v=$qq}"&bf=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true But I don't get the multiplicative boost I'd like to use... On 19/05/2011, at 11:31, Juan Antonio Farré Basurte wrote: Hi, The query is generated dynamically and can be more or less complex depending on different parameters. I'm also not free to give many details of our implementation, but I'll give you the minimal query string that fails and the relevant pieces of the config. The query string is: /select?q=+id:12345^0.01 +_query_:"{!boost b=$dateboost v=$qq deftype=dismax}"&dateboost=recip(ms(NOW/DAY,published_date),3.16e-11,1,1)&qq=user_text&qf=text1^2 text2&pf=text1^2 text2&tie=0.1&q.alt=*:*&hl=true&hl.fl=text1 text2&hl.mergeContiguous=true where id is an int and text1 and text2 are of type text. hl.fl has proven to be necessary whenever I use dismax in an inner query. Otherwise, only text2 (the default field) is highlighted, and not both fields appearing in qf. The given query is probably not minimal in the sense that some of the dismax-related parameters can be omitted and the query still fails. But the one given always fails. Unfortunately, hl.requireFieldMatch=false does not help.
Solr book
Hello, Does anyone know if there is a v 3.1 book coming any time soon? Regards, Savvas
Re: indexing directed graph
Thank you in advance, Gora! However, I decided to create a bean for indexing, something like: ... String[] vertices; String[] edges; int[] triple_inx_levels; ... So I can search for vertex text / edge text in the vertices and edges array fields, and I hope to recover the relation from the triple_inx_levels array, where I will save indexes into the two arrays above in a specific order (with some math function I have not worked out yet). I will try it this way; I hope this will be enough for me.
Re: Solr book
Hello! Take a look at the Solr resources page on the wiki (http://wiki.apache.org/solr/SolrResources). -- Regards, Rafał Kuć http://solr.pl
RE: How do I write/build query using qf parameter of dismax handler for my use case?
edismax supports the full query format of the lucene parser. But you can also search using filter queries, e.g.: qf=textfield1 textfield2 textfield3&fq=textfield1:(solr AND struts)&fq=textfield2:(solr AND struts)&fq=textfield3:(solr AND struts)&fq=companyid:100 Is it not possible to build the query without filter queries (fq)? For example, something like this (I believe this is syntactically not correct, but something equivalent to it): q=companyid:100 AND solr AND struts&qf=textfield1,textfield2,textfield3 Basically, I'm just trying to find a way to simplify the query syntax.
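A sketch of what the thread is converging on: keep the free-text terms in q, restrict them to the three text fields with qf, and move the companyid restriction into a filter query. The parameter values come from the question; treating the default handler as a stock edismax setup is an assumption:

```text
http://localhost/solr/select?defType=edismax
    &q=solr AND struts
    &qf=textfield1 textfield2 textfield3
    &fq=companyid:100
```

Because edismax accepts full Lucene syntax, "solr AND struts" requires both terms, and each term is expanded into a DisjunctionMaxQuery across the qf fields, so none of the three fields has to be named in q itself.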
SOLR-2209
Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves, I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209, but there is no activity on that issue. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
Re: Solr book
great, thanks! So, I guess the Solr In Action and Solr Cookbook will be based on 3.1.. :) 2011/5/19 Rafał Kuć ra...@alud.com.pl Hello! Take a look at the Solr resources page on the wiki (http://wiki.apache.org/solr/SolrResources). -- Regards, Rafał Kuć http://solr.pl
Re: sorting on date field in facet query
Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here are the logic and an example: each doc has a string field someStr and a date field associated with it, and the same doc id has the same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl "http://localhost:8983/solr/select?q=someStr:network&facet=true&facet.field=id&facet.limit=1000&facet.mincount=1&rows=0" result excerpt: <lst name="facet_fields"> <lst name="id"> <int name="T-AS_1386229">54</int> <int name="T-AS_1386181">45</int> <int name="T-CP_1370095">36</int> <int name="T-AS_1377809">25</int> <int name="T-CP_1380207">18</int> <int name="T-CP_1373820">11</int> <int name="T-AS_1372073-1">8</int> <int name="T-AS_1367577">6</int> <int name="T-AS_1383141">5</int> <int name="T-AS_1383648-1">5</int> <int name="T-AS_1351183-1">4</int> </lst> </lst> Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on a date field in a facet query in Solr 3.1? -- Regards, Dmitry Kan
Re: Fuzzy search and solr 4.0
Well, the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0. But the bad news is... FuzzyQuery won't do what you need here. You need some sort of FuzzyPhraseQuery, which is able to match terms similar to one another (comp/company/corporation) by some metric. I don't know of such a query in Lucene/Solr... but it'd be a nice addition. Others have asked about this before. FuzzyQuery finds terms close to other terms, as measured by edit distance; e.g., fuzzy/wuzzy/muzzy are all edit distance one from each other. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 8:03 PM, Guilherme Aiolfi grad...@gmail.com wrote: Hi, I want to do a fuzzy search that compares a phrase to a field in Solr. For example: "abc company ltda" will be compared to "abc comp", "abc corporation", "def company ltda" (nothing to match here). The thing is that it always has to return documents sorted by score. I've found some good algorithms to do that, like StrikeAMatch [1] and JaroWinkler. Using JaroWinkler with strdist() I can do exactly that. But I'd rather use StrikeAMatch, which had a patch in the Lucene JIRA that was never committed. So I contacted the author of that patch, and he told me that I should use Solr 4.0, which now has some pretty good new fuzzy search enhancements that make StrikeAMatch seem like toys for kids. Does anyone know how I can achieve that using Solr 4.0? [1] http://www.catalysoft.com/articles/StrikeAMatch.html
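Mike's edit-distance remark can be made concrete with a small standalone sketch. This is not Lucene's actual FuzzyQuery implementation, just the textbook dynamic-programming Levenshtein metric it is based on:

```java
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions turning one string into another.
public class EditDistance {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("fuzzy", "wuzzy"));   // 1
        System.out.println(distance("comp", "company"));  // 3
    }
}
```

This also shows why FuzzyQuery cannot help with the phrase case: "comp" vs "company" is already distance 3, far beyond what a per-term fuzzy match would normally allow.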
Re: Out of memory on sorting
The warming queries warm up the caches used in sorting, so just including the sort=... clause will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup; you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1) How is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2) How many distinct terms? I'm guessing one per document; this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3) How much memory are you allocating for the JVM? 4) What other fields are you sorting on, and how many unique values are in each? Solr Admin can help you here. Best Erick On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote: Thanks for pointing me in the right direction; now I see the configuration for firstSearcher or newSearcher. The <str name="q"> needs to be configured in advance, but in my case the q is ever changing: users can actually search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields: If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users.
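Erick's point, that the search terms in a warming query don't matter and only the sort clause does, suggests generic warming entries like the following solrconfig.xml sketch. The listener syntax follows the stock example config; the field names timestamp and id come from the sort criteria mentioned in this thread, and d the match-all query is an assumption:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- q is irrelevant for cache warming; only the sort fields matter -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">timestamp desc</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="sort">id asc</str>
    </lst>
  </arr>
</listener>
```

The same <arr name="queries"> block can be repeated under the firstSearcher listener so the FieldCache is also populated on initial startup.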
Re: Field collapsing patch issues
Here's the root issue, and all available patches: https://issues.apache.org/jira/browse/SOLR-236 I confess I have no clue what's what here, so you're largely on your own. There are some encouraging titles (note you can sort the patches by date, which might help in figuring out which to use). Best Erick On Thu, May 19, 2011 at 6:43 AM, Isha Garg isha.g...@orkash.com wrote: Hi All! Kindly provide me links to suitable patches that can be applied to Solr versions 1.4.1 and 3.0 so that field collapsing works properly. Thanks in advance! Isha Garg
Spatial search with SolrJ 3.1 ? How to
How do you construct a query in Java for spatial search (not via the default Solr REST interface)?
RE: Out of memory on sorting
Hi Erick, My OOM problem starts when I query the core with 13217121 documents. My schema and other details are given below. 1) How is your sort field defined? We primarily use two different sort criteria: one is a date field and the other is a string (id). I cannot change the id field, as this is also the uniqueKey for my schema. 2) How many distinct terms? Since one of the fields is a timestamp instance and the other a unique key, all are distinct. (These are tweets happening for a keyword.) 3) How much memory are you allocating for the JVM? I am starting Solr with the following command: java -Xms1024M -Xmx2048M -jar start.jar All our test cases for moving to Solr have passed; this is proving to be a big setback. Help would be greatly appreciated. Regards, Rohit -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 19 May 2011 18:21 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting The warming queries warm up the caches used in sorting, so just including the sort=... clause will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup; you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1) How is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2) How many distinct terms? I'm guessing one per document; this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3) How much memory are you allocating for the JVM? 4) What other fields are you sorting on, and how many unique values are in each?
Solr Admin can help you here. Best Erick
Re: Spatial search with SolrJ 3.1 ? How to
On Thu, May 19, 2011 at 8:52 AM, martin_groenhof martin.groen...@yahoo.com wrote: How do you construct a query in java for spatial search ? not the default solr REST interface It depends on what you are trying to do - a spatial request (as currently implemented in Solr) is typically more than just a query... it can be filtering by a bounding box, filtering by a distance radius, or using a distance (geodist) function query in another way such as sorting by it or using it as a factor in relevance. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
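One stdlib-only way to see what "constructing the query in Java" amounts to is to build the parameter string by hand; with SolrJ you would set the same parameters on a SolrQuery/ModifiableSolrParams object instead. The field name "store", the coordinates, and the radius are placeholders, not values from the thread; the parameters themselves (sfield, pt, {!geofilt}, geodist) are the Solr spatial syntax Yonik describes:

```java
// Builds the parameter string for a Solr spatial request:
// filter by a distance radius with {!geofilt} and sort by geodist().
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SpatialQueryBuilder {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    /** field: the location field; lat/lon: the center point; distKm: radius in km. */
    public static String buildGeofilt(String field, double lat, double lon, double distKm) {
        String pt = lat + "," + lon;
        return "q=" + enc("*:*")
             + "&sfield=" + enc(field)              // geofilt/geodist read these
             + "&pt=" + enc(pt)                     // global params
             + "&fq=" + enc("{!geofilt d=" + distKm + "}")
             + "&sort=" + enc("geodist() asc");     // nearest first
    }

    public static void main(String[] args) {
        System.out.println(buildGeofilt("store", 45.15, -93.85, 5.0));
    }
}
```

With SolrJ the equivalent would be calls like query.set("sfield", "store") and query.addFilterQuery("{!geofilt d=5}") on a SolrQuery, followed by server.query(query).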
Re: Fuzzy search and solr 4.0
Do you, or any other list member, know a good fuzzy string matching library to recommend? On Thu, May 19, 2011 at 9:39 AM, Michael McCandless luc...@mikemccandless.com wrote: Well, the good news is FuzzyQuery is indeed much faster in Lucene/Solr 4.0. But the bad news is... FuzzyQuery won't do what you need here. You need some sort of FuzzyPhraseQuery, which is able to match terms similar to one another (comp/company/corporation) by some metric. I don't know of such a query in Lucene/Solr... but it'd be a nice addition. Others have asked about this before. FuzzyQuery finds terms close to other terms, as measured by edit distance; e.g., fuzzy/wuzzy/muzzy are all edit distance one from each other. Mike http://blog.mikemccandless.com
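For reference, the StrikeAMatch metric from the article linked earlier in the thread is small enough to implement directly: compare the multisets of adjacent-letter pairs of the two strings. This is a sketch following that article, not the never-committed Lucene patch:

```java
// StrikeAMatch similarity: 2 * |common letter pairs| / (|pairs1| + |pairs2|).
import java.util.ArrayList;
import java.util.List;

public class StrikeAMatch {
    // Adjacent letter pairs of each word, upper-cased (word breaks yield no pairs).
    private static List<String> letterPairs(String text) {
        List<String> pairs = new ArrayList<String>();
        for (String word : text.toUpperCase().split("\\s+")) {
            for (int i = 0; i < word.length() - 1; i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    public static double similarity(String s1, String s2) {
        List<String> pairs1 = letterPairs(s1);
        List<String> pairs2 = letterPairs(s2);
        int union = pairs1.size() + pairs2.size();
        if (union == 0) return 0.0;
        int common = 0;
        for (String pair : pairs1) {
            if (pairs2.remove(pair)) common++; // multiset intersection
        }
        return 2.0 * common / union;
    }

    public static void main(String[] args) {
        System.out.println(similarity("France", "French")); // 0.4
        System.out.println(similarity("abc company", "abc comp"));
    }
}
```

The France/French example (40%) is the one worked through in the StrikeAMatch article; scores are in [0,1], so results can be ranked by similarity.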
[Announce] White paper describing Near Real Time Implementation with Solr and RankingAlgorithm
Hi! I would like to announce a white paper that describes the technical details of Near Real Time implementation with Solr and the RankingAlgorithm. The paper discusses the modifications made to enable NRT. You can download the white paper from here: http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf The modified src can also be downloaded from here: http://solr-ra.tgels.com Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com
Re: sorting on date field in facet query
The only two ways to influence facet order are by count and alphabetically. facet.sort=index will sort by alpha; the default is facet.sort=count. All that said, I still don't quite understand what you're asking for. Facets are simply a count of the documents that have unique values for, in your case, the id field. It doesn't make sense to sort the returned facets by some other field. You can facet on the other field and sort *that*. Sorting the documents returned is unrelated, but I don't think that's what you're asking... Or I completely miss the point... Best Erick On Thu, May 19, 2011 at 8:24 AM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logic and an example: Each doc has a string field someStr and a date field associated with it, and the same doc id has the same value of the date field. Question: is it possible to sort the facet values given below on that date field?

curl "http://localhost:8983/solr/select?q=someStr:network&facet=true&facet.field=id&facet.limit=1000&facet.mincount=1&rows=0"

result excerpt:

<lst name="facet_fields">
  <lst name="id">
    <int name="T-AS_1386229">54</int>
    <int name="T-AS_1386181">45</int>
    <int name="T-CP_1370095">36</int>
    <int name="T-AS_1377809">25</int>
    <int name="T-CP_1380207">18</int>
    <int name="T-CP_1373820">11</int>
    <int name="T-AS_1372073-1">8</int>
    <int name="T-AS_1367577">6</int>
    <int name="T-AS_1383141">5</int>
    <int name="T-AS_1383648-1">5</int>
    <int name="T-AS_1351183-1">4</int>
  </lst>
</lst>

Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: sorting on date field in facet query
Dmitry, how should that work? Take this short sample data:

id | date
T-AS_1386229 | 1995-12-31T23:59:59Z
T-AS_1386181 | 1996-12-31T23:59:59Z
T-AS_1386229 | 1997-12-31T23:59:59Z

So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.comwrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: Out of memory on sorting
See below: On Thu, May 19, 2011 at 9:06 AM, Rohit ro...@in-rev.com wrote: Hi Erick, My OOM problem starts when I query the core with 13217121 documents. My schema and other details are given below. Hmmm, how many cores are you running and what are they doing? They all use the same memory pool, so you may be getting some carry-over. So one strategy would be just to move this core to a dedicated machine. 1 how is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. We primarily use two different sort criteria: one is a date field and the other a string (id). I cannot change the id field, as this is also the uniqueKey for my schema. OK, but can you use a separate field just for sorting? Populate it with a copyField and sort on that rather than ID. This is only helpful if you can make a compact representation, e.g. integer. 2 How many distinct terms? I'm guessing one/document; actually, this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. Since one of the fields is a timestamp instance and the other a unique key, all are distinct. (These are tweets happening for a keyword) Not one, but two fields where all values are distinct. Although I don't think the timestamp is much of a problem, assuming you're storing it as one of the numeric types (I'd especially make sure it was one of the Trie types, specifically tdate if you're going to do range queries). There are tricks for dealing with this, but your id field will get you a bigger bang for the buck, so concentrate on that first. 3 How much memory are you allocating for the JVM? I am starting Solr with the following command: java -Xms1024M -Xmx2048M start.jar Well, you can bump this higher if you're on a 64-bit OS. The other possibility is to shard your index. But really, with 13M documents this should fit on one machine. What does your statistics page tell you, especially about cache usage? 
All our test cases for moving to Solr have passed; this is proving to be a big setback. Help would be greatly appreciated. Regards, Rohit -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 19 May 2011 18:21 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting The warming queries warm up the caches used in sorting. So just including the sort=... will warm the sort caches; the terms searched are not important. The same is true with facets... However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup, since you'd be doing the operation that runs you out of memory on startup... So, we need more details: 1 how is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory. 2 How many distinct terms? I'm guessing one/document; actually, this is somewhat of an anti-pattern in Solr, for all it's sometimes necessary. 3 How much memory are you allocating for the JVM? 4 What other fields are you sorting on and how many unique values in each? Solr Admin can help you here. Best Erick On Thu, May 19, 2011 at 6:20 AM, Rohit ro...@in-rev.com wrote: Thanks for pointing me in the right direction. Now I see the configuration for firstSearcher or newSearcher; the <str name="q"> entries need to be configured ahead of time. In my case the q is ever changing: users can search for anything, and the possible queries are unlimited. How can I make this generic? -Rohit -----Original Message----- From: rajini maski [mailto:rajinima...@gmail.com] Sent: 19 May 2011 14:53 To: solr-user@lucene.apache.org Subject: Re: Out of memory on sorting Explicit Warming of Sort Fields If you do a lot of field based sorting, it is advantageous to add explicit warming queries to the newSearcher and firstSearcher event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users. 
firstSearcher example:

<lst>
  <str name="q">solr rocks</str>
  <str name="start">0</str>
  <str name="rows">10</str>
  <str name="sort">empID asc</str>
</lst>

On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote: Hi, We are moving to a multi-core Solr installation, with each of the cores having millions of documents; documents are also added to the index on an hourly basis. Everything seems to run fine and I'm getting the expected results and performance, except where sorting is concerned. I have an index size of 13217121 documents. Now when I want to get documents between two dates and then sort them by ID, Solr goes out of memory. This is with just me using the system; we may also have simultaneous users. How can I improve this performance? Rohit
Re: sorting on date field in facet query
Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about the specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by a boolean query to SOLR with a list of ids). However, sometimes we have glitches: since we limit the facets to the first facet.limit ones, and there is no date boosting, some facet counts may end up beyond the returned range, and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. But we haven't tried it yet; first we want to investigate what can be done inside SOLR (by modifying its source code, if needed). So, as said, every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always the same for a given doc id across all the solr docs with the same doc id. Now, taking Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against the someStr field:

<int name="T-AS_1386181">45</int>
<int name="T-AS_1386229">54</int>

So the SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in the facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take this short sample data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? 
One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: SOLR-2209
What version of Solr are you using? Because this works fine for me. Could you attach the results of adding debugQuery=on in both instances? The parsed form of the query is identical in 1.4.1 as far as I can tell. The bug you're referencing is a peculiarity of the not (-) operator, I think. Best Erick On Thu, May 19, 2011 at 7:25 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedtech.com wrote: Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209, but there is no activity on this bug. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
Re: Spatial search with SolrJ 3.1 ? How to
I don't care about the method, I just want results within, let's say, 10km of a lat,lng? (I can do this with REST) but don't know how to with a Java API.

[code]
SpatialOptions spatialOptions = new SpatialOptions(
    company.getLatitude() + "," + company.getLongitude(),
    10, new SchemaField("geolocation", null), searchName, 20,
    DistanceUnits.KILOMETERS);
LatLonType latLonType = new LatLonType();
Query query = latLonType.createSpatialQuery(
    new SpatialFilterQParser(searchString.toString(), solrq, solrq, null, true),
    spatialOptions);
[/code]

(I am trying with this, but it does not seem to be compatible with Solr, only Lucene) Any example will do, Thx -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-search-with-SolrJ-3-1-How-to-tp2961136p2961452.html Sent from the Solr - User mailing list archive at Nabble.com.
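A 10 km radius filter does not need internal classes like SpatialFilterQParser; it is expressed as ordinary request parameters that any client (SolrJ via SolrQuery.set(...), or a raw HTTP call) can send. A sketch of the parameter set, assuming a LatLonType field named "geolocation" as in the post above and an example lat,lng:

```python
from urllib.parse import urlencode

# Sketch: the same parameters the REST interface takes, built as a query
# string. In SolrJ these would be SolrQuery.set("fq", "{!geofilt}") etc.
# The point 52.37,4.89 is an example value, not from the original post.
params = {
    "q": "*:*",
    "fq": "{!geofilt}",       # filter by distance radius
    "sfield": "geolocation",  # the LatLonType field from the schema
    "pt": "52.37,4.89",       # lat,lng of the center point
    "d": "10",                # distance in km
}
qs = urlencode(params)
print(qs)
```

Under these assumptions, the resulting query string is what you would append to `/select?` to get only documents within 10 km of the point.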
Facetting: Some questions concerning method:fc
Hey all! I have a few questions concerning the field cache method for faceting. The wiki says for the enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. And for the fc method: This was the default method for single valued fields prior to Solr 1.4. I just ran into a problem using fc for a field which can have multiple terms per document. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. Question 1: The quotes above say prior to Solr 1.4. Has this changed? Is there another method for multi-valued faceting since Solr 1.4? Question 2: Very weird is another observation: When faceting on another field, namely the text field holding a large variety of terms and especially a lot of different terms in one single field, the fc method seems to count everything correctly. In fact, the results between fc and enum don't seem to differ. The field in which the fc and enum faceting results differ consists of a lot of terms which all have start and end offsets of 0, 0 and position increment 1. Could this be a problem? Best regards, Erik
how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Hello, i want to index some date fields with this date format: yyyy-mm-dd. Solr throws an exception like this: can not be represented as java.sql.Date. i am using ...transformer=DateFormatTransformer and ...zeroDateTimeBehavior=convertToNull. How can i tell DIH to convert these fields into the correct format? thx - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961481.html Sent from the Solr - User mailing list archive at Nabble.com.
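What the DateFormatTransformer conversion amounts to can be illustrated in a standalone sketch: parse the bare yyyy-mm-dd value and write it back with the time component Solr's date fields expect. This is an illustration only, not DIH code:

```python
from datetime import datetime

# Illustration of the transform a dateTimeFormat="yyyy-MM-dd" setting
# performs: parse the bare date, emit it in Solr's ISO date form.
# Note: MySQL zero dates like 0000-00-00 cannot be parsed at all --
# they must be nulled out first (which is what
# zeroDateTimeBehavior=convertToNull does on the JDBC side).
def to_solr_date(s: str) -> str:
    d = datetime.strptime(s, "%Y-%m-%d")
    return d.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2011-05-18"))  # 2011-05-18T00:00:00Z
```

The time-of-day defaults to midnight, since a bare date carries no time information.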
Re: Facetting: Some questions concerning method:fc
On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote: I have a few questions concerning the field cache method for faceting. The wiki says for enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. . And for fc method: This was the default method for single valued fields prior to Solr 1.4. . I just ran into the problem of using fc for a field which can have multiple terms for one field. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. That doesn't sound right... the results should always be identical between facet.method=fc and facet.method=enum. Are you sure you didn't index a multi-valued field and then change the fieldType in the schema to be single valued? Are you sure the field is indexed the way you think it is? If so, is there an easy way for someone to reproduce what you are seeing? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: sorting on date field in facet query
Oh, isn't that ducky. The facet.sort parameter only sorts ascending as far as I can tell, which is exactly the reverse of what you want. Would it work to cleverly encode the facet field to do what you want just by a lexical sort? Something like: use a very large constant, subtract the date for each record from that, and then put that in a new field that you facet/sort by. Then un-transform it for display. Let's say you have a range from 0-9. Then your facet field could be something like:

original doc values:
doc 1: 2 (oldest)
doc 2: 5
doc 3: 8 (newest)

You'd store values like these in facetme ((9 - orig value) + text):
doc1: 7_docid1
doc2: 4_docid2
doc3: 1_docid3

Now a natural ordering (facet.sort=index) would return them in date order. If this was a well-defined process, you could easily transform it back for proper display. Although watch out for leading zeros! Thinking off the top of my head here. Erick On Thu, May 19, 2011 at 9:46 AM, Dmitry Kan dmitry@gmail.com wrote: Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by a boolean query to SOLR with a list of ids). However, sometimes we have glitches, that is since we limit the facets to first facet.limit ones, and there is no date boosting we may have some facet counts end up beyond the facet counts range and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. But we haven't tried it yet before we investigate what can be done inside SOLR (by modifying its source code, if needed). 
So as said every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always same for a given doc id across all the solr docs with the same doc id. Now, taking the Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against someStr field: int name=T-AS_1386181 45 /int int name=T-AS_1386229 54 /int So SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take a this short sample-data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? 
curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
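Erick's encoding trick above can be sketched concretely, including the leading-zeros caveat he mentions: subtract each date's epoch value from a constant and zero-pad to a fixed width, so that ascending lexical order (facet.sort=index) equals reverse chronological order. MAX and the field name "facetme" are hypothetical choices, not from the thread:

```python
# Sketch of the encoding: (MAX - epoch_seconds), zero-padded to a fixed
# width, prefixed to the doc id. Lexically ascending == date descending.
MAX = 10**10  # any constant larger than every epoch value you will index

def encode(epoch_seconds: int, doc_id: str) -> str:
    # :010d zero-pads to 10 digits so lexical and numeric order agree
    return f"{MAX - epoch_seconds:010d}_{doc_id}"

def decode(token: str):
    inv, doc_id = token.split("_", 1)
    return MAX - int(inv), doc_id

newer = encode(1_300_000_000, "T-AS_1386229")
older = encode(1_200_000_000, "T-AS_1386181")
# the newer document now sorts first under a plain lexical sort
print(sorted([older, newer]))
```

Decoding on the client side recovers the original epoch value and doc id for display, which is the "un-transform" step Erick describes.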
Re: sorting on date field in facet query
Thanks Erick, this sounds solid to me! It of course will require the repost of the entire index (pretty big one, sharded), but that's not an issue as we periodically do that anyway. Thanks and regards, Dmitry On Thu, May 19, 2011 at 5:08 PM, Erick Erickson erickerick...@gmail.comwrote: Oh, isn't that ducky. The facet.sort parameter only sorts ascending as far as I can tell. Which is exactly the reverse of what you want. Would it work to cleverly encode the facet field to do what you want just by a lexical sort? Something like use a very large constant, subtract the date for each record from that and then put that in a new field that you facet/sort by? Then un-transform it for display? Let's say you have a range from 0-9. Then your facet field could be something like original doc values doc 1: 2 - oldest doc 2: 5 doc 3: 8 - newest You'd store values like these in facetme (9 - orig value) + text doc1: 7_docid1 doc2: 4_docid2 doc3: 1_docid3 Now a natural ordering (facet.sort=index) wold return them in date order. If this was a well-defined process you could easily transform it back for proper display. Although watch out for leading zeros! Thinking off the top of my head here Erick On Thu, May 19, 2011 at 9:46 AM, Dmitry Kan dmitry@gmail.com wrote: Hi, Thanks for the questions, guys, and sorry for the confusion. I should start with a broader picture of what we are trying to achieve. The only problem is that I cannot speak about specifics of the task we are solving the way we do. We currently sort the facets on the client side, having the date values at hand (done by an boolean query to SOLR with a list of ids). However, sometimes we have glitches, that is since we limit the facets to first facet.limit ones, and there is no date boosting we may have some facet counts end up beyond the facet counts range and that's sad. One way around it would be to facet with pagination, where a page would correspond to a date subrange in the range of required dates. 
But we haven't tried it yet before we investigate what can be done inside SOLR (by modifying its source code, if needed). So as said every solr doc that has some id in the solr index (this id is used to combine several solr docs logically, only that purpose; this design comes from the task definition) has a date field, and the value of that date field is always same for a given doc id across all the solr docs with the same doc id. Now, taking the Stefan's example, I would like to sort desc the facets by date (yes, date boosting during the facet gathering process) that were calculated against someStr field: int name=T-AS_1386181 45 /int int name=T-AS_1386229 54 /int So SOLR facet component would ignore the counts and sort the facets by dates desc (in reverse chronological order). Is it possible to implement such a solution through some class inheritance in facet component? Regards, Dmitry On Thu, May 19, 2011 at 4:25 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Dmitry, how should that work? Take a this short sample-data: id | date T-AS_1386229 | 1995-12-31T23:59:59Z T-AS_1386181 | 1996-12-31T23:59:59Z T-AS_1386229 | 1997-12-31T23:59:59Z So, you'll have two facets for the ids .. but how should they be sorted? One (of the two) is the first and the other the last Document .. so, sort by lowest date? highest date? i guess, that would/could not really work. Perhaps we have to ask another Question .. what are you trying to achieve? Boost by Date? Regards Stefan On Thu, May 19, 2011 at 2:24 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Erick, It is about ordering the facet information. The result set is empty via rows=0. Here is the logics and example: Each doc has string field someStr and a date field associated with it, and same doc id has same value of the date field. Question: is it possible to sort the facet values given below on that date field? 
curl http://localhost:8983/solr/select?q=someStr:networkfacet=truefacet.field=idfacet.limit=1000facet.mincount=1rows=0 result excerpt: lst name=facet_fields lst name=id int name=T-AS_1386229 54 /int int name=T-AS_1386181 45 /int int name=T-CP_1370095 36 /int int name=T-AS_1377809 25 /int int name=T-CP_1380207 18 /int int name=T-CP_1373820 11 /int int name=T-AS_1372073-1 8 /int int name=T-AS_1367577 6 /int int name=T-AS_1383141 5 /int int name=T-AS_1383648-1 5 /int int name=T-AS_1351183-1 4 /int /lst /lst Regards, Dmitry On Wed, May 18, 2011 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information?
Re: lucene parser, negative OR operands
On 5/18/2011 9:07 PM, Chris Hostetter wrote: You could implement a parser like that relatively easily -- just make sure you put a MatchAllDocsQuery in every BooleanQuery object that you construct, and only ever use the PROHIBITED and MANDATORY clause types (never OPTIONAL) ... the thing is, a parser like that isn't as useful as you think it might be when dealing with search results. OPTIONAL clauses are where most of the useful factors of scoring documents come into play. Thanks for the background and ideas, very helpful. Hmm, but what if it DID use OPTIONAL clause types but just turned all pure-negative clauses into the alternative combination with MatchAllDocsQuery ( *:* AND $pure_negative )? Just like the lucene query parser does now, but not only for top-level clauses. Seems like that might maintain the power of optional clauses for scoring, but still allow negative clauses to work the 'boolean logic' way people expect -- the same rationale that has the query parser doing this at top level, so why not do it for sub-clauses as well? Does that have any promise, do you think? Jonathan
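The semantics under discussion can be shown at the set level. A BooleanQuery containing only prohibited clauses has nothing to subtract from and matches no documents, while wrapping it with MatchAllDocsQuery matches the complement, which is what users expect from boolean logic. A set-based illustration, not Lucene code:

```python
# Set-level illustration of the pure-negative rewrite (not Lucene code):
# a BooleanQuery with only PROHIBITED clauses starts from an empty match
# set, so subtracting from it yields nothing; wrapping with *:*
# (MatchAllDocsQuery) yields the complement instead.
all_docs = {1, 2, 3, 4, 5}          # *:*  (MatchAllDocsQuery)
field_x  = {2, 4}                    # docs matching field:x

pure_negative = set() - field_x      # only PROHIBITED clauses: empty
rewritten     = all_docs - field_x   # *:* AND -field:x

print(pure_negative)  # set()
print(rewritten)      # {1, 3, 5}
```

This is exactly the top-level rewrite the Lucene query parser already performs; Jonathan's question is whether applying it to sub-clauses as well would keep scoring intact.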
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Try this in your query: TIME_FORMAT(timeDb, '%H:%i') as timefield http://www.java2s.com/Tutorial/MySQL/0280__Date-Time-Functions/TIMEFORMATtimeformat.htm -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961591.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
Hi, 1. Is it possible to produce the collapsed docs count in the same query? 2. What is the performance of Field Collapsing versus Facet Search? Dmitry On Thu, May 19, 2011 at 5:36 PM, kenf_nc ken.fos...@realestate.com wrote: This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
did you mean something like this? DATE_FORMAT(cp.field, '%Y-%m-%d %H:%i:%s') AS field ??? i think i need to add the timestamp to my date fields? or not? why can't DIH handle this? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961684.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting on date field in facet query
Oooh, that's clever. The glitch is that field collapsing is scheduled for 3.2, but that probably means the patch is close to being applicable to 3.1 — though I don't know that for sure. Erick On Thu, May 19, 2011 at 10:36 AM, kenf_nc ken.fos...@realestate.com wrote: This is more a speculation than direction, I don't currently use Field Collapsing but my take on it is that it returns the number of docs collapsed. So instead of faceting could you do a search returning DocID, collapsing on DocID sorting on date, then the count of collapsed docs *should* match the facet count? Just wondering. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-on-date-field-in-facet-query-tp2956540p2961612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to convert YYYY-MM-DD to YYYY-MM-DD hh:mm:ss - DIH
Offhand, I don't think the problem is DIH since your stack trace specifies a SQL error. What is the SQL you're using? And the DIH configuration? Best Erick On Thu, May 19, 2011 at 10:53 AM, stockii stock.jo...@googlemail.com wrote: did you mean something like this ? DATE_FORMAT(cp.field, '%Y-%m-%di %H:%i:%s') AS field ??? i think i need to add the timestamp to my date fields? or not ? why cannot DIH handle with this ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961684.html Sent from the Solr - User mailing list archive at Nabble.com.
New release of Python/Solr library Sunburnt
Hi, I'd like to announce the release of a new version of my Python-Solr library, sunburnt: http://pypi.python.org/pypi/sunburnt/0.5 Documentation and tutorial examples are available at: http://opensource.timetric.com/sunburnt/ and there's a mailing list for discussion at http://groups.google.com/group/python-sunburnt Sunburnt was written initially for use with the Timetric platform (http://timetric.com) and is in use by several other internet-scale sites. Toby -- http://timetric.com 2nd Floor, White Bear Yard, 144a Clerkenwell Road, London EC1R 5DF phone: +44 20 3286 0677 (office), +44 7747 603618 (mobile) twitter: @timetric, @tow21 | skype: tobyohwhite
Re: how to convert YYYY-MM-DD to YYY-MM-DD hh:mm:ss - DIH
<entity name="foo" pk="cp_id" transformer="DateFormatTransformer" query="SELECT ..., ...some fields ... cp.start_date_1, cp.start_date_2, cp.end_date_1, cp.end_date_2, .. some other fields .. FROM ..."> ... </entity> That does not work for fields with the value 0000-00-00 OR/AND 2011-05-18. I'd tried with: <field column="start_date_1" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" /> but solr always says that these fields have a wrong format! i try my sql-selects before i post them here ;-) - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961787.html Sent from the Solr - User mailing list archive at Nabble.com.
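For what it's worth, the failure mode described in this thread can be reproduced outside Solr: a zero date like MySQL's 0000-00-00 is simply not a valid date under any year/month/day pattern. A small Python sketch (the values are illustrative; this is not DIH code):

```python
from datetime import datetime

def parse_mysql_date(value, fmt="%Y-%m-%d"):
    """Parse a MySQL DATE string; return None for unparseable values
    such as MySQL's zero date '0000-00-00'."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

print(parse_mysql_date("2011-05-18"))  # a normal date parses fine
print(parse_mysql_date("0000-00-00"))  # None: year/month/day 0 are invalid
```

Any dateTimeFormat on the DIH side hits the same wall, so zero dates have to be turned into NULL (or a sentinel) in the SQL itself, e.g. with something like NULLIF in the query.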
Re: how to convert YYYY-MM-DD to YYY-MM-DD hh:mm:ss - DIH
okay, i found the problem. i had put the fields into my data-config twice ;-) - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores 100.000 - Solr1 for Search-Requests - commit every Minute - 5GB Xmx - Solr2 for Update-Request - delta every Minute - 4GB Xmx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-convert--MM-DD-to-YYY-MM-DD-hh-mm-ss-DIH-tp2961481p2961834.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication - replicated failed at the same time?
Hm, anyone? On Sat, May 14, 2011 at 7:11 PM, Stefan Matheis matheis.ste...@googlemail.com wrote: Hi Guys, while working on the UI for Replication, i've got confused sometimes because of the following response (from /replication?command=details):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="details">
    <lst name="slave">
      <!-- .. -->
      <str name="indexReplicatedAt">Sat May 14 16:25:53 UTC 2011</str>
      <arr name="indexReplicatedAtList">
        <str>Sat May 14 16:25:53 UTC 2011</str>
      </arr>
      <str name="replicationFailedAt">Sat May 14 16:25:53 UTC 2011</str>
      <arr name="replicationFailedAtList">
        <str>Sat May 14 16:25:53 UTC 2011</str>
      </arr>
      <!-- .. -->
    </lst>
  </lst>
</response>

To reproduce that: Start with a Solr instance (with a clean index), trigger replication, abort the fetch - look at the details. Does not really make sense to me? If it's okay .. please let me know how & why - especially interested in how to display that information in the UI (Current State: http://files.mathe.is/solr-admin/10_replication.png). Regards Stefan
Re: Facetting: Some questions concerning method:fc
On 19.05.2011 16:07, Yonik Seeley wrote: On Thu, May 19, 2011 at 9:56 AM, Erik Fäßler erik.faess...@uni-jena.de wrote: I have a few questions concerning the field cache method for faceting. The wiki says for the enum method: This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. . And for the fc method: This was the default method for single valued fields prior to Solr 1.4. . I just ran into the problem of using fc for a field which can have multiple terms per document. The facet counts would be wrong, seemingly only counting the first term in the field of each document. I observed this in Solr 1.4.1 and in 3.1 with the same index. That doesn't sound right... the results should always be identical between facet.method=fc and facet.method=enum. Are you sure you didn't index a multi-valued field and then change the fieldType in the schema to be single valued? Are you sure the field is indexed the way you think it is? If so, is there an easy way for someone to reproduce what you are seeing? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco Thanks a lot for your help: Changing the field type to multiValued did the trick. The point is, I built the index using Lucene directly (I need to for some special manipulation of offsets and position increments). So my question is what requirements a Lucene field has to fulfill so that Solr's faceting works correctly. Particular question: in Lucene terms, what exactly is denoted by a multiValued field? I thought that would result in multiple Lucene Field instances with the same name for a single document. But I think my field has only one instance per document (but I could check that again). Thanks again for your quick and helpful answer! Erik
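The mismatch Erik saw can be illustrated with a toy counter - this is only a sketch of the symptom (one counted term per document vs. all terms), not of Solr's actual field-cache internals:

```python
from collections import Counter

# Hypothetical indexed terms per document for a multi-term field.
docs = [["java", "solr"], ["java"], ["solr", "lucene"]]

# multiValued faceting: every term of every document is counted.
multi_counts = Counter(term for doc in docs for term in doc)

# A single-valued field cache keeps one term per document, so only
# the first term of each document contributes to the counts.
single_counts = Counter(doc[0] for doc in docs)

print(multi_counts)   # java and solr counted twice, lucene once
print(single_counts)  # lucene is never counted at all
```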
DIH Response
Hello, We have configured solr for delta processing through DIH and we kick off the index request from within a batch process. However, we somehow need to know whether our indexing request succeeded or not because we want to be able to rollback a db transaction if that step fails. By looking at the SolrServer API we weren't able to find a method that could help us with that, so the only solution we see is by constantly polling the server and parsing the response for the idle or Rolledback words. What we noticed though is that the response also contains a message saying This response format is experimental. It is likely to change in the future. Does this mean that we can't rely on this response to build our module? Is there a better way? Thank you, Savvas
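Until there is a structured status API, polling really does mean parsing the XML by hand. A minimal sketch of such a check - the sample response and field names are modeled on the DIH status messages quoted elsewhere in this digest and may change, as the experimental-format warning says:

```python
import xml.etree.ElementTree as ET

# Hypothetical /dataimport status snippet (the format is experimental).
SAMPLE = """<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Documents Processed">0</str>
  </lst>
</response>"""

def import_finished_ok(xml_text):
    """Crude success check: the handler is idle again and at least
    one document was actually processed."""
    root = ET.fromstring(xml_text)
    status = root.findtext("str[@name='status']")
    processed = root.findtext(".//str[@name='Total Documents Processed']")
    return status == "idle" and int(processed or 0) > 0

print(import_finished_ok(SAMPLE))  # False: idle, but nothing was processed
```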
Similarity class for an individual field
Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands:

$ cd <your Solr trunk checkout dir>
$ svn up
$ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch
$ patch -p0 -i SOLR-2338.patch

And I did not get any errors. I then created my own similarity class, listed below because it isn't very large:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class SimpleSimilarity extends DefaultSimilarity {
    public SimpleSimilarity() {
        super();
    }

    public float idf(int dont, int care) {
        return 1;
    }
}

As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SimpleSimilarity"/>
</fieldType>

And apply that to the field in question:

<field name="string_noidf" multiValued="true" type="string_noidf" indexed="true" stored="true" required="false" omitNorms="true" />

But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: Similarity class for an individual field
Also, I've tried adding: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ To the end of the schema file so that it is applied globally but it does not appear to change the score either. What am I doing incorrectly? Thanks, Brian Lamb On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass listed below because it isn't very large: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class SimpleSimilarity extends DefaultSimilarity { public SimpleSimilarity() { super(); } public float idf(int dont, int care) { return 1; } } As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SimpleSimilarity/ /fieldType And apply that to the field in question: field name=string_noidf multiValued=true type=string_noidf indexed=true stored=true required=false omitNorms=true / But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ /fieldType But the scores remained unchanged even in that case. 
At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: Similarity class for an individual field
I tried editing the SweetSpotSimilarity class located at lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java to just return 1 for each function and the score does not change at all. This has led me to believe that it does not recognize similarity at all. At this point, all I have for similarity is the line at the end of the file to apply similarity to all searches but that does not even work. So where am I going wrong? Thanks, Brian Lamb On Thu, May 19, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.comwrote: Also, I've tried adding: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ To the end of the schema file so that it is applied globally but it does not appear to change the score either. What am I doing incorrectly? Thanks, Brian Lamb On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass listed below because it isn't very large: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class SimpleSimilarity extends DefaultSimilarity { public SimpleSimilarity() { super(); } public float idf(int dont, int care) { return 1; } } As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. 
Next, I make a change to the schema.xml file: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SimpleSimilarity/ /fieldType And apply that to the field in question: field name=string_noidf multiValued=true type=string_noidf indexed=true stored=true required=false omitNorms=true / But I think something did not get applied correctly to the patch. I restarted and did a full import but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity: fieldType name=string_noidf class=solr.StrField sortMissingLast=true omitNorms=true similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ /fieldType But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class but given that the SweetSpot similarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
Re: solr sorting on multiple conditions, please help
: sort=query({!v=area_id: 78153}) desc, score desc : : What I want to achieve is sort by if there is a match with area_id, then : sort by the actual score I think you can use the map function here to map all scores greater than zero (matching docs) to some fixed value. something like this should work... qq=area_id:78153 sort=map(query($qq,-1),0,,1) desc, score desc http://wiki.apache.org/solr/FunctionQuery#map -Hoss
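Spelled out as request parameters, Hoss's suggestion looks roughly like the sketch below (the main query is hypothetical, and the elided map() bound from the post is filled with a placeholder upper limit):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",  # hypothetical main query
    # dereferenced via $qq inside the sort function
    "qq": "area_id:78153",
    # matching docs score > 0 and fall into the mapped bucket; docs that
    # don't match get query()'s default of -1 and sort after them.
    # NOTE: 100000 is a placeholder upper bound, elided in the post.
    "sort": "map(query($qq,-1),0,100000,1) desc, score desc",
}
print(urlencode(params))
```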
How to get Error caught in SOLR layer to SOLRj layer
Hi, I have a code logic to push documents to SOLR using SOLRj APIs. Due to an error in schema, i get appropriate error in SOLR logs printed in catalina.log inside tomcat. Here is a snippet: SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field suggestion: E:\Files\lpsimdev.inf at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:288) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.solr.servlet.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:104) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859) at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555) at java.lang.Thread.run(Unknown Source) But in my JAVA logs, i simply get this snippet: ### 13 05/19 17:27:52:333 ### Runner@9be1041:: (SOLR failed with SolrException for DocId = [2dac611a5bb7ce87831dc0245ffcb66a] and detailed Exception: [org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://dm2search2.dm2.commvault.com:27000/solr/update/extract?fmap.content=bodyliteral.contentid=2dac611a5bb7ce87831dc0245ffcb66aliteral.jid=5literal.afln=27009287literal.conv=lpsimdev.infliteral.cvowner=SX1X5X32X544literal.cvreadacls=SX1X5X32X544;SX1X5X18;SX1X5X32X544;SX1X5X32X545literal.mtmstr=1000502032literal.afofstr=43 02 25 29 17 literal.bktm=2011-5-19T14:56:3Zliteral.mtm=2001-9-14T21:13:52Zliteral.afof=4302252917literal.atyp=33literal.clid=2literal.cijid=22literal.afid=1literal.szkb=822literal.ccn=-1literal.apid=6literal.url=E:\Files\lpsimdev.infliteral.cistate=1wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245) at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:202) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:33) at com.commvault.commclient.ciengine.CVRequestWrapper.processRequests(CVRequestWrapper.java:551) at com.commvault.commclient.ciengine.solr.SOLRHTTPConnector$Runner.run(SOLRHTTPConnector.java:692) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) How can i get the same error on my solrj side, so that i can debug easily? Thanks a lot for your time & help, Geeta -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-Error-caught-in-SOLR-layer-to-SOLRj-layer-tp2963446p2963446.html Sent from the Solr - User mailing list archive at Nabble.com.
Help, Data Import not indexing in solr.
Newbie at SOLR, When I ran through my test data config, it was able to find my 91 rows sample test. However, it didn't add any into my index. Can someone help me and tell me why? Please find the data config below:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\TESTSERVER:4317;databaseName=Northwind;user=sa;password=datapassword" />
  <document>
    <entity name="Customers" query="select * from Customers">
      <field column="CustomerID" name="customerid" />
      <field column="CompanyName" name="companyname" />
      <field column="ContactName" name="contactname" />
      <field column="Address" name="address" />
      <field column="City" name="city" />
      <field column="ContactTitle" name="contacttitle" />
    </entity>
  </document>
</dataConfig>

Here is the result when I run http://localhost:8983/solr/dataimport? (after I ran the full import):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">dataconfig.xml</str>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">91</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-05-19 15:09:56</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2011-05-19 15:09:57</str>
    <str name="Optimized">2011-05-19 15:09:57</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:1.765</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Please help. Thx. -- View this message in context: http://lucene.472066.n3.nabble.com/Help-Data-Import-not-indexing-in-solr-tp2963450p2963450.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on distance in Solr: how do you generate links that search withing a given range of distance?
: It is fairly simple to generate facets for ranges or 'buckets' of : distance in Solr: : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance. : What isn't described is how to generate the links for these facets any query you specify in a facet.query to generate a constraint count can be specified in an fq to actually apply that constraint. So if you use... facet.query={!frange l=5.001 u=3000}geodist() ...to get a count of 34, and the user wants to constrain to those docs, you would add... fq={!frange l=5.001 u=3000}geodist() ...to the query to do that. -Hoss
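In other words, the same local-params expression travels from the counting stage to the filtering stage unchanged; a small sketch of the two parameter sets (the main query here is a hypothetical *:*):

```python
from urllib.parse import urlencode

DIST_BUCKET = "{!frange l=5.001 u=3000}geodist()"

# Stage 1: get the constraint count for the distance bucket.
count_params = [("q", "*:*"), ("facet", "true"), ("facet.query", DIST_BUCKET)]

# Stage 2: the facet link the UI generates applies the very same
# expression as a filter query.
filter_params = [("q", "*:*"), ("fq", DIST_BUCKET)]

print(urlencode(count_params))
print(urlencode(filter_params))
```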
Re: Embedded Solr Optimize under Windows
: Thanks for the reply. I'm at home right now, or I'd try this myself, but is : the suggestion that two optimize() calls in a row would resolve the issue? it might ... I think the situations in which it happens have evolved a bit over the years as IndexWriter has gotten smarter about knowing when it really needs to touch the disk to reduce IO. there's a relatively new explicit method (IndexWriter.deleteUnusedFiles) that can force this... https://issues.apache.org/jira/browse/LUCENE-2259 ...but it's only on trunk, and there isn't any user level hook for it in Solr yet (i opened SOLR-2532 to consider adding it) -Hoss
Re: SOLR Custom datasource integration
What is JPA? You are better off pulling from JPA yourself than coding with the DataImportHandler. It will be much easier. EmbeddedSolr is just like web solr: when you commit data it is on the disk. If you crash during indexing, it may or may not be available to commit. EmbeddedSolr does not do anything special with index storage. Lance On Thu, May 19, 2011 at 2:08 AM, amit.b@gmail.com amit.b@gmail.com wrote: Hi, We are trying to build an enterprise search solution using SOLR; our data source is a database which is interfaced with JPA. The solution looks like SOLR INDEX - JPA - Oracle database. We need help to find out the best approach to integrate the Solr index with JPA. We tried two approaches. Approach 1 - 1 Populating SolrInputDocument with data from JPA 2 Updating EmbeddedSolrServer with captured data using the SolrJ API. Approach 2 - 1 Customizing the dataimporthandler of HTTPSolrServer 2 Retrieving data in the dataimporthandler using a JPA entity. Functional requirements - 1 The solution should be performant for a huge magnitude of data 2 Should be scalable We have a few questions which will help us decide on a solution. We would like to know which approach is better to meet our requirements. Is it a good idea to integrate with Lucene directly instead of using EmbeddedSolrServer + JPA? If the JVM crashes, will EmbeddedSolrServer content be lost on reboot? Can we get support from the Jasper Experts team? Can we buy it? How? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Custom-datasource-integration-tp2960475p2960475.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Embedded Solr Optimize under Windows
Ahh, thanks. I might try a basic commit() then and see, although it's not a huge deal for me. It occurred to me that two optimize() calls would probably leave exactly the same problem behind. On 20 May 2011 09:52, Chris Hostetter hossman_luc...@fucit.org wrote: : Thanks for the reply. I'm at home right now, or I'd try this myself, but is : the suggestion that two optimize() calls in a row would resolve the issue? it might ... I think the situations in which it happens have evolved a bit over the years as IndexWRiter has gotten smarter about knowing when it really needs to touch the disk to reduce IO. there's a relatively new explicit method (IndexWriter.deleteUnusedFiles) that can force this... https://issues.apache.org/jira/browse/LUCENE-2259 ...but it's only on trunk, and there isn't any user level hook for it in Solr yet (i opened SOLR-2532 to consider adding it) -Hoss
Mysql vs Postgres DIH
Hi, i run the same query to import my data with mysql and with postgres, but only postgres indexes all the data (17090). Mysql indexes 17086, then 197085, then 17087... never 17090. But the response tells me that it has skipped 0 documents. I don't understand! Help me please, i would like to use Mysql for my application... Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963822.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Mysql vs Postgres DIH
Excuse me, i was wrong to write 197085; the correct number is 17085. But it is never the same count... -- View this message in context: http://lucene.472066.n3.nabble.com/Mysql-vs-Postgres-DIH-tp2963822p2963824.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: error while doing full import
thank you dan... i have checked the code that produces the XML for solr and fixed the &nbsp; problem - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/error-while-doing-full-import-tp2951185p2963832.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too slow indexing while using 2 different data sources
hi Gora, i guess you are right; i have checked and the url seems to be serving data slowly... maybe it's because of the crappy test env too... thank you so much - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2963833.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR-2209
I'm using Solr 1.4... I thought I had a case without a NOT but it seems to work now :S It might be a glitch on my server. The problem is easily reproducible with the NOT operator: http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-title:programmer) http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20(-(title:programmer)) both queries return 0 results while... http://10.0.5.221:8983/jobs/select?q=title:java%20AND%20-(title:programmer) (note the position of the negation operator) returns more than 50 000 results -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: May-19-11 9:53 AM To: solr-user@lucene.apache.org Subject: Re: SOLR-2209 What version of Solr are you using? Because this works fine for me. Could you attach the results of adding debugQuery=on in both instances? The parsed form of the query is identical in 1.4.1 as far as I can tell. The bug you're referencing is a peculiarity of the not (-) operator, I think. Best Erick On Thu, May 19, 2011 at 7:25 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedtech.com wrote: Hi All, I am having some problems with the presence of unnecessary parentheses in my query. A query such as: title:software AND (title:engineer) will return no results. Removing the parentheses fixes the issue, but since my users can enter the parentheses themselves I need to find a way to fix or work around this bug. I found that this is related to SOLR-2209 but there is no activity on that bug. Does anyone know if this will get fixed some time in the future, or if it is already fixed in Solr 4? Otherwise, could someone point me to the code handling this so that I can attempt a fix? Thx
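For what it's worth, a workaround often used while waiting on fixes in this area is to give the parenthesised negation a positive anchor, since a sub-query containing only prohibited clauses matches nothing: (-title:programmer) becomes (*:* -title:programmer). A hypothetical client-side rewrite helper, not a Solr API:

```python
def anchor_negation(clause):
    """Rewrite a parenthesised pure-negative clause so it contains a
    positive anchor: '(-title:programmer)' alone matches nothing as a
    sub-query, while '(*:* -title:programmer)' matches every document
    except those with the negated term."""
    inner = clause.strip()
    if inner.startswith("(") and inner[1:].lstrip().startswith("-"):
        return "(*:* " + inner[1:].lstrip()
    return clause

print(anchor_negation("(-title:programmer)"))   # (*:* -title:programmer)
print(anchor_negation("(title:engineer)"))      # unchanged
```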
Re: Problem about Solrj
you mean you have changed the code of the solr admin page to remove all indexes? and also, when you say indexes are gone, do you mean they are deleted, or that solr sees no indexes when you run it? a little bit of a confusing post :) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-about-Solrj-tp2952009p2963901.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem about Solrj
sorry for the typos in the prev msg... a little bit drowsy still... so if you can make your problem a little bit clearer, we can help you - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-about-Solrj-tp2952009p2963935.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Boost fields for a sum total score.
Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_term&bq=boost_high:$search_term^1.5&bq=boost_medium:$search_term^1.3&bq=boost_max:$search_term^1.7&bq=boost_low:$search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq <str> in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
Please Unsubscribe
Could you please unsubscribe me. From: ronveenstra ron-s...@agathongroup.com Reply-To: solr-user@lucene.apache.org Date: Thu, 19 May 2011 18:52:52 -0700 (PDT) To: solr-user@lucene.apache.org Subject: Re: Using Boost fields for a sum total score. Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_termbq=boost_high:$search_term^1.5b q=boost_medium:$search_term^1.3bq=boost_max:$search_term^1.7bq=boost_low:$ search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq str in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score- tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
How can I query mutlitcore with solrJ
Dear team. I installed two cores on my tomcat: http://localhost:8983/solr/fund_dih/admin/ http://localhost:8983/solr/fund_tika/admin/ How can I send one query request via Solrj to these URLs? Thanks and Regards Zane
How to get right facet counts?
We are having an issue with facet counts and grouping. We have multiple doctors with addresses. How do I search these lat/longs? 1. Using SOLR 3.1, I can duplicate all fields except lat_long, and use group.field for the key. 2. I can use David Smiley's solution for multiple points (but it seems to be abandoned?) 3. I can add a parameter that will calculate facet.field after the group by (can I get some help?) 4. Others? The whole thing does not sound good, since there is sooo much duplication. It would be perfect to support a multiValued field with many lat_longs for a row in SOLR directly. Ideas? Thanks.
Re: Using Boost fields for a sum total score.
Put everything except q in solrconfig... Then just use qt=<name in solrconfig>&q=... On 5/19/11 7:52 PM, ronveenstra ron-s...@agathongroup.com wrote: Apologies if this is obvious, but I've been banging my head against a wall. I can define a query like the following: http://HOST_NAME/solr/select?q=$search_term&bq=boost_high:$search_term^1.5&bq=boost_medium:$search_term^1.3&bq=boost_max:$search_term^1.7&bq=boost_low:$search_term^1.1 This does precisely what I'm looking for (assuming $search_term is a string like dinosaur) The search term is found in the default/defined search fields, and then a boost is applied if this term is also found in one of the defined boost fields. What I'm looking to do is define this setup in solrconfig.xml such that I need only hit a URL like: http://HOST_NAME/solr/select?q=$search_term I can define a bq str in solrconfig, but seem unable to reference the q query parameter in order to boost only when the search term is found. Any help would be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2963986.html Sent from the Solr - User mailing list archive at Nabble.com.
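As a sketch of what "everything except q in solrconfig" could look like: this swaps the literal bq strings for dismax per-field boosts via qf, since defaults in solrconfig cannot interpolate the user's search term into a bq string. The handler name, default field, and boost values are illustrative only, mirroring the boost fields from the question:

```xml
<!-- Hypothetical handler: call it with .../solr/select?qt=boosted&q=dinosaur -->
<requestHandler name="boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- the user's q is matched against every field listed here, with
         per-field boosts standing in for the old bq parameters -->
    <str name="qf">text boost_low^1.1 boost_medium^1.3 boost_high^1.5 boost_max^1.7</str>
  </lst>
</requestHandler>
```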