Re: Exception using distributed field-collapsing
Hi Bryan,

What is the field type of the groupField? You can only group by a field of type string, as described in the wiki:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect.

Martijn

On 20 June 2012 20:37, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:

I am doing a search on three shards with identical schemas (I double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is giving me back the exception listed at the bottom of this email.

Other information: my schema uses the following field types: StrField, DateField, TrieDateField, TextField, SortableInt, SortableLong, BoolField.

My query looks like this (I've messed with it to anonymize but, I hope, kept the essentials):

http://[solr core2]/select/?start=0&rows=25&q={!qsol}machines&sort=[sort field]&fl=[list of fields]&shards=[solr core1]%2c[solr core2]%2c[solr core3]&group=true&group.field=[group field]

java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.String
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)
    at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)
    at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)
    at java.util.TreeMap.put(TreeMap.java:547)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)
    at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)
    at org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)
    at org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)
    at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:679)

Any thoughts or advice? Thanks,

-- Bryan

--
Kind regards,

Martijn van Groningen
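To see why this surfaces as a ClassCastException during the merge rather than as an up-front validation error, here is a rough Python sketch (illustrative only, not Solr code; function names are made up): the distributed first-pass merge keeps candidate groups in a sorted structure whose comparator assumes string group values, so a shard that sends back date group values blows up in the middle of the merge.

```python
import datetime
from functools import cmp_to_key

def compare_group_values(a, b):
    # Mirrors the role of StringOrdValComparator.compareValues in the
    # trace above: both sides are assumed to be strings. A date group
    # value (from grouping on a date field) fails at this point.
    if not (isinstance(a, str) and isinstance(b, str)):
        raise TypeError("group value is not a string: %r / %r" % (a, b))
    return (a > b) - (a < b)

def merge_top_groups(*shard_groups):
    # First-pass merge: combine each shard's top group values into one
    # globally ordered list, comparing them with the string comparator.
    combined = {g for shard in shard_groups for g in shard}
    return sorted(combined, key=cmp_to_key(compare_group_values))
```

Grouping on a string field merges cleanly, while a shard contributing a `datetime` group value raises at merge time, which is consistent with the stack trace above.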
Re: Issue with field collapsing in solr 4 while performing distributed search
ngroups returns the number of groups that matched the query. However, if you want ngroups to be correct in a distributed environment, you need to put documents belonging to the same group into the same shard. Groups can't cross shard boundaries. I guess you need to do some manual document partitioning.

Martijn

On 11 June 2012 14:29, Nitesh Nandy niteshna...@gmail.com wrote:

Version: Solr 4.0 (svn build 30th May, 2012) with SolrCloud (2 slices and 2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing with ngroups set to true, as we need the number of search results. However, there is a difference between the number of results returned and the ngroups value.

Ex: http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&group=true&group.field=id&group.ngroups=true

The response XML looks like:

<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">46</int>
    <lst name="params">
      <str name="group.field">id</str>
      <str name="group.ngroups">true</str>
      <str name="group">true</str>
      <str name="q">messagebody:monit AND usergroupid:3</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="id">
      <int name="matches">10</int>
      <int name="ngroups">9</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">320043</str>
          <result name="doclist" numFound="1" start="0"><doc>...</doc></result>
        </lst>
        <lst>
          <str name="groupValue">398807</str>
          <result name="doclist" numFound="5" start="0" maxScore="2.4154348">...</result>
        </lst>
        <lst>
          <str name="groupValue">346878</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
        <lst>
          <str name="groupValue">346880</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
      </arr>
    </lst>
  </lst>
</response>

So you can see that the ngroups value returned is 9, while the actual number of groups returned is 4. Why do we have this discrepancy between ngroups, matches and the actual number of groups? Is this an open issue?

Any kind of help is appreciated.

--
Regards,
Nitesh Nandy

--
Kind regards,

Martijn van Groningen
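The "manual document partitioning" Martijn suggests can be sketched roughly like this (illustrative Python; the field names are hypothetical): route each document by its group key instead of its unique id, so a group never spans shards and ngroups adds up correctly.

```python
def shard_for(doc, num_shards, group_field):
    # Route by the grouping key (not the unique document id), so all
    # members of one group land on the same shard.
    return hash(doc[group_field]) % num_shards

def partition(docs, num_shards, group_field):
    # Bucket documents into num_shards shards by their group key.
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc, num_shards, group_field)].append(doc)
    return shards
```

With this routing, every group lives entirely on one shard, which is the precondition for an accurate distributed ngroups.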
Re: Grouping ngroups count
Hi Francois,

The issue you describe looks similar to an issue we have fixed before with the matches count. Open an issue and we can look into it.

Martijn

On 1 May 2012 20:14, Francois Perron francois.per...@wantedanalytics.com wrote:

Thanks for your response Cody,

First, I used distributed grouping on 2 shards and I'm sure that all documents of each group are in the same shard. I took a look at the JIRA issue and it seems really similar. There is the same problem with group.ngroups. The count is calculated in the second pass, so we only get results from the useful shards, which is why I get the right count when I increase the rows limit (all my shards must then be used). Unless it's a feature (I hope not), I will create a new JIRA issue for this.

Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

Hello,

When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query?

If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard.

However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time.

https://issues.apache.org/jira/browse/SOLR-3316

If this is a similar issue then you should make a new JIRA issue.

Cody

-----Original Message-----
From: Francois Perron [mailto:francois.per...@wantedanalytics.com]
Sent: Tuesday, May 01, 2012 6:47 AM
To: solr-user@lucene.apache.org
Subject: Grouping ngroups count

Hello all,

I tried to use grouping with 2 slices on an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K.

Do you know why, and of course, how to fix it?

Thanks.

--
Kind regards,

Martijn van Groningen
Re: Solr Cloud vs sharding vs grouping
Hi Jean-Sebastien,

For some grouping features (like total group count and grouped faceting), distributed grouping requires you to partition your documents into the right shard. Basically, groups can't cross shards; otherwise the group counts or grouped facet counts may not be correct. If you use only the basic grouping functionality, this limitation doesn't apply.

I think that right now SolrCloud partitions documents based on the unique id (id % number_shards). You'd need to modify this somehow, or maybe do the distributed indexing yourself.

Martijn

On 20 April 2012 12:07, Lance Norskog goks...@gmail.com wrote:

The implementation of grouping in the trunk is completely different from 236. Grouping works across distributed search: https://issues.apache.org/jira/browse/SOLR-2066, committed last September.

On Thu, Apr 19, 2012 at 6:04 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:

Hi All,

I am currently trying out SolrCloud on a small cluster and I'm enjoying Solr more than ever. Thanks to all the contributors.

That being said, one very important feature for us is the grouping/collapsing of results on a specific field value on a distributed index. We are currently using Solr 1.4 with patch 236 and it does the job as long as all documents with a common field value are on the same shard. Otherwise, grouping on a distributed index will not work as expected.

I looked everywhere to see whether this limitation is still present in the trunk but found no mention of it. Is this still a requirement for grouping results on a distributed index?

Thanks

--
Lance Norskog
goks...@gmail.com

--
Kind regards,

Martijn van Groningen
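A tiny sketch of the contrast Martijn describes (illustrative Python, hypothetical field names): partitioning by unique id can split a group across shards, while routing by the group field keeps it together.

```python
def shard_by_id(doc, num_shards):
    # Default-style partitioning: unique id modulo number of shards.
    return doc["id"] % num_shards

def shard_by_group(doc, num_shards):
    # Group-aware partitioning: hash the collapse/group field instead.
    return hash(doc["group"]) % num_shards

# Two documents in the same group "A" but with different unique ids:
group_a = [{"id": 1, "group": "A"}, {"id": 2, "group": "A"}]
shards_by_id = {shard_by_id(d, 2) for d in group_a}        # group "A" gets split
shards_by_group = {shard_by_group(d, 2) for d in group_a}  # group "A" stays whole
```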
Re: To truncate or not to truncate (group.truncate vs. facet)
The group.facet option only works for field facets (facet.field). Other facet types (query, range and pivot) aren't supported yet. group.facet works for both single and multivalued fields specified in the facet.field parameter.

Martijn

On 9 April 2012 20:58, danjfoley d...@micamedia.com wrote:

I am using group.facet and it works fine for a regular facet.field, but not for facet.query.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897487...@n3.nabble.com
Date: Mon, Apr 9, 2012 1:38 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

One other thing, I believe that you need to be using facet.field on single-valued string fields for group.facet to function properly. Are the fields you're faceting on multiValued=false?

Cody

-----Original Message-----
From: Young, Cody [mailto:cody.yo...@move.com]
Sent: Monday, April 09, 2012 10:36 AM
To: solr-user@lucene.apache.org
Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

You tried adding the parameter group.facet=true?

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However, my facet queries still don't group.

Sent from my phone

----- Reply message -----
From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com
Date: Mon, Apr 9, 2012 12:45 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

I believe you're looking for what's called "Matrix Counts". Please see this JIRA issue. To my knowledge it has been committed in trunk but not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true

Cody

-----Original Message-----
From: danjfoley [mailto:d...@micamedia.com]
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest to it. But not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and colors, and can belong to any number of categories. For example, I add 6 documents to Solr as follows:

product, color, size, category
shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color: red (1), blue (1), green (1)
size: small (1), large (1)
category: valentines day (1)

But they come back like this:

color: red (2), blue (2), green (2)
size: small (2), large (2)
category: valentines day (6)

I see that the group.facet parameter in version 4.0 does exactly this. However, how can I make this happen now? There are all sorts of ecommerce systems out there that facet exactly how I'm asking. I thought Solr is supposed to be the very best, fastest search system, yet it doesn't seem to be able to facet correctly for items with multiple values. Am I indexing my data wrong? How can I make this happen?

--
View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.
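The group.facet ("Matrix Counts", SOLR-2898) behaviour discussed in this thread can be sketched in a few lines (illustrative Python, not Solr code): a facet value is counted once per group that contains it, rather than once per document, which yields exactly the counts Dan was after.

```python
from collections import defaultdict

def grouped_facet_counts(docs, group_field, facet_field):
    # Count each (group, facet value) pair at most once, so a value
    # contributes 1 per group rather than 1 per document.
    seen = set()
    counts = defaultdict(int)
    for doc in docs:
        pair = (doc[group_field], doc[facet_field])
        if pair not in seen:
            seen.add(pair)
            counts[doc[facet_field]] += 1
    return dict(counts)
```

Run against the six shirt documents above, this produces red (1), blue (1), green (1), small (1), large (1) and valentines day (1), matching the desired facet.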
Re: Distributed grouping issue
The matches element in the response should return the number of documents that matched the query, not the number of groups. Did you encounter this issue with other Solr versions as well (3.5 or another nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

Hi, when you write "I get xxx results", does it come from 'numFound'? Or do you really display xxx results?

When using both field collapsing and sharding, the 'numFound' may be wrong. In that case, think about using the 'shards.rows' parameter with a high value (be careful, it's bad for performance).

If the problem is really about the returned results, it may be because several documents have the same unique key document_id in different shards.

Hope it helps,
Franck

On Friday, 30 March 2012 at 23:52 +0000, Young, Cody wrote:

I forgot to mention, I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0
Re: Distributed grouping issue
All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when the group count and some other features need to be accurate. For matches this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running trunk that I updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn
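The point about matches can be shown with a one-line sketch (illustrative, not Solr code): the merged matches value is just the sum of the per-shard matches counts, which is why it needs no group co-location.

```python
def merge_matches(shard_responses):
    # Each shard reports how many documents matched the query; the
    # coordinating node simply sums them while merging responses.
    return sum(r["matches"] for r in shard_responses)
```

For the two-cores-of-three-documents setup described above, the merged value is 6 regardless of how the groups are distributed.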
Re: Distributed grouping issue
I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case, the 2 documents on each core do belong to the same group, right?

I did find something else: if I use rows=0 then an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today; we can then look into this further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when the group count and some other features need to be accurate. For matches this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema. Each core has 3 documents. When grouping, the matches element returns 6. I'm running trunk that I updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn

--
Kind regards,

Martijn van Groningen
Re: Grouping queries
On 22 March 2012 03:10, Jamie Johnson jej2...@gmail.com wrote:

I need to apologize: I believe that in my example I too grossly oversimplified the problem and it's not clear what I am trying to do, so I'll try again.

I have a situation where I have a set of access controls, say user, super user and ultra user. These controls are not necessarily hierarchical, in that user < super user < ultra user. Each of these controls should only be able to see documents matching some combination of the access controls they have. In my actual case we have many access controls and they can be combined in a number of ways, so I can't simply constrain what they are searching by a query alone (i.e. if it's a user, the query is auth:user AND (some query)).

Now I have a case where a document contains information that a user can see but also contains information a super user can see. Our current system marks this document at the super user level and the user can't see it. We now have a requirement to make the pieces that are at the user level available to the user while still allowing the super user to see and search all the information. My original thought was to simply index the document twice; this would end up in a possible duplicate (say if a user had both user and super user), but since this situation is rare it may not matter. After coming across the grouping capability in Solr, I figured I could execute a group query where we grouped on some key which indicated that 2 documents were the same, just with different access controls (user and super user in this example). We could then filter out the documents in the group the user isn't allowed to see and only keep the document with the access controls they have.

Maybe I'm not understanding this right... but why can't you save the access controls as a multivalued field in your schema? In your example you can then, if the current user is a normal user, just query auth:user AND (query), and if the current user is a super user, auth:superuser AND (query). A document that is searchable for both superuser and user is then returned (if it matches the rest of the query).

I hope this makes more sense. Unfortunately, I don't believe the join queries will work, because if I created documents for each access control, I don't think I could search across these documents as if they were a single document (i.e. search for something in the user document and something in the super user document in a single query). This led me to believe that grouping was the way to go in this case, but again I am very interested in any suggestions the community could offer.

I wouldn't use grouping. The Solr join is still an option. Let's say you have many access controls and the access controls change often on your documents. You can then choose to store the access controls, with an id pointing to your logical document, as separate documents in a different Solr core (index). In the core where your main documents are, you don't keep the access controls. You can then use the Solr join to filter out documents that the current user isn't supposed to search. Something like this:

q=(query)&fq={!join fromIndex=core1 from=doc_id to=id}auth:superuser

Core1 is the core containing the access control documents, and doc_id is the id that points to your regular documents. The benefit of this approach is that if you fine-tune core1 for frequent updates, you can change your access controls very often without paying a big performance penalty.
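Martijn's join-as-filter idea can be sketched like this (illustrative Python; the core and field names follow the hypothetical example above, not a real schema): the ACL documents in core1 determine which main-document ids survive the fq.

```python
def join_filter(main_docs, acl_docs, auth_level):
    # Emulates fq={!join fromIndex=core1 from=doc_id to=id}auth:<level>:
    # gather the doc_ids of ACL documents matching the auth level, then
    # keep only main documents whose id is in that set.
    allowed = {a["doc_id"] for a in acl_docs if auth_level in a["auth"]}
    return [d for d in main_docs if d["id"] in allowed]
```

Because the ACLs live in their own index, re-indexing an ACL document is cheap compared to re-indexing the full logical document.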
Re: Grouping queries
Where is Join documented? I looked at http://wiki.apache.org/solr/Join and see no reference to fromIndex. Also, does this work in a distributed environment?

The fromIndex isn't documented in the wiki. It is mentioned in the issue and you can find it in the Solr code:
https://issues.apache.org/jira/browse/SOLR-2272?focusedCommentId=13024918&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13024918

The Solr join only works in a distributed environment if you partition your documents properly. Documents that link to each other need to reside on the same shard, and this can be a problem in some cases.

Martijn
Re: Grouping queries
I'm not sure if grouping is the right feature to use for your requirements... Grouping does have an impact on performance, which you need to take into account. Depending on what grouping features you're going to use (grouped facets, ngroups), grouping performs well on large indices if you use filter queries well (e.g. 100M travel offers, but during searching you're only interested in travel offers to specific destinations or in a specific period of time). The best way to find this out is to just try it out.

On 21 March 2012 22:34, Jamie Johnson jej2...@gmail.com wrote:

I was wondering how much more intensive grouping queries are in general. I am considering using grouping queries as my primary query because I have the need to store a document as pieces with varying access controls; for instance, a portion of a document a user can see, but an admin can see the entire thing (I'm greatly simplifying this). My thought was to do a grouping request and group on a field which contained a key which the documents all shared, but I am worried about how well this will perform at scale. Any thoughts/suggestions on this would be appreciated.

--
Kind regards,

Martijn van Groningen
Re: Solr group witch minimum count in each group
Filtering results based on group count isn't supported yet. There is already an issue created for this feature: https://issues.apache.org/jira/browse/SOLR-3152

Martijn

On 21 March 2012 11:52, ViruS svi...@gmail.com wrote:

Hi,

I am trying to get all duplicated documents in my index. I have a signature field and I can now find duplicates with facet.field=signature, but I want to use only one request to get the groups of duplicated documents. In the end I stopped at this query:

/solr/select?q=*:*&group=true&group.field=signature&group.truncate=true&group.ngroups=true

I am searching for something similar to the MySQL query:

SELECT * FROM table GROUP BY signature HAVING COUNT(*) > 2

Thanks in advance

--
Piotr (ViruS) Sikora
svi...@gmail.com
JID: vi...@hostv.pl

--
Kind regards,

Martijn van Groningen
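Until SOLR-3152 lands, the HAVING-style filter can be approximated client-side. A minimal sketch (illustrative only; the count threshold is a parameter):

```python
from collections import Counter

def duplicated_signatures(docs, field="signature", min_count=2):
    # Client-side GROUP BY signature HAVING COUNT(*) >= min_count:
    # count documents per signature and keep only the duplicated values.
    counts = Counter(d[field] for d in docs)
    return {sig for sig, n in counts.items() if n >= min_count}
```

The returned signatures can then be used to fetch the duplicate documents with a follow-up filter query.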
Re: To truncate or not to truncate (group.truncate vs. facet)
Hi Rasmus, You might want to use the group.facet parameter: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters I think that will give you the right facet counts with faceting. The parameter is not available in Solr 3.x, so you'll need to use a 4.0 nightly build. Martijn On 19 March 2012 13:39, Erick Erickson erickerick...@gmail.com wrote: Groups and faceting are orthogonal and really have nothing to do with each other, so that might be where part of the problem lies. In your example, you can consider grouping by brand and count the *groups* returned, not elements within those groups. Then you're simply counting up the groups returned in the response packet. Note that this is NOT the stuff down in the facets section. You can still facet by color just as you are. Another possibility is to look at pivot faceting: https://issues.apache.org/jira/browse/SOLR-792 but I confess I haven't explored this so don't really understand if it applies to your problem. Best Erick On Mon, Mar 19, 2012 at 5:22 AM, Rasmus Østergård r...@vertica.dk wrote: I have an index that contains variants of cars. In this small sample I have 2 car models (Audi and Volvo) and the Audi is available in black or white, whereas the Volvo is only available in black. On the search page I want to display products not variants - in this test case 2 products should be shown: Audi A4 and Volvo V50 I'm having trouble achieving the desired count in my search facets. 
This is my test schema:

<fields>
  <field name="variant_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="sku" type="string" indexed="true" stored="true"/>
  <field name="color" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>variant_id</uniqueKey>

Test data:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <!-- Variant properties -->
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <!-- Variant properties -->
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <!-- Variant properties -->
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>

My goal is to get this facet:

brand - audi (1), volvo (1)
color - black (2), white (1)

I'm trying to use group.truncate to do this but it fails. This is my search query when not truncating:

/select?facet.field=color&facet.field=brand&defType=edismax&facet=true&group=true&group.field=sku&group.truncate=false

brand - audi (2), volvo (1)
color - black (2), white (1)

This is my search query when truncating:

/select?facet.field=color&facet.field=brand&defType=edismax&facet=true&group=true&group.field=sku&group.truncate=true

brand - audi (1), volvo (1)
color - black (2), white (0)

As can be seen, neither of the two above matches my goal. The problem seems to be that I want truncation on only selected facets. Is this possible?

Thanks!

--
Kind regards,

Martijn van Groningen
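The difference between the two options can be reproduced with a short sketch (illustrative Python over Rasmus's three variant documents, not Solr code): group.truncate facets only each group's head document, while group.facet counts a value once per group that contains it, which is exactly the facet Rasmus wants.

```python
from collections import Counter

cars = [
    {"sku": "Audi A4", "brand": "audi", "color": "black"},
    {"sku": "Audi A4", "brand": "audi", "color": "white"},
    {"sku": "Volvo V50", "brand": "volvo", "color": "black"},
]

def truncate_counts(docs, group_field, facet_field):
    # group.truncate=true: keep only the first (head) document of each
    # group, then facet over those heads.
    heads = {}
    for d in docs:
        heads.setdefault(d[group_field], d)
    return dict(Counter(h[facet_field] for h in heads.values()))

def group_facet_counts(docs, group_field, facet_field):
    # group.facet=true: count each facet value once per group containing it.
    pairs = {(d[group_field], d[facet_field]) for d in docs}
    return dict(Counter(v for _, v in pairs))
```

Truncation gives black (2) with white absent (the Audi head happens to be black), whereas group.facet gives black (2), white (1) alongside audi (1), volvo (1), matching the desired counts.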
Re: JoinQuery and document score problem
Hi Stefan,

The score isn't moved from the "from" side to the "to" side, and as far as I know there isn't a way to configure the scoring of the joined documents. The Solr join query isn't a real join (like in SQL) and should be used as a filtering mechanism. The best way to achieve that is to put the join query inside an fq parameter.

Martijn

On 5 March 2012 14:01, Stefan Moises moi...@shoptimax.de wrote:

Hi list,

we are using the kinda new JoinQuery feature in Solr 4.x trunk (and also Solr 3.5 with the JoinQuery patch applied) and are facing a problem...

We have documents with a parent-child relationship, where a parent can have any number of children, parents being identified by the field parentid. Now, after a join (from the field parentid to id) to get the parent documents only (and to filter out the variants/children of the parent documents), the document score gets lost: all the returned documents have a score of 1.0. If we remove the join from the query, the scores are fine again.

Here is an example call:

http://localhost:8983/solr4/select?qt=dismax&q={!join%20from=parentid%20to=id}foo&fl=id,title,score

All the results now have a score of 1.0, which makes the order of results pretty much random and the scoring therefore useless... :( (The same applies for the standard query type, so it's not the dismax parser.)

I can't imagine this is expected behaviour...? Is there an easy way to get the right scores for the joined documents (e.g. using the max score of the children)? Can the scoring of joined documents be configured somewhere/somehow?

Thanks a lot in advance,
best regards,
Stefan

--
With best regards from Nürnberg,
Stefan Moises

*******************************************
Stefan Moises
Senior Software Developer
Head of Module Development

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
*******************************************

--
Kind regards,

Martijn van Groningen
Re: Multiple Sort for Group/Folding
Hi Mauro, During the first pass search the sort param is used to determine the top N groups. Then during the second pass search the documents inside the top N groups are sorted using the group.sort parameter. The group.sort doesn't change how the groups themselves are sorted. Martijn On 11 January 2012 08:03, Mauro Asprea mauroasp...@gmail.com wrote: Hi, I'm having some issues trying to sort my grouped results by more than one field. If I use just one, independently of which one, it works fine (I mean it sorts). I have a case where the first sorting key is equal for all the head docs of each group, so I expect the groups to be returned sorted by the second sorting key. But that's not the case. It only sorts using the first key no matter what. This is my query:

fq=type:Movie&fq={!tag=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-12T08\:59\:59Z]&fq=event_id_i:[1 TO *]&sort=location_weight_i desc, weight_i desc&q=Village Avellaneda&fl=* score&qf=name_texts location_name_text&defType=dismax&start=0&rows=12&group=true&group.field=event_id_str_s&group.field=location_name_str_s&group.sort=date_start_dt asc&group.limit=10&group.limit=1&facet=true&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-12T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-12T09\:00\:00Z TO 2012\-01\-13T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-13T21\:00\:00Z TO 2012\-01\-16T08\:59\:59Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-01\-18T12\:47\:38Z]&facet.query={!ex=cg01w4p3bcj3}date_start_dt:[2012\-01\-11T06\:47\:38Z TO 2012\-02\-10T12\:47\:38Z]

Using the last Solr release 3.5. Thanks! -- Mauro Asprea E-Mail: mauroasp...@gmail.com Mobile: +34 654297582 Skype: mauro.asprea -- Met vriendelijke groet, Martijn van Groningen
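The two-pass behavior Martijn describes can be sketched as follows (plain Python, not Solr's implementation; field names are illustrative): the `sort` key orders the groups by their best document, and `group.sort` only orders documents inside each group.

```python
docs = [
    {"id": 1, "group": "a", "weight": 5, "date": 3},
    {"id": 2, "group": "a", "weight": 9, "date": 1},
    {"id": 3, "group": "b", "weight": 7, "date": 2},
]

def group_query(docs, group_field, sort_key, group_sort_key, top_n=10):
    groups = {}
    for d in docs:
        groups.setdefault(d[group_field], []).append(d)
    # first pass: order the groups by their best document according to `sort`
    ordered = sorted(groups.values(),
                     key=lambda g: max(sort_key(d) for d in g),
                     reverse=True)[:top_n]
    # second pass: order documents inside each group according to `group.sort`
    return [sorted(g, key=group_sort_key) for g in ordered]

result = group_query(docs, "group",
                     sort_key=lambda d: d["weight"],      # sort=weight desc
                     group_sort_key=lambda d: d["date"])  # group.sort=date asc
print([[d["id"] for d in g] for g in result])  # [[2, 1], [3]]
```

This illustrates Mauro's problem: when the first `sort` key ties for all group heads, the second key only matters for choosing the head inside a group, not for re-ordering the groups against each other.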
Re: Setting group.ngroups=true considerable slows down queries
Hi! As far as I know currently there isn't another way. Unfortunately the performance degrades badly when having a lot of unique groups. I think an issue should be opened to investigate how we can improve this... Question: Does Solr have a decent chunk of heap space (-Xmx)? Because grouping requires quite some heap space (also without group.ngroups=true). Martijn On 9 December 2011 23:08, Michael Jakl jakl.mich...@gmail.com wrote: Hi! On Fri, Dec 9, 2011 at 17:41, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: On what field type are you grouping and what version of Solr are you using? Grouping by string field is faster. The field is defined as follows: <field name="signature" type="string" indexed="true" stored="true"/> Grouping itself is quite fast, only computing the number of groups seems to increase significantly with the number of documents (linear). I was hoping for a faster solution to compute the total number of distinct documents (or in other terms, the number of distinct values in the signature field). Facets came to mind, but as far as I could see, they don't offer a total number of facets either. I'm using Solr 3.5 (upgraded from Solr 3.4 without reindexing). Thanks, Michael On 9 December 2011 12:46, Michael Jakl jakl.mich...@gmail.com wrote: Hi, I'm using the grouping feature of Solr to return a list of unique documents together with a count of the duplicates. Essentially I use Solr's signature algorithm to create the signature field and use grouping on it. To provide good numbers for paging through my result list, I'd like to compute the total number of documents found (= matches) and the number of unique documents (= ngroups). Unfortunately, enabling group.ngroups considerably slows down the query (from 500ms to 23000ms for a result list of roughly 30 documents). Is there a faster way to compute the number of groups (or unique values in the signature field) in the search result?
My Solr instance currently contains about 50 million documents and around 10% of them are duplicates. Thank you, Michael -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
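The cost pattern Michael observes fits how ngroups must be computed: every unique group value seen while collecting matches has to be remembered, so work and memory grow with the number of distinct groups rather than with the page size. A toy sketch (plain Python, not Solr's collector):

```python
def ngroups(matching_docs, group_field):
    # one entry per unique group value: cost is independent of how many
    # rows are returned, but grows with the number of distinct groups
    seen = set()
    for d in matching_docs:
        seen.add(d[group_field])
    return len(seen)

# 50,000 matching docs, 1,000 distinct signatures
docs = [{"signature": "sig%d" % (i % 1000)} for i in range(50_000)]
print(ngroups(docs, "signature"))  # 1000
```

With ~45 million distinct signatures (50 million docs, ~10% duplicates), that per-group bookkeeping is why the query time jumps from 500ms to 23000ms even though only 30 documents are returned.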
Re: Setting group.ngroups=true considerable slows down queries
I'd not make a subtask under SOLR-236 because it is related to a completely different implementation which was never committed. SOLR-2205 is related to general result grouping and I think should be closed. I'd make a new issue for improving the performance of group.ngroups=true when there are a lot of unique groups. Martijn On 12 December 2011 14:32, Michael Jakl jakl.mich...@gmail.com wrote: Hi! On Mon, Dec 12, 2011 at 13:57, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: As far as I know currently there isn't another way. Unfortunately the performance degrades badly when having a lot of unique groups. I think an issue should be opened to investigate how we can improve this... Question: Does Solr have a decent chunk of heap space (-Xmx)? Because grouping requires quite some heap space (also without group.ngroups=true). Thanks for answering. The server has gotten as much memory as the machine can afford (without swapping): -Xmx21g \ -Xms4g \ Shall I open an issue as a subtask of SOLR-236 even though there is already a performance related task (SOLR-2205)? Cheers, Michael -- Met vriendelijke groet, Martijn van Groningen
Re: Field collapsing results caching
There is no cross-query cache for result grouping. The only caching option out there is the group.cache.percent option: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters Martijn On 8 December 2011 14:29, Kissue Kissue kissue...@gmail.com wrote: Hi, I was just testing field collapsing in my solr admin on solr 3.5.0. I have observed that the results of field collapsing are not being cached, unlike other solr query results. I am doing the same query multiple times and the time taken still remains approximately the same. Is there something I need to configure? Thanks. -- Met vriendelijke groet, Martijn van Groningen
Re: Setting group.ngroups=true considerable slows down queries
Hi Michael, On what field type are you grouping and what version of Solr are you using? Grouping by string field is faster. Martijn On 9 December 2011 12:46, Michael Jakl jakl.mich...@gmail.com wrote: Hi, I'm using the grouping feature of Solr to return a list of unique documents together with a count of the duplicates. Essentially I use Solr's signature algorithm to create the signature field and use grouping on it. To provide good numbers for paging through my result list, I'd like to compute the total number of documents found (= matches) and the number of unique documents (= ngroups). Unfortunately, enabling group.ngroups considerably slows down the query (from 500ms to 23000ms for a result list of roughly 30 documents). Is there a faster way to compute the number of groups (or unique values in the signature field) in the search result? My Solr instance currently contains about 50 million documents and around 10% of them are duplicates. Thank you, Michael -- Met vriendelijke groet, Martijn van Groningen
Re: Group.ngroup parameter memory consumption
BTW this applies for 4.0-dev. In 3x the String instance from a StringIndex is directly used; this is then put into a list. So there is no extra object instance created per group matching the query. Martijn On 12 November 2011 08:49, Rafał Kuć r@solr.pl wrote: Hello! Thanks, that's what I was looking for :) -- Regards, Rafał Kuć http://solr.pl The group.ngroups option collects per search the number of unique groups matching the query. Based on the collected groups it returns the count. So it depends on the number of groups matching the query. To get more in detail: per unique group a BytesRef instance is created to represent the group and this is put into an ArrayList. I think you can use this to roughly calculate the memory used per request. Besides this, group.ngroups relies in most cases on the FieldCache, which also takes a decent amount of memory. But just like with sorting, this cache is shared between requests. Martijn On 11 November 2011 22:34, Rafał Kuć r@solr.pl wrote: Hello! I was wondering if there is a way of calculating the memory consumption of the group.ngroups parameter. I know the answer can be 'that depends', but what I'm actually wondering about is what the memory consumption depends on - the number of documents returned by a query or the number of groups? Or maybe it depends on another factor? -- Regards, Rafał Kuć -- Met vriendelijke groet, Martijn van Groningen
Re: Group.ngroup parameter memory consumption
The group.ngroups option collects per search the number of unique groups matching the query. Based on the collected groups it returns the count. So it depends on the number of groups matching the query. To get more in detail: per unique group a BytesRef instance is created to represent the group and this is put into an ArrayList. I think you can use this to roughly calculate the memory used per request. Besides this, group.ngroups relies in most cases on the FieldCache, which also takes a decent amount of memory. But just like with sorting, this cache is shared between requests. Martijn On 11 November 2011 22:34, Rafał Kuć r@solr.pl wrote: Hello! I was wondering if there is a way of calculating the memory consumption of the group.ngroups parameter. I know the answer can be 'that depends', but what I'm actually wondering about is what the memory consumption depends on - the number of documents returned by a query or the number of groups? Or maybe it depends on another factor? -- Regards, Rafał Kuć -- Met vriendelijke groet, Martijn van Groningen
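Martijn's "per unique group one small object plus one list slot" description lends itself to a back-of-the-envelope estimate. The sketch below (plain Python) uses illustrative JVM-ish constants, not measured figures; the real numbers depend on the JVM, the group value lengths, and the Solr version.

```python
def rough_group_memory(num_groups, avg_value_bytes=16,
                       obj_overhead=32, list_slot=8):
    # per unique group matching the query: one value holder
    # plus one ArrayList slot (all constants are assumptions)
    return num_groups * (obj_overhead + avg_value_bytes + list_slot)

# e.g. one million unique groups in a single request
est = rough_group_memory(1_000_000)
print("%.1f MB" % (est / 1024 / 1024))  # ~53.4 MB
```

The point of the exercise is the shape of the formula, not the constants: per-request cost scales with the number of unique groups matching the query, while the FieldCache cost is paid once and shared across requests.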
Re: Selective Result Grouping
Ok, I think I get this. This could be achieved if one could specify a filter inside a group command, so that only documents that pass the filter get grouped - for example, only group documents with the value image in the mimetype field. This filter would be specified per group command. Maybe we should open an issue for this? Martijn On 1 November 2011 19:58, entdeveloper cameron.develo...@gmail.com wrote: Martijn v Groningen-2 wrote: When using the group.field option values must be the same, otherwise they don't get grouped together. Maybe fuzzy grouping would be nice. Grouping videos and images based on mimetype should be easy, right? Videos have a mimetype that starts with video/ and images have a mimetype that starts with image/. Storing the mime type's type and subtype in separate fields and grouping on the type field would do the job. Of course you need to know the mimetype during indexing, but solutions like Apache Tika can do that for you. Not necessarily interested in grouping by mimetype (that's an analysis issue). I simply used videos and images as an example. I'm not sure what you mean by fuzzy grouping. But my goal is to have collapse be more selective somehow about what gets grouped. As a more specific example, I have a field called 'type', with the following possible field values: image, video, webpage. Basically I want to be able to collapse all the images into a single result so that they don't fill up the first page of the results. This is not possible with the current grouping implementation because if you use group.field=type, it'll group everything. I do not want to collapse videos or webpages, only images. I've attached a screenshot of Google's results page to help explain what I mean. http://lucene.472066.n3.nabble.com/file/n3471548/Screen_Shot_2011-11-01_at_11.52.04_AM.png Hopefully that makes more sense. If it's still not clear I can email you privately.
-- View this message in context: http://lucene.472066.n3.nabble.com/Selective-Result-Grouping-tp3391538p3471548.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
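The per-command filter Martijn proposes in this thread could behave like the toy sketch below (plain Python, not Solr code; the 'type' field values come from the example in the email): only documents matching the filter collapse into one group, everything else passes through ungrouped.

```python
def selective_collapse(docs, should_collapse, group_key):
    out, collapsed = [], {}
    for d in docs:
        if not should_collapse(d):
            out.append(d)            # pass through untouched
            continue
        key = group_key(d)
        if key not in collapsed:     # first hit represents the group
            collapsed[key] = d
            out.append(d)
    return out

docs = [
    {"id": 1, "type": "image"},
    {"id": 2, "type": "webpage"},
    {"id": 3, "type": "image"},
    {"id": 4, "type": "video"},
]
result = selective_collapse(docs, lambda d: d["type"] == "image",
                            lambda d: d["type"])
print([d["id"] for d in result])  # [1, 2, 4]
```

Only the images collapse into a single representative result; the webpage and video keep their own positions, which is the Google-style behavior the asker describes.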
Re: facet with group by (or field collapsing)
collapse.facet=after doesn't exist in Solr 3.3. This parameter exists in the SOLR-236 patches and is implemented differently in the released versions of Solr. From Solr 3.4 you can use group.truncate. The facet counts are then computed based on the most relevant documents per group. Martijn On 3 November 2011 22:47, erj35 eric.ja...@yale.edu wrote: I'm attempting the following query: http://{host}/apache-solr-3.3.0/select/?q=cesy&version=2.2&start=0&rows=10&indent=on&group=true&group.field=SIP&group.limit=1&facet=true&facet.field=REPOSITORYNAME The result is 4 matches, all in 1 group (with group.limit=1). Rather than show facet.field=REPOSITORYNAME's 4 facets, I want to see the REPOSITORYNAME facet with a count of 1 (for the 1 group returned) with the value of the REPOSITORYNAME field in the 1 doc returned in the group. Is this possible? I tried adding the parameter collapse.facet=after, but that seemed to have no effect. -- View this message in context: http://lucene.472066.n3.nabble.com/facet-with-group-by-or-field-collapsing-tp497252p3478515.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: UnInvertedField vs FieldCache for facets for single-token text fields
Hi Michael, The FieldCache is a simpler data structure and easier to create, so I also expect it to be faster. Unfortunately for TextField an UnInvertedField is always used, even if you have one token per document. I think overriding the multiValuedFieldCache method and returning false would work. If you're using 4.0-dev (trunk) I'd use facet.method=fcs (this parameter is only usable if the multiValuedFieldCache method returns false). This is per-segment faceting and the cache will only be extended for new segments. This facet approach is better for indexes with frequent changes. I think this is even faster in your case than just using the FieldCache method (which operates on a top-level reader; after each commit the complete cache is invalid and has to be recreated). Otherwise I'd try facet.method=enum, which is fast if you have fewer distinct facet values (the number of docs doesn't influence the performance that much). The facet.method=enum option is also valid for normal TextFields, so no need to have custom code. Martijn On 3 November 2011 21:16, Michael Ryan mr...@moreover.com wrote: I have some fields I facet on that are TextFields but have just a single token. The fieldType looks like this:

<fieldType name="myStringFieldType" class="solr.TextField" indexed="true" stored="false" omitNorms="true" sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

SimpleFacets uses an UnInvertedField for these fields because multiValuedFieldCache() returns true for TextField. I tried changing the type for these fields to the plain string type (StrField). The facets *seem* to be generated much faster. Is it expected that FieldCache would be faster than UnInvertedField for single-token strings like this? My goal is to make the facet re-generation after a commit as fast as possible. I would like to continue using TextField for these fields since I have a need for filters like LowerCaseFilterFactory, which still produces a single token.
Is it safe to extend TextField and have multiValuedFieldCache() return false for these fields, so that UnInvertedField is not used? Or is there a better way to accomplish what I'm trying to do? -Michael -- Met vriendelijke groet, Martijn van Groningen
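The enum-versus-fc trade-off Martijn mentions can be illustrated with a toy sketch (plain Python over a miniature inverted index, not Solr's implementation): enum-style faceting does one doc-set intersection per distinct term, while FieldCache-style faceting does one counter bump per matching document.

```python
from collections import Counter

# toy inverted index: term -> set of doc ids; values are illustrative
index = {"red": {0, 2}, "blue": {1}, "green": {3, 4}}
matching = {0, 1, 2, 4}  # docs matching the query

def facet_enum(index, matching):
    # facet.method=enum style: iterate terms, intersect with the query's
    # doc set; cheap when there are few distinct values
    return {t: len(ids & matching) for t, ids in index.items() if ids & matching}

def facet_fc(index, matching):
    # FieldCache style: look up each matching doc's value and count it;
    # cost scales with the number of matching documents
    doc_to_term = {d: t for t, ids in index.items() for d in ids}
    return dict(Counter(doc_to_term[d] for d in matching))

print(facet_enum(index, matching))  # {'red': 2, 'blue': 1, 'green': 1}
```

Both produce the same counts; the choice is about which dimension (distinct values vs. matching docs) dominates in your index, which is exactly why enum wins with few distinct facet values.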
Re: joins and filter queries effecting scoring
Have you tried using the join in the fq instead of the q? Like this (assuming user_id_i is a field in the post document type and self_id_i a field in the user document type): q=posts_text:hello&fq={!join from=self_id_i to=user_id_i}is_active_boolean:true In this example the fq produces a docset that contains all user documents that are active. This docset is used as a filter during the execution of the main query (q param), so it only returns posts that contain the text hello for active users. Martijn On 28 October 2011 01:57, Jason Toy jason...@gmail.com wrote: Does anyone have any idea on this issue? On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy jason...@gmail.com wrote: Hi Yonik, Without a join I would normally query user docs with: q=data_text:test&fq=is_active_boolean:true When joining users with posts, I get no results: q={!join from=self_id_i to=user_id_i}data_text:test&fq=is_active_boolean:true&fq=posts_text:hello I am able to use this query, but it gives me the results in an order that I don't want (nor do I understand its order): q={!join from=self_id_i to=user_id_i}data_text:test AND is_active_boolean:true&fq=posts_text:hello I want the order to be the same as I would get from my original q=data_text:test&fq=is_active_boolean:true, but with the ability to join with the post docs. On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley yo...@lucidimagination.com wrote: Can you give an example of the request (URL) you are sending to Solr? -Yonik http://www.lucidimagination.com On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy jason...@gmail.com wrote: I have 2 types of docs, users and posts. I want to view all the docs that belong to certain users by joining posts and users together. I have to filter the users with a filter query of is_active_boolean:true so that the score is not affected, but since I do a join, I have to move the filter query to the query parameter so that I can get the filter applied.
The problem is that since is_active_boolean is moved to the query, the score is affected, which returns an order that I don't want. If I leave is_active_boolean:true in the fq parameter, I get no results back. My question is how can I apply a filter query to users so that the score is not affected? -- - sent from my mobile -- - sent from my mobile -- Met vriendelijke groet, Martijn van Groningen
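Martijn's suggestion works because a join in the fq only builds a doc set on the "from" side and uses it to filter the main query, leaving the main query's scores untouched. A toy sketch of that semantics (plain Python, not Solr code; field names follow the example in the thread):

```python
users = [
    {"self_id": 1, "is_active": True},
    {"self_id": 2, "is_active": False},
]
posts = [
    {"id": "p1", "user_id": 1, "text": "hello world", "score": 0.9},
    {"id": "p2", "user_id": 2, "text": "hello again", "score": 0.7},
    {"id": "p3", "user_id": 1, "text": "goodbye",     "score": 0.4},
]

def join_filter(from_docs, from_field, to_field, predicate):
    # models fq={!join from=self_id_i to=user_id_i}is_active_boolean:true:
    # collect the join keys of matching "from" docs, use them as a filter
    allowed = {d[from_field] for d in from_docs if predicate(d)}
    return lambda doc: doc[to_field] in allowed

active = join_filter(users, "self_id", "user_id", lambda u: u["is_active"])
hits = [p for p in posts if "hello" in p["text"] and active(p)]
# scores come from the main query only; the filter never changes them
print([(p["id"], p["score"]) for p in hits])  # [('p1', 0.9)]
```

Putting the join condition in q instead would feed it into scoring (or, for Solr's join, flatten scores to 1.0), which is exactly the ordering problem Jason ran into.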
Re: Solr 3.4 group.truncate does not work with facet queries
Hi Ian, I think this is a bug. After looking into the code, the facet.query feature doesn't take the group.truncate option into account. This needs to be fixed. You can open a new issue in Jira if you want to. Martijn On 28 October 2011 12:09, Ian Grainger i...@isfluent.com wrote: Hi, I'm using grouping with group.truncate=true. The following simple facet query: facet.query=Monitor_id:[38 TO 40] doesn't give the same number as the ngroups result (with group.ngroups=true) for the equivalent filter query: fq=Monitor_id:[38 TO 40] I thought they should be the same - from the Wiki page: 'group.truncate: If true, facet counts are based on the most relevant document of each group matching the query.' What am I doing wrong? If I turn off group.truncate then the counts are the same, as I'd expect - but unfortunately I'm only interested in the grouped results. - I have also asked this question on StackOverflow, here: http://stackoverflow.com/questions/7905756/solr-3-4-group-truncate-does-not-work-with-facet-queries Thanks! -- Ian i...@isfluent.com a...@endissolutions.com +44 (0)1223 257903 -- Met vriendelijke groet, Martijn van Groningen
Re: About the indexing process
Hi Amos, How are you currently indexing files? Are you indexing Solr input documents or just regular files? You can use Solr Cell to index binary files: http://wiki.apache.org/solr/ExtractingRequestHandler Martijn On 25 October 2011 10:21, 刘浪 liu.l...@eisoo.com wrote: Hi, I appreciate you can help me. When I index a file, can I transfer the content of the file and its metadata to Solr to construct the index, instead of the file path and metadata? Thank you! Amos -- Met vriendelijke groet, Martijn van Groningen
Re: Selective Result Grouping
The current grouping functionality using group.field is basically all-or-nothing: all documents will be grouped by the field value or none will. So there would be no way to, for example, collapse just the videos or images like they do in Google. When using the group.field option values must be the same, otherwise they don't get grouped together. Maybe fuzzy grouping would be nice. Grouping videos and images based on mimetype should be easy, right? Videos have a mimetype that starts with video/ and images have a mimetype that starts with image/. Storing the mime type's type and subtype in separate fields and grouping on the type field would do the job. Of course you need to know the mimetype during indexing, but solutions like Apache Tika can do that for you. -- Met vriendelijke groet, Martijn van Groningen
Re: Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound
Hi Frank, How is Solr deployed? And how did you upgrade? The commons-lang library (containing ArrayUtils) is included in the Solr war file. Martijn On 28 September 2011 09:16, Frank Romweber fr...@romweber.de wrote: I use Drupal for accessing the Solr search engine. After updating and creating my new index everything works as before. Then I activated group=true and group.field=site, and Solr delivers the wanted search results, but in Drupal nothing appears, just an empty search page. I found out that grouping changes the result set names. No problem: Solr offers the group.main=true parameter for this case. So I added this and get this 500 error. HTTP Status 500 - org/apache/commons/lang/ArrayUtils java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils at org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573) at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:675) at org.apache.solr.search.Grouping.execute(Grouping.java:339) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:240) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) I found out that Solr didn't find the class ArrayUtils. I tried a lot of things to get this to work: setting the JAVA_HOME and CLASSPATH vars, and I changed the JRE, without any success. I am really wondering, as all my other programs are still running; even Solr in the normal mode is working and accessible, but not the group.main=true function. So my question is now: what is necessary to get this to work? Any help is appreciated. Thx frank -- Met vriendelijke groet, Martijn van Groningen
Re: Problem with SolrJ and Grouping
Also the error you described when wt=xml and using SolrJ is fixed in 3.4 (and in trunk / branch_3x). You can wait for the 3.4 release or use a nightly 3x build. Martijn On 12 September 2011 12:41, Sanal K Stephen sanalkstep...@gmail.com wrote: Kirill, Parsing the grouped result using SolrJ is not released yet I think.. it's going to be released with Solr 3.4.0 (SolrJ client cannot parse grouped and range facets results, SOLR-2523). See the release notes of Solr 3.4.0: http://wiki.apache.org/solr/ReleaseNote34 On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov lykov.kir...@gmail.com wrote: I found that SolrQuery doesn't work with grouping. I constructed SolrQuery this way: solrQuery = constructFullSearchQuery(searchParams); solrQuery.set("group", true); solrQuery.set("group.field", "GeonameId"); Solr successfully handles the request and writes about that in the log: INFO: [] webapp=/solr path=/select params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2} hits=12099579 status=0 QTime=2968 The error occurs when SolrJ tries to parse in XMLResponseParser.processResponse (line 324), where builder stores "/lst":

Object val = type.read( builder.toString().trim() );
if( val == null && type != KnownType.NULL ) {
  throw new XMLStreamException( "error reading value:"+type, parser.getLocation() );
}
vals.add( val );
break;

The problem is - val is null. It happens because the handler for the type LST returns null (line 178 in the same file):

LST (false) {
  @Override
  public Object read( String txt ) {
    return null;
  }
},

I don't understand why it works this way. The XML which was returned by Solr is valid. I attached the response XML to the letter. The error occurs in line 3, column 14 661. I use Apache Solr 3.3.0 and the same SolrJ. -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Regards, Sanal Kannappilly Stephen -- Met vriendelijke groet, Martijn van Groningen
Re: Nested documents
To support this, we also need to implement indexing block of documents in Solr. Basically the UpdateHandler should also use this method: IndexWriter#addDocuments(Collection documents) On 12 September 2011 01:01, Michael McCandless luc...@mikemccandless.com wrote: Even if it applies, this is for Lucene. I don't think we've added Solr support for this yet... we should! Mike McCandless http://blog.mikemccandless.com On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson erickerick...@gmail.com wrote: Does this JIRA apply? https://issues.apache.org/jira/browse/LUCENE-3171 Best Erick On Sat, Sep 10, 2011 at 8:32 PM, Andy angelf...@yahoo.com wrote: Hi, Does Solr support nested documents? If not is there any plan to add such a feature? Thanks. -- Met vriendelijke groet, Martijn van Groningen
Re: Problem with SolrJ and Grouping
The changes to that class were minor. There is only support for parsing a grouped response. Check the QueryResponse class; there is a method getGroupResponse(). I ran into similar exceptions when creating the QueryResponseTest#testGroupResponse test. The test uses an XML response from a file. On 12 September 2011 15:38, Kirill Lykov lykov.kir...@gmail.com wrote: Martijn, I can't find the fixed version. I've got the last version of SolrJ but I see only minor changes in XMLResponseParser.java. And it doesn't support grouping yet. I also checked branch_3x, the branch for 3.4. On Mon, Sep 12, 2011 at 5:45 PM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Also the error you described when wt=xml and using SolrJ is fixed in 3.4 (and in trunk / branch_3x). You can wait for the 3.4 release or use a nightly 3x build. Martijn On 12 September 2011 12:41, Sanal K Stephen sanalkstep...@gmail.com wrote: Kirill, Parsing the grouped result using SolrJ is not released yet I think.. it's going to be released with Solr 3.4.0 (SolrJ client cannot parse grouped and range facets results, SOLR-2523). See the release notes of Solr 3.4.0: http://wiki.apache.org/solr/ReleaseNote34 On Mon, Sep 12, 2011 at 3:51 PM, Kirill Lykov lykov.kir...@gmail.com wrote: I found that SolrQuery doesn't work with grouping.
I constructed SolrQuery this way: solrQuery = constructFullSearchQuery(searchParams); solrQuery.set("group", true); solrQuery.set("group.field", "GeonameId"); Solr successfully handles the request and writes about that in the log: INFO: [] webapp=/solr path=/select params={start=1&q=*:*&timeAllowed=1500&group.field=GeonameId&group=true&wt=xml&rows=20&version=2.2} hits=12099579 status=0 QTime=2968 The error occurs when SolrJ tries to parse in XMLResponseParser.processResponse (line 324), where builder stores "/lst":

Object val = type.read( builder.toString().trim() );
if( val == null && type != KnownType.NULL ) {
  throw new XMLStreamException( "error reading value:"+type, parser.getLocation() );
}
vals.add( val );
break;

The problem is - val is null. It happens because the handler for the type LST returns null (line 178 in the same file):

LST (false) {
  @Override
  public Object read( String txt ) {
    return null;
  }
},

I don't understand why it works this way. The XML which was returned by Solr is valid. I attached the response XML to the letter. The error occurs in line 3, column 14 661. I use Apache Solr 3.3.0 and the same SolrJ. -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Regards, Sanal Kannappilly Stephen -- Met vriendelijke groet, Martijn van Groningen -- Best regards, Kirill Lykov, Software Engineer, Data East LLC, tel.:+79133816052, LinkedIn profile: http://www.linkedin.com/pub/kirill-lykov/12/860/16 -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
Not yet. If you want you can create an issue for sorting groups by numFound. On 9 September 2011 18:49, O. Klein kl...@octoweb.nl wrote: I am also looking for way to sort on numFound. Has an issue been created? -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3323420.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: TermsComponent from deleted document
I'd use the suggester: http://wiki.apache.org/solr/Suggester The suggester can give a collation. The TermsComponent can't do that. The suggester builds on top of the spellchecking infrastructure, so should be easy to use if you're familiar with that. Martijn On 10 September 2011 08:37, Manish Bafna manish.bafna...@gmail.com wrote: Which is preferable? using TermsComponent or Facets for autosuggest? On Fri, Sep 9, 2011 at 10:33 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : http://wiki.apache.org/solr/TermsComponent states that TermsComponent will : return frequencies from deleted documents too. : : Is there anyway to omit the deleted documents to get the frequencies. not really -- until a deleted document is expunged from segment merging, they are still included in the term stats which is what the TermsComponent looks at. If having 100% accurate term counts is really important to you, then you can optimize after doing any updates on your index - but there is obviously a performance tradeoff there. -Hoss -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
No, as far as I know sorting by group count isn't planned. You can create an issue in Jira where future development of this feature can be tracked. On 7 September 2011 23:54, bobsolr xbvbvccvb...@hotmail.com wrote: Hi Martijn, Thanks for the reply. Unfortunately I can't reference the group size using a function or by specifying a schema field, although I use those methods to sort docs within a group. Do you know if sort by group size is available/planned for a future release of Solr (maybe 4.0)? If so, it might be worth a shot to get a trunk version to try that. BTW, I'm using 3.3 now. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3318060.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Sorting groups by numFound group size
Sorting groups by numFound isn't possible. You can sort groups by specifying a function or a field (from your schema) in the sort parameter. numFound isn't a field, so that is why you can't sort on it. Martijn On 7 September 2011 08:17, bobsolr xbvbvccvb...@hotmail.com wrote: Hi, I'm using this sample query to group the result set by category: q=test&group=true&group.field=category This works as expected and I get this sample response (abridged): response: {numFound:1,start:0,docs:[ { ... } {numFound:6,start:0,docs:[ { ... } {numFound:3,start:0,docs:[ { ... } However, I can't find a way to specify the sort order of the groups by the number of docs each group has (the numFound field). I think the sort param has something to do with it, but I don't know how to use it. Any help will be greatly appreciated! -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3315740.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
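Until group-count sorting exists server-side, a client can re-order the page of groups it received by their numFound values. A sketch of that workaround (plain Python over a parsed grouped response; the numFound values mirror the sample in the question, and note it only reorders the groups already returned, not the full result set):

```python
grouped = {"category": {"groups": [
    {"groupValue": "a", "doclist": {"numFound": 1, "docs": []}},
    {"groupValue": "b", "doclist": {"numFound": 6, "docs": []}},
    {"groupValue": "c", "doclist": {"numFound": 3, "docs": []}},
]}}

def sort_groups_by_size(grouped, field):
    # biggest groups first; a purely client-side re-ordering
    groups = grouped[field]["groups"]
    return sorted(groups, key=lambda g: g["doclist"]["numFound"], reverse=True)

print([g["groupValue"] for g in sort_groups_by_size(grouped, "category")])
# ['b', 'c', 'a']
```

The caveat is paging: the server still picks which groups make it onto the page using its own sort, so this only helps when the interesting groups fit in one response.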
Re: Reading results from FieldCollapsing
The CollapseComponent was never committed. This class exists in the SOLR-236 patches. You don't need to change the configuration in order to use grouping. The blog you mentioned is based on the SOLR-236 patches. The current grouping in Solr 3.3 has superseded these patches. From Solr 3.4 (not yet released) the QueryResponse class in SolrJ has a method getGroupResponse. Use this method to get the grouped response. On 31 August 2011 14:10, Erick Erickson erickerick...@gmail.com wrote: Actually, I haven't used the new stuff yet, so I'm not entirely sure either, but that sure would be the place to start. There's some historical ambiguity: grouping started out as field collapsing, and the terms are used interchangeably. If you go to the bug I linked to and open up the patch file, you'll see the code that implements the grouping in SolrJ; that should give you a good place to start. Best Erick On Wed, Aug 31, 2011 at 3:28 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Erick I downloaded the latest build from ( https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/ ) But I don't find the required class CollapseComponent in the src (org.apache.solr.handler.component.CollapseComponent). The SolrJ in 3.4 does seem to have something like the GroupResponse and GroupCommand classes, which might be the ones I am looking for (though I am not very sure). Regards Sowmya. On Tue, Aug 30, 2011 at 5:14 PM, Erick Erickson erickerick...@gmail.com wrote: Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637 Short form: it's in 3.4, not 3.3. So your choices are: 1) parse the XML yourself, or 2) get a current 3x build (as in one of the nightlies) and use SolrJ there. Best Erick On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Erick Yes, I did see the XML format. But I did not understand how to read the response using SolrJ. I found some information about the CollapseComponent on googling, which looks like a normal Solr XML results format.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/ However, this class CollapseComponent does not seem to exist in Solr 3.3; (org.apache.solr.handler.component.CollapseComponent) was the component mentioned in that link, and it is not there in the Solr 3.3 class files. Sowmya. On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.com wrote: Have you looked at the XML (or JSON) response format? You're right, it is different and you have to parse it differently; there are more levels. Try this query and you'll see the format (default data set): http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact Best Erick On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi All I am trying to use the FieldCollapsing feature in Solr. On the Solr admin interface I give ...&group=true&group.field=fieldA and I can see grouped results. But I am not able to figure out how to read those results in that order in Java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, on which I iterate and get something like doclist.get(1).getFieldValue("title") etc. After grouping, doing the same step throws an error (apparently because the returned XML formats are different too). How can I read groupValues and thereby the other field values of the documents inside that group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent ( http://wiki.apache.org/solr/StatsComponent ). In order to compute statistics on groups you must set the option group.truncate=true. An example query: q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight Let me know if this fits your use case. Martijn On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote: Thanks, but I actually need something more deterministic and more accurate. Anyone know if there is an already existing feature for that? thanks again -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
Or if you don't care about grouped results you can also add the following option: stats.facet=gender On 25 August 2011 14:40, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Hi Omri, I think you can achieve that with grouping and the Solr StatsComponent ( http://wiki.apache.org/solr/StatsComponent ). In order to compute statistics on groups you must set the option group.truncate=true. An example query: q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight Let me know if this fits your use case. Martijn On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote: Thanks, but I actually need something more deterministic and more accurate. Anyone know if there is an already existing feature for that? thanks again -- Met vriendelijke groet, Martijn van Groningen
Re: Grouping and performing statistics per group
If you take this query from the wiki: http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&stats.field=popularity&stats.twopass=true&rows=0&indent=true&stats.facet=inStock In this case you get stats about the popularity per inStock value (true / false). Replace these values with weight and gender respectively, and if you run the query on your index this should give the result you want. On 25 August 2011 14:47, Omri Cohen omri...@gmail.com wrote: Hi, thanks for your reply, but it doesn't work. I am getting the plain stats results at the end of the response, but no statistics per group. thanks anyway -- Met vriendelijke groet, Martijn van Groningen
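What the stats.field / stats.facet combination computes server-side can be pictured with a small client-side equivalent: per distinct gender value, aggregate the weight values. The record layout and the weightSumPerGender method name below are made up for illustration, and this sketch only computes the sum, whereas the StatsComponent also returns min, max, mean, and so on.

```java
import java.util.HashMap;
import java.util.Map;

public class StatsPerGroup {
    // Sum 'weight' per 'gender' value, mirroring the per-facet-value
    // aggregation that stats.field=weight + stats.facet=gender performs.
    static Map<String, Double> weightSumPerGender(String[] genders, double[] weights) {
        Map<String, Double> sums = new HashMap<>();
        for (int i = 0; i < genders.length; i++) {
            sums.merge(genders[i], weights[i], Double::sum);
        }
        return sums;
    }
}
```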
Re: Results Group-By using SolrJ
Hi Omri, SOLR-2637 was concerned with adding grouped response parsing. There is no convenience method for grouping, but you can use the normal SolrQuery#set(...) methods to enable grouping. The following code should enable grouping via the SolrJ api: SolrQuery query = new SolrQuery(); query.set(GroupParams.GROUP, true); query.set(GroupParams.GROUP_FIELD, "your_field"); Martijn On 14 August 2011 16:53, Omri Cohen omri...@gmail.com wrote: Hi All, I am trying to group results using SolrJ. According to https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so I upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for grouping in QueryResponse, which is getGroupResponse(). The only thing left that I don't understand is where I set which field to group on. There is no method that looks like it does so on the SolrQuery object. Ideas anyone? thanks, Omri -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR 3.3.0 multivalued field sort problem
The first solution would make sense to me. Some kind of strategy mechanism for this would allow anyone to define their own rules. Duplicating results would be confusing to me. On 13 August 2011 18:39, Michael Lackhoff mich...@lackhoff.de wrote: On 13.08.2011 18:03 Erick Erickson wrote: The problem I've always had is that I don't quite know what sorting on multivalued fields means. If your field had tokens a and z, would sorting on that field put the doc at the beginning or end of the list? Sure, you can define rules (first token, last token, average of all tokens (whatever that means)), but each solution would be wrong sometime, somewhere, and/or completely useless. Of course it would need rules, but I think it wouldn't be too hard to find rules that are at least far better than the current situation. My wish would include an option that decides whether the field is used just once or every value on its own. If the option is set to FALSE, only the first value would be used; if it is TRUE, every value of the field would get its place in the result list. So, if we have e.g. record1: ccc and bbb record2: aaa and zzz it would be either record2 (aaa) record1 (ccc) or record2 (aaa) record1 (bbb) record1 (ccc) record2 (zzz) I find these two outcomes most plausible, so I would allow them if technically possible, but whatever rule looks more plausible to the experts: some solution is better than no solution. -Michael -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR 3.3.0 multivalued field sort problem
Hi Johnny, Sorting on a multivalued field has never really worked in Solr. Solr versions <= 1.4.1 allowed it, but there was a chance that an error occurred and that the sorting might not be what you expect. From Solr 3.1 and up, sorting on a multivalued field isn't allowed and an http 400 is returned. Duplicating documents or fields (what Péter describes) is as far as I know the only option until Lucene supports sorting on multivalued fields properly. Martijn 2011/8/12 Péter Király kirun...@gmail.com Hi, There is no direct solution; you have to create single-value field(s) to sort on. I am aware of two workarounds: - you can use a random or a given (e.g. the first) instance of the multiple values of the field, and that would be your sortable field. - you can create two sortable fields: _min and _max, which contain the minimal and maximal values of the given field's values. At least, that's what I do. Probably there are other solutions as well. Péter -- eXtensible Catalog http://drupal.org/project/xc 2011/8/12 johnnyisrael johnnyi.john...@gmail.com: Hi, I am currently using Solr 1.4.1. With this version sorting works fine even on a multivalued field. Now I am planning to upgrade my Solr version from 1.4.1 to 3.3.0. In this latest version sorting is not working on a multivalued field, so I am unable to upgrade my Solr due to this drawback. Is there a workaround available to fix this problem? Thanks, Johnny -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-0-multivalued-field-sort-problem-tp3248778p3248778.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: SOLR Support for Lucene Nested Documents
Hi Josh, Solr doesn't expose this Lucene feature yet. To support this Solr needs to be able to index documents in a single block. Also the BlockJoinQuery needs to be exposed to Solr (this can easily happen via a QParserPlugin). Martijn On 5 August 2011 00:00, Joshua Harness jkharnes...@gmail.com wrote: I noticed that lucene supports 'Nested Documents'. However - I couldn't find mention of this feature within SOLR. Does anybody know how to leverage this lucene feature through SOLR? Thanks! Josh Harness -- Met vriendelijke groet, Martijn van Groningen
Re: Minimum Score
As far as I know there is no built-in solution for this like there is for max score. An alternative approach to the one already mentioned is to send a second request with rows=1 and sort=score asc. This will return the lowest-scoring document, and you can then retrieve the score from that document (if fl=*,score). Martijn On 5 August 2011 10:45, Kissue Kissue kissue...@gmail.com wrote: But that would mean returning all the results without pagination, which I don't want to do. I am looking for a way to do it without having to return all the results at once. Thanks. On Thu, Aug 4, 2011 at 11:18 PM, Darren Govoni dar...@ontrenet.com wrote: Off the top of my head, maybe you can get the number of results and then look at the last document and check its score. I believe the results will be ordered by score? On 08/04/2011 05:44 PM, Kissue Kissue wrote: Hi, I am using Solr 3.1 with the SolrJ client library. I can see that it is possible to get the maximum score for your search by using the following: response.getResults().getMaxScore() I am wondering: is there some simple solution to get the minimum score? Many thanks. -- Met vriendelijke groet, Martijn van Groningen
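The second-request trick above can be expressed as plain request parameters. The helper below just builds the query string for that lowest-score request; the parameter names (q, rows, sort, fl) are the standard select-handler ones, but the MinScoreParams class itself is a made-up convenience, not part of SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MinScoreParams {
    // Build the query string for a "give me only the lowest-scoring doc"
    // request: one row, sorted by score ascending, returning only the score.
    static String build(String q) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return "q=" + encoded + "&rows=1&sort=score+asc&fl=score";
    }
}
```

Appending these parameters to the select handler URL of the same search gives a one-document response whose score is the minimum for that query.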
Re: A rant about field collapsing
The development of the field collapse feature is a long and confusing story. The main point is that SOLR-236 was never going to scale and the performance in general was bad. A new approach was needed. This was implemented in SOLR-1682 and added to the trunk (4.0-dev) around September last year. Later in LUCENE-1421 the code was moved from Solr to Lucene as a module / contrib. After that the grouping module and contrib were wired into Solr 3.3 and 4.0-dev in SOLR-2524 and SOLR-2564. Field collapsing is not gone, but it is just a form of result grouping. The core SOLR-236 feature is in 3.3 / 4.0-dev. Other features that SOLR-236 offered will eventually get in. Like for example post grouping facets. The http parameters and response have changed without keeping it compatible with the SOLR-236 patches. I think that isn't a problem since SOLR-236 was never a committed feature. But a widely used feature should never be attached as a patch to a Jira issue for 3+ years On 3 August 2011 18:33, baronDodd barond...@googlemail.com wrote: I am working on an implementation of search within our application using solr. About 2 months ago we had the need to group results by a certain field. After some searching I came across the JIRA in progress for this - field collapsing: https://issues.apache.org/jira/browse/SOLR-236 It was scheduled for the next solr release and had a full set of proper JIRA subtasks and patch files of almost complete implementations attached. So as you can imagine I was happy to apply this patch and build it into our application and await for the next release when it would be part of the main trunk. Now imagine my surprise when we have come around to upgrade to see that suddenly field collapsing has been thrown away in favour of a totally different grouping implementation https://issues.apache.org/jira/browse/SOLR-2524 How was it decided that this would be used instead? 
It was not made very clear that LUCENE-1421 was in progress which would effectively make the field collapsing work irrelevant by fixing the problem in lucene rather than primarily in solr. This has cost me days of work to now merge our custom changes somehow to the new implementation. I guess it is my own fault for basing our custom changes around an unresolved enhancement but as SOLR-236 had been 3-4 years in progress and SOLR-2524 did not exist at the time it seemed pretty safe to assume that the same problem was not being fixed in 2 totally different ways! -- View this message in context: http://lucene.472066.n3.nabble.com/A-rant-about-field-collapsing-tp3222798p3222798.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: A rant about field collapsing
Well, the original page moved to: http://wiki.apache.org/solr/FieldCollapsingUncommitted Assuming that you're using Solr 3.3, you can't get the grouped result (<lst name="grouped">) with SolrJ. I added grouping support to SolrJ some time ago and it will be in Solr 3.4. You can use a nightly 3.x build to use the grouping support now. You can also use the group.main=true option, which returns a response that is compatible with the normal search response. However, you can then only use one group command per request (group.field, group.func or group.query). Also, there were some bugs with this response format in Solr 3.3 that have been fixed and will be included when Solr 3.4 is released. Martijn On 4 August 2011 15:24, baronDodd barond...@googlemail.com wrote: Ok, thank you very much for clearing that up a little. I think another reason I was confused was that the wiki page for grouping was based around the original field collapsing plan at the time, which led me to the Jira and hence the patch files. Rant over! Perhaps you can help to clarify whether the current grouping changes work with SolrJ? In QueryResponse.setResponse() there is a loop which builds up the results object, but it has no check at present for "grouped" in the NamedList, so the SolrJ client gets no results back when searching with grouping parameters. I assume I can add this on my local working copy and all will be well? -- View this message in context: http://lucene.472066.n3.nabble.com/A-rant-about-field-collapsing-tp3222798p3225361.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: ideas for versioning query?
Hi Mike, how many docs and groups do you have in your index? I think the group.sort option fits your requirements. If I remember correctly, group.ngroups=true adds something like 30% extra time on top of the search request with grouping, but that was on my local test dataset (~30M docs, ~8000 groups) and my machine. You might see different search times when setting group.ngroups=true. Martijn 2011/8/1 Mike Sokolov soko...@ifactory.com Thanks, Tomas. Yes, we are planning to keep a "current" flag in the most current document. But there are cases where, for a given user, the most current document is not that one, because they only have access to some older documents. I took a look at http://wiki.apache.org/solr/FieldCollapsing and it seems as if it will do what we need here. My one concern is that it might not be efficient at computing group.ngroups for a very large number of groups, which we would ideally want. Is that something I should be worried about? -Mike On 08/01/2011 10:08 AM, Tomás Fernández Löbbe wrote: Hi Michael, I guess this could be solved using grouping as you said. Documents inside a group can be sorted on a field (in your case, the version field; see the group.sort parameter), and you can show only the first one. It will be more complex to show facets (post-grouping faceting is work in progress but still not committed to the trunk). It would be easier from the Solr side if you could do something at index time, like indicating which document is the current one and which is an old one (you would need to update the old document whenever a new version is indexed). Regards, Tomás On Mon, Aug 1, 2011 at 10:47 AM, Mike Sokolov soko...@ifactory.com wrote: A customer has an interesting problem: some documents will have multiple versions. In search results, only the most recent version of a given document should be shown.
The trick is that each user has access to a different set of document versions, and each user should see only the most recent version of a document that they have access to. Is this something that can reasonably be solved with grouping? In 3.x? I haven't followed the grouping discussions closely: would someone point me in the right direction please? -- Michael Sokolov Engineering Director www.ifactory.com -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and class versions
Were you upgrading from Solr 1.4? SolrJ by default uses the javabin format for querying (the wt parameter). The javabin format is not compatible between 1.4 and 3.1 and above, so if your clients were running SolrJ 1.4 versions I would expect errors to occur. Martijn On 25 July 2011 12:15, Tarjei Huse tar...@scanmine.com wrote: Hi, I recently went through a little hell when I upgraded my Solr servers to 3.2.0. What I didn't anticipate was that my Java SolrJ clients depend on the server version. I would like to add a note about this in the SolrJ docs: http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update Any comments with regard to this? Are all SolrJ methods dependent on Java serialization and thus class versions? -- Regards / Med vennlig hilsen Tarjei Huse Mobil: 920 63 413 -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and class versions
I agree! It should be noted in the documentation. I just wanted to say that SolrJ doesn't depend on Java serialization, but uses its own serialization: http://lucene.apache.org/solr/api/solrj/org/apache/solr/common/util/JavaBinCodec.html Martijn On 26 July 2011 15:31, Tarjei Huse tar...@scanmine.com wrote: On 07/26/2011 09:26 AM, Martijn v Groningen wrote: Were you upgrading from Solr 1.4? Yep. SolrJ by default uses the javabin format for querying (the wt parameter). The javabin format is not compatible between 1.4 and 3.1 and above, so if your clients were running SolrJ 1.4 versions I would expect errors to occur. I understood the error when I saw it, but I think the version dependency should be noted in the SolrJ manual. Regards, T Martijn On 25 July 2011 12:15, Tarjei Huse tar...@scanmine.com wrote: Hi, I recently went through a little hell when I upgraded my Solr servers to 3.2.0. What I didn't anticipate was that my Java SolrJ clients depend on the server version. I would like to add a note about this in the SolrJ docs: http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update Any comments with regard to this? Are all SolrJ methods dependent on Java serialization and thus class versions? -- Regards / Med vennlig hilsen Tarjei Huse Mobil: 920 63 413 -- Met vriendelijke groet, Martijn van Groningen
Re: Possible bug in Solr 3.3 grouping
Hi Nikhil, Thanks for raising this issue. I checked this particular issue in a test case and I ran into the same error, so this is indeed a bug. I've fixed this issue for 3x in revision 1145748, so checking out the latest 3x branch and building Solr yourself should give you this bug fix. Or you can wait until the 3x build produces new nightly artifacts: https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/ The numFound in your case is based on the number of documents, not groups. So you might still get an empty result, because there might be fewer than 40 groups in this result set. You can see the number of groups by using the group.ngroups=true parameter; this includes the number of groups, but the group.format must be grouped, otherwise you don't get the number of groups in the response. Martijn On 12 July 2011 09:00, Nikhil Chhaochharia nikhil...@yahoo.com wrote: Hi, I am using Solr 3.3 and have run into a problem with grouping. If 'group.main' is 'true' and 'start' is greater than 'rows', then I do not get any results. A sample response is: <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">602</int></lst> <lst name="grouped"/> <result name="response" numFound="2239538" start="40" maxScore="6.140459"/> </response> If 'group.main' is false, then I get results. Did anyone else come across this problem? Using the grouping feature with pagination of results will make start greater than rows from the third page onwards. Thanks, Nikhil
Re: SolrJ and Range Faceting
No, that doesn't work yet. I think we'll need to enhance the Solr query parser in order to support this. If that is supported we don't need to use the DateMathParser, and this is a more general solution for other client libraries as well. Martijn On 12 June 2011 20:58, Jamie Johnson jej2...@gmail.com wrote: Martijn, I had not considered doing something like manufacturedate_dt:[2007-02-13T15:26:37Z TO 2007-02-13T15:26:37Z+1YEAR] does this work? If so, that completely eliminates the need to use the date math parsers, right? On Sun, Jun 12, 2011 at 9:10 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, Just letting you know that I've added a comment to SOLR-2523. I ran into a class dependency issue regarding DateMathParser. Martijn On 11 June 2011 16:04, Jamie Johnson jej2...@gmail.com wrote: Awesome Martijn, let me know when you have it committed and I'll check out the latest version. Again thanks! On Sat, Jun 11, 2011 at 8:15 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well?
public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all, and date facets end up generating filter queries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filter queries for date ranges of the form dateTime:[start TO end], and I have a function (which I copied from the date faceting) which parses the range facets, but I would prefer not to have to maintain these myself. Is there a plan to implement these?
Also, is there a plan to update FacetField to not have end be a date, perhaps making it a String like start, so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
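The getAsFilterQuery sketches in this thread lean on Solr's DateMathParser plus SimpleDateFormat; the same fq string can be produced with only the JDK's java.time API. The DateRangeFq class below is a hypothetical stand-in, not the patch's code, and it hard-codes a whole-year gap instead of parsing arbitrary Solr date math.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class DateRangeFq {
    // Solr's canonical date representation: ISO timestamp with a literal Z.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Build a filter query like field:[start TO start+years] without
    // DateMathParser. Only supports a gap of whole years.
    static String build(String field, String startIso, int years) {
        ZonedDateTime start = ZonedDateTime.parse(startIso);
        ZonedDateTime end = start.plusYears(years);
        return field + ":[" + FMT.format(start) + " TO " + FMT.format(end) + "]";
    }
}
```

For example, a start of 2007-02-13T15:26:37Z with a one-year gap yields the same range that the manufacturedate_dt example earlier in the thread spells out with +1YEAR date math.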
Re: SolrJ and Range Faceting
Hi Jamie, Just letting you know that I've added a comment to SOLR-2523. I ran into a class dependency issue regarding DateMathParser. Martijn On 11 June 2011 16:04, Jamie Johnson jej2...@gmail.com wrote: Awesome Martijn, let me know when you have it committed and I'll check out the latest version. Again thanks! On Sat, Jun 11, 2011 at 8:15 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well?
public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all, and date facets end up generating filter queries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filter queries for date ranges of the form dateTime:[start TO end], and I have a function (which I copied from the date faceting) which parses the range facets, but I would prefer not to have to maintain these myself. Is there a plan to implement these?
Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and Range Faceting
Hi James, Good idea! I'll add a getAsFilterQuery method to the patch. Martijn On 6 June 2011 19:32, Jamie Johnson jej2...@gmail.com wrote: Small error: shouldn't be using this.start, but should instead be using Double.parseDouble(this.getValue()) and sdf.parse(count.getValue()) respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter. I have logic which does this in my class which works; any chance of getting something like this added to the patch as well? public static class Numeric extends RangeFacet<Number, Number> { public Numeric(String name, Number start, Number end, Number gap) { super(name, start, end, gap); } public String getAsFilterQuery() { Double end = this.start.doubleValue() + this.gap.doubleValue() - 1; return this.name + ":[" + this.start + " TO " + end + "]"; } } and for dates (there's a parse exception below which I am not doing anything with currently) public String getAsFilterQuery() { RangeFacet.Date dateCount = (RangeFacet.Date) count.getRangeFacet(); DateMathParser parser = new DateMathParser(TimeZone.getDefault(), Locale.getDefault()); SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss"); parser.setNow(dateCount.getStart()); Date end = parser.parseMath(dateCount.getGap()); String startStr = sdf.format(dateCount.getStart()) + "Z"; String endStr = sdf.format(end) + "Z"; String label = startStr + " TO " + endStr; return facetField.getName() + ":[" + label + "]"; } On Fri, Jun 3, 2011 at 7:05 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it.
Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
Re: SolrJ and Range Faceting
Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
Re: Result Grouping always returns grouped output
Hi Karel, group.main=true should do the trick. When that is set to true the group.format is always simple. Martijn On 27 May 2011 19:13, kare...@gmail.com kare...@gmail.com wrote: Hello, I am using the latest nightly build of Solr 4.0 and I would like to use grouping/field collapsing while maintaining compatibility with my current parser. I am using the regular web interface to test it, the same commands as in the wiki, just with the field names matching my dataset. Grouping itself works — group=true and group.field return the expected results — but neither group.main=true nor group.format=simple seems to change anything. Do I have to include something special in solrconfig.xml or schema.xml to make the simple output work? Thanks for any hints, K -- Met vriendelijke groet, Martijn van Groningen
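As a stand-alone illustration of the parameters involved (the field name category is hypothetical), the request Karel needs boils down to a query string like the one this sketch assembles:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the grouping request parameters. With
// group.main=true the grouped hits replace the normal document list,
// so the output format is effectively forced to "simple".
public class GroupParamsSketch {

    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", "category"); // hypothetical field name
        p.put("group.main", "true");      // flat, "simple"-style output
        System.out.println(toQueryString(p));
        // prints: q=*:*&group=true&group.field=category&group.main=true
    }
}
```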
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 16 May 2011 11:32, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora -- Met vriendelijke groet, Martijn van Groningen
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website) [ X ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [ X ] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On 19 January 2011 14:59, Matthew Hall mh...@informatics.jax.org wrote: [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) -- Matthew Hall Software Engineer Mouse Genome Informatics mh...@informatics.jax.org (207) 288-6012 -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapse question
Hi Ken, Skipping null field values when collapsing is not possible with the patch as-is. However, if you want, you can fix this in the patch; it is a really small change. Assuming that you're using the default collapsing algorithm, you can add the following piece of code in the NonAdjacentDocumentCollapser.java file: if (currentValue == null) { continue; } Place it in the doCollapsing method after the following statement: String currentValue = values.lookup[values.order[currentId]]; This makes sure that documents that have no value in the collapse field are not collapsed. Field collapsing has a big impact on your search times and, to a lesser extent, on memory usage. It can increase search times up to 10 times, but this depends on the situation. As indexes get bigger this becomes a bigger problem. Also, using field collapsing in a distributed environment can cause problems. This is because collapse information is not shared between shards, resulting in incorrect collapse results. The only workaround for this problem I know of is to make sure that the groups are distributed evenly between shards and that a group's documents are not spread across shards. Other than that there are no further major issues with this patch. Many people are using this patch in their Solr setups, but it is a patch, so you'll have to keep that in mind. There are efforts to put grouping functionality into Solr (without patching) in SOLR-236's child issues, so keep an eye on these issues. Cheers, Martijn On 3 July 2010 19:20, osocurious2 ken.fos...@realestate.com wrote: I wanted to extend my question some. My original question about collapsing null fields is still open, but in trying to research elsewhere I see a lot of angst about the Field Collapse functionality in general. Can anyone summarize what the current state of affairs is with it? I'm on Solr 1.4, just the latest release build, not any current builds. 
Field Collapse seems to be in my build because I could do single field collapse just fine (hence my null field question). However there seems to be talk of problems with Field Collapse that aren't fixed yet. What kinds of issues are people having? Should I avoid Field Collapse in a production app for now? (tricky because I'm merging my schema with a third party tool schema and they are using Field Collapse). Any insight would be helpful, thanks Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Field-Collapse-question-tp939118p940923.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR-236 Patch
Hi Sam, It seems that the patch is out of sync again with the trunk. Can you try patching with revision 955615? I'll update the patch shortly. Martijn On 24 June 2010 09:49, Amdebirhan, Samson, VF-Group samson.amdebir...@vodafone.com wrote: Hi, Trying to apply the SOLR-236 patch to a trunk I get what follows. Can anyone help me understand what I am missing? . svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk patch -p0 -i SOLR-236-trunk.patch --dry-run patching file solr/src/test/org/apache/solr/search/fieldcollapse/MyDocTermsIndex.java patching file solr/src/java/org/apache/solr/handler/component/CollapseComponent.java patching file solr/src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java can't find file to patch at input line 1068 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |Index: solr/src/java/org/apache/solr/search/DocSetHitCollector.java |=== |--- solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision 922957) |+++ solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision ) . Regards Sam
Re: Field Collapsing SOLR-236
What exactly did not work? Patching, compiling or running it? On 22 June 2010 16:06, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I tried checking out the latest code (rev 956715); the patch did not work on it. In fact I even tried hunting for the revision mentioned earlier in this thread (i.e. rev 955615) but cannot find it in the repository. (It has revision 955569 followed by revision 955785.) Any pointers?? Regards Raakhi On Tue, Jun 22, 2010 at 2:03 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Oh in that case is the code stable enough to use it for production? - Well, this feature is a patch and I think that says it all. Although bugs are fixed it is definitely an experimental feature and people should keep that in mind when using one of the patches. Does it support features which solr 1.4 normally supports? - As far as I know, yes. am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? - Maybe http://wiki.apache.org/solr/Deduplication prevents adding duplicates to your index, but then you miss the collapse counts and other computed values. On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Oh in that case is the code stable enough to use it for production? Does it support features which solr 1.4 normally supports? I am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? Regards, Raakhi On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? 
I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch(dated 17th june 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! - Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. 
How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all
Re: collapse exception
I checked your stacktrace and I can't remember putting SolrIndexSearcher.getDocListAndSet(...) in the doQuery(...) method. I guess the patch was modified before it was applied. I think the error occurs when you do a field collapse search with an fq parameter. That is the only reason I can think of why this exception is thrown. When this component become a contrib? Using patch is so annoying Patching is a bit of a hassle. This patch has some changes in the SolrIndexSearcher which make it difficult to make it a contrib or an extension. On 22 June 2010 04:52, Li Li fancye...@gmail.com wrote: I don't know because it's patched by someone else but I can't get his help. When this component become a contrib? Using patch is so annoying 2010/6/22 Martijn v Groningen martijn.is.h...@gmail.com: What version of Solr and which patch are you using? On 21 June 2010 11:46, Li Li fancye...@gmail.com wrote: it says Either filter or filterList may be set in the QueryCommand, but not both. I am a newbie of solr and have no idea of the exception. What's wrong with it? thank you. java.lang.IllegalArgumentException: Either filter or filterList may be set in the QueryCommand, but not both. 
at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286) at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
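The IllegalArgumentException above comes from an invariant in SolrIndexSearcher's QueryCommand: a filter and a filterList may not both be set on the same command. This minimal sketch (not Solr's actual code; field types simplified to Object) shows that kind of mutually exclusive setter:

```java
// Minimal sketch of the invariant behind the exception in this thread:
// two fields where setting one while the other is non-null is rejected.
public class QueryCommandSketch {
    private Object filter;
    private Object filterList;

    public void setFilter(Object f) {
        if (f != null && filterList != null)
            throw new IllegalArgumentException(
                "Either filter or filterList may be set in the QueryCommand, but not both.");
        this.filter = f;
    }

    public void setFilterList(Object fl) {
        if (fl != null && filter != null)
            throw new IllegalArgumentException(
                "Either filter or filterList may be set in the QueryCommand, but not both.");
        this.filterList = fl;
    }
}
```

A patched code path that calls setFilter(...) after a filterList was already populated (e.g. from an fq parameter) trips this check, which matches the stack trace above.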
Re: Field Collapsing SOLR-236
Oh in that case is the code stable enough to use it for production? - Well, this feature is a patch and I think that says it all. Although bugs are fixed it is definitely an experimental feature and people should keep that in mind when using one of the patches. Does it support features which solr 1.4 normally supports? - As far as I know, yes. am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? - Maybe http://wiki.apache.org/solr/Deduplication prevents adding duplicates to your index, but then you miss the collapse counts and other computed values. On 21 June 2010 09:04, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Oh in that case is the code stable enough to use it for production? Does it support features which solr 1.4 normally supports? I am using facets as a workaround but then i am not able to sort on any other field. is there any workaround to support this feature?? Regards, Raakhi On Fri, Jun 18, 2010 at 6:14 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch (dated 17th June 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! 
- Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. 
I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen -- Met vriendelijke groet, Martijn van Groningen
Re: collapse exception
What version of Solr and which patch are you using? On 21 June 2010 11:46, Li Li fancye...@gmail.com wrote: it says Either filter or filterList may be set in the QueryCommand, but not both. I am newbie of solr and have no idea of the exception. What's wrong with it? thank you. java.lang.IllegalArgumentException: Either filter or filterList may be set in the QueryCommand, but not both. at org.apache.solr.search.SolrIndexSearcher$QueryCommand.setFilter(SolrIndexSearcher.java:1711) at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1286) at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:205) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:246) at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:173) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:174) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
Hi Rakhi, The patch is not compatible with 1.4. If you want to work with the trunk, you'll need to get the src from https://svn.apache.org/repos/asf/lucene/dev/trunk/ Martijn On 18 June 2010 13:46, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Moazzam, Where did u get the src code from?? I am downloading it from https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 and the latest revision in this location is 955469. so applying the latest patch (dated 17th June 2010) on it still generates errors. Any Pointers? Regards, Raakhi On Fri, Jun 18, 2010 at 1:24 AM, Moazzam Khan moazz...@gmail.com wrote: I knew it wasn't me! :) I found the patch just before I read this and applied it to the trunk and it works! Thanks Mark and martijn for all your help! - Moazzam On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? 
Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
I've added a new patch to the issue, so building the trunk (rev 955615) with the latest patch should not be a problem. Due to recent changes in the Lucene trunk the patch was not compatible. On 17 June 2010 20:20, Erik Hatcher erik.hatc...@gmail.com wrote: On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote: p.s. I'd be glad to contribute our Maven build re-organization back to the community to get Solr properly Mavenized so that it can be distributed and released more often. For us the benefit of this structure is that we will be able to overlay addons such as RequestHandlers and other third party support without having to rebuild Solr from scratch. But you don't have to rebuild Solr from scratch to add a new request handler or other plugins - simply compile your custom stuff into a JAR and put it in solr-home/lib (or point to it with lib in solrconfig.xml). Ideally, a Maven Archetype could be created that would allow one rapidly produce a Solr webapp and fire it up in Jetty in mere seconds. How's that any different than cd example; java -jar start.jar? Or do you mean a Solr client webapp? Finally, with projects such as Bobo, integration with Spring would make configuration more consistent and request significantly less java coding just to add new capabilities everytime someone authors a new RequestHandler. It's one line of config to add a new request handler. How many ridiculously ugly confusing lines of Spring XML would it take? The biggest thing I learned about Solr in my work thusfar is that patches like these could be standalone modules in separate projects if it weren't for having to hack the configuration and solrj methods up to adopt them. Which brings me to SolrJ, great API if it would stay generic and have less concern for adding method each time some custom collections and query support for morelikethis or collapseddocs needs to be added. I personally find it silly that we customize SolrJ for all these request handlers anyway. 
You get a decent navigable data structure back from general SolrJ query requests as it is, there's no need to build in all these convenience methods specific to all the Solr componetry. Sure, it's convenient, but it's a maintenance headache and as you say, not generic. But hacking configuration is reasonable, I think, for adding in plugins. I guess you're aiming for some kind of Spring-like auto-discovery of plugins? Yeah, maybe, but I'm pretty -1 on Spring coming into Solr. It's overkill and ugly, IMO. But you like it :) And that's cool by me, to each their own. Oh, and Hi Mark! :) Erik -- Met vriendelijke groet, Martijn van Groningen
Re: question about the fieldCollapseCache
The fieldCollapseCache should not be used as it is now; it uses too much memory. It stores any information relevant to a field collapse search: document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usage will grow for each unique query (and fast, with all this information). So it's best, I think, to disable this cache for now. Martijn On 8 June 2010 19:05, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards, each one containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of ram (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I completely remove the fieldCollapseCache, the server can keep up for hours and use only 2 GB of ram. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of ram? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs set to 200 000. Thanks for any hints...
Re: question about the fieldCollapseCache
I agree. I'll add this information to the wiki. On 9 June 2010 14:32, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: ok great. I believe this should be mentioned in the wiki. Later On 2010-06-09, at 4:06 AM, Martijn v Groningen wrote: The fieldCollapseCache should not be used as it is now, it uses too much memory. It stores any information relevant for a field collapse search. Like document collapse counts, collapsed document ids / fields, collapsed docset and uncollapsed docset (everything per unique search). So the memory usage will grow for each unique query (and fast with all this information). So its best I think to disable this cache for now. Martijn On 8 June 2010 19:05, Jean-Sebastien Vachon js.vac...@videotron.ca wrote: Hi All, I've been running some tests using 6 shards each one containing about 1 millions documents. Each shard is running in its own virtual machine with 7 GB of ram (5GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I remove completely the fieldCollapseCache, the server can keep up for hours and use only 2 GB of ram. (I'm even considering returning to a 32 bits JVM) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of ram? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) and collapse.maxdocs set to 200 000. Thanks for any hints... -- Met vriendelijke groet, Martijn van Groningen
Re: Applying collapse patch
The trunk should work with the latest patch (SOLR-236-trunk.patch). Did the patching go successfully? What compilation errors do you get?

On 28 May 2010 11:10, Sophie M. sop...@beezik.com wrote: Ok, I will have a look at the comments and I will post if necessary. Thanks ^^ -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851170.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Applying collapse patch
Have you executed ant example after building? (Assuming that this is the example Solr.)

On 28 May 2010 12:17, Sophie M. sop...@beezik.com wrote: It is ok for applying the patch, thanks Martijn. When I start Solr I get these logs in my console:

C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr>java -jar start.jar
2010-05-28 12:09:30.037:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2010-05-28 12:09:30.178:INFO::jetty-6.1.22
2010-05-28 12:09:30.208:INFO::Opened C:\Users\Sophie\workspace\lucene-solr\lucene-solr\solr\solr\logs\2010_05_28.request.log
2010-05-28 12:09:30.218:INFO::Started socketconnec...@0.0.0.0:8983

and I have a 404 error on http://localhost:8983/solr. I am investigating ^^ regards -- View this message in context: http://lucene.472066.n3.nabble.com/Applying-collapse-patch-tp851121p851308.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing SOLR-236
Hi Blargy, The latest patch is not compatible with 1.4; I believe that the latest field-collapse-5.patch file is compatible with 1.4. The file should at least compile against the 1.4 trunk. I'm not sure how the performance is. Martijn

On 25 March 2010 01:49, Dennis Gearon gear...@sbcglobal.net wrote: Boy, I hope that field collapsing works! I'm planning on using it heavily. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Wed, 3/24/10, blargy zman...@hotmail.com wrote: From: blargy zman...@hotmail.com Subject: Field Collapsing SOLR-236 To: solr-user@lucene.apache.org Date: Wednesday, March 24, 2010, 12:17 PM Has anyone had any luck with the field collapsing patch (SOLR-236) with Solr 1.4? I tried patching my version of 1.4 with no such luck. Thanks -- View this message in context: http://old.nabble.com/Field-Collapsing-SOLR-236-tp28019949p28019949.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
Re: Dedupe of document results at query-time
This manner of detecting duplicates at query time is really what field collapsing does, so I suggest you look into that. As far as I know there isn't a function query that does what you described in your example. Cheers, Martijn

On 23 January 2010 12:31, Peter S pete...@hotmail.com wrote: Hi, I wonder if someone might be able to shed some insight into this problem: Is it possible, and/or what is the best/accepted way, to achieve deduplication of documents by field at query time? For example, let's say an index contains:

Doc1 host:Host1 time:1 Sept 09 appname:activePDF
Doc2 host:Host1 time:2 Sept 09 appname:activePDF
Doc3 host:Host1 time:3 Sept 09 appname:activePDF

Can a query be constructed that would return only 1 of these documents based on appname (it doesn't really matter which one)? i.e.: match on host:Host1, ignore time, dedupe on appname:activePDF. Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually talking about field collapsing? Many thanks, Peter

-- Met vriendelijke groet, Martijn van Groningen
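As a hypothetical illustration of Peter's example, a SOLR-236-style request that collapses on appname could be built like this. The host/time/appname fields come from his message; the host, port, core path and collapse parameter values are placeholders, not something confirmed in this thread:

```python
from urllib.parse import urlencode

# Sketch only: collapse.field / collapse.threshold are the SOLR-236 patch
# parameters discussed in these threads; localhost:8983 is a placeholder.
params = {
    "q": "host:Host1",              # match on host, ignore time
    "collapse.field": "appname",    # keep one result per unique appname
    "collapse.threshold": "1",      # ...and only the most relevant one
    "fl": "host,time,appname",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With Docs 1-3 above, such a request would return a single document for appname:activePDF, which is exactly the query-time dedupe Peter asks for.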
Re: Solr 1.4 Field collapsing - What are the steps for applying the SOLR-236 patch?
I wouldn't use the patches of the sub-issues right now, as they are still under development (they are currently a proof of concept). I also think that the latest patch in SOLR-236 is currently the best option. There are some memory-related problems with the patch that have to do with caching: the fieldCollapseCache requires a lot of memory (best not to use it right now), and the filterCache becomes quite large as well. Depending on the size of your corpus you may need to increase your heap size and experiment with that. Martijn

2010/1/12 Joe Calderon calderon@gmail.com: it seems to be in flux right now as the solr developers slowly make improvements and ingest the various pieces into the solr trunk. I think your best bet might be to use the 12/24 patch and fix any errors where it doesn't apply cleanly. I'm using solr trunk r892336 with the 12/24 patch --joe

On 01/11/2010 08:48 PM, Kelly Taylor wrote: Hi, Is there a step-by-step guide for applying the patch for SOLR-236 to enable field collapsing in Solr 1.4? Thanks, Kelly -- Met vriendelijke groet, Martijn van Groningen
Re: Field Collapsing - disable cache
The latest SOLR-236.patch is for the trunk; I have updated the latest patch so it should now apply without conflicts. If I remember correctly the latest field-collapse-5.patch should work for 1.4, but it doesn't for the trunk.

2009/12/23 r...@intelcompute.com: Sorry, that was when trying to patch the 1.4 branch. Attempting to patch the trunk gives...

patching file src/test/test-files/fieldcollapse/testResponse.xml
patching file src/test/org/apache/solr/BaseDistributedSearchTestCase.java
Hunk #2 FAILED at 502.
1 out of 2 hunks FAILED -- saving rejects to file src/test/org/apache/solr/BaseDistributedSearchTestCase.java.rej
patching file src/test/org/apache/solr/search/fieldcollapse/FieldCollapsingIntegrationTest.java

btw, when is trunk actually updated?

On Wed 23/12/09 11:53, r...@intelcompute.com wrote: I'm currently trying to patch against trunk, using SOLR-236.patch from 18/12/2009, but getting the following error...

[ solr]$ patch -p0 < SOLR-236.patch
patching file src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file src/test/test-files/solr/conf/schema-fieldcollapse.xml
patching file src/test/test-files/solr/conf/solrconfig.xml
patching file src/test/test-files/fieldcollapse/testResponse.xml
can't find file to patch at input line 787
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|Property changes on: src/test/test-files/fieldcollapse/testResponse.xml
|___________________________________________________________________
|Added: svn:keywords
|   + Date Author Id Revision HeadURL
|Added: svn:eol-style
|   + native
|
|Index: src/test/org/apache/solr/BaseDistributedSearchTestCase.java
|===================================================================
|--- src/test/org/apache/solr/BaseDistributedSearchTestCase.java (revision 891214)
|+++ src/test/org/apache/solr/BaseDistributedSearchTestCase.java (working copy)
--------------------------
File to patch:

any suggestions, or should I checkout the 1.4 branch instead? Can't remember what I did last time to get field-collapse-5.patch working successfully.
On Tue 22/12/09 22:43, Lance Norskog wrote: To avoid this possible bug, you could change the cache to only have a few entries.

On Tue, Dec 22, 2009 at 6:34 AM, Martijn v Groningen wrote: In the latest patch some changes were made on the configuration side, but if you add the CollapseComponent to the conf no field collapse cache should be enabled. If not, let me know. Martijn

2009/12/22: On Tue 22/12/09 12:28, Martijn v Groningen wrote: Hi Rob, What patch are you actually using from SOLR-236? Martijn

2009/12/22: I've tried both the whole fieldCollapsing tag and just the fieldCollapseCache tag inside it; both cause the error. I guess I can just set size, initialSize, and autowarmCount to 0?

On Tue 22/12/09 11:17, Toby Cole wrote: Which elements did you comment out? It could be the case that you need to get rid of the entire fieldCollapsing element, not just the fieldCollapsingCache element. (Disclaimer: I've not used field collapsing in anger before :) Toby.

On 22 Dec 2009, at 11:09, wrote: That's what I assumed, but I'm getting the following error with it commented out:

MESSAGE null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter
Re: Field Collapsing - disable cache
How have you configured field collapsing? I have field collapsing configured without caching like this:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"/>

and with caching like this:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
  <fieldCollapseCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
</searchComponent>

In both situations I don't get an NPE. Also, the line (in the patch) where the exception occurs seems an unlikely place for an NPE:

if (fieldCollapseCache != null) {
  fieldCollapseCache.put(currentCacheKey, new CacheValue(result, collectors, collapseContext)); // line 276
}

Does the exception occur immediately after the first search? Martijn

2009/12/23 r...@intelcompute.com: Still seeing the same error when trying to comment out the fieldCollapsing block, or even just the fieldCollapseCache block inside it to disable the cache. Fresh trunk and latest patch.

message null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:636)

On Wed 23/12/09 13:26, r...@intelcompute.com wrote: Thanks, that latest update to the patch works fine now.

On Wed 23/12/09 13:13, Martijn v Groningen wrote: Latest SOLR-236.patch is for the trunk [...]
Re: Field Collapsing - disable cache
Hi Rob, What patch are you actually using from SOLR-236? Martijn

2009/12/22 r...@intelcompute.com: I've tried both the whole fieldCollapsing tag and just the fieldCollapseCache tag inside it; both cause the error. I guess I can just set size, initialSize, and autowarmCount to 0?

On Tue 22/12/09 11:17, Toby Cole wrote: Which elements did you comment out? It could be the case that you need to get rid of the entire fieldCollapsing element, not just the fieldCollapsingCache element. (Disclaimer: I've not used field collapsing in anger before :) Toby.

On 22 Dec 2009, at 11:09, wrote: That's what I assumed, but I'm getting the following error with it commented out:

MESSAGE null
java.lang.NullPointerException
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
at java.lang.Thread.run(Thread.java:636)

On Tue 22/12/09 11:02, Toby Cole wrote: If you take out the fieldCollapsing/fieldCollapseCache element in your config the fieldcollapse component will not use a cache. From http://wiki.apache.org/solr/FieldCollapsing#line-63: If the field collapse cache is not configured then the field collapse logic will not be cached. Regards, Toby.

On 22 Dec 2009, at 10:56, wrote: my solrconfig can be seen at http://www.intelcompute.com/solrconfig.xml

On Tue 22/12/09 10:51, wrote: Is it possible to disable the field collapsing cache? I'm trying to perform some speed tests, and have managed to comment out the filter, queryResult, and document caches successfully. on 1.5 ... ... collapse facet tvComponent ...

- Message sent via Atmail Open - http://atmail.org/
Re: Field Collapsing - disable cache
In the latest patch some changes were made on the configuration side, but if you add the CollapseComponent to the conf no field collapse cache should be enabled. If not, let me know. Martijn

2009/12/22 r...@intelcompute.com: On Tue 22/12/09 12:28, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Rob, What patch are you actually using from SOLR-236? Martijn [...]
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
I would not expect the Solr 1.4 build to be the cause of the problem. Just out of curiosity, does the same happen when collapse.threshold=1?

2009/12/11 Varun Gupta varun.vgu...@gmail.com: Here is the field type configuration of ctype:

<field name="ctype" type="integer" indexed="true" stored="true" omitNorms="true"/>

In solrconfig.xml, this is how I am enabling field collapsing:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>

Apart from this, I made no changes in solrconfig.xml for field collapse. I am currently not using the field collapse cache. I have applied the patch on the Solr 1.4 build. I am not using the latest Solr nightly build. Can that cause any problem? -- Thanks Varun Gupta

On Fri, Dec 11, 2009 at 3:44 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: I tried to reproduce a similar situation here, but I got the expected and correct results. Those three documents that you saw in your first search result should be the first in your second search result (unless the index or the sort changes) when you fq on that specific category. I'm not sure what is causing this problem. Can you give me some more information, like the field type configuration for the ctype field and how you have configured field collapsing? I did find another problem to do with field collapse caching: the collapse.threshold and collapse.maxdocs parameters are not taken into account when caching, which is of course wrong because they do matter when collapsing. Based on the information you have given me, this caching problem is not the cause of the situation you have. I will update the patch that fixes this problem shortly. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi Martijn, I am not sending the collapse parameters for the second query. Here are the queries I am using:

*When using field collapsing (searching over all categories):*
spellcheck=true&collapse.info.doc=true&facet=true&collapse.threshold=3&facet.mincount=1&spellcheck.q=weight+loss&collapse.facet=before&wt=xml&f.content.hl.snippets=2&hl=true&version=2.2&rows=20&collapse.field=ctype&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&collapse.info.count=false&facet.field=ctype&qt=contentsearch

(categories are represented by the ctype field above)

*Without using field collapsing:*
spellcheck=true&facet=true&facet.mincount=1&spellcheck.q=weight+loss&wt=xml&hl=true&rows=10&version=2.2&fl=id,sid,title,image,ctype,score&start=0&q=weight+loss&facet.field=ctype&qt=contentsearch

I append fq=ctype:1 to the above queries when trying to get results for a particular category. -- Thanks Varun Gupta

On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. While searching, I want to show 3 documents from each category along with a link to see all the documents under a single category. I decided to use field collapsing so that I don't have to make 6 queries (one for each category). Currently I am using the field collapsing patch uploaded on 29th Nov. Now, the results that come back when using field collapsing do not match the results for a single category. For example, for category C1 I get results R1, R2 and R3 using field collapsing, but when I look at results only from category C1 (without field collapsing) these results are nowhere in the first 10 results. Am I doing something wrong, or am I using field collapsing for the wrong feature? I am using the following field collapsing parameters while querying:

collapse.field=category
collapse.facet=before
collapse.threshold=3

-- Thanks Varun Gupta -- Met vriendelijke groet, Martijn van Groningen
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. While searching, I want to show 3 documents from each category along with a link to see all the documents under a single category. I decided to use field collapsing so that I don't have to make 6 queries (one for each category). Currently I am using the field collapsing patch uploaded on 29th Nov. Now, the results that come back when using field collapsing do not match the results for a single category. For example, for category C1 I get results R1, R2 and R3 using field collapsing, but when I look at results only from category C1 (without field collapsing) these results are nowhere in the first 10 results. Am I doing something wrong, or am I using field collapsing for the wrong feature? I am using the following field collapsing parameters while querying:

collapse.field=category
collapse.facet=before
collapse.threshold=3

-- Thanks Varun Gupta -- Met vriendelijke groet, Martijn van Groningen
Re: Results after using Field Collapsing are not matching the results without using Field Collapsing
I tried to reproduce a similar situation here, but I got the expected and correct results. Those three documents that you saw in your first search result should be the first in your second search result (unless the index or the sort changes) when you fq on that specific category. I'm not sure what is causing this problem. Can you give me some more information, like the field type configuration for the ctype field and how you have configured field collapsing? I did find another problem to do with field collapse caching: the collapse.threshold and collapse.maxdocs parameters are not taken into account when caching, which is of course wrong because they do matter when collapsing. Based on the information you have given me, this caching problem is not the cause of the situation you have. I will update the patch that fixes this problem shortly. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi Martijn, I am not sending the collapse parameters for the second query. Here are the queries I am using: [...]

On Thu, Dec 10, 2009 at 5:58 PM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Varun, Can you send the whole requests (with params) that you send to Solr for both queries? In your situation the collapse parameters only have to be used for the first query and not the second query. Martijn

2009/12/10 Varun Gupta varun.vgu...@gmail.com: Hi, I have documents under 6 different categories. [...]

-- Met vriendelijke groet, Martijn van Groningen
Re: Grouping
Field collapsing has some aggregation functions like sum() and avg(), but the statistics are computed over collapse groups instead of over all documents with the same field value. A collapse group contains the documents that were not relevant enough to end up in the search result (the collapsed documents) plus one or more documents that were relevant enough to be displayed in the search result; this number is controlled by the collapse.threshold parameter, which defaults to one. The statistics are calculated over the collapsed documents only, so it is not exactly the same as a SQL GROUP BY. You can however get similar results when collapse.threshold is one by adding the field value (e.g. price) of the most relevant document to the aggregated statistic. Of course you will have to do this yourself on the client side. Hope this clarifies the field collapse functionality a bit. Martijn

2009/12/4 Otis Gospodnetic otis_gospodne...@yahoo.com: Not out of the box. You could group by using SOLR-236 perhaps? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message From: Bruno brun...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, December 4, 2009 1:08:59 PM Subject: Grouping Is there a way to make a group by or distinct query? -- Bruno Morelli Vargas Mail: brun...@gmail.com Msn: brun...@hotmail.com Icq: 165055101 Skype: morellibmv
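Martijn's client-side step can be sketched in a few lines: with collapse.threshold=1, Solr's sum() covers only the collapsed documents of a group, so the client adds the field value of the one visible document to approximate a SQL GROUP BY total. The numbers below are made up for illustration:

```python
# Sketch of the client-side correction described above: combine Solr's
# per-group sum over *collapsed* documents with the field value(s) of the
# visible (most relevant) document(s) in that group.
def group_total(collapsed_sum, visible_doc_values):
    """collapsed_sum: the sum() Solr computed over the group's collapsed docs;
    visible_doc_values: field values of the document(s) shown in the results."""
    return collapsed_sum + sum(visible_doc_values)

# A group whose collapsed docs sum to 30.0 plus one visible doc priced 12.5:
assert group_total(30.0, [12.5]) == 42.5
```

This only matches a true GROUP BY when collapse.threshold is one; with a higher threshold several documents per group appear in the results and are all excluded from Solr's aggregate.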
Re: Deduplication in 1.4
Field collapsing has been used by many in their production environment. Over the last few months the stability of the patch grew as quite a few bugs were fixed. The only big feature missing currently is caching of the collapsing algorithm. I'm currently working on that and I will put it in a new patch in the coming days. So yes, the patch is very near being production ready. Martijn 2009/11/26 KaktuChakarabati jimmoe...@gmail.com: Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like field collapsing is what I need, pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak Otis Gospodnetic wrote: Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place. What you are describing is closer to the field collapsing patch in SOLR-236. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: KaktuChakarabati jimmoe...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, November 24, 2009 5:29:00 PM Subject: Deduplication in 1.4 Hey, I've been trying to find some documentation on using this feature in 1.4 but the Wiki page is a little sparse. In specific, here's what I'm trying to do: I have a field, say 'duplicate_group_id', that I'll populate based on some offline document deduplication process I have. All I want is for Solr to compute a 'duplicate_signature' field based on this one at update time, so that when I search for documents later, all documents with the same original 'duplicate_group_id' value will be rolled up (e.g. I'll just get the first one that came back according to relevancy).
I enabled the deduplication processor and put it into the updater, but I'm not seeing any difference in returned results (i.e. results with the same duplicate_id are returned separately). Is there anything I need to supply at query time for this to take effect? What should the behaviour be? Is there any working example of this? Anything will be helpful. Thanks, Chak -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deduplication in 1.4
Two sites that use field collapsing: 1) www.ilocal.nl 2) www.welke.nl I'm not sure what you mean by double-tripping? The sites mentioned do not have performance problems caused by field collapsing. Field collapsing currently only supports quasi-distributed field collapsing (as I have described on the Solr wiki). Currently I don't know of a distributed field-collapsing algorithm that works properly and does not influence the search time in such a way that the search becomes slow. Martijn 2009/11/26 Otis Gospodnetic otis_gospodne...@yahoo.com: Hi Martijn, - Original Message From: Martijn v Groningen martijn.is.h...@gmail.com To: solr-user@lucene.apache.org Sent: Thu, November 26, 2009 3:19:40 AM Subject: Re: Deduplication in 1.4 Field collapsing has been used by many in their production environment. Got any pointers to public sites you know use it? I know of a high traffic site that used an early version, and it caused performance problems. Is double-tripping still required? Over the last few months the stability of the patch grew as quite a few bugs were fixed. The only big feature missing currently is caching of the collapsing algorithm. I'm currently working on that and Is it also full distributed-search-ready? I will put it in a new patch in the coming days. So yes, the patch is very near being production ready. Thanks, Otis Martijn 2009/11/26 KaktuChakarabati : Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like field collapsing is what I need, pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak Otis Gospodnetic wrote: Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place. What you are describing is closer to the field collapsing patch in SOLR-236.
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: KaktuChakarabati To: solr-user@lucene.apache.org Sent: Tue, November 24, 2009 5:29:00 PM Subject: Deduplication in 1.4 Hey, I've been trying to find some documentation on using this feature in 1.4 but the Wiki page is a little sparse. In specific, here's what I'm trying to do: I have a field, say 'duplicate_group_id', that I'll populate based on some offline document deduplication process I have. All I want is for Solr to compute a 'duplicate_signature' field based on this one at update time, so that when I search for documents later, all documents with the same original 'duplicate_group_id' value will be rolled up (e.g. I'll just get the first one that came back according to relevancy). I enabled the deduplication processor and put it into the updater, but I'm not seeing any difference in returned results (i.e. results with the same duplicate_id are returned separately). Is there anything I need to supply at query time for this to take effect? What should the behaviour be? Is there any working example of this? Anything will be helpful. Thanks, Chak -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26504403.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/Deduplication-in-1.4-tp26504403p26522386.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about collapse.type = adjacent
Hi Michael, Field collapsing is basically done in two steps. The first step is to get the uncollapsed, sorted (whether by score or by a field value) documents, and the second step is to apply the collapse algorithm on the uncollapsed documents. So yes, when specifying collapse.type=adjacent the documents get collapsed after the sort has been applied, but this is also the case when not specifying collapse.type=adjacent. I hope this answers your question. Cheers, Martijn 2009/11/2 michael8 mich...@saracatech.com: Hi, I would like to confirm whether 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after*, since the collapse feature is primarily meant for presentation use. Thanks, Michael -- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen
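The two-step behaviour described above can be illustrated with a toy sketch (this is not the patch's actual code): sort first, then collapse. With adjacent collapsing only neighbouring duplicates in the sorted list merge; non-adjacent collapsing merges duplicates wherever they occur.

```python
from itertools import groupby

# Toy documents: (sort_key, collapse_field_value)
docs = [(1, "a"), (2, "a"), (3, "b"), (4, "a")]
docs.sort()  # step 1: sort the uncollapsed result

# Step 2a: adjacent collapsing -- keep the first doc of each *run*
adjacent = [next(g) for _, g in groupby(docs, key=lambda d: d[1])]

# Step 2b: non-adjacent collapsing -- keep the first doc per value overall
seen, non_adjacent = set(), []
for d in docs:
    if d[1] not in seen:
        seen.add(d[1])
        non_adjacent.append(d)

print(adjacent)      # [(1, 'a'), (3, 'b'), (4, 'a')]
print(non_adjacent)  # [(1, 'a'), (3, 'b')]
```

Note how the trailing (4, "a") survives adjacent collapsing because it is separated from the other "a" documents by a "b" document in sort order.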
Re: weird problem with letters S and T
I think that is not a problem, because you are only storing one character per field. There are other text field types that do not have the stop word filter, so give your first-letter field such a field type. That way the stopword filter is only disabled for searches on the first-letter field. Cheers, Martijn 2009/10/28 Joel Nylund jnyl...@yahoo.com: Thanks Bern, now that you mention it they are in there. I assume if I remove them it will work, but I probably don't want to do that, right? Is there a way for this particular query to ignore stopwords? thanks Joel On Oct 28, 2009, at 6:20 PM, Bernadette Houghton wrote: Hi Joel, I had a similar issue the other day; in my case the solution turned out to be that the letters were stopwords. Don't know if this is your answer, but worth checking. Bern -Original Message- From: Joel Nylund [mailto:jnyl...@yahoo.com] Sent: Thursday, 29 October 2009 9:17 AM To: solr-user@lucene.apache.org Subject: weird problem with letters S and T (I am super new to solr, sorry if this is an easy one) Hi, I want to support an A-Z type view of my data. I have a DataImportHandler that uses sql (my query is complex, but the part that matters is): SELECT f.id, f.title, LEFT(f.title,1) as firstLetterTitle FROM Foo f I can create this index with no issues. I can query the title with no problem: http://localhost:8983/solr/select?q=title:super I can query the first letters mostly with no problem: http://localhost:8983/solr/select?q=firstLetterTitle:a Returns all the foos with the first letter a. This actually works with every letter except S and T. If I query those, I get no results. The weird thing: if I do the title query above with Super I get lots of results, and the xml shows the firstLetterTitle for those to be S: <doc> <str name="firstLetterTitle">S</str> <str name="id">84861348</str> <str name="title">Super Cool</str> </doc> <doc> <str name="firstLetterTitle">S</str> <str name="id">108692</str> <str name="title">Super 45</str> </doc> etc.
Any ideas, are S and T special characters in Solr queries? Here is the response from the s query with debug=true:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    <lst name="params">
      <str name="q">firstLetterTitle:s</str>
      <str name="debugQuery">true</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">firstLetterTitle:s</str>
    <str name="querystring">firstLetterTitle:s</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
    <str name="QParser">OldLuceneQParser</str>
    <lst name="timing">
      <double name="time">2.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

thanks Joel
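Martijn's suggestion in the reply above can be sketched as a schema.xml fragment. This is a hypothetical field type, not taken from the thread; the only essential point is that the analyzer chain for the first-letter field contains no StopFilterFactory, so single letters like s and t survive both indexing and querying.

```xml
<!-- Hypothetical sketch: an analyzer with no stop-word filter -->
<fieldType name="letterOnly" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="firstLetterTitle" type="letterOnly" indexed="true" stored="true"/>
```

In a real schema the fieldType belongs in the <types> section and the field in <fields>; the tokenizer and filter choices here are illustrative, and only the absence of StopFilterFactory matters for this problem.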
Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)
Hi Joe, Can you give a bit more context info? Like the exact search and the field types you are using, for example. Also, are you doing a lot of frequent updates to the index? Cheers, Martijn 2009/10/23 Joe Calderon calderon@gmail.com: seems to happen when sorting on anything besides strictly score; even score desc, num desc triggers it, using the latest nightly and the 10/14 patch Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)
I was able to reproduce the exact same stacktrace you sent. The exception occurred when I removed a document from a newly created index (with a commit) and then did a search with field collapsing enabled. I have attached a new patch to SOLR-236 that includes a fix for this bug. Martijn 2009/10/25 Martijn v Groningen martijn.is.h...@gmail.com: Hi Joe, Can you give a bit more context info? Like the exact search and the field types you are using, for example. Also, are you doing a lot of frequent updates to the index? Cheers, Martijn 2009/10/23 Joe Calderon calderon@gmail.com: seems to happen when sorting on anything besides strictly score; even score desc, num desc triggers it, using the latest nightly and the 10/14 patch Problem accessing /solr/core1/select. Reason: 4731592 java.lang.ArrayIndexOutOfBoundsException: 4731592 at org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235) at org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
Re: Collapse with multiple fields
No, this is actually not supported at the moment. If you really need to collapse on two different fields, you can concatenate the two fields together into another field while indexing and then collapse on that field. Martijn 2009/10/23 Thijs vonk.th...@gmail.com: I haven't had time to actually ask this on the list myself, but seeing this, I just had to reply. I was wondering this myself. Thijs On 23-10-2009 5:50, R. Tan wrote: Hi, Is it possible to collapse the results from multiple fields? Rih
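The concatenation workaround can be sketched at indexing time on the client side; the field names and separator below are hypothetical:

```python
# Hypothetical document about to be sent to Solr: combine the two
# fields you would like to collapse on into one auxiliary field,
# then collapse on that field (collapse.field=brand_color).
doc = {"id": "1", "brand": "acme", "color": "red"}
doc["brand_color"] = f"{doc['brand']}_{doc['color']}"
print(doc["brand_color"])  # acme_red
```

Pick a separator that cannot occur in either field value, so distinct field pairs can never produce the same concatenated key.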
Re: JVM OOM when using field collapse component
No, I have not encountered OOM exceptions yet with the current field collapse patch. How large is your configured JVM heap space (-Xmx)? Field collapsing requires more memory than regular searches do. Does Solr run out of memory during the first search(es), or does it run out of memory after a while, once it has performed quite a few field collapse searches? I see that you are also using the collapse.includeCollapsedDocs.fl parameter for your search. This feature will require more memory than a normal field collapse search. I normally give the Solr instance a heap space of 1024M when having an index of a few million documents. Martijn 2009/10/2 Joe Calderon calderon@gmail.com: i've gotten two different out of memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly; has anyone else encountered similar problems? my collection is 5 million results but i've gotten the error collapsing as little as a few thousand SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:173) at org.apache.lucene.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:749) at org.apache.lucene.util.OpenBitSet.ensureCapacity(OpenBitSet.java:757) at org.apache.lucene.util.OpenBitSet.expandingWordNum(OpenBitSet.java:292) at org.apache.lucene.util.OpenBitSet.set(OpenBitSet.java:233) at org.apache.solr.search.AbstractDocumentCollapser.addCollapsedDoc(AbstractDocumentCollapser.java:402) at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:115) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208) at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520) SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.util.DocSetScoreCollector.init(DocSetScoreCollector.java:44) at org.apache.solr.search.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:68) at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:205) at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98) at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at
Re: field collapsing sums
Well, that is odd. How have you configured field collapsing with the dismax request handler? The collapse count should be X - 1 (if collapse.threshold=1). Martijn 2009/10/1 Joe Calderon calderon@gmail.com: thx for the reply, i just want the number of dupes in the query result, but it seems i don't get the correct totals. for example, a non-collapsed dismax query for belgian beer returns X results, but when i collapse and sum the number of docs under collapse_counts, it's much less than X. it does seem to work when the collapsed results fit on one page (10 rows in my case) --joe 2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)? -- Met vriendelijke groet, Martijn van Groningen
Re: field collapsing sums
Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result, per distinct field value. If you specify collapse.includeCollapseDocs.fl=num_in_stock in the request and, let's say, you collapse on brand, then in the response you will receive the following xml:

<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
    ...
  </result>
  <result name="brand2" numFound="9" start="0">
    ...
  </result>
</lst>

On the client side you can do whatever you want with this data, for example sum it together. Although the patch does not sum for you, I think it will allow you to implement your requirement without too much hassle. Cheers, Martijn 2009/10/1 Matt Weber m...@mattweber.org: You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch. say i have an integer field called num_in_stock and i collapse by some other column; is it possible to sum up that integer field and return the total in the output? if not, how would i go about extending the collapsing component to support that? thx much --joe
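The client-side summing Martijn suggests might look like this sketch, which parses a collapsedDocs fragment of the shape described above (the payload here is a made-up sample, not real Solr output):

```python
import xml.etree.ElementTree as ET

# Hypothetical collapsedDocs fragment in the shape described above.
xml = """
<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
  </result>
</lst>
"""
root = ET.fromstring(xml)
totals = {}
for result in root.findall("result"):
    stocks = result.findall("doc/str[@name='num_in_stock']")
    totals[result.get("name")] = sum(int(s.text) for s in stocks)
print(totals)  # {'brand1': 5}
```

As noted in the thread, this sum only covers the collapsed documents; depending on collapse.threshold you may also want to add the num_in_stock of the displayed document(s) per group.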
Re: field collapsing sums
1) That is correct. Including collapsed document fields can make your search significantly slower (depending on how many documents are returned). 2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)? Martijn 2009/10/1 Joe Calderon calderon@gmail.com: hello martijn, thx for the tip. i tried that approach but ran into two snags: 1. returning the fields makes collapsing a lot slower for results, but that might just be the nature of iterating large results. 2. it seems like only dupes of records on the first page are returned, or is there a setting i'm missing? currently i'm only sending collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock --joe On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result, per distinct field value. If you specify collapse.includeCollapseDocs.fl=num_in_stock in the request and, let's say, you collapse on brand, then in the response you will receive the following xml:

<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc><str name="num_in_stock">2</str></doc>
    <doc><str name="num_in_stock">3</str></doc>
    ...
  </result>
  <result name="brand2" numFound="9" start="0">
    ...
  </result>
</lst>

On the client side you can do whatever you want with this data, for example sum it together. Although the patch does not sum for you, I think it will allow you to implement your requirement without too much hassle.
Cheers, Martijn 2009/10/1 Matt Weber m...@mattweber.org: You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch. say i have an integer field called num_in_stock and i collapse by some other column; is it possible to sum up that integer field and return the total in the output? if not, how would i go about extending the collapsing component to support that? thx much --joe -- Met vriendelijke groet, Martijn van Groningen
Re: Mapping SolrDoc to SolrInputDoc
Hi Licinio, You can use ClientUtils.toSolrInputDocument(...), which converts a SolrDocument to a SolrInputDocument. Martijn 2009/9/16 Licinio Fernández Maurelo licinio.fernan...@gmail.com: Hi there, currently I'm working on a small app which creates an embedded Solr server, reads all documents from one core and puts these docs into another one. The purpose of this app is to apply (small) schema.xml changes to indexed data (offline), resulting in a new index with documents updated to the schema.xml changes. What I want to know is whether there is an easy way to map a SolrDocument to a SolrInputDocument. Any help would be much appreciated -- Lici -- Met vriendelijke groet, Martijn van Groningen
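Conceptually, converting a retrieved document into an input document just copies the stored field values into a fresh document; in SolrJ, ClientUtils.toSolrInputDocument does this for you. A language-neutral sketch of the idea (the field names, and the choice to drop the computed score field, are illustrative assumptions, not what ClientUtils itself does):

```python
# A retrieved document may carry computed fields (score here) that you
# would not want to write back; copy the stored fields into a new
# input document and re-index that against the new schema.
retrieved = {"id": "42", "title": "hello", "score": 1.23}
input_doc = {k: v for k, v in retrieved.items() if k != "score"}
print(input_doc)  # {'id': '42', 'title': 'hello'}
```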
Re: Monitoring split time for fq queries when filter cache is used
Hi Rahul, Yes, your understanding is correct, but it is not possible to monitor these actions separately with Solr. Martijn 2009/9/1 Rahul R rahul.s...@gmail.com: Hello, I am trying to measure the benefit that I am getting out of using the filter cache. As I understand it, there are two major parts to an fq query. Please correct me if I am wrong: - doing full index queries for each of the fq params (if the filter cache is used, this result will be retrieved from the cache) - set intersection of the above results (will be done again even with the filter cache enabled) Is there any flag/setting that I can enable to monitor how much time the above operations take separately, i.e. the querying and the set intersection? Regards Rahul -- Met vriendelijke groet, Martijn van Groningen
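Rahul's two-phase picture can be sketched with Python sets standing in for Lucene DocSets; the cache dict plays the role of Solr's filterCache, and the lookup table is a stand-in for a real index search (all names here are hypothetical):

```python
# Toy filter cache: fq string -> set of matching doc ids.
filter_cache = {}

def docset(fq, search):
    """Phase 1: full-index lookup for one fq, served from cache on a hit."""
    if fq not in filter_cache:
        filter_cache[fq] = search(fq)  # the expensive part the cache avoids
    return filter_cache[fq]

# Stand-in for index lookups of two fq parameters.
lookups = {"color:red": {1, 2, 3, 5}, "size:L": {2, 3, 8}}

# Phase 2: the set intersection runs on every request, cached or not.
result = docset("color:red", lookups.get) & docset("size:L", lookups.get)
print(sorted(result))  # [2, 3]
```

This makes the point of the thread concrete: caching only removes the phase-1 lookups on repeat queries, while the phase-2 intersection cost remains, and Solr does not report the two phases separately.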
Re: Problems importing HTML content contained within XML document
Hi Venn, I think what is happening when the BODY element is processed by the xpath expression (/document/category/BODY) is that it does not retrieve the text content from the P elements inside the BODY element. The expression will only retrieve text content that is directly a child of the BODY element. I do not know which xpath function(s) the DataImportHandler currently supports to return the text content of a node and all its child nodes. Maybe the expression /document/category/BODY/* will work. Cheers, Martijn 2009/8/19 venn hardy venn.ha...@hotmail.com: Hello, I have just started trying out Solr to index some XML documents that I receive. I am using Solr 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML contained within one of the XML tags (BODY). The data import just seems to ignore textContent silently, but it imports everything else. When I do a query through the Solr admin interface, only the id and author fields are displayed. Any ideas what I am doing wrong?
Thanks. This is what my dataConfig looks like:

<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category"
            transformer="DateFormatTransformer"
            stream="true" dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <field column="textContent" xpath="/document/category/BODY" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>

This is how I have specified my schema fields:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="textContent" type="text" indexed="true" stored="true" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>

And this is what my XML document looks like:

<document>
  <category>
    <reference>123456</reference>
    <author>Authori name</author>
    <BODY>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
      <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut vestibulum</P>
    </BODY>
  </category>
</document>
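The behaviour described in the reply (only text directly under BODY is retrieved) can be checked outside Solr with a quick sketch. This uses Python's ElementTree rather than the DataImportHandler itself, so it only illustrates the XPath text semantics, not DIH:

```python
import xml.etree.ElementTree as ET

doc = ("<document><category><BODY>"
       "<P>first para</P><P>second para</P>"
       "</BODY></category></document>")
body = ET.fromstring(doc).find("./category/BODY")

# Text directly under BODY: nothing, because it all sits inside P elements.
print(body.text)                 # None
# Gathering descendant text recovers the paragraphs.
print("".join(body.itertext()))  # first parasecond para
```

This is exactly why selecting /document/category/BODY yields an empty textContent while a descendant-aware extraction would not.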