RE: Exception using distributed field-collapsing

2012-06-21 Thread Young, Cody
Does it work in the non distributed case?

Is the field you're grouping on stored? What is the type on the uniqueKey 
field? Is it stored and indexed?

I've had a problem with distributed not working when the uniqueKey field was 
indexed but not stored.

Also, in distributed searches, the uniqueKey is used to retrieve documents from 
shards, so if it were, say, a date, that may be causing the issue.

Cody

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] 
Sent: Wednesday, June 20, 2012 1:54 PM
To: solr-user@lucene.apache.org
Subject: RE: Exception using distributed field-collapsing

 Hi Bryan,

 What is the fieldtype of the groupField? You can only group by field 
 that is of type string as is described in the wiki:
 http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

 When you group by another field type, an HTTP 400 should be returned 
 instead of this error. At least that's what I'd expect.

 Martijn

Martijn,

The group-by field is a string. I have been unable to figure out how a date comes 
into the picture at all, and have basically been wondering if there is some 
problem in the grouping code that misaligns the field values from different 
results in the group, so that it is not comparing like with like. Not a strong 
theory, just the only thing I can think of.

-- Bryan


RE: Exception using distributed field-collapsing

2012-06-21 Thread Young, Cody
No, I believe it was a different exception, just brainstorming. (it was a null 
reference iirc)

Does a *:* query with no sorting work?

Cody

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com] 
Sent: Thursday, June 21, 2012 10:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Exception using distributed field-collapsing

Cody,

 Does it work in the non distributed case?

Yes.


 Is the field you're grouping on stored? What is the type on the uniqueKey
 field? Is it stored and indexed?

The field I'm grouping on is a string, stored and indexed. The unique key field 
is a string, stored and indexed.

 I've had a problem with distributed not working when the uniqueKey 
 field was indexed but not stored.

Was it the same exception I'm seeing?

-- Bryan




Indexing Polygons

2012-05-22 Thread Young, Cody
Hi All,

I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP 
right now, as the Solr integration of the new spatial module hasn't been 
completed. I have searching for points using a polygon working, but I also want 
to search for polygons using a point.

I've seen some indication that LSP supports this but I haven't been able to 
find an example.

What field type would I need to use? Would it be multivalued?

Please and thank you!
Cody




RE: Grouping ngroups count

2012-05-01 Thread Young, Cody
Hello,

When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed 
query?

If you're doing a distributed query, then for group.ngroups to work you need to 
ensure that all documents for a group exist on a single shard.
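
One way to meet that requirement is to choose the shard from the group value alone at index time. The following is only a minimal sketch under assumptions: `shard_for_group`, the use of MD5, and the sample documents are hypothetical and are not part of Solr.

```python
import hashlib

def shard_for_group(group_value: str, num_shards: int) -> int:
    """Pick a shard index from the group field value alone (hypothetical helper)."""
    digest = hashlib.md5(group_value.encode("utf-8")).digest()
    # A stable hash keeps every document of one group on the same shard.
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical documents; only the group field decides the placement.
docs = [
    {"id": "1", "group_field": "A"},
    {"id": "2", "group_field": "B"},
    {"id": "3", "group_field": "A"},
]

placement = {}
for doc in docs:
    shard = shard_for_group(doc["group_field"], 2)
    placement.setdefault(shard, []).append(doc["id"])
```

With this kind of routing, documents "1" and "3" always end up on the same shard because they share a group value.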

However, what you're describing sounds an awful lot like this JIRA issue that I 
entered a while ago for distributed grouping. I found that the hit count was 
coming only from the shards that ended up having results in the documents that 
were returned. I didn't test group.ngroups at the time.

https://issues.apache.org/jira/browse/SOLR-3316

If this is a similar issue then you should make a new Jira issue.

Cody

-Original Message-
From: Francois Perron [mailto:francois.per...@wantedanalytics.com] 
Sent: Tuesday, May 01, 2012 6:47 AM
To: solr-user@lucene.apache.org
Subject: Grouping ngroups count

Hello all,

  I tried to use grouping with 2 slices with an index of 35K documents.  When I 
ask for the top 10 rows, grouped by field A, it gives me about 16K groups.  But if I 
ask for the top 20K rows, the ngroups property is now at 30K.  

Do you know why and of course how to fix it ?

Thanks.


RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-10 Thread Young, Cody
You may be able to fake your price requirements by rounding at index time.

For instance, if you wanted 10-19$, 20-29$, 30+ then you create a second price 
field specifically for faceting, round down to 10, 20, 30 at index time and 
then facet on that field.
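
A rough sketch of that rounding step, run by whatever code builds the documents before they are sent to Solr. The function name, the default step and cap, and the idea of writing the result to a separate facet-only field are assumptions for illustration.

```python
def price_facet_bucket(price: float, step: int = 10, top: int = 30) -> int:
    """Round a price down to the start of its bucket, capping at `top`."""
    bucket = int(price // step) * step
    # Everything at or above `top` falls into the open-ended last bucket.
    return min(bucket, top)

# Hypothetical prices mapped to the value stored in the facet-only field.
buckets = {price: price_facet_bucket(price) for price in (12.50, 19.99, 20.00, 45.00)}
```

Faceting on the rounded field then yields one count per bucket start (10, 20, 30) rather than one per distinct price.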

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Monday, April 09, 2012 7:26 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

Is this planned as a future feature? Is it in the bug tracker as a feature 
yet..just wondering how long until it is a feature.  I could live without price 
counts for a bit.

Sent from my phone

- Reply message -
From: Martijn v Groningen-2 [via Lucene] 
ml-node+s472066n3897768...@n3.nabble.com
Date: Mon, Apr 9, 2012 3:31 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com



The group.facet option only works for field facets (facet.field). Other facet 
types (query, range and pivot) aren't supported yet.
The group.facet works for both single and multivalued fields specified in the 
facet.field parameter.

Martijn

On 9 April 2012 20:58, danjfoley d...@micamedia.com wrote:

 I am using group.facet and it works fine for regular facet.field but 
 not for facet.query

 Sent from my phone

 - Reply message -
 From: Young, Cody [via Lucene] 
 ml-node+s472066n3897487...@n3.nabble.com
 
 Date: Mon, Apr 9, 2012 1:38 pm
 Subject: To truncate or not to truncate (group.truncate vs. facet)
 To: danjfoley d...@micamedia.com



 One other thing, I believe that you need to be using facet.field on 
 single valued string fields for group.facet to function properly. Are 
 the fields you're faceting on multiValued=false?

 Cody

 -Original Message-
 From: Young, Cody [mailto:cody.yo...@move.com]
 Sent: Monday, April 09, 2012 10:36 AM
 To: solr-user@lucene.apache.org
 Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

 You tried adding the parameter

 group.facet=true ?

 Cody

 -Original Message-
 From: danjfoley [mailto:d...@micamedia.com]
 Sent: Monday, April 09, 2012 10:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

 I did get this working with version 4. However my facet queries still 
 don't group.

 Sent from my phone

 - Reply message -
 From: Young, Cody [via Lucene] 
 ml-node+s472066n3897366...@n3.nabble.com
 
 Date: Mon, Apr 9, 2012 12:45 pm
 Subject: To truncate or not to truncate (group.truncate vs. facet)
 To: danjfoley d...@micamedia.com



 I believe you're looking for what's called, Matrix Counts

 Please see this JIRA issue. To my knowledge it has been committed in 
 trunk but not 3.x.

 https://issues.apache.org/jira/browse/SOLR-2898

 This feature is accessed by using group.facet=true

 Cody

 -Original Message-
 From: danjfoley [mailto:d...@micamedia.com]
 Sent: Saturday, April 07, 2012 7:02 PM
 To: solr-user@lucene.apache.org
 Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

 I've been searching for a solution to my issue, and this seems to come 
 closest to it. But not exactly.

 I am indexing clothing. Each article of clothing comes in many sizes 
 and colors, and can belong to any number of categories.

 For example take the following: I add 6 documents to solr as follows:

 product, color, size, category

 shirt A, red, small, valentines day
 shirt A, red, large, valentines day
 shirt A, blue, small, valentines day
 shirt A, blue, large, valentines day
 shirt A, green, small, valentines day
 shirt A, green, large, valentines day

 I'd like my facet counts to return as follows:

 color

 red (1)
 blue (1)
 green (1)

 size

 small (1)
 large (1)

 category

 valentines day (1)

 But they come back like this:

 color:
 red (2)
 blue (2)
 green (2)

 size:
 small (2)
 large (2)

 category
 valentines day (6)

 I see the group.facet parameter in version 4.0 does exactly this. 
 However how can I make this happen now? There are all sorts of 
 ecommerce systems out there that facet exactly how i'm asking. i 
 thought solr is supposed to be the very best fastest search system, 
 yet it doesn't seem to be able to facet correct for items with multiple 
 values?

 Am i indexing my data wrong?

 how can i make this happen?


RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody
I believe you're looking for what's called, Matrix Counts

Please see this JIRA issue. To my knowledge it has been committed in trunk but 
not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true
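
To make the difference concrete, here is a small sketch that computes both kinds of counts in plain Python from the six example documents quoted below. It only illustrates the counting idea (documents per value versus distinct groups per value); it is not how Solr implements group.facet, and the function names are made up.

```python
from collections import Counter, defaultdict

# The six example documents: one product in three colors and two sizes.
docs = [
    {"product": "shirt A", "color": c, "size": s, "category": "valentines day"}
    for c in ("red", "blue", "green")
    for s in ("small", "large")
]

def plain_facet(docs, field):
    """Count documents per facet value (ordinary faceting)."""
    return Counter(d[field] for d in docs)

def grouped_facet(docs, field, group_field):
    """Count distinct groups per facet value (the group.facet idea)."""
    groups_per_value = defaultdict(set)
    for d in docs:
        groups_per_value[d[field]].add(d[group_field])
    return {value: len(groups) for value, groups in groups_per_value.items()}

plain_color = plain_facet(docs, "color")
grouped_color = grouped_facet(docs, "color", "product")
grouped_category = grouped_facet(docs, "category", "product")
```

Here `plain_color["red"]` counts two documents, while `grouped_color["red"]` counts the single product that comes in red.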

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest 
to it. But not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and 
colors, and can belong to any number of categories.

For example take the following: I add 6 documents to solr as follows:

product, color, size, category

shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color

red (1)
blue (1)
green (1)

size

small (1)
large (1)

category

valentines day (1)

But they come back like this:

color:
red (2)
blue (2)
green (2)

size:
small (2)
large (2)

category
valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However how 
can I make this happen now? There are all sorts of ecommerce systems out there 
that facet exactly how i'm asking. i thought solr is supposed to be the very 
best fastest search system, yet it doesn't seem to be able to facet correct for 
items with multiple values?

Am i indexing my data wrong? 

how can i make this happen?



RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody
You tried adding the parameter

group.facet=true ?

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However my facet queries still don't 
group.

Sent from my phone



RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody
One other thing, I believe that you need to be using facet.field on single 
valued string fields for group.facet to function properly. Are the fields 
you're faceting on multiValued=false?

Cody

-Original Message-
From: Young, Cody [mailto:cody.yo...@move.com] 
Sent: Monday, April 09, 2012 10:36 AM
To: solr-user@lucene.apache.org
Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

You tried adding the parameter

group.facet=true ?

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However my facet queries still don't 
group.

Sent from my phone



RE: Distributed grouping issue

2012-04-04 Thread Young, Cody
Hi Martijn,

I created a JIRA issue and attached a test that fails. It seems to exhibit the 
same issue that I see on my local box. (If you run it multiple times you can 
see that the group value of the top doc changes between runs.)

Also, I had to add fixShardCount = true; in the constructor of the 
TestDistributedGrouping class, which caused another test case to fail. (It's 
commented out in the patch with a TODO above it.)

Please let me know if you need any other information.

https://issues.apache.org/jira/browse/SOLR-3316

Thanks!!
Cody

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

I tried to reproduce this. However, matches always returns 4 in my case 
(when using rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group, right?

I did find something else. If I use rows=0 then an error occurs. I think we 
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further look 
into this in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

 Okay, I've played with this a bit more. Found something interesting:

 When the groups returned do not include results from a core, then the 
 core is excluded from the count. (I have 1 group, 2 documents per 
 core)

 Example:


 http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

 <lst name="grouped">
 <lst name="group_field">
 <int name="matches">2</int>

 Then, just by changing rows=2

 http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

 <lst name="grouped">
 <lst name="group_field">
 <int name="matches">4</int>

 Let me know if you have any luck reproducing.

 Thanks,
 Cody

 -Original Message-
 From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On 
 Behalf Of Martijn v Groningen
 Sent: Monday, April 02, 2012 1:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Distributed grouping issue

 
  All documents of a group exist on a single shard, there are no 
  cross-shard groups.
 
 You only have to partition documents by group when the groupCount and 
 some other features need to be accurate. For the matches this is not 
 necessary. The matches are summed up during merging the shard responses.

 I can't reproduce the error you are describing on a small local setup 
 I have here. I have two Solr cores with a simple schema. Each core has 
 3 documents. When grouping the matches element returns 6. I'm running 
 on a trunk that I have updated 30 minutes ago. Can you try to isolate 
 the problem by testing with a small subset of your data?

 Martijn




--
Met vriendelijke groet,

Martijn van Groningen


RE: shards and grouping

2012-04-03 Thread Young, Cody
I received this error if the solr unique key was set to stored=false. Once I 
changed it to stored=true I stopped receiving this error. 
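
A quick way to check a schema for this condition, sketched in Python. The embedded schema fragment and the function name are illustrative assumptions; a missing `stored` attribute is treated as "not stored" here, which is a simplification, since Solr would fall back to the field type's default.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down schema.xml content for illustration only.
SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="document_id" type="string" indexed="true" stored="true" required="true"/>
    <field name="group_field" type="string" indexed="true" stored="true" multiValued="false"/>
  </fields>
  <uniqueKey>document_id</uniqueKey>
</schema>
"""

def unique_key_is_stored(schema_xml: str) -> bool:
    """Return True if the field named by <uniqueKey> has stored="true"."""
    root = ET.fromstring(schema_xml)
    key_name = (root.findtext("uniqueKey") or "").strip()
    for field in root.iter("field"):
        if field.get("name") == key_name:
            # Simplification: a missing attribute counts as not stored.
            return field.get("stored", "false") == "true"
    return False

schema_ok = unique_key_is_stored(SCHEMA)
```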

Cody

-Original Message-
From: ramires [mailto:uy...@beriltech.com] 
Sent: Tuesday, April 03, 2012 7:15 AM
To: solr-user@lucene.apache.org
Subject: shards and grouping

hi 

I use solr4 with four shards like that 

http://x.x.x.x:8984/solr/arama4/select?shards=192.168.200.202:8984/solr/ar1/,192.168.200.202:8984/solr/ar2,192.168.200.202:8984/solr/ar3,192.168.200.202:8984/solr/ar4&q=table&facet=true&facet.field=site&f.site.facet.numFacetTerms=1&facet.mincount=1&facet.limit=-1&group=true&group.field=site

It works well for all queries, but group queries throw an error 500. Does 
grouping work in Solr 4?

HTTP ERROR 500

Problem accessing /solr/arama4/select. Reason:

null

java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:639)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:536)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:331)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)




--
View this message in context: 
http://lucene.472066.n3.nabble.com/shards-and-grouping-tp3881069p3881069.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
In the case of group=false:

numFound=26

In the case of group=true:

<int name="matches">34000</int>

As a note, the grouped number changes when I hit refresh. It seems to display 
the count from any single shard. (The top match also changes).

I haven't tried this in other versions of solr.

All documents of a group exist on a single shard, there are no cross-shard 
groups.

Thanks,
Cody 

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

The matches element in the response should return the number of documents 
that matched the query and not the number of groups.
Did you encounter this issue with other Solr versions as well (3.5 or another 
nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

 Hi,

 when you write "I get xxx results", does it come from 'numFound'? Or 
 do you really display xxx results?
 When using both field collapsing and sharding, the 'numFound' may be 
 wrong. In that case, think about using 'shards.rows' parameter with a 
 high value (be careful, it's bad for performance).

 If the problem is really about the returned results, it may be because 
 of several documents having the same unique key document_id in 
 different shards.

 Hope it helps,
 Franck



 On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
  I forgot to mention, I can see the distributed requests happening in the logs:
 
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime

RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is 
excluded from the count. (I have 1 group, 2 documents per core)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
<lst name="group_field">
<int name="matches">2</int>

Then, just by changing rows=2

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
<lst name="group_field">
<int name="matches">4</int>
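
For anyone trying to reproduce this, the two requests differ only in the rows parameter. A small sketch for building them, assuming the host, core names and field name from the example above; the helper name `grouped_query_url` is made up.

```python
from urllib.parse import urlencode

def grouped_query_url(base, shards, group_field, rows, group_limit=10):
    """Build a distributed grouping request URL (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("shards", ",".join(shards)),
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", group_limit),
        ("rows", rows),
    ]
    # Keep *, :, comma and / unescaped so the URL reads like the hand-written one.
    return base + "?" + urlencode(params, safe="*:,/")

SHARDS = ["localhost:8983/solr/core0", "localhost:8983/solr/core1"]
BASE = "http://localhost:8983/solr/core0/select/"

url_rows_1 = grouped_query_url(BASE, SHARDS, "group_field", rows=1)
url_rows_2 = grouped_query_url(BASE, SHARDS, "group_field", rows=2)
```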

Let me know if you have any luck reproducing.

Thanks,
Cody 

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue


 All documents of a group exist on a single shard, there are no 
 cross-shard groups.

You only have to partition documents by group when the groupCount and some 
other features need to be accurate. For the matches this is not necessary. 
The matches are summed up during merging the shard responses.
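
The expected merge behaviour described here can be sketched as a plain sum over the per-shard values. The dictionary shape below is a simplified stand-in for a grouped response and the function name is hypothetical; this is not the actual merging code in Solr.

```python
def merged_matches(shard_responses, group_field):
    """Sum the per-shard `matches` values for one grouped field."""
    return sum(resp["grouped"][group_field]["matches"] for resp in shard_responses)

# Two shards with 3 matching documents each, as in the local setup described below.
shard_responses = [
    {"grouped": {"group_field": {"matches": 3}}},
    {"grouped": {"group_field": {"matches": 3}}},
]

total_matches = merged_matches(shard_responses, "group_field")
```

Under this model every shard contributes to the total, regardless of whether any of its groups make it into the returned rows.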

I can't reproduce the error you are describing on a small local setup I have 
here. I have two Solr cores with a simple schema. Each core has 3 documents. 
When grouping the matches element returns 6. I'm running on a trunk that I have 
updated 30 minutes ago. Can you try to isolate the problem by testing with a 
small subset of your data?

Martijn


Distributed grouping issue

2012-03-30 Thread Young, Cody
Hi All,

I'm having an issue getting distributed grouping working on trunk (Mar 29, 
2012).

If I send this query:

http://localhost:8086/solr/core0/select/?q=*:*&group=false&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I get 260,000 results. As soon as I change to using grouping:

http://localhost:8086/solr/core0/select/?q=*:*&group=true&group.field=group_field&shards=localhost:8086/solr/core0,localhost:8086/solr/core1,localhost:8086/solr/core2,localhost:8086/solr/core3,localhost:8086/solr/core4,localhost:8086/solr/core5,localhost:8086/solr/core6,localhost:8086/solr/core7

I only get 32,000 results. (the number of documents in a single core.)
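
For anyone trying to reproduce this, the request can be assembled programmatically so that the shards list and the separators between parameters are not mistyped. This is a rough sketch; the helper name grouped_shard_query is made up for illustration, and the host, core names and field name are taken from the queries above.

```python
from urllib.parse import urlencode


def grouped_shard_query(host, cores, group_field):
    """Build a distributed grouping request against the first core in the list."""
    # The shards parameter is a comma-separated list of host/solr/core entries.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = [
        ("q", "*:*"),
        ("group", "true"),
        ("group.field", group_field),
        ("shards", shards),
    ]
    # Keep '*', ':', ',' and '/' unescaped so the URL stays readable.
    return f"http://{host}/solr/{cores[0]}/select/?" + urlencode(params, safe="*:,/")


url = grouped_shard_query("localhost:8086", [f"core{i}" for i in range(8)], "group_field")
print(url)
```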

The field that I am grouping on is defined as:

<field name="group_field" type="string" indexed="true" stored="true" multiValued="false" />

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The document id:

<field name="document_id" type="string" indexed="true" stored="true" required="true" />

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<uniqueKey>document_id</uniqueKey>

Anyone else experiencing this? Any ideas?

Thanks,
Cody


RE: Distributed grouping issue

2012-03-30 Thread Young, Cody
-
From: Young, Cody [mailto:cody.yo...@move.com] 
Sent: Friday, March 30, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Distributed grouping issue



RE: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-14 Thread Young, Cody
We have a very similar system. In our case we have a row version field in our 
SQL database. When we run the full import, we record the latest row version at 
the time the full import started. Once the full import is done, we run an 
optimize and then a delta import (actually a full import defined in a different 
DIH file: same mappings, just with different parameters to the stored procedure 
that we call). This catches the offline core up to live, and then we swap.
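
As a rough sketch of that sequence, the four requests could be built like this. The base URL, the core names live and rebuild, and the second handler path /dataimport-delta are assumptions for illustration only; the second handler stands in for the separate DIH file mentioned above, and passing the recorded row version to it is left out.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def rebuild_and_swap_urls(live="live", rebuild="rebuild", delta_handler="/dataimport-delta"):
    """Return, in order, the requests for: full import, optimize, catch-up import, swap.

    delta_handler is a hypothetical second DIH handler registered in solrconfig.xml.
    Each import has to finish (poll the handler with command=status) before the
    next request is sent; this sketch only builds the URLs.
    """
    return [
        # 1. Wipe and rebuild the offline core.
        f"{SOLR}/{rebuild}/dataimport?" + urlencode({"command": "full-import", "clean": "true"}),
        # 2. Optimize the freshly built index.
        f"{SOLR}/{rebuild}/update?" + urlencode({"optimize": "true"}),
        # 3. Catch-up import; clean=false so the rebuilt index is not wiped again.
        f"{SOLR}/{rebuild}{delta_handler}?" + urlencode({"command": "full-import", "clean": "false"}),
        # 4. Swap the caught-up core into the live position.
        f"{SOLR}/admin/cores?" + urlencode({"action": "SWAP", "core": live, "other": rebuild}),
    ]


urls = rebuild_and_swap_urls()
for u in urls:
    print(u)
```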

Why can't you run the delta import on the rebuild core? Is there some specific 
issue that is preventing you?

Cody
-Original Message-
From: KeesSchepers [mailto:k...@keesschepers.nl] 
Sent: Wednesday, March 14, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Solr core swap after rebuild in HA-setup / High-traffic

Hello everybody,

I am designing a new Solr architecture for one of my clients. This Solr 
architecture is for a high-traffic website with millions of visitors, but I am 
facing some design problems where I hope you guys could help me out.

In my situation there are 4 Solr servers running, 1 server is master and 3 are 
slave. They are running Solr version 1.4.

I use two cores 'live' and 'rebuild' and I use Solr DIH to rebuild a core which 
goes like this:

1. I wipe the reindex core.
2. I run the DIH over the complete dataset (4 million documents) in pieces of
   20,000 records (to prevent very long MySQL locks).
3. After the DIH is finished (2 hours), we also have to update the rebuild core
   with the changes from the last two hours; this is the problem.
4. After updating is done and the core is no more than a few seconds behind, we
   want to SWAP the cores.

Everything goes well except for step 3. The rebuild and the core swap are both 
okay. 

Because the website is undergoing changes every minute, we cannot pause the 
delta-import on the live core and fall behind for 2 hours. The problem is that I 
can't figure out a catch-up scheme that doesn't delay the live core too long and 
still uses the DIH instead of requiring a lot of custom code.

Did anyone face this problem before or could give me some tips?

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Problem with result grouping

2011-12-13 Thread Young, Cody
There's another discussion going on about this on the solr-user mailing
list, but I think you're using *.* instead of *:* to match all
documents. *.* ends up doing a search against the default field, whereas
*:* means match all documents.
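
A small sketch of the two requests side by side, in case it helps to compare their results. The helper name select_url is made up for illustration; the base URL and field name are the ones from the question below.

```python
from urllib.parse import urlencode


def select_url(base, query):
    """Build a grouped select request for the given q value."""
    return base + "/select/?" + urlencode(
        {"q": query, "group": "true", "group.field": "categoryId"}
    )


base = "http://localhost:8080/solr/catalogue"
match_all = select_url(base, "*:*")  # match-all-documents query
literal = select_url(base, "*.*")    # searched against the default field instead
print(match_all)
print(literal)
```

Note that urlencode percent-encodes the asterisks and the colon, so the two URLs differ only in %3A versus a plain dot.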

Cody

-Original Message-
From: Kissue Kissue [mailto:kissue...@gmail.com] 
Sent: Tuesday, December 13, 2011 6:06 AM
To: solr-user@lucene.apache.org
Subject: Problem with result grouping

Hi,

Maybe there is something I am missing here, but I have a field in my Solr
index called categoryId. The field definition is as follows:

<field name="categoryId" type="string" indexed="true" stored="true" required="true" />

I am trying to group on this field and I get a result as follows:
<str name="groupValue">43201810</str>
<result name="doclist" numFound="72" start="0">

This is the query I am sending to Solr:
http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId

My understanding is that this means there are 72 documents in my index
that have the value 43201810 for categoryId. Now, surprisingly, when I
search my index specifically for categoryId:43201810, expecting to get 72
results, I instead get 124 results. This is the query sent:
http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on

Is my understanding of result grouping correct? Is there something I am
doing wrong? Any help will be much appreciated. I am using Solr 3.5.

Thanks.


RE: avoid overwrite in DataImportHandler

2011-12-08 Thread Young, Cody
I believe all you need to do is add a ?clean=false to your query string.

If you have a unique key set up as your ID in Solr, then it should update
the existing documents instead of deleting and re-indexing.
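
For example, the request could be built like this. It is a sketch only: the core URL and the handler path /dataimport are assumptions, so substitute whatever your DIH request handler is registered as.

```python
from urllib.parse import urlencode


def full_import_url(core_url, clean=False):
    """Build a DIH full-import request; clean=False keeps the existing index."""
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return f"{core_url}/dataimport?" + urlencode(params)


# Assumed core URL, for illustration only.
print(full_import_url("http://localhost:8983/solr/core0"))
```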

Cody

-Original Message-
From: P Williams [mailto:williams.tricia.l...@gmail.com] 
Sent: Thursday, December 08, 2011 11:11 AM
To: solr-user@lucene.apache.org
Subject: Re: avoid overwrite in DataImportHandler

Ah.  Thanks Erick.

I see now that my question is different from sabman's.

Is there a way to use the DataImportHandler's full-import command so
that it does not delete the existing material before it begins?

Thanks,
Tricia

On Thu, Dec 8, 2011 at 6:35 AM, Erick Erickson
<erickerick...@gmail.com> wrote:

 This is all controlled by Solr via the uniqueKey field in your schema.
 Just remove that entry.

 But then it's all up to you to handle the fact that there will be 
 multiple documents with the same ID all returned as a result of 
 querying. And it won't matter what program adds data, *nothing* will 
 be overwritten, DIH has no part in that decision.

 Deduplication is about defining some fields in your record and 
 avoiding adding another document if the contents are close, where 
 close is a slippery concept. I don't think it's related to your
 problem at all.

 Best
 Erick

 On Wed, Dec 7, 2011 at 3:27 PM, P Williams 
 <williams.tricia.l...@gmail.com> wrote:
  Hi,
 
  I've wondered the same thing myself.  I feel like the clean parameter has
  something to do with it but it doesn't work as I'd expect either.  
  Thanks in advance to anyone who can answer this question.
 
  *clean* : (default 'true'). Tells whether to clean up the index before the
  indexing is started.
 
  Tricia
 
  On Wed, Dec 7, 2011 at 12:49 PM, sabman <sab...@gmail.com> wrote:
 
  I have a unique ID defined for the documents I am indexing. I want to avoid
  overwriting the documents that have already been indexed. I am 
  using XPathEntityProcessor and TikaEntityProcessor to process the documents.
 
  The DataImportHandler does not seem to have the option to set 
  overwrite=false. I have read on some other forums that I should use
  deduplication instead, but I don't see how it is related to my problem.
 
  Any help on this (or an explanation of how deduplication would apply 
  to my problem) would be great. Thanks!
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



RE: debugging failed documents

2011-12-06 Thread Young, Cody
In my experience with DIH, the errors for failed documents end up in the
log files (catalina.out for Tomcat).

Can you check your log files?

Cody

-Original Message-
From: Alan Miller [mailto:alan.mill...@gmail.com] 
Sent: Tuesday, December 06, 2011 1:25 PM
To: Solr
Subject: debugging failed documents

Just getting started with DIH and I have a very simple setup.

My dih-config.xml is querying my postgres db and does a select on a
crosstab() table that returns just 100 rows.

When I do a full-import I see that 22 docs fail, but what debug settings
do I have to tweak to see why the docs failed?

When I run the same query manually I see all 100 rows. The 22 rows
missing in my index look good in the manual output. 

Alan


Sent from my iPhone


RE: Index a null text field

2011-11-25 Thread Young, Cody
I don't see anything wrong so far other than a typo here (missing a p in
the second price):
<field column="lastbid_price" name="lastbid_rice" />

 Can you see if there are any warnings in the log about documents not
being able to be created?

Also, you should have a field type definition for text in your schema.
It will look something like 

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">

Can you send the full field type definition along as well?

You can also try running a query like: 
?q=keyword_stock:[* TO *]
That will return any documents where keyword_stock is populated.
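
If you send that from a script, the brackets and spaces in the range query need URL encoding. A small sketch follows; the core URL is an assumption, and rows=0 is my own addition so that only the numFound count comes back.

```python
from urllib.parse import urlencode


def populated_field_url(core_url, field):
    """Build a query that matches every document with any value in `field`."""
    return f"{core_url}/select?" + urlencode({"q": f"{field}:[* TO *]", "rows": 0})


# Assumed core URL, for illustration only.
url = populated_field_url("http://localhost:8983/solr", "keyword_stock")
print(url)
```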

Thanks,
Cody

-Original Message-
From: jawedshamshedi [mailto:jawedshamsh...@gmail.com] 
Sent: Thursday, November 24, 2011 9:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Index a null text field

Hi Cody,

Thanks for the reply.

Please find the detail of that I am doing. 

Yes, I am using dataimport handler and the code snippet of it from
solrconfig.xml is given below.

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>

The data-config.xml is given below.

<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost/database?zeroDateTimeBehavior=convertToNull"
user="username" password="password"/>
<document name="content">
 <entity name="auction" query="query(it's working fine)">
<field column="keyword" name="keyword_stock" />
<field column="start_bidprice" name="start_bidprice" />
<field column="end_date" name="end_date" />
<field column="start_date" name="start_date" />
<field column="lastbid_price" name="lastbid_rice" />

<field column="un_id" name="un_id" />

</entity>
</document>
</dataConfig>

schema.xml


 <fields>

<field name="un_id" type="string" indexed="true" stored="true" required="true" />
<field name="keyword_stock" type="text" indexed="true" stored="true" />
<field name="start_bidprice" type="tlong" indexed="true" stored="true" />
<field name="end_date" type="tdate" indexed="true" stored="true" />
<field name="start_date" type="tdate" indexed="true" stored="true" />
<field name="lastbid_price" type="tfloat" indexed="true" stored="true" />

 </fields>

 <uniqueKey>un_id</uniqueKey>

 <defaultSearchField>ST_Name</defaultSearchField>

The data types in MySQL are given below.

keyword         text
start_bidprice  float(12,2)
end_date        datetime
start_bidprice  float(12,2)
start_date      datetime


For some fields that are simple floats, the indexes are being created. I
also added zeroDateTimeBehavior=convertToNull to the url in
data-config.xml, but to no avail.

Please help Thanks in advance.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3535376.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Index a null text field

2011-11-24 Thread Young, Cody
Hello,

We'll need more information please. How are you indexing the documents?
DataImportHandler? Xml Updates?

Can you show us the relevant parts of your schema? (Field definition and
data type for the field)

Are you getting any error messages in the log files?

Tell us more about your environment. Windows? Linux?

Thanks,
Cody

-Original Message-
From: jawedshamshedi [mailto:jawedshamsh...@gmail.com] 
Sent: Thursday, November 24, 2011 5:38 AM
To: solr-user@lucene.apache.org
Subject: Index a null text field

Hi all,

I am indexing a table that has a field by the name of solr_keywords of
type text in mysql. And it contains null values also. While creating
index in solr, this field is not getting indexed.

Any help will be appreciated. 

Thanks


--
View this message in context:
http://lucene.472066.n3.nabble.com/Index-a-null-text-field-tp3533636p3533636.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Efficient title sorting on large result sets.

2011-11-21 Thread Young, Cody
Hi Andrew,

When you request a sort on a field, Lucene stores every unique value in
a field cache, which stays in RAM. If you have a large index and you're
sorting on a Unicode string field, this can be very memory intensive.
The way that I've solved this in the past is to make a field
specifically for sorting, truncate the string to a small number
of characters, and sort on that. You have to accept that in some cases
the sort order will be wrong: if you truncate to 6 characters and then sort
Thisisastring and Thisisnotastring, you're not guaranteed to get the
correct sort order.

The memory benefits of this are two-fold, though: you have a shorter
string, which takes up less memory, and you have a decreased number of
unique values.
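
Here is a small illustration of the trade-off in plain Python. It only models the idea; in Solr the truncation would happen at index time in a dedicated sort field, and the function name sort_key and the sample titles are made up for the example.

```python
def sort_key(title, length=6):
    """Truncated sort key: only the first `length` characters take part in sorting."""
    return title[:length]


titles = ["Thisisnotastring", "Thisisastring", "Another title"]

# Fewer distinct values need to be cached: three titles collapse to two keys.
keys = {sort_key(t) for t in titles}

# Sorting on the truncated key leaves the two "Thisis..." titles tied, so they
# stay in their input order, which differs from the true alphabetical order.
ordered = sorted(titles, key=sort_key)
exact = sorted(titles)
print(keys, ordered, exact)
```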

Cody

-Original Message-
From: Andrew Ingram [mailto:andrew.ing...@tangentlabs.co.uk] 
Sent: Monday, November 21, 2011 3:23 AM
To: solr-user@lucene.apache.org
Subject: Efficient title sorting on large result sets.

Hi everyone,

We have a large product catalogue (currently 9 million, but soon to
inflate to around 25 million) with each product having a Unicode title.
We're offering the facility to sort by title, but often within quite
large result sets, e.g. 1 million fiction books (we are correctly using
filters). Aside from the obvious questionable use of sorting over such a
large set of results, I'm wondering if there are any steps I can take to
optimise title sorting and minimise memory use.

Solr also crashes with OutOfMemoryErrors every couple of days. Could
this be related to the sorting by title, or should I be looking for
another cause? The machine Solr is on has 8 GB of RAM, 7 of which is given
to Solr. We have other sites with larger catalogues and similar-spec
hardware that aren't having any issues; the title sorting seems to be
the only major difference in functionality.

I'll be very grateful for any assistance.

Regards,
Andy Ingram