date:20120409

I'm working on a prototype of a scheme that uses SolrCloud to, in
effect, distribute a computation by running it inside of a request
processor.

If there are N shards and M operations, I want each node to perform
M/N operations. That, of course, implies that I know N.

Is that fact available anywhere inside Solr, or do I just need to configure it?
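Once N is known, whether it is read from Solr or simply configured (for instance, the same number passed as -DnumShards at startup), splitting M operations so that each shard performs roughly M/N of them is plain arithmetic. A minimal sketch, assuming shards are numbered 0..N-1; the class and method names are hypothetical and nothing here is a Solr API:

```java
/** Hypothetical helper: splits M operations across N shards as evenly as possible. */
public final class ShardSplit {
    private ShardSplit() {}

    /**
     * Returns the half-open range {start, end} of operation indices that the
     * shard with 0-based index shardIndex should perform. The first
     * (ops % shards) shards each take one extra operation, so no shard does
     * more than one operation more than any other.
     */
    public static int[] rangeFor(int ops, int shards, int shardIndex) {
        if (ops < 0 || shards <= 0 || shardIndex < 0 || shardIndex >= shards) {
            throw new IllegalArgumentException("bad arguments");
        }
        int base = ops / shards;   // every shard gets at least this many
        int extra = ops % shards;  // leftover operations handed out one each
        int start = shardIndex * base + Math.min(shardIndex, extra);
        int size = base + (shardIndex < extra ? 1 : 0);
        return new int[] {start, start + size};
    }
}
```

With M = 10 and N = 3, shards 0, 1 and 2 get the ranges [0, 4), [4, 7) and [7, 10).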

Re: Cloud-aware request processing?

2012-04-09 Thread Jan Høydahl

Hi,

Instead of using Solr, you may want to have a look at Hadoop or another 
framework for distributed computation, see e.g. 
http://java.dzone.com/articles/comparison-gridcloud-computing

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com


'No JSP support' error in embedded Jetty for solrCloud as of apache-solr-4.0-2012-04-02_11-54-55

Starting the leader with:

 java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=rnicloud -DzkRun -DnumShards=3 -Djetty.port=9167 -jar start.jar

and browsing to

http://localhost:9167/solr/rnicloud/admin/zookeeper.jsp

I get:

HTTP ERROR 500

Problem accessing /solr/rnicloud/admin/zookeeper.jsp. Reason:

JSP support not configured
Powered by Jetty://

Re: Cloud-aware request processing?

Jan Høydahl,

My problem is intimately connected to Solr. It is not a batch job for
Hadoop; it is a distributed real-time query scheme. I hate to add yet
another complex framework if a Solr request processor (RP) can do the job simply.

For this problem, I can transform a Solr query into a subset query on
each shard, and then let the SolrCloud mechanism.

I am well aware of the 'zoo' of alternatives, and I will be evaluating
them if I can't get what I want from Solr.


RE: Re: Cloud-aware request processing?

2012-04-09 Thread Darren Govoni


...it is a distributed real-time query scheme...

SolrCloud does this already. It treats all the shards like one big index, and you can
query it normally to get subset results from each shard. Why do you have to
rewrite the query for each shard? It seems unnecessary.


Is http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster up to date?

I specify -Dcollection.configName=rnicloud, but the admin gui tells me
that I have a collection named 'collection1'.

And, as reported in a prior email, the admin UI URL in there seems wrong.

Re: JNDI in db-data-config.xml websphere

2012-04-09 Thread tech20nn

I had to use the exact JNDI name in db-data-config.xml, as unmanaged threads in
WebSphere do not have access to the java:comp/env namespace.

The resource name cannot be mapped to the WebSphere JDBC datasource name via a
reference definition in web.xml.

I am now using jndiName="jdbc/testdb" instead of
jndiName="java:comp/env/jdbc/testdb", and I also defined the WebSphere JDBC
datasource as jdbc/testdb.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/JNDI-in-db-data-config-xml-websphere-tp3884787p3896869.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Re: Cloud-aware request processing?

On Mon, Apr 9, 2012 at 9:50 AM, Darren Govoni ontre...@ontrenet.com wrote:
 ...it is a distributed real-time query scheme...

 SolrCloud does this already. It treats all the shards like one-big-index,
 and you can query it normally to get subset results from each shard. Why
 do you have to re-write the query for each shard? Seems unnecessary.

For reasons described in a previous email that I won't repeat here.



Re: Solr is indexing but not showing results

2012-04-09 Thread Ahmet Arslan

 <field name="CID" type="string" indexed="true" stored="true" required="true"/>
 <field name="XML" type="string" indexed="true" stored="true" required="true"/>

The string type is not tokenized; values are indexed verbatim. Use a different
type for full-text search, e.g. type="text".
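To make the difference concrete, here is a toy model (not Solr code) of what gets indexed in each case. The "text" side is a rough stand-in that splits on non-alphanumeric characters and lowercases; Solr's real text types use whatever tokenizer and filters the schema configures, so treat this as an illustration only. Class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of indexed terms for a "string" field versus a tokenized text field. */
public final class StringVsText {
    private StringVsText() {}

    /** "string"-like: the entire value is a single, verbatim term. */
    public static Set<String> stringTerms(String value) {
        return Collections.singleton(value);
    }

    /** Rough stand-in for a text analyzer: split on non-alphanumerics, lowercase. */
    public static Set<String> textTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String tok : value.split("[^\\p{Alnum}]+")) {
            if (!tok.isEmpty()) {
                terms.add(tok.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** A single-term query matches only if that exact term was indexed. */
    public static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

For a value such as `<name>Blue Shirt</name>`, the query term `shirt` matches the text-style terms but not the string-style term, which only matches the whole value exactly.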

Stumped on using a custom update request processor with SolrCloud

If you would be so kind as to look at
https://issues.apache.org/jira/browse/SOLR-3342, you will see that I
tried to use a working configuration for a URP of mine with SolrCloud,
and received in return an NPE.

Somehow or another, by default, the XmlUpdateRequestHandler ends up
using (I think) the PeerSync class to establish the indexibleId. When
I add in my URP, I am somehow turning this off, and I'm currently
stumped as to how to turn it back on.

If you don't care to read the JIRA, my relevant configuration is right
here. Is there something else I need in the 'defaults' list, or some
other processor I need to put in my chain?

   <updateRequestProcessorChain name="RNI">
     <!-- some day, add parameters when we have some -->
     <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>

   <!-- activate RNI processing by adding the RNI URP to the chain
        for xml updates -->
   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.chain">RNI</str>
     </lst>
   </requestHandler>
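An update processor chain behaves like a pipeline: each processor does its own work and then hands the command to the next one, so the order of the processor elements is the order of execution and the last one writes to the index. The plain-Java model below only illustrates that general shape; every name in it is hypothetical and it is not Solr's UpdateRequestProcessor API. One property of such chains, consistent with the observation above that adding a custom chain seems to turn something off: a step that is not listed in the chain simply never runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model (hypothetical names, not Solr's API) of a processor chain. */
public final class ChainModel {
    /** Shared log so the order of execution can be inspected. */
    public static final List<String> LOG = new ArrayList<>();

    public abstract static class Processor {
        private final Processor next;

        protected Processor(Processor next) {
            this.next = next;
        }

        /** Do this processor's work, then delegate down the chain. */
        public void processAdd(String doc) {
            handle(doc);
            if (next != null) {
                next.processAdd(doc);
            }
        }

        protected abstract void handle(String doc);
    }

    /** Stand-in for a custom processor such as the RNI one. */
    public static final class Custom extends Processor {
        public Custom(Processor next) { super(next); }
        @Override protected void handle(String doc) { LOG.add("custom:" + doc); }
    }

    /** Stand-in for a logging step. */
    public static final class Log extends Processor {
        public Log(Processor next) { super(next); }
        @Override protected void handle(String doc) { LOG.add("log:" + doc); }
    }

    /** Stand-in for the final step that writes to the index. */
    public static final class Run extends Processor {
        public Run() { super(null); }
        @Override protected void handle(String doc) { LOG.add("run:" + doc); }
    }

    /** Builds the chain back to front, mirroring custom -> log -> run. */
    public static Processor buildChain() {
        return new Custom(new Log(new Run()));
    }
}
```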

RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody

I believe you're looking for what's called "matrix counts".

Please see this JIRA issue. To my knowledge, it has been committed to trunk but
not to 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true.

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest
to it, but not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and 
colors, and can belong to any number of categories.

For example, I add 6 documents to Solr as follows:

product, color, size, category

shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color

red (1)
blue (1)
green (1)

size

small (1)
large (1)

category

valentines day (1)

But they come back like this:

color:
red (2)
blue (2)
green (2)

size:
small (2)
large (2)

category
valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However, how
can I make this happen now? There are all sorts of e-commerce systems out there
that facet exactly the way I'm asking. I thought Solr was supposed to be the very
best, fastest search system, yet it doesn't seem to be able to facet correctly for
items with multiple values?

Am I indexing my data wrong?

How can I make this happen?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3893744.html
Sent from the Solr - User mailing list archive at Nabble.com.
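The difference between the counts Dan gets and the counts he wants can be reproduced client-side: a plain facet counts one per matching document, while a grouped ("matrix") count counts one per distinct product. A minimal sketch using the six rows from the example; the column layout and names are hypothetical. On the Solr side, the request would presumably look something like group=true&group.field=product&group.facet=true&facet=true&facet.field=color, assuming a single-valued field named product.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain versus grouped facet counts over the clothing example (client-side illustration). */
public final class GroupedFacets {
    private GroupedFacets() {}

    // Column positions in each row (hypothetical layout mirroring the example).
    public static final int PRODUCT = 0, COLOR = 1, SIZE = 2, CATEGORY = 3;

    // The six documents from the example above.
    public static final String[][] SHIRTS = {
        {"shirt A", "red", "small", "valentines day"},
        {"shirt A", "red", "large", "valentines day"},
        {"shirt A", "blue", "small", "valentines day"},
        {"shirt A", "blue", "large", "valentines day"},
        {"shirt A", "green", "small", "valentines day"},
        {"shirt A", "green", "large", "valentines day"},
    };

    /** Plain facet count: how many documents carry each value. */
    public static Map<String, Integer> plainCounts(String[][] docs, int field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[field], 1, Integer::sum);
        }
        return counts;
    }

    /** Grouped facet count: how many distinct groups (e.g. products) carry each value. */
    public static Map<String, Integer> groupedCounts(String[][] docs, int groupField, int field) {
        Map<String, Set<String>> groupsPerValue = new TreeMap<>();
        for (String[] doc : docs) {
            groupsPerValue.computeIfAbsent(doc[field], k -> new HashSet<>()).add(doc[groupField]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        groupsPerValue.forEach((value, groups) -> counts.put(value, groups.size()));
        return counts;
    }
}
```

On these rows, plainCounts gives red (2), blue (2), green (2) and valentines day (6), matching what Solr returned, while groupedCounts with PRODUCT as the group field gives 1 for each value, matching what Dan wants.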

RE: how to correctly facet clothing multiple sizes and colors?

2012-04-09 Thread Robert Petersen

You *could* do it by making one and only one Solr document for each
clothing item, then just have the front end render all the sizes and
colors available for that item as size/color pickers on the product
page.  You can add all the colors and sizes to the one document in the
index so they are searchable too, but the caveat is that they won't
show up as a facet.  This is just one simple approach.
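The indexing side of this approach amounts to collapsing the per-variant rows into one record per product, with colors and sizes as multi-valued fields (which the schema would need to declare with multiValued="true"). A minimal sketch of that collapsing step, assuming rows of {product, color, size}; the class and field names are hypothetical, and sending the result to Solr is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Collapses per-variant rows into one record per product (illustrative sketch). */
public final class CollapseVariants {
    private CollapseVariants() {}

    /** One product with all of its colors and sizes. */
    public static final class ProductDoc {
        public final String product;
        public final TreeSet<String> colors = new TreeSet<>();
        public final TreeSet<String> sizes = new TreeSet<>();

        ProductDoc(String product) {
            this.product = product;
        }
    }

    /** rows[i] = {product, color, size}; returns one ProductDoc per product, in first-seen order. */
    public static Map<String, ProductDoc> collapse(String[][] rows) {
        Map<String, ProductDoc> byProduct = new LinkedHashMap<>();
        for (String[] row : rows) {
            ProductDoc doc = byProduct.computeIfAbsent(row[0], ProductDoc::new);
            doc.colors.add(row[1]);
            doc.sizes.add(row[2]);
        }
        return byProduct;
    }
}
```

Applied to the six "shirt A" rows from the example, this yields a single document with colors {blue, green, red} and sizes {large, small}.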

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Saturday, April 07, 2012 7:04 PM
To: solr-user@lucene.apache.org
Subject: how to correctly facet clothing multiple sizes and colors?

I've been searching for a solution to my issue, and this seems to come
closest to it. But not exactly. 

I am indexing clothing. Each article of clothing comes in many sizes and
colors, and can belong to any number of categories. 

For example take the following: I add 6 documents to solr as follows: 

product, color, size, category 

shirt A, red, small, valentines day 
shirt A, red, large, valentines day 
shirt A, blue, small, valentines day 
shirt A, blue, large, valentines day 
shirt A, green, small, valentines day 
shirt A, green, large, valentines day 

I'd like my facet counts to return as follows: 

color 

red (1) 
blue (1) 
green (1) 

size 

small (1) 
large (1) 

category 

valentines day (1) 

But they come back like this: 

color: 
red (2) 
blue (2) 
green (2) 

size: 
small (2) 
large (2) 

category 
valentines day (6) 

I see the group.facet parameter in version 4.0 does exactly this. However,
how can I make this happen now? There are all sorts of e-commerce systems
out there that facet exactly the way I'm asking. I thought Solr was supposed
to be the very best, fastest search system, yet it doesn't seem to be able
to facet correctly for items with multiple values?

Am I indexing my data wrong?

How can I make this happen?

--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-correctly-facet-clothing-multiple-sizes-and-colors-tp3893747p3893747.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: 'No JSP support' error in embedded Jetty for solrCloud as of apache-solr-4.0-2012-04-02_11-54-55

2012-04-09 Thread Ryan McKinley

zookeeper.jsp was removed (along with all other JSP pages) in trunk.

Take a look at the Cloud tab in the UI, or check the /zookeeper
servlet for the raw JSON output.

ryan
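A minimal sketch of pulling that raw JSON from code rather than the browser, assuming the webapp is deployed under /solr on the port from the startup command above (9167) and that the servlet answers at /solr/zookeeper; both the path and the class name are assumptions, not confirmed by the thread. It uses java.net.http from Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Fetches the /zookeeper servlet output (assumed path) as a string. */
public final class ZkInfoFetch {
    private ZkInfoFetch() {}

    /** Builds e.g. http://localhost:9167/solr/zookeeper (assumed servlet location). */
    public static URI zookeeperUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/solr/zookeeper");
    }

    /** Performs a GET and returns the body, which is expected to be JSON. */
    public static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + uri);
        }
        return response.body();
    }
}
```

Usage, with the leader running: `String json = ZkInfoFetch.fetch(ZkInfoFetch.zookeeperUri("localhost", 9167));`. Note that, unlike the old JSP URL, this path does not include the collection name.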



Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However, my facet queries still don't
group.

Sent from my phone


--
View this message in context: 
http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3897422.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody

Have you tried adding the parameter

group.facet=true?

Cody


RE: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Young, Cody

One other thing: I believe you need to be using facet.field on single-valued
string fields for group.facet to function properly. Are the fields
you're faceting on multiValued="false"?

Cody


Re: Solr is indexing but not showing results

2012-04-09 Thread srini

Hi, thanks for your reply. As per your suggestion, I changed the XML field type to
text.

<field name="XML" type="string" indexed="true" stored="true" required="true"/>

But when I start Solr, it throws the following exception:
SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'text' specified on field XML

Any suggestions? (Thanks for your reply.)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-indexing-but-not-showing-results-tp3897176p3897626.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood

You will need to define or customize a field type for text.

The example schema.xml file that is installed with Solr 3.5 has several kinds
of text fields; text_general and text_en are good places to start. You can
use one of those, then customize it.

wunder


Re: Solr is indexing but not showing results

2012-04-09 Thread Jeevanandam Madanagopal

Srini -

This text field type comes as a sample configuration in the Solr distribution.
Check it out; it may suit your needs!

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


-Jeevanandam
 

Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood

That is not a good configuration. Synonyms should be expanded at index time, 
not query time. --wunder
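In terms of the fieldType posted earlier in the thread, this suggestion amounts to moving the SynonymFilterFactory line from the query analyzer to the index analyzer. The toy inverted index below (not Solr code; all names hypothetical) only shows what index-time expansion means: every token is stored together with its synonyms, so queries need no synonym handling and either term finds the document. It does not model Solr's analyzers or the reasons behind the recommendation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index illustrating index-time synonym expansion. */
public final class IndexTimeSynonyms {
    private final Map<String, Set<String>> synonyms = new HashMap<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Registers mutually equivalent terms, roughly like one synonyms.txt line with expand=true. */
    public void addSynonymGroup(String... terms) {
        Set<String> group = new HashSet<>();
        for (String t : terms) {
            group.add(t.toLowerCase(Locale.ROOT));
        }
        for (String t : group) {
            synonyms.computeIfAbsent(t, k -> new HashSet<>()).addAll(group);
        }
    }

    /** Indexes a document, expanding every whitespace-separated token to its synonym group. */
    public void index(int docId, String text) {
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            Set<String> expanded = synonyms.getOrDefault(tok, Collections.singleton(tok));
            for (String term : expanded) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Single-term query: a plain lookup with no synonym expansion at query time. */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

For example, after addSynonymGroup("tv", "television") and indexing "Cheap television deals" as document 1, both search("tv") and search("television") return {1}.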


Re: To truncate or not to truncate (group.truncate vs. facet)

I am using group.facet, and it works fine for regular facet.field but not for
facet.query.

Sent from my phone
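If grouped counts are needed for a facet.query-style condition (for example size:small) and the server does not group them, one client-side fallback is to count distinct group values among the rows that satisfy the condition. This is a hypothetical sketch, not a Solr feature; it assumes the matching rows, or at least their group and condition fields, can be fetched separately, and the names are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Client-side grouped count for a facet.query-like condition (hypothetical fallback). */
public final class GroupedQueryCount {
    private GroupedQueryCount() {}

    /** Counts distinct values of groupField among rows that satisfy the condition. */
    public static int count(List<Map<String, String>> rows, String groupField,
                            Predicate<Map<String, String>> condition) {
        Set<String> groups = new HashSet<>();
        for (Map<String, String> row : rows) {
            if (condition.test(row)) {
                groups.add(row.get(groupField));
            }
        }
        return groups.size();
    }
}
```

With illustrative rows for "shirt A" (small and large) and a hypothetical "shirt B" (small only), the condition size == small yields 2 products, while size == large yields 1.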

- Reply message -
From: Young, Cody [via Lucene] ml-node+s472066n3897487...@n3.nabble.com
Date: Mon, Apr 9, 2012 1:38 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

One other thing, I believe that you need to be using facet.field on single 
valued string fields for group.facet to function properly. Are the fields 
you're faceting on multiValued=false?

Cody

-Original Message-
From: Young, Cody [mailto:cody.yo...@move.com] 
Sent: Monday, April 09, 2012 10:36 AM
To: solr-user@lucene.apache.org
Subject: RE: To truncate or not to truncate (group.truncate vs. facet)

You tried adding the parameter

group.facet=true ?

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Monday, April 09, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I did get this working with version 4. However my facet queries still don't 
group.

Sent from my phone

- Reply message -
From: Young, Cody [via Lucene] ml-node+s472066n3897366...@n3.nabble.com
Date: Mon, Apr 9, 2012 12:45 pm
Subject: To truncate or not to truncate (group.truncate vs. facet)
To: danjfoley d...@micamedia.com

I believe you're looking for what's called, Matrix Counts

Please see this JIRA issue. To my knowledge it has been committed in trunk but 
not 3.x.

https://issues.apache.org/jira/browse/SOLR-2898

This feature is accessed by using group.facet=true

Cody

-Original Message-
From: danjfoley [mailto:d...@micamedia.com] 
Sent: Saturday, April 07, 2012 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: To truncate or not to truncate (group.truncate vs. facet)

I've been searching for a solution to my issue, and this seems to come closest 
to it. But not exactly.

I am indexing clothing. Each article of clothing comes in many sizes and 
colors, and can belong to any number of categories.

For example take the following: I add 6 documents to solr as follows:

product, color, size, category

shirt A, red, small, valentines day
shirt A, red, large, valentines day
shirt A, blue, small, valentines day
shirt A, blue, large, valentines day
shirt A, green, small, valentines day
shirt A, green, large, valentines day

I'd like my facet counts to return as follows:

color

red (1)
blue (1)
green (1)

size

small (1)
large (1)

category

valentines day (1)

But they come back like this:

color:
red (2)
blue (2)
green (2)

size:
small (2)
large (2)

category
valentines day (6)

I see the group.facet parameter in version 4.0 does exactly this. However, how 
can I make this happen now? There are all sorts of ecommerce systems out there 
that facet exactly the way I'm asking. I thought Solr was supposed to be the very 
best, fastest search system, yet it doesn't seem to be able to facet correctly for 
items with multiple values?

Am I indexing my data wrong? 

How can I make this happen?

Re: Solr is indexing but not showing results

2012-04-09 Thread Jeevanandam Madanagopal

I partially agree; it depends. For instance, at index time some of the 
synonym mappings may or may not be expanded (e.g., when the index is frequently 
populated from different sources). So it is good to apply synonyms at index time 
as well as at query time to achieve complete coverage. Most of the time I have 
used similar settings to meet customer requirements.

For example: 
-
Below is a sample text field type with synonyms at both index & query time (both 
analyzers have the same tokenizer & filter structure, so a single shared 
analyzer config would work too.) 

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

-Jeevanandam


On Apr 10, 2012, at 12:18 AM, Walter Underwood wrote:

 That is not a good configuration. Synonyms should be expanded at index time, 
 not query time. --wunder
 
 On Apr 9, 2012, at 11:43 AM, Jeevanandam Madanagopal wrote:
 
 Srini -
 
 This text field type comes as a sample configuration in the Solr distribution. 
 Check it out; it may suit your needs!
 
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
     autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
         catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
         catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordMarkerFilterFactory"
         protected="protwords.txt"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 -Jeevanandam
 
 On Apr 10, 2012, at 12:08 AM, Walter Underwood wrote:
 
 You will need to define or customize a field type for text. 
 
 The example schema.xml file that is installed with Solr 3.5 has several 
 kinds of text fields; text_general and text_en are good places to 
 start. You can use one of those, then customize it.
 
 wunder
 
 On Apr 9, 2012, at 11:27 AM, srini wrote:
 
 Hi, thanks for your reply. As per your suggestion I changed the XML field type 
 to
 text. 
 
 <field name="XML" type="string" indexed="true" stored="true"
 required="true"/>   
 
 but when I start Solr, it throws the following exception:
 SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'text'
 specified on field XML
 
 Any suggestions? (Thanks for your reply.)
 
Re: Solr is indexing but not showing results

2012-04-09 Thread Walter Underwood

There are some well-understood problems with query-time synonyms. Read about 
them here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Expanding synonyms at both index and query time causes a different problem, 
over-counting the score for any term in the synonym map.
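The toy model below illustrates the over-counting point. It does not reproduce Lucene's scoring. It only counts how many query clauses hit a document, as a crude stand-in for how many per-term score contributions get summed. It assumes a hypothetical synonyms.txt line "tv, television" used with expand=true.

```python
# Hypothetical synonyms.txt line "tv, television"; with expand=true every member
# of the group is replaced by all members of the group.
SYNONYM_GROUPS = [{"tv", "television"}]


def expand(tokens):
    """Toy SynonymFilter with expand=true: replace each token by its whole group."""
    out = []
    for token in tokens:
        group = next((g for g in SYNONYM_GROUPS if token in g), {token})
        out.extend(sorted(group))
    return out


def matching_clauses(query_tokens, doc_tokens):
    """Number of query clauses that hit the document (crude proxy, not real scoring)."""
    doc = set(doc_tokens)
    return sum(1 for t in query_tokens if t in doc)


doc, query = ["tv"], ["tv"]
print(matching_clauses(query, expand(doc)))          # index-time only: 1
print(matching_clauses(expand(query), doc))          # query-time only: 1
print(matching_clauses(expand(query), expand(doc)))  # both: 2
```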

wunder

Re: To truncate or not to truncate (group.truncate vs. facet)

2012-04-09 Thread Martijn v Groningen

The group.facet option only works for field facets (facet.field). Other
facet types (query, range and pivot) aren't supported yet.
The group.facet option works for both single- and multivalued fields specified in
the facet.field parameter.
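As a sketch, a grouped-facet request for the clothing example could be built as below. It lists facets only through facet.field, because of the limitation described above. It assumes a Solr 4.x (trunk) instance at a hypothetical local URL and a `product` field to group on. The helper name grouped_facet_url is illustrative.

```python
from urllib.parse import urlencode


def grouped_facet_url(base_url, group_field, facet_fields):
    """Select URL whose facet.field counts are per group (group.facet=true),
    i.e. per product rather than per variant document."""
    params = [
        ("q", "*:*"),
        ("group", "true"),
        ("group.field", group_field),  # the field defining a group, e.g. product
        ("group.facet", "true"),
        ("facet", "true"),
    ]
    # Only field facets here: query, range and pivot facets are not grouped.
    params += [("facet.field", f) for f in facet_fields]
    return base_url + "/select?" + urlencode(params)


print(grouped_facet_url("http://localhost:8983/solr", "product",
                        ["color", "size", "category"]))
```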

Martijn

How to facet data from a multivalued field?

2012-04-09 Thread Thiago

Hello everybody,

I've already searched for this topic in the forum, but I didn't find any
case like this. I apologize if this topic has already been
discussed.

I'm having a problem faceting a multivalued field. My field is called
series, and it holds names of TV series like The Big Bang Theory, Two and a
Half Men ...

In this field I can have a lot of TV series names. For example:

<arr name="series">
   <str>Two and a Half Men</str>
   <str>How I Met Your Mother</str>
   <str>The Big Bang Theory</str>
</arr>

What I want to do is count how many documents relate to each
series. I'm doing it with a facet search on this field, but it returns
each word separately, like this:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="series">
      <int name="bang">91</int>
      <int name="big">91</int>
      <int name="half">21</int>
      <int name="how">45</int>
      <int name="i">45</int>
      <int name="men">21</int>
      <int name="met">45</int>
      <int name="mother">45</int>
      <int name="theori">91</int>
      <int name="two">21</int>
      <int name="your">45</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

And what I want is something like:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="series">
      <int name="Two and a Half Men">21</int>
      <int name="How I Met Your Mother">45</int>
      <int name="The Big Bang Theory">91</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Is there any way to do this with a facet search? I don't want the
terms; I just want each whole string, including the white space. Do I have to
change my field type to do this?

Thanks to everybody.

Thiago 

Re: How to facet data from a multivalued field?

2012-04-09 Thread Darren Govoni

You should look at the field type (analyzer) configured for that field.
Try not using one that tokenizes or stems the field.
You want to leave the text as is. I forget the exact setting for that,
but it's documented in there somewhere.
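The Python model below shows why the counts come back per word. It is illustrative and is not Solr's implementation. Faceting a tokenized text field counts each indexed term once per document. Faceting an untokenized string field counts each whole value. The usual remedy is to facet on a field of type "string" (solr.StrField in the example schema). If full-text search on the names is also needed, copy the values with copyField into a separate text field. The three documents below are hypothetical.

```python
import re
from collections import Counter

# Three hypothetical documents, each with a multivalued "series" field.
DOCS = [
    ["Two and a Half Men", "The Big Bang Theory"],
    ["How I Met Your Mother"],
    ["The Big Bang Theory"],
]


def tokenized_facets(docs):
    """Roughly what faceting a tokenized text field gives: one count per term per
    document (a real text field may also stem, e.g. 'theory' -> 'theori')."""
    counts = Counter()
    for values in docs:
        terms = {t for v in values for t in re.findall(r"\w+", v.lower())}
        counts.update(terms)
    return counts


def string_facets(docs):
    """What faceting an untokenized string field gives: each whole value is one term."""
    counts = Counter()
    for values in docs:
        counts.update(set(values))
    return counts


print(string_facets(DOCS).most_common(1))  # [('The Big Bang Theory', 2)]
print(tokenized_facets(DOCS)["bang"])      # 2
```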

Re: Strange behavior with search on empty string and NOT


: Would it be a good idea to have Solr throw a syntax error if an empty string
: query occurs? 

Erick's explanation wasn't very precise ... 

Solr doesn't have any special handling of empty strings, but what you 
are searching for *might* be a totally valid query based on how the field 
type is configured (ie: StrField, or KeywordTokenizer, etc...).

In your case, you seem to be searching for "" in a field whose 
analyzer produces no tokens for "", so it falls out of the query.
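Here is a toy model of "falls out of the query". It is a hypothetical sketch and not Solr's query parser. Each clause's text goes through an analyzer. A clause whose text yields no tokens is simply dropped, so a clause like -title:"" leaves nothing behind. With a StrField, by contrast, the empty string is a real term.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase word tokens. An empty string yields no tokens."""
    return re.findall(r"\w+", text.lower())


def build_query(clauses):
    """clauses: (occur, field, text) tuples, occur being '+', '-' or ''.
    A clause whose text analyzes to no tokens is silently dropped."""
    kept = []
    for occur, field, text in clauses:
        tokens = analyze(text)
        if not tokens:
            continue  # nothing to search for: the clause falls out of the query
        kept.append("%s%s:%s" % (occur, field, " ".join(tokens)))
    return " ".join(kept)


print(build_query([("+", "title", "Solr"), ("-", "title", "")]))  # +title:solr
```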


-Hoss

Re: Dynamically changing facet hierarchies and facet values


: I have a use case where the facet hierarchies as well as facet names change
: very frequently.
: 
: For example:
: (Smartphones > Android) may become
: Smartphones > GSM > Android.
: 
: OR
:    "Smartphone" could be renamed to "Smart Phone"
: 
: If I use traditional hierarchical faceting, then every change would mean a
: re-index of a large number of documents.
: 
: Just curious to know how others have solved this problem in the past.

I've dealt with this in the past using a custom plugin for the faceting. 
Basically, each document had a category field that only contained the id# 
of the category it was directly in, and the actual hierarchy info was stored 
in an XML data file that the plugin loaded at init and used to build the 
query associated with each node by looking at all the categoryId numbers 
from all the descendant categories (optimizations can be made if you know 
documents are only mapped to leaf-level categories, or if you can define 
your hierarchy in terms of other fields -- ie: catId#345 might be definable 
by the query type:phone AND os:android AND tech:GSM).

For small hierarchies, you can do the same thing from any Solr client that 
knows what hierarchy you have, using many facet.queries - just put whatever 
info you need to remap the flat facet.query responses into a hierarchy as 
localparams on each facet.query.
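Below is a client-side sketch of the facet.query approach under stated assumptions. The hierarchy lives outside the index as a hypothetical id-to-children map. Documents store only the id of the category they are directly in. Each node gets one facet.query covering itself and its descendants. The `key` local param labels each count so the client can rebuild the tree. Any other localparams carrying remapping info would serve the same purpose.

```python
# Hypothetical hierarchy kept outside the index: category id -> child ids.
CHILDREN = {
    1: [2, 3],  # Smartphones -> Android, GSM
    3: [4],     # GSM -> Android (GSM)
}


def descendants(node, children):
    """The node itself plus every category below it."""
    ids = [node]
    for child in children.get(node, []):
        ids.extend(descendants(child, children))
    return ids


def facet_queries(children, field="category"):
    """One facet.query per node, matching documents in the node or any descendant."""
    nodes = sorted(set(children) | {c for kids in children.values() for c in kids})
    return [
        "{!key=cat_%d}%s:(%s)" % (n, field, " OR ".join(map(str, descendants(n, children))))
        for n in nodes
    ]


for q in facet_queries(CHILDREN):
    print(q)
```

Renaming or moving a category then only changes CHILDREN (and any labels the client keeps), not the indexed documents.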




-Hoss

Boosting when matching specific field values

2012-04-09 Thread gseoeltru solr

I am using edismax when executing searches against a set of news articles. I
would like to also boost the scores of matched documents based on another
field in the documents, which I will call source, which can be set to 3
possible strings. So if the source field has the value "a", then I want
to multiply the score by 1. If the source field has the value "b", then I
want to multiply the score by 2 ... and so on. What is the way to go about
doing this?
Any help here mucho appreciated!
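One commonly used approach, sketched below, adds one boost query (`bq`) clause per source value to an edismax request. Note that `bq` clauses add to the relevance score rather than multiply it, so this only approximates the 1x/2x weighting described. edismax also accepts a multiplicative `boost` function parameter, but turning a string field into a number that way depends on the function queries your Solr version offers, so it is not shown. The third value "c" and its weight are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical per-source weights ("c" stands in for the third, unnamed value).
SOURCE_WEIGHTS = {"a": 1, "b": 2, "c": 3}


def boosted_search_params(user_query, weights, field="source"):
    """edismax request with one boost query (bq) clause per source value.
    bq clauses ADD to the relevance score rather than multiplying it."""
    params = [("defType", "edismax"), ("q", user_query)]
    for value, weight in weights.items():
        params.append(("bq", '%s:"%s"^%s' % (field, value, weight)))
    return params


print(urlencode(boosted_search_params("election results", SOURCE_WEIGHTS)))
```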

Re: solr analysis-extras configuration


: Further info: I can make this work if I stay out of tomcat -- I
: download a fresh solr binary distro, copy those five JARs from 'dist'
: and 'contrib' into example/solr/lib/, copy my solrconfig.xml and
: schema.xml, and run 'java -jar start.jar', and it works fine.  But
: trying to add those same JARs to my tomcat instance's solrhome/lib
: doesn't work.  Any ideas how to troubleshoot?

Is there anything else about how you have tomcat+solr configured that 
might be causing tomcat to load *any* solr or lucene jars directly, 
instead of letting the solr.war file load them from your solr home dir?  
Did you change anything about tomcat's classpath? Did you copy any jars 
anywhere other than your solrhome/lib dir?

These kinds of classloader hell errors can happen if a parent 
classloader has already loaded some class that depends on (or is depended 
on by) another class loaded by the solr war.
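A small troubleshooting sketch for the first question: list any Lucene/Solr jars sitting in directories that Tomcat itself might load from. The candidate directories and jar-name prefixes are hypothetical and should be adjusted to the actual installation. Extend JAR_PREFIXES for other contrib dependencies such as ICU.

```python
import os

JAR_PREFIXES = ("lucene-", "solr-", "apache-solr-")


def stray_solr_jars(listing):
    """listing maps a directory to the file names in it; returns the Lucene/Solr
    jars found there as 'directory/name' strings."""
    hits = []
    for directory, names in listing.items():
        for name in sorted(names):
            if name.endswith(".jar") and name.startswith(JAR_PREFIXES):
                hits.append(directory.rstrip("/") + "/" + name)
    return hits


def scan(directories):
    """Build the listing from disk, skipping directories that do not exist."""
    return {d: os.listdir(d) for d in directories if os.path.isdir(d)}


# Hypothetical locations outside solrhome/lib that Tomcat may load jars from.
CANDIDATES = ["/usr/share/tomcat6/lib", "/var/lib/tomcat6/shared"]
for path in stray_solr_jars(scan(CANDIDATES)):
    print("check:", path)
```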


-Hoss

Re: how to correctly facet clothing multiple sizes and colors?

2012-04-09 Thread Andrew Harvey

What we do in our application is exactly what Robert described. We index 
Products, not variants. The variant data (colour, size etc.) is denormalised 
into the product document at index time. We then facet on the variant 
attributes and get product count instead of variant count. 

What you're seeing are correct results. You are indexing 6 documents, as you 
said before. You actually only want to index one document with multi-valued 
fields. 
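The product-level indexing described above can be sketched in Python as follows. This is an illustration, not this application's actual indexer. Variant rows collapse into one document per product with multivalued color, size and category fields. Document-based facet counts then become product counts. Using the product name as "id" is an assumption.

```python
# The six variant rows from the original question: (product, color, size, category)
VARIANTS = [
    ("shirt A", "red", "small", "valentines day"),
    ("shirt A", "red", "large", "valentines day"),
    ("shirt A", "blue", "small", "valentines day"),
    ("shirt A", "blue", "large", "valentines day"),
    ("shirt A", "green", "small", "valentines day"),
    ("shirt A", "green", "large", "valentines day"),
]


def to_product_docs(rows):
    """Collapse variant rows into one document per product with multivalued fields."""
    docs = {}
    for product, color, size, category in rows:
        doc = docs.setdefault(product, {"id": product, "color": [], "size": [], "category": []})
        for field, value in (("color", color), ("size", size), ("category", category)):
            if value not in doc[field]:
                doc[field].append(value)
    return list(docs.values())


def facet_counts(docs, field):
    """Facet counts are per document, so each product counts once per value."""
    counts = {}
    for doc in docs:
        for value in doc[field]:
            counts[value] = counts.get(value, 0) + 1
    return counts


products = to_product_docs(VARIANTS)
print(products[0]["color"])            # ['red', 'blue', 'green']
print(facet_counts(products, "size"))  # {'small': 1, 'large': 1}
```

One trade-off: because the combinations are flattened, a filter such as color:red AND size:small matches the product even if no red/small variant exists.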

Hope that's somehow helpful,

Andrew

On 10/04/2012, at 3:01, Robert Petersen rober...@buy.com wrote:

 You *could* do it by making one and only one solr document for each
 clothing item, then just have the front end render all the sizes and
 colors available for that item as size/color pickers on the product
 page.  You can add all the colors and sizes to the one document in the
 index so they are searchable also, but the caveat is that they won't
 show up as a facet.  This is just one simple approach.
 
 -Original Message-
 From: danjfoley [mailto:d...@micamedia.com] 
 Sent: Saturday, April 07, 2012 7:04 PM
 To: solr-user@lucene.apache.org
 Subject: how to correctly facet clothing multiple sizes and colors?
 
 I've been searching for a solution to my issue, and this seems to come
 closest to it. But not exactly. 
 
 I am indexing clothing. Each article of clothing comes in many sizes and
 colors, and can belong to any number of categories. 
 
 For example take the following: I add 6 documents to solr as follows: 
 
 product, color, size, category 
 
 shirt A, red, small, valentines day 
 shirt A, red, large, valentines day 
 shirt A, blue, small, valentines day 
 shirt A, blue, large, valentines day 
 shirt A, green, small, valentines day 
 shirt A, green, large, valentines day 
 
 I'd like my facet counts to return as follows: 
 
 color 
 
 red (1) 
 blue (1) 
 green (1) 
 
 size 
 
 small (1) 
 large (1) 
 
 category 
 
 valentines day (1) 
 
 But they come back like this: 
 
 color: 
 red (2) 
 blue (2) 
 green (2) 
 
 size: 
 small (2) 
 large (2) 
 
 category 
 valentines day (6) 
 
 I see the group.facet parameter in version 4.0 does exactly this.
 However, how can I make this happen now? There are all sorts of ecommerce
 systems out there that facet exactly the way I'm asking. I thought Solr was
 supposed to be the very best, fastest search system, yet it doesn't seem
 to be able to facet correctly for items with multiple values?
 
 Am I indexing my data wrong? 
 
 How can I make this happen?
 
Re: Solr with UIMA

2012-04-09 Thread chris3001

Tommaso,
I apologize for my delayed response. Thank you very much for your time
looking into this!! 
I will try to replicate your efforts on my end this week.

Respectfully,
Chris

Re: Suggester not working for digit starting terms

Is it possible that your fieldType definition for a_suggest is
stripping out the digits? Consider using TermsComponent
(http://wiki.apache.org/solr/TermsComponent), the admin
page, or Luke to examine the terms actually _in_ your
index. Or look at the admin/analysis page and give it some
sample input to see what the output of the analysis
chain is.
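As a sketch of the TermsComponent check, the helper below builds a request listing the indexed terms of a_suggest that start with "500". It assumes a /terms handler is registered, as in the example solrconfig.xml, and a hypothetical local URL. If the analysis chain strips the digits, no such terms will come back.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix, limit=20):
    """TermsComponent request listing indexed terms of `field` starting with `prefix`."""
    params = {"terms": "true", "terms.fl": field, "terms.prefix": prefix, "terms.limit": limit}
    return base_url + "/terms?" + urlencode(params)


print(terms_url("http://localhost:8983/solr", "a_suggest", "500"))
# http://localhost:8983/solr/terms?terms=true&terms.fl=a_suggest&terms.prefix=500&terms.limit=20
```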

Best
Erick

On Sat, Apr 7, 2012 at 3:24 PM, jmlucjav jmluc...@gmail.com wrote:
 Hi,

 I am using the Suggester component, as advised in the Solr 3 book (using Solr 3.5):
   <searchComponent name="suggest" class="solr.SpellCheckComponent">
     <lst name="spellchecker">
       <str name="name">a_suggest</str>
       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
       <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
       <str name="field">a_suggest</str>
       <str name="buildOnCommit">true</str>
       <int name="weightBuckets">100</int>
     </lst>
   </searchComponent>
   <requestHandler name="/suggest" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="spellcheck">true</str>
       <str name="spellcheck.dictionary">a_suggest</str>
       <str name="spellcheck.onlyMorePopular">true</str>
       <str name="spellcheck.count">5</str>
       <str name="spellcheck.collate">true</str>
     </lst>
     <arr name="components">
       <str>suggest</str>
     </arr>
   </requestHandler>

 But even though it works fine with words, it does not seem to work for terms
 starting with digits. For example:
 http://localhost:8983/solr/suggest?q=500
 gets 0 results, but I know '500 $' is in the a_suggest field, as I can find
 many hits with:
 http://localhost:8983/solr/select/?q={!prefix f=a_suggest}500
 
 Am I missing something? I have been trying to play with
 spellcheck.onlyMorePopular and spellcheck.accuracy, but I get the same
 results.
 
 thanks
 xab

Re: Question on using dynamic fields

Hmmm, not sure about the dataconfig.xml file. What
are you trying to index? Is this DIH? Because
if you're simply posting Solr-formatted XML docs,
dataconfig.xml is irrelevant.

You say you're not seeing the output. One of two
things is going on:
1) The data is not in the index. Use the admin/schema browser
   page to examine what actually went into your index.
2) You may simply not be asking for the fields to be returned.
   Try doing the query with fl=*.
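If the documents are posted directly rather than through DIH, each custom <DOC> has to be rewritten into Solr's <add><doc><field name="..."> update format first. The sketch below assumes the schema's uniqueKey is a field called "id" (hypothetical) and that the theta* dynamicField is defined. The function name to_solr_add is illustrative.

```python
import xml.etree.ElementTree as ET


def to_solr_add(doc_xml, doc_id):
    """Rewrite a <DOC> with arbitrary <theta_N> children into Solr's
    <add><doc><field name=...> update format (uniqueKey "id" is assumed)."""
    source = ET.fromstring(doc_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for child in source:
        # Each child tag becomes the field name, so theta_1, theta_2, ...
        # are all matched by the theta* dynamic field.
        ET.SubElement(doc, "field", name=child.tag).text = (child.text or "").strip()
    return ET.tostring(add, encoding="unicode")


print(to_solr_add("<DOC><theta_1>0.98</theta_1><theta_2>0.767</theta_2></DOC>", "doc1"))
```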

Best
Erick

On Sun, Apr 8, 2012 at 9:09 PM, Rakesh Varna rakeshva...@gmail.com wrote:
 Hello Solr-users,
   I am trying to index xml files which have the following tags (I am
 using Solr 3.5 on Tomcat):
 
 <DOC>
 <theta_1>0.98</theta_1>
 <theta_2>0.767</theta_2>
 .
 ..
 ..
 ..
 <thetaN>0.2873</thetaN>
 </DOC>
 
 The numbers after theta are not a continuous sequence and I do not know
 how many such tags there are. I thought this was a good candidate for
 dynamic fields and have the following schema for those tags:
   <dynamicField name="theta*" type="double" indexed="true"
    stored="true"/>
 Is this correct? If so, what should I use in the data-config.xml file to
 index these tags?
 
 When I try the admin feature in the browser and query *:*, I don't see the
 theta fields in the response.
 
 If not, are dynamic fields the wrong choice? Is there another way of indexing
 these fields?
 Thanks in advance,
 Rakesh Varna

Re: Problem about range search

Hmmm, works fine for me using the popularity field in
the default schema.

What version of Solr are you using? What is your complete
handler definition?

Best
Erick

On Mon, Apr 9, 2012 at 12:10 AM, ZHANG Liang F
liang.f.zh...@alcatel-sbell.com.cn wrote:
 Hi,
 I ran into a problem when trying a range facet search. I have a schema defined
 like this:
  <fields>
   <field name="title" type="string" indexed="true" stored="true"/>
   <field name="author" type="string" indexed="true" stored="true"/>
   <field name="text" type="text" indexed="true" stored="true"/>
   <field name="path" type="string" indexed="true" stored="true"/>
   <field name="size" type="long" indexed="true" stored="true"/>
   <field name="lastmodified" type="date" indexed="true" stored="true"/>
  </fields>

 I am trying to set up a range facet on the size field, which holds the size of a
 file. So I have the following requestHandler config in solrconfig.xml:
  <str name="facet.range.other">after</str>
  <str name="facet.range">size</str>
  <int name="f.size.facet.range.start">0</int>
  <int name="f.size.facet.range.end">15728640</int><!-- around 15M -->
  <int name="f.size.facet.range.gap">3145728</int><!-- around 3M -->

 But an error says:  Unable to range facet on 
 field:size{type=long,properties=indexed,stored,omitNorms,omitTermFreqAndPositions}

 It doesn't give any clue, and I also tried the <long> tag, but got the same error.

 Could you please suggest something?

 Thanks in advance!
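
For reference, a small Python model of the buckets this configuration asks for. It does not diagnose the "Unable to range facet" error; it only shows the intended layout. The boundary rules are assumptions about Solr's defaults (facet.range.include=lower, facet.range.hardend=false), and the helper names are hypothetical.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return the [lower, upper) gap ranges a range facet would produce.

    With hardend=False (assumed default) the last range keeps its full
    gap width even if it runs past `end`; with hardend=True it is cut
    off at `end`.
    """
    buckets = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end
        buckets.append((lower, upper))
        lower = upper
    return buckets


def count_sizes(sizes, start, end, gap):
    """Count values per bucket, plus an 'after' count for values at or
    above the end of the last bucket (facet.range.other=after, assuming
    facet.range.include=lower). Values below `start` are ignored because
    no 'before' count was requested."""
    buckets = range_buckets(start, end, gap)
    counts = {bucket: 0 for bucket in buckets}
    last_upper = buckets[-1][1] if buckets else end
    after = 0
    for size in sizes:
        if size >= last_upper:
            after += 1
            continue
        for lower, upper in buckets:
            if lower <= size < upper:
                counts[(lower, upper)] += 1
                break
    return counts, after


# The configuration from the question: 0 to ~15 MB in ~3 MB steps.
buckets = range_buckets(0, 15728640, 3145728)
```

Because 15728640 is exactly 5 x 3145728, the gap divides the range evenly, so the hardend setting makes no difference here.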

Re: Re: Cloud-aware request processing?

I _think_ you need to look at the Zookeeper information, perhaps
something like ZkController.getCloudState or some such?

Warning: I haven't been in that code, so this is just a guess. But
since the SolrCloud stuff has to know this kind of info in order
to do distributed indexing, it's got to be available, but I confess
it's not clear where.

But I'm guessing here...

Best
Erick

On Mon, Apr 9, 2012 at 8:22 AM, Benson Margulies bimargul...@gmail.com wrote:
 On Mon, Apr 9, 2012 at 9:50 AM, Darren Govoni ontre...@ontrenet.com wrote:
 ...it is a distributed real-time query scheme...

 SolrCloud does this already. It treats all the shards like one-big-index,
 and you can query it normally to get subset results from each shard. Why
 do you have to re-write the query for each shard? Seems unnecessary.

 For reasons described in a previous email that I won't repeat here.


 --- Original Message ---
 On 4/9/2012 08:45 AM Benson Margulies wrote:
  Jan Høydahl,

  My problem is intimately connected to Solr. It is not a batch job for
  Hadoop, it is a distributed real-time query scheme. I hate to add yet
  another complex framework if a Solr RP can do the job simply.

  For this problem, I can transform a Solr query into a subset query on
  each shard, and then let the SolrCloud mechanism.

  I am well aware of the 'zoo' of alternatives, and I will be evaluating
  them if I can't get what I want from Solr.

  On Mon, Apr 9, 2012 at 9:34 AM, Jan Høydahl jan@cominvent.com wrote:
   Hi,

   Instead of using Solr, you may want to have a look at Hadoop or
   another framework for distributed computation, see e.g.
   http://java.dzone.com/articles/comparison-gridcloud-computing

   --
   Jan Høydahl, search solution architect
   Cominvent AS - www.cominvent.com
   Solr Training - www.solrtraining.com

   On 9. apr. 2012, at 13:41, Benson Margulies wrote:

    I'm working on a prototype of a scheme that uses SolrCloud to, in
    effect, distribute a computation by running it inside of a request
    processor.

    If there are N shards and M operations, I want each node to perform
    M/N operations. That, of course, implies that I know N.

    Is that fact available anyplace inside Solr, or do I need to just
    configure it?

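A minimal sketch of the M/N split itself, assuming each node can learn N (whether from the cluster state in ZooKeeper, as Erick guesses, or from configuration) and its own 0-based position among the shards. `share_of_work` is a hypothetical helper, not a Solr API. It also covers the case where M is not divisible by N, which a plain M/N glosses over.

```python
def share_of_work(m_ops, n_shards, shard_index):
    """Return the half-open range [first, last) of operation indices that
    the shard at 0-based position `shard_index` should handle, splitting
    m_ops operations as evenly as possible across n_shards shards."""
    if n_shards <= 0:
        raise ValueError("n_shards must be positive")
    if not 0 <= shard_index < n_shards:
        raise ValueError("shard_index must be in [0, n_shards)")
    base, extra = divmod(m_ops, n_shards)
    # The first `extra` shards each take one more operation than the rest.
    first = shard_index * base + min(shard_index, extra)
    count = base + (1 if shard_index < extra else 0)
    return first, first + count


# 10 operations over 3 shards: the slices are (0, 4), (4, 7) and (7, 10).
for i in range(3):
    print(i, share_of_work(10, 3, i))
```

Handing the remainder to the lowest-numbered shards keeps every slice within one operation of the others and guarantees the slices cover all M operations without overlap.
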
Re: Boosting when matching specific field values


: possible strings.   So if the source field has a value "a", then I want
: to multiply the score by 1. If the source field has a value "b", then I
: want to multiply the score by 2 ... and so on. What is the way to go about
: doing this?

How long is your "and so on" list?

You could use the boost param of edismax to do this, by constructing a 
function that returns the appropriate value based on the results of your 
query (either using the termfreq() or query() functions) ... but if these 
mappings from value->score multipliers are generally static, you can also 
use ExternalFileField (it doesn't have to key off of the unique key field; 
it can key off of any single-valued field) ... if the mappings are 
*REALLY* static, just compute them when indexing the docs.


-Hoss
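
A sketch of the first option, generating the boost function from a value-to-multiplier table in Python. It assumes a Solr build that provides the if() and termfreq() function queries (I believe these arrived with the 4.0 line, so check your version). The field name "source", the mapping, and the query are hypothetical, and the values are assumed to be plain tokens with no quote characters.

```python
def boost_function(field, multipliers, default=1):
    """Build a nested if(termfreq(...)) function query that maps field
    values to score multipliers; documents matching none of the values
    get `default`."""
    expr = str(default)
    # Wrap from the last mapping outward so the first mapping is tested first.
    for value, multiplier in reversed(list(multipliers.items())):
        expr = "if(termfreq({0},'{1}'),{2},{3})".format(field, value, multiplier, expr)
    return expr


params = {
    "defType": "edismax",
    "q": "shirt",
    # edismax multiplies each document's score by this function's value.
    "boost": boost_function("source", {"a": 1, "b": 2, "c": 3}),
}
print(params["boost"])
```

The nesting grows with every entry, which is one reason the length of the "and so on" list matters; for a long, static list, ExternalFileField or index-time values are likely simpler.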

Re: how to correctly facet clothing multiple sizes and colors?

The problem with that approach is that if you selected, say, "large" and "red", you'd 
get back all the products that have large and red among their variants, not just the 
products that come in red in the large size, as would be expected.

Sent from my phone

- Reply message -
From: Andrew Harvey [via Lucene] ml-node+s472066n3898049...@n3.nabble.com
Date: Mon, Apr 9, 2012 5:21 pm
Subject: how to correctly facet clothing multiple sizes and colors?
To: danjfoley d...@micamedia.com

What we do in our application is exactly what Robert described. We index 
Products, not variants. The variant data (colour, size etc.) is denormalised 
into the product document at index time. We then facet on the variant 
attributes and get product count instead of variant count. 

What you're seeing are correct results. You are indexing 6 documents, as you 
said before. You actually only want to index one document with multi-valued 
fields. 

Hope that's somehow helpful,

Andrew

On 10/04/2012, at 3:01, Robert Petersen rober...@buy.com wrote:

 You *could* do it by making one and only one solr document for each
 clothing item, then just have the front end render all the sizes and
 colors available for that item as size/color pickers on the product
 page.  You can add all the colors and sizes to the one document in the
 index so they are searchable also, but the caveat is that they won't
 show up as a facet.  This is just one simple approach.

 -Original Message-
 From: danjfoley [mailto:d...@micamedia.com] 
 Sent: Saturday, April 07, 2012 7:04 PM
 To: solr-user@lucene.apache.org
 Subject: how to correctly facet clothing multiple sizes and colors?

 I've been searching for a solution to my issue, and this seems to come
 closest to it. But not exactly. 

 I am indexing clothing. Each article of clothing comes in many sizes and
 colors, and can belong to any number of categories. 

 For example take the following: I add 6 documents to solr as follows: 

 product, color, size, category 

 shirt A, red, small, valentines day 
 shirt A, red, large, valentines day 
 shirt A, blue, small, valentines day 
 shirt A, blue, large, valentines day 
 shirt A, green, small, valentines day 
 shirt A, green, large, valentines day 

 I'd like my facet counts to return as follows: 

 color 

 red (1) 
 blue (1) 
 green (1) 

 size 

 small (1) 
 large (1) 

 category 

 valentines day (1) 

 But they come back like this: 

 color: 
 red (2) 
 blue (2) 
 green (2) 

 size: 
 small (2) 
 large (2) 

 category 
 valentines day (6) 

 I see the group.facet parameter in version 4.0 does exactly this. However,
 how can I make this happen now? There are all sorts of ecommerce systems out
 there that facet exactly how I'm asking. I thought Solr was supposed to be
 the very best, fastest search system, yet it doesn't seem to be able to facet
 correctly for items with multiple values?

 Am I indexing my data wrong?

 How can I make this happen?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-correctly-facet-clothing-multi
 ple-sizes-and-colors-tp3893747p3893747.html
 Sent from the Solr - User mailing list archive at Nabble.com.


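A toy in-memory model of the two indexing choices discussed in this thread (plain Python, not Solr). It shows why one document per variant gives counts of 2, 2 and 6, why one document per product with multi-valued fields gives counts of 1, and the caveat Dan raises about combining filters. The helper names are hypothetical.

```python
from collections import Counter

# The six variant rows from the question: (product, color, size, category).
VARIANTS = [
    ("shirt A", "red", "small", "valentines day"),
    ("shirt A", "red", "large", "valentines day"),
    ("shirt A", "blue", "small", "valentines day"),
    ("shirt A", "blue", "large", "valentines day"),
    ("shirt A", "green", "small", "valentines day"),
    ("shirt A", "green", "large", "valentines day"),
]
FIELDS = ("color", "size", "category")


def variant_docs(rows):
    """One document per variant: what the question indexes today."""
    return [{"id": p, "color": {c}, "size": {s}, "category": {cat}}
            for p, c, s, cat in rows]


def product_docs(rows):
    """One document per product, with the variant attributes folded into
    multi-valued fields (the approach Robert and Andrew describe)."""
    products = {}
    for product, color, size, category in rows:
        doc = products.setdefault(product, {"id": product, **{f: set() for f in FIELDS}})
        doc["color"].add(color)
        doc["size"].add(size)
        doc["category"].add(category)
    return list(products.values())


def facet_counts(docs, field):
    """Count matching documents per value, as a field facet does."""
    counts = Counter()
    for doc in docs:
        counts.update(doc[field])
    return counts


def matches(doc, filters):
    """True if the document has every requested field value."""
    return all(value in doc[field] for field, value in filters.items())


print(facet_counts(variant_docs(VARIANTS), "color"))  # red/blue/green: 2 each
print(facet_counts(product_docs(VARIANTS), "color"))  # red/blue/green: 1 each

# Dan's caveat: shirt B comes only in red/small and blue/large, yet the
# folded document still matches color=red AND size=large.
SHIRT_B = [("shirt B", "red", "small", "basics"), ("shirt B", "blue", "large", "basics")]
folded = product_docs(SHIRT_B)[0]
print(matches(folded, {"color": "red", "size": "large"}))  # True
```

Folding the variants fixes the counts but loses which color goes with which size; that pairing is what the group.facet parameter mentioned above is meant to preserve.
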
SolrCloud versus a SearchComponent that rescores

Those of you insomniacs who have read my messages here over the last
few weeks might recall that I've been working on a request handler
that wraps the SearchHandler to rewrite queries and then reorder
results.

(I haven't quite worked out how to apply Grant's alternative
suggestions without losing the performance advantages I was looking
for in the first place.)

Today, I realized that the RequestHandler approach, as opposed to
search components, wasn't going to be viable. I was building up too much
dependency on internal Solr quirks.

So I refactored it into a pair of SearchComponents -- one to go first
and rewrite the query, and one to go after query and rescore.

And it works just fine - until I configure it into a SolrCloud
cluster, at which point it starts coming up with very wrong answers.

I think that the reason is that I don't have an implementation of the
distributedProcess method, or, more generally, that I don't understand
the protocol on a SearchComponent when distributed processing is
happening. Has anyone written anything yet about these considerations?
I can run multiple processes in the debugger and see who gets called
with what, but I was hoping for some sort of shortcut.

Re: Question on using dynamic fields

2012-04-09 Thread Rakesh Varna

Hi Erick,
   Thanks for the response. I am trying to index xml files in a directory.
I provide the xpath details, file location etc in data-config.xml. I will
try the 2 approaches that you have mentioned.

Regards,
Rakesh Varna

On Mon, Apr 9, 2012 at 3:38 PM, Erick Erickson erickerick...@gmail.com wrote:

 [...]

Re: SolrCloud versus a SearchComponent that rescores

2012-04-09 Thread Mark Miller


On Apr 9, 2012, at 7:34 PM, Benson Margulies wrote:

 [...]

Grant started something on this once: 
http://wiki.apache.org/solr/WritingDistributedSearchComponents
It's only a start though.

Unfortunately, to this point, adventurous souls have mostly had to debug and 
study their way to understanding the distributed process on their own.

Perhaps we can encourage anyone that has written a distributed component to 
help jump in on that wiki page. Any takers?

- Mark Miller
lucidimagination.com

Re: Why this document does not match?

2012-04-09 Thread Alexander Ramos Jardim

Sorry for the late answer.

2012/3/29 Erick Erickson erickerick...@gmail.com

 Alexander:

 Your images were stripped by one of our mail servers, so there's not
 much we can see <G>...

 But guessing, you aren't searching the fields you think you are:
 itemNameSearch:fifa12
 becomes
 itemNameSearch:fifa defaultSearchField:12


That's exactly what's happening! Why does this happen?



 where defaultSearchField is defined in your schema.xml file.
 Try itemNameSearch:(fifa 12) or similar.

 Using debugQuery=on should show this in the parsed_query section if I'm
 right.

 If that doesn't help, maybe you can post your info again?

 standard comment that this is really old Solr, are you sure you can't
 upgrade?


This has been discussed a lot, and my customer's sysadmin has agreed to
upgrade to Solr 3.5, but we won't be doing this within the next month.



 Best
 Erick

 On Wed, Mar 28, 2012 at 5:31 PM, Alexander Ramos Jardim
 alexander.ramos.jar...@gmail.com wrote:
 
  Hi,
 
  I have an old Solr 1.3 instance running, and I've run into an issue. I have a field
 configured in such a way that "fifa 12" and "fifa12" should match the same
 documents, as can be seen in the screenshot below.
 
  [screenshot stripped]
 
  When I run the query itemNameSearch:fifa12, I get the following result:
 
  [screenshot stripped]
 
  That seems okay. But I have the following document in the index:
 
  [screenshot stripped]
 
  Given how my field is defined, I expected the query to match this document.
 That is not what is happening. Does anyone have any idea what is wrong?
 
 
  --
  Alexander Ramos Jardim




-- 
Alexander Ramos Jardim

Re: SolrCloud versus a SearchComponent that rescores

That page seems to be saying that the 'distributed' APIs take place on
the leader, and the ordinary prepare/process APIs out at the leaves.
I'll set out to prove or disprove that tomorrow.


On Mon, Apr 9, 2012 at 8:17 PM, Mark Miller markrmil...@gmail.com wrote:

 [...]

Re: Is http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster up to date?

2012-04-09 Thread Mark Miller


On Apr 9, 2012, at 9:52 AM, Benson Margulies wrote:

 I specify -Dcollection.configName=rnicloud, but the admin gui tells me
 that I have a collection named 'collection1'.
 
 And, as reported in a prior email, the admin UI URL in there seems wrong.


Sorry - that param name is not entirely clear, I guess. It's the name of the 
config set you are uploading. Later, you could point multiple collections 
at that set of config files by using that name. In that example, because you 
don't override the collection name, it takes the default, which is the SolrCore 
name, collection1. You can override the collection name to something 
else by adding an attribute in solr.xml or by using a param with CoreAdmin when 
creating a core dynamically. If you don't, it simply uses the SolrCore name for 
the name of the collection.

- Mark Miller
lucidimagination.com
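
A sketch of the CoreAdmin route, building the CREATE request in Python. The `collection` parameter name, the core and instance directory names, and the base URL (port 9167 from the earlier startup command) are assumptions to check against the CoreAdmin documentation for the build you are running.

```python
from urllib.parse import urlencode


def create_core_url(base, core_name, instance_dir, collection):
    """Build a CoreAdmin CREATE request for a core that belongs to the
    named collection instead of defaulting to the core's own name."""
    params = {
        "action": "CREATE",
        "name": core_name,
        "instanceDir": instance_dir,
        # Assumed parameter name; without it the collection name would
        # default to the core name (e.g. collection1).
        "collection": collection,
    }
    return base.rstrip("/") + "/admin/cores?" + urlencode(params)


# Hypothetical core and directory names.
url = create_core_url("http://localhost:9167/solr", "rnicloud_core", "rnicloud_core", "rnicloud")
print(url)
```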

Re: Why this document does not match?


:  itemNameSearch:fifa defaultSearchField:12

: That's exactly what's happening! Why does this happen?

Whitespace is meaningful to the query parser: it tells the query parser 
there are multiple clauses for a boolean query.

If you want to search for any words the user typed in the field 
itemNameSearch, then you can either set the default search field to 
itemNameSearch...
df=itemNameSearch  q=fifa 12
...or put parens around the words you want to search for in a specific 
field...
q=itemNameSearch:(fifa 12)

-Hoss
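
The whitespace rule can be illustrated with a deliberately simplified model of how a query is split into clauses. This is not Lucene's query parser: it ignores operators, quoting, escaping, and analysis, and `clauses` is a hypothetical helper.

```python
import re

# A field:(several words) group, a field:word pair, or a bare word.
TOKEN = re.compile(r"(\w+):\(([^)]*)\)|(\w+):(\S+)|(\S+)")


def clauses(query, df):
    """Split a query into (field, term) clauses under the simplified
    model: each whitespace-separated word is its own clause, and a word
    without an explicit field goes to the default field `df`."""
    result = []
    for match in TOKEN.finditer(query):
        group_field, group_words, field, word, bare = match.groups()
        if group_field:
            # field:(a b) applies the field to every word inside the parens.
            result.extend((group_field, w) for w in group_words.split())
        elif field:
            result.append((field, word))
        else:
            result.append((df, bare))
    return result


print(clauses("itemNameSearch:fifa 12", df="defaultSearchField"))
print(clauses("itemNameSearch:(fifa 12)", df="defaultSearchField"))
print(clauses("fifa 12", df="itemNameSearch"))
```

The first call reproduces the itemNameSearch:fifa defaultSearchField:12 split; the other two correspond to the two fixes above.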

[CFP] Open Source Search Conference Oct 2, 2012

2012-04-09 Thread Erik Hatcher

Sending this on behalf of my friends at BasisTech -



Subject: Call for Presentations: Open Source Search Conference Oct. 2, 2012 
(Chantilly, VA)

==
Call for Presentations & Save the Date
Open Source Search Conference Oct 2, 2012 
(tutorials Oct. 1) in Chantilly, VA
http://www.basistech.com/conference/2012/oss/
==

The second annual Open Source Search Conference will be held on October 2, 2012 
in Chantilly, VA, and you are invited to submit a presentation. The conference 
will be attended by government employees and contractors who are evaluating, 
building, or using Apache Solr and other open source tools for search 
applications throughout the government.

This event is a unique opportunity to share tips and ideas to overcome 
challenges working with open source search projects. We are also looking for 
people who are interested in providing half- and full-day tutorials on the day 
before the conference (October 1, 2012). The tutorials should provide hands-on 
guidance for using or developing open source search applications.
For more information, visit: http://www.basistech.com/conference/2012/oss/

==Dates==
Conference: October 2, 2012
Tutorials: October 1, 2012

==Submission Instructions==
Please email submissions for conference presentations and tutorials to 
oss2...@basistech.com by April 23, 2012.
To submit a presentation or tutorial, e-mail the following information:
1. Title
2. Author
3. Brief Biography
4. Description of presentation or tutorial (100-150 words)
5. Brief description of author’s experience with Apache Solr and/or other open 
source tools
6. Specify whether the presentation or tutorial is targeted towards users or 
developers

==Suggested Topics==
1. Large-scale Apache Solr
* Solr at exabyte scale 
* High-load deployments
* Complex queries
2. Analytic interfaces
* Geospatial search
* Iterative Analytics using Solr (index reprocessing, etc.)
* Exploring and Discovering Big Data with Solr
* Linguistic plug-in use and development
* Document clustering (semantic, field collapsing, dynamic faceting)
* Language identification
* Search in a multilingual site
* Sentiment analysis
3. Text Mining
* Text analytics processing
* Entity extraction
* Name matching
4. Security
* Access control
* Index encryption
5. Case studies and user experiences
* Migrating to Solr from other search engines
* Other topics


==About the Conference==
The Open Source Search Conference is sponsored by Basis Technology, which has 
been producing government conferences since 2006 and focuses on topics 
including text analytics, human language technology, and the nexus of language, 
culture and technology for the federal community. For more information about 
our conferences, visit: http://www.basistech.com/conference.
Basis Technology provides software solutions for text analytics, information 
retrieval, and name resolution in over 40 languages. Our customers include 
leading software vendors, content providers, financial institutions, and 
government agencies in the defense and intelligence industry.

[Lucene Revolution] Agenda Updated!

2012-04-09 Thread Erik Hatcher

We've updated the agenda and keynotes for the upcoming Lucene Revolution 
conference, May 7-10 in Boston, MA.  We've got a lot of the committers coming, 
and Hoss' infamous Stump the Chump session, and many great talks.  All we're 
missing is you, and it's not too late to sign up ;)

 http://www.lucenerevolution.com/agenda

We're unveiling a couple of new/revamped training classes, Solr 101 and Solr 
201 - the seats are filling up, so register soon.  I'm working like mad to 
complete the Solr 201 materials and will be teaching one of those sessions 
myself. 

See you at the Revolution.

Erik

Re: SolrCloud versus a SearchComponent that rescores

Um, maybe I've hit a quirk?

In my solrconfig.xml, my special SearchComponents are installed only
for a specific QT. So, it looks to me as if that QT is not propagated
into the request out to the shards, and so they run the ordinary
request handler without my components in it.

Is this intended behavior I have to tweak via a distribution-aware
component, or perhaps a bug, or does it make no sense at all and I
need to look for some mistake of mine?

Re: Question on using dynamic fields

2012-04-09 Thread Rakesh Varna

Hi Erick,
   The schema browser says that no dynamic fields were indexed. Any idea
how do I specify dynamic fields through XPath when I only know the prefix
and nothing else?

Regards,
Rakesh Varna
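
One workaround that sidesteps the XPath question is to preprocess the files yourself and post Solr's own <add> XML, where a field name like theta_1 is then matched by the theta* dynamic field. A minimal Python sketch, not a DIH feature: it assumes the schema's uniqueKey is a field called "id", and `to_solr_add` is a hypothetical helper. Because it matches on the tag prefix, it handles theta1, theta_1 and thetaN alike.

```python
import xml.etree.ElementTree as ET


def to_solr_add(source_xml, doc_id, prefix="theta"):
    """Turn one <DOC> file into a Solr <add> message, copying every child
    element whose tag starts with `prefix` into a field of the same name."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Assumed uniqueKey field name.
    ET.SubElement(doc, "field", name="id").text = doc_id
    for child in root:
        if child.tag.startswith(prefix):
            ET.SubElement(doc, "field", name=child.tag).text = (child.text or "").strip()
    return ET.tostring(add, encoding="unicode")


SAMPLE = """<DOC>
  <theta_1>0.98</theta_1>
  <theta_2>0.767</theta_2>
  <thetaN>0.2873</thetaN>
</DOC>"""
print(to_solr_add(SAMPLE, "doc-1"))
```

The output can then be posted to the /update handler like any other Solr XML document, followed by a commit.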

On Mon, Apr 9, 2012 at 4:49 PM, Rakesh Varna rakeshva...@gmail.com wrote:

 [...]

Re: SolrCloud versus a SearchComponent that rescores

2012-04-09 Thread Mark Miller

Yeah, that's how it works - it ends up hitting the select request handler (this 
might be overridable with shards.qt). All the params are passed along, so in 
general, it will act the same as the top-level req handler - but it can then 
remove the shards param so you don't have an infinite recursion of distrib 
requests (say in the case you put shards in the req handler in solrconfig). 

I think you have to investigate shards.qt, 
or look at adding those components to the std select handler as well. 

Sent from my iPhone

On Apr 9, 2012, at 9:26 PM, Benson Margulies bimargul...@gmail.com wrote:

 [...]

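The shards.qt suggestion, sketched as request parameters in Python. The handler name "rescore" is hypothetical, and whether shards.qt covers this case is, as Mark says, something to investigate; his alternative is to add the components to the standard select handler.

```python
from urllib.parse import urlencode


def distributed_select_url(base, handler, query, shards=None):
    """Build a /select request that names the custom handler both for the
    top-level request (qt) and for the per-shard sub-requests (shards.qt)."""
    params = {"q": query, "qt": handler, "shards.qt": handler}
    if shards:
        # Only for manual sharding; SolrCloud is assumed to work out the
        # shards of the collection itself.
        params["shards"] = ",".join(shards)
    return base.rstrip("/") + "/select?" + urlencode(params)


print(distributed_select_url("http://localhost:9167/solr", "rescore", "fifa"))
```

Without shards.qt, the sub-requests would go to the default select handler, which matches the symptom Benson describes: the shards run without the custom components.
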
Re: To truncate or not to truncate (group.truncate vs. facet)