Re: deploy a brand new index in solrcloud

2012-06-10 Thread Anatoli Matuskova
I've been thinking about setting up replication in SolrCloud:
http://www.searchworkings.org/forum/-/message_boards/view_message/339527#_19_message_339527
What I don't know is whether, while replication is in progress, the replica
slaves (those that are not the replication master) can keep handling puts via
the transaction log.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/deploy-a-brand-new-index-in-solrcloud-tp3988731p3988757.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Hi All


 I have an index which contains a Catalog of Products and Categories, with
 Solr 4.0 from trunk

 Data is organized like this:

 Category: Books

 Sub Category: Programming

 Products:

 Product # 1,  Price: Regular Sort Order:1
 Product # 2,  Price: Markdown, Sort Order:2
 Product # 3   Price: Regular, Sort Order:3
 Product # 4   Price: Regular, Sort Order:4
 
 .
 ...
 Product # 100   Price: Regular, Sort Order:100

 Sub Category: Fiction

 Products:

 Product # 1,  Price: Markdown, Sort Order:1
 Product # 2,  Price: Regular, Sort Order:2
 Product # 3   Price: Regular, Sort Order:3
 Product # 4   Price: Markdown, Sort Order:4
 
 .
 ...
 Product # 70   Price: Regular, Sort Order:70


 I want to query Solr and sort these products within each
 sub-category in such a way that products which are on markdown are at
 the bottom of the documents list, and other products,
 which are on regular price, are sorted as per their sort order in their
 sub-category.

 Expected Results are

 Category: Books

 Sub Category: Programming

 Products:

 Product # 1,  Price: Regular Sort Order:1
 Product # 2,  Price: Markdown, Sort Order:101
 Product # 3   Price: Regular, Sort Order:3
 Product # 4   Price: Regular, Sort Order:4
 
 .
 ...
 Product # 100   Price: Regular, Sort Order:100

 Sub Category: Fiction

 Products:

 Product # 1,  Price: Markdown, Sort Order:71
 Product # 2,  Price: Regular, Sort Order:2
 Product # 3   Price: Regular, Sort Order:3
 Product # 4   Price: Markdown, Sort Order:71
 
 .
 ...
 Product # 70   Price: Regular, Sort Order:70


 My query is like this:

 q=*:*&fq=category:Books

 What are the options to implement custom sorting and how do I do it?


- Define a Custom Function query?
- Define a Custom Comparator? Or,
- Define a Custom Collector?


 Please let me know the best way to go about it and any pointers to
 customize Solr 4.


Thanks
Saroj


x most similar documents

2012-06-10 Thread Benjamin Murauer
Hi there,
I have a Solr server running that contains tweets. My schema.xml contains
the following fields:

<fields>
 <field name="id" type="string" indexed="true" stored="true"
 required="true" />
 <field name="tweet" type="text_general"
 indexed="true" stored="true" termVectors="true"/>
 <field name="hashtags" type="text_general"
 indexed="true" stored="true" termVectors="true"/>
</fields>

My problem is actually quite simple: somewhere in my GUI the user types
text, and I want to retrieve the tweets that are most similar to it.
Therefore, I tried the MoreLikeThis functionality. My problem is that
currently, MLT finds additional tweets for every tweet found by the
select handler. I'm not sure, however, whether the select handler finds the
most fitting tweet or just returns the first match. Currently, I am
using the following query:

http://localhost:8983/solr/select/?q=tweet:heaven&mlt=true&mlt.fl=tweet,hashtags&wt=json&indent=true

Am i missing something critical? So eventually, i just want to retrieve
x tweets with the most similar text, sorted by their similarity (cosine
of termVectors). Is MoreLikeThis the way to go?
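
For reference, the "cosine of termVectors" ranking asked about above can be
sketched outside Solr as follows. This is only a minimal illustration on raw
term counts with naive whitespace tokenization; it does not reproduce how
MoreLikeThis actually scores documents, and the function names are made up
for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Naive whitespace tokenization into a term -> frequency map."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, tweets, x):
    """Return the x tweets most similar to the query text, best first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(t)), t) for t in tweets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:x]]
```

Doing this client-side means fetching every candidate tweet, so it is only
useful for checking what ranking one would expect from a given input.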

Thanks in advance!


Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
Skimming this, two options come to mind:

1) Simply apply primary, secondary, etc. sorts. Something like
   sort=subcategory asc,markdown_or_regular desc,sort_order asc

2) You could also use grouping to arrange things in groups and sort within
  those groups. This has the advantage of returning some members
  of each of the top N groups in the result set, which makes it easier to
  get some of each group rather than having to analyze the whole list
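
To make the effect of the multi-field sort in option 1 concrete, here is a
small sketch of the same ordering applied to in-memory documents. The field
names are the ones from the sort clause above; the assumption that
markdown_or_regular holds the strings "markdown" and "regular" (so that a
descending sort puts "regular" first) is mine and is not confirmed by the
thread.

```python
def multi_key_sort(docs):
    """Order docs like: subcategory asc, markdown_or_regular desc, sort_order asc.

    Python's sort is stable, so the keys are applied from the least
    significant to the most significant.
    """
    docs = sorted(docs, key=lambda d: d["sort_order"])
    docs = sorted(docs, key=lambda d: d["markdown_or_regular"], reverse=True)
    docs = sorted(docs, key=lambda d: d["subcategory"])
    return docs
```

With this ordering, every regular-price product in a sub-category comes
before any markdown product in that sub-category, and each block keeps its
own sort order.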

But your example is somewhat contradictory. You say
"products which are on markdown, are at
the bottom of the documents list"

But in your examples, products on markdown are intermingled.

Best
Erick



Re: x most similar documents

2012-06-10 Thread Jack Krupansky
Yes, it sounds like MLT is the way to go, but sometimes you have to get 
creative in figuring out how to set the numerous parameters. And sometimes 
you have to use the MLT request handler rather than /select with the MLT 
component.


You might also encounter issues related to the shortness of the text of 
tweets. Some of the MLT parameters might be optimized for much larger texts.


Can you give us an example of a (very brief) tweet that your query finds, 
the tweet(s) that MLT returns, and what other tweet(s) you would have 
expected.


MLT will use the first search result from the original query.

-- Jack Krupansky




Re: x most similar documents

2012-06-10 Thread Jack Krupansky
Oops, I said MLT will use the first search result from the original query, 
but that is for the MLT handler. For the MLT component you get a separate 
set of documents for each document in the results of the original query.
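
The difference between the two entry points can be sketched as two request
URLs. This is a hedged illustration only: it assumes an MLT request handler
has been registered at /mlt in solrconfig.xml (that path is an assumption),
and the mlt.mintf / mlt.mindf values of 1 are a guess at what might suit
very short texts such as tweets, not a recommendation from this thread.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # base URL taken from the query in this thread


def mlt_component_url(query, fields, count):
    """/select with the MLT component: similar docs for EACH result document."""
    params = [
        ("q", query),
        ("mlt", "true"),
        ("mlt.fl", ",".join(fields)),
        ("mlt.count", count),
        ("wt", "json"),
        ("indent", "true"),
    ]
    return SOLR + "/select?" + urlencode(params)


def mlt_handler_url(query, fields, rows):
    """MLT handler: similar docs for the FIRST document matching the query."""
    params = [
        ("q", query),
        ("mlt.fl", ",".join(fields)),
        ("mlt.mintf", 1),  # assumed: allow terms that occur once in the source doc
        ("mlt.mindf", 1),  # assumed: allow terms that occur in only one document
        ("rows", rows),
        ("wt", "json"),
    ]
    return SOLR + "/mlt?" + urlencode(params)
```

For example, mlt_handler_url("tweet:heaven", ["tweet", "hashtags"], 10)
builds a request for the ten documents most similar to the first match.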


-- Jack Krupansky




Building a heat map from geo data in index

2012-06-10 Thread Jamie Johnson
I had a request from a customer that I have not seen much discussion of
so far, so I figured I'd pose the question here.  I've been asked
whether it is possible to build a heat map from the results of a query.  I
can imagine a process to do this through some post-processing, but
that sounds very expensive for large/distributed indices, so I was
wondering whether, with all of the new geospatial support that is being
added to Lucene/Solr, there is a way to do geospatial faceting.  What
I am imagining is a bounding box being defined and that box being broken
into an N by N matrix, each cell of which would return a count so a heat map
could be constructed.  Any other thoughts on this would be greatly
appreciated, right now I am really just fishing for some ideas.
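
To pin down the data structure being asked for, here is a minimal sketch of
the N by N count matrix, computed client-side from (lat, lon) pairs. This is
the post-processing approach described above as potentially expensive, not a
Solr faceting feature; the function name and the choice to count points on
the far edges in the last row and column are assumptions for illustration.

```python
def grid_counts(points, min_lat, min_lon, max_lat, max_lon, n):
    """Count points per cell of an n x n grid laid over a bounding box.

    points is an iterable of (lat, lon) pairs. Points outside the box are
    ignored; points exactly on the maximum edges fall into the last cell.
    """
    counts = [[0] * n for _ in range(n)]
    lat_step = (max_lat - min_lat) / n
    lon_step = (max_lon - min_lon) / n
    for lat, lon in points:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        row = min(int((lat - min_lat) / lat_step), n - 1)
        col = min(int((lon - min_lon) / lon_step), n - 1)
        counts[row][col] += 1
    return counts
```

Each entry counts[row][col] is then the intensity of one heat-map cell.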


Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Thanks Erick for your quick feedback

When products are assigned to a category or sub-category, they can be
in any order, and the price type can be regular or markdown.
So, regular and markdown products are intermingled as per their assignment, but
I want to sort them in such a way that
all the products which are on markdown are at the bottom of the
list.

I can use these multiple sorts but I realize that they are costly in terms
of heap used, as they are using FieldCache.

I have an index with 2M docs and docs are pretty big. So, I don't want to
use them unless there is no other option.

I am wondering if I can define a custom function query which can be like
this:


   - check if product is on the markdown
   - if yes then change its sort order field to be the max value in the
   given sub-category, say 99
   - else, use the sort order of the product in the sub-category
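
The three steps above can be sketched in plain Python as follows. This only
illustrates the intended logic, not a Solr function query; the field names
price_type and sort_order are hypothetical, and using max + 1 (rather than
the maximum itself) so that markdown products land strictly after every
regular one is my assumption.

```python
def effective_sort_order(doc, max_order_in_subcategory):
    """Push markdown products past the largest sort order in the sub-category."""
    if doc["price_type"] == "markdown":
        return max_order_in_subcategory + 1
    return doc["sort_order"]


def sort_subcategory(docs):
    """Sort one sub-category: regular products by sort order, markdown last."""
    max_order = max(d["sort_order"] for d in docs)
    # The original sort order is the tie-breaker among markdown products.
    return sorted(
        docs,
        key=lambda d: (effective_sort_order(d, max_order), d["sort_order"]),
    )
```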

I have been looking at existing function queries but do not have a good
handle on how to make one of my own.

- Another option could be to use a custom sort comparator, but I am not sure
how it works

Any thoughts?


-Saroj







Re: How to do custom sorting in Solr?

2012-06-10 Thread Erick Erickson
2M docs is actually pretty small. Sorting is sensitive to the number
of _unique_ values in the sort fields, not necessarily the number of
documents.

And sorting only works on fields with a single value (i.e. it can't have
more than one token after analysis). So for each field you're only talking
2M values at the very maximum, assuming that the field in question has
a unique value per document, which I doubt very much given your
problem description.

So with a corpus that size, I'd just try it.

Best
Erick




Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Yes, these documents have lots of unique values as the same product could
be assigned to lots of other categories and that too, in a different sort
order.

We did some evaluation of heap usage and found that, with the kind of queries we
generate, heap usage was going up to 24-26 GB. I could trace it to the fact that
fieldCache is creating an array of 2M size for each of the sort fields.
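
As a rough back-of-the-envelope check of how per-field arrays add up, the
arithmetic can be sketched like this. It assumes one 4-byte entry per
document per sort field, which is an assumption for illustration only; the
real footprint depends on the field type and on any term values held
alongside the per-document array.

```python
def fieldcache_bytes(num_docs, num_sort_fields, bytes_per_entry=4):
    """Estimated bytes for one array entry per document per sort field."""
    return num_docs * num_sort_fields * bytes_per_entry


def sort_fields_for_budget(num_docs, budget_bytes, bytes_per_entry=4):
    """How many distinct sort fields fit into a given heap budget."""
    return budget_bytes // (num_docs * bytes_per_entry)
```

Under that assumption a single sort field over 2M documents costs about
8 MB, so tens of gigabytes would only be reached with thousands of distinct
sort fields.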

Since the same products are mapped to multiple categories, we incur significant
memory overhead. Therefore, any solution where memory consumption can be
reduced is a good one for me.

In fact, we have situations where same product is mapped to more than 1
sub-category in the same category like


Books
 -- Programming
  - Java in a nutshell
 -- Sale (40% off)
  - Java in a nutshell


So, another thought in my mind is to somehow use a second-pass collector to
group books appropriately in the Programming and Sale categories, with the right
sort order.

But I have no clue about that piece :(

-Saroj



Issues with whitespace tokenization in QueryParser

2012-06-10 Thread John Berryman
According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene
QueryParser tokenizes on white space before giving any text to the
Analyzer. This makes it impossible to use multi-term synonyms because the
SynonymFilter only receives one word at a time.

Resolution to this would really help with my current project. My project
client sells clothing and accessories online. They have plenty of examples
of compound words, e.g. "rain coat". But some of these compound words are
really tripping them up. A prime example is that a search for "dress shoes"
returns a list of dresses and random shoes (not necessarily dress shoes). I
wish that I was able to synonym compound words to single tokens (e.g.
"dress shoes" => dress_shoes), but with this whitespace tokenization issue,
it's impossible.
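
The effect described above can be reproduced with a toy model. This is not
Lucene code: the two-word synonym table and the function names are made up,
and apply_synonyms is only a stand-in for a synonym filter. It shows why a
multi-term rule can never fire when the parser hands the analyzer one
whitespace-separated word at a time.

```python
SYNONYMS = {
    ("dress", "shoes"): "dress_shoes",
    ("rain", "coat"): "rain_coat",
}


def apply_synonyms(tokens):
    """Greedy two-token synonym replacement over a token list."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def analyze(text):
    """Stand-in analyzer: lowercase, split on whitespace, apply synonyms."""
    return apply_synonyms(text.lower().split())


def parse_then_analyze(query):
    """Parser splits on whitespace first, so the analyzer sees single words."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


def analyze_whole(query):
    """Analyzer receives the whole query text, so multi-term rules can match."""
    return analyze(query)
```

Here parse_then_analyze("dress shoes") yields the two separate terms, while
analyze_whole("dress shoes") yields the single token dress_shoes.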

Has anything happened with this bug recently? For a short time I've got a
client that would be willing to pay for this issue to be fixed if it's not
too much of a rabbit hole. Anyone care to catch me up on what this might
entail?

LinkedIn http://www.linkedin.com/pub/john-berryman/13/b17/864
Twitter http://twitter.com/#!/jnbrymn


Re: What would cause: SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory

2012-06-10 Thread Aaron Daubman
Jack,

Thanks - this was indeed the issue. I still don't understand exactly why
(the same local-nexus-hosted Solr jars were the ones being duplicated on
the classpath: included in my custom -with-dependencies jars as well as in
the Solr war, which was built, distributed, and hosted from the same nexus
repo used to host my jars), but shading Solr from my -with-dependencies jars
fixed the issue.
(if anybody could point me to reading on why this happened - e.g. the
classes on the classpath would be duplicated but identical, in
my naive understanding of the classloader this should have still just
worked - it would be appreciated)
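
One way to picture this, consistent with the diagnosis quoted below: in the
JVM a class is identified by its fully qualified name together with the
classloader that defined it, so two byte-identical copies loaded by
different loaders are distinct, incompatible types. The sketch below is only
a Python analogy of that idea (Python has no classloaders); load_copy and
the variable names are invented for the illustration.

```python
SOURCE = """
class TokenizerFactory:
    pass
"""


def load_copy(source):
    """Define the class from identical source in a fresh namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace["TokenizerFactory"]


# Two loads of identical source give two separate class objects,
# standing in for the copy in the war and the copy bundled in the jar.
war_copy = load_copy(SOURCE)
jar_copy = load_copy(SOURCE)


class MyCustomTokenizerFactory(jar_copy):
    """Custom factory that extends the copy bundled in its own jar."""


instance = MyCustomTokenizerFactory()

same_name = war_copy.__name__ == jar_copy.__name__  # names match
same_type = war_copy is jar_copy                    # but the types differ
castable = isinstance(instance, war_copy)           # so the "cast" fails
```

The names are equal, yet the instance is not an instance of the other copy,
which is the analogue of the ClassCastException.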

Thanks again,
 Aaron

On Sat, Jun 9, 2012 at 2:40 PM, Jack Krupansky j...@basetechnology.com wrote:

 Make sure there are no stray jars/classes in your jar, especially any that
 might contain BaseTokenizerFactory or TokenizerFactory. I notice that your
 jar name says -with-dependencies, raising a little suspicion. The
 exception is as if your class was referring to a BaseTokenizerFactory,
 which implements TokenizerFactory, coming from your jar (or a contained
 jar) rather than getting resolved to Solr 3.6's own BaseTokenizerFactory
 and TokenizerFactory.

 -- Jack Krupansky

 -Original Message- From: Aaron Daubman
 Sent: Saturday, June 09, 2012 12:03 AM
 To: solr-user@lucene.apache.org
 Subject: What would cause: SEVERE: java.lang.ClassCastException:
 com.company.MyCustomTokenizerFactory cannot be cast to
 org.apache.solr.analysis.TokenizerFactory


 Greetings,

 I am in the process of updating custom code and schema from Solr 1.4 to
 3.6.0 and have run into the following issue with our two custom Tokenizer
 and Token Filter components.

 I've been banging my head against this one for far too long, especially
 since it must be something obvious I'm missing.

 I have custom Tokenizer and Token Filter components along with
 corresponding factories. The code for all looks very similar to the
 Tokenizer and TokenFilter (and Factory) code that is standard with 3.6.0
 (and I have also read through
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).

 I have ensured my custom code is on the classpath, it is
 in ENSolrComponents-1.0-SNAPSHOT-jar-with-dependencies.jar:
 ---output snip---
 Jun 8, 2012 10:41:00 PM org.apache.solr.core.CoreContainer load
 INFO: loading shared library: /opt/test_artists_solr/jetty-solr/lib/en
 Jun 8, 2012 10:41:00 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/opt/test_artists_solr/jetty-solr/lib/en/ENSolrComponents-1.0-SNAPSHOT-jar-with-dependencies.jar' to classloader
 Jun 8, 2012 10:41:00 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
 INFO: Adding 'file:/opt/test_artists_solr/jetty-solr/lib/en/ENUtil-1.0-SNAPSHOT-jar-with-dependencies.jar' to classloader
 Jun 8, 2012 10:41:00 PM org.apache.solr.core.CoreContainer create
 --snip---

 After successfully parsing the schema and creating many fields, etc.. the
 following is logged:
 ---snip---
 Jun 8, 2012 10:41:00 PM org.apache.solr.util.plugin.AbstractPluginLoader load
 INFO: created : com.company.MyCustomTokenizerFactory
 Jun 8, 2012 10:41:00 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ClassCastException: com.company.MyCustomTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory
 at org.apache.solr.schema.IndexSchema$5.init(IndexSchema.java:966)
 at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:148)
 at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:986)
 at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:453)
 at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:433)
 at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
 at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:490)
 at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:123)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:481)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:219)
 at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
 at org.eclipse.jetty.servlet.FilterHolder.doStart(FilterHolder.java:102)
 at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:59)
 at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:748)
 at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:249)
 at
 

Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-10 Thread Aaron Daubman
Hoss,

The new FieldValueSubsetUpdateProcessorFactory classes look phenomenal. I
haven't looked yet, but what are the chances these will be back-ported to
3.6 (or how hard would it be to backport them?)... I'll have to check out
the source in more detail.

If stuck on 3.6, what would be the best way to deal with this situation?
It's currently looking like it will have to be a custom update handler, but
I'd hate to have to go down this route if there are more future-proof
options.

Thanks again,
 Aaron

On Tue, Jun 5, 2012 at 6:53 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : The real issue here is that the docs are created externally, and the
 : producer won't (yet) guarantee that fields that should appear once will
 : actually appear once. Because of this, I don't want to declare the field as
 : multiValued=false as I don't want to cause indexing errors. It would be
 : great for me (and apparently many others after searching) if there were an
 : option as simple as forceSingleValued=true - where some deterministic
 : behavior such as use first field encountered, ignore all others, would
 : occur.

 This will be trivial in Solr 4.0, using one of the new
 FieldValueSubsetUpdateProcessorFactory classes that are now available --
 just pick your rule...


 https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html
 Direct Known Subclasses:
FirstFieldValueUpdateProcessorFactory,
LastFieldValueUpdateProcessorFactory,
MaxFieldValueUpdateProcessorFactory,
MinFieldValueUpdateProcessorFactory

 -Hoss
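
For anyone stuck on 3.6, one possible stopgap is to apply the same kind of
rule on the client before documents are sent to Solr. The sketch below is
hypothetical pre-processing code, not the Solr update processors named
above; force_single_valued and its parameters are invented for this
example, and documents are assumed to be plain dicts.

```python
def force_single_valued(doc, field_name, pick=lambda values: values[0]):
    """Return a copy of doc with field_name collapsed to one value.

    pick chooses which value survives: the default keeps the first one;
    pass max or min for the other deterministic rules. An empty list
    removes the field, and scalar values are left untouched.
    """
    result = dict(doc)
    value = result.get(field_name)
    if isinstance(value, list):
        if value:
            result[field_name] = pick(value)
        else:
            del result[field_name]
    return result
```

For instance, force_single_valued(doc, "rank", pick=max) keeps only the
largest value of a hypothetical rank field.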