Re: Restricting search results by field value

2012-12-06 Thread Way Cool
Grouping should work:
group=true&group.field=source_id&group.limit=3&group.main=true
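Spelled out as a full request (the host, port, core path, and query term here are placeholders, not from the thread), the grouping parameters above can be assembled like this:

```python
from urllib.parse import urlencode

# Build the grouped-search request; host, core path, and query term
# are illustrative assumptions -- adjust for your deployment.
params = urlencode([
    ("q", "shakespeare"),
    ("group", "true"),             # enable result grouping
    ("group.field", "source_id"),  # one group per source
    ("group.limit", "3"),          # at most 3 docs per source
    ("group.main", "true"),        # flatten groups back into one result list
])
url = "http://localhost:8983/solr/select?" + params
```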

On Thu, Dec 6, 2012 at 2:35 AM, Tom Mortimer bano...@gmail.com wrote:

 Sounds like it's worth a try! Thanks Andre.
 Tom

 On 5 Dec 2012, at 17:49, Andre Bois-Crettez andre.b...@kelkoo.com wrote:

  If you do grouping on source_id, it should be enough to request 3 times
  more documents than you need, then reorder and drop the bottom.
 
  Is a 3x overhead acceptable ?
 
 
 
  On 12/05/2012 12:04 PM, Tom Mortimer wrote:
  Hi everyone,
 
  I've got a problem where I have docs with a source_id field, and there
 can be many docs from each source. Searches will typically return docs from
 many sources. I want to restrict the number of docs from each source in
 results, so there will be no more than (say) 3 docs from source_id=123 etc.
 
  Field collapsing is the obvious approach, but I want to get the results
 back in relevancy order, not grouped by source_id. So it looks like I'll
 have to fetch more docs than I need to and re-sort them. It might even be
 better to count source_ids in the client code and drop excess docs that
 way, but the potential overhead is large.
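The client-side variant described above can be sketched minimally; the doc shape (a list of dicts sorted by score descending, each with a `source_id` key) is an assumption for illustration:

```python
from collections import defaultdict

def cap_per_source(docs, limit=3):
    """Keep at most `limit` docs per source_id, preserving relevancy order.

    `docs` is assumed to be already sorted by score descending; each doc
    is a dict with a 'source_id' key (hypothetical shape).
    """
    seen = defaultdict(int)
    kept = []
    for doc in docs:
        if seen[doc["source_id"]] < limit:
            seen[doc["source_id"]] += 1
            kept.append(doc)
    return kept
```

Over-fetch (e.g. 3x the page size, as suggested above), run this, and take the top of `kept`.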
 
  Is there any way of doing this in Solr without hacking in a custom
 Lucene Collector? (which doesn't look all that straightforward).
 
  cheers,
  Tom
 
 
  --
  André Bois-Crettez
 
  Search technology, Kelkoo
  http://www.kelkoo.com/
 
  Kelkoo SAS
  Société par Actions Simplifiée
  Au capital de € 4.168.964,30
  Siège social : 8, rue du Sentier 75002 Paris
  425 093 069 RCS Paris
 
   This message and its attachments are confidential and intended solely
  for their addressees. If you are not the intended recipient of this
  message, please delete it and notify the sender.




Re: Couple issues with edismax in 3.5

2012-03-01 Thread Way Cool
Thanks Ahmet! It's good to know someone else has also tried making phrase
queries to fix the multi-word synonym issue. :-)


On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan iori...@yahoo.com wrote:

  I don't think mm will help here because it defaults to 100%
  already by the
  following code.

 Default behavior of mm has changed recently. So it is a good idea to
 explicitly set it to 100%. Then all of the search terms must match.

  Regarding multi-word synonym, what is the best way to handle
  it now? Make
  it as a phrase with "" or adding "-" in between?
  I don't like index time expansion because it adds lots of
  noises.

 Solr wiki advises to use them at index time for various reasons.

 ... The recommended approach for dealing with synonyms like this, is to
 expand the synonym when indexing...


 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

 However, index-time synonyms have their own problems as well. If you add a new
 synonym, you need to re-index the documents that contain the newly
 added synonym.

 Also, highlighting highlights whole phrases. For example, if you have the
 synonym "us, united states", searching for "states" will highlight both
 "united" and "states".
 Not sure but this seems fixed with LUCENE-3668

 I was thinking to have query expansion module to handle multi-word
 synonyms at query time only. Either using o.a.l.search.Query manipulation
 or String manipulation. Similar to Lukas' posting here
 http://www.searchworkings.org/forum/-/message_boards/view_message/146097






Re: Couple issues with edismax in 3.5

2012-02-29 Thread Way Cool
Thanks Ahmet for your reply.

I don't think mm will help here because it defaults to 100% already by the
following code.

 if (parsedUserQuery != null && doMinMatched) {
   String minShouldMatch = solrParams.get(DMP.MM, "100%");
   if (parsedUserQuery instanceof BooleanQuery) {
     U.setMinShouldMatch((BooleanQuery) parsedUserQuery, minShouldMatch);
   }
 }

Regarding multi-word synonyms, what is the best way to handle them now? Make
the synonym a phrase with "" or add "-" in between?
I don't like index-time expansion because it adds a lot of noise.
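A naive sketch of the query-time phrase approach mentioned above, assuming a small hand-maintained synonym map (the map contents and query shapes are illustrative, not Solr internals):

```python
# Naive query-time rewrite: replace each multi-word synonym with a
# parenthesized group of quoted/hyphenated variants so edismax does not
# split it on whitespace. The synonym map is a hypothetical example.
SYNONYMS = {
    "ccc ddd": ["aaabbb", "aaa-bbb", '"ccc ddd"'],
}

def rewrite(query):
    q = query.lower()
    for phrase, variants in SYNONYMS.items():
        if phrase in q:
            q = q.replace(phrase, "(" + " OR ".join(variants) + ")")
    return q
```

This trades the index-time noise for client-side maintenance; it does not handle synonyms that overlap or nest.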

It's good to know that analysis.jsp does not perform actual query parsing. I
was hoping edismax could do something similar to the analysis tool, because it
shows everything I need for multi-word synonyms.

Thanks.

On Wed, Feb 29, 2012 at 1:23 AM, Ahmet Arslan iori...@yahoo.com wrote:

  1. Search for 4X6 generated the following parsed query:
  +DisjunctionMaxQuery(((id:4 id:x id:6)^1.2) | ((name:4
  name:x
  name:6)^1.025) )
  while the search for 4 X 6 (with space in between)
  generated the query
  below: (I like this one)
  +((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
  +((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
  +((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)
 
  Is that really intentional? The first query is pretty weird
  because it will
  return all of the docs with one of 4, x, 6.

 Minimum Should Match (mm) parameter is used to control how many search
 terms should match. For example, you can set it to mm=100%.

 Also you can tweak relevancy be setting phrase fields (pf) parameter.

  Any easy way we can force a "4X6" search to behave the same as "4 X 6"?
 
  2. Issue with multi words synonym because edismax separates
  keywords to
  multiple words via the line below:
  clauses = splitIntoClauses(userQuery, false);
  and seems like edismax doesn't quite respect fieldType at
  query time, for
  example, handling stopWords differently than what's
  specified in schema.
 
  For example: I have the following synonym:
  AAA BBB, AAABBB, AAA-BBB, CCC DDD
 
  When I search for AAA-BBB, it works, however search for
  CCC DDD was not
  returning results containing AAABBB. What is interesting is
  that
  admin/analysis.jsp is returning great results.

 Query string is tokenized (according to white spaces) before it reaches
 analyzer. https://issues.apache.org/jira/browse/LUCENE-2605
 That's why multi-word synonyms are not advised to use at query time.

 Analysis.jsp does not perform actual query parsing.



Couple issues with edismax in 3.5

2012-02-28 Thread Way Cool
Hi, Guys,

I am having the following issues with edismax:

1. Search for 4X6 generated the following parsed query:
+DisjunctionMaxQuery(((id:4 id:x id:6)^1.2) | ((name:4 name:x
name:6)^1.025) )
while the search for 4 X 6 (with space in between)  generated the query
below: (I like this one)
+((DisjunctionMaxQuery((id:4^1.2 | name:4^1.025)
+((DisjunctionMaxQuery((id:x^1.2 | name:x^1.025)
+((DisjunctionMaxQuery((id:6^1.2 | name:6^1.025)

Is that really intentional? The first query is pretty weird because it will
return all of the docs matching any one of "4", "x", or "6".

Any easy way we can force a "4X6" search to behave the same as "4 X 6"?

2. An issue with multi-word synonyms, because edismax splits the query into
multiple words via the line below:
clauses = splitIntoClauses(userQuery, false);
It also seems edismax doesn't quite respect the fieldType at query time, for
example, handling stopwords differently than what's specified in the schema.

For example: I have the following synonym:
AAA BBB, AAABBB, AAA-BBB, CCC DDD

When I search for "AAA-BBB" it works; however, a search for "CCC DDD" does
not return results containing AAABBB. What is interesting is that
admin/analysis.jsp returns great results.


Thanks,

YH


Re: Searching multiple fields

2011-09-28 Thread Way Cool
It would be nice if we could have a "dissum" in addition to dismax. ;-)
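For context, dismax combines per-field scores with a DisjunctionMaxQuery: the best field dominates, and other matching fields contribute only through the tie parameter. A small sketch of that combination (with tie=1.0 it behaves like the "dissum" wished for above):

```python
def dismax_score(field_scores, tie=0.0):
    """DisjunctionMaxQuery-style score combination.

    With tie=0.0, a doc matching in two fields scores no higher than a doc
    matching equally well in a single field (what the original poster asked
    for); with tie=1.0 it degenerates to a plain sum.
    """
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)
```

So rather than a copyField, setting the dismax `tie` parameter to 0 (its default) already avoids rewarding multi-field matches.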

On Tue, Sep 27, 2011 at 9:26 AM, lee carroll
lee.a.carr...@googlemail.com wrote:

 see


 http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html



 On 27 September 2011 16:04, Mark static.void@gmail.com wrote:
  I thought that a similarity class will only affect the scoring of a
 single
  field.. not across multiple fields? Can anyone else chime in with some
  input? Thanks.
 
  On 9/26/11 9:02 PM, Otis Gospodnetic wrote:
 
  Hi Mark,
 
  Eh, I don't have Lucene/Solr source code handy, but I *think* for that
  you'd need to write custom Lucene similarity.
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
  
  From: Mark static.void@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Monday, September 26, 2011 8:12 PM
  Subject: Searching multiple fields
 
  I have a use case where I would like to search across two fields but I
 do
  not want to weight a document that has a match in both fields higher
 than a
  document that has a match in only 1 field.
 
  For example.
 
  Document 1
  - Field A: Foo Bar
  - Field B: Foo Baz
 
  Document 2
  - Field A: Foo Blarg
  - Field B: Something else
 
  Now when I search for Foo I would like document 1 and 2 to be
 similarly
  scored however document 1 will be scored much higher in this use case
  because it matches in both fields. I could create a third field and use
  copyField directive to search across that but I was wondering if there
 is an
  alternative way. It would be nice if we could search across some sort
 of
  virtual field that will use both underlying fields but not actually
  increase the size of the index.
 
  Thanks
 
 
 
 



Re: Boost Exact matches on Specific Fields

2011-09-28 Thread Way Cool
I would give str_category more weight than ts_category, because we want
str_category to win when there is an exact match (you converted it to
lowercase).

On Mon, Sep 26, 2011 at 10:23 PM, Balaji S mcabal...@gmail.com wrote:

 Hi

   Do you mean copying the String field to a Text field, or the reverse?
  This is the approach I am currently following:

 Step 1: Created a FieldType


  <fieldType name="string_lower" class="solr.TextField"
             sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>

 Step 2: <field name="str_category" type="string_lower" indexed="true"
 stored="true"/>

 Step 3: <copyField source="ts_category" dest="str_category"/>

 And in the Solr query, planning to use q=hospitals&qf=body^4.0 title^5.0
 ts_category^10.0 str_category^8.0


 The one question I have here is: all the above-mentioned fields will have
 "Hospital" present in them. Will the above approach bring the exact
 match to the top and rank "Hospitalization" below it in the results?


 Thanks
 Balaji


 On Tue, Sep 27, 2011 at 9:38 AM, Way Cool way1.wayc...@gmail.com wrote:

  If I were you, probably I will try defining two fields:
  1. ts_category as a string type
  2. ts_category1 as a text_en type
  Make sure copy ts_category to ts_category1.
 
  You can use the following as qf in your dismax:
  qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
  or something like that.
 
  YH
  http://thetechietutorials.blogspot.com/
 
 
  On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote:
 
   Hi all
  
  I am new to SOLR and have a doubt on Boosting the Exact Terms to the
  top
   on a Particular field
  
   For ex :
  
   I have a text field names ts_category and I want to give more boost
  to
   this field rather than other fields, SO in my Query I pass the
 following
  in
   the QF params qf=body^4.0 title^5.0 ts_category^21.0 and also sort on
   SCORE desc
  
    When I do a search for "Hospitals", I get "Hospitalization
    Management" and "Hospital Equipment & Supplies" on top rather than the
    exact matches for "Hospitals".
  
So It would be great , If I could be helped over here
  
  
   Thanks
   Balaji
  
  
  
  
  
  
  
   Thanks in Advance
   Balaji
  
   --
   View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
 



Any plans to support function queries on score?

2011-09-26 Thread Way Cool
Hi, guys,

Do you have any plans to support function queries on score field? for
example, sort=floor(product(score, 100)+0.5) desc?

So far I am getting the following error:
undefined field score

I can't use subquery in this case because I am trying to use secondary
sorting, however I will be open for that if someone successfully use
another field to boost the results.

Thanks,

YH
http://thetechietutorials.blogspot.com/


Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Way Cool
If I were you, probably I will try defining two fields:
1. ts_category as a string type
2. ts_category1 as a text_en type
Make sure copy ts_category to ts_category1.

You can use the following as qf in your dismax:
qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0
or something like that.

YH
http://thetechietutorials.blogspot.com/


On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote:

 Hi all

 I am new to Solr and have a question about boosting exact terms to the top
 for a particular field.

 For ex :

 I have a text field named ts_category and I want to give more boost to
 this field than to other fields, so in my query I pass the following in
 the qf params: qf=body^4.0 title^5.0 ts_category^21.0, and also sort on
 score desc.

 When I do a search for "Hospitals", I get "Hospitalization
 Management" and "Hospital Equipment & Supplies" on top rather than the exact
 matches for "Hospitals".

  So It would be great , If I could be helped over here


 Thanks
 Balaji







 Thanks in Advance
 Balaji

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Faceting DIH

2011-08-29 Thread Way Cool
I think you need to set up an entity hierarchy with product as a top-level
entity and attribute as a child entity under product; otherwise records
#2 and #3 will override the first one.
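To illustrate the override problem, here is a sketch (field handling is an assumption) of what the nested product/attribute entities effectively produce: the flat rows from the table in the quoted message collapse into one doc per productid with multivalued attribute fields, instead of one doc per row where later rows clobber earlier ones.

```python
from collections import defaultdict

# Rows as the flat SQL query returns them (from the table in the message).
rows = [
    (10100039, 331100, 1580, 1),
    (10100039, 331694, 1581, 1),
    (10100039, 33113319, 1537370, 1),
    (10100040, 331100, 1580, 1),
    (10100040, 331694, 1540230, 1),
    (10100040, 33113319, 1537370, 1),
]

# One doc per productid with multivalued attributeid/valueid fields.
docs = defaultdict(lambda: {"attributeid": [], "valueid": []})
for productid, attributeid, valueid, categoryid in rows:
    doc = docs[productid]
    doc["productid"] = productid
    doc["categoryid"] = categoryid
    doc["attributeid"].append(attributeid)
    doc["valueid"].append(valueid)
```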

On Mon, Aug 29, 2011 at 3:52 PM, Aaron Bains aaronba...@gmail.com wrote:

 Hello,

 I am trying to setup Solr Faceting on products by using the
 DataImportHandler to import data from my database. I have setup my
 data-config.xml with the proper queries and schema.xml with the fields.
 After the import/index is complete, I can only find one record per productid
 in Solr. For example, of the three productid '10100039' rows, I am only able
 to retrieve one. Should I somehow disable unique ids?
 What is the best way of doing this?

 Below is the schema I am trying to index:

 +---+-+-++
 | productid | attributeid | valueid | categoryid |
 +---+-+-++
 |  10100039 |  331100 |1580 |  1 |
 |  10100039 |  331694 |1581 |  1 |
 |  10100039 |33113319 | 1537370 |  1 |
 |  10100040 |  331100 |1580 |  1 |
 |  10100040 |  331694 | 1540230 |  1 |
 |  10100040 |33113319 | 1537370 |  1 |
 +---+-+-++

 Thanks!



When are you planning to release SolrCloud feature with ZooKeeper?

2011-08-18 Thread Way Cool
Hi, guys,

When are you planning to release the SolrCloud feature with ZooKeeper
currently in trunk? The new admin interface looks great. Great job.

Thanks,

YH


Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Hi, Guys,

Based on the document below, I should be able to include a file under the
same directory by specifying relative path via xinclude in solrconfig.xml:
http://wiki.apache.org/solr/SolrConfigXml

However I am getting the following error when I use relative path (absolute
path works fine though):
SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file

Any ideas?

Thanks,

YH


Re: Problem with xinclude in solrconfig.xml

2011-08-10 Thread Way Cool
Sorry for the spam. I just figured it out. Thanks.

On Wed, Aug 10, 2011 at 2:17 PM, Way Cool way1.wayc...@gmail.com wrote:

 Hi, Guys,

 Based on the document below, I should be able to include a file under the
 same directory by specifying relative path via xinclude in solrconfig.xml:
 http://wiki.apache.org/solr/SolrConfigXml

 However I am getting the following error when I use relative path (absolute
 path works fine though):
 SEVERE: org.xml.sax.SAXParseException: Error attempting to parse XML file

 Any ideas?

 Thanks,

 YH



Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-05 Thread Way Cool
 I will look at that. Thanks Shalin!

On Fri, Aug 5, 2011 at 1:39 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Fri, Aug 5, 2011 at 12:22 AM, Way Cool way1.wayc...@gmail.com wrote:

  Hi, guys,
 
  What's the best way (practice) to do index distribution at this moment?
  Hadoop? or rsyncd (back to 3 years ago ;-)) ?
 
 
 See http://wiki.apache.org/solr/SolrReplication

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Is there anyway to sort differently for facet values?

2011-08-05 Thread Way Cool
That's right. It should work if I already know these values ahead of time,
however I want to use business rules to control display orders for different
search terms. Maybe I have to code it by myself. Thanks everyone.

On Fri, Aug 5, 2011 at 12:25 AM, Jayendra Patil 
jayendra.patil@gmail.com wrote:

 you can give it a try with the facet.sort.

 We had such a requirement for sorting facets in an order determined by
 another field, and had to resort to a very crude way to get through it:
 we prepended the facet values with the order in which they had to be
 displayed, and used facet.sort to sort alphabetically.

 e.g. prepend Small -> 0_Small, Medium -> 1_Medium, Large -> 2_Large, XL ->
 3_XL

 You would need to handle the display part though.

 Surely not the best way, but worked for us.
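The prepend trick can be sketched as follows (the prefix scheme and size values are illustrative):

```python
# Encode the desired display order into the indexed facet value so that
# facet.sort=index (lexicographic) yields it, then strip the prefix for
# display. ORDER here is an example, not from the thread.
ORDER = ["Small", "Medium", "Large", "XL"]

def encode(value):
    return "%d_%s" % (ORDER.index(value), value)

def decode(indexed):
    return indexed.split("_", 1)[1]

# facet.sort=index returns values in lexicographic order of the encoding:
facets = sorted(encode(v) for v in ["XL", "Large", "Small", "Medium"])
display = [decode(v) for v in facets]
```

Note the numeric prefix only sorts correctly for up to ten values; beyond that you would need zero-padding.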

 Regards,
 Jayendra

 On Thu, Aug 4, 2011 at 4:38 PM, Sethi, Parampreet
 parampreet.se...@teamaol.com wrote:
  It can be achieved by creating own (app specific) custom comparators for
  fields defined in schema.xml and having an extra attribute to specify the
  comparator class in the field tag itself. But it will require changes in
 the
  Solr to support this feature. (Not sure if it's feasible though just
  throwing an idea.)
 
  -param
 
  On 8/4/11 4:29 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 
  No, it can not. It just sorts alphabetically, actually by raw
 byte-order.
 
  No other facet sorting functionality is available, and it would be
  tricky to implement in a performant way because of the way lucene
  works.  But it would certainly be useful to me too if someone could
  figure out a way to do it.
 
  On 8/4/2011 2:43 PM, Way Cool wrote:
  Thanks Eric for your reply. I am aware of facet.sort, but I haven't
 used it.
  I will try that though.
 
  Can it handle the values below in the correct order?
  Under 10
  10 - 20
  20 - 30
  Above 30
 
  Or
  Small
  Medium
  Large
  XL
  ...
 
  My second question is that if Solr can't do that for the values above
 by
  using facet.sort. Is there any other ways in Solr?
 
  Thanks in advance,
 
  YH
 
  On Wed, Aug 3, 2011 at 8:35 PM, Erick Ericksonerickerick...@gmail.com
 wrote:
 
  have you looked at the facet.sort parameter? The index value is what
 I
  think you want.
 
  Best
  Erick
  On Aug 3, 2011 7:03 PM, Way Coolway1.wayc...@gmail.com  wrote:
  Hi, guys,
 
  Is there anyway to sort differently for facet values? For example,
  sometimes
  I want to sort facet values by their values instead of # of docs, and
 I
  want
  to be able to have a predefined order for certain facets as well. Is
 that
  possible in Solr we can do that?
 
  Thanks,
 
  YH
 
 



Re: Is there anyway to sort differently for facet values?

2011-08-04 Thread Way Cool
Thanks Eric for your reply. I am aware of facet.sort, but I haven't used it.
I will try that though.

Can it handle the values below in the correct order?
Under 10
10 - 20
20 - 30
Above 30

Or
Small
Medium
Large
XL
...

My second question is that if Solr can't do that for the values above by
using facet.sort. Is there any other ways in Solr?

Thanks in advance,

YH

On Wed, Aug 3, 2011 at 8:35 PM, Erick Erickson erickerick...@gmail.com wrote:

 have you looked at the facet.sort parameter? The index value is what I
 think you want.

 Best
 Erick
 On Aug 3, 2011 7:03 PM, Way Cool way1.wayc...@gmail.com wrote:
  Hi, guys,
 
  Is there anyway to sort differently for facet values? For example,
 sometimes
  I want to sort facet values by their values instead of # of docs, and I
 want
  to be able to have a predefined order for certain facets as well. Is that
  possible in Solr we can do that?
 
  Thanks,
 
  YH



What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Hi, guys,

What's the best way (practice) to do index distribution at this moment?
Hadoop? or rsyncd (back to 3 years ago ;-)) ?

Thanks,

Yugang


Re: What's the best way (practice) to do index distribution at this moment? Hadoop? rsyncd?

2011-08-04 Thread Way Cool
Yes, I am talking about replication feature. I remember I tried rsync 3
years ago with solr 1.2. Just not sure if someone else have done anything
better than that during the last 3 years. ;-) Personally I am thinking about
using Hadoop and ZooKeeper. Has anyone tried those features?
I found a couple links below, but no success on that yet.
http://wiki.apache.org/solr/SolrCloud
http://wiki.apache.org/solr/DeploymentofSolrCoreswithZookeeper

Thanks for your reply Jonathan.

On Thu, Aug 4, 2011 at 2:31 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 I'm not sure what you mean by index distribution, that could possibly
 mean several things.

 But Solr has had a replication feature built into it from 1.4, that can
 probably handle the same use cases as rsync, but better.  So that may be
 what you want.

 There are certainly other experiments going on involving various kinds of
 scaling distribution, that I'm not familiar with, including the sharding
 feature, that I'm not very familiar with. I don't know if anyone's tried to
 do anything with hadoop.




 On 8/4/2011 2:52 PM, Way Cool wrote:

 Hi, guys,

 What's the best way (practice) to do index distribution at this moment?
 Hadoop? or rsyncd (back to 3 years ago ;-)) ?

 Thanks,

 Yugang




Is there anyway to sort differently for facet values?

2011-08-03 Thread Way Cool
Hi, guys,

Is there anyway to sort differently for facet values? For example, sometimes
I want to sort facet values by their values instead of # of docs, and I want
to be able to have a predefined order for certain facets as well. Is that
possible in Solr we can do that?

Thanks,

YH


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Just make sure you did change the files under
NUTCH_HOME/runtime/local/conf if you are running from runtime/local.

On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston 
serenity.kenings...@gmail.com wrote:

 Hello Friends,


 I am experiencing this error message  fetcher no agents listed in '
 http.agent.name' property when I am trying to crawl with Nutch 1.3
 I referred other mails regarding the same error message and tried to change
 the nutch-default.xml and nutch-site.xml file details with

  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
    <description>EMPTY</description>
  </property>

 I also filled out the other property details without blank and still
 getting
 the same error. May I know my mistake ?


 Serenity



Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Cool. Glad it worked out.

On Thu, Jul 7, 2011 at 11:22 AM, serenity keningston 
serenity.kenings...@gmail.com wrote:

 Thank you very much, I never tried to modify the config files from
 NUTCH_HOME/runtime/local/conf .

 In Nutch-0.9, we will just modify from NUTCH-HOME/conf  directory. I
 appreciate your time and help.

 Merci

 On Thu, Jul 7, 2011 at 12:05 PM, Way Cool way1.wayc...@gmail.com wrote:

  Just make sure you did change the files under
  NUTCH_HOME/runtime/local/conf if you are running from runtime/local.
 
  On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston 
  serenity.kenings...@gmail.com wrote:
 
   Hello Friends,
  
  
   I am experiencing this error message  fetcher no agents listed in '
   http.agent.name' property when I am trying to crawl with Nutch 1.3
   I referred other mails regarding the same error message and tried to
  change
   the nutch-default.xml and nutch-site.xml file details with
  
    <property>
      <name>http.agent.name</name>
      <value>My Nutch Spider</value>
      <description>EMPTY</description>
    </property>
  
   I also filled out the other property details without blank and still
   getting
   the same error. May I know my mistake ?
  
  
   Serenity
  
 



Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Can you let me know when and where you were getting the error? A screen-shot
will be helpful.

On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston 
serenity.kenings...@gmail.com wrote:

 Hello Friends,


 I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
 . I did the steps explained in the following two URL's :

 http://wiki.apache.org/nutch/RunningNutchAndSolr


 http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html


 I downloaded both pieces of software; however, I am getting an error
 ("solrUrl is not set, indexing will be skipped...") when I try to crawl
 using Cygwin.

 Can anyone please help me out to fix this issue ?
 Else any other website suggesting for Apache Nutch and Solr integration
 would be greatly helpful.



 Thanks & Regards,
 Serenity



Dynamic Facets

2011-07-05 Thread Way Cool
Hi, guys,

We have more than 1000 attributes scattered around 700K docs. Each doc might
have about 50 attributes. I would like Solr to return up to 20 facets for
every searches, and each search can return facets dynamically depending on
the matched docs. Anyone done that before? That'll be awesome if the facets
returned will be changed after we drill down facets.
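One common client-side workaround (not something the thread confirms Solr does out of the box) is a two-pass approach: first facet on a catch-all field that stores each doc's attribute names, then facet on the top names returned. A sketch that only builds the two parameter sets; the `attribute_names` field and `attr_*` dynamic-field naming are assumptions:

```python
from urllib.parse import urlencode

def first_pass(q):
    # Pass 1: facet on a hypothetical 'attribute_names' field that stores,
    # for each doc, the names of the attributes it carries; ask for the
    # top 20 names among the matched docs.
    return urlencode([("q", q), ("rows", "0"), ("facet", "true"),
                      ("facet.field", "attribute_names"),
                      ("facet.limit", "20")])

def second_pass(q, top_names):
    # Pass 2: facet on the attribute fields that actually occur in the
    # matched docs (dynamic-field naming here is an assumption).
    params = [("q", q), ("facet", "true")]
    params += [("facet.field", "attr_%s" % name) for name in top_names]
    return urlencode(params)
```

Because pass 1 is re-run on each drill-down, the returned facet set changes with the filtered result set, which is the behavior asked for above.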

I have looked at the following docs:
http://wiki.apache.org/solr/SimpleFacetParameters
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr

Wondering what's the best way to accomplish that. Any advice?

Thanks,

YH


Re: Dynamic Facets

2011-07-05 Thread Way Cool
Thanks Erik and Darren.
A pre-faceting component (post-query) would be ideal, even though there may
be a small performance penalty. :-) I will try to implement one if no one
has done so.

Darren, I did look at the taxonomy faceting thread. My main concern is that
I want to have dynamic facets to be returned because I don't know what
facets I can specify as a part of query ahead of time, and there are too
many search terms. ;-)

Thanks for help.

On Tue, Jul 5, 2011 at 11:49 AM, dar...@ontrenet.com wrote:


 You can issue a new facet search as you drill down from your UI.
 You have to specify the fields you want to facet on and they can be
 dynamic.

 Take a look at recent threads here on taxonomy faceting for help.
 Also, look here[1]

 [1] http://wiki.apache.org/solr/SimpleFacetParameters

 On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool way1.wayc...@gmail.com
 wrote:
  Hi, guys,
 
  We have more than 1000 attributes scattered around 700K docs. Each doc
  might
  have about 50 attributes. I would like Solr to return up to 20 facets
 for
  every searches, and each search can return facets dynamically depending
 on
  the matched docs. Anyone done that before? That'll be awesome if the
 facets
  returned will be changed after we drill down facets.
 
  I have looked at the following docs:
  http://wiki.apache.org/solr/SimpleFacetParameters
 

 http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
 
  Wondering what's the best way to accomplish that. Any advice?
 
  Thanks,
 
  YH



Re: A beginner problem

2011-07-05 Thread Way Cool
You can follow the links below to setup Nutch and Solr:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://wiki.apache.org/nutch/RunningNutchAndSolr

Of course, more details will be helpful for troubleshooting your env issue.
:-)

Have fun!

On Tue, Jul 5, 2011 at 11:49 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : follow a receipe.  So I went to the the solr site, downloaded solr and
 : tried to follow the tutorial.  In the  example folder of solr, using
 : java -jar start.jar  I got:
 :
 : 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
 org.mortbay.log.StdErrLog
 : 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
 : 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

 if that is everything you got in the logs, then i suspect:
   a) you downloaded a source release (ie: has *-src-* in its name) in
  which the solr.war app has not yet been compiled
   b) you did not run "ant example" to build solr and set up the example
  instance.

 If i'm wrong, then yes please more details would be helpful: what exact
 URL did you download?

 -Hoss



Re: Getting started with Velocity

2011-07-01 Thread Way Cool
By default, browse is using the following config:
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>

    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>

    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
    <int name="mlt.count">3</int>

    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>

    <str name="facet">on</str>
    <str name="facet.field">cat</str>
    <str name="facet.field">manu_exact</str>
    <str name="facet.query">ipod</str>
    <str name="facet.query">GB</str>
    <str name="facet.mincount">1</str>
    <str name="facet.pivot">cat,inStock</str>
    <str name="facet.range">price</str>
    <int name="f.price.facet.range.start">0</int>
    <int name="f.price.facet.range.end">600</int>
    <int name="f.price.facet.range.gap">50</int>
    <str name="f.price.facet.range.other">after</str>
    <str name="facet.range">manufacturedate_dt</str>
    <str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
    <str name="f.manufacturedate_dt.facet.range.end">NOW</str>
    <str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
    <str name="f.manufacturedate_dt.facet.range.other">before</str>
    <str name="f.manufacturedate_dt.facet.range.other">after</str>

    <!-- Highlighting defaults -->
    <str name="hl">on</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
  <!--
  <str name="url-scheme">httpx</str>
  -->
</requestHandler>

while the normal search is using the following:
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

Just make sure the fields referenced by /browse are also defined in your
docs; otherwise, change it to not use dismax. :-)


On Fri, Jul 1, 2011 at 12:51 PM, Chip Calhoun ccalh...@aip.org wrote:

 I'm a Solr novice, so I hope I'm missing something obvious. When I run a
 search in the Admin view, everything works fine. When I do the same search
 in http://localhost:8983/solr/browse, I invariably get 0 results found.
 What am I missing? Are these not supposed to be searching the same index?

 Thanks,
 Chip



How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Hi, guys,

I fixed the Solr search UI (solr/browse) to display the price range facet
values via
http://thetechietutorials.blogspot.com/2011/06/fix-price-facet-display-in-solr-search.html:

   - Under 
50http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B0.0+TO+50%5D
   (1331)
   - [50.0 TO 
100]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B50.0+TO+100%5D
   (133)
   - [100.0 TO 
150]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B100.0+TO+150%5D
   (31)
   - [150.0 TO 
200]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B150.0+TO+200%5D
   (7)
   - [200.0 TO 
250]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B200.0+TO+250%5D
   (2)
   - [250.0 TO 
300]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B250.0+TO+300%5D
   (5)
   - [300.0 TO 
350]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B300.0+TO+350%5D
   (3)
   - [350.0 TO 
400]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B350.0+TO+400%5D
   (6)
   - [400.0 TO 
450]http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B400.0+TO+450%5D
   (1)
   - 
600.0+http://localhost:9090/solr/browse?q=Shakespearefq=price:%5B600.0+TO+*%5D(1)
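A hedged sketch of how the filter-query links for buckets like the ones above can be generated; the host, handler path, field name, and bucket bounds are assumptions:

```python
from urllib.parse import quote_plus

# Sketch: generate the label and fq link for each fixed price bucket,
# as in the facet display above. Host and handler path are assumptions.
BASE = "http://localhost:9090/solr/browse?q=Shakespeare&fq="

def bucket_link(lo, hi):
    label = f"Under {hi}" if lo == 0.0 else f"[{lo} TO {hi}]"
    return label, BASE + quote_plus(f"price:[{lo} TO {hi}]")

for lo, hi in [(0.0, 50), (50.0, 100), (100.0, 150)]:
    label, url = bucket_link(lo, hi)
    print(label, url)
```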

However I am having double counting issue.

Here is the URL to return only docs whose prices are between 110.0 and
160.0, plus the price facets:
http://localhost:8983/solr/select/?q=Shakespeare&version=2.2&rows=0&fq=price:[110.0+TO+160]&facet.query=price:[110%20TO%20160]&facet.query=price:[160%20TO%20200]&facet.field=price

The response is as below:
<result name="response" numFound="23" start="0" maxScore="0.37042576"/>
<lst name="facet_counts">
  <lst name="facet_queries">
    <int name="price:[110 TO 160]">23</int>
    <int name="price:[160 TO 200]">1</int>
  </lst>
  ...
</lst>

As you can see, the number of results is 23, yet an extra doc was counted
in the 160-200 range.

Is there any way to avoid this double counting? Or has anyone else run into
a similar issue?

Thanks,

YH


Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Thanks! That's what I was trying to find.

On Tue, Jun 14, 2011 at 1:48 PM, Ahmet Arslan iori...@yahoo.com wrote:

  <int name="price:[110 TO 160]">23</int>
  <int name="price:[160 TO 200]">1</int>
  </lst>
  ...
 
  As you notice, the number of the results is 23, however an
  extra doc was
  found in the 160-200 range.
 
  Any way I can avoid double counting issue?

 You can use exclusive range queries which are denoted by curly brackets.

 price:[110 TO 160}
 price:[160 TO 200}
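These half-open ranges can also be generated programmatically, so adjacent buckets never overlap. A minimal sketch; the host, handler path, and field name are assumptions:

```python
from urllib.parse import urlencode

# Sketch: half-open facet.query ranges (inclusive lower bound, exclusive
# upper bound) so each document is counted in exactly one bucket.
# Host and handler path are assumptions.
bounds = [110, 160, 200]
facet_queries = [f"price:[{lo} TO {hi}}}" for lo, hi in zip(bounds, bounds[1:])]

params = [("q", "Shakespeare"), ("rows", "0"), ("facet", "true")]
params += [("facet.query", fq) for fq in facet_queries]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(facet_queries)
print(url)
```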



Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
Are you sure Solr supports that?
I am getting exceptions when I try it. Ahmet, do you remember where you saw
that documented? Thanks.



On Tue, Jun 14, 2011 at 1:58 PM, Way Cool way1.wayc...@gmail.com wrote:

 Thanks! That's what I was trying to find.


 On Tue, Jun 14, 2011 at 1:48 PM, Ahmet Arslan iori...@yahoo.com wrote:

  <int name="price:[110 TO 160]">23</int>
  <int name="price:[160 TO 200]">1</int>
  </lst>
  ...
 
  As you notice, the number of the results is 23, however an
  extra doc was
  found in the 160-200 range.
 
  Any way I can avoid double counting issue?

 You can use exclusive range queries which are denoted by curly brackets.

 price:[110 TO 160}
 price:[160 TO 200}





Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
That's good to know. From the ticket, it looks like the fix will be in 4.0
then?

Currently I can see that {} and [] work individually, but not combined, in
Solr 3.1. I will try 3.2 soon. Thanks.

On Tue, Jun 14, 2011 at 2:07 PM, Ahmet Arslan iori...@yahoo.com wrote:

  You sure Solr supports that?
  I am getting exceptions by doing that. Ahmet, do you
  remember where you see
  that document? Thanks.

 I tested it with trunk.
 https://issues.apache.org/jira/browse/SOLR-355
 https://issues.apache.org/jira/browse/LUCENE-996




Re: Modifying Configuration from a Browser

2011-06-14 Thread Way Cool
+1 Good idea! I was thinking of writing a web interface to change the
contents of elevate.xml and feed it back to the Solr core.

On Tue, Jun 14, 2011 at 1:51 PM, Markus Jelsma
markus.jel...@openindex.iowrote:

 There is no API. Upload and restart the core is the way to go.

  Does anyone have any examples of modifying a configuration file, like
  elevate.xml from a browser? Is there an API that would help for this?
 
  If nothing exists for this, I am considering implementing something that
  would change the elevate.xml file then reload the core. Or is there a
  better approach for dynamic configuration?
 
  Thank you.
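The upload-and-reload approach suggested above can be scripted; a hedged sketch using the CoreAdmin RELOAD action (host, port, and the core name "collection1" are assumptions):

```python
from urllib.parse import urlencode

# Sketch: after writing the new elevate.xml into the core's conf
# directory, reload the core so Solr picks up the change.
# Host, port, and core name are assumptions.
def reload_core_url(host="localhost", port=8983, core="collection1"):
    params = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{params}"

# To actually trigger the reload:
# from urllib.request import urlopen; urlopen(reload_core_url()).read()
print(reload_core_url())
```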



Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
I already checked out the facet range query. By the way, I did set
facet.range.include as below:
<str name="f.price.facet.range.include">lower</str>

A couple of things I don't like though:
1. It returns the following without end values (I have to re-calculate the
end values):
<lst name="counts">
  <int name="100.0">20</int>
  <int name="150.0">3</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">600.0</float>
<int name="before">0</int>

2. I can't specify custom ranges of values, for example, 1, 2, 3, 4, 5,
... 10, 15, 20, 30, 40, 50, 60, 80, 90, 100, 200, ..., 600, 800, 900,
1000, 2000, etc.

Thanks.
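For point 1, the missing end values can be rebuilt client-side from the reported gap; a small sketch over the counts quoted above:

```python
# Sketch: facet.range reports only each bucket's start value; reconstruct
# the end values client-side from the reported gap. The counts mirror the
# response quoted above.
counts = {"100.0": 20, "150.0": 3}
gap = 50.0
buckets = [(float(start), float(start) + gap, n) for start, n in counts.items()]
print(buckets)
```

For point 2, one workaround is to send one facet.query per custom range (e.g. facet.query=price:[10 TO 15]) instead of using facet.range at all, since facet.query accepts arbitrary bounds.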

On Tue, Jun 14, 2011 at 3:50 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : You can use exclusive range queries which are denoted by curly brackets.

 that will solve the problem of making the fq exclude a bound, but
 for the range facet counts you'll want to pay attention to look at
 facet.range.include...

 http://wiki.apache.org/solr/SimpleFacetParameters#facet.range.include


 -Hoss



Re: How to avoid double counting for facet query

2011-06-14 Thread Way Cool
I just checked SolrQueryParser.java in the 3.2.0 source. It looks like Yonik
Seeley's changes for LUCENE-996
(https://issues.apache.org/jira/browse/LUCENE-996) are not in. I will check
trunk later. Thanks!

On Tue, Jun 14, 2011 at 5:34 PM, Way Cool way1.wayc...@gmail.com wrote:

 I already checked out the facet range query. By the way, I did set
 facet.range.include as below:
 <str name="f.price.facet.range.include">lower</str>

 A couple of things I don't like though:
 1. It returns the following without end values (I have to re-calculate the
 end values):
 <lst name="counts">
   <int name="100.0">20</int>
   <int name="150.0">3</int>
 </lst>
 <float name="gap">50.0</float>
 <float name="start">0.0</float>
 <float name="end">600.0</float>
 <int name="before">0</int>

 2. I can't specify custom ranges of values, for example, 1, 2, 3, 4, 5,
 ... 10, 15, 20, 30, 40, 50, 60, 80, 90, 100, 200, ..., 600, 800, 900,
 1000, 2000, etc.

 Thanks.


 On Tue, Jun 14, 2011 at 3:50 PM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 : You can use exclusive range queries which are denoted by curly brackets.

 that will solve the problem of making the fq exclude a bound, but
 for the range facet counts you'll want to pay attention to look at
 facet.range.include...

 http://wiki.apache.org/solr/SimpleFacetParameters#facet.range.include


 -Hoss





FYI: How to build and start Apache Solr admin app from source with Maven

2011-06-10 Thread Way Cool
Hi, guys,

FYI: Here is the link to how to build and start Apache Solr admin app from
source with Maven just in case you might be interested:
http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html

Have fun.

YH