Dismax and worddelimiterfilter

2011-03-25 Thread David Yang
Hi,

 

I am having some really strange issues matching "N61JQ-B2". If I had a
field containing "N61JQ-B2", and I wanted to match "N61JQ", "N61JQB2",
"N61JQ-B2" and "N61JQ B2" in dismax, what fieldtype should it have? My
final fallback is to use ngrams, but that would impose a pretty large
overhead, since the field could be a long normal string with one model
number in it.

 

I noticed when I used WordDelimiterFilterFactory the dismax would
convert the parsed query to some pre-analyzed query.
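
To make the desired matching concrete, here is a minimal Python sketch (not Solr code) of the token variants a word-delimiter-style analysis could index for the model number, assuming the split parts, the original token and the concatenation of the parts are all kept. The function names are made up for illustration, and the query side is assumed to be split on whitespace only.

```python
import re

def delimiter_variants(token):
    """Return the parts of a token split on non-alphanumerics, plus the
    original token and the concatenation of all parts."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    variants = set(parts)
    variants.add(token)
    variants.add("".join(parts))
    return variants

def matches(indexed_value, query):
    """True if every whitespace-separated query term equals one of the
    (lowercased) variants indexed for the value."""
    indexed = {v.lower() for v in delimiter_variants(indexed_value)}
    return all(term.lower() in indexed for term in query.split())

for q in ["N61JQ", "N61JQB2", "N61JQ-B2", "N61JQ B2"]:
    print(q, matches("N61JQ-B2", q))
```

Under these assumptions all four queries match, while a different model number such as "N61JQ-B3" does not.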

 

Cheers,

David



Tokenizer that Protects Phrases

2011-02-22 Thread David Yang
Hi,

 

I am trying to tokenize a string field of products. Two different
products are "camera" and "security camera". What I would like is for
"security camera" to be treated differently from "camera": it should only
be displayed when the search is for "security camera"; otherwise, the
results should only display "camera".

 

In other words, even though they share the English word "camera", their
meanings are different.

 

Now my guess about the best way to deal with this is to manually
provide a file of word sequences that should each be treated as a single
token, e.g. "laptop battery", "security camera". Kind of like protwords,
but for phrases ("protphrases").
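
A rough Python sketch of what I have in mind, outside of Solr: words that form a listed phrase are merged into one token before indexing or querying. The phrase list, the underscore joiner and the function name are all assumptions for illustration, not an existing Solr feature.

```python
# Hypothetical "protphrases" list; in practice this would be read from a file.
PHRASES = ["security camera", "laptop battery"]

def tokenize_with_phrases(text, phrases):
    """Split text on whitespace, but emit each protected phrase as a
    single token (words joined by an underscore)."""
    words = text.lower().split()
    # Longest phrases first so that longer matches win; skip empty entries.
    phrase_tuples = sorted(
        (tuple(p.lower().split()) for p in phrases if p.split()),
        key=len,
        reverse=True,
    )
    tokens = []
    i = 0
    while i < len(words):
        for phrase in phrase_tuples:
            if tuple(words[i:i + len(phrase)]) == phrase:
                tokens.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No protected phrase starts here: keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize_with_phrases("security camera", PHRASES))
print(tokenize_with_phrases("digital camera", PHRASES))
```

With the same function applied at index and query time, a search for "camera" would no longer hit the "security_camera" token.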

 

Is this a good idea to solve this problem? How do I implement it if it
is the right way? If there is a better way of dealing with this what is
it?

 

Thanks for your time,

David

 



RE: DIH and updating specific record

2011-02-22 Thread David Yang
Chris Hostetter answered this just recently:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters

My addition:
Pass a parameter like command=delta-import&idz=31415
And access it via 'sql where id=${dataimporter.request.idz}'

If the idz is a string you might need to prequote the idz value.
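
For example, the request could be built like this in Python. This is only a sketch: the handler URL below is an assumption and depends on how the DIH request handler is registered in your solrconfig.xml, and "idz" is just the parameter name used above.

```python
from urllib.parse import urlencode

# Assumed handler URL; adjust host, port and path to your installation.
DIH_URL = "http://localhost:8983/solr/dataimport"

def delta_import_url(record_id):
    """Build a delta-import request that carries the record id as the
    custom request parameter 'idz'."""
    params = {"command": "delta-import", "idz": record_id}
    return DIH_URL + "?" + urlencode(params)

print(delta_import_url(31415))
```

urlencode also escapes any special characters in the value, which matters if the id is a string.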

-Original Message-
From: Olson, Ron [mailto:rol...@lbpc.com] 
Sent: Tuesday, February 22, 2011 3:18 PM
To: solr-user@lucene.apache.org
Subject: DIH and updating specific record

Hi all-

I am trying to determine if there is a way to tell Solr to update its
index with a specific ID to a record in the database. All the examples
and documentation seems to discuss using a last updated date/time
field, but in this case modifying the table would not be an option.
Instead, I'd like to invoke Solr's DIH delta query with a specific ID to
say here's something new or updated, please update your index with it.

I apologize if this is a trivial thing, but I can't seem to find any
documentation on how to do it.

Thanks,

Ron




RE: Sort Stability With Date Boosting and Rounding

2011-02-22 Thread David Yang
One suggestion: use logarithms to compress the large time range into something 
easier to compare: 1/log(ms(NOW,date))
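
A quick way to see what this formula does, sketched in Python outside of Solr. The natural log is an assumption here (a different base only rescales the values), and the function name is made up.

```python
import math

def recency_boost(age_ms):
    """Boost of the form 1/log(age), where age_ms is the document age in
    milliseconds, i.e. ms(NOW,date). Only defined for age_ms > 1."""
    if age_ms <= 1:
        raise ValueError("age_ms must be greater than 1")
    return 1.0 / math.log(age_ms)

HOUR_MS = 60 * 60 * 1000
YEAR_MS = 365 * 24 * HOUR_MS

# Newer documents get a larger boost, but the range is strongly compressed.
print(recency_boost(HOUR_MS), recency_boost(YEAR_MS))
```

Note that two documents only seconds apart still get nearly identical values, so this compresses the range but does not by itself guarantee a stable order.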

-Original Message-
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] 
Sent: Tuesday, February 22, 2011 6:03 PM
To: solr-user@lucene.apache.org
Subject: Sort Stability With Date Boosting and Rounding

I'm trying to use
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
as
a bf parameter to my dismax handler.  The problem is, the value of NOW can
cause documents in a similar range (date value within a few seconds of each
other) to sometimes round to be equal, and sometimes not, changing their
sort order (when equal, falling back to a secondary sort).  This, in turn,
screws up paging.

The problem is that score is rounded to a lower level of precision than what
the suggested formula produces as a difference between two values within
seconds of each other.  It seems to me if I could round the value to minutes
or hours, where the difference will be large enough to not be rounded-out,
then I wouldn't have problems with order changing on me.  But it's not legal
syntax to specify something like:
recip(ms(NOW,manufacturedate_dt/HOUR),3.16e-11,1,1)

Is this a problem anyone has faced and solved?  Anyone have suggested
solutions, other than indexing a copy of the date field that's rounded to
the hour?

--
Stephen Duncan Jr
www.stephenduncanjr.com


Per field facet limit

2010-11-17 Thread David Yang
Hi,

 

The wiki on facet.limit
(http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit) says
"This parameter can be specified on a per field basis to indicate a
separate limit for certain fields", but it does not say how to target
a specific field. How do you do this?

 

I  tried

 

<str name="facet.field">a_id</str>

<int name="a_id.facet.mincount">30</int>

 

<str name="facet.field">b_id</str>

<int name="b_id.facet.mincount">3</int>

 

Which didn't work, as well as plain 'facet.mincount' twice which also
didn't work. 

 

Cheers,

David.



RE: Per field facet limit

2010-11-17 Thread David Yang
Thanks!

Is there any way to apply this to facet queries as well? 
(I could just apply a f.field.facet.limit to each and every field, and
then apply a global facet.limit for facet queries.)

Cheers
david

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Wednesday, November 17, 2010 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Per field facet limit

f.name_of_field.facet.limit

The f.name_of_field.original_value thing is a common pattern in Solr, 
but, yeah, sometimes it's hard to find it in the documentation.

So same with any of the other facet parameters. 
f.name_of_field.facet.mincount, whatever.
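
Applied to the two fields from the question, the request parameters would look roughly like this (a Python sketch that only builds the query string; the helper name is made up):

```python
from urllib.parse import urlencode

def per_field(field, param):
    """Prefix a parameter so that it applies to one field only."""
    return "f.%s.%s" % (field, param)

params = [
    ("facet", "true"),
    ("facet.field", "a_id"),
    (per_field("a_id", "facet.mincount"), 30),
    ("facet.field", "b_id"),
    (per_field("b_id", "facet.mincount"), 3),
]
query_string = urlencode(params)
print(query_string)
```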



RE: Per field facet limit

2010-11-17 Thread David Yang
Sorry for the typo, I meant mincount, not limit... :p

Cheers,
David



RE: Per field facet limit

2010-11-17 Thread David Yang
Makes sense. The processing is already done and there is no reason
not to return it, since it won't explode into a horribly long list,
unlike a field facet. 
Thanks!

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Wednesday, November 17, 2010 6:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Per field facet limit

I don't think facet.limit or facet.mincount applies to facet queries, 
whether global or field-specific. 

Keep in mind that a single facet query just returns ONE count, for the 
query you supplied. It's up to you to supply a query that will give the 
count you want; it won't use facet.limit or facet.mincount. Those 
parameters apply to ordinary faceting, where you get many values per 
field, to filter the values per field. Each facet.query only gives you 
one count already.




Shuffle results a little

2010-11-12 Thread David Yang
Hi,

 

I am interested in using solr to return search results for products. Is
there any feature which will allow the result to be spread/shuffled
around a little? The problem is that there are lots of results for one
brand, but there are lots of other brands a few pages later. Is it
possible to somehow shuffle it so that one brand does not dominate the
top results? 

 

Something like field_limit=2, meaning that at most two results with the
same field value will be shown and the rest are skipped. I could
implement this as a post-filter step, but that means that I would have
to pull many more results from the index. 
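
The post-filter step I mean would be something like the following Python sketch. It assumes the results arrive as a ranked list of dicts and that "brand" is the field to limit on; "field_limit" is my own made-up name, not a Solr parameter.

```python
from collections import Counter

def limit_per_field(results, field, limit):
    """Keep the ranked order, but drop a document once `limit` documents
    with the same value for `field` have already been kept."""
    seen = Counter()
    kept = []
    for doc in results:
        value = doc.get(field)
        if seen[value] < limit:
            kept.append(doc)
            seen[value] += 1
    return kept

ranked = [
    {"id": 1, "brand": "acme"},
    {"id": 2, "brand": "acme"},
    {"id": 3, "brand": "acme"},
    {"id": 4, "brand": "zenith"},
]
print([d["id"] for d in limit_per_field(ranked, "brand", 2)])
```

The drawback is visible here: to fill a page of N results, more than N rows have to be fetched from the index.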

 

Cheers,

David



DataImportHandler with multiline SQL

2010-09-16 Thread David Yang
Hi

 

I am using the DIH to retrieve data, and as part of the process, I
wanted to create a temporary table and then import data from that. I
have played around a little with DIH and it seems that for a query like
"select x; select y;" the second statement ("select y") can return no
results and do arbitrary things, but the first ("select x") needs to
return results.

Does anybody know exactly how DIH handles multiple sql statements in the
query?

 

Cheers,

David



Autocomplete with Filter Query

2010-09-10 Thread David Yang
Hi,

 

Is there any way to provide autocomplete while filtering results?
Suppose I had a bunch of people and each person has multiple
occupations. When I select 'Assistant' in a filter box, it would be nice
if autocomplete only provides assistant names, instead of all names. The
other issue is that I use DisMax to do my search (name, title, phone
number etc) - so it might be more complex to do autocomplete. I could
have a copy field to copy all dismax terms into one big field.

 

Cheers,

 

David 



RE: Autocomplete with Filter Query

2010-09-10 Thread David Yang
 in, but each white-space-separated word as
they type it, this won't do THAT either.  Trying to get all those things
to work becomes even more complicated -- especially with the requirement
that you want to be able to apply the 'fq's from your current search
context to the auto-complete.  I haven't entirely thought through a
possible way to do all that. 

But hopefully this gives you some clues to think about it. 

Jonathan




How to use TermsComponent when I need a filter

2010-09-08 Thread David Yang
Hi,

 

I have a solr index, which for simplicity is just a list of names, and a
list of associations. (either a multivalue field e.g. {A1, A2, A3, A6}
or a string concatenation list e.g. A1 A2 A3 A6)

 

I want to be able to provide autocomplete but with a specific
association. E.g. names beginning with "Bob" in association "A5".

 

Is this possible? I would prefer not to have one index per
association, since the number of associations is pretty large.

 

Cheers,

 

David 

 



Delta Import with something other than Date

2010-09-08 Thread David Yang
Hi,

I have a table that I want to index, and the table has no datetime
stamp. However, the table is append only so the primary key can only go
up. Is it possible to store the last primary key, and use a delta
query like "select id where id > ${last_id_value}"?

Cheers,

David



RE: Delta Import with something other than Date

2010-09-08 Thread David Yang
Currently DIH delta import uses a SQL query of the form "select id from
item where last_modified > ${dataimporter.last_index_time}"
(see wiki.apache.org/solr/DataImportHandler).
What I need is some variable like ${dataimporter.last_primary_key}.
I am thinking of storing the last primary key externally and calling the
delta-import with a parameter and using
${dataimporter.request.last_primary_key}, but that seems like a very
brittle approach.
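
Roughly, the external bookkeeping I am thinking of would look like this Python sketch. The checkpoint file format and the function names are made up, and "last_primary_key" is just the request parameter name from above.

```python
import json
import os
import tempfile

def load_checkpoint(path, default=0):
    """Return the last imported primary key, or `default` if none is stored."""
    try:
        with open(path) as fh:
            return json.load(fh)["last_primary_key"]
    except FileNotFoundError:
        return default

def save_checkpoint(path, last_primary_key):
    """Write the key to a temp file and rename it into place, so that a
    crash mid-write cannot leave a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump({"last_primary_key": last_primary_key}, fh)
    os.replace(tmp, path)

def delta_import_params(last_primary_key):
    """Request parameters for the next delta-import call."""
    return {"command": "delta-import", "last_primary_key": last_primary_key}
```

The brittle part is keeping the checkpoint in step with what was actually committed to the index: it should only be advanced after the import is known to have succeeded.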

Cheers,
David

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Wednesday, September 08, 2010 6:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Delta Import with something other than Date

Of course you can store whatever you want in a solr index. And if you 
store an integer as a Solr 1.4 int type, you can certainly query for 
all documents that have greater than some specified integer in a field.

You can't use SQL to query Solr though.

I'm not sure what you're really asking?

Jonathan



How to use TermsComponent when I need a filter

2010-09-07 Thread David Yang
Hi,

 

I have a solr index, which for simplicity is just a list of names, and a
list of associations. (either a multivalue field e.g. {A1, A2, A3, A6}
or a string concatenation list e.g. A1 A2 A3 A6)

I want to be able to provide autocomplete but with a specific
association. E.g. names beginning with "Bob" in association "A5".

Is this possible? I would prefer not to have one index per
association, since the number of associations is pretty large.

 

Cheers,

David 



A few query issues with solr

2010-08-26 Thread David Yang
Hi,

 

I'm new to using Solr, and I have started an index with it and it works
great. I have encountered a few minor issues that I currently solve by
modifying the query beforehand; however, I feel like there is a much
more configuration-oriented and Solr-correct way of achieving this.

 

Current manual modifications

* Searching for "car" actually means buying a car, so it should
look for "car -rent", whereas searching for "car rent" should still look
for "car rent"

* Searching for "macy's" and searching for "macys" is different
- currently I force "macy's" to "macys"

* Searching for "at&t" gets converted to "at", "t", which are
both stop words - I am forced to convert at&t => att before indexing and
querying
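
For reference, my current manual step is roughly equivalent to this Python sketch (simplified, with made-up names; it assumes the third item refers to the brand "at&t"):

```python
# Whole-term replacements applied before the query is sent to Solr.
REWRITES = {"macy's": "macys", "at&t": "att"}

def rewrite_query(query):
    """Lowercase the query, apply the term replacements, and exclude
    rentals when the user searches for cars without mentioning rent."""
    terms = [REWRITES.get(t, t) for t in query.lower().split()]
    if "car" in terms and "rent" not in terms:
        terms.append("-rent")
    return " ".join(terms)

for q in ["car", "car rent", "Macy's", "AT&T plans"]:
    print(q, "=>", rewrite_query(q))
```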

 

Is there a nice way to handle these or will I always need to resort to
manual fixes for these?

 

Cheers

David