Re: Dismax + Dynamic fields
Norberto Meijome wrote:
> Thanks Yonik. OK, that matches what I've seen - if I know the actual
> name of the field I'm after, I can use it in a query, but I can't use
> the dynamic_field_name_* (with wildcard) in the config.
>
> Is adding support for this something that is desirable / needed
> (doable??), and is it being worked on?

You can use a wildcard with copyField to copy the dynamic fields that match the pattern into another field that you can then query on. It seems like that would cover your needs, no?

Daniel
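For reference, a minimal schema.xml sketch of that approach (the pattern and field names here are made up for illustration):

```xml
<!-- the dynamic fields matching a wildcard pattern (hypothetical names) -->
<dynamicField name="attr_*" type="text" indexed="true" stored="true"/>

<!-- a concrete field that collects everything matching the pattern -->
<field name="attr_all" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- copyField accepts a wildcard source -->
<copyField source="attr_*" dest="attr_all"/>
```

Since dismax's qf parameter wants concrete field names, attr_all is something you can actually list in qf, which is what the wildcard itself couldn't give you.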
Re: expression in an fq parameter fails
Ezra Epstein wrote:
> storeAvailableDate:[* TO NOW]
> storeExpirationDate:[NOW TO *]
> ...
> This works perfectly. The only trouble is that the two date fields may
> actually be empty, in which case this filters out such records, and we
> want to include them.

I think the easiest thing to do would be one of the following: use a zero-date for storeAvailableDate and an infinity-date for storeExpirationDate, instead of leaving them empty, for things you want to be always available or never expiring (if I've understood your problem); or add another field, alwaysAvailable or neverExpiring, and then do an OR off of that. Maybe that's cheating?

HTH,
Daniel
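If reindexing with sentinel dates isn't attractive, another option is to match documents where the field is simply missing. In Solr's Lucene query syntax the usual idiom for "field is empty" is a negative clause anchored by *:* (a sketch, using the field names from the post above):

```
fq=(storeAvailableDate:[* TO NOW] OR (*:* -storeAvailableDate:[* TO *]))
fq=(storeExpirationDate:[NOW TO *] OR (*:* -storeExpirationDate:[* TO *]))
```

One caveat: NOW changes every millisecond, so filters like these won't be reused from the filter cache; rounding (e.g. NOW/DAY) can help if your granularity allows it.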
Re: Fwd: Grouping products
Vender Livre wrote:
> But it can find the most probable product, can't it? Is there a library
> or tool that does something like that? Someone told me SOLR would solve
> this problem.

I wouldn't say Solr would solve this problem... sounds like someone sold you snake oil! If you want to use Solr, I think your best bet is to use a nightly build and run a MoreLikeThis query - http://wiki.apache.org/solr/MoreLikeThis - but whether that's going to work well for you with so few terms, I have no idea.

Good luck!
Daniel
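To give a feel for it, a MoreLikeThis request looks roughly like this (field names and document ID are hypothetical; see the wiki page above for the real parameter list):

```
http://localhost:8983/solr/select?q=id:1234&mlt=true&mlt.fl=name,description&mlt.mintf=1&mlt.mindf=1
```

With very few terms per document, lowering mlt.mintf and mlt.mindf from their defaults is usually necessary, or nothing will qualify as an "interesting" term.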
Re: Extending XmlRequestHandler
Alexander Ramos Jardim wrote:
> Ok, thanks for the advice! I got the XmlRequestHandler code. I see it
> uses StAX right on the XML it gets. There isn't anything to plug in or
> out to get an easy way to change the XML format.

To maybe save you from reinventing the wheel: when I asked a similar question a couple of weeks back, hossman pointed me towards SOLR-285 and SOLR-370. 285 does XSLT, 370 does STX.

Daniel
Re: SOLR-470 & default value in schema with NOW (update)
Chris Hostetter wrote:
> The two exceptions you cited both indicate there was at least one date
> instance with no millis included -- NOW can't do that. it always
> includes millis (even though it shouldn't).

I've seen people suggest, for performance reasons, reducing the granularity of the timestamps they store down to what they need - i.e. minute, hour, or day, instead of millisecond. But it seems that functionality will break if you don't store the millis. I'm just trying to make sure I'm reconciling these: is the goal of reducing the granularity simply to reduce the cardinality of the indexed date terms? If so, when you don't need significance beyond the date, is the best practice just to fill the rest of the date with zeros and index, say, 2008-07-05T00:00:00.000Z?

(Hope this doesn't count as a threadjack!)

Daniel
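For what it's worth, Solr's date math syntax can do the rounding on both sides, and NOW/DAY rounds down to midnight (with .000 millis), so zero-filled indexed values and rounded query endpoints line up. A sketch, with a hypothetical field name:

```
at index time, send the value already rounded (millis included):
  <field name="timestamp">2008-07-05T00:00:00.000Z</field>

at query time, round NOW the same way:
  timestamp:[NOW/DAY-7DAYS TO NOW/DAY]
```

Rounding at query time has the side benefit that the filter is identical for a whole day and can be reused from the filter cache.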
Re: XSLT transform before update?
Shalin Shekhar Mangar wrote:
> Hi Daniel,
>
> Maybe if you can give us a sample of what your XML looks like, we can
> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> the use-cases we have encountered so far are solvable using the
> XPathEntityProcessor in DataImportHandler without using XSLT; for
> details look at
> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476

I think even if it is possible to use SOLR-469 for my needs, I'd still prefer the XSLT approach, because it's going to be a bit of configuration either way, and I'd rather it be an XSLT stylesheet than solrconfig.xml. In addition, I haven't yet decided whether I want to apply any patches to the version that we will deploy. If I go down the route of the XSLT transform patch and end up having to back it out, the work to do the transform at the XML source instead would be negligible, whereas going from using the DataImportHandler to not using it at all would leave quite a bit of work ahead of me.

Because both the Solr instance and the XML source are in house, I have the ability to apply the XSLT at the source instead of at Solr. However, different teams of people control the XML source and Solr, so it would require a bit more office coordination to do it on the backend.

The data is a FileMaker XML export (DTD fmresultset), and it looks roughly like this (markup omitted):

    125 / Ford Foundation
        Y5-A / John Smith
        Y5-B / Jane Doe

I'm taking the product of the resultset and the relatedset, using both IDs concatenated as a unique identifier, like so:

    125Y5-A / Ford Foundation / John Smith
    125Y5-B / Ford Foundation / Jane Doe

I can do the transform pretty simply with XSLT. I suppose it is possible to get the DataImportHandler to do this, but I'm not yet convinced that it's easier.

Daniel
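For the archives, a sketch of the kind of stylesheet this transform involves. It assumes fmresultset-style record/field/data markup with hypothetical field names, and it omits the fmresultset namespace handling for brevity, so it is not a drop-in transform:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <add>
      <!-- one Solr doc per (parent record, related record) pair -->
      <xsl:for-each select="//resultset/record">
        <xsl:variable name="pid"   select="field[@name='id']/data"/>
        <xsl:variable name="pname" select="field[@name='name']/data"/>
        <xsl:for-each select="relatedset/record">
          <doc>
            <!-- concatenated IDs as the unique key -->
            <field name="id">
              <xsl:value-of select="concat($pid, field[@name='code']/data)"/>
            </field>
            <field name="name"><xsl:value-of select="$pname"/></field>
            <field name="person">
              <xsl:value-of select="field[@name='person']/data"/>
            </field>
          </doc>
        </xsl:for-each>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>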
XSLT transform before update?
Hey everyone,

I'm experimenting with updating Solr from a remote XML source, using an XSLT transform to get it into the Solr XML syntax, to let me maintain an index. (Yes, I've looked into SOLR-469, but disregarded it, as I need to do quite a bit with XSLT to get the data into something I can index.) I'm looking at using stream.url, but I need to do the XSLT at some point in there, and I would prefer to do it on the client (Solr) side of the transfer, for various reasons.

Is there a way to implement a custom request handler or similar to get Solr to apply an XSLT transform to the content stream before it attempts to parse it? If it's not possible out of the box, where would be the right place to add said functionality?

Thanks much for your help,
Daniel
Re: how to suppress result
Evgeniy Strokin wrote:
> I'm sorry, I didn't explain my case clearly. My index base should stay
> the same. Each time a user runs a query, he wants to suppress his own
> IDs. An example would be a merchant who sells books. He sells only
> fantasy books, and he wants to see all fantasy books in the
> wholesaler's stock except the books he already has in his own stack. So
> he provides a list of books he already has and wants them excluded from
> his search result. So suppression is actually per query (it would be
> better to say per user's session, but since Solr has no sessions, I'd
> say per query). Obviously another book shop has its own book list and
> its own query, and it wants to search and suppress from the same index
> base of the wholesaler.

What I would do is index book-merchant pairs, instead of books and merchants separately. Each document would have the merchant's ID in it, so you can just add an fq clause to exclude the current merchant. It's a far cry from normalized data, but this is an index, not an RDBMS. Denormalize the data into documents, and index that.

Daniel
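To make that concrete, a sketch of the denormalized documents (all field names and IDs here are invented for illustration):

```xml
<add>
  <!-- one document per (book, merchant-who-already-has-it) pair -->
  <doc>
    <field name="id">book42_m7</field>
    <field name="title">Some Fantasy Title</field>
    <field name="genre">fantasy</field>
    <field name="merchant_id">m7</field>
  </doc>
</add>
```

Then a merchant's search just filters himself out, e.g. q=genre:fantasy&fq=-merchant_id:m7, and each merchant gets a different filter against the same index.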
Re: matching exact/whole phrase
Sandeep Shetty wrote:
> Hi people,
>
> I am looking to provide exact phrase matching, along with full text
> search, with Solr. I want to achieve the same effect in Solr rather
> than use a separate SQL query. As an example: the indexed field has the
> text "car repair" (without the double quotes) for a document, and I
> want this document to come up in the search results only if someone
> searches for "car repair". The document should not show up for
> "repair" or "car" searches. Is it possible to do this type of exact
> phrase matching with Solr itself?

It sounds like you want to do an exact string match, not a text match, so I don't think there's anything complex you'd need to do: just store the field as type="string" and do all of the literal searches you want. But if you are working off a field that contains something beyond the exact value you want to search for, you'll need to define a new field type that uses only the analysis filters you need, and you'll have to think more about what that should be.

Daniel
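A sketch of the schema side, with a made-up field name. The string type is indexed verbatim (no tokenization), which is what gives the all-or-nothing matching:

```xml
<!-- "string" indexes the whole value as a single term -->
<field name="title_exact" type="string" indexed="true" stored="true"/>
```

With that, title_exact:"car repair" matches the document, while title_exact:car and title_exact:repair do not, because neither equals the full indexed value.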
Re: Multiple schemas?
tim robertson wrote:
> Hi,
>
> Would I be correct in thinking that for each schema I want, I need a
> new SOLR instance running?

Hey Tim,

Documents aren't required to have all of the fields (it's not a database), so what I would do is just put all of the field definitions in a single schema.xml file. That approach would only be a problem if you needed a field name to mean one thing some of the time and something else at other times. I'd suggest using consistent naming, so that fields named the same way are treated the same way, and then using a single Solr instance.

Daniel
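In other words, something like this (field names invented for the example), where each document simply omits the fields that don't apply to it:

```xml
<!-- one schema.xml serving two kinds of documents -->
<field name="id" type="string" indexed="true" stored="true"/>
<field name="doctype" type="string" indexed="true" stored="true"/>

<!-- only article documents populate this -->
<field name="article_body" type="text" indexed="true" stored="true"/>

<!-- only event documents populate this -->
<field name="event_date" type="date" indexed="true" stored="true"/>
```

A doctype field like the one above also lets you scope queries to one kind of document with a filter such as fq=doctype:article.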
Re: Update schema.xml without restarting Solr?
[EMAIL PROTECTED] wrote:
> Quoting Daniel Papasian <[EMAIL PROTECTED]>:
>> Or if you're adding a new field to the schema (perhaps the most common
>> need for editing schema.xml), you don't need to reindex any documents
>> at all, right? Unless I'm missing something?
>
> Well, it all depends on whether that "field" (not a solr/lucene field)
> exists in the already indexed material but was never indexed. Let's say
> we have a bunch of articles with a field "author" that someone decided
> didn't need to be in the index. But then later he changes his mind and
> adds the author field to the schema. In this case all articles that
> have a populated author field should now be reindexed.

Yeah, I guess the use case I was thinking of was someone who had multiple different types of content in their index (say, articles, events, organizations). When they add a new content type (book review) and find the need to add a new field for that content type (say, publisher) that is only relevant to that type, then, since they're adding it before any data that would have it was indexed, I believe they'd be fine making that schema change without reindexing anything.

I suppose if you add a new dynamic field specification that conflicts with existing fields, reindexing is probably a good idea, but if you're doing that... well, I probably don't want to know.

> I must say that I'm a bit confused by these dynamic fields. Can someone
> tell me if there is any reasonable use of dynamic fields without having
> the "variable type" (for example i for int/sint) in the name?

Well, perhaps this is fulfilling your requirement on a technicality, but there's always higher order types... Offhand, I can think of cases where you might want to define a dynamic field like *_propername or *_cost, and then you'd be able to use fields like author_propername and editor_propername, or book_cost and volume_cost, or what have you.

Daniel
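A sketch of what those "higher order" dynamic fields would look like in schema.xml (assuming a schema that defines the usual string and sfloat types from the example schema):

```xml
<!-- the suffix carries meaning, not a primitive type -->
<dynamicField name="*_propername" type="string" indexed="true" stored="true"/>
<dynamicField name="*_cost" type="sfloat" indexed="true" stored="true"/>
```

Any document field ending in _propername or _cost then gets the matching type and analysis, without each field being declared individually.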
Re: Update schema.xml without restarting Solr?
[EMAIL PROTECTED] wrote:
> Quoting Jeryl Cook <[EMAIL PROTECTED]>:
>
>> 2. Make the "schema.xml" configurable at runtime. Not really sure of
>> the best way to address this, because changing the schema would
>> require "re-indexing" the documents.
>
> Isn't the best way to address this just to leave it to the people who
> integrate Solr into their system? I mean, if a change in the schema
> only affects 1% of all documents, then it's a bad idea to reindex them
> all (at least if the dataset is big).

Or if you're adding a new field to the schema (perhaps the most common need for editing schema.xml), you don't need to reindex any documents at all, right? Unless I'm missing something?

I suppose if you add a new dynamic field specification that conflicts with existing fields, reindexing is probably a good idea, but if you're doing that... well, I probably don't want to know.

Daniel