from:"Peter Cline"

Custom fieldtype with sharding?

2011-03-10 Thread Peter Cline


Hi all,
I'm having an issue with using a custom fieldtype with distributed 
search.  It may be the case that what I'm looking for could be 
accomplished in a different way, but this is my first stab at it.


I'm looking to store XML in a field.  What I've done, which works fine, 
is to:

- on ingest, wrap the XML in a CDATA section
- write a simple class that extends org.apache.solr.schema.TextField and 
writes an XML node much the way a TextField would, but without escaping 
the contents


It looks like this:
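import java.io.IOException;
import java.io.Writer;
import org.apache.lucene.document.Fieldable;
import org.apache.solr.response.TextResponseWriter; // org.apache.solr.request in Solr 1.4
import org.apache.solr.schema.TextField;

public class XMLField extends TextField {
  @Override
  public void write(TextResponseWriter xmlWriter, String name, Fieldable f)
      throws IOException {
    Writer writer = xmlWriter.getWriter();
    String value = f.stringValue();
    // The markup inside the two string literals was stripped by the list
    // archive; "xml" is only a stand-in for the lost element name.
    writer.write("<xml name=\"" + name + "\">");
    if (value != null) {
      writer.write(value);  // written raw, deliberately not escaped
    }
    writer.write("</xml>");
  }
}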

Like I said, simple.  Not especially pretty, but it does the job.  Works 
fine for normal searching, I get back a response like:



When I try to use this with distributed searching, though, it comes back 
written as a normal textfield, like:



It looks like it doesn't know anything about my custom fieldtype at all, 
and is defaulting to writing it as a StrField or TextField instead.


So, my question:
- is there a better way to do this?  I'd be fine if it came back with a 
'str' element name, as long as it's not escaped.
- is there perhaps a different class I should extend to do this with 
sharded searching?
- should I just bite the bullet and manually unescape the xml after 
receiving the response?  I'd really prefer not to do this if I can get 
around it.


Thanks in advance for any help.

Peter
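On the third option (unescaping after receiving the response): if the client reads the response with a standard XML parser, the parser already undoes the entity escaping, so no hand-written unescape step is needed. The following is a minimal sketch under that assumption; the class name, the `xml_field` field name, and the response fragment are made up for illustration and are not taken from an actual Solr response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedXmlField {
    /** Parses an XML string into a DOM document (checked exceptions are wrapped). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the text of the first str element; the parser has already unescaped it. */
    public static String firstStrValue(String response) {
        return parse(response).getElementsByTagName("str").item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical response fragment: the stored XML arrives entity-escaped.
        String response = "<doc><str name=\"xml_field\">"
                + "&lt;record&gt;&lt;title&gt;Test&lt;/title&gt;&lt;/record&gt;"
                + "</str></doc>";
        String stored = firstStrValue(response);  // the original, unescaped XML text
        Document inner = parse(stored);           // a second parse turns it into a DOM
        System.out.println(stored);
        System.out.println(inner.getDocumentElement().getTagName());
    }
}
```

The cost is a second parse per field value on the client, which may or may not be acceptable depending on result sizes.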

Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline


On 02/24/2011 02:58 PM, Peter Cline wrote:

On 02/24/2011 12:37 PM, Yonik Seeley wrote:
On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline wrote:

Hi all,

I'm having a problem using distributed search in conjunction with the
facet.offset parameter and lexical facet value sorting.  Is there an
incompatibility between these?  I'm using Solr 1.4.1.

I have a facet with ~100k values in one index.  I want to page through
them alphabetically.  When not using distributed search, everything works
just fine, and very quickly.  A query like this works, returning 10 facet
values starting at the 50,001st:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is the
same index), I get no facet values returned.

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get an
empty list any time facet.offset >= facet.limit.  Also, for example, if I
do facet.limit=100 and facet.offset=90, I get 10 facet values.  Doing so
without sharding, I get the expected (by me, at least) 100 values (starting
at what would normally be the 91st).

Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?


-Yonik
http://lucidimagination.com


I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter


I tried today's builds for the 3.x branch and the trunk.  The problem 
persists in both.


Peter

Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline


On 02/24/2011 12:37 PM, Yonik Seeley wrote:

On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline  wrote:

Hi all,

I'm having a problem using distributed search in conjunction with the
facet.offset parameter and lexical facet value sorting.  Is there an
incompatibility between these?  I'm using Solr 1.4.1.

I have a facet with ~100k values in one index.  I want to page through
them alphabetically.  When not using distributed search, everything works
just fine, and very quickly.  A query like this works, returning 10 facet
values starting at the 50,001st:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is the
same index), I get no facet values returned.

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get an
empty list any time facet.offset >= facet.limit.  Also, for example, if I
do facet.limit=100 and facet.offset=90, I get 10 facet values.  Doing so
without sharding, I get the expected (by me, at least) 100 values (starting
at what would normally be the 91st).

Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?

-Yonik
http://lucidimagination.com


I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter

facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline


Hi all,

I'm having a problem using distributed search in conjunction with the 
facet.offset parameter and lexical facet value sorting.  Is there an 
incompatibility between these?  I'm using Solr 1.4.1.


I have a facet with ~100k values in one index.  I want to page 
through them alphabetically.  When not using distributed search, 
everything works just fine, and very quickly.  A query like this works, 
returning 10 facet values starting at the 50,001st:


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is 
the same index), I get no facet values returned.


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get 
an empty list any time facet.offset >= facet.limit.  Also, for 
example, if I do facet.limit=100 and facet.offset=90, I get 10 facet 
values.  Doing so without sharding, I get the expected (by me, at least) 
100 values (starting at what would normally be the 91st).


Can anybody shed any light on this for me?

Thanks,
Peter
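For reference, the paging behaviour expected from a distributed lexical facet can be sketched as follows: every shard has to supply its first offset + limit values, and the offset is applied only once, after the per-shard counts have been merged. This is an illustration of those semantics only, not Solr's actual code; the class name, method names, and sample counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LexFacetPaging {
    /** Number of values each shard must return so the merged page is complete. */
    public static int shardLimit(int offset, int limit) {
        return offset + limit;
    }

    /** Merges per-shard counts, sorts values lexically, returns [offset, offset + limit). */
    public static List<String> page(List<Map<String, Integer>> shardCounts, int offset, int limit) {
        TreeMap<String, Integer> merged = new TreeMap<>();  // TreeMap keeps keys in lexical order
        for (Map<String, Integer> shard : shardCounts) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> out = new ArrayList<>();
        int index = 0;
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            if (index >= offset && out.size() < limit) {
                out.add(e.getKey() + " (" + e.getValue() + ")");
            }
            index++;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> shardA = new HashMap<>();
        shardA.put("Bats", 2);
        shardA.put("Bees", 1);
        Map<String, Integer> shardB = new HashMap<>();
        shardB.put("Bees", 4);
        shardB.put("Butterflies - Indiana", 3);
        // Page of 2 values starting at the 2nd merged value.
        System.out.println(page(Arrays.asList(shardA, shardB), 1, 2));
        System.out.println("per-shard limit: " + shardLimit(1, 2));
    }
}
```

If the shard requests keep the original facet.limit instead of offset + limit, the merged list holds at most facet.limit values before the offset is applied, which would be consistent with the empty result for offset >= limit and with the 10 values seen for limit=100, offset=90.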

Re: Question about facet.prefix usage

2008-10-27 Thread Peter Cline

Hi Simon,
I came across your post to the solr users list about using facet
prefixes, shown below. I was wondering if you were still using your
modified version of SimpleFacets.java, and if so -- if you could send me
a copy. I'll need to implement something similar, and it never hurts to
start from existing material.

Thanks,
Peter

Simon Hu wrote:

I also need the exact same feature. I was not able to find an easy solution
and ended up modifying class SimpleFacets to make it accept an array of
facet prefixes per field. If you are interested, I can email you the
modified SimpleFacets.java.

-Simon

steve berry-2 wrote:

Question: Is it possible to pass complex queries to facet.prefix?
Example instead of facet.prefix:foo I want facet.prefix:foo OR
facet.prefix:bar

My application is for browsing business records that fall into
categories. The user is only allowed to see businesses falling into
categories which they have access to.

I have a series of documents dumped into the following basic structure
which I was hoping would help me deal with this:

123
Business Corp.
28255-0001
.
charlotte_2006 Banks
charlotte_2007 Banks
sanfrancisco_2006 Banks
sanfrancisco_2007 Banks
... (lots more market_category entries) ...

124
Factory Corp.
28205-0001
.
charlotte_2006 Banks
charlotte_2007 Banks
austin_2006 Banks
austin_2007 Banks
... (lots more market_category entries) ...

The multivalued market_category fields are flattened relational data
attributed to that business and I want to use those values for faceted
navigation /but/ I want the facets to be restricted depending on what
products the user has access to. For example a user may have access to
sanfrancisco_2007 and sanfrancisco_2006 data but nothing else.

So I've created a request using facet.prefix that looks something like
this:
http://SOLRSERVER:8080/solr/select?q.op=AND&q=docType:gen&facet.field=market_category&facet.prefix=charlotte_2007

This ends up producing perfectly suitable facet results that look like
this:
..

1
1
1

1
1
1
0

Bingo! facet.prefix does exactly what I want it to.

Now I want to go a step further and pass a compound statement to the
facet.prefix along the lines of "facet.prefix:charlotte_2007 OR
sanfrancisco_2007" or "facet.prefix:charlotte_2007 OR charlotte_2006" to
return more complex facet sets. As far as I can tell looking at the docs
this won't work.

Is this possible using the existing facet.prefix functionality? Anyone
have a better idea of how I should accomplish this?

Thanks,
steve berry
American City Business Journals
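One workaround that avoids patching SimpleFacets is to do the OR on the client: request the facet without facet.prefix (or once per prefix) and keep only the values that start with one of the prefixes the user may see. The sketch below assumes the facet counts have already been read into a map; the class name, method name, and counts are hypothetical, and fetching all values first may be too expensive for a field with many distinct values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiPrefixFacets {
    /** Keeps the facet values that start with at least one of the given prefixes. */
    public static Map<String, Integer> filterByPrefixes(Map<String, Integer> facetCounts,
                                                        List<String> prefixes) {
        Map<String, Integer> kept = new LinkedHashMap<>();  // preserves the incoming order
        for (Map.Entry<String, Integer> e : facetCounts.entrySet()) {
            for (String prefix : prefixes) {
                if (e.getKey().startsWith(prefix)) {
                    kept.put(e.getKey(), e.getValue());
                    break;  // one matching prefix is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical counts for the market_category facet.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("austin_2006 Banks", 1);
        counts.put("charlotte_2006 Banks", 2);
        counts.put("charlotte_2007 Banks", 2);
        counts.put("sanfrancisco_2007 Banks", 1);
        // Equivalent of "charlotte_2007 OR sanfrancisco_2007".
        System.out.println(filterByPrefixes(counts,
                Arrays.asList("charlotte_2007", "sanfrancisco_2007")));
    }
}
```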

uriEncoding for solr in glassfish

2008-03-13 Thread Peter Cline


Hi all,

This is a little off-topic, so I apologize.  I asked a question not too 
long ago about uri encoding problems, and got a quick and accurate 
response, so I thought I would try again.


I need to pass UTF-8 encoded characters to Solr instances, so I need the 
URI encoding to be done in UTF-8.  In Tomcat, this was accomplished by 
setting an attribute of the Connector (thanks Nicolas and Yonik).  
We're considering moving from Tomcat to Glassfish (for various reasons), 
so I'm trying to get this working there as well.  I found a very similar 
setting, the uriEncoding property of the http-listener, but it 
doesn't seem to have any effect--Solr is getting garbled strings.


So, in effect, my question is this: has anybody used solr in glassfish 
and had to address this problem?


Seems unlikely, but it's worth a shot. 


Thanks,
Peter

Re: Accented search

2008-03-11 Thread Peter Cline

I'm not sure about a way to boost scores in this case, but you can 
achieve the basic matching by applying a filter to the index and the 
queries.  The ISOLatin1Accent Filter seems like it may work for you, 
though I'm not entirely certain if that will cover all the accent 
characters you need.


My approach has been to write new filters, one to normalize the unicode 
into the "decomposed" version, then one to manually strip out all of the 
"add-on" characters (with decimal codepoint greater than 256).  I don't 
know if this will always work, but it's worked well for me so far.
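The two steps can be sketched in plain Java, outside of any Lucene or Solr filter class. This is only an illustration of the idea described above (decompose, then drop every code point above 256), not the actual filters; the class and method names are hypothetical. Letters that have no decomposition, such as the Vietnamese đ, would simply be dropped by this rule.

```java
import java.text.Normalizer;

public class AccentStripper {
    /** Decomposes the input (NFD) and drops every code point above 256. */
    public static String stripAccents(String input) {
        // Step 1: split accented letters into base letter + combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Step 2: keep only the low code points; combining marks start at U+0300.
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); ) {
            int cp = decomposed.codePointAt(i);
            if (cp <= 256) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Lập Trình Viên", written with escapes to stay independent of source encoding.
        System.out.println(stripAccents("L\u1EADp Tr\u00ECnh Vi\u00EAn"));
    }
}
```

Applying the same function at index time and at query time is what makes accented and unaccented forms match each other.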


I would test out adding the ISOLatin1Accent filter 
to your analyzer.  It might do the trick.  Once again, with this 
approach I'm not sure how to boost either score, so someone else may 
have better ideas.  I'm pretty new to all of this stuff.


Peter

climbingrose wrote:

Hi guys,

I'm running into some problems with accented (UTF-8) languages. I'd love to
hear some ideas about how to use Solr with those languages. Basically, I
want to achieve what Google did with UTF-8 languages.

My requirements include:
1) Accent insensitive search and proper highlighting:
  For example, we have 2 documents:

  Doc A (title:Lập Trình Viên)
  Doc B (title:Lap Trinh Vien)

  if the user enters "Lập Trình Viên", then Doc B is also matched and "Lập
Trình Viên" is highlighted.
  On the other hand, if the query is "Lap Trinh Vien", Doc A is also
matched.
2) Assign proper scores to accented or non-accented searches:
  if the user enters "Lập Trình Viên", then Doc A should be given a higher
score than Doc B.
  if the query is "Lap Trinh Vien", Doc B should be given a higher score.

Any ideas guys? Thanks in advance!

Re: Illegal xml/html character; unicode problems near solr

2008-03-07 Thread Peter Cline


Nicolas and Yonik,

Thank you both for your excellent responses--this fixed my problem.  Now 
it's time to go back and remove all the hacks I was using to pin this 
thing together without proper utf-8 support. 


Thanks again,
Peter

[EMAIL PROTECTED] wrote:

I think Tomcat defaults to the operating system default, e.g. cp1252 on a
classic windows.

You need to add an attribute URIEncoding="UTF-8" to the Connector you use in
the server.xml conf.

Nicolas

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Friday, March 7, 2008 18:53
To: solr-user@lucene.apache.org
Subject: Re: Illegal xml/html character; unicode problems near solr

On Fri, Mar 7, 2008 at 12:30 PM, Peter Cline <[EMAIL PROTECTED]> wrote:
  

 The following is a snippet of a link to use a facet:
 search-faceted.html?q=[* TO
 *]&facet=true&rows=25&fq=name_facet:"Brasseur de
 Bourbourg, abb%C3%A9, 1814-1874, former owner""

 These characters are correctly specified. When it returns, I get an
 illegal character error. Examining the XML, I get an fq value of:
 name_facet:"Brasseur de Bourbourg, abbÃƒÂ(c), 1814-1874, former owner"



Is this bad XML part of the responseHeader (parameters that are simply
being echoed back)?
If so, it's most likely the config on whatever servlet container you
are using... you need to configure it to accept UTF-8 URLs rather than
latin-1 (Tomcat defaults to the old-style latin-1 AFAIK)

-Yonik
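The effect described here can be reproduced outside of any servlet container. The sketch below decodes the same percent-encoded value from the example once as UTF-8 and once as ISO-8859-1 (latin-1); it only demonstrates the charset mismatch and does not show how Tomcat itself is implemented. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {
    /** Percent-decodes a value using the given charset name. */
    public static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "abb%C3%A9";
        // Bytes C3 A9 read as UTF-8 form the single character U+00E9 (e with acute).
        System.out.println(decode(encoded, "UTF-8"));
        // Read as ISO-8859-1, each byte becomes its own character: U+00C3, U+00A9.
        System.out.println(decode(encoded, "ISO-8859-1"));
    }
}
```

If the already-garbled string is later encoded and mis-decoded a second time, even more stray characters appear, which would be consistent with the value seen in the responseHeader.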

Illegal xml/html character; unicode problems near solr

2008-03-07 Thread Peter Cline


Hi all,

I'm new to the list, but I've been struggling with this problem for some 
time. I'm getting Illegal xml/html character errors and I'm trying to 
track down the source. The characters in question seem to be in the 
128-159 (decimal) range, which is illegal in XML. The characters are 
mostly diacritics and other types of accents.


The original data is encoded in UTF-8. I have verified that the data 
doesn't contain any of these characters prior to indexing, and when I 
get the records in question back in a list of results, they display 
fine. The problem arises when the characters occur in a facet value and 
I try to pass it through the URL.


As an example, consider a facet value:
Brasseur de Bourbourg, abb%C3%A9, 1814-1874, former owner

The %C3%A9 is the URL-encoded UTF-8 form of é (e with an acute accent), so the value reads abbé.

The following is a snippet of a link to use a facet:
search-faceted.html?q=[* TO 
*]&facet=true&rows=25&fq=name_facet:"Brasseur de 
Bourbourg, abb%C3%A9, 1814-1874, former owner""


These characters are correctly specified. When it returns, I get an 
illegal character error. Examining the XML, I get an fq value of:

name_facet:"Brasseur de Bourbourg, abbÃÂ©, 1814-1874, former owner"

I'm not sure how that will display in the email, but in short, it's not 
what I put in. Further, it's not legal html and things break.


Does anyone have any thoughts about this? I apologize if this has been 
asked somewhere in the past, but I did some digging and couldn't come up 
with anything. I welcome any input.


Regards,

Peter


Peter Cline, Digital Library Applications Programmer
University of Pennsylvania Library
email: pcline at pobox dot upenn dot edu
