Case-insensitive on facet prefix

2012-07-12 Thread Nestor Oviedo
Hello all.
I have a field configured with LowerCaseFilterFactory as the only
filter in its analyzer chain (for both indexing and searching). The problem
is that facet.prefix doesn't work on that field as expected.
For example:
Indexed term: house --> LowerCaseFilterFactory applied

facet.prefix=hou --> returns a "house" entry as expected
facet.prefix=Hou --> no match

I suppose the LowerCaseFilterFactory is not being applied to the prefix term.

So... is this the expected behavior? How can I perform a facet with
a case-insensitive prefix?
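A client-side workaround sketch (the class name is hypothetical, not from this thread): facet.prefix is matched as-is against the indexed terms, which have already been lowercased, so no analyzer runs on the prefix at query time. Lowercasing the prefix before sending it restores the expected behavior:

```java
import java.util.Locale;

// Hypothetical helper: facet.prefix is compared byte-for-byte against the
// indexed (already lowercased) terms, so lowercase it on the client side
// before building the request.
public class FacetPrefixNormalizer {
    public static String normalize(String userPrefix) {
        return userPrefix.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // facet.prefix=Hou  ->  facet.prefix=hou
        System.out.println(normalize("Hou")); // prints "hou"
    }
}
```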

Thanks in advance

Nestor

SeDiCI - http://sedici.unlp.edu.ar
PrEBi - http://prebi.unlp.edu.ar
Universidad Nacional de La Plata
La Plata, Buenos Aires, Argentina


Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Nestor Oviedo
Hi everyone!
I've been following this thread, and I realized we've built something
similar to "Crawl Anywhere". The main difference is that our project is
oriented toward the digital libraries and digital repositories context,
specifically metadata collection from multiple sources, information
enrichment, and storage in multiple destinations.
For now, I can only share an article about the project, because the code is
still on our development machines and test servers. If everything goes well,
we plan to make it open source in the near future.
I'd be glad to hear your comments and opinions about it. There is no need to
be polite.
Thanks in advance.

Best regards.
Nestor



On Wed, Mar 2, 2011 at 11:46 AM, Dominique Bejean  wrote:

> Hi,
>
> No, it doesn't. It appears to be an Apache HttpClient 3.x limitation.
> https://issues.apache.org/jira/browse/HTTPCLIENT-579
>
> Dominique
>
> On 02/03/11 15:04, Thumuluri, Sai wrote:
>
>> Dominique, does your crawler support NTLM2 authentication? We have content
>> under SiteMinder, which uses NTLM2, and that is posing challenges with Nutch.
>>
>> -Original Message-
>> From: Dominique Bejean [mailto:dominique.bej...@eolya.fr]
>> Sent: Wednesday, March 02, 2011 6:22 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: [ANNOUNCE] Web Crawler
>>
>> Aditya,
>>
>> The crawler is not open source and won't be in the near future. Anyway,
>> I have to change the license wording, because it can be used for any
>> personal or commercial project.
>>
>> Sincerely,
>>
>> Dominique
>>
>> On 02/03/11 10:02, findbestopensource wrote:
>>
>>> Hello Dominique Bejean,
>>>
>>> Good job.
>>>
>>> We identified about 8 open source web crawlers:
>>> http://www.findbestopensource.com/tagged/webcrawler . I don't know how
>>> different yours would be from the rest.
>>>
>>> Your license states that it is not open source, but that it is free for
>>> personal use.
>>>
>>> Regards
>>> Aditya
>>> www.findbestopensource.com
>>>
>>>
>>> On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean
>>> mailto:dominique.bej...@eolya.fr>>  wrote:
>>>
>>> Hi,
>>>
>>> I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java
>>> web crawler. It includes:
>>>
>>>   * a crawler
>>>   * a document processing pipeline
>>>   * a solr indexer
>>>
>>> The crawler has a web administration interface for managing the web
>>> sites to be crawled. Each web site crawl is configured with many
>>> possible parameters (not all mandatory):
>>>
>>>   * number of simultaneous items crawled by site
>>>   * recrawl period rules based on item type (html, PDF, ...)
>>>   * item type inclusion / exclusion rules
>>>   * item path inclusion / exclusion / strategy rules
>>>   * max depth
>>>   * web site authentication
>>>   * language
>>>   * country
>>>   * tags
>>>   * collections
>>>   * ...
>>>
>>> The pipeline includes various ready-to-use stages (text
>>> extraction, language detection, a Solr-ready XML writer, ...).
>>>
>>> Everything is very configurable and extensible, either by scripting
>>> or Java coding.
>>>
>>> With scripting, you can help the crawler handle JavaScript
>>> links, or help the pipeline extract the relevant title
>>> and clean up the HTML pages (removing menus, headers, footers, ...).
>>>
>>> With Java coding, you can develop your own pipeline stages.
>>>
>>> The Crawl Anywhere web site provides good explanations and
>>> screenshots. Everything is documented in a wiki.
>>>
>>> The current version is 1.1.4. You can download and try it out
>>> here: www.crawl-anywhere.com
>>>
>>>
>>> Regards
>>>
>>> Dominique
>>>
>>>
>>>


TermsComponent prefix query with field analyzers

2010-12-02 Thread Nestor Oviedo
Hi everyone
Does anyone know how to apply analyzers to a prefix query?
What I'm looking for is a way to build an autosuggest using the
TermsComponent that can remove accents from the
query's prefix.
For example, I have the term "analisis" in the index and I want to
retrieve it with the prefix "Análi" (notice the accent in the third
letter).
I don't think a regular expression will help here, so I was wondering
whether specifying some analyzers (LowerCase and ASCIIFolding) in the
TermsComponent configuration would get them applied to the prefix.
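One workaround sketch (not from this thread; the class name is hypothetical) is to fold the prefix on the client before calling the TermsComponent, approximating LowerCaseFilter plus ASCIIFoldingFilter for Latin accents with java.text.Normalizer:

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical client-side fold approximating LowerCaseFilterFactory +
// ASCIIFoldingFilterFactory for Latin scripts: decompose to NFD, strip
// combining marks, then lowercase.
public class PrefixFolder {
    public static String fold(String prefix) {
        String decomposed = Normalizer.normalize(prefix, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Análi" -> "anali", which now matches the indexed term "analisis"
        System.out.println(fold("Análi")); // prints "anali"
    }
}
```

Note this only covers accents that decompose to a base letter plus combining mark; Lucene's ASCIIFoldingFilter handles more cases (e.g. ligatures), so treat this as an approximation.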

Thanks in advance.
Nestor


Re: facet.field problem in SolrParams to NamedList

2009-12-16 Thread Nestor Oviedo
Hi Hoss
I changed my code to use the AppendedSolrParams and it worked perfectly.

I'm opening a bug today with my simple test case.

Thank you very much for your help
Regards
Nestor




On Tue, Dec 15, 2009 at 6:39 PM, Chris Hostetter
 wrote:
>
> : E.g.: "q=something field:value" becomes "q=something value&fq=field:value"
> :
> : To do this, in the createParser method, I apply a regular expression
> : to the qstr param to obtain the fq part, and then I do the following:
> :
> : NamedList paramsList = params.toNamedList();
> : paramsList.add(CommonParams.FQ, generatedFilterQuery);
> : params = SolrParams.toSolrParams(paramsList);
> : req.setParams(params);
>        ...
> : SolrParams.toNamedList() was saving the array correctly, but the method
> : SolrParams.toSolrParams(NamedList) was doing:
> : "params.getVal(i).toString()". So, it always loses the array.
>
> I'm having trouble thinking through exactly where the problem is being
> introduced here ... ultimately what it comes down to is that the NamedList
> shouldn't be containing a String[] ... it should be containing multiple
> string values with the same name ("fq")
>
> It would be good to make sure all of these methods play nicely with one
> another so some round trip conversions worked as expected -- so if you
> could open a bug for this with a simple example test case that would be
> great, ...but...
>
> for your purposes, I would skip the NamedList conversion altogether,
> and just use AppendedSolrParams...
>
>  Map<String,String> map = new HashMap<String,String>();
>  map.put("fq", generatedFilterQuery);
>  map.put("q", generatedQueryString);
>  req.setParams(new AppendedSolrParams(new MapSolrParams(map), originalParams));
>
> -Hoss
>
>


facet.field problem in SolrParams to NamedList

2009-12-15 Thread Nestor Oviedo
Hi!
I wrote a subclass of DisMaxQParserPlugin that adds a little filter for
processing the "q" param and generating an "fq" param.
E.g.: "q=something field:value" becomes "q=something value&fq=field:value"

To do this, in the createParser method, I apply a regular expression
to the qstr param to obtain the fq part, and then I do the following:

NamedList paramsList = params.toNamedList();
paramsList.add(CommonParams.FQ, generatedFilterQuery);
params = SolrParams.toSolrParams(paramsList);
req.setParams(params);
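The regular-expression step itself isn't shown in the thread; a hypothetical sketch of that extraction (the pattern, class, and method names are my assumptions, chosen to match the example transformation above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pull "field:value" clauses out of the user query,
// keeping only the value part in q and collecting the clauses for fq.
public class FilterQueryExtractor {
    private static final Pattern CLAUSE = Pattern.compile("\\b\\w+:\\S+");

    // Returns { remaining q, generated fq }.
    public static String[] split(String qstr) {
        Matcher m = CLAUSE.matcher(qstr);
        StringBuilder fq = new StringBuilder();
        StringBuffer q = new StringBuffer();
        while (m.find()) {
            if (fq.length() > 0) fq.append(' ');
            fq.append(m.group());
            // keep only the value part in q, as in the example above
            String value = m.group().substring(m.group().indexOf(':') + 1);
            m.appendReplacement(q, Matcher.quoteReplacement(value));
        }
        m.appendTail(q);
        return new String[] { q.toString().trim(), fq.toString() };
    }

    public static void main(String[] args) {
        String[] parts = split("something field:value");
        System.out.println("q=" + parts[0] + "&fq=" + parts[1]);
        // prints "q=something value&fq=field:value"
    }
}
```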

The problem is when I include two "facet.field" params in the request. In the
results (facets section) it prints "[Ljava.lang.String;@c77a748",
which is the result of a toString() on a String[].

So, digging a little deeper into the code, I saw that the method
SolrParams.toNamedList() was saving the array correctly, but the method
SolrParams.toSolrParams(NamedList) was doing
"params.getVal(i).toString()", so it always loses the array.

Something similar occurs with the methods SolrParams.toMap() and
SolrParams.toMultiMap().

Is this a bug?
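The symptom above can be reproduced without any Solr classes; it is simply what happens whenever a String[] is flattened through Object.toString() (class and method names below are illustrative only):

```java
// Minimal standalone illustration: calling toString() on a String[] (as
// SolrParams.toSolrParams(NamedList) effectively did for multi-valued
// params) yields the JVM's array identity string, not the values.
public class ArrayToStringDemo {
    public static String flatten(Object val) {
        return val.toString();
    }

    public static void main(String[] args) {
        String[] facetFields = { "category", "inStock" };
        // prints something like "[Ljava.lang.String;@c77a748"
        System.out.println(flatten(facetFields));
    }
}
```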

Thanks.
Nestor


Re: param "version" and differences in /admin/ping response

2009-11-26 Thread Nestor Oviedo
Thank you, Chris... I didn't see it. I was looking for something related
to the PingRequestHandler.
Regards.
Nestor Oviedo

On Wed, Nov 25, 2009 at 7:09 PM, Chris Hostetter
 wrote:
> : Hi everyone!
> : Can anyone tell me what's the meaning of the param "version"? There
> : isn't anything about it in the Solr documentation.
>
> http://wiki.apache.org/solr/XMLResponseFormat#A.27version.27
>
> -Hoss
>
>


param "version" and differences in /admin/ping response

2009-11-25 Thread Nestor Oviedo
Hi everyone!
Can anyone tell me what's the meaning of the param "version"? There
isn't anything about it in the Solr documentation.

When I invoke the /admin/ping URL, if the version value is between 0
and 2.1, the response looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">5</int>
 <lst name="params">
  <str name="echoParams">all</str>
  <str name="rows">10</str>
  <str name="echoParams">all</str>
  <str name="q">solrpingquery</str>
  <str name="qt">standard</str>
  <str name="version">2.1</str>
 </lst>
</lst>
<str name="status">OK</str>
</response>

And when the version value is anything outside that range, the
response looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">4</int>
 <lst name="params">
  <str name="echoParams">all</str>
  <str name="rows">10</str>
  <str name="echoParams">all</str>
  <str name="q">solrpingquery</str>
  <str name="qt">standard</str>
 </lst>
</lst>
<str name="status">OK</str>
</response>

Thanks.
Regards
Nestor Oviedo