Re: Best practice: Autosuggest/autocomplete vs. real search

2014-11-10 Thread Jorge Luis Betancourt Gonzalez
Wouldn’t it be easier if the site ensured that only complete terms are
submitted to the actual search? In an app I worked on some time ago, the
default behavior of the JavaScript component used for autocompletion was to
first complete the term in the input and then submit the query to the backend.
I know this is not what you asked for, but could it work? I’m just taking a
shot in the dark here! :-)
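
One way to make sure the autocomplete box only ever offers complete terms that
exist in the main index is to let Solr return indexed terms by prefix. This is
a sketch of a different technique from the EdgeNGram field described below: it
uses faceting with facet.prefix. The handler name /autocomplete and the field
name text (standing in for the main search field) are assumptions.

```xml
<!-- Sketch only: the handler name and the field name are assumptions -->
<requestHandler name="/autocomplete" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">*:*</str>
    <int name="rows">0</int>             <!-- no documents, only facet counts -->
    <str name="facet">true</str>
    <str name="facet.field">text</str>   <!-- assumed main search field -->
    <int name="facet.mincount">1</int>   <!-- only terms that occur in documents -->
    <int name="facet.limit">10</int>
  </lst>
</requestHandler>
```

The JavaScript component would call /autocomplete?facet.prefix=mot and get back
indexed terms such as motor, which it can then submit as the real query. If the
field lowercases its tokens, the prefix has to be lowercased as well. Faceting
on a large tokenized field can be memory hungry, so this would need testing on
real data.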

On Nov 10, 2014, at 8:37 AM, Michael Sokolov msoko...@safaribooksonline.com 
wrote:

 The goal is to ensure that suggestions from autocomplete are actually terms 
 in the main index, so that the suggestions will actually result in matches.  
 You've considered expanding the main index by adding the suggestion n-grams 
 to it, but it would probably be better to alter your suggester so that it 
 produces only tokens that are in the main index.  I think this is basically 
 how all the Suggester implementations are designed to work already; are you 
 using one of those, or are you using the TermsComponent, or something else?
 
 -Mike
 
 On 11/10/14 2:54 AM, Thomas Michael Engelke wrote:
  
  We're using Solr as a backend for an ECommerce site/system. The Solr
 index stores products with selected attributes, as well as a dedicated
 field for autocomplete suggestions (Done via AJAX request when typing in
 the search box without pressing return).
 
 The autosuggest field is supplied by copyField directives from certain
 select product attribute fields (description and/or name mostly). It
 uses EdgeNGramFilterFactory to complete words not yet typed completely,
 and it works quite well.
 
 However, we come across an issue with a disconnect between the
 autosuggest results and results of a normal search, that is, a query
 over the full fields of the product. Let's say there are products that
 are called "motor".
 
 - When autosuggesting, typing "mot" autosuggests all products with
 "motor", because the EdgeNGram created "m", "mo", "mot", "moto" and
 "motor", respectively, and it matches.
 - When searching for "mot", however (i.e. pressing enter when seeing the
 autosuggestions), it doesn't find any products. The autosuggest field is
 not part of the real search, and no product attribute contains "mot"
 as a word.
 
 One obvious solution would be to incorporate the autosuggest field
 into the real search, however, this adds many tokens to the index that
 aren't really part of the products indexed and makes for strange search
 results, for example when an NGram is also a word, but the record itself
 does contain the search term only as part of a word.
 
 Are there clever solutions to this problem?
 



Re: How to choose only one best hit from several ones ?

2014-11-09 Thread Jorge Luis Betancourt Gonzalez
How would you measure which snippet is the best? 

On Nov 9, 2014, at 1:59 PM, SolrUser1543 osta...@gmail.com wrote:

 Let's say that for some query there are several results, each with several
 hits, which are shown in the highlighting section of the response.
 
 Is it possible to select only the single best hit for every result? There is
 the hl.snippets parameter, which controls the number of snippets. hl.snippets=1
 will show the first one, but not necessarily the best one.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-choose-only-one-best-hit-from-several-ones-tp4168416.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search for partial name in Solr 4.x

2014-11-09 Thread Jorge Luis Betancourt Gonzalez
The whole idea behind Solr is to solve the problem you just described. In
particular, you need to define the title field as a solr.TextField and then
define a tokenizer for it. The tokenizer transforms the initial text into
tokens. Solr has several tokenizers, each with its own characteristics; one of
the most common is the StandardTokenizer, but your choice will depend on how
you want to “divide” your initial text into “parts”, or tokens. Put simply,
when you fire a query against Solr, it matches the tokens of your query against
the tokens stored in each of your documents and then outputs a list of matching
documents.

One simple example of a fieldType you could use is:

<fieldType name="text" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In this case the tokenizer will split the initial text into tokens, and then
each token will be lowercased, so when you query you won’t have to worry about
the capitalization of the terms.
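
To put the fieldType to use, the title field has to reference it in schema.xml.
A minimal sketch, assuming the field is simply called title and that the data
is reindexed after the type change:

```xml
<!-- Assumed field name "title"; "text" is the fieldType defined above -->
<field name="title" type="text" indexed="true" stored="true"/>
```

With this in place, a query such as title:rings would match a document titled
"The Lord of the Rings", because matching happens per token. It would not match
a fragment of a word such as "rin"; that would need something like an n-gram
filter.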

Hope it helps

On Nov 9, 2014, at 3:26 PM, PeriS peri.subrahma...@htcinc.com wrote:

 I was wondering if there is a way to search on partial names? Ex; Field is a 
 string and stores values like titles of a book; When searching part of the 
 title may be supplied; How do I resolve this? Please let me know
 
 
 Thanks
 -PeriS
 
 
 
 
 
 
 
 



Re: exporting to CSV with solrj

2014-10-31 Thread Jorge Luis Betancourt Gonzalez
When you fire a query against Solr with wt=csv, the response coming from Solr
is *already* in CSV. The CSVResponseWriter is responsible for translating
SolrDocument instances into CSV on the server side, so I don’t see any reason
to use it yourself; Solr already does the heavy lifting for you.

Regards,

On Oct 31, 2014, at 10:44 AM, tedsolr tsm...@sciquest.com wrote:

 I am trying to invoke the CSVResponseWriter to create a CSV file of all
 stored fields. There are millions of documents so I need to write to the
 file iteratively. I saw a snippet of code online that claimed it could
 effectively remove the SolrDocumentList wrapper and allow the docs to be
 retrieved in the actual format requested in the query. However, I get a null
 pointer from the CSVResponseWriter.write() method.
 
 SolrQuery qry = new SolrQuery("*:*");
 qry.setParam("wt", "csv");
 // set other params
 SolrServer server = getSolrServer();
 try {
   QueryResponse res = server.query(qry);
 
   CSVResponseWriter writer = new CSVResponseWriter();
   Writer w = new StringWriter();
   SolrQueryResponse solrResponse = new SolrQueryResponse();
   solrResponse.setAllValues(res.getResponse());
   try {
     SolrParams list = new MapSolrParams(new HashMap<String, String>());
     writer.write(w, new LocalSolrQueryRequest(null, list), solrResponse);
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
   System.out.print(w.toString());
 
 } catch (SolrServerException e) {
   e.printStackTrace();
 }
 
 NPE snippet:
 org.apache.solr.response.CSVWriter.writeResponse(CSVResponseWriter.java:281)
 org.apache.solr.response.CSVResponseWriter.write(CSVResponseWriter.java:56)
 
 Am I on the right track with the approach? I really don't want to roll my
 own document to CSV line convertor. Thanks!
 Solr 4.9
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/exporting-to-CSV-with-solrj-tp4166845.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR query - restrict access to user documents

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
I see you’re defining a default value for “rows”. This can be overridden on
the request, and requesting a lot of documents from Solr can stress your
server/cluster (if the client in question has that many documents, of course).
If this is a fixed value and clients shouldn’t be able to request more
documents, then I’d consider moving it into the invariants section, which
ensures that the request can’t change this value no matter what. Some time ago
I had a similar use case: we wanted to expose Solr to our clients, and we
eventually ran into problems when some clients requested “all of their
documents” in one request, stressing our cluster. In the end we wrote a custom
SearchComponent that sets maximum values (instead of a fixed value specified in
invariants) for the rows and start parameters. (Actually, this component does a
little more, as we add some limitations for each type of client, defining
constraints such as how many documents, i.e. data points, can be requested,
etc.)
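
As a rough sketch of the invariants suggestion, applied to the /select handler
quoted below (the value 10 is taken from that configuration; whether rows
should really be fixed is an assumption about your use case):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="df">text</str>
  </lst>
  <!-- invariants cannot be overridden by request parameters -->
  <lst name="invariants">
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

With this, a request such as /select?q=...&rows=100000 would still return at
most 10 documents. The same section could be used to pin other parameters that
clients should not control.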

Hope it helps, 

On Oct 7, 2014, at 11:37 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote:

 Hi, I have a question around SOLR query, I am trying to restrict access to
 SOLR data.
 
 We are running SOLR 4.7.1, and wish to expose the query capabilities to our
 customers for the data that belongs to them. Specifically /select, with
 default configuration is the only Request Handler that customers can
 access.
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="df">text</str>
   </lst>
 </requestHandler>
 
 The custom API that fronts SOLR, will inject appropriate restriction
 into the q param e.g. q=customerNumber:123 or
 append to q param q=customer query AND customerNumber:123, before
 sending the request to the /select handler.
 
 This works fine, however,
 
 I want to know if there is a way customer can override these restrictions?
 
 If so what can I do to prevent that?
 
 So far I have come across facet.mincount as one potential concern
 where by customer can see data that they should not, e.g.
 
 /select?q=customer query AND
 customerNumber:123&facet=true&facet.field=customerName&rows=0&facet.mincount=0
 
 will return those customer names as well that do not belong to
 customerNumber 123.
 
 Are there any other gotchas that I should know?
 
 Thanks for your time and help,
 
 Nitin

Concurso Mi selfie por los 5. Detalles en 
http://justiciaparaloscinco.wordpress.com


Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
If you’re talking about a generic web crawl, you could use something like Nutch
[1]. Keep in mind that it is a full web crawler, and it does a pretty good job.
I’ve been using it for more than 2 years now and I’m very happy with it,
although I don’t crawl just a couple of sites but a much wider spectrum (think
country-scale web crawling). With Nutch you just have to configure a couple of
options in an XML file and it will crawl the web and index the content into
Solr.

Regards,

[1] http://nutch.apache.org 

On Oct 7, 2014, at 4:53 PM, Vishal Sharma vish...@grazitti.com wrote:

 Makes sense.
 
 I'll just dive in now. Thanks so much.
 
 Vishal Sharma, TL, Grazitti Interactive
 
 
 
 
 
 
 On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
 I am pretty sure Swift is not Solr. That's why I was asking whether
 you were starting from scratch.
 
 As to the other items, please re-read my original response. Solr has
 an example reading in RSS feeds, you could probably use that. Or a
 generic XML using DataImportHandler's mapping. Or directly from
 database, again with DIH.
 
 Basically, it sounds totally doable. So, it's hard to advise anything
 specific beyond go, do it and wait for you to come back with a lot
 more specific issue once you get going. Most of the issues will be
 related to your schema and your WordPress configuration, so no
 abstract advice is available.
 
 Regards,
Alex.
 
 On 7 October 2014 16:36, Vishal Sharma vish...@grazitti.com wrote:
 Hey Alex,
 
 Thanks for the prompt response.
 
 Here is what I am trying to solve: I am showing search results from content
 coming from 3 different places on a single site. I have done that by
 pumping all this content into a Solr server running on a single flat schema,
 using the different APIs of these platforms. Now I also need to index blog
 posts written in WordPress. I was wondering if there is any solution
 already available which can help me crawl and pump these posts into my running
 Solr instance. Otherwise I might have to write a few more scripts to do that.
 
 BTW, is Swift using Solr on the backend? Because I thought it's a paid
 enterprise solution.
 
 



Re: Changed behavior in solr 4 ??

2014-09-30 Thread Jorge Luis Betancourt Gonzalez
Don’t worry, the way Hoss explained it is indeed the way I knew it works, but
the example provided in the book piqued my curiosity, hence the question in
this thread.
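
Hoss’s explanation quoted below boils down to two options for a component such
as HighlightComponent. A minimal sketch of both, where the name myHighlight and
the shortened components list are only illustrative assumptions:

```xml
<!-- Option 1: reuse the implicit name, so this instance replaces the default -->
<searchComponent name="highlight" class="solr.HighlightComponent"/>

<!-- Option 2: use a unique (assumed) name and wire it in explicitly -->
<searchComponent name="myHighlight" class="solr.HighlightComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>myHighlight</str>
  </arr>
</requestHandler>
```

Keep in mind that an explicit components list replaces the default list, so any
default component left out of it no longer runs for that handler.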

Regards,

On Sep 30, 2014, at 5:59 PM, Timothy Potter thelabd...@gmail.com wrote:

 Indeed - Hoss is correct ... it's a problem with the example in the
 book ... my apologies for the confusion!
 
 On Tue, Sep 30, 2014 at 3:57 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:
 
 : Thanks for the response, yes the way you describe I know it works and is
 : how I get it to work but then what does mean the snippet of the
 : documentation I see on the documentation about overriding the default
 
 It means that there is implicitly a set of search components that have
 default behavior, and there is an implicit list of component *names*
 used by default by SearchHandler -- and if you override one of those
 implicit searchComponent instances by declaring your own with the same
 name, then it will be used by default in SearchHandler.
 
 A very concrete example of this is HighlightComponent -- if you have no
 HighlightComponent declared in your solrconfig.xml, then an implicit
 instance exists with the name "highlight" and SearchHandler by default
 includes that component.
 
 If you want to declare your own HighlightComponent instance with special
 initialization logic, you can either declare it with its own unique name
 and edit the components list on a SearchHandler declaration to include
 that name, or you can just name it "highlight" and it will override the
 default instance -- this is in fact done in the example solrconfig.xml
 (grep for HighlightComponent).
 
 : components shipped with Solr? Even on the book Solr in Action in chapter
 : 7 listing 7.3 I saw something similar to what I wanted to do:
 :
 : <searchComponent name="query" class="solr.QueryComponent">
 :   <lst name="invariants">
...
 
 That appears to be a mistake in Solr in Action ... the QueryComponent
 class does nothing with its init params (the nested XML inside the
 searchComponent declaration) so that syntax does nothing.
 
 
 
 -Hoss
 http://www.lucidworks.com/



Re: (auto)suggestions, but ony from a filtered set of documents

2014-09-26 Thread Jorge Luis Betancourt Gonzalez
Perhaps instead of the suggester component you could use the EdgeNGramFilter
and provide partial matches, so you will be able to configure a custom request
handler that will “suggest” terms or phrases for you. I’m using this approach
to provide query suggestions; of course, I’m indexing the queries into a
separate core.
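
A minimal sketch of such a request handler, assuming a stored field called
suggestion whose fieldType applies an EdgeNGramFilter at index time (the handler
name and the field name are made up for illustration):

```xml
<!-- Sketch only: handler name and field name are assumptions -->
<requestHandler name="/suggest-ngram" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">suggestion</str>   <!-- field analyzed with edge n-grams -->
    <str name="fl">suggestion</str>   <!-- return only the suggested text -->
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

Because this is an ordinary search handler, the restriction asked about below
can be passed as a regular filter query, for example
/suggest-ngram?q=atm&fq=source:mysource.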

Greetings,

On Sep 26, 2014, at 8:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

 Either my intention is dumb (pls let me know ;)), or there is no answer to 
 this problem. If so, I will have to index my sources into separate cores. 
 But then the questions arise:
 a) how do I get suggestions from more than one core? Multiple 
 suggest-requests, then merge?
 b) how do I get (ranked) results from more than one core?
 In Lucene I was able to use a MultiIndexReader (one IndexReader per index)
 
 -Original Message-
 From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
 Sent: Thursday, September 25, 2014 10:24
 To: solr-user@lucene.apache.org
 Subject: (auto)suggestions, but ony from a filtered set of documents
 
 What I'd like to do is
 http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:mysource
 
 Through qf (or however the parameter shall be called) I'd like to restrict 
 the suggestions to documents which fit the given qf-query. 
 I need this filter if (as posted in a previous thread) I intend to put 
 different kind of data into one core/collection, cause suggestion shall be 
 restrictable to one or many source(s)



Re: Changed behavior in solr 4 ??

2014-09-25 Thread Jorge Luis Betancourt Gonzalez
I hadn’t used it before; basically, I found out about it in the Solr in Action
book and was guided by its comment about redefining the default components by
declaring a new searchComponent with the same name.

Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky j...@basetechnology.com wrote:

 I am not aware of any such feature! That doesn't mean it doesn't exist, but I 
 don't recall seeing it in the Solr source code.
 
 -- Jack Krupansky
 
 -Original Message- From: Jorge Luis Betancourt Gonzalez
 Sent: Wednesday, September 24, 2014 1:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Changed behavior in solr 4 ??
 
 Hi Jack:
 
 Thanks for the response. Yes, I know the way you describe works, and it is how
 I got it to work. But then what does the documentation snippet about
 overriding the default components shipped with Solr mean? Even in the book
 Solr in Action, in chapter 7, listing 7.3, I saw something similar to what I
 wanted to do:
 
 <searchComponent name="query" class="solr.QueryComponent">
   <lst name="invariants">
     <str name="rows">25</str>
     <str name="df">content_field</str>
   </lst>
   <lst name="defaults">
     <str name="q">*:*</str>
     <str name="indent">true</str>
     <str name="echoParams">explicit</str>
   </lst>
 </searchComponent>
 
 “Because each default search component exists by default even if it’s not
 defined explicitly in the solrconfig.xml file, defining them explicitly as in
 the previous listing will replace the default configuration.”
 
 The previous snippet is from the book Solr in Action. I understand that I
 could define these parameters in each SearchHandler, but if they are defined
 in the searchComponent (as the book says), wouldn’t this configuration apply
 to all my request handlers, eliminating the need to replicate the same
 parameters in several parts of my solrconfig.xml (i.e. all the request
 handlers)?
 
 
 Regards,
 On Sep 23, 2014, at 11:53 PM, Jack Krupansky j...@basetechnology.com wrote:
 
 
 You set the defaults on the search handler, not the search component. 
 See solrconfig.xml:
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <!-- default values for query parameters can be specified, these
        will be overridden by parameters in the request
     -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="df">text</str>
   </lst>
 ...
 
 -- Jack Krupansky
 
 -Original Message- From: Jorge Luis Betancourt Gonzalez
 Sent: Tuesday, September 23, 2014 11:02 AM
 To: solr-user@lucene.apache.org
 Subject: Changed behavior in solr 4 ??
 
 Hi:
 
 I’m trying to change the default configuration for the query component of a
 SearchHandler. Basically, I want to set a default value for the rows parameter
 and have this value shared by all my SearchHandlers. As stated in the
 solrconfig.xml comments, this could be accomplished by redeclaring the query
 search component; however, this is not working on Solr 4.9.0, which is the
 version I’m using. This is my configuration:
 
  <searchComponent name="query" class="solr.QueryComponent">
    <lst name="defaults">
      <int name="rows">1</int>
    </lst>
  </searchComponent>
 
 The relevant portion of the solrconfig.xml comment is: “If you register a
 searchComponent to one of the standard names, that will be used instead of the
 default.” So is this a new, intended behavior? Just for testing, I also
 redefined the components of the request handler to use only the query
 component instead of all the default components. This is how it looks:
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <arr name="components">
     <str>query</str>
   </arr>
 </requestHandler>
 
 Everything works OK, but the rows parameter is not used, even though I’m not
 specifying the rows parameter in the URL.
 
 Regards,
 
 
 



Re: Spellchecking and suggesting part numbers

2014-09-24 Thread Jorge Luis Betancourt Gonzalez
I’ve done something similar to this using the EdgeNGram filter rather than the
spellchecker component. I don’t know if this fits your requirements.

The relevant portion of my fieldType config:

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>

Basically, use the WordDelimiterFilterFactory to divide ABCD1234 into two
tokens (or don’t, depending on your requirements) and then use the
EdgeNGramFilterFactory to provide partial matching on the field.
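
For context, here is a sketch of how those filters might sit inside a complete
fieldType. The type name part_prefix, the whitespace tokenizer, and the choice
to apply the EdgeNGram filter only at index time are assumptions, not part of
the configuration above:

```xml
<!-- Sketch only: type name, tokenizer and analyzer split are assumptions -->
<fieldType name="part_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- prefixes of each part are indexed, e.g. a, ab, abc, abcd -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This gives per-token prefix matching as the user types; it is not a drop-in
replacement for did-you-mean corrections, so how well it ranks near-miss part
numbers would have to be checked against real data.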

On Sep 24, 2014, at 10:05 AM, Lochschmied, Alexander 
alexander.lochschm...@vishay.com wrote:

 Hello Solr Users,
 
 we are trying to get suggestions for part numbers using the spellchecker.
 
 Problem scenario:
 
 ABCD1234 // This is the search term
 ABCE1234 // This is what we get from spellchecker
 ABCD1244 // This is what we would like to get from spellchecker
 
 Characters towards the left of our part numbers are more relevant.
 
 
 The setup is:
 
   <searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
     <lst name="spellchecker">
       <str name="classname">solr.IndexBasedSpellChecker</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="field">did_you_mean_part</str>
     </lst>
   </searchComponent>
   <requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
     <lst name="defaults">
       <str name="df">did_you_mean_part</str>
       <str name="spellcheck">on</str>
     </lst>
     <arr name="last-components">
       <str>spellcheck_part</str>
     </arr>
   </requestHandler>
 
 
   <fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
     </analyzer>
   </fieldType>
 
 Can we tweak the setup such that we should get more relevant part numbers?
 
 Thanks,
 Alexander



Changed behavior in solr 4 ??

2014-09-23 Thread Jorge Luis Betancourt Gonzalez
Hi:

I’m trying to change the default configuration for the query component of a
SearchHandler. Basically, I want to set a default value for the rows parameter
and have this value shared by all my SearchHandlers. As stated in the
solrconfig.xml comments, this could be accomplished by redeclaring the query
search component; however, this is not working on Solr 4.9.0, which is the
version I’m using. This is my configuration:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="defaults">
    <int name="rows">1</int>
  </lst>
</searchComponent>

The relevant portion of the solrconfig.xml comment is: “If you register a
searchComponent to one of the standard names, that will be used instead of the
default.” So is this a new, intended behavior? Just for testing, I also
redefined the components of the request handler to use only the query component
instead of all the default components. This is how it looks:

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>

Everything works OK, but the rows parameter is not used, even though I’m not
specifying the rows parameter in the URL.

Regards,


Re: Changed behavior in solr 4 ??

2014-09-23 Thread Jorge Luis Betancourt Gonzalez
Hi Jack:

Thanks for the response. Yes, I know the way you describe works, and it is how
I got it to work. But then what does the documentation snippet about overriding
the default components shipped with Solr mean? Even in the book Solr in Action,
in chapter 7, listing 7.3, I saw something similar to what I wanted to do:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="invariants">
    <str name="rows">25</str>
    <str name="df">content_field</str>
  </lst>
  <lst name="defaults">
    <str name="q">*:*</str>
    <str name="indent">true</str>
    <str name="echoParams">explicit</str>
  </lst>
</searchComponent>

“Because each default search component exists by default even if it’s not
defined explicitly in the solrconfig.xml file, defining them explicitly as in
the previous listing will replace the default configuration.”

The previous snippet is from the book Solr in Action. I understand that I could
define these parameters in each SearchHandler, but if they are defined in the
searchComponent (as the book says), wouldn’t this configuration apply to all my
request handlers, eliminating the need to replicate the same parameters in
several parts of my solrconfig.xml (i.e. all the request handlers)?


Regards,
On Sep 23, 2014, at 11:53 PM, Jack Krupansky j...@basetechnology.com wrote:


 You set the defaults on the search handler, not the search component. See 
 solrconfig.xml:
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <!-- default values for query parameters can be specified, these
        will be overridden by parameters in the request
     -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="df">text</str>
   </lst>
 ...
 
 -- Jack Krupansky
 
 -Original Message- From: Jorge Luis Betancourt Gonzalez
 Sent: Tuesday, September 23, 2014 11:02 AM
 To: solr-user@lucene.apache.org
 Subject: Changed behavior in solr 4 ??
 
 Hi:
 
 I’m trying to change the default configuration for the query component of a
 SearchHandler. Basically, I want to set a default value for the rows parameter
 and have this value shared by all my SearchHandlers. As stated in the
 solrconfig.xml comments, this could be accomplished by redeclaring the query
 search component; however, this is not working on Solr 4.9.0, which is the
 version I’m using. This is my configuration:
 
   <searchComponent name="query" class="solr.QueryComponent">
     <lst name="defaults">
       <int name="rows">1</int>
     </lst>
   </searchComponent>
 
 The relevant portion of the solrconfig.xml comment is: “If you register a
 searchComponent to one of the standard names, that will be used instead of the
 default.” So is this a new, intended behavior? Just for testing, I also
 redefined the components of the request handler to use only the query
 component instead of all the default components. This is how it looks:
 
 <requestHandler name="/select" class="solr.SearchHandler">
   <arr name="components">
     <str>query</str>
   </arr>
 </requestHandler>
 
 Everything works OK, but the rows parameter is not used, even though I’m not
 specifying the rows parameter in the URL.
 
 Regards,





Re: How to exclude a mimetype in tika?

2014-09-20 Thread Jorge Luis Betancourt Gonzalez
Which crawler are you using?

On Sep 18, 2014, at 10:14 AM, keeblerh keebl...@yahoo.com wrote:

 eShard wrote
 Good afternoon,
 I'm using solr 4.0 Final
 I need movies hidden in zip files that need to be excluded from the
 index.
 I can't filter movies on the crawler because then I would have to exclude
 all zip files.
 I was told I can have tika skip the movies.
 the details are escaping me at this point.
 How do I exclude a file in the tika configuration?
 I assume it's something I add in the update/extract handler but I'm not
 sure.
 
 Thanks,
 
 I am having the same issue.  I need to exclude some mime types from the zip
 files and am using Solr 4.8.  Did you ever get an answer to this?  Thanks.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr(j) API for manipulating the schema(.xml)?

2014-09-20 Thread Jorge Luis Betancourt Gonzalez
Basically, you could create a set of dynamic fields according to your needs,
one for each type of data (and the combinations you require), and then write a
small wrapper around SolrJ that exposes the patterns defined in your schema.xml
in a more understandable way. That way you can abstract away the manipulation
of the schema.xml file and only touch it when really needed, i.e. for a new
field type with new analyzers, etc.
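
A minimal sketch of what such dynamic fields could look like in schema.xml. The
suffixes are arbitrary conventions chosen for illustration, and the referenced
types (string, int, tdate, text_general) are assumed to be defined in your
schema, as they are in the Solr 4 example schema:

```xml
<!-- Sketch only: suffixes are assumed conventions, one per data type -->
<dynamicField name="*_s"   type="string"       indexed="true" stored="true"/>
<dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
<dynamicField name="*_dt"  type="tdate"        indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<!-- a multi-valued variant, as an example of a "combination" -->
<dynamicField name="*_ss"  type="string"       indexed="true" stored="true"
              multiValued="true"/>
```

The wrapper would then map, say, a string attribute called color onto the Solr
field color_s when indexing and querying, so a new attribute of a known type
needs no schema change.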

On Sep 18, 2014, at 3:16 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

 as our framework so far only knows a few field types dynamic fields may be 
 the way to go... And if there are new fieldtypes the new schema can be 
 distributed through ZooKeeper
 
 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, September 17, 2014 19:56
 To: solr-user@lucene.apache.org
 Subject: Re: Solr(j) API for manipulating the schema(.xml)?
 
 Right, you can create new cores over the rest api.
 
 As far as changing the schema, there's no good way to do that that I know of 
 programmatically. In the SolrCloud world, you can upload the schema to 
 ZooKeeper and have it automatically distributed to all the nodes though.
 
 Best,
 Erick
 
 On Wed, Sep 17, 2014 at 2:28 AM, Clemens Wyss DEV clemens...@mysign.ch 
 wrote:
 Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core? 
 Through SolrJ?
 
 Context:
 We already have a generic indexing/searching framework (based on Lucene)
 where any component can act as a so-called IndexDataProvider. This provider
 delivers the field types and also the entities to be (converted into
 documents and then) indexed. Each of these IndexProviders has its own Lucene
 index.
 So we kind of have the information for the Solr schema.xml.
 
 Hope the intention is clear. And yes, the manipulation of the schema.xml is
 basically only needed when the field types change. That's why I am looking
 for a way to consolidate the schema.xml (upon boot, initialization of the
 IndexDataProviders ...).
 In 99.999% of cases it won't change, but I'd like to keep the possibility for
 an IndexDataProvider to hand in its schema.
 
 Also, again driven by the dynamic nature of our framework, can I easily
 create new cores via SolrJ or the Solr REST API?

Concurso Mi selfie por los 5. Detalles en 
http://justiciaparaloscinco.wordpress.com


Re: How to implement multilingual word components fields schema?

2014-09-08 Thread Jorge Luis Betancourt Gonzalez
In one of his talks, Trey Grainger (author of Solr in Action) touches on how 
CareerBuilder deals with multilingual content using payloads. It's a little more 
work, but I think it would pay off.

On Sep 8, 2014, at 7:58 AM, Jack Krupansky j...@basetechnology.com wrote:

 You also need to take a stance as to whether you wish to auto-detect the 
 language at query time vs. have a UI selection of language vs. attempt to 
 perform the same query for each available language and then determine which 
 has the best relevancy. The latter two options are very sensitive to short 
 queries. Keep in mind that auto-detection for indexing full documents is a 
 different problem than auto-detection for very short queries.
 
 -- Jack Krupansky
 
 -Original Message- From: Ilia Sretenskii
 Sent: Sunday, September 7, 2014 10:33 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to implement multilingual word components fields schema?
 
 Thank you for the replies, guys!
 
 Using field-per-language approach for multilingual content is the last
 thing I would try since my actual task is to implement a search
 functionality which would implement relatively the same possibilities for
 every known world language.
 The closest references are those popular web search engines, they seem to
 serve worldwide users with their different languages and even
 cross-language queries as well.
 Thus, a field-per-language approach would be a sure waste of storage
 resources due to the high number of duplicates, since there are over 200
 known languages.
 I really would like to keep a single field for cross-language searchable text
 content, without splitting it into specific language fields or specific
 language cores.
 
 So my current choice will be to stay with just the ICUTokenizer and
 ICUFoldingFilter as they are without any language specific
 stemmers/lemmatizers yet at all.
 
 Probably I will put the most popular languages stop words filters and
 stemmers into the same one searchable text field to give it a try and see
 if it works correctly in a stack.
 Does specific language related filters stacking work correctly in one field?
 
 Further development will most likely involve some advanced custom analyzers
 like the SimplePolyGlotStemmingTokenFilter to utilize the ICU generated
 ScriptAttribute.
 http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/100236
 https://github.com/whateverdood/cross-lingual-search/blob/master/src/main/java/org/apache/lucene/sandbox/analysis/polyglot/SimplePolyGlotStemmingTokenFilter.java
 
 So I would like to know more about those academic papers on this issue of
 how best to deal with mixed language/mixed script queries and documents.
 Tom, could you please share them? 



Re: Strategies for effective prefix queries?

2014-07-16 Thread Jorge Luis Betancourt Gonzalez
Perhaps what you’re trying to do could be addressed by using the 
EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar 
approach. This is an extract of the configuration I’m using:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>

Basically this allows you to get partial matches on the leading characters of 
any word in the string. Let’s say the field gets this content at index time: 
“A brown fox”; this document will be matched by the query “bro”, for instance. 
My personal recommendation is to use this in a separate field that gets populated 
through a copyField, so that you can apply different boosts.
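
To make the effect of the filter chain above more concrete, here is a minimal Python sketch (not Solr code) of the terms an edge n-gram filter with minGramSize=1 and maxGramSize=10 would produce. The function names and the whitespace-plus-lowercase "analyzer" are assumptions made for illustration only; the real chain also applies the word delimiter and duplicate-removal filters.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading-edge n-grams of a single token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Crude stand-in for the analyzer: split on whitespace, lowercase, n-gram."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


terms = index_terms("A brown fox")
print(sorted(terms))
print("bro" in terms)   # the query term "bro" is one of the indexed grams
print("row" in terms)   # inner substrings are not indexed, only leading edges
```

Because the grams are produced at index time, the query string can be sent as-is and matched as a plain term, which is why no wildcard is needed. This presumably relies on the query-side analysis not n-gramming the input again.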

Greetings,

On Jul 16, 2014, at 2:00 PM, Hayden Muhl haydenm...@gmail.com wrote:

 A copy field does not address my problem, and this has nothing to do with
 stored fields. This is a query parsing problem, not an indexing problem.
 
 Here's the use case.
 
 If someone has a username like bob-smith, I would like it to match
 prefixes of bo and sm. I tokenize the username into the tokens bob
 and smith. Everything is fine so far.
 
 If someone enters bo sm as a search string, I would like bob-smith to
 be one of the results. The query to do this is straight forward,
 username:bo* username:sm*. Here's the problem. In order to construct that
 query, I have to tokenize the search string bo sm **on the client**. I
 don't want to reimplement tokenization on the client. Is there any way to
 give Solr the string bo sm, have Solr do the tokenization, then treat
 each token like a prefix?
 
 
 On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
 So copyField it to another and apply alternative processing there. Use
 eDismax to search both. No need to store the copied field, just index it.
 
 Regards,
 Alex
 On 16/07/2014 2:46 am, Hayden Muhl haydenm...@gmail.com wrote:
 
 Both fields? There is only one field here: username.
 
 
 On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch 
 arafa...@gmail.com
 
 wrote:
 
 Search against both fields (one split, one not split)? Keep original
 and tokenized form? I am doing something similar with class name
 autocompletes here:
 
 
 
 https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
 
 Regards,
   Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources: http://www.solr-start.com/ and @solrstart
 Solr popularizers community:
 https://www.linkedin.com/groups?gid=6713853
 
 
 On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl haydenm...@gmail.com
 wrote:
 I'm working on using Solr for autocompleting usernames. I'm running into a
 problem with the wildcard queries (e.g. username:al*).
 
 We are tokenizing usernames so that a username like solr-user will be
 tokenized into solr and user, and will match both sol and use
 prefixes. The problem is when we get solr-u as a prefix, I'm having to
 split that up on the client side before I construct a query
 username:solr* username:u*. I'm basically using a regex as a poor man's
 tokenizer.
 
 Is there a better way to approach this? Is there a way to tell Solr to
 tokenize a string and use the parts as prefixes?
 
 - Hayden
 
 
 

VII Escuela Internacional de Verano en la UCI del 30 de junio al 11 de julio de 
2014. Ver www.uci.cu


Solr 4.x and master-slave schema

2014-07-10 Thread Jorge Luis Betancourt Gonzalez
Hi all:

We have a small installation of Solr 3.6 on our hands. Right now we have 3 
physical servers (1 master and 2 slaves); ingestion is done on the 
master, which replicates via Solr's internal mechanism to the slaves, which 
handle all the queries. We are trying to upgrade to Solr 4.x, and eventually we 
would like to migrate to SolrCloud. My question is essentially: if we migrate 
our Solr 3.6 nodes to Solr 4.9 and keep the same master-slave setup, how 
hard would it be to migrate afterwards to SolrCloud?

Greetings,


Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
I’m using Solr for an analytics use case. One of the requirements is, 
given a search query, to get the position of the first hit. I’m indexing web pages, 
so given some search criteria the client wants to know the position (first 
occurrence) of his webpage in the result set (if it appears at all). Is there any way 
of getting this position without iterating and manually checking the Solr 
response?

Greetings,




Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Basically, given a query of one or more search terms, the idea is to find out 
at which position your website appears in the results for those specific terms.

On Jun 24, 2014, at 12:12 AM, Aman Tandon amantandon...@gmail.com wrote:

 What kind of search criteria, could you please explain
 
 With Regards
 Aman Tandon
 
 
 On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 I’m using Solr for an analytic use case, one of the requirements is
 basically given a search query get the position of the first hit. I’m
 indexing web pages, so given a search criteria the client want’s to know
 the position (first occurrence) of his webpage in the result set (if it
 appears at all). Is any way of getting this position without iterating and
 manually checking the solr response?
 
 Greetings,
 
 
 



Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Yes, but I’m looking for the position of the URL of interest in the 
Solr response. Solr matches the terms against the collection of documents 
and returns a list sorted by score; what I’m trying to do is get the position of 
a specific id in this sorted response. The answer could be something like 
position: 5, or position: 500. To do this manually when the response consists 
of a very large number of documents (webpages), I would need to 
iterate over the complete response to find the position, which in the worst case 
could be on the last page. For this particular use case 
I’m not so interested in the URL field per se as in the position a 
certain URL has in the full Solr response.
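
For reference, the manual iteration described above can be sketched in a few lines of Python. This is only an illustration: fetch_page is a hypothetical callable standing in for an HTTP request to Solr (in a real client it would presumably send the start and rows parameters and restrict fl to the url field), and the in-memory list is made-up data.

```python
def find_position(fetch_page, target_url, page_size=100):
    """Return the 1-based rank of target_url, or None if it is not in the results.

    fetch_page(start, rows) is a hypothetical callable that returns the list of
    url values for results start .. start+rows-1, in score order.
    """
    start = 0
    while True:
        urls = fetch_page(start, page_size)
        if not urls:
            return None  # ran past the last page without finding the url
        for offset, url in enumerate(urls):
            if url == target_url:
                return start + offset + 1
        start += len(urls)


# In-memory stand-in for a ranked Solr response, for illustration only.
ranked = ["http://example.org/page%d" % i for i in range(1, 501)]


def fake_fetch(start, rows):
    return ranked[start:start + rows]


print(find_position(fake_fetch, "http://example.org/page5"))    # prints 5
print(find_position(fake_fetch, "http://example.org/page500"))  # prints 500
```

The worst case is still one request per page until the last one, which is exactly the cost the question is trying to avoid.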

On Jun 24, 2014, at 12:31 AM, Walter Underwood wun...@wunderwood.org wrote:

 Solr is designed to do exactly this very, very fast. So there isn't a faster 
 way to do it. But you only need to fetch the URL field. You can ignore 
 everything else.
 
 wunder
 
 On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 Basically given a few search terms (query) the idea is to know given one or 
 more terms in which position your website is located for those specific 
 terms.
 
 On Jun 24, 2014, at 12:12 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 What kind of search criteria, could you please explain
 
 With Regards
 Aman Tandon
 
 
 On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 I’m using Solr for an analytic use case, one of the requirements is
 basically given a search query get the position of the first hit. I’m
 indexing web pages, so given a search criteria the client want’s to know
 the position (first occurrence) of his webpage in the result set (if it
 appears at all). Is any way of getting this position without iterating and
 manually checking the solr response?
 
 Greetings,
 
 
 
 
 
 
 
 



Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Basically this is for analytical purposes: we want to help people 
(whose sites we’ve indexed in our app) find out for which particular terms 
(in theory related to their domain) they are badly positioned in our index. 
Initially we’re starting with this basic “position per term”, but the idea is to 
elaborate further in this direction.

Could this position-finding logic be abstracted effectively into a plugin 
inside Solr? I guess it would be more efficient to iterate (or fire the 2 
queries) from within Solr itself than in our app (written in PHP, so not so 
fast for some things), which would speed things up?
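
One possible reading of "the 2 queries" is: first ask for the score of the target document, then count how many documents score strictly higher. The Python sketch below only builds the request parameters for that idea and is an assumption, not something confirmed in this thread. The helper names are hypothetical, the url field name is taken from the discussion, and the second request relies on Solr's function range parser ({!frange}) wrapped around query($q), which assumes $q is parsed the same way in both requests. Tied scores and float rounding would make the result approximate.

```python
def score_query_params(user_query, target_url):
    """First request: fetch only the score of the target document."""
    return {
        "q": user_query,
        "fq": 'url:"' + target_url + '"',  # assumes the url contains no quotes
        "fl": "score",
        "rows": 1,
    }


def rank_query_params(user_query, target_score):
    """Second request: count the documents that score strictly higher."""
    return {
        "q": user_query,
        "fq": "{!frange l=" + repr(target_score) + " incl=false}query($q)",
        "rows": 0,  # only numFound is needed
    }


def position_from_count(num_found_above):
    """Documents ranked above the target, plus one, gives its 1-based position."""
    return num_found_above + 1


print(rank_query_params("solr search", 1.5)["fq"])
print(position_from_count(4))
```

If this holds up in testing, the cost is two requests regardless of how deep the document sits in the result list.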

Regards,

On Jun 24, 2014, at 1:42 AM, Aman Tandon amantandon...@gmail.com wrote:

 Jorge, I don't think that Solr provides this functionality; you have to
 iterate, and Solr is very fast at this. You can create a script that
 searches for the term and requests records until it gets the
 record for the desired URL. I don't think 1/3 seconds to find it is
 too much.
 
 Search result analysis shows that very few people request
 the second page for their query; most leave the search or
 modify the query string. So I'd suggest that if the website has
 appropriate and good data it should come up on the first page, and it's better
 to aim for the first page than to find the position.
 
 With Regards
 Aman Tandon
 
 
 On Tue, Jun 24, 2014 at 10:35 AM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 Yes, but I’m looking for the position of the url field of interest in the
 response of solr. Solr matches the terms against the collection of
 documents and returns sorted list by score, what I’m trying to do is get
 the position of the a specific id in this sorted response. The response
 could be something like position: 5, or position 500. To do this manually
 suppose the response consists of a very large amount of documents
 (webpages) in this case I would need to iterate over the complete response
 to find the position, which in a worst case scenario could be in the last
 page for instance. For this particular use case I’m not so interested in
 the URL field per se but more on the position a certain url has in the full
 solr response.
 
 On Jun 24, 2014, at 12:31 AM, Walter Underwood wun...@wunderwood.org
 wrote:
 
 Solr is designed to do exactly this very, very fast. So there isn't a
 faster way to do it. But you only need to fetch the URL field. You can
 ignore everything else.
 
 wunder
 
 On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 Basically given a few search terms (query) the idea is to know given
 one or more terms in which position your website is located for those
 specific terms.
 
 On Jun 24, 2014, at 12:12 AM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 What kind of search criteria, could you please explain
 
 With Regards
 Aman Tandon
 
 
 On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
 
 I’m using Solr for an analytic use case, one of the requirements is
 basically given a search query get the position of the first hit. I’m
 indexing web pages, so given a search criteria the client want’s to know
 the position (first occurrence) of his webpage in the result set (if it
 appears at all). Is any way of getting this position without iterating and
 manually checking the solr response?
 
 Greetings,
 
 
 
 
 
 
 
 
 
 



Re: Customizing Solr; Where to draw the line?

2014-06-09 Thread Jorge Luis Betancourt Gonzalez
I’d certainly go for the 2nd option. Depending on what you need, you won’t need 
to modify Solr itself but can extend it using plugins. 
You’ll need to write different components depending on your specific 
requirements. I definitely recommend the talks by Trey Grainger, from 
CareerBuilder. I remember seeing in some of the talks that they have A/B testing 
built into Solr, and a lot of other “crazy” things, so they would be a good 
starting point and will give you a look at what you could accomplish by 
extending Solr.

Of course you’ll need to update your source between big releases of Solr, and 
perhaps between some minor ones, but this way you don’t need to worry about the 
latency or maintain a new search layer between the client and Solr. 

I hope it helps,

On Jun 8, 2014, at 10:38 PM, Phanindra R phani...@gmail.com wrote:

 Hi,
 
 We have decided to migrate from Lucene 3.x to latest Solr. A lot of
 architectural discussions are going on. There are two possible approaches.
 
 Please note that our customer-facing app (or any client) and Search are
 hosted on different machines.
 
 *1) Have a clean architecture*
 
    - Solr takes care of customized search only.
       - We certainly have to override some filtering, scoring, etc.
    - There will be an intermediary search-app that
       - receives queries
       - does a/b testing assignments, and other non-search stuff.
       - does query expansion / rewriting (to avoid every Solr shard doing
         that)
       - transforms query into Solr syntax and uses Solr's http API to
         consume it.
       - returns the response to customer-facing app or whatever the client
         is.
 
   The problem with this approach is the additional layer and the latency
 between search-app and solr. The client of search has to make an API call,
 across the network, to the intermediary search-app which in turns makes
 another Http API call to Solr.
 
 *2) Customize Solr to the full extent*
 
   - Do all the crazy stuff within Solr.
   - We can literally create a new url and register a handler class to
   process that. With some limitations, we should be able to do almost
   anything.
 
 The benefit of this approach is that it obviates the additional layer
 and the latency. However, I see a lot of long-term problems like hard to
 upgrade Solr's version, Dev flexibility (usage of Spring, Hib, etc.).
 
 How about a distributed search? Where do above approaches stand?
 
 I understand that this is a subjective question. It'd be helpful if you
 could share your thoughts and experiences.
 
 Thanks.



Percolator feature

2014-05-28 Thread Jorge Luis Betancourt Gonzalez
Is there some workaround in the Solr ecosystem to get something similar to the 
percolator feature offered by Elasticsearch? 

Greetings!


Re: Writing a customize updateRequestHandler

2014-02-03 Thread Jorge Luis Betancourt Gonzalez
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing 
new Solr plugins; perhaps it would be a good place to start. There is also a page 
about this in the wiki, but it’s only a light introduction. I’ve found that 
a very good starting point is to just browse through the code of some standard 
components similar to the one you’re trying to customize.

On Feb 3, 2014, at 9:00 AM, neerajp neeraj_star2...@yahoo.com wrote:

 Hi,
 I want to write a custom updateRequestHandler.
 Can you pl.s guide me the steps I need to perform for that ?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html
 Sent from the Solr - User mailing list archive at Nabble.com.


III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 
2014. Ver www.uci.cu


Re: Solr Nutch

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
Q1: Nutch doesn’t only handle the parsing of HTML files; it also uses Hadoop to 
achieve large-scale crawling across multiple nodes. It fetches the content of the 
HTML file, and yes, it also parses its content.

Q2: In our case we crawl some websites and store the content in one 
“main” Solr core. We also have a web app with the typical “search box”, and we use a 
separate core to store the queries made by our users.

Q3: I’m not currently using SolrCloud, so I’m going to leave this one to a more 
experienced fellow.

On Jan 28, 2014, at 11:36 AM, rashmi maheshwari maheshwari.ras...@gmail.com 
wrote:

 Hi,
 
 Question 1: When Solr can parse HTML and documents like doc, excel, pdf
 etc., why do we need Nutch to parse HTML files? What is different?
 
 Question 2: When do we use multiple cores in Solr? Is there any practical
 business case where we need multiple cores?
 
 Question 3: When do we go for cloud? What does implementing SolrCloud
 mean?
 
 
 -- 
 Rashmi
 Be the change that you want to see in this world!
 www.minnal.zor.org
 disha.resolve.at
 www.artofliving.org




Re: PHP + Solr

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
I have some experience using Solarium, and it has been great so far. In particular 
we use the NelmioSolariumBundle to integrate with Symfony2.

Greetings!

On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva 
cad_fpa...@uolinc.com wrote:

 ‎Hi Folks,
 
 I would like to know what is the best way to integrate PHP and Apache Solr. 
 Until now I've found two options:
 
 1) http://www.php.net/manual/en/intro.solr.php
 
 2) http://www.solarium-project.org/
 
 What do you guys say?
 
 Cheers,
 
 Felipe
 
 
 




Re: Solr server requirements for 100+ million documents

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
A spreadsheet for this has been mentioned previously on the list. Since 
you already have documents in an index, you could extract the needed 
information from your index and feed it into the spreadsheet; it will probably 
give you a rough approximation of the hardware you’ll need. Also, if 
I’m not mistaken, this “tool” does not provide an estimate for SolrCloud.

Greetings!

On Jan 28, 2014, at 11:02 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net 
wrote:

 Thanks, Jack. That helps.
 
 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com] 
 Sent: Tuesday, January 28, 2014 8:01 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr server requirements for 100+ million documents
 
 Lucene and Solr work best if the full index can be cached in OS memory. 
 Sure, Lucene/Solr does work properly once the index no longer fits, but 
 performance will drop off.
 
 I would say that you could fit 100 million moderate-size documents on a 
 single Solr server - provided that you give the OS enough RAM for the full 
 Lucene index. That said, if you want to configure a SolrCloud cluster with 
 shards, you can use more modest, commodity servers with less RAM, provided 
 each server still fits its fraction of the total Lucene index in that 
 server's OS memory (file cache.)
 
 You may also need to add replicas for each shard to accommodate query load - 
 proof-of-concept testing is needed to verify that. It is worth noting that 
 sharding can improve total query performance since each node only searches a 
 fraction of the total data and those searches are done in parallel  (since 
 they are on different machines.)
 
 -- Jack Krupansky
 
 -Original Message-
 From: Susheel Kumar
 Sent: Sunday, January 26, 2014 10:54 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr server requirements for 100+ million documents
 
 Thank you Erick for your valuable input. Yes, we have to re-index data again 
 and again. I'll look into the possibility of tuning db access.
 
 On SolrJ and automating the indexing (incremental as well as one time) I want 
 to get your opinion on the two points below. We will be indexing separate sets of 
 tables with similar data structure.
 
 - Should we use SolrJ and write Java programs that can be scheduled to 
 trigger indexing on demand or on a schedule?
 
 - Is using SolrJ a better idea even for searching than using SolrNet? Our 
 frontend is in .NET, so we started using SolrNet, but I am afraid that down the road 
 when we scale/support SolrCloud, using SolrJ will be better?
 
 
 Thanks
 Susheel
 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Sunday, January 26, 2014 8:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr server requirements for 100+ million documents
 
 Dumping the raw data would probably be a good idea. I guarantee you'll be 
 re-indexing the data several times as you change the schema to accommodate 
 different requirements...
 
 But it may also be worth spending some time figuring out why the DB access is 
 slow. Sometimes one can tune that.
 
 If you go the SolrJ route, you also have the possibility of setting up N 
 clients to work simultaneously, sometimes that'll help.
 
 FWIW,
 Erick
 
 On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar 
 susheel.ku...@thedigitalgroup.net wrote:
 Hi Kranti,
 
 Attached are the solrconfig and schema XML files for review. I did run indexing 
 with just a few fields (5-6 fields) in schema.xml, keeping the same db 
 config, but indexing is still taking about the same time (on average 1 
 million records per hour), which confirms that the bottleneck is in the data 
 acquisition, which in our case is an Oracle database. I am thinking of not 
 using DataImportHandler / JDBC to get data from Oracle but rather dumping 
 the data somehow from Oracle using SQL loader and then indexing it. Any thoughts?
 
 Thnx
 
 -Original Message-
 From: Kranti Parisa [mailto:kranti.par...@gmail.com]
 Sent: Saturday, January 25, 2014 12:08 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr server requirements for 100+ million documents
 
 can you post the complete solrconfig.xml file and schema.xml files to 
 review all of your settings that would impact your indexing performance.
 
 Thanks,
 Kranti K. Parisa
 http://www.linkedin.com/in/krantiparisa
 
 
 
 On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar  
 susheel.ku...@thedigitalgroup.net wrote:
 
 Thanks, Svante. Your indexing speed using db seems to be really fast.
 Can you please provide some more detail on how you are indexing db 
 records. Is it thru DataImportHandler? And what database? Is that 
 local db?  We are indexing around 70 fields (60 multivalued) but data 
 is not populated always in all fields. The average size of document 
 is in
 5-10 kbs.
 
 -Original Message-
 From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf 
 Of svante karlsson
 Sent: Friday, January 24, 2014 5:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 

Re: Implementing an alerting feature

2014-01-27 Thread Jorge Luis Betancourt Gonzalez
I believe that you are looking for something similar to the percolator feature 
present in Elasticsearch. I remember something about a Solr implementation 
being discussed here some time ago. Does anyone know if there has been any 
progress in this area?
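
For readers unfamiliar with the idea, a percolator inverts normal search: the queries are stored, and each incoming document is checked against them. The toy Python sketch below illustrates only that concept. It is not the Elasticsearch or Luwak API; the function name, the stored query names and the "all terms must occur" matching rule are assumptions chosen to keep the example short.

```python
def matching_alerts(stored_queries, document_text):
    """Return the names of stored queries whose terms all occur in the document."""
    doc_terms = set(document_text.lower().split())
    return sorted(
        name
        for name, required_terms in stored_queries.items()
        if required_terms <= doc_terms  # subset test: every query term is present
    )


# Hypothetical stored queries, each reduced to a set of required terms.
stored_queries = {
    "solr-news": {"solr", "release"},
    "lucene-jobs": {"lucene", "hiring"},
}

print(matching_alerts(stored_queries, "Apache Solr 4.6 release announced"))
```

A real implementation would run proper Lucene queries against the document and would need some pre-filtering so that not every stored query is evaluated for every document.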

On Jan 27, 2014, at 8:18 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi Charlie;
 
 Is there any written documentation that explains your library?
 
 Thanks;
 Furkan KAMACI
 
 
 2014-01-27 Charlie Hull char...@flax.co.uk
 
 On 27/01/2014 08:50, elmerfudd wrote:
 
 I want to implement an alert service in my solr system.
 In the FAST ESP system the service is called Real Time Alerting.
 
 The service I'm looking for is:
 - a document is fed to solr.
 - without the document indexed , a set of queries run on the document
 - if the document answers a query - an alert will be sent in near
 Real-Time.
 
 
 You might want to take a look at Luwak, a library we built recently for
 running lots of stored queries in an efficient manner. We use this for
 media monitoring applications.
 
 https://github.com/flaxsearch/luwak
 
 Cheers
 
 Charlie
 
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Implementing-an-alerting-feature-tp4113666.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 --
 Charlie Hull
 Flax - Open Source Enterprise Search
 
 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk
 




Re: Solr Related Search Suggestions

2014-01-27 Thread Jorge Luis Betancourt Gonzalez
If I remember correctly, Trey Grainger explained in one of his talks 
a few techniques that could be of use. If the equivalences are not dynamic, 
you could just use synonyms. Otherwise some kind of offline processing should 
be used to compute the similarity between your queries (given that very little 
or no textual similarity is present in your queries). 
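
As one hedged illustration of what such offline processing might look like, the Python sketch below relates queries by how often they occur in the same user session. Using session co-occurrence as the similarity signal is an assumption of this example, not something taken from the talks mentioned above, and the session data is made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_queries(sessions, top_n=3):
    """Map each query to the queries that most often share a session with it."""
    pair_counts = defaultdict(Counter)
    for session in sessions:
        # Count every unordered pair of distinct queries once per session.
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return {
        query: [other for other, _ in counts.most_common(top_n)]
        for query, counts in pair_counts.items()
    }


# Hypothetical query log grouped by session.
sessions = [
    ["marriage halls", "catering services"],
    ["marriage halls", "catering services", "wedding cards"],
    ["wedding cards", "invitation cards"],
]

print(related_queries(sessions)["marriage halls"])
```

The resulting table could then be indexed into a separate core or exported as a synonyms file, depending on how dynamic the relations need to be.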

On Jan 27, 2014, at 4:29 AM, kumar pavan2...@gmail.com wrote:

 What is the best way to implement related search suggestions.
 
 For example :
 
 If the user is looking for marriage halls i need to show results like
 catering services, photography, wedding cards, invitation cards,
 music organisers. 
 
 
 Thanks  Regards,
 kumar
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Related-Search-Suggestions-tp4113672.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Unit testing custom update request processor

2014-01-07 Thread Jorge Luis Betancourt Gonzalez
Happy new year!

I’ve developed some custom update request processors to implement some custom 
logic needed in some use cases. I’m trying to write tests for these processors, 
and I’d like to test them in much the same way as the built-in processors are 
tested in the Solr source code. Is there any advice on how to accomplish this, or 
some experience that someone more experienced could share?

Greetings!




Re: ANNOUNCE: Apache Solr Reference Guide 4.6

2013-12-09 Thread Ing. Jorge Luis Betancourt Gonzalez
Is it possible to export the doc into markdown? 

- Original Message -
From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Sent: Monday, December 9, 2013 14:00:34
Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6


: Can we please give some thought to producing these manuals in ebook formats?

People have given it thought, but it's not as simple as just snapping our 
fingers and making it happen.

If you would like to contribute to the effort of figuring out the
how/where/what to make this happen, there is an existing jira for 
discussing it.
https://issues.apache.org/jira/browse/SOLR-5467



-Hoss
http://www.lucidworks.com/



How to boost documents with all the query terms

2013-12-07 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi:

I'm using Solr 3.6 with the dismax query parser. I've found that docs that don't 
contain all the query terms get ranked above others that contain all the terms 
in the search query. Using debugQuery I could see that most of the score in 
these cases comes from the coord(q,d) factor. Is there any way I could boost 
the documents that contain all the search query terms?
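
For reference, a minimal sketch of how the coord(q,d) factor behaves, assuming the classic Lucene TF-IDF similarity, where coord is the fraction of the query terms that a document matches. The function name and the sample numbers are illustrative only and are not taken from the debugQuery output mentioned above.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """Fraction of the query terms that a document matches."""
    return overlap / max_overlap


# A three-term query: matching every term keeps the full score,
# matching fewer terms scales the rest of the score down.
for matched in (1, 2, 3):
    print(matched, round(coord(matched, 3), 3))
```

Under this assumption a document matching every term gets a factor of 1.0, so a partial match can only rank higher if its other score components (tf, idf, norms) outweigh the penalty.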

Greetings!



Re: Introducing Luwak for high-performance stored Lucene queries

2013-12-06 Thread Ing. Jorge Luis Betancourt Gonzalez
+1 on this.

- Original message -
From: Otis Gospodnetic otis.gospodne...@gmail.com
To: solr-user@lucene.apache.org
Sent: Friday, December 6, 2013 9:35:25
Subject: Re: Introducing Luwak for high-performance stored Lucene queries

Hi Charlie,

Very nice - thanks!

I'd love to see a side-by-side comparison with ES percolator. Got
something like that in your blog topic queue?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Dec 6, 2013 at 9:29 AM, Charlie Hull char...@flax.co.uk wrote:

 Hi all,

 We've now released the library we mentioned in our presentation at Lucene
 Revolution: https://github.com/flaxsearch/luwak

 You can use this to apply tens of thousands of stored Lucene queries to an
 incoming document in a second or so on relatively modest hardware. We use
 it for media monitoring applications but it could equally be useful for
 categorisation, classification etc.

 It's currently based on a fork of Lucene (details supplied) but hopefully
 it'll work with release versions soon.

 Feedback is very welcome!

 Cheers

 Charlie

 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk





Re: solr as a service for multiple projects in the same environment

2013-12-02 Thread Ing. Jorge Luis Betancourt Gonzalez
I think some experience in this area could be provided by Trey Grainger, 
author of Solr in Action. I believe that some of his work at CareerBuilder 
involved the creation of something (somewhat) similar to what you're trying to 
accomplish. I must say that I'm also interested in this topic, but haven't had 
the time to really do anything about it.

- Original message -
From: adfel70 adfe...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sunday, December 1, 2013 2:41:00
Subject: Re: solr as a service for multiple projects in the same environment

The risk is that if you mess up a cluster by mistake while doing maintenance on
one of the systems, you can affect the other system.
It's a pretty amorphous risk.
Aside from having multiple systems share the same hardware resources, I
don't see any other real risk.

Do your collections share the same topology in terms of shards and
replicas?
Do you manually configure the nodes on which each collection is created so
that you'll still have some level of separation between the systems?




michael.boom wrote
 Hi,
 
 There's nothing unusual in what you are trying to do, this scenario is
 very common.
 
 To answer your questions:
 1. as I understand I can separate the configs of each collection in
 zookeeper. is it correct?
 Yes, that's correct. You'll have to upload your configs to ZK and use the
 Collections API to create your collections.
 
 2. are there any solr operations that can be performed on collection A and
 somehow affect collection B?
 No, I can't think of any cross-collection operation. Here you can find a
 list of collection related operations:
 https://cwiki.apache.org/confluence/display/solr/Collections+API
 
 3. is the solr cache separated for each collection?
 Yes, separate and configurable in solrconfig.xml for each collection.
 
 4. I assume that I'll encounter a problem with the os cache, when the
 different indices will compete on the same memory, right? how severe is this
 issue?
 Hardware can be a bottleneck. If all your collections will face the same
 load you should try to give solr a RAM amount equal to the index size (all
 indexes)
 
 5. any other advice on building such an architecture? does the maintenance
 overhead of maintaining multiple clusters in production really outweigh the
 problems and risks of using the same cluster for multiple systems?
 I was in the same situation as you, and putting everything in multiple
 collections in just one cluster made sense for me: it's easier to manage
 and has no obvious downside. As for risks of using the same cluster for
 multiple systems, they are pretty much the same in both scenarios. Only
 that with multiple clusters you'll have many more machines to manage.
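
As a rough illustration of the first answer above, the following sketch builds a Collections API CREATE request that points one collection at its own config set in ZooKeeper. The host, the collection name, the config name and the shard and replica counts are all hypothetical; only the request parameters (action, name, numShards, replicationFactor, collection.configName) come from the Collections API.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name, num_shards, replication_factor):
    """Build a Collections API CREATE URL for one project's collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of a config set that was uploaded to ZooKeeper beforehand.
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical example: one collection per project, each with its own config.
print(create_collection_url("http://localhost:8983/solr", "projectA", "projectA_conf", 2, 2))
```

Calling it once per project, each time with a different config name, keeps the schema and solrconfig.xml of every collection independent.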





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-as-a-service-for-multiple-projects-in-the-same-environment-tp4103523p4104206.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Client-side proxy for Solr 4.5.0

2013-11-26 Thread Ing. Jorge Luis Betancourt Gonzalez
Perhaps what you want is a transparent proxy? You could use nginx, squid, 
varnish, etc. We've been evaluating varnish as a possibility to run in front of 
our Solr server and take advantage of the HTTP caching that varnish does so 
well.

Greetings!

- Original message -
From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Sent: Tuesday, November 26, 2013 13:53:31
Subject: RE: Client-side proxy for Solr 4.5.0

I don't think you mean client-side proxy. You need a server side layer such as 
a normal web application or good proxy. We use Nginx, it is very fast and very 
feature rich. Its config scripting is usually enough to restrict access and 
limit input parameters. We also use Nginx's embedded Perl and Lua scripting 
besides its config scripting to implement more difficult logic.
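
The idea of limiting input parameters in a server-side layer can be sketched in a few lines. This is only an illustration written in Python, not the Nginx configuration described above; the allow-list and the rows cap are hypothetical values that would have to be chosen per application.

```python
from urllib.parse import parse_qsl, urlencode

ALLOWED_PARAMS = {"q", "start", "rows", "fq", "wt"}  # hypothetical allow-list
MAX_ROWS = 100  # hypothetical upper bound on the page size


def sanitize_query_string(raw_query: str) -> str:
    """Drop every parameter that is not allow-listed and cap the rows value."""
    kept = []
    for key, value in parse_qsl(raw_query, keep_blank_values=True):
        if key not in ALLOWED_PARAMS:
            continue  # e.g. qt, stream.body, or anything else not expected
        if key == "rows":
            try:
                value = str(max(0, min(int(value), MAX_ROWS)))
            except ValueError:
                continue  # a non-numeric rows value is discarded
        kept.append((key, value))
    return urlencode(kept)


print(sanitize_query_string("q=solr&rows=100000&qt=/update&stream.body=x"))
```

The sanitized query string is what the layer would forward to Solr, so the end user never talks to port 8983 directly.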

 
 
-Original message-
 From:Reyes, Mark mark.re...@bpiedu.com
 Sent: Tuesday 26th November 2013 19:27
 To: solr-user@lucene.apache.org
 Subject: Client-side proxy for Solr 4.5.0
 
 Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so 
 that the end-user can see  their queries w/o being able to directly access 
 :8983?
 
 Applications/frameworks used:
 - Solr 4.5.0
 - AJAX Solr (javascript library)
 
 Thank you,
 Mark
 



Solr logs encoding to UTF8

2013-11-22 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi everybody:

Is there any way of forcing a UTF-8 conversion on the queries that are logged 
into the log? I've deployed Solr in Tomcat 7. The file appears to be a UTF-8 
file but I'm seeing this in the logs:

INFO: [] webapp=/solr path=/select 
params={fl=*,score&start=0&q=disñemos+el+mundo&hl.simple.pre=<b>&hl.simple.post=</b>&hl.fl=title,content,url,description,keywords&wt=json&hl=true&rows=20}
 hits=48865 status=0 QTime=155.
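
One common cause of this kind of garbling is that UTF-8 bytes get decoded with a single-byte charset somewhere between the request and the log file. The sketch below only reproduces that effect; the sample string and the choice of ISO-8859-1 are assumptions and are not taken from the actual log.

```python
original = "diseñemos el mundo"  # hypothetical query text

# The client sends UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...and a component that assumes ISO-8859-1 turns each byte into one character.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)

# Reversing the wrong step recovers the text, which confirms the diagnosis.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

If the log shows this two-characters-per-accent pattern, the charset settings of the container and of the log handler are the first places to look.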



Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing strange behavior from the gap fragmenter on Solr 3.6. Right now this is 
my configuration for the gap fragmenter:

  <fragmenter name="gap"
              default="true"
              class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">150</int>
    </lst>
  </fragmenter>

This is the basic configuration; I just tweaked the fragsize parameter to get 
shorter fragments. The thing is that for one particular PDF document in my 
results I get a really long snippet, way over 150 characters. It gets a little 
odder: if I change the value from 150 to 100, the snippet for the same document 
is normal, around 100 characters. The type of the field being highlighted is this:

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening?? Or how could I debug what is really going 
on??

Greetings!



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Are you using the suggester component, or a separate core? I've used a 
separate core to store suggestions and order these suggestions (queries 
performed on the frontend) using a time decay function, and it works great for 
me.

Regards,

- Original message -
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:12:13
Subject: Auto Suggest - Time decay

I am trying to implement an auto suggest based on time decay function. I have
a separate index just to store auto suggest keywords.

I would be calculating the frequency over time rather than ranking
based on frequency alone. 

I am thinking of using a database to perform the calculation and update the
SOLR index with the boost calculated based on time decay function. I am not
sure if there is a better way to do this...

I need to boost the terms based on the frequency over time,

Ex: when someone searches for 'apple' 1 times during an iPhone launch
(one particular day), that shouldn't really make apple come up in the auto
suggestion always when someone types in the keyword 'a'; rather it should
lose its popularity exponentially.

Does anyone have any suggestions?
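
The exponential loss of popularity described in this message could be computed offline along the following lines. This is a minimal sketch under stated assumptions: each search is recorded with its age in days, and the half-life of 7 days is an arbitrary, hypothetical choice. The resulting number would then be stored as the boost of the keyword.

```python
import math


def decayed_popularity(event_ages_days, half_life_days=7.0):
    """Sum of search events, each weighted by an exponential decay of its age."""
    decay_rate = math.log(2) / half_life_days
    return sum(math.exp(-decay_rate * age) for age in event_ages_days)


# A one-day burst of 1000 searches two months ago...
burst = decayed_popularity([60.0] * 1000)
# ...ends up below 20 searches made today.
recent = decayed_popularity([0.0] * 20)
print(round(burst, 2), round(recent, 2))
```

With these numbers the old burst no longer dominates the fresh keyword, which is the behaviour asked for in the 'apple' example.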




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
For that core just use a boost factor as explained in [1].

You could use a query like this to see (before making any changes) how your 
suggestions will be retrieved. In this case a query for goog has been made, 
and recent documents will be boosted (an extra bonus will be given to the 
newer documents).

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request 
handler and make it even simpler, so any query against this particular request 
handler will be automatically boosted by date.

PS: You could tweak the formula used in the boost parameter above to better 
suit your needs.
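
To get a feeling for the numbers in that formula, here is a small sketch of what recip computes, using the definition recip(x,m,a,b) = a / (m*x + b), where x is the age of the document in milliseconds. The helper names are only for illustration.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same shape as Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost given to a document of the specified age, with the constants from the query above."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 5):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Since 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, a brand new document gets a boost of 1.0 and a one year old document about half of that. A larger m makes the boost fall off faster.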

- Original message -
From: SolrLover bbar...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Sorry, I forgot the link:

[1] - http://wiki.apache.org/solr/SolrRelevancyFAQ



Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-28 Thread Ing. Jorge Luis Betancourt Gonzalez
I forgot to mention that you could check the boost section in the configuration 
file of the core to see how your suggestions will be ranked. Basically, the 
boost factor for each field allows you to decide which suggestions you would 
like to come first. Perhaps in your app you could keep track of how often a 
suggestion given to a user is actually used as the query, and boost those 
suggestions since they are more likely to become a query for the user. Thinking 
a little ahead, this could improve your user experience and additionally lower 
the load on your server, because if a suggestion given to a high number of 
users becomes a query, this query should already be in the cache. These are 
just thoughts, but I hope they are useful to you.

Regards,


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread Ing. Jorge Luis Betancourt Gonzalez
Actually I don't use that field. It could be used to do some form of basic 
collaborative filtering, so you could use a high value for items in your 
collection that you want to come first, but in my case this was not a 
requirement and I don't use it at all.

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 16:19:40
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I am not sure about the value to use for the option popularity.  Is there
a method or do you just go with some arbitrary number?


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-26 Thread Ing. Jorge Luis Betancourt Gonzalez
Great!! I hadn't seen your message yet. Perhaps you could create a PR to that 
GitHub repository, so it will be in sync with current versions of Solr.

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 9:10:49
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

solved.


On Thu, Sep 26, 2013 at 1:50 PM, JMill apprentice...@googlemail.com wrote:

 I managed to get rid of the query error by placing the jquery file in the
 velocity folder and adding the line: <script type="text/javascript"
 src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript"></script>.
 That has not solved the issue; the console is showing a new error -
 [13:42:55.181] TypeError: $.browser is undefined @
 http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90.
 Any ideas?


 On Thu, Sep 26, 2013 at 1:12 PM, JMill apprentice...@googlemail.com wrote:

 Do you know the directory the #{url_root} in <script
 type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>
 points to? And the same for #{url_for_solr} in <script type="text/javascript"
 src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js"></script>?


 On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:

 Try querying the core where the data has been imported, something like:

 http://localhost:8983/solr/suggestions/select?q=uc

 In the previous URL, suggestions is the name I gave to the core, so this
 should change. If you get results, then the problem could be the jquery
 dependency. I don't remember making any change; as far as I know that js
 file is bundled with Solr (at least in version 3.x). Perhaps you could change
 it to the correct jquery version for Solr 4.4. If you go into the admin panel
 (in Solr 3.6):

 http://localhost:8983/solr/admin/schema.jsp

 and inspect the loaded code, the required file (jquery-1.4.2.min.js)
 gets loaded. In Solr 4.4 it should load a similar file, but perhaps a more
 recent version.

 Perhaps you could change that part to something like:

   <script type="text/javascript"
 src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

 which is used at least on a Solr 4.1 that I have lying around here
 somewhere.

 In any case you can test the suggestions using the URL that I suggest at
 the top of this mail; in that case you should be able to see the possible
 results, of course in a less fancy way.

 - Original message -
 From: JMill apprentice...@googlemail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, September 25, 2013 13:59:32
 Subject: Re: Implementing Solr Suggester for Autocomplete (multiple
 columns)

 Could it be the jquery library that is the problem? I opened up
 solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to
 the jquery library, but I can't seem to find the directory referenced,
 line: <script type="text/javascript"
 src="#{url_for_solr}/admin/jquery-1.4.3.min.js">. Do you know where
 #{url_for_solr} points to?


 On Wednesday, September 25, 2013, Ing. Jorge Luis Betancourt Gonzalez 
 jlbetanco...@uci.cu wrote:
  Perhaps this could be an issue. I know that this works perfectly in Solr
  3.6 (this is the one I was using). Currently I don't have a Solr 4.4 to do
  some tests, but what has been done in that core should work in Solr 4.4.
  Perhaps there is a setting that needs some tweaking, but it's impossible to
  know without checking the logs. If any incompatibility is present, it
  should pop up in the logs.
 
  Regards,
 
  - Original message -
  From: JMill apprentice...@googlemail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, September 25, 2013 11:10:32
  Subject: Re: Implementing Solr Suggester for Autocomplete (multiple
  columns)
 
  A simple query through admin (*:*) confirms the data exists. The version
  I'm working with is solr 4.4.0. The autocomplete manual refers to 3.x. I
  wonder if this is the problem?
 
 
  On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez 
  jlbetanco...@uci.cu wrote:
 
  The response does not show any error. Can you confirm that the data is in
  Solr? You should be able to see the numDocs stats in the admin UI. Which
  version of Solr are you using? I believe that the example was tested on
  Solr 3.x, at least at the time I used it.
 
  Regards,
 
  - Original message -
  From: JMill apprentice...@googlemail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, September 25, 2013 10:57:31
  Subject: Re: Implementing Solr Suggester for Autocomplete (multiple
  columns)
 
  I followed the instructions, I am able to browse to 
  http://localhost:8983/solr/ac/browse?q=cedebugQuery=true; but I am
 not
  getting any suggestions (typed in c in Find Textbox).
 
  I wonder if loading the example data is the problem?  The response I
 get
  after executing the script  feed-ac.sh (step 3

Re: Sorting dependent on user preferences with FunctionQuery

2013-09-26 Thread Ing. Jorge Luis Betancourt Gonzalez
I think you could use boosting queries: for group A you boost one category and 
for group B some other category.
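
A minimal sketch of that idea, assuming the dismax parser and its bq (boost query) parameter. The group names, the boost value of 10 and the sample query are hypothetical; only the category field and its Book and DVD values come from the question below.

```python
from urllib.parse import urlencode

# Hypothetical mapping from customer group to the category it prefers.
GROUP_BOOSTS = {
    "A": "category:Book^10",
    "B": "category:DVD^10",
}


def build_query_string(user_query, group):
    """Build the request parameters, adding a boost query for known groups."""
    params = [("defType", "dismax"), ("q", user_query)]
    boost = GROUP_BOOSTS.get(group)
    if boost is not None:
        params.append(("bq", boost))
    return urlencode(params)


print(build_query_string("harry potter", "A"))
print(build_query_string("harry potter", "B"))
```

The same user query then returns Books first for group A and DVDs first for group B, without any change to the sort parameter.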

- Original message -
From: Snubbel solrforum.20.x...@spamgourmet.com
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 8:01:36
Subject: Sorting dependent on user preferences with FunctionQuery

Hello,

I want to present to different user groups a search result in different
orders.
Say, i have customer group A, which I know prefers Books, I want to get
Books at the top of my query result, DVDs at the bottom.
And for group B, preferring DVD, these first.
In my index I have a field of type text named category with values Book
and DVD.

I thought maybe I could solve this with QueryFunctions, maybe like this:

 
select?q=*%3A*&sort=query(qf=category v='Book')desc

but Solr returns "Can't determine a Sort Order (asc or desc) in sort".

What is wrong? I tried different ways of formulating the query without
success...


Or, does anyone have a better idea how to solve this?

Best regards, Nikola



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-dependent-on-user-preferences-with-FunctionQuery-tp4092119.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
I've used a separate core for storing suggestions, based on what I saw in 
https://github.com/cominvent/autocomplete. You can check the blog post at 
www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is 
really flexible; on the downside, it does not use the suggester component, so 
these are like regular queries against a separate core.

Greetings!

- Original message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For Bill Rogers,
index (untokenized) something like
Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent)
which is quite fast. What you _don't_ get is autocorrect. But if you
use terms.prefix, you can also control whether it's whole word match
or not. To get whole-word in the above, you would set your prefix to
Rogers| for instance. Or you may want to leave off the | to see
more of an autocomplete-type response.

Then, of course, when you display this you need to only display what's
after the | (or whatever delimiter you use).

One other note, this will be case sensitive, so you probably want to
do casing yourself, index things like
rogers|Bill Rogers
and lowercase what you send in to terms component.

Best,
Erick
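
The delimiter trick described above can be sketched roughly as follows. This is only an illustration: the field name suggest_terms and the helper names are hypothetical, and the request parameters assume a TermsComponent handler is configured for the core.

```python
from urllib.parse import urlencode

DELIM = "|"

def suggestion_terms(full_name):
    """Return one 'word|Full Name' term per word, lowercasing the lookup part
    so that matching is effectively case-insensitive."""
    return [f"{word.lower()}{DELIM}{full_name}" for word in full_name.split()]

def terms_params(user_input, field="suggest_terms", whole_word=False, limit=10):
    """Build TermsComponent parameters for a prefix lookup.
    'suggest_terms' is a hypothetical untokenized field."""
    prefix = user_input.strip().lower()
    if whole_word:
        # Appending the delimiter forces a whole-word match.
        prefix += DELIM
    return urlencode({
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
    })

def display_value(term):
    """Show only what comes after the delimiter."""
    return term.split(DELIM, 1)[1]

print(suggestion_terms("Bill Rogers"))
print(terms_params("Rog"))
print(display_value("rogers|Bill Rogers"))
```

The terms returned by suggestion_terms would be indexed into the untokenized field; at query time the user's input is lowercased and sent as terms.prefix.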



On Tue, Sep 24, 2013 at 2:01 PM, JMill apprentice...@googlemail.com wrote:
 Hi,

 I'm using Solr's Suggester function to implement an autocomplete feature.
 I have it set up to check against the username and name fields. The problem
 is that when running a query against the name, the second term, after the
 whitespace (the surname), returns 0 results. It works if the query is a
 partial name starting from the beginning, e.g. given the name Bill Rogers, a
 query for Rogers will return 0 results whereas a query for Bill will return
 a positive result (Bill Rogers). As for the username, it's not working at all.

 I am after the following behaviour.

 Match any partial words in the fields username or name and return the
 results. If there is a match in the field name, then return the whole name,
 e.g. given the queries Rogers or Bill, return Bill Rogers (not the
 single word that was a match).

 schema.xml extract
 ...
 <field name="username" type="text_general" indexed="true" stored="true" />
 <field name="name" type="text_general" indexed="true" stored="true" />
 <field name="autocomplete" type="textSpell" indexed="true" stored="false"
        multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
 ...
 <copyField source="username" dest="autocomplete" />
 <copyField source="name" dest="autocomplete" />
 ...

 <fieldType class="solr.TextField" name="textSpell"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory" />
     <filter class="solr.StandardFilterFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>


 solrconfig.xml

 <searchComponent class="solr.SpellCheckComponent" name="...">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">autocomplete</str>  <!-- the indexed field to derive suggestions from -->
     <float name="threshold">0.005</float>
     <str name="buildOnCommit">true</str>
     <!--
     <str name="sourceLocation">american-english</str>
     -->
   </lst>
 </searchComponent>

 ...
 <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                 name="/suggest">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.collate">true</str>
   </lst>
   <arr name="components">
     <str>spellcheck</str>
   </arr>
 </requestHandler>


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
The response does not show any error. Can you confirm that the data is in Solr? 
You should be able to see the numDocs stat in the admin UI. Which version of 
Solr are you using? I believe the example was tested on Solr 3.x, at least 
at the time I used it.

Regards,

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to 
http://localhost:8983/solr/ac/browse?q=ce&debugQuery=true but I am not
getting any suggestions (typed in c in the Find textbox).

I wonder if loading the example data is the problem? The response I get
after executing the script feed-ac.sh (step 3) is the following.

user$ ./feed-ac.sh
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">2239</int></lst>
</response>

Are you able to confirm if this is the expected response?
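
For reference, a status of 0 in the responseHeader is what Solr reports for a request that completed without error. A small sketch of checking this programmatically is shown below; the helper name is made up, and the sample string simply mirrors the response pasted above.

```python
import xml.etree.ElementTree as ET

def update_succeeded(response_xml):
    """Return True when the response header carries status 0."""
    root = ET.fromstring(response_xml)
    status = root.find("./lst[@name='responseHeader']/int[@name='status']")
    return status is not None and status.text.strip() == "0"

# Sample response, reconstructed from the output of feed-ac.sh quoted above.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<response>'
    '<lst name="responseHeader"><int name="status">0</int>'
    '<int name="QTime">2239</int></lst>'
    '</response>'
)
print(update_succeeded(sample))
```

A successful status only says the update request was accepted; whether the documents are searchable still depends on a commit having happened.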




Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
Perhaps this could be an issue. I know that this works perfectly in Solr 3.6 
(the version I was using). I currently don't have a Solr 4.4 install to run 
tests on, but what was done in that core should work in Solr 4.4. Perhaps 
there is a setting that needs some tweaking, but it's impossible to know 
without checking the logs. If there is any incompatibility, it should show 
up in the logs.

Regards,

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through the admin UI (*:*) confirms the data exists. The version
I'm working with is Solr 4.4.0. The autocomplete manual refers to 3.x. I
wonder if this is the problem?


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
Try querying the core where the data has been imported, something like:

http://localhost:8983/solr/suggestions/select?q=uc

In the URL above, suggestions is the name I gave to the core, so change it to 
match yours. If you get results, then the problem could be the jQuery 
dependency. I don't remember making any changes; as far as I know that JS file 
is bundled with Solr (at least in 3.x), so perhaps you could change the 
reference to the jQuery version that ships with Solr 4.4. If you go into the 
admin panel (in Solr 3.6):

http://localhost:8983/solr/admin/schema.jsp

and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets 
loaded. Solr 4.4 should load a similar file, but perhaps a more recent 
version.

Perhaps you could change that part to something like:

  <script type="text/javascript" 
src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

which is what is used at least in a Solr 4.1 that I have lying around here 
somewhere.

In any case, you can test the suggestions using the URL at the top of this 
mail. You should be able to see the possible results there, although in a 
less fancy way.

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 13:59:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Could it be the jquery library that is the problem? I opened up
solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to
the jquery library, but I can't seem to find the directory referenced.
The line is: <script type="text/javascript"
src="#{url_for_solr}/admin/jquery-1.4.3.min.js">. Do you know where
#{url_for_solr} points to?


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
As far as I can tell, it is. You can check that by looking at the console logs 
in your browser (Chrome, Firefox, etc.). There should be an error saying that 
the $ function is not found. In any case I'll try to set up a test environment 
here, but I can only use Solr 4.1, which is what I have here; I haven't 
downloaded or tested the 4.4 version yet. Did you try replacing the line that 
includes jquery-1.4.3.min.js with the new one?

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 14:44:53
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

That seems to work. I get back an xml containing a bunch of suggestions.
Can we agree that it's jquery that's the problem?

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
That's an indication that jQuery can't be loaded, and without jQuery the 
autocomplete plugin won't work. This plugin is used to show the popup list that 
shows up at the bottom of the input.

- Original message -
From: JMill apprentice...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 15:40:00
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Not yet but I do see the $ not found in console.


Re: Suggest and Filtering

2013-06-13 Thread Ing. Jorge Luis Betancourt Gonzalez
If query suggestion is what you are looking for, what we've done is store the 
user queries in a separate core and pull the suggestions from there. 

- Original message -
From: Brendan Grainger brendan.grain...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 13, 2013 19:43:03
Subject: Suggest and Filtering

Hi Solr Gurus

I am trying to implement auto suggest where solr would suggest several
phrases that would return results as the user types in a query (as distinct
from autocomplete). e.g. say the user starts typing 'br' and we have
documents that contain brake pads and left disc brake, solr would
suggest both of those phrases with brake pads first. I also want to only
look at documents that match a given filter query. So say I have a bunch of
documents for a toyota cressida that contain the bi-gram brake pads,
while the documents for a honda accord don't have any brake pad articles.
If the user is filtering on the honda accord I wouldn't want brake pads
as a suggestion.

Right now, I've played with the suggest component and using faceting.

Any thoughts?

Thanks
Brendan
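
One way the faceting approach mentioned above might look is sketched here. It assumes a hypothetical field named phrases that holds lowercased word n-grams (shingles), and a hypothetical filter on a model field. Note that facet.prefix only matches terms that start with the typed text, so it would cover brake pads but not left disc brake.

```python
from urllib.parse import urlencode

def facet_suggest_params(typed, filter_query, field="phrases", limit=5):
    """Build parameters for prefix suggestions restricted by a filter query.
    'phrases' is a hypothetical shingle field; adjust to the real schema."""
    return urlencode([
        ("q", "*:*"),
        ("fq", filter_query),        # only documents passing the filter count
        ("rows", 0),                 # no documents needed, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", typed.lower()),
        ("facet.limit", limit),
        ("facet.mincount", 1),       # drop phrases absent from the filtered set
    ])

print(facet_suggest_params("br", "model:accord"))
```

Because the facet counts are computed over the filtered result set, phrases that occur only in documents outside the filter are not returned.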

-- 
Brendan Grainger
www.kuripai.com

http://www.uci.cu


Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

2013-04-24 Thread Jorge Luis Betancourt Gonzalez
One more thing:

What about the hack you mentioned when the query is a combination of reserved 
query operators, such as +-, +, --++--+%, etc.? In these cases the application 
has to deal with all of them too.

Greetings!

- Original message -
From: Jérôme Étévé jerome.et...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for + character causes 
org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define
'+' as a regular ALPHA character:

In config:

delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to keep c++, c#, c* and R&D as words.
#
+ => ALPHA
 # => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory,
use types="delimiter_types.txt"


You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+', a little hacking is
necessary in your client code. Personally, I just double-quote the query
if it's only one character long. It can't be harmful, and as it will turn your
single + into "+", it will be considered a token (rather than being
part of the query syntax) by the parser.

Provided you're using the edismax parser, it should be just fine for any
other queries, like '+ foo', 'foo +', '++' ...
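
The client-side hack described above could look roughly like this. It is only a sketch; the function name is made up, and it assumes the query is a plain string coming from the search box.

```python
def quote_single_char(query):
    """Wrap a one-character query in double quotes so the parser treats it
    as a token rather than as query syntax; leave longer queries alone."""
    q = query.strip()
    if len(q) == 1:
        # Escape a lone backslash or double quote so the phrase stays valid.
        q = q.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + q + '"'
    return q

print(quote_single_char("+"))
print(quote_single_char("c++"))
```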


J.


On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez
jlbetanco...@uci.cu wrote:

 Hi Kai:

 Thanks for your reply. From what I've understood, this logic must be
 included in my application. Would it be possible, for instance, to use some
 regular expression at query time in my schema to reject a query that
 contains only these characters? For instance, + and + would be good
 ones to catch.

 Thanks in advance!

 - Original message -
 From: Kai Becker m...@kai-becker.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, April 23, 2013 9:48:26
 Subject: Re: Querying only for + character causes
 org.apache.lucene.queryParser.ParseException

 Hi,

 you need to escape that char in search terms.
 Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.

 The %2B is just the url encoding, but it will still be a + for Solr, so
 just put a \ in front of the chars I mentioned.

 Cheers,
 Kai
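
A minimal sketch of the escaping described above, done in the client application before the query is sent. The character set is an assumption based on the list in this message, with the double quote, & and | added; the function name is made up.

```python
import re
from urllib.parse import urlencode

# Characters with special meaning in the query syntax (assumed set).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def escape_query_term(term):
    """Put a backslash in front of every special character."""
    return _SPECIAL.sub(r'\\\1', term)

escaped = escape_query_term("+")
print(escaped)
# URL encoding is a separate layer applied on top of the escaped term.
print(urlencode({"q": escaped}))
```

Escaping and URL encoding solve different problems: the backslash is for the query parser, while the percent-encoding only gets the characters safely through HTTP.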


--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/

http://www.uci.cu


Querying only for + character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi!

I'm currently working on a basic search engine. During testing we found a
problem: if a user searches for only the + or - term, or for the + string,
my application throws an exception caused by an
org.apache.lucene.queryParser.ParseException in Solr. I get the same response
if I search for the + term from the Solr admin interface. From what I've seen,
the + character gets encoded as %2B, which causes the exception. Is there any
way of escaping these characters so they behave like any other character, or
at least of getting an empty response in these cases?

I'm using Solr 3.6.2, deployed in Tomcat 7.

Greetings! 


Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi Kai:

Thanks for your reply. From what I understand, this logic must be included in
my application. Would it be possible, for instance, to use some regular
expression at query time in my schema to reject a query that contains only
these characters? For instance, + and + would be good cases to catch.

Thanks in advance!

- Original Message -
From: Kai Becker m...@kai-becker.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 9:48:26
Subject: Re: Querying only for + character causes
org.apache.lucene.queryParser.ParseException

Hi,

you need to escape that char in search terms.
Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.

The %2B is just the url encoding, but it will still be a + for Solr, so just 
put a \ in front of the chars I mentioned.

Cheers,
Kai
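
A minimal sketch of the escaping described in the reply above, assuming it is done in client code before the query is sent to Solr. The helper name is hypothetical, and including the double quote in the character set is an assumption on top of the characters listed in the reply.

```python
# Hypothetical client-side helper, not part of Solr or any client library.
# Characters taken from the reply above; the double quote is an added assumption.
LUCENE_SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/')


def escape_query_term(term: str) -> str:
    """Prefix every query-syntax character in `term` with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


if __name__ == "__main__":
    for raw in ["+", "-", "c++", "foo:bar"]:
        print(repr(raw), "->", repr(escape_query_term(raw)))
```

The escaped string still has to be URL-encoded when it is placed in the request; as noted above, %2B is only the transport encoding of +.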



Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi Jérôme:

Thanks for your suggestion, Jérôme. I'll do as you described to allow
searching for these specific tokens. I had also considered adding the quotes
at the application level when the length is 1, but I would like to keep this
logic inside Solr (if possible). That is why I was thinking of some kind of
regular-expression replacement at query time, so that if this changes in the
future it won't also require changing the application. Can you advise me on
this?

Greetings!

- Original Message -
From: Jérôme Étévé jerome.et...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for + character causes
org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define
'+' as a regular ALPHA character:

In config:

delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
 # => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory,
use types="delimiter_types.txt"


You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+', a little hacking is
necessary in your client code. Personally, I just double-quote the query
if it is only one character long. It can't do any harm, and since it turns
your single + into "+", the parser will treat it as a token (rather than
as part of the query syntax).

Provided you're using the edismax parser, it should be just fine for any
other queries, like '+ foo', 'foo +', '++' ...


J.
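
The client-side quoting trick described above could look roughly like the following. This is only a sketch: the function name is hypothetical, and leaving a lone double quote untouched is an assumption made here to avoid producing an unbalanced phrase.

```python
def quote_if_single_char(query: str) -> str:
    """Wrap a one-character query in double quotes; leave anything else untouched."""
    stripped = query.strip()
    # A lone double quote is left as-is (assumption): quoting it would
    # produce three quote characters in a row.
    if len(stripped) == 1 and stripped != '"':
        return '"' + stripped + '"'
    return query


if __name__ == "__main__":
    for raw in ["+", "-", "+ foo", "++"]:
        print(repr(raw), "->", repr(quote_if_single_char(raw)))
```

Keeping the rule in one small function means it can be dropped later if the same behaviour is moved into Solr.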


--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/



Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi all:

I'm building a document search platform, basically indexing a lot of PDF
files. Some of these files have an index page, which means that when I query
for normativos in my application (built using Symfony2 + PHP + Solarium) I get
a few results like this:

10
 6.2 Elementos normativos generales 
12
 6.3 Elementos normativos técnicos 
..32
 ANEXOS A Formas verbales (normativo

This is a bit of a problem. Is there any way to get rid of these dots? Is
there any notion of relevance in the snippets that the highlighting component
returns? In this particular case the snippet came from the index page of the
PDF, which I hardly think is the best snippet in the document for this
particular query. Any thoughts on this? Is there a golden rule for handling
cases like this?

Thanks a lot!


Re: Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi Jack:

Thanks for the reply. Exactly, I know it is common to encounter this kind of
TOC in a lot of files. I'm playing with the regex fragmenter to be a little
more selective about the generated snippets, but no luck so far.

- Original Message -
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 30, 2013 0:40:03
Subject: Re: Getting better snippets in highlighting component

It looks like a table of contents. The dots are followed by the page number,
followed by the text from the next table of contents entry, and repeat.

Even Google doesn't do anything special for this. For example, search for
chapter 1 chapter 2 pdf:

[PDF]
2013 Publication 505 - Internal Revenue Service
www.irs.gov/pub/irs-pdf/p505.pdf
File Format: PDF/Adobe Acrobat
Mar 21, 2013 – Introduction . . . . . . . . . . . . . . . . . . 1. What's
New for 2013 . . . . . . . . . . . . . 2. Reminders . . . . . . . . . . . .
. . . . . . . 2. Chapter 1. Tax Withholding for ...

I'm sure somebody can come up with a clever heuristic to avoid this kind of
thing.

Maybe simply truncate any sequence of white space and only punctuation down
to two or three characters or so.

-- Jack Krupansky
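
The truncation heuristic suggested above can be sketched as a post-processing step on the snippet text returned by the highlighter. This is an illustration only: the function name and the default of three characters are assumptions, and it runs in the client, not inside Solr.

```python
import re


def collapse_leaders(snippet: str, keep: int = 3) -> str:
    """Cut any run of whitespace/punctuation longer than `keep` down to `keep` characters."""
    # \W matches anything that is not a letter, digit or underscore,
    # so dot leaders such as " . . . . " form one run.
    pattern = re.compile(r"\W{%d,}" % (keep + 1))
    return pattern.sub(lambda match: match.group(0)[:keep], snippet)


if __name__ == "__main__":
    print(collapse_leaders("Introduction . . . . . . . . 1. What's New . . . . . 2."))
```

Ordinary punctuation such as ", " or ". " is shorter than the threshold and passes through unchanged.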


Question about email search

2013-03-14 Thread Jorge Luis Betancourt Gonzalez
I'm using Solr 3.6.2 to index data crawled with Nutch. In my schema I have one
field with all the content extracted from the page, which may include email
addresses. This is the field type configuration in my schema:

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
        language="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory"
        ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The thing is that I'm trying to search a field of this type (text) with a
value like @gmail.com, and I expect to get the documents containing that text.
Any advice?

Regards
--
It is only in the mysterious equation of love that any 
logical reasons can be found.
Good programmers often confuse halloween (31 OCT) with 
christmas (25 DEC)



Re: Question about email search

2013-03-14 Thread Jorge Luis Betancourt Gonzalez
Sorry for the duplicated mail :-(. Any advice on a configuration for searching
for email addresses in a field that does not contain only email addresses,
that is, where the addresses are embedded in larger text?

- Original Message -
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 14, 2013 11:23:47
Subject: Re: Question about email search

Hi,

Since you have the word delimiter filter in your analysis chain, I am not sure
whether e-mail addresses are kept as single tokens. You can check that on the
analysis page of the Solr admin UI.

If e-mail addresses are kept as one token, I would use a leading wildcard query:
q=*@gmail.com

There was a similar question recently:
http://search-lucene.com/m/XF2ejnM6Vi2
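
For illustration, the leading wildcard query above could be sent from client code along these lines. The host, the /select path and the field name `content` are assumptions, not taken from the thread, and the query only matches if the analysis chain keeps each address as a single token.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core path to the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_wildcard_url(domain: str, field: str = "content") -> str:
    """Build a select URL for a leading-wildcard query such as content:*@gmail.com."""
    params = {"q": f"{field}:*@{domain}", "wt": "json"}
    # urlencode percent-encodes '*', '@' and ':' so the query survives transport.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_wildcard_url("gmail.com"))
```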





Re: Using suggester for smarter phrase autocomplete

2013-03-13 Thread Jorge Luis Betancourt Gonzalez
Currently I'm using a separate core to query suggestions; I started from
https://github.com/cominvent/autocomplete. I only use the suggester component
for term suggestions based on a tokenized field in my schema (all of this in
Solr 3.6). Perhaps, instead of the suggester component, you could use an
approach more like the one in the GitHub repo.

- Original Message -
From: Eric Wilson wilson.eri...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, March 13, 2013 13:11:05
Subject: Re: Using suggester for smarter phrase autocomplete

I'm not concerned about stopwords, rather the situation where the first and
second words are rarely used together, so don't occur together in a phrase
in the dictionary. Thanks.

On Wed, Mar 13, 2013 at 11:11 AM, Robert Muir rcm...@gmail.com wrote:

 On Wed, Mar 13, 2013 at 11:07 AM, Eric Wilson wilson.eri...@gmail.com
 wrote:
  I'm trying to use the suggester for auto-completion with Solr 4. I have
  followed the example configuration for phrase suggestions at the bottom of
  this wiki page:
  http://wiki.apache.org/solr/Suggester
 
  This shows how to use a text file with the following text for phrase
  suggestions:
 
  # simple auto-suggest phrase dictionary for testing
  # note this uses tabs as separator!
  the first phrase	1.0
  the second phrase	2.0
  testing 1234	3.0
  foo	5.0
  the fifth phrase	2.0
  the final phrase	4.0
 
  This seems to be working in the expected way. If I query for "the f" I
  receive the following suggestions:
 
   <str>the final phrase</str>
   <str>the fifth phrase</str>
   <str>the first phrase</str>
 
  I would like to deal with the case where the user is interested in "the
  foo". When "the fo" is entered, there will be no suggestions. Is it
  possible to provide both the phrase matches and the matches for individual
  words, so that when the user-entered text is no longer part of any actual
  phrase, there are still suggestions to be made for the final word?
 

 Is it really the case that you want matches for individual words, or
 just to handle e.g. the stopwords case like 'the fo' -> foo?

 the latter can be done with analyzingsuggester (configure a stopfilter
 on the analyzer).



Re: Building a central index with Lucene + Solr

2013-03-05 Thread Jorge Luis Betancourt Gonzalez
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 + PHP
(Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent results.
Solarium on its own is a great PHP library; right now it lacks Solr 4 support,
but for Solr 3.6 it's great.

- Original Message -
From: David Quarterman da...@corexe.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 5, 2013 10:56:18
Subject: RE: Building a central index with Lucene + Solr

Hi Alvaro,

I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are plenty
of people using Solr & PHP out there very successfully. There's another good
package at http://code.google.com/p/solr-php-client/ which is easy to implement
and has some example usage.

Regards,

DQ



From: Álvaro Vargas Quezada [mailto:al...@outlook.com]
Sent: 05 March 2013 14:53
To: solr-user@lucene.apache.org
Subject: Building a central index with Lucene + Solr



Hi everyone!



I'm trying to develop a central index. I installed Solr and reached the screen
that I attach, but I don't know how to continue from this point. I want to
develop an app in PHP which uses Solr, but I don't know how. Can anyone help
me, maybe with a tutorial or something like that?



Thanks and greetz from Chile!





Custom update handler

2013-02-08 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm trying to build a custom update handler to accomplish one specific task. In
our app we offer query suggestions based on previous queries submitted through
our frontend app. Instead of getting these queries from the Solr logs, we store
them in a separate core. So far so good, but one particular requirement is that
not every query typed by users in the search box should appear as a suggestion,
only the most popular ones. For this we created a field in the schema called
count, and wrote code in our frontend to increase this value, which, to be
honest, we don't like. So we came up with the idea of writing a custom update
handler that, before storing the query in the index, checks whether the query
already exists and, if so, adds 1 to the counter.

The thing is that right now we have set up a dedupe component to avoid storing
very similar queries. Is there any way of accessing the dedupe component from
the custom update handler? Is there any documentation I can check to see
something similar to this?

Greetings

Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm working on a search engine for several PDF documents. One of the
requirements is that we return not only the documents matching the search
criteria but also the pages that match. Normally Tika only extracts the text
content and does not make this distinction, but it could be achieved with some
custom library. My question is how to structure the schema. From what I've
seen, one approach could be to use dynamic fields:

<dynamicField name="page_*" type="text" indexed="true" stored="true"/>

Then at query time I could extract the page number from the field name. Is
this the best approach? Is there any way of storing the page number in an
attribute instead of using dynamic fields?

Thanks in advance!

Greetings


Re: Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Thanks for the advice. The problem with this approach is that we are using
Nutch as our crawler for the intranet, and right now indexing one crawled
document as several Solr documents is not possible without changing the way
Nutch works. Is there any other workaround for this?

Thanks for the replies!

- Original Message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Tuesday, February 5, 2013 9:05:58
Subject: Re: Indexing several parts of PDF file

This would involve you querying against every page in your document,
which will be too many fields and will break quickly.

The best way to do it is to index pages as documents. You can use field
collapsing to group pages from the same document together.

Upayavira
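
A rough sketch of the page-per-document layout suggested above, assuming the per-page text has already been extracted by some other tool. The field names (`doc_id`, `page`, `text`) and the id scheme are hypothetical; the grouping parameters are the standard field collapsing request parameters.

```python
from typing import Dict, List


def pages_to_solr_docs(doc_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Build one Solr document per page of a single PDF (field names are assumptions)."""
    return [
        {
            "id": f"{doc_id}_p{number}",  # unique key for this page
            "doc_id": doc_id,             # shared by all pages of the same PDF
            "page": number,               # 1-based page number
            "text": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


# Request parameters that group matching pages under their parent document.
GROUPING_PARAMS = {"group": "true", "group.field": "doc_id", "group.limit": "3"}


if __name__ == "__main__":
    for doc in pages_to_solr_docs("manual", ["Introduction", "Elementos normativos"]):
        print(doc)
```

With this layout the page number is an ordinary stored field, so it does not have to be parsed out of a dynamic field name.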



Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm currently working with Solr 3.6.1, but Solr 4 has great features like the
ones bundled with SolrCloud. The content of the index is not really a problem
for the transition; the thing is that I have a large app written in PHP +
Solarium that interacts with the index in Solr 3. As far as I know there is no
support for Solr 4 in Solarium. So my question is: is it possible to use a
Solr 3.6.1 frontend that gets its data from a Solr 4 behind the scenes, or is
there any other workaround for this?

Greetings!
10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci


Re: Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
So, from my PHP app's point of view, changes will be needed if I want to use
SolrCloud features, right? One more thing: are the responses generated by
Solr 4 in any way different from the ones generated by Solr 3? I ask because
Solarium parses the JSON response from the server to provide high-level
objects encapsulating the response and its content.

Greetings!

- Original Message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, January 5, 2013 4:49:01
Subject: Re: Migrating from Solr 3.6.1 to Solr 4

Try pointing your app at 4.0. I converted an app recently. Here's the
steps I took (as I recall):

 * get original solrconfig.xml for the release I'm using
 * diff that and my solrconfig.xml
 * apply those changes to a 4.0 solrconfig.xml
 * try to start up solr with this new solrconfig and an old schema and
   an old index
 * fix each problem you find in the schema
    - some class names have changed
    - you may want to delete some field definitions that you're not
      using
    - you'll need to copy the version field from the 4.0 schema

I found my app was able to search/index without any difficulty via the
XML/HTTP interface.

Your mileage may vary, but for that particular app, that is what it
took.

Note, 4.0 can work in a 3.x way (old style replication, etc). You don't
need to use SolrCloud etc when using 4.0.

Upayavira



Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Is this updatable fields functionality available in Solr 3.6.1? That is the
version I'm using right now.

- Original Message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 7:56:45
Subject: Re: Dedup component

Make the ID field out of the query text so you don't have to use the
dedup component, then use the updatable fields functionality in Solr
4.0:

$ curl http://localhost:8983/solr/update \
    -H 'Content-type:application/json' -d '
[
 {"id"       : "book1",
  "copies_i" : { "inc" : 1 },
  "cat"      : { "add" : "fantasy" },
  "ISBN_s"   : { "set" : "0-380-97365-0" },
  "remove_s" : { "set" : null } }
]'

/* example stolen from Yonik's ApacheCon talk */

Upayavira
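
Applied to the query-suggestion case, the idea above might be sketched like this: the normalised query text is used as the document id, and the popularity counter is incremented with the `inc` modifier from the curl example. The function name and the field names `query` and `count` are assumptions, and how Solr 4 treats an `inc` on a document that does not exist yet should be checked against the version in use.

```python
import json


def build_increment_update(query_text: str,
                           text_field: str = "query",
                           count_field: str = "count") -> str:
    """Return a JSON update body that bumps the counter of one suggestion.

    The id is the normalised query text, so repeated submissions of the
    same query address the same document (field names are hypothetical).
    """
    normalised = " ".join(query_text.lower().split())
    doc = {
        "id": normalised,
        text_field: {"set": normalised},
        count_field: {"inc": 1},
    }
    return json.dumps([doc])


if __name__ == "__main__":
    print(build_increment_update("  Solr   Highlighting "))
```

The resulting string is what would be posted to the update handler with Content-type application/json, as in the curl example.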


On Sat, Dec 15, 2012, at 01:34 AM, Jorge Luis Betancourt Gonzalez wrote:
 Hi all:

 I'm trying to build a query suggestion system using Solr (also used to
 index all the data in the app). I have a separate core dedicated only to
 this purpose (along with some others for images, etc.). In the main app,
 written in Symfony2 + Solarium Bundle, we store the queries in this core.
 To prevent the indexing of duplicated queries, I use the dedup component:

 <!--
  Delete similar duplicated documents at index time, using some fuzzy text
  similarity techniques
 -->
 <updateRequestProcessorChain name="dedupe">
   <processor
       class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
     <bool name="enabled">true</bool>
     <bool name="overwriteDupes">false</bool>
     <str name="signatureField">signature</str>
     <str name="fields">textsuggest,textng</str>
     <str name="signatureClass">
       org.apache.solr.update.processor.TextProfileSignature
     </str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory"/>
   <processor class="solr.RunUpdateProcessorFactory"/>
 </updateRequestProcessorChain>

 This prevents the storing of very similar queries, but what I'm really
 trying to accomplish is to increment a count (popularity) field when the
 same query is sent to Solr.

 Any thoughts on this?

 Greetings!



Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Is there any similar approach that I could use in Solr 3.6.1, or should I add
this logic to my application?

- Original Message -
From: Upayavira u...@odoko.co.uk
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 12:37:11
Subject: Re: Dedup component

Nope, it is a Solr 4.0 thing. In order for it to work, you need to store
every field, because what it does behind the scenes is retrieve the stored
fields, rebuild the document, and then post the whole document back.

Upayavira



Re: Solr PHP client

2012-12-14 Thread Jorge Luis Betancourt Gonzalez
Hi Guillaume:

I beg to differ. It's true that Solr's native support for many formats has 
been a big aid to developers using Solr from many programming languages. But 
building all the queries by hand is not wise and in any case is hard to 
maintain; it's easier to use an OO library to interact with Solr. For 
instance, I'm currently using Solarium to interact with Solr 3.6.1 within a 
Symfony2 app. In this scenario Solarium handles all the interaction with the 
Solr server: I work with classes in my code, and underneath Solarium talks 
JSON to the Solr server. My point is that Solr's ability to speak many 
standard formats is a huge plus, but having a library that handles the heavy 
lifting with the server keeps your code clean.
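
To make the "by hand" side of the argument concrete, here is a rough sketch of what building a request manually involves. It is written in Python only for illustration (the language is incidental); the function name and the base URL are made up, while `q`, `rows` and `wt=json` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, **extra):
    """Assemble a /select URL by hand, asking Solr for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    params.update(extra)  # e.g. defType="dismax", fl="id,title"
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "java php"))
```

Every new feature (facets, highlighting, boosts) adds more of this string assembly, plus response parsing and error handling, which is the part a client library takes over.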

Greetings,

- Original Message -
From: Guillaume Rossolini guillaume.rossol...@instantluxe.com
To: solr-user@lucene.apache.org
Sent: Friday, December 14, 2012 3:22:41
Subject: Re: Solr PHP client

Hi,

The various Solr PHP clients have been a great help in the past, and I do
not mean to belittle their efforts.
However, the Solr project has made many efforts to support several input
and output data formats, including JSON and even serialized PHP, which are
fairly easy to implement. Maybe I am mistaken, but I am not sure any PHP
client (as an extension or as a library) would actually help much any more.

Regards,

--
I N S T A N T  |  L U X E - 40 Rue D'Aboukir - 75002 Paris - France



On Fri, Dec 14, 2012 at 8:23 AM, Romita Saha
romita.s...@sg.panasonic.com wrote:

 Hi,

 Can anyone please guide me in using SolrPhpClient? The available
 documentation is not clear about where to place SolrPhpClient.

 I have downloaded SolrPhpClient and have changed the following lines,
 specifying the path (where the files are present in my computer)


 require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Document.php');

 require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Response.php');

 After this I am unable to proceed. What should I index now, and how? How
 should I start Solr? Where should I place the conf files? I see there are
 a few HTML documents inside the folder SolrPhpClient/phpdocs.

 Could someone please help.

 Thanks and regards,
 Romita




Prevent indexing documents with some terms

2012-12-07 Thread Jorge Luis Betancourt Gonzalez
Hi:

Is there any way that I can prevent a document from being indexed? I have a 
separate core only for query suggestions. These queries are stored right from 
the frontend app, so I'm trying to prevent certain ill-intentioned queries 
from being stored in that core, while keeping the logic of what I consider 
ill-intentioned out of the frontend application. Stop words only prevent some 
words from being stored in the index; is there any way of preventing the whole 
document from being stored?
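
A minimal sketch of the kind of check I have in mind, written in Python purely as an illustration. The blocked-term list and the function name are hypothetical; the same predicate could live in a small service between the frontend and Solr, or be ported to a custom update processor (not shown here).

```python
import re

# Hypothetical block list; in practice this would be loaded from a file.
BLOCKED_TERMS = {"spamword", "badword"}


def should_store(query, blocked=BLOCKED_TERMS):
    """Return False when any token of the query is in the block list."""
    tokens = re.findall(r"\w+", query.lower())
    return not any(token in blocked for token in tokens)


if __name__ == "__main__":
    for q in ["java tutorial", "Buy SPAMWORD now"]:
        print(q, "->", "store" if should_store(q) else "drop")
```

The point is that the whole document is dropped when a single token matches, which is different from what stop words do.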

Greetings!


Re: PHP client

2012-12-07 Thread Jorge Luis Betancourt Gonzalez
Any news on the Solarium project? It's the one I'm using with Solr 3.6!

- Original Message -
From: Bill Au bill.w...@gmail.com
To: solr-user@lucene.apache.org, Arkadi Colson ark...@smartbit.be
Sent: Friday, December 7, 2012 13:40:20
Subject: Re: PHP client

I have not used the pecl Solr client.  I have been using SolrPhpClient.  I
came across this patch for pecl when I was researching php client for Solr
4.0.  SolrPhpClient has the same problem with 4.0 that this patch addresses.

Bill


On Fri, Dec 7, 2012 at 11:00 AM, Arkadi Colson ark...@smartbit.be wrote:

 Thanks for the info!

 Do you know if it's possible to use file uploads to Tika with this client?


 On 12/03/2012 03:56 PM, Bill Au wrote:

 https://bugs.php.net/bug.php?id=62332

 There is a fork with patches applied.


 On Mon, Dec 3, 2012 at 9:38 AM, Arkadi Colson ark...@smartbit.be wrote:

  Hi

 Has anyone tested the pecl Solr client in combination with SolrCloud? It
 seems to be broken since 4.0.

 Best regards
 Arkadi





 --
 Met vriendelijke groeten

 Arkadi Colson

 Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
 T +32 11 64 08 80 . F +32 11 64 08 81






Re: News clustering

2012-12-03 Thread Jorge Luis Betancourt Gonzalez
I'm trying to use Solr to search through news websites, but I'm interested in 
classification at index time. Is there any available solution for this?

Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski stanis...@osinski.name wrote:

 I mean measuring the similarity between the document in each cluster.
 Also, difference between document on one cluster with another cluster.
 
 I saw the sample code ClusteringQualityBencmark.java
 However, I do not know how to make use of it for assessing my Solr
 Clustering performance.
 
 
 You'd need to write your own code for this, here are the most common
 clustering quality measures you mentioned:
 
 http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
 
 These are meant for the general case (numeric attributes), to apply them to
 texts, you'd need to use the vector representation of the documents.
 
 On a more general note, synthetic measures test only the document-cluster
 assignments, but none take the quality of labels into account (this is
 really hard to measure objectively).
 
 Staszek
 
 


Re: Suggester with punctuation signs

2012-11-27 Thread Jorge Luis Betancourt Gonzalez
Hi Upayavira:

I'm using the standard tokenizer right now and it's working fine, but I was 
wondering if there is any way to strip these punctuation marks right in the 
suggest requestHandler, so there is no need to index again. I've been doing 
some tests, and increasing the threshold has improved the accuracy of the 
suggestions. One more thing: the suggestions are mainly in Spanish, so is 
there any best-practice configuration for this, or will any standard 
configuration do the trick?
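
As a stopgap while the index stays as it is, the punctuation could be stripped on the client side after the suggestions come back. The following is only a sketch in Python (my application is PHP, so take it as pseudocode for the idea); the function name is made up, and it does not fix the underlying terms in the index.

```python
import string

# ASCII punctuation plus the Spanish opening marks.
PUNCTUATION = string.punctuation + "¿¡"


def clean_suggestions(suggestions):
    """Strip leading/trailing punctuation and drop the duplicates that result."""
    seen = set()
    cleaned = []
    for term in suggestions:
        stripped = term.strip(PUNCTUATION)
        if stripped and stripped not in seen:
            seen.add(stripped)
            cleaned.append(stripped)
    return cleaned


if __name__ == "__main__":
    print(clean_suggestions(["universidad,", "universidad", "¿universo?"]))
```

Note that "universidad," and "universidad" collapse into one entry, so fewer suggestions may be shown than were requested.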

Thanks!

On Nov 26, 2012, at 6:18 PM, Upayavira u...@odoko.co.uk wrote:

 You may want to change your tokenisation anyhow, as a search for
 'universidad' will not match your term 'universidad,'
 
 But you are on the right track - to improve suggestions, improve what is
 in your index.
 
 Upayavira
 
 On Mon, Nov 26, 2012, at 07:54 PM, Jorge Luis Betancourt Gonzalez wrote:
 Hi:
 
 I've configured my Solr setup to use the suggester component and to get
 term suggestions from a PHP application. The thing is that I'm getting
 results like "universidad," (note the trailing punctuation sign). Is there
 any way I can get rid of this? Or do I need to create a separate field and
 strip all punctuation signs?
 
 Greetings
 


Suggester with punctuation signs

2012-11-26 Thread Jorge Luis Betancourt Gonzalez
Hi:

I've configured my Solr setup to use the suggester component and to get term 
suggestions from a PHP application. The thing is that I'm getting results like 
"universidad," (note the trailing punctuation sign). Is there any way I can 
get rid of this? Or do I need to create a separate field and strip all 
punctuation signs?

Greetings




Re: php client for Solr 4.0.0

2012-11-12 Thread Jorge Luis Betancourt Gonzalez
I'm currently using solarium with solr 3.6, perhaps you can tweak solarium as 
needed? I suppose that pull requests are welcome into solarium for solr 4.

Greetings!

On Nov 12, 2012, at 2:56 PM, Bill Au bill.w...@gmail.com wrote:

 Anyone know of a PHP client that is compatible with Solr 4.0.0?  I am using
 an old PHP client that is trying to set the waitFlush parameter on a commit
 so it is failing.
 
 Bill
 
 


Re: is it possible to save the search query?

2012-11-08 Thread Jorge Luis Betancourt Gonzalez
I think that Solr by itself doesn't store the queries (correct me if I'm 
wrong about this), but you can accomplish what you want by processing the Solr 
log (it's the only way, I think). From the Solr log you can extract the 
queries, process them according to your needs, and then change the boost 
parameters in your app or Solr config.
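
A rough sketch of the log-processing idea in Python. It assumes request log lines that look like `... path=/select params={q=java+php&rows=10} hits=42 status=0 QTime=3`, with the parameters URL-encoded; check this against your own log before relying on it, since the exact layout depends on the Solr version and logging setup.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed line layout: path=/select params={...} hits=N ...
PARAMS_RE = re.compile(r"path=/select\s+params=\{(.*?)\}\s+hits=")


def extract_queries(log_lines):
    """Yield the normalized `q` parameter of every /select request line."""
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue  # not a search request (e.g. an update or a commit)
        for q in parse_qs(match.group(1)).get("q", []):
            yield q.strip().lower()


def count_queries(log_lines):
    """Count how often each query string appears in the log."""
    return Counter(extract_queries(log_lines))


if __name__ == "__main__":
    sample = [
        "INFO: [] webapp=/solr path=/select params={q=java+php&rows=10} hits=42 status=0 QTime=3",
        "INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=12",
    ]
    print(count_queries(sample).most_common())
```

The other parameters (`qf`, `bq` and so on) are available from the same `parse_qs` result if the boosts themselves need to be compared.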

On Nov 8, 2012, at 11:32 AM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi,
 
 Aha, I think I understand.  Yes, you could collect all doc IDs from each
 query and find the differences.  There is nothing in Solr that can find
 those differences or that would store doc IDs of returned hits in the first
 place, so you would have to implement this yourself.  Sematext's Search
 Analytics service may be of help here in the sense that all data you
 need (queries, doc IDs, etc.) are collected, so it would be a matter of
 providing an API to get the data for off-line analysis.  But this data
 collection+diffing is also something you could implement yourself.  One
 thing to think about: what do you do when a query returns a large
 number of hits?  Do you really want/need to get IDs for all of them, or
 only a page at a time?
 
 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html
 
 
 On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha 
 romita.s...@sg.panasonic.com wrote:
 
 Hi,
 
 The following is the example:
 1st query:
 
 http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id
 
 Next query:
 
 http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id
 
 In the 1st query the field 'data' is boosted by 2. However, maybe the
 user was not satisfied with the response. Thus in the next query he
 boosted the field 'id' by 2.
 
 I want to record both the queries and compare between the two, meaning,
 what are the changes implemented on the 2nd query which are not present in
 the previous one.
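
For the comparison itself, one possible approach is to parse both request URLs and diff their parameters outside Solr. The sketch below is in Python and is only an illustration; the function names are made up, and the example URLs assume the parameters are separated by `&` with the space in `qf` encoded as `+`.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the request parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


def diff_queries(previous_url, current_url):
    """Map each changed parameter to its (previous, current) values."""
    before, after = query_params(previous_url), query_params(current_url)
    changes = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changes[name] = (before.get(name), after.get(name))
    return changes


if __name__ == "__main__":
    base = "http://localhost:8983/solr/db/select/?defType=dismax&q=cashier2"
    print(diff_queries(base + "&qf=data^2+id", base + "&qf=data+id^2"))
```

A parameter that exists in only one of the two requests shows up with `None` on the other side.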
 
 Thanks and regards,
 Romita Saha
 
 
 
 From:   Otis Gospodnetic otis.gospodne...@gmail.com
 To: solr-user@lucene.apache.org,
 Date:   11/08/2012 01:35 PM
 Subject:Re: is it possible to save the search query?
 
 
 
 Hi,
 
 Compare in what sense?  An example will help.
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com
 wrote:
 
 Hi All,
 
 Is it possible to record a search query in solr and then compare it with
 the previous search query?
 
 Thanks and regards,
 Romita Saha
 
 
 
 
 


Re: Storing queries in Solr

2012-10-08 Thread Jorge Luis Betancourt Gonzalez
Thanks for the quick response. I'm trying to build query suggestions. I find 
it odd that, this being a very common need, Solr doesn't provide any built-in 
mechanism for query suggestions, but implementing the other components isn't 
so hard either.

Greetings!

On Oct 8, 2012, at 3:38 AM, Upayavira wrote:

 Solr has a small query cache, but this does not hold queries for any
 length of time, so won't suit your purpose.
 
 The LucidWorks Search product has (I believe) a click tracking feature,
 but that is about boosting documents that are clicked on, not specific
 search terms. Parsing the Solr log, or pushing query terms to a
 different core/index would really be the only way to achieve what you're
 suggesting, as far as I am aware.
 
 Processing logs would be preferable anyhow, as you don't really want to
 be triggering an index write during each query (assuming you have more
 queries than updates to your main index), and also if this is for
 building a suggester index, then it is unlikely to need updating that
 regularly - every hour or every day should be more than sufficient. You
 could write a SearchComponent that logs queries in another format,
 should the existing log format not be sufficient for you.
 
 Upayavira
 
 On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:
 Hi!
 
 I was wondering if there is any built-in mechanism that allows me to
 store the queries made to a Solr server inside the index itself. I know
 that the suggester module exists, but as far as I know it only works for
 terms existing in the index, and not with queries. I remember reading
 about using some external program to parse the Solr log and push the
 queries or any other interesting data into the index. Is this the only
 way to accomplish this?
 
 Greetings!


Storing queries in Solr

2012-10-07 Thread Jorge Luis Betancourt Gonzalez
Hi!

I was wondering if there is any built-in mechanism that allows me to store the 
queries made to a Solr server inside the index itself. I know that the 
suggester module exists, but as far as I know it only works for terms existing 
in the index, and not with queries. I remember reading about using some 
external program to parse the Solr log and push the queries or any other 
interesting data into the index. Is this the only way to accomplish this?

Greetings!


Re: Question about OR operator

2012-10-05 Thread Jorge Luis Betancourt Gonzalez
Thanks a lot for all the replies. Chris, it worked out with this mm value:

<str name="mm">
10%
</str>

If this version of Solr is affected by the bug you pointed out, shouldn't it 
fail with this value as well?

Greetings!

On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote:

 Hi Chris:
 
 I'm using solr 3.6.1, is the bug present in this version?
 
 Greetings!
 
 On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:
 
 
 : GRAVE: java.lang.NumberFormatException: For input string: 
 :100
 :
 : at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 : at java.lang.Integer.parseInt(Integer.java:470)
 : at java.lang.Integer.<init>(Integer.java:636)
 : at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
 
 What version of Solr are you using?
 
 That looks like a simple parsing bug that seems to have been fixed a while 
 back (it's definitely not in the 4.0 branch)
 
 can you try eliminating the whitespace from your XML configured value...
 
 <str name="mm">100</str>
 
 ...that should work around the problem.
 
 
 -Hoss
 


Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm having an issue with Solr 3.6.1, and I sense it is due to a lack of 
understanding on my part. I'm building a search engine, using Solr of course 
to store the inverted index; so far so good. When I search for a term, let's 
say java, I get 761 results; querying the index for the term php gives me 3194 
results. So if I do a query for java php (without any quotes), I suppose that 
Solr will interpret this as an OR between the two terms, correct? So the 
results should be the union of the two result sets? Can anyone explain why I 
get fewer results for the last query, java php without any quotes?

Thanks in advance!!


Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi:

Thanks for all the replies. Right now I have this in my mm parameter:

<str name="mm">
2<-1 5<-2 6<90%
</str>

I'm trying to get a straight OR between all the terms in my query. Should I 
set the mm parameter to 1? I tried that and it gave an error.
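
To check my understanding of how dismax turns an mm value into a number of required clauses, here is a simplified Python sketch of the calculation. It is my own approximation of the documented behaviour, not Solr's code, and it assumes the configured value is the stock `2<-1 5<-2 6<90%` from the example solrconfig.xml, written without spaces around `<`.

```python
def min_should_match(optional_clauses, spec):
    """Approximate how many optional clauses an mm spec requires to match."""
    result = optional_clauses
    spec = spec.strip()
    if "<" in spec:
        # Conditional form: "N<value" applies only when there are MORE than
        # N clauses; with N or fewer, the requirement computed so far holds.
        for part in spec.split():
            upper_bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(upper_bound):
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result
    if spec.endswith("%"):
        calc = int(optional_clauses * int(spec[:-1]) / 100)  # rounds toward zero
    else:
        calc = int(spec)
    # Negative values mean "all but this many".
    result = optional_clauses + calc if calc < 0 else calc
    return max(0, min(optional_clauses, result))


if __name__ == "__main__":
    for n in (1, 2, 3, 7):
        print(n, "clauses ->", min_should_match(n, "2<-1 5<-2 6<90%"))
    print("mm=0 with 2 clauses ->", min_should_match(2, "0"))
```

If this is right, a two-term query such as java php requires both terms under that spec, which would explain getting fewer results than for either term alone, while an mm of 0 leaves every clause optional.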

Greetings!

On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote:

 Hi:
 
 I'm having an issue with Solr 3.6.1, and I sense it is due to a lack of 
 understanding on my part. I'm building a search engine, using Solr of course 
 to store the inverted index; so far so good. When I search for a term, let's 
 say java, I get 761 results; querying the index for the term php gives me 
 3194 results. So if I do a query for java php (without any quotes), I 
 suppose that Solr will interpret this as an OR between the two terms, 
 correct? So the results should be the union of the two result sets? Can 
 anyone explain why I get fewer results for the last query, java php without 
 any quotes?
 
 Thanks in advance!!


Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
This is the error:

GRAVE: java.lang.NumberFormatException: For input string: 
100

    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.<init>(Integer.java:636)
    at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
    at org.apache.solr.util.SolrPluginUtils.setMinShouldMatch(SolrPluginUtils.java:656)
    at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:210)
    at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:166)
    at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:77)
    at org.apache.solr.search.QParser.getQuery(QParser.java:143)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)

This is the parameter in my solrconfig.xml:

<str name="mm">
0
</str>

On Oct 4, 2012, at 1:46 PM, Otis Gospodnetic wrote:

 What's the error Jorge?
 
 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html
 
 
 On Thu, Oct 4, 2012 at 1:36 PM, Jorge Luis Betancourt Gonzalez
 jlbetanco...@uci.cu wrote:
 Hi:
 
 Thanks for all the replies. Right now I have this in my mm parameter:
 
 <str name="mm">
 2<-1 5<-2 6<90%
 </str>
 
 I'm trying to get a straight OR between all the terms in my query. Should I 
 set the mm parameter to 1? I tried that and it gave an error.
 
 Greetings!
 
 On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote:
 
 Hi:
 
 I'm having an issue with Solr 3.6.1, and I sense it is due to a lack of 
 understanding on my part. I'm building a search engine, using Solr of course 
 to store the inverted index; so far so good. When I search for a term, let's 
 say java, I get 761 results; querying the index for the term php gives me 
 3194 results. So if I do a query for java php (without any quotes), I 
 suppose that Solr will interpret this as an OR between the two terms, 
 correct? So the results should be the union of the two result sets? Can 
 anyone explain why I get fewer results for the last query, java php without 
 any quotes?
 
 Thanks in advance!!

Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Thanks for the quick response. I got the same error. What I'm trying to 
accomplish is a straight OR between all the clauses or terms in my query; the 
value I should use is 0, right?




Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi Chris:

I'm using solr 3.6.1, is the bug present in this version?

Greetings!

On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:

 
 : GRAVE: java.lang.NumberFormatException: For input string: 
 : 100
 : 
 : at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 : at java.lang.Integer.parseInt(Integer.java:470)
 : at java.lang.Integer.<init>(Integer.java:636)
 : at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
 
 What version of Solr are you using?
 
 That looks like a simple parsing bug that seems to have been fixed a while 
 back (it's definitely not in the 4.0 branch)
 
 can you try eliminating the whitespace from your XML configured value...
 
 <str name="mm">100</str>
 
 ...that should work around the problem.
 
 
 -Hoss
 