Schema / Config Error?

2012-06-06 Thread Spadez
Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able to
give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema / Config Error?

2012-06-06 Thread G.Long

Hi :)

Looks like you forgot to paste your schema.xml and the error in your 
e-mail :o


Gary

Le 06/06/2012 10:14, Spadez a écrit :

Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able to
give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi Abdul and Jack,

I got the tstamp working, but I really need to know the published date of
each page.


On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky j...@basetechnology.com wrote:

 If you uncomment the timestamp field in the Solr example, Solr will
 automatically initialize it for each new document to be the time when the
 document is indexed (or most recently indexed). Any field declared with
 default=NOW and not explicitly initialized will have the current time
 when indexed (or re-indexed.)

 -- Jack Krupansky

 -Original Message- From: in.abdul
 Sent: Friday, June 01, 2012 6:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to find the age of a page


 Shameema Umer,

 You can add one more new field in the schema. While updating or indexing,
 add the timestamp to that field.

   Thanks and Regards,
   S SYED ABDUL KATHER



 On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
 ml-node+s472066n3987234...@n3.nabble.com
 wrote:

  Hi all,

 How can I find the age of a page in Solr results? That is, the last updated
 time.
 tstamp refers to the fetch time, not the exact updated time, right?


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html



 -
 THANKS AND REGARDS,
 SYED ABDUL KATHER
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html
 Sent from the Solr - User mailing list archive at Nabble.com.
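
[Editor's note] The field Jack describes can be declared in schema.xml roughly as below. This is a sketch based on the stock Solr 3.x example schema; the field name and attributes follow that example:

```xml
<!-- With default="NOW", Solr fills this field in at index time
     whenever a document does not supply a value explicitly. -->
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```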



Issue with SolrCloud / Solr 4.0: Discrepancy in number of groups and ngroups value

2012-06-06 Thread Nitesh Nandy
We are using Solr 4.0 (svn build, 30th May 2012) with SolrCloud. While
querying, we use field collapsing with ngroups set to true. However, there
is a difference between the number of results returned and the ngroups
value.

Ex:
http://localhost:8983/solr/select?q=messagebody:monit%20AND%20usergroupid:3&group=true&group.field=id&facet.limit=20&group.ngroups=true

The values returned are like

<int name="matches">10</int>
<int name="ngroups">9</int>

Actual groups returned: 4

Why do we have this discrepancy between ngroups, matches, and the actual
number of groups?

Earlier we were using the same query with solr 3.5 (without solr cloud) and
it was giving correct results. Any kind of help is appreciated.
-- 
Regards,

Nitesh Nandy


Re: How to find the age of a page

2012-06-06 Thread in.abdul
Whenever you reindex, add the current timestamp; that will be the publish
date, and from there you can calculate the age.
Thanks and Regards,
S SYED ABDUL KATHER
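
[Editor's note] The age calculation being discussed can be sketched in plain Java (java.time requires Java 8+; the timestamp values below are hypothetical, not from the thread):

```java
import java.time.Duration;
import java.time.Instant;

public class PageAge {
    // Age of a page = "now" minus the timestamp stored at index time.
    static long ageInDays(Instant tstamp, Instant now) {
        return Duration.between(tstamp, now).toDays();
    }

    public static void main(String[] args) {
        // Hypothetical values: page indexed 2012-06-01, "now" 2012-06-06.
        Instant tstamp = Instant.parse("2012-06-01T10:00:00Z");
        Instant now = Instant.parse("2012-06-06T10:00:00Z");
        System.out.println(ageInDays(tstamp, now)); // prints 5
    }
}
```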



On Wed, Jun 6, 2012 at 2:16 PM, Shameema Umer [via Lucene] 
ml-node+s472066n3987930...@n3.nabble.com wrote:

 Hi abdul and Jack,

 i got the tstamp working but I really need to know the published date of
 each page.


  On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky [hidden email] wrote:


  If you uncomment the timestamp field in the Solr example, Solr will
  automatically initialize it for each new document to be the time when
 the
  document is indexed (or most recently indexed). Any field declared with
  default=NOW and not explicitly initialized will have the current time
  when indexed (or re-indexed.)
 
  -- Jack Krupansky
 
  -Original Message- From: in.abdul
  Sent: Friday, June 01, 2012 6:55 AM
  To: [hidden email]
  Subject: Re: How to find the age of a page
 
 
  Shameema Umer,
 
  you can add another one new field in schema ..  while updating or
 indexing
  add the time stamp to that current field ..
 
Thanks and Regards,
S SYED ABDUL KATHER
 
 
 
  On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
  ml-node+s472066n3987234...@n3.nabble.com
  wrote:
 
   Hi all,
 
  How can i find the age of a page solr results? that is the last updated
  time.
  tstamp refers to the fetch time, not the exact updated time, right?
 
 
  --
   If you reply to this email, your message will be added to the
 discussion
  below:
 
  http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html
 
 
 
  -
  THANKS AND REGARDS,
  SYED ABDUL KATHER
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html

  Sent from the Solr - User mailing list archive at Nabble.com.
 


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987930.html



-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987942.html
Sent from the Solr - User mailing list archive at Nabble.com.

issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Markus Jelsma
Hi,

We've had some issues with a bad zero-hits collation being returned for a
two-word query where one word was only one edit away from the required
collation. With spellcheck.maxCollations set to a reasonable number, we saw
the various suggestions without the required collation. We decreased
thresholdTokenFrequency to make it appear in the list of collations. However,
with collateExtendedResults=true, the hits field for each collation was zero,
which is incorrect.

Required collation: "huub stapel" (two hits); query: q=huup stapel

  "collation":{
    "collationQuery":"heup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"heup"}},
  "collation":{
    "collationQuery":"hugo stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hugo"}},
  "collation":{
    "collationQuery":"hulp stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hulp"}},
  "collation":{
    "collationQuery":"hup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hup"}},
  "collation":{
    "collationQuery":"huub stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huub"}},
  "collation":{
    "collationQuery":"huur stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huur"}}

Now, with maxCollationTries set to 3 or higher, we finally get the required 
collation, which is the only collation able to return results. How can we 
determine the best value for maxCollationTries given the decrease of 
thresholdTokenFrequency? And why is hits always zero?

This is with today's build and distributed search enabled.

Thanks,
Markus
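
[Editor's note] For reference, the spellcheck parameters involved are sketched below; the parameter names come from the Solr spellcheck component, but the values are illustrative assumptions, not a recommendation. Note that when maxCollationTries is 0 (the default), collations are not tested against the index at all, which is one plausible reason for hits of 0 in the extended results:

```
spellcheck=true
spellcheck.collate=true
spellcheck.maxCollations=10              # how many collations to return
spellcheck.maxCollationTries=10          # candidate collations verified against the index
spellcheck.collateExtendedResults=true   # include per-collation hits and corrections
```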


Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi Syed Abdul,
I am sorry to ask this basic question, as I am new to Nutch and Solr (and
even new to Java applications). Can you tell me how to add tstamp to the
published date after re-indexing? Is an update query enough?

Also, I am not able to get the field *publishedDate* in my query results to
check whether it is working properly.

Thanks
Shameema


Re: Schema / Config Error?

2012-06-06 Thread Erick Erickson
That implies one of two things:
1) You changed solr.xml. I'd go back to the original and re-edit
anything you've changed.
2) You somehow got a corrupted download. Try blowing your installation
away and getting a new copy.

Because it works perfectly for me.

Best
Erick

On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:
 Hi,

 I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
 page when I try to access Solr:

 http://176.58.103.78:8080/solr/

 It shows errors relating to my solr.xml. This is my solr.xml:



 I really can't figure out how I am meant to fix this, so if anyone is able to
 give some input I would really appreciate it.

 James

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Erick Erickson
Sorry, but your post is really hard to read with all the data inline.

Try running with debugQuery=on and looking at the parsed query, I suspect
your field lists aren't the same even though you think they are.
Perhaps a typo somewhere?

Best
Erick
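
[Editor's note] The comparison Erick suggests can be sketched like this: append debugQuery=on to both requests (with and without qf) and diff the parsed-query entries in the debug section of each response. Host, port, and query below are placeholders:

```
http://localhost:8983/solr/select?q=apartamento+moema&defType=dismax&debugQuery=on

In each response, compare:
<lst name="debug">
  <str name="parsedquery">...</str>
  <str name="parsedquery_toString">...</str>
</lst>
```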

On Mon, Jun 4, 2012 at 1:26 PM, André Maldonado
andre.maldon...@gmail.com wrote:
 I'm doing a query with edismax.

 When I don't tell Solr which fields to search (so it searches the
 default field), it returns 2752 documents.

 ex:
 http://000.000.0.0:/solr/select/?q=apartamento+moema+praia+churrasqueira&version=2.2&start=0&rows=10&indent=on&defType=dismax&mm=75%25

 The same search, specifying the fields that compose the default field,
 returns 1434 docs.

 ex:
 http://000.000.0.0:/solr/select/?q=apartamento+moema+praia+churrasqueira&version=2.2&start=0&rows=10&indent=on&defType=dismax&mm=75%25&qf=agrupamentos+agrupamentos2+bairro+campanhalocalempreendimento+caracteristicas+caracteristicacomum+categoria+cep+chamada+cidade+codigoanuncio+complemento+descricaopermuta+docid+empreendimento+endereco+estado+informacoescomplementares+conteudoobservacao+sigla+subtipoimovel+tipoimovel+transacao+zapid+caminhomapa+codigooferta+segmento+anuncianteorigem+zapidcorporativo+estagiodaobra+condicoescomerciais+nomejornal+nomejornalordem+textomanual

 This is the important part of schema:

 <defaultSearchField>textoboost</defaultSearchField>
 <copyField source="agrupamentos2" dest="textoboost"/>
 <copyField source="agrupamentos" dest="textoboost"/>
 <copyField source="bairro" dest="textoboost"/>
 <copyField source="campanhalocalempreendimento" dest="textoboost"/>
 <copyField source="caracteristicas" dest="textoboost"/>
 <copyField source="caracteristicacomum" dest="textoboost"/>
 <copyField source="categoria" dest="textoboost"/>
 <copyField source="cep" dest="textoboost"/>
 <copyField source="chamada" dest="textoboost"/>
 <copyField source="cidade" dest="textoboost"/>
 <copyField source="codigoanuncio" dest="textoboost"/>
 <copyField source="complemento" dest="textoboost"/>
 <copyField source="descricaopermuta" dest="textoboost"/>
 <copyField source="docid" dest="textoboost"/>
 <copyField source="empreendimento" dest="textoboost"/>
 <copyField source="endereco" dest="textoboost"/>
 <copyField source="estado" dest="textoboost"/>
 <copyField source="informacoescomplementares" dest="textoboost"/>
 <copyField source="conteudoobservacao" dest="textoboost"/>
 <copyField source="sigla" dest="textoboost"/>
 <copyField source="subtipoimovel" dest="textoboost"/>
 <copyField source="tipoimovel" dest="textoboost"/>
 <copyField source="transacao" dest="textoboost"/>
 <copyField source="zapid" dest="textoboost"/>
 <copyField source="caminhomapa" dest="textoboost"/>
 <copyField source="codigooferta" dest="textoboost"/>
 <copyField source="segmento" dest="textoboost"/>
 <copyField source="anuncianteorigem" dest="textoboost"/>
 <copyField source="zapidcorporativo" dest="textoboost"/>
 <copyField source="estagiodaobra" dest="textoboost"/>
 <copyField source="condicoescomerciais" dest="textoboost"/>
 <copyField source="nomejornal" dest="textoboost"/>
 <copyField source="nomejornalordem" dest="textoboost"/>
 <copyField source="textomanual" dest="textoboost"/>

 What's the problem?

 Thanks

 *"E conhecereis a verdade, e a verdade vos libertará." (João 8:32)*
 ("And you shall know the truth, and the truth shall set you free." John 8:32)

  *andre.maldonado*@gmail.com
  (11) 9112-4227

 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
 http://www.facebook.com/profile.php?id=10659376883
 http://twitter.com/andremaldonado
 http://www.delicious.com/andre.maldonado

Re: ReadTimeout on commit

2012-06-06 Thread Erick Erickson
You're probably hitting a background merge and the request is timing
out even though the commit succeeds. Try querying for the data in
the last packet to test this.

And you don't say what version of Solr you're using.

One test you can do is increase the number of documents before
a commit. If merging is the problem I'd expect you to _still_ encounter
this problem, just much less often. That would at least tell you if this
is the right path to investigate.

Best
Erick

On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:
 Hi,

 I'm indexing documents in batches of 100 docs. Then commit.

 Sometimes I get this exception:

 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketTimeoutException: Read timed out
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)


 I found some similar postings on the web, all recommending autocommit. This
 is unfortunately not an option for me, because I have to know whether Solr
 committed or not.

 What is causing this timeout?

 I'm using these settings in solrj:

        server.setSoTimeout(1000);
          server.setConnectionTimeout(100);
          server.setDefaultMaxConnectionsPerHost(100);
          server.setMaxTotalConnections(100);
          server.setFollowRedirects(false);
          server.setAllowCompression(true);
          server.setMaxRetries(1);

 Thank you



Re: Schema / Config Error?

2012-06-06 Thread Shameema Umer
Make sure your port is 8983 or 8080.

On Wed, Jun 6, 2012 at 4:27 PM, Erick Erickson erickerick...@gmail.comwrote:

 That implies one of two things:
 1) You changed solr.xml. I'd go back to the original and re-edit
 anything you've changed.
 2) You somehow got a corrupted download. Try blowing your installation
 away and getting a new copy.

 Because it works perfectly for me.

 Best
 Erick

 On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:
  Hi,
 
  I installed a fresh copy of Solr 3.6.0 on my server, but I get the
 following
  page when I try to access Solr:
 
  http://176.58.103.78:8080/solr/
 
  It shows errors relating to my solr.xml. This is my solr.xml:
 
 
 
  I really can't figure out how I am meant to fix this, so if anyone is
 able to
  give some input I would really appreciate it.
 
  James
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
  Sent from the Solr - User mailing list archive at Nabble.com.



Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Step 1: Verify that publishedDate is in fact the field name that Nutch 
uses for the published date.


Step 2: Make sure that Nutch is passing the date in the format 
YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a 
question for this Solr mailing list. My (very limited) understanding is that 
there was a Nutch plugin that worked for the old version of Nutch but that 
it was not updated for the new version of Nutch.
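
[Editor's note] Producing a date string in that format can be sketched in plain Java as below; the epoch value is a hypothetical published date, not from the thread:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // Solr date fields expect UTC in the form yyyy-MM-dd'T'HH:mm:ss'Z'.
    static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // Hypothetical published date: 2012-06-06 00:00:00 UTC as epoch millis.
        System.out.println(toSolrDate(new Date(1338940800000L))); // prints 2012-06-06T00:00:00Z
    }
}
```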


Step 3: Have you added the field publishedDate to your Solr schema with 
a field type of date or tdate?


If you can't figure out how to fix the problem on the Nutch side of the 
fence, then you will have to do a custom update processor for Solr. Solr 4.x 
has some new tools that should make that easier.


See:
https://issues.apache.org/jira/browse/SOLR-2802

-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 4:12 AM
To: solr-user@lucene.apache.org
Subject: sort by publishedDate and get published Date in solr query results

Hi,
Please help me sort by publishedDate and get publishedDate in Solr query
results. Do I need to install anything (a plugin)?

Thanks
Shameema 



Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky
My misunderstanding. I thought you were publishing to Solr and wanted the 
date when that occurred (indexing time).


-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 4:45 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page

Hi Abdul and Jack,

I got the tstamp working, but I really need to know the published date of
each page.


On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky 
j...@basetechnology.com wrote:



If you uncomment the timestamp field in the Solr example, Solr will
automatically initialize it for each new document to be the time when the
document is indexed (or most recently indexed). Any field declared with
default=NOW and not explicitly initialized will have the current time
when indexed (or re-indexed.)

-- Jack Krupansky

-Original Message- From: in.abdul
Sent: Friday, June 01, 2012 6:55 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page


Shameema Umer,

You can add one more new field in the schema. While updating or indexing,
add the timestamp to that field.

  Thanks and Regards,
  S SYED ABDUL KATHER



On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
ml-node+s472066n3987234...@n3.nabble.com
wrote:

 Hi all,


How can I find the age of a page in Solr results? That is, the last updated
time.
tstamp refers to the fetch time, not the exact updated time, right?


--
 If you reply to this email, your message will be added to the discussion
below:

http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html





-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky

See the reply on the other email thread you started.

-- Jack Krupansky

-Original Message- 
From: Shameema Umer 
Sent: Wednesday, June 06, 2012 6:28 AM 
To: solr-user@lucene.apache.org 
Subject: Re: How to find the age of a page 


Hi Syed Abdul,
I am sorry to ask this basic question, as I am new to Nutch and Solr (and
even new to Java applications). Can you tell me how to add tstamp to the
published date after re-indexing? Is an update query enough?

Also, I am not able to get the field *publishedDate* in my query results to
check whether it is working properly.

Thanks
Shameema


Re: Schema / Config Error?

2012-06-06 Thread Jack Krupansky
Read CHANGES.txt carefully, especially the section entitled "Upgrading from 
Solr 3.5". For example:


* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of 
solrconfig.xml are deprecated
 and replaced with a new <indexConfig> section. Read more in SOLR-1052 
below.
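
[Editor's note] A sketch of the replacement section in a 3.6 solrconfig.xml; the child elements and values shown are illustrative assumptions, and SOLR-1052 has the full list:

```xml
<!-- Solr 3.6+: one section replaces <indexDefaults> and <mainIndex>. -->
<indexConfig>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <lockType>native</lockType>
</indexConfig>
```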


If you simply copied your schema/config directly, unchanged, then this could 
be the problem.


You may need to compare your schema/config line-by-line to the new 3.6 
schema/config for any differences.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, June 06, 2012 6:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema / Config Error?

That implies one of two things:
1) You changed solr.xml. I'd go back to the original and re-edit
anything you've changed.
2) You somehow got a corrupted download. Try blowing your installation
away and getting a new copy.

Because it works perfectly for me.

Best
Erick

On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:

Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the 
following

page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able 
to

give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
Versions: Nutch: 1.4 and Solr: 3.4

My schema file contains
<!-- fields for feed plugin (tag is also used by microformats-reltag) -->
<field name="author" type="string" stored="true" indexed="true"/>
<field name="tag" type="string" stored="true" indexed="true"
multiValued="true"/>
<field name="feed" type="string" stored="true" indexed="true"/>
<field name="publishedDate" type="date" stored="true"
indexed="true"/>
<field name="updatedDate" type="date" stored="true"
indexed="true"/>


But I do not know whether this feed plugin is working or not, as I am new to
Nutch and Solr.
Here is my query:
http://localhost:8983/solr/select/?q=title:'.$v.'
content:'.$v.'&sort=publishedDate desc&fl=tilte content url
publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate in the results.

Should I post this on the Nutch users mailing list?

Thanks.


On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote:

 Step 1: Verify that publishedDate is in fact the field name that Nutch
 uses for the published date.

 Step 2: Make sure that Nutch is passing the date in the format
 YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a
 question for this Solr mailing list. My (very limited) understanding is
 that there was a Nutch plugin that worked for the old version of Nutch but
 that it was not updated for the new version of Nutch.

 Step 3: Have you added the field publishedDate to your Solr schema with
 a field type of date or tdate?

 If you can't figure out how to fix the problem on the Nutch side of the
 fence, then you will have to do a custom update processor for Solr. Solr
 4.x has some new tools that should make that easier.

 See:
 https://issues.apache.org/jira/browse/SOLR-2802

 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 4:12 AM
 To: solr-user@lucene.apache.org
 Subject: sort by publishedDate and get published Date in solr query results


 Hi,
 Please help me sort by publishedDate and get publishedDate in solr query
 results. Do i need to install anything(plugin).

 Thanks
 Shameema



Re: ReadTimeout on commit

2012-06-06 Thread Jack Krupansky
As Erick says, you are probably hitting an occasional automatic background 
merge, which takes a bit longer. That is not an indication of a problem. 
Increase your connection timeout. Check the log to see how long the merge or 
slow commit takes. You have a timeout of 1000, which is 1 second. Make it 
longer, and possibly put the commit or other indexing operations in a loop 
with a few retries before considering a connection timeout a fatal error. 
Occasional delays are a fact of life in a multi-process, networked 
environment.
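
[Editor's note] That retry loop can be sketched in plain Java as below. The Callable passed in stands in for SolrJ's server.commit(), which is not included here; the retry counts and sleep are placeholder values:

```java
import java.util.concurrent.Callable;

public class RetryingCommit {
    // Retry a commit-like operation a few times before treating the
    // failure as fatal, sleeping briefly between attempts.
    static <T> T withRetries(Callable<T> op, int maxRetries, long sleepMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {   // e.g. SocketTimeoutException
                last = e;
                Thread.sleep(sleepMs);
            }
        }
        throw last;                   // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for server.commit(): fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("Read timed out");
            return "committed";
        }, 5, 10);
        System.out.println(result); // prints "committed"
    }
}
```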


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, June 06, 2012 7:02 AM
To: solr-user@lucene.apache.org
Subject: Re: ReadTimeout on commit

You're probably hitting a background merge and the request is timing
out even though the commit succeeds. Try querying for the data in
the last packet to test this.

And you don't say what version of Solr you're using.

One test you can do is increase the number of documents before
a commit. If merging is the problem I'd expect you to _still_ encounter
this problem, just much less often. That would at least tell you if this
is the right path to investigate.

Best
Erick

On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:

Hi,

I'm indexing documents in batches of 100 docs. Then commit.

Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketTimeoutException: Read timed out
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)


I found some similar postings on the web, all recommending autocommit. This
is unfortunately not an option for me, because I have to know whether Solr
committed or not.

What is causing this timeout?

I'm using these settings in solrj:

   server.setSoTimeout(1000);
   server.setConnectionTimeout(100);
   server.setDefaultMaxConnectionsPerHost(100);
   server.setMaxTotalConnections(100);
   server.setFollowRedirects(false);
   server.setAllowCompression(true);
   server.setMaxRetries(1);

Thank you





Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Check your Solr log file to see whether errors or warnings are issued. If 
Nutch is sending bogus date values, they should produce warnings.


At this stage there are two strong possibilities:

1. Nutch is simply not sending that date field value at all.
2. Solr is rejecting the date field value because it is not in the required 
yyyy-MM-ddTHH:mm:ssZ format.


If #2, you need to go the update processor route I mentioned previously.

-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by publishedDate and get published Date in solr query 
results


Versions: Nutch: 1.4 and Solr: 3.4

My schema file contains
<!-- fields for feed plugin (tag is also used by microformats-reltag)-->
   <field name="author" type="string" stored="true" indexed="true"/>
   <field name="tag" type="string" stored="true" indexed="true"
multiValued="true"/>
   <field name="feed" type="string" stored="true" indexed="true"/>
   <field name="publishedDate" type="date" stored="true" indexed="true"/>
   <field name="updatedDate" type="date" stored="true" indexed="true"/>


But I do not know whether this feed plugin is working or not as I am new to
nutch and solr.
Here is my query
http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate on the results.

Should i post this on nutch users mailing?

Thanks.


On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky 
j...@basetechnology.comwrote:



Step 1: Verify that publishedDate is in fact the field name that Nutch
uses for published date.

Step 2: Make sure that Nutch is passing the date in the format
YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a

question for this Solr mailing list. My (very limited) understanding is
that there was a Nutch plugin that worked for the old version of Nutch but
that it was not updated for the new version of Nutch.

Step 3: Have you added the field publishedDate to your Solr schema with
field type of date or tdate?

If you can't figure out how to fix the problem on the Nutch side of the
fence, then you will have to do a custom update processor for Solr. Solr
4.x has some new tools that should make that easier.

See:
https://issues.apache.org/jira/browse/SOLR-2802

-- Jack Krupansky

-Original Message- From: Shameema Umer
Sent: Wednesday, June 06, 2012 4:12 AM
To: solr-user@lucene.apache.org
Subject: sort by publishedDate and get published Date in solr query 
results



Hi,
Please help me sort by publishedDate and get publishedDate in solr query
results. Do I need to install anything (a plugin)?

Thanks
Shameema





Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Mike Sokolov
I agree, that seems odd.  We routinely index XML using either 
HTMLStripCharFilter, or XmlCharFilter (see patch: 
https://issues.apache.org/jira/browse/SOLR-2597), both of which parse 
the XML, and we don't see such a huge  speed difference from indexing 
other field types.  XmlCharFilter also allows you to specify which 
elements to index if you don't want the whole file.


-Mike

On 6/3/2012 8:42 AM, Erick Erickson wrote:

This seems really odd. How big are these XML files? Where are you parsing them?
You could consider using a SolrJ program with a SAX-style parser.

But the first question I'd answer is "what is slow?". The implication
of your post is that
parsing the XML is the slow part; it really shouldn't be taking
anywhere near this long IMO...

Best
Erick

On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian
kristian.vantass...@siemens.com  wrote:

I'm just wondering what the general consensus is on indexing XML data to Solr 
in terms of parsing and mining the relevant data out of the file and putting 
them into Solr fields. Assume that this is the XML file and resulting Solr 
fields:

XML data:
<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, 
however, is not acceptable. 2800 objects take 11 minutes to parse and index 
into Solr.

The big slowdown appears to be that I'm parsing the data with an XML parser.

So, now I'm testing mining the data by opening the file as just a text file 
(using Groovy) and picking out relevant data using regular expression matching. 
I'm now able to parse (mine) the data and index the 2800 files in 72 seconds.

So I'm wondering if the typical solution people use is to go with a non-XML 
solution. It seems to make sense considering the search index would only want 
to store (as much data) as possible and not rely on the incoming documents 
being xml compliant.

Thanks in advance for any thoughts on this!
-Kristian
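
For what it's worth, Erick's SAX-style suggestion might look roughly like this sketch (the mydoc/title/bar names come from the sample document above; the class name is invented, and this is a sketch rather than anyone's actual implementation):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class MyDocSaxExtractor {
    // Pull id, title and bar/@attr1 out of a mydoc record without building a DOM.
    static Map<String, String> extract(String xml) {
        final Map<String, String> fields = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (qName.equals("mydoc")) {
                    fields.put("id", attrs.getValue("id"));
                } else if (qName.equals("bar")) {
                    fields.put("bar", attrs.getValue("attr1"));
                } else if (qName.equals("title")) {
                    inTitle = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (inTitle) text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("title")) {
                    fields.put("title", text.toString());
                    inTitle = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String xml = "<mydoc id=\"1234\"><title>foo</title>"
                   + "<bar attr1=\"val1\"/><baz>garbage data</baz></mydoc>";
        System.out.println(extract(xml));
    }
}
```

The extracted map can then be turned into a SolrInputDocument and sent with SolrJ.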











Re: ReadTimeout on commit

2012-06-06 Thread Mark Miller
Looks like the commit is taking longer than your set timeout.

On Jun 5, 2012, at 6:51 AM, spr...@gmx.eu spr...@gmx.eu wrote:

 Hi,
 
 I'm indexing documents in batches of 100 docs. Then commit.
 
 Sometimes I get this exception:
 
 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketTimeoutException: Read timed out
   at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
 olrServer.java:475)
   at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
 olrServer.java:249)
   at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractU
 pdateRequest.java:105)
   at
 org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)
 
 
 I found some similar postings in the web, all recommending autocommit. This
 is unfortunately not an option for me, because I have to know whether solr
 committed or not.
 
 What is causing this timeout?
 
 I'm using these settings in solrj:
 
server.setSoTimeout(1000);
 server.setConnectionTimeout(100);
 server.setDefaultMaxConnectionsPerHost(100);
 server.setMaxTotalConnections(100);
 server.setFollowRedirects(false);
 server.setAllowCompression(true);
 server.setMaxRetries(1);
 
 Thank you
 

- Mark Miller
lucidimagination.com













Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
OK Jack. Will do.

On Wed, Jun 6, 2012 at 5:29 PM, Jack Krupansky j...@basetechnology.comwrote:

 Check your Solr log file to see whether errors or warnings are issued. If
 Nutch is sending bogus date values, they should produce warnings.

 At this stage there are two strong possibilities:

 1. Nutch is simply not sending that date field value at all.
 2. Solr is rejecting the date field value because it is not in the required
 yyyy-mm-ddThh:mm:ssZ format.

 If #2, you need to go the update processor route I mentioned previously.


 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 7:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: sort by publishedDate and get published Date in solr query
 results


 Versions: Nutch: 1.4 and Solr: 3.4

 My schema file contains
 <!-- fields for feed plugin (tag is also used by microformats-reltag)-->
   <field name="author" type="string" stored="true" indexed="true"/>
   <field name="tag" type="string" stored="true" indexed="true"
 multiValued="true"/>
   <field name="feed" type="string" stored="true" indexed="true"/>
   <field name="publishedDate" type="date" stored="true" indexed="true"/>
   <field name="updatedDate" type="date" stored="true" indexed="true"/>


 But I do not know whether this feed plugin is working or not as I am new to
 nutch and solr.
 Here is my query
 http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

 But this is not returning publishedDate on the results.

 Should i post this on nutch users mailing?

 Thanks.


 On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote:

  Step 1: Verify that publishedDate is in fact the field name that Nutch
 uses for published date.

 Step 2: Make sure that Nutch is passing the date in the format
 YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a
 question for this Solr mailing list. My (very limited) understanding is
 that there was a Nutch plugin that worked for the old version of Nutch but
 that it was not updated for the new version of Nutch.

 Step 3: Have you added the field publishedDate to your Solr schema with
 field type of date or tdate?

 If you can't figure out how to fix the problem on the Nutch side of the
 fence, then you will have to do a custom update processor for Solr. Solr
 4.x has some new tools that should make that easier.

 See:
 https://issues.apache.org/jira/browse/SOLR-2802


 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 4:12 AM
 To: solr-user@lucene.apache.org
 Subject: sort by publishedDate and get published Date in solr query
 results


 Hi,
 Please help me sort by publishedDate and get publishedDate in solr query
 results. Do i need to install anything(plugin).

 Thanks
 Shameema





RE: ReadTimeout on commit

2012-06-06 Thread spring
Hi Jack, hi Erik,

thanks for the tips! It's solr 3.6

I increased the batch to 1000 docs and the timeout to 10 s. Now it works.
And I will implement the retry around the commit-call.

Thx!
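
A retry wrapper of the kind discussed could be sketched like this (generic Java, not the SolrJ API; the class name, backoff policy, and exception handling are illustrative only):

```java
import java.util.concurrent.Callable;

class RetryUtil {
    // Run an action up to maxAttempts times, sleeping backoffMs between failed
    // attempts; wraps the last failure in a RuntimeException if every attempt fails.
    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMs) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        throw new RuntimeException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulate a commit that times out twice, then succeeds on the third try.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("Read timed out");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a SolrJ client the `action` would wrap the `server.commit()` call, so a transient read timeout does not abort the indexing run.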

 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com] 
 Sent: Mittwoch, 6. Juni 2012 13:52
 To: solr-user@lucene.apache.org
 Subject: Re: ReadTimeout on commit
 
 As Erick says, you are probably hitting an occasional 
 automatic background 
 merge which takes a bit longer. That is not an indication of 
 a problem. 
 Increase your connection timeout. Check the log to see how 
 long the merge or 
 slow commit takes. You have a timeout of 1000 which is 1 
 second. Make it 
 longer, and possibly put the commit or other indexing 
 operations in a loop 
 with a few retries before considering connection timeout a 
 fatal error. 
 Occasional delays are a fact or life in a multi-process, networked 
 environment.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erick Erickson
 Sent: Wednesday, June 06, 2012 7:02 AM
 To: solr-user@lucene.apache.org
 Subject: Re: ReadTimeout on commit
 
 You're probably hitting a background merge and the request is timing
 out even though the commit succeeds. Try querying for the data in
 the last packet to test this.
 
 And you don't say what version of Solr you're using.
 
 One test you can do is increase the number of documents before
 a commit. If merging is the problem I'd expect you to _still_ 
 encounter
 this problem, just much less often. That would at least tell 
 you if this
 is the right path to investigate.
 
 Best
 Erick
 
 On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:
  Hi,
 
  I'm indexing documents in batches of 100 docs. Then commit.
 
  Sometimes I get this exception:
 
  org.apache.solr.client.solrj.SolrServerException:
  java.net.SocketTimeoutException: Read timed out
 at
  
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reques
 t(CommonsHttpS
  olrServer.java:475)
 at
  
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reques
 t(CommonsHttpS
  olrServer.java:249)
 at
  
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.pro
 cess(AbstractU
  pdateRequest.java:105)
 at
  org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)
 
 
  I found some similar postings in the web, all recommending 
 autocommit. 
  This
  is unfortunately not an option for me, because I have to 
 know whether solr
  committed or not.
 
  What is causing this timeout?
 
  I'm using these settings in solrj:
 
 server.setSoTimeout(1000);
   server.setConnectionTimeout(100);
   server.setDefaultMaxConnectionsPerHost(100);
   server.setMaxTotalConnections(100);
   server.setFollowRedirects(false);
   server.setAllowCompression(true);
   server.setMaxRetries(1);
 
  Thank you
  
 



Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Jack Krupansky
I did see a mention yesterday of a situation involving DIH and large XML 
files where it was unusually slow, but if the big XML file was broken into 
many smaller files it went really fast for the same amount of data. If that 
is the case, you don't need to parse all of the XML, just detect the 
boundaries between documents and break them into smaller XML files.
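
That boundary-detection idea could be sketched like this (a naive string scan; the mydoc tag name is just an example from the thread, and a real input file might need streaming rather than one big String):

```java
import java.util.ArrayList;
import java.util.List;

class XmlSplitter {
    // Split a string of concatenated <mydoc>...</mydoc> records into one
    // chunk per document, without fully parsing the XML.
    static List<String> split(String xml, String tag) {
        List<String> docs = new ArrayList<>();
        String open = "<" + tag;
        String close = "</" + tag + ">";
        int pos = 0;
        while (true) {
            int start = xml.indexOf(open, pos);
            if (start < 0) break;
            int end = xml.indexOf(close, start);
            if (end < 0) break;
            docs.add(xml.substring(start, end + close.length()));
            pos = end + close.length();
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<mydoc id=\"1\"><title>a</title></mydoc>"
                   + "<mydoc id=\"2\"><title>b</title></mydoc>";
        System.out.println(split(xml, "mydoc").size() + " documents");
    }
}
```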


-- Jack Krupansky

-Original Message- 
From: Mike Sokolov

Sent: Wednesday, June 06, 2012 8:02 AM
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Efficiently mining or parsing data out of XML source files

I agree, that seems odd.  We routinely index XML using either
HTMLStripCharFilter, or XmlCharFilter (see patch:
https://issues.apache.org/jira/browse/SOLR-2597), both of which parse
the XML, and we don't see such a huge  speed difference from indexing
other field types.  XmlCharFilter also allows you to specify which
elements to index if you don't want the whole file.

-Mike

On 6/3/2012 8:42 AM, Erick Erickson wrote:
This seems really odd. How big are these XML files? Where are you parsing 
them?

You could consider using a SolrJ program with a SAX-style parser.

But the first question I'd answer is "what is slow?". The implication
of your post is that
parsing the XML is the slow part; it really shouldn't be taking
anywhere near this long IMO...

Best
Erick

On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian
kristian.vantass...@siemens.com  wrote:
I'm just wondering what the general consensus is on indexing XML data to 
Solr in terms of parsing and mining the relevant data out of the file and 
putting them into Solr fields. Assume that this is the XML file and 
resulting Solr fields:


XML data:
<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this 
occurs, however, is not acceptable. 2800 objects take 11 minutes to parse 
and index into Solr.


The big slowdown appears to be that I'm parsing the data with an XML 
parser.


So, now I'm testing mining the data by opening the file as just a text 
file (using Groovy) and picking out relevant data using regular 
expression matching. I'm now able to parse (mine) the data and index the 
2800 files in 72 seconds.


So I'm wondering if the typical solution people use is to go with a 
non-XML solution. It seems to make sense considering the search index 
would only want to store (as much data) as possible and not rely on the 
incoming documents being xml compliant.


Thanks in advance for any thoughts on this!
-Kristian











Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Hi all,
I'm having a problem using the Solr ExtendedDisMax Query Parser with queries that 
contain fielded searches inside non-plain queries.

The case is the following.

If I send to SOLR an edismax request (defType=edismax) with parameters

 1. qf=field1^10
 2. q=field2:ciao
 3. debugQuery=on (for debug purposes)

solr parses the query as I expect, in fact the debug part of the response tells 
me that

 [parsedquery_toString] = +field2:ciao
But if I make the expression only a bit more complex, like putting the 
condition into brackets:
 1. qf=field1^10
 2. q=(field2:ciao)
I get

[parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)

where Solr seems not recognize the field syntax.

I've not found any mention of this behavior in the [documentation][1], where 
instead they say that

This parser supports full Lucene QueryParser syntax including boolean 
operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
fuzzy...

This problem is really annoying me because I would like to do complex boolean 
and fielded queries even with the edismax parser. 

Do you know a way to workaround this?

Thank you in advance.

Nicolò Martini


[1]: http://wiki.apache.org/solr/ExtendedDisMax

Re: Exception when optimizing index

2012-06-06 Thread Jack Krupansky
It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At 
least the exception comes from the same function.


Caused by: java.io.IOException: Invalid vInt detected (too many bits)
   at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)

What hardware and Java version are you running?

-- Jack Krupansky

-Original Message- 
From: Rok Rejc

Sent: Wednesday, June 06, 2012 3:45 AM
To: solr-user@lucene.apache.org
Subject: Exception when optimizing index

Hi all,

I have a solr installation (version 4.0 from trunk - 1st May 2012).

After I imported documents (99831145 documents) I have run the
optimization. I got an exception:

responselst name=responseHeaderint name=status500/intint
name=QTime281615/int/lstlst name=errorstr name=msgbackground
merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
[maxNumSegments=1]/strstr name=tracejava.io.IOException: background
merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
[maxNumSegments=1]
   at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475)
   at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412)
   at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385)
   at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
   at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
   at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
   at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
   at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
   at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
   at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
   at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556)
   at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.IOException: Invalid vInt detected (too many bits)
   at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508)
   at
org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85)
   at
org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65)
   at 

Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Jack Krupansky
This is a known (unfixed) bug. The workaround is to add a space between each 
left parenthesis and field name.


See:
https://issues.apache.org/jira/browse/SOLR-3377

So,

q=(field2:ciao)

becomes:

q=( field2:ciao)
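
If you would rather not hand-edit queries, the workaround can be applied mechanically on the client before sending the request. A hedged sketch (the regex and class name are illustrative and only target a '(' immediately followed by a field:value clause):

```java
class EdismaxParenWorkaround {
    // Work around SOLR-3377 by inserting a space after any '(' that directly
    // precedes a fielded clause such as field2:ciao; plain terms are untouched.
    static String fix(String q) {
        return q.replaceAll("\\((?=\\w+:)", "( ");
    }

    public static void main(String[] args) {
        System.out.println(fix("(field2:ciao)")); // ( field2:ciao)
    }
}
```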

-- Jack Krupansky

-Original Message- 
From: Nicolò Martini

Sent: Wednesday, June 06, 2012 8:35 AM
To: solr-user@lucene.apache.org
Subject: Fielded searches with Solr ExtendedDisMax Query Parser

Hi all,
I'm having a problem using the Solr ExtendedDisMax Query Parser with query 
that contains fielded searches inside not-plain queries.


The case is the following.

If I send to SOLR an edismax request (defType=edismax) with parameters

1. qf=field1^10
2. q=field2:ciao
3. debugQuery=on (for debug purposes)

solr parses the query as I expect, in fact the debug part of the response 
tells me that


[parsedquery_toString] = +field2:ciao
But if I make the expression only a bit more complex, like putting the 
condition into brackets:

1. qf=field1^10
2. q=(field2:ciao)
I get

   [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)

where Solr seems not recognize the field syntax.

I've not found any mention to this behavior in the [documentation][1], where 
instead they say that


This parser supports full Lucene QueryParser syntax including boolean 
operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
fuzzy...


This problem is really annoying me because I would like to do compelx 
boolean and fielded queries even with the edismax parser.


Do you know a way to workaround this?

Thank you in advance.

Nicolò Martini


[1]: http://wiki.apache.org/solr/ExtendedDisMax



Re: highlighter not respecting sentence boundry

2012-06-06 Thread Jack Krupansky
I don't quite understand the problem. What is an example snippet that you 
think is incorrect, and what do you think the snippet should be?


Also, try the /browse handler in the Solr example after following the Solr 
tutorial to post data. Do a search that will highlight terms similar to what 
you want. When you see that it works in /browse, try to replicate the 
settings for your own handler.


-- Jack Krupansky

-Original Message- 
From: abhayd

Sent: Tuesday, June 05, 2012 2:41 AM
To: solr-user@lucene.apache.org
Subject: Re: highlighter not respecting sentence boundry

Any help on this one?

Seems like the highlighting component does not always start the snippet at
the beginning of a sentence. I tried several options.

Has anyone successfully got this working?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighter-not-respecting-sentence-boundry-tp3984327p3987718.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread André Maldonado
Erick, thanks for your reply and sorry for the confusion in last e-mail.
But it is hard to explain the situation without that bunch of code.

In my schema I have a field called textoboost that contains copies of a lot
of other fields. Doing the query in this field I got this:

+(((textoboost:apartamento) (textoboost:moema) (textoboost:praia)
(textoboost:churrasqueira))~3)

This is correct and returns 2452 documents. When I do the same search, but
on all the fields that textoboost contains instead of on the textoboost
field itself, I get the following query (with no typos), which returns
only 1434 documents.

+(((estagiodaobra:apartamento | campanhalocalempreendimento:apartamento |
textomanual:apartamento | codigooferta:apartamento |
zapidcorporativo:apartamento | conteudoobservacao:apartamento |
categoria:apartamento | docid:apartamento | cep:apartamento |
caracteristicas:apartamento | condicoescomerciais:apartamento |
anuncianteorigem:apartamento | empreendimento:apartamento |
complemento:apartamento | caracteristicacomum:apartamento |
codigoanuncio:apartamento | nomejornal:apartamento |
agrupamentos2:apartamento | subtipoimovel:apartamento |
descricaopermuta:apartamento | zapid:apartamento | cidade:apartamento |
bairro:apartamento | transacao:apartamento | estado:apartamento |
tipoimovel:apartamento | sigla:apartamento | caminhomapa:apartamento |
chamada:apartamento | segmento:apartamento | nomejornalordem:apartamento |
agrupamentos:apartamento | endereco:apartamento |
informacoescomplementares:apartamento) (estagiodaobra:moema |
campanhalocalempreendimento:moema | textomanual:moema | codigooferta:moema
| zapidcorporativo:moema | conteudoobservacao:moema | categoria:moema |
docid:moema | cep:moema | caracteristicas:moema | condicoescomerciais:moema
| anuncianteorigem:moema | empreendimento:moema | complemento:moema |
caracteristicacomum:moema | codigoanuncio:moema | nomejornal:moema |
agrupamentos2:moema | subtipoimovel:moema | descricaopermuta:moema |
zapid:moema | cidade:moema | bairro:moema | transacao:moema | estado:moema
| tipoimovel:moema | sigla:moema | caminhomapa:moema | chamada:moema |
segmento:moema | nomejornalordem:moema | agrupamentos:moema |
endereco:moema | informacoescomplementares:moema) (estagiodaobra:praia |
campanhalocalempreendimento:praia | textomanual:praia | codigooferta:praia
| zapidcorporativo:praia | conteudoobservacao:praia | categoria:praia |
docid:praia | cep:praia | caracteristicas:praia | condicoescomerciais:praia
| anuncianteorigem:praia | empreendimento:praia | complemento:praia |
caracteristicacomum:praia | codigoanuncio:praia | nomejornal:praia |
agrupamentos2:praia | subtipoimovel:praia | descricaopermuta:praia |
zapid:praia | cidade:praia | bairro:praia | transacao:praia | estado:praia
| tipoimovel:praia | sigla:praia | caminhomapa:praia | chamada:praia |
segmento:praia | nomejornalordem:praia | agrupamentos:praia |
endereco:praia | informacoescomplementares:praia) (estagiodaobra:churrasqueira
| campanhalocalempreendimento:churrasqueira | textomanual:churrasqueira |
codigooferta:churrasqueira | zapidcorporativo:churrasqueira |
conteudoobservacao:churrasqueira | categoria:churrasqueira |
docid:churrasqueira | cep:churrasqueira | caracteristicas:churrasqueira |
condicoescomerciais:churrasqueira | anuncianteorigem:churrasqueira |
empreendimento:churrasqueira | complemento:churrasqueira |
caracteristicacomum:churrasqueira | codigoanuncio:churrasqueira |
nomejornal:churrasqueira | agrupamentos2:churrasqueira |
subtipoimovel:churrasqueira | descricaopermuta:churrasqueira |
zapid:churrasqueira | cidade:churrasqueira | bairro:churrasqueira |
transacao:churrasqueira | estado:churrasqueira | tipoimovel:churrasqueira |
sigla:churrasqueira | caminhomapa:churrasqueira | chamada:churrasqueira |
segmento:churrasqueira | nomejornalordem:churrasqueira |
agrupamentos:churrasqueira | endereco:churrasqueira |
informacoescomplementares:churrasqueira))~3)

What can be wrong?

Thanks

--
And you shall know the truth, and the truth shall set you free. (John 8:32)

 *andre.maldonado*@gmail.com andre.maldon...@gmail.com
 (11) 9112-4227

http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
http://www.facebook.com/profile.php?id=10659376883
  http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado
  https://profiles.google.com/105605760943701739931
http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
  http://www.youtube.com/andremaldonado




On Wed, Jun 6, 2012 at 7:59 AM, Erick Erickson erickerick...@gmail.comwrote:

 Sorry, but your post is really hard to read with all the data inline.

 Try running with debugQuery=on and looking at the parsed query, I suspect
 your field lists aren't the same even though you think they are.
 Perhaps a typo somewhere?

 Best
 Erick

 On Mon, Jun 4, 2012 at 1:26 PM, André 

solrj library requirements: slf4j-jdk14-1.5.5.jar

2012-06-06 Thread Welty, Richard
the section of the solrj wiki page on setting up the class path calls for
slf4j-jdk14-1.5.5.jar which is supposed to be in a lib/ subdirectory.

i don't see this jar or any like it with a different version anywhere
in either the 3.5.0 or 3.6.0 distributions.

is it really needed or is this just slightly outdated documentation? the top of 
the page (which references solr 1.4) suggests this is true, and i see other 
docs on the web suggesting this is the case, but the first result that pops out 
of google for solrj is the apparently outdated wiki page, so i imagine others 
will encounter the same issue.

the other, more recent pages are not without issue as well, for example this 
page:

http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ

references apache-solr-common which i'm not finding either. 

thanks,
   richard


problem with mapping-iso accents

2012-06-06 Thread Gastone Penzo
Hi,
i have a problem with the ISO-accent mapping char filter.

i have e field in my schema with this filter:

<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

if i try this filter with the analysis tool in the solr admin panel it works.

for example:

sarà = sara.

but when i create indexes it doesn't work. in the index the field is sarà
with accent. why?

i use a mysql connector to create indexes directly from the mysql db

the mysql db is in utf-8, the connector charset is utf-8, solr is in utf-8
by default.

recently i changed my java from openjdk to sun-jdk. can that be the reason?

thanx
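
For comparison outside Solr, plain Java can reproduce what the mapping file does, which is handy for checking whether the data or the analysis chain is at fault. A sketch (not Solr's implementation; the class name is invented):

```java
import java.text.Normalizer;

class AccentFolder {
    // Decompose accented characters (NFD) and strip the combining marks,
    // e.g. "sarà" becomes "sara", roughly what mapping-ISOLatin1Accent.txt maps.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("sar\u00e0")); // sara
    }
}
```

If this folds your database strings correctly but the index still shows accents, the problem is more likely a charset mismatch in the import path than in the filter itself.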



-- 
Gastone Penzo


Re: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Jack Krupansky

Do single-word queries return hits?

Is this a multi-shard environment? Does the request list all the shards 
needed to give hits for all the collations you expect? Maybe the queries are 
being done locally and don't have hits for the collations locally.


-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Wednesday, June 06, 2012 6:21 AM
To: solr-user@lucene.apache.org
Subject: issues with spellcheck.maxCollationTries and 
spellcheck.collateExtendedResults


Hi,

We've had some issues with a bad zero-hits collation being returned for a 
two-word query where one word was only one edit away from the required 
collation. With spellcheck.maxCollations set to a reasonable number we saw the 
various suggestions without the required collation. We decreased 
thresholdTokenFrequency to make it appear in the list of collations. 
However, with collateExtendedResults=true the hits field for each collation 
was zero, which is incorrect.


Required collation=huub stapel (two hits) and q=huup stapel

 "collation":{
   "collationQuery":"heup stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"heup"}},
 "collation":{
   "collationQuery":"hugo stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hugo"}},
 "collation":{
   "collationQuery":"hulp stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hulp"}},
 "collation":{
   "collationQuery":"hup stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hup"}},
 "collation":{
   "collationQuery":"huub stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"huub"}},
 "collation":{
   "collationQuery":"huur stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"huur"}}

Now, with maxCollationTries set to 3 or higher we finally get the required 
collation and the only collation able to return results. How can we 
determine the best value for maxCollationTries regarding the decrease of the 
thresholdTokenFrequency? Why is hits always zero?


This is with a today's build and distributed search enabled.

Thanks,
Markus 



Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Great! Thank you a lot, that solved all my problems.

Regards,
Nicolò

Il giorno 06/giu/2012, alle ore 14:55, Jack Krupansky ha scritto:

 This is a known (unfixed) bug. The workaround is to add a space between each 
 left parenthesis and field name.
 
 See:
 https://issues.apache.org/jira/browse/SOLR-3377
 
 So,
 
 q=(field2:ciao)
 
 becomes:
 
 q=( field2:ciao)
 
 -- Jack Krupansky
 
 -Original Message- From: Nicolò Martini
 Sent: Wednesday, June 06, 2012 8:35 AM
 To: solr-user@lucene.apache.org
 Subject: Fielded searches with Solr ExtendedDisMax Query Parser
 
 Hi all,
 I'm having a problem using the Solr ExtendedDisMax Query Parser with queries
 that contain fielded searches inside non-plain queries.
 
 The case is the following.
 
 If I send to SOLR an edismax request (defType=edismax) with parameters
 
 1. qf=field1^10
 2. q=field2:ciao
 3. debugQuery=on (for debug purposes)
 
 solr parses the query as I expect, in fact the debug part of the response 
 tells me that
 
[parsedquery_toString] = +field2:ciao
 But if I make the expression only a bit more complex, like putting the 
 condition into brackets:
 1. qf=field1^10
 2. q=(field2:ciao)
 I get
 
   [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)
 
 where Solr does not seem to recognize the field syntax.
 
 I've not found any mention of this behavior in the [documentation][1], where
 instead they say that
 
 This parser supports full Lucene QueryParser syntax including boolean 
 operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
 fuzzy...
 
 This problem is really annoying me because I would like to do complex boolean
 and fielded queries even with the edismax parser.
 
 Do you know a way to workaround this?
 
 Thank you in advance.
 
 Nicolò Martini
 
 
 [1]: http://wiki.apache.org/solr/ExtendedDisMax= 



RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Dyer, James
Markus,

With maxCollationTries=0, it is not going out and querying the collations to 
see how many hits they each produce.  So it doesn't know the # of hits.  That 
is why if you also specify collateExtendedResults=true, all the hit counts 
are zero.  It would probably be better in this case if it would not report 
hits in the extended response at all.  (On the other hand, if you're seeing 
zeros and maxCollationTries > 0, then you've hit a bug!)

thresholdTokenFrequency in my opinion is a pretty blunt instrument for 
getting rid of bad suggestions.  It takes out all of the rare terms, presuming 
that if a term is rare in the data it either is a mistake or isn't worthy to be 
suggested ever.  But if you're using maxCollationTries the suggestions that 
don't fit will be filtered out automatically, making thresholdTokenFrequency 
to be needed less.  (On the other hand, if you're using IndexBasedSpellChecker, 
thresholdTokenFrequency will make the dictionary smaller and 
spellcheck.build run faster...  This is solved entirely in 4.0 with 
DirectSolrSpellChecker...) 

For the apps here, I've been using maxCollationTries=10 and have been getting 
good results.  Keep in mind that even though you're allowing it to try up to 10 
queries to find a viable collation, so long as you're setting maxCollations 
to something low it will (hopefully) seldom need to try more than a couple 
before finding one with hits.  (I always ask for only 1 collation as we just 
re-apply the spelling correction automatically if the original query returned 
nothing).  Also, if spellcheck.count is low it might not have enough terms 
available to try, so you might need to raise this value also if raising 
maxCollationTries.

The worse problem, in my opinion is the fact that it won't ever suggest words 
if they're in the index (even if using thresholdTokenFrequency to remove them 
from the dictionary).  For that there is 
https://issues.apache.org/jira/browse/SOLR-2585 which is part of Solr 4.  The 
only other workaround is onlyMorePopular which has its own issues.  (see 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount).
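For reference, the settings discussed above might look something like this in solrconfig.xml. This is only a sketch; the handler name, dictionary name, and exact values are assumptions to adapt to your setup:

```xml
<!-- Sketch only: spellcheck defaults reflecting the advice above. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- raise count along with maxCollationTries so there are
         enough candidate terms to build collations from -->
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <!-- ask for a single collation back... -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- ...but allow up to 10 candidates to be test-queried -->
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```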

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Wednesday, June 06, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: issues with spellcheck.maxCollationTries and 
spellcheck.collateExtendedResults

Hi,

We've had some issues with a bad zero-hits collation being returned for a two 
word query where one word was only one edit away from the required collation. 
With spellcheck.maxCollations to a reasonable number we saw the various 
suggestions without the required collation. We decreased 
thresholdTokenFrequency to make it appear in the list of collations. However, 
with collateExtendedResults=true the hits field for each collation was zero, 
which is incorrect.

Required collation="huub stapel" (two hits) and q="huup stapel"

  "collation":{
    "collationQuery":"heup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"heup"}},
  "collation":{
    "collationQuery":"hugo stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hugo"}},
  "collation":{
    "collationQuery":"hulp stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hulp"}},
  "collation":{
    "collationQuery":"hup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hup"}},
  "collation":{
    "collationQuery":"huub stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huub"}},
  "collation":{
    "collationQuery":"huur stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huur"}

Now, with maxCollationTries set to 3 or higher we finally get the required 
collation and the only collation able to return results. How can we determine 
the best value for maxCollationTries regarding the decrease of the 
thresholdTokenFrequency? Why is hits always zero?

This is with a today's build and distributed search enabled.

Thanks,
Markus


Levenstein Distance

2012-06-06 Thread Gau
I have a list of synonyms which is being expanded at query time. This yields
a lot of results (in the millions). My use case is name search.

I want to sort the results by Levenshtein distance. I know this can be done
with the strdist function, but sorting is inefficient, and the Solr function
adds to its woes, killing performance. I want the results to be returned
as quickly as possible.

One way Levenshtein could work is to apply strdist to the synonym file and
get a score for each synonym, then use these scores to boost the results
appropriately; that should be equivalent to the Levenshtein distance. But I am
not sure how to do this in Solr, or in fact whether Solr supports this.
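One way to sketch that idea outside Solr: compute the edit distance from the user's term to each synonym up front, turn it into a boost, and build a boosted query string. This is only an illustration; the field name and the 1/(1+distance) boost formula are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def boosted_query(term, synonyms, field="name"):
    """Boost each synonym by 1/(1+distance) so closer names rank higher."""
    parts = ["%s:%s^%.2f" % (field, s, 1.0 / (1 + levenshtein(term, s)))
             for s in [term] + synonyms]
    return " ".join(parts)

print(boosted_query("jon", ["john", "johan", "jonathan"]))
```

The precomputed boosts could then be baked into the expanded query (or into the synonym file itself), avoiding a strdist sort at query time.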

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Levenstein-Distance-tp3988026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Single term boosting with dismax

2012-06-06 Thread matteosilv
Hi, I'm using the dismax query parser.

I would like to boost on a single term at query time, instead of on the
whole field.
I should probably use the standard query parser; however, I've also overridden
the dismax query parser to handle payload boosting on terms.

What I want to obtain is a double boost (query and indexing time).
For example:

q = cat^2.0 dog^3.0  qf=text  defType=myPayloadHandler 

having 
   text =  cat|3.0 dog|3.0

in my index, obtaining (excluding other score components):

   score(cat) = 3.0 * 2.0 * restOfScore
   score(dog) = 3.0 * 3.0 * restOfScore

However, it seems impossible to do this with myPayloadHandler (which simply
overrides dismax); it only seems possible to boost on a whole field, as in
qf=text^10.0.
Am I right? How can I boost on a single term at query time?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-term-boosting-with-dismax-tp3988027.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost by Nested Query / Join Needed?

2012-06-06 Thread Erick Erickson
Generally, you just have to bite the bullet and denormalize. Yes, it
really runs counter to your DB mindset <G>

But before jumping that way, how many denormalized records are we
talking here? 1M? 100M? 1B?

Solr has (4.x) some join capability, but it makes a lousy general-purpose
database.

You might want to look at Function Queries as a way to boost results
based on numeric fields. If you want a strict ordering, you're looking
at sort, but note that sorts only work on a single-valued field.
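As a rough sketch of the function-query approach (assuming the preference weight has been denormalized onto each matched document as a numeric field named "weight"), a dismax request could add it as a boost function:

```
# Illustration only: "weight" is an assumed numeric field on each document.
# bf adds the value of the function to the score (dismax "boost function").
q=chocolate
defType=dismax
qf=name description
bf=weight
```

This keeps relevance ranking from the query while nudging higher-weight documents upward, without a hard sort.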

Best
Erick

On Tue, Jun 5, 2012 at 12:48 PM, naleiden nalei...@gmail.com wrote:
 Hi,

 First off, I'm about a week into all things Solr, and still trying to figure
 out how to fit my relational-shaped peg through a denormalized hole. Please
 forgive my ignorance below :-D

 I need to store a one-to-N type relationship and perform a boost on a
 related field.

 Let's say I want to index a number of different types of candy, and also a
 customer's preference for each type of candy (which I index/update when a
 customer makes a purchase), and then boost by that preference on search.

 Here is my pared-down attempt at a denormalized schema:

 <!-- Common Fields -->
 <field name="id" type="string" indexed="true" stored="true" required="true"/>
 <field name="type" type="string" indexed="true" stored="true" required="true"/>

 <!-- Fields for 'candy' -->
 <field name="name" type="text_general" indexed="true" stored="true"/>
 <field name="description" type="text_general" indexed="true" stored="true"/>

 <!-- Fields for Customer-Candy Preference ('preference') -->
 <field name="user" type="integer" indexed="true" stored="true"/>
 <field name="candy" type="integer" indexed="true" stored="true"/>
 <field name="weight" type="integer" indexed="true" stored="true" default="0"/>

 I am indexing 'candy' and 'preferences' separately, and when indexing one, I
 leave the fields of the other empty (with the exception of the required 'id'
 and 'type').

 Ignoring the query score, this is effectively what I'm looking to do in SQL:

 SELECT candy.id, candy.name, candy.description FROM candy
 LEFT JOIN preference ON (preference.candy = candy.id AND preference.customer
 = 'someCustomerID')
 // Where some match is made on query against candy.name or candy.description
 ORDER BY preference.weight DESC

 My questions are:

 1.) Am I making any assumptions with respect to what are effectively
 different document types in the schema that will not scale well? I don't
 think I want to be duplicating each 'candy' entry for every customer, or
 maybe that wouldn't be such a big deal in Solr.

 2.) Can someone point me in the right direction on how to perform this type
 of boost in a Solr query?

 Thanks in advance,
 Nick


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Erick Erickson
Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
Although we've had some folks provide a Git-based rather than an SVN-based
patch.

Anyone can open a JIRA, but you must create a signon to do that. It'd get more
attention that way

Best
Erick

On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan gregg...@gmail.com wrote:
 We've encountered GC spikes at Etsy after adding new
 ExternalFileFields a decent number of times. I was always a little
 confused by this behavior -- isn't it just one big float[]? why does
 that cause problems for the GC? -- but looking at the FileFloatSource
 code a little more carefully, I wonder if this is due to using a
 WeakHashMap that is only cleaned by GC or manual invocation of a
 request handler.

FileFloatSource stores a WeakHashMap mapping IndexReader to float[]
or CreationPlaceholder. In the code[1], it mentions that the
 implementation is modeled after the FieldCache implementation.
 However, the FieldCacheImpl adds listeners for IndexReader close
 events and uses those to purge its caches. [2] Should we be doing the
 same in FileFloatSource?

 Here's a mostly untested patch[3] with a possible implementation.
 There are probably better ways to do it (e.g. I don't love using
 another WeakHashMap), but I found it tough to hook into the
 IndexReader lifecycle without a) relying on classes other than
FileFloatSource b) changing the public API of FileFloatSource or c)
 changing the implementation too much.

 There is a RequestHandler inside of FileFloatSource
 (ReloadCacheRequestHandler) that can be used to clear the cache
 entirely[4], but this is sub-optimal for us for a few reasons:

 --It clears the entire cache. ExternalFileFields often take some
 non-trivial time to load and we prefer to do so during SolrCore
 warmups. Clearing the entire cache while serving traffic would likely
 cause user-facing requests to timeout.
 --It forces an extra commit with its consequent cache cycling, etc..

 I'm thinking of ways to monitor the size of FileFloatSource's cache to
 track its size against GC pause times, but it seems tricky because
 even calling WeakHashMap#size() has side-effects. Any ideas?

 Overall, what do you think? Does relying on GC to clean this cache
 make sense as a possible cause of GC spikiness? If so, does the patch
 [3] look like a decent approach?

 Thanks!

 --Gregg

 [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
 [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
 [3] https://gist.github.com/2876371
 [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310


Re: Replication

2012-06-06 Thread Erick Erickson
A couple of things to check.
1> Are you optimizing all the time? An optimization will merge all the
 segments into a single segment, which will cause the whole
 index to be replicated after each optimization.

Best
Erick

On Wed, Jun 6, 2012 at 1:33 AM, William Bell billnb...@gmail.com wrote:
 We are using SOLR 1.4, and we are experiencing full index replication
 every 15 minutes.

 I have checked the solrconfig and it has maxsegments set to 20. It
 appears like it is indexing a segment, but replicating the whole
 index.

 How can I verify it and possibly fix the issue?

 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076


Re: TermComponent and Optimize

2012-06-06 Thread lboutros
It is possible to use the expungeDeletes option in the commit, which could
solve your problem.

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22

Sadly, there is currently a bug with the TieredMergePolicy:
https://issues.apache.org/jira/browse/SOLR-2725

But you can use another merge policy (LogMergePolicy for instance).

Your updates will be (a bit) slower if you use this solution.
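For reference, the update message itself would look something like this (a sketch of the option described on the wiki page above):

```xml
<!-- An explicit commit asking Solr to merge away segments
     containing deleted documents -->
<commit expungeDeletes="true"/>
```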

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/TermComponent-and-Optimize-tp3985696p3988056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: pass custom parameters from client to solr

2012-06-06 Thread srinir
What would be a good place to read the custom Solr params I passed from the
client to Solr? I saw that all the params passed to Solr are available in
rb.req.

I have a business requirement to collapse or combine some properties
together based on some conditions. Currently I have a custom component
(added as the first component in solrconfig) which reads the custom
params from rb.req.getParams(), removes them from the request, and puts them
into the context. I feel that a custom component is probably not the best
place, and there could be a better one. Does anyone have any suggestions?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/pass-custom-parameters-from-client-to-solr-tp3987511p3988066.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extract information from url field

2012-06-06 Thread Jack Krupansky
Yes, using PatternTokenizerFactory. Here's an example field type: if you
define a department field with this type and do a copyField from url to
department, it will end up with the department name alone. It handles
embedded punctuation (e.g., dot, dash, and underscore) and mixed-case words
(breaking them into separate words). The type is text rather than string, so
you can search on individual name words or a phrase. It also lower-cases the
name, but you can skip that step.


<fieldType name="pat_url_department_text" class="solr.TextField"
    sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
        pattern="://[^/]*/([^/]*)/" group="1"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
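To sanity-check the pattern itself, here is a quick illustration (outside Solr) of what group 1 captures from the example URL; PatternTokenizerFactory with group=1 emits one token per match:

```python
import re

# The same pattern used in the tokenizer above: capture the first
# path segment after the host.
pattern = re.compile(r"://[^/]*/([^/]*)/")

url = "http://www.myCompany.Com/Department/Service/index.html"
print(pattern.findall(url))  # -> ['Department']
```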






-- Jack Krupansky
-Original Message- 
From: AlessandroF

Sent: Wednesday, June 06, 2012 2:57 AM
To: solr-user@lucene.apache.org
Subject: Extract information from url field

Hi All,
I would like to know if it's possible to set up a field where Solr, after
posting a document, automatically extracts part of the content into the
field as the result of a regexp.

e.g.

Having a URL field containing
http://www.myCompany.Com/Department/Service/index.html
configured as <field name="url" type="url" stored="true" indexed="true"
required="true"/>,

after posting, it should be split like:

<doc>
  <str name="url">http://www.myCompany.Com/Department/Service/index.html</str>
  <str name="department">Department</str>
</doc>

Thanks for helping!

Alessandro





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Extract-information-from-url-field-tp3987913.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Jack Krupansky

First, it appears that you are using the dismax query parser, not the
extended dismax (edismax) query parser.

My hunch is that some of those fields may be non-tokenized string fields
in which one or more of your search keywords do appear but not as the full
string value or maybe with a different case than in the query. But when you
do a copyField from a string field to a tokenized text field those strings
would be broken up into individual keywords and probably lowercased. So, it
will be easier for a document to match the combined text field than the
source string fields. A fair percentage of the terms may occur in both
text and string fields, but it looks like a fair percentage may occur
only in the string fields.

Identify a specific document that is returned by the first query and not the
second. Then examine each non-text string field value of that document to
see if the query terms would match after text field analysis but are not
exact string matches for the string fields in which the terms do occur.

-- Jack Krupansky
-Original Message- 
From: André Maldonado

Sent: Wednesday, June 06, 2012 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: ExtendedDisMax Question - Strange behaviour

Erick, thanks for your reply and sorry for the confusion in last e-mail.
But it is hard to explain the situation without that bunch of code.
...



Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Gregg Donovan
Thanks for the suggestion, Erick. I created a JIRA and moved the patch
to SVN, just to be safe. [1]

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-3514

On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
 Although we've had some folks provide a Git-based rather than an SVN-based
 patch.

 Anyone can open a JIRA, but you must create a signon to do that. It'd get more
 attention that way

 Best
 Erick

 On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan gregg...@gmail.com wrote:
  We've encountered GC spikes at Etsy after adding new
  ExternalFileFields a decent number of times. I was always a little
  confused by this behavior -- isn't it just one big float[]? why does
  that cause problems for the GC? -- but looking at the FileFloatSource
  code a little more carefully, I wonder if this is due to using a
  WeakHashMap that is only cleaned by GC or manual invocation of a
  request handler.
 
  FileFloatSource stores a WeakHashMap mapping IndexReader to float[]
  or CreationPlaceholder. In the code[1], it mentions that the
  implementation is modeled after the FieldCache implementation.
  However, the FieldCacheImpl adds listeners for IndexReader close
  events and uses those to purge its caches. [2] Should we be doing the
  same in FileFloatSource?
 
  Here's a mostly untested patch[3] with a possible implementation.
  There are probably better ways to do it (e.g. I don't love using
  another WeakHashMap), but I found it tough to hook into the
  IndexReader lifecycle without a) relying on classes other than
  FileFloatSource b) changing the public API of FileFloatSource or c)
  changing the implementation too much.
 
  There is a RequestHandler inside of FileFloatSource
  (ReloadCacheRequestHandler) that can be used to clear the cache
  entirely[4], but this is sub-optimal for us for a few reasons:
 
  --It clears the entire cache. ExternalFileFields often take some
  non-trivial time to load and we prefer to do so during SolrCore
  warmups. Clearing the entire cache while serving traffic would likely
  cause user-facing requests to timeout.
  --It forces an extra commit with its consequent cache cycling, etc..
 
  I'm thinking of ways to monitor the size of FileFloatSource's cache to
  track its size against GC pause times, but it seems tricky because
  even calling WeakHashMap#size() has side-effects. Any ideas?
 
  Overall, what do you think? Does relying on GC to clean this cache
  make sense as a possible cause of GC spikiness? If so, does the patch
  [3] look like a decent approach?
 
  Thanks!
 
  --Gregg
 
  [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
  [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
  [3] https://gist.github.com/2876371
  [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310


Re: Solr, I have perfomance problem for indexing.

2012-06-06 Thread Jihyun Suh
Each table has 35,000 rows (35 thousand).
I will check the log for each step of indexing.

I run Solr 3.5.


2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com

 I have 128 tables of MySQL 5.x, and each table has 35,000 rows.
 When I start a dataimport (indexing) in Solr, it takes 5 minutes for one
 table.
 But when Solr indexes the 20th table, it takes around 10 minutes per table.
 And then when it indexes the 40th table, it takes around 20 minutes per
 table.

 Does Solr have a performance problem with too many documents?
 Should I set some configuration?




Question on addBean and deleteByQuery

2012-06-06 Thread Darin Pope
When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or
deleteByQuery, the POST body has numbers before and after the XML (47 and 0
as noted in the example below):

***

POST /solr/123456/update?wt=xmlversion=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer]
1.0
Host: localhost
Transfer-Encoding: chunked
Content-Type: application/xml; charset=UTF-8

47
deletequeryname:fred AND currency:USD/query/delete
0

***

Due to the way our servers are set up, we get an error, and we think it is due
to these numbers being in the body of the request.

What do these numbers mean and is there any way to get rid of them or do we
need to make some changes to our server configs?
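For reference, the request above declares Transfer-Encoding: chunked, and in HTTP chunked encoding each chunk of the body is preceded by its length in hexadecimal, with a zero-length chunk terminating the body. A sketch of that framing:

```
47\r\n                        <- size of the next chunk, in hex (0x47 = 71 bytes)
...71 bytes of XML data...\r\n
0\r\n                         <- zero-size chunk marks the end of the body
\r\n
```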

Thanks,

Darin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-addBean-and-deleteByQuery-tp3988107.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use of Soundex in solr spellchecker

2012-06-06 Thread Lance Norskog
Metaphone and DoubleMetaphone are more advanced than Soundex, and they
already exist as filters.

There is no independent measure of accuracy for Solr; you have to
decide if you like the results.
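A sketch of how such a filter can be wired into a field type (the field type name is an assumption; check the factory attributes for your Solr version):

```xml
<!-- Sketch: phonetic analysis chain using the existing
     DoubleMetaphone filter; inject="false" keeps only the phonetic codes. -->
<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
```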

On Wed, Jun 6, 2012 at 4:36 AM, nutchsolruser nutchsolru...@gmail.com wrote:
 Does incorporating the Soundex algorithm into Solr improve spellchecker accuracy?
 (If yes, please provide useful pointers for doing this.)

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Use-of-Soundex-in-solr-spellchecker-tp3987968.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com