Schema / Config Error?

2012-06-06 Thread Spadez
Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able to
give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema / Config Error?

2012-06-06 Thread G.Long

Hi :)

Looks like you forgot to paste your schema.xml and the error in your 
e-mail :o


Gary

Le 06/06/2012 10:14, Spadez a écrit :

Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able to
give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi Abdul and Jack,

I got the tstamp working, but I really need to know the published date of
each page.


On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky j...@basetechnology.com wrote:

 If you uncomment the timestamp field in the Solr example, Solr will
 automatically initialize it for each new document to be the time when the
 document is indexed (or most recently indexed). Any field declared with
 default=NOW and not explicitly initialized will have the current time
 when indexed (or re-indexed.)

 -- Jack Krupansky

 -Original Message- From: in.abdul
 Sent: Friday, June 01, 2012 6:55 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to find the age of a page


 Shameema Umer,

 You can add one more new field in the schema. While updating or indexing,
 add the timestamp to that field.

   Thanks and Regards,
   S SYED ABDUL KATHER



 On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
 ml-node+s472066n3987234...@n3.nabble.com
 wrote:

  Hi all,

 How can I find the age of a page in Solr results? That is, the last updated
 time.
 tstamp refers to the fetch time, not the exact updated time, right?


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html



 -
 THANKS AND REGARDS,
 SYED ABDUL KATHER
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html
 Sent from the Solr - User mailing list archive at Nabble.com.
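
[Editor's note] The field Jack describes can be declared in schema.xml roughly as below. This is a sketch based on the stock Solr 3.x example schema; the field name and attributes follow that example:

```xml
<!-- With default="NOW", Solr fills this field in at index time
     whenever a document does not supply a value explicitly. -->
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```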



Issue with SolrCloud / Solr 4.0: Discrepancy in number of groups and ngroups value

2012-06-06 Thread Nitesh Nandy
We are using Solr 4.0 (svn build, 30th May 2012) with SolrCloud. While
querying, we use field collapsing with ngroups set to true. However, there
is a difference between the number of results returned and the ngroups
value.

Ex:
http://localhost:8983/solr/select?q=messagebody:monit%20AND%20usergroupid:3&group=true&group.field=id&facet.limit=20&group.ngroups=true

The values returned are like

<int name="matches">10</int>
<int name="ngroups">9</int>

Actual groups returned: 4

Why do we have this discrepancy between ngroups, matches, and the actual
number of groups?

Earlier we were using the same query with solr 3.5 (without solr cloud) and
it was giving correct results. Any kind of help is appreciated.
-- 
Regards,

Nitesh Nandy


Re: How to find the age of a page

2012-06-06 Thread in.abdul
Whenever you reindex, add the current timestamp; that will be the publish
date, and from there you can calculate the age.
Thanks and Regards,
S SYED ABDUL KATHER
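
[Editor's note] The age calculation being discussed can be sketched in plain Java (java.time requires Java 8+; the timestamp values below are hypothetical, not from the thread):

```java
import java.time.Duration;
import java.time.Instant;

public class PageAge {
    // Age of a page = "now" minus the timestamp stored at index time.
    static long ageInDays(Instant tstamp, Instant now) {
        return Duration.between(tstamp, now).toDays();
    }

    public static void main(String[] args) {
        // Hypothetical values: page indexed 2012-06-01, "now" 2012-06-06.
        Instant tstamp = Instant.parse("2012-06-01T10:00:00Z");
        Instant now = Instant.parse("2012-06-06T10:00:00Z");
        System.out.println(ageInDays(tstamp, now)); // prints 5
    }
}
```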



On Wed, Jun 6, 2012 at 2:16 PM, Shameema Umer [via Lucene] 
ml-node+s472066n3987930...@n3.nabble.com wrote:

 Hi abdul and Jack,

 i got the tstamp working but I really need to know the published date of
 each page.


  On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky [hidden email] wrote:


  If you uncomment the timestamp field in the Solr example, Solr will
  automatically initialize it for each new document to be the time when
 the
  document is indexed (or most recently indexed). Any field declared with
  default=NOW and not explicitly initialized will have the current time
  when indexed (or re-indexed.)
 
  -- Jack Krupansky
 
  -Original Message- From: in.abdul
  Sent: Friday, June 01, 2012 6:55 AM
  To: [hidden email]
  Subject: Re: How to find the age of a page
 
 
  Shameema Umer,
 
  you can add another one new field in schema ..  while updating or
 indexing
  add the time stamp to that current field ..
 
Thanks and Regards,
S SYED ABDUL KATHER
 
 
 
  On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
  ml-node+s472066n3987234...@n3.nabble.com
  wrote:
 
   Hi all,
 
  How can i find the age of a page solr results? that is the last updated
  time.
  tstamp refers to the fetch time, not the exact updated time, right?
 
 
  --
   If you reply to this email, your message will be added to the
 discussion
  below:
 
  http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html
 
 
 
  -
  THANKS AND REGARDS,
  SYED ABDUL KATHER
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html

  Sent from the Solr - User mailing list archive at Nabble.com.
 


 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987930.html



-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987942.html
Sent from the Solr - User mailing list archive at Nabble.com.

issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Markus Jelsma
Hi,

We've had some issues with a bad zero-hits collation being returned for a
two-word query where one word was only one edit away from the required
collation. With spellcheck.maxCollations set to a reasonable number, we saw
the various suggestions without the required collation. We decreased
thresholdTokenFrequency to make it appear in the list of collations. However,
with collateExtendedResults=true, the hits field for each collation was zero,
which is incorrect.

Required collation: "huub stapel" (two hits); query: q=huup stapel

  "collation":{
    "collationQuery":"heup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"heup"}},
  "collation":{
    "collationQuery":"hugo stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hugo"}},
  "collation":{
    "collationQuery":"hulp stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hulp"}},
  "collation":{
    "collationQuery":"hup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hup"}},
  "collation":{
    "collationQuery":"huub stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huub"}},
  "collation":{
    "collationQuery":"huur stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huur"}}

Now, with maxCollationTries set to 3 or higher, we finally get the required 
collation, which is the only collation able to return results. How can we 
determine the best value for maxCollationTries given the decrease of 
thresholdTokenFrequency? And why is hits always zero?

This is with today's build and distributed search enabled.

Thanks,
Markus
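
[Editor's note] For reference, the spellcheck parameters involved are sketched below; the parameter names come from the Solr spellcheck component, but the values are illustrative assumptions, not a recommendation. Note that when maxCollationTries is 0 (the default), collations are not tested against the index at all, which is one plausible reason for hits of 0 in the extended results:

```
spellcheck=true
spellcheck.collate=true
spellcheck.maxCollations=10              # how many collations to return
spellcheck.maxCollationTries=10          # candidate collations verified against the index
spellcheck.collateExtendedResults=true   # include per-collation hits and corrections
```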


Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi Syed Abdul,
I am sorry to ask this basic question, as I am new to Nutch and Solr (and
even new to Java applications). Can you tell me how to add tstamp to the
published date after re-indexing? Is an update query enough?

Also, I am not able to get the field *publishedDate* in my query results to
check whether it is working properly.

Thanks
Shameema


Re: Schema / Config Error?

2012-06-06 Thread Erick Erickson
That implies one of two things:
1) You changed solr.xml. I'd go back to the original and re-edit
anything you've changed.
2) You somehow got a corrupted download. Try blowing your installation
away and getting a new copy.

Because it works perfectly for me.

Best
Erick

On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:
 Hi,

 I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
 page when I try to access Solr:

 http://176.58.103.78:8080/solr/

 It shows errors relating to my solr.xml. This is my solr.xml:



 I really can't figure out how I am meant to fix this, so if anyone is able to
 give some input I would really appreciate it.

 James

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Erick Erickson
Sorry, but your post is really hard to read with all the data inline.

Try running with debugQuery=on and looking at the parsed query, I suspect
your field lists aren't the same even though you think they are.
Perhaps a typo somewhere?

Best
Erick
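
[Editor's note] The comparison Erick suggests can be sketched like this: append debugQuery=on to both requests (with and without qf) and diff the parsed-query entries in the debug section of each response. Host, port, and query below are placeholders:

```
http://localhost:8983/solr/select?q=apartamento+moema&defType=dismax&debugQuery=on

In each response, compare:
<lst name="debug">
  <str name="parsedquery">...</str>
  <str name="parsedquery_toString">...</str>
</lst>
```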

On Mon, Jun 4, 2012 at 1:26 PM, André Maldonado
andre.maldon...@gmail.com wrote:
 I'm doing a query with edismax.

 When I don't tell Solr which fields to search (so it searches the
 default field), it returns 2752 documents.

 ex:
 http://000.000.0.0:/solr/select/?q=apartamento+moema+praia+churrasqueira&version=2.2&start=0&rows=10&indent=on&defType=dismax&mm=75%25

 The same search, specifying the fields that compose the default field,
 returns 1434 docs.

 ex:
 http://000.000.0.0:/solr/select/?q=apartamento+moema+praia+churrasqueira&version=2.2&start=0&rows=10&indent=on&defType=dismax&mm=75%25&qf=agrupamentos+agrupamentos2+bairro+campanhalocalempreendimento+caracteristicas+caracteristicacomum+categoria+cep+chamada+cidade+codigoanuncio+complemento+descricaopermuta+docid+empreendimento+endereco+estado+informacoescomplementares+conteudoobservacao+sigla+subtipoimovel+tipoimovel+transacao+zapid+caminhomapa+codigooferta+segmento+anuncianteorigem+zapidcorporativo+estagiodaobra+condicoescomerciais+nomejornal+nomejornalordem+textomanual

 This is the important part of schema:

 <defaultSearchField>textoboost</defaultSearchField>
 <copyField source="agrupamentos2" dest="textoboost"/>
 <copyField source="agrupamentos" dest="textoboost"/>
 <copyField source="bairro" dest="textoboost"/>
 <copyField source="campanhalocalempreendimento" dest="textoboost"/>
 <copyField source="caracteristicas" dest="textoboost"/>
 <copyField source="caracteristicacomum" dest="textoboost"/>
 <copyField source="categoria" dest="textoboost"/>
 <copyField source="cep" dest="textoboost"/>
 <copyField source="chamada" dest="textoboost"/>
 <copyField source="cidade" dest="textoboost"/>
 <copyField source="codigoanuncio" dest="textoboost"/>
 <copyField source="complemento" dest="textoboost"/>
 <copyField source="descricaopermuta" dest="textoboost"/>
 <copyField source="docid" dest="textoboost"/>
 <copyField source="empreendimento" dest="textoboost"/>
 <copyField source="endereco" dest="textoboost"/>
 <copyField source="estado" dest="textoboost"/>
 <copyField source="informacoescomplementares" dest="textoboost"/>
 <copyField source="conteudoobservacao" dest="textoboost"/>
 <copyField source="sigla" dest="textoboost"/>
 <copyField source="subtipoimovel" dest="textoboost"/>
 <copyField source="tipoimovel" dest="textoboost"/>
 <copyField source="transacao" dest="textoboost"/>
 <copyField source="zapid" dest="textoboost"/>
 <copyField source="caminhomapa" dest="textoboost"/>
 <copyField source="codigooferta" dest="textoboost"/>
 <copyField source="segmento" dest="textoboost"/>
 <copyField source="anuncianteorigem" dest="textoboost"/>
 <copyField source="zapidcorporativo" dest="textoboost"/>
 <copyField source="estagiodaobra" dest="textoboost"/>
 <copyField source="condicoescomerciais" dest="textoboost"/>
 <copyField source="nomejornal" dest="textoboost"/>
 <copyField source="nomejornalordem" dest="textoboost"/>
 <copyField source="textomanual" dest="textoboost"/>

 What's the problem?

 Thanks

 *"E conhecereis a verdade, e a verdade vos libertará." (João 8:32)*
 ("And you shall know the truth, and the truth shall set you free." John 8:32)

  *andre.maldonado*@gmail.com
  (11) 9112-4227

 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
 http://www.facebook.com/profile.php?id=10659376883
 http://twitter.com/andremaldonado
 http://www.delicious.com/andre.maldonado

Re: ReadTimeout on commit

2012-06-06 Thread Erick Erickson
You're probably hitting a background merge and the request is timing
out even though the commit succeeds. Try querying for the data in
the last packet to test this.

And you don't say what version of Solr you're using.

One test you can do is increase the number of documents before
a commit. If merging is the problem I'd expect you to _still_ encounter
this problem, just much less often. That would at least tell you if this
is the right path to investigate.

Best
Erick

On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:
 Hi,

 I'm indexing documents in batches of 100 docs. Then commit.

 Sometimes I get this exception:

 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketTimeoutException: Read timed out
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)


 I found some similar postings on the web, all recommending autocommit. This
 is unfortunately not an option for me, because I have to know whether Solr
 committed or not.

 What is causing this timeout?

 I'm using these settings in solrj:

        server.setSoTimeout(1000);
          server.setConnectionTimeout(100);
          server.setDefaultMaxConnectionsPerHost(100);
          server.setMaxTotalConnections(100);
          server.setFollowRedirects(false);
          server.setAllowCompression(true);
          server.setMaxRetries(1);

 Thank you



Re: Schema / Config Error?

2012-06-06 Thread Shameema Umer
Make sure your port is 8983 or 8080.

On Wed, Jun 6, 2012 at 4:27 PM, Erick Erickson erickerick...@gmail.comwrote:

 That implies one of two things:
 1) You changed solr.xml. I'd go back to the original and re-edit
 anything you've changed.
 2) You somehow got a corrupted download. Try blowing your installation
 away and getting a new copy.

 Because it works perfectly for me.

 Best
 Erick

 On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:
  Hi,
 
  I installed a fresh copy of Solr 3.6.0 on my server, but I get the
 following
  page when I try to access Solr:
 
  http://176.58.103.78:8080/solr/
 
  It shows errors relating to my solr.xml. This is my solr.xml:
 
 
 
  I really can't figure out how I am meant to fix this, so if anyone is
 able to
  give some input I would really appreciate it.
 
  James
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
  Sent from the Solr - User mailing list archive at Nabble.com.



Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Step 1: Verify that publishedDate is in fact the field name that Nutch 
uses for the published date.


Step 2: Make sure that Nutch is passing the date in the format 
YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a 
question for this Solr mailing list. My (very limited) understanding is that 
there was a Nutch plugin that worked for the old version of Nutch but that 
it was not updated for the new version of Nutch.
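
[Editor's note] Producing a date string in that format can be sketched in plain Java as below; the epoch value is a hypothetical published date, not from the thread:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // Solr date fields expect UTC in the form yyyy-MM-dd'T'HH:mm:ss'Z'.
    static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        // Hypothetical published date: 2012-06-06 00:00:00 UTC as epoch millis.
        System.out.println(toSolrDate(new Date(1338940800000L))); // prints 2012-06-06T00:00:00Z
    }
}
```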


Step 3: Have you added the field publishedDate to your Solr schema with 
a field type of date or tdate?


If you can't figure out how to fix the problem on the Nutch side of the 
fence, then you will have to do a custom update processor for Solr. Solr 4.x 
has some new tools that should make that easier.


See:
https://issues.apache.org/jira/browse/SOLR-2802

-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 4:12 AM
To: solr-user@lucene.apache.org
Subject: sort by publishedDate and get published Date in solr query results

Hi,
Please help me sort by publishedDate and get publishedDate in Solr query
results. Do I need to install anything (a plugin)?

Thanks
Shameema 



Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky
My misunderstanding. I thought you were publishing to Solr and wanted the 
date when that occurred (indexing time).


-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 4:45 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page

Hi Abdul and Jack,

I got the tstamp working, but I really need to know the published date of
each page.


On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky 
j...@basetechnology.com wrote:



If you uncomment the timestamp field in the Solr example, Solr will
automatically initialize it for each new document to be the time when the
document is indexed (or most recently indexed). Any field declared with
default=NOW and not explicitly initialized will have the current time
when indexed (or re-indexed.)

-- Jack Krupansky

-Original Message- From: in.abdul
Sent: Friday, June 01, 2012 6:55 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page


Shameema Umer,

You can add one more new field in the schema. While updating or indexing,
add the timestamp to that field.

  Thanks and Regards,
  S SYED ABDUL KATHER



On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] 
ml-node+s472066n3987234...@n3.nabble.com
wrote:

 Hi all,


How can I find the age of a page in Solr results? That is, the last updated
time.
tstamp refers to the fetch time, not the exact updated time, right?


--
 If you reply to this email, your message will be added to the discussion
below:

http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234.html





-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-the-age-of-a-page-tp3987234p3987238.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky

See the reply on the other email thread you started.

-- Jack Krupansky

-Original Message- 
From: Shameema Umer 
Sent: Wednesday, June 06, 2012 6:28 AM 
To: solr-user@lucene.apache.org 
Subject: Re: How to find the age of a page 


Hi Syed Abdul,
I am sorry to ask this basic question, as I am new to Nutch and Solr (and
even new to Java applications). Can you tell me how to add tstamp to the
published date after re-indexing? Is an update query enough?

Also, I am not able to get the field *publishedDate* in my query results to
check whether it is working properly.

Thanks
Shameema


Re: Schema / Config Error?

2012-06-06 Thread Jack Krupansky
Read CHANGES.txt carefully, especially the section entitled "Upgrading from 
Solr 3.5". For example:


* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of 
solrconfig.xml are deprecated
 and replaced with a new <indexConfig> section. Read more in SOLR-1052 
below.
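
[Editor's note] A sketch of the replacement section in a 3.6 solrconfig.xml; the child elements and values shown are illustrative assumptions, and SOLR-1052 has the full list:

```xml
<!-- Solr 3.6+: one section replaces <indexDefaults> and <mainIndex>. -->
<indexConfig>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
  <lockType>native</lockType>
</indexConfig>
```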


If you simply copied your schema/config directly, unchanged, then this could 
be the problem.


You may need to compare your schema/config line-by-line to the new 3.6 
schema/config for any differences.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, June 06, 2012 6:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema / Config Error?

That implies one of two things:
1) You changed solr.xml. I'd go back to the original and re-edit
anything you've changed.
2) You somehow got a corrupted download. Try blowing your installation
away and getting a new copy.

Because it works perfectly for me.

Best
Erick

On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote:

Hi,

I installed a fresh copy of Solr 3.6.0 on my server, but I get the 
following

page when I try to access Solr:

http://176.58.103.78:8080/solr/

It shows errors relating to my solr.xml. This is my solr.xml:



I really can't figure out how I am meant to fix this, so if anyone is able 
to

give some input I would really appreciate it.

James

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
Versions: Nutch: 1.4 and Solr: 3.4

My schema file contains
<!-- fields for feed plugin (tag is also used by microformats-reltag) -->
<field name="author" type="string" stored="true" indexed="true"/>
<field name="tag" type="string" stored="true" indexed="true"
multiValued="true"/>
<field name="feed" type="string" stored="true" indexed="true"/>
<field name="publishedDate" type="date" stored="true"
indexed="true"/>
<field name="updatedDate" type="date" stored="true"
indexed="true"/>


But I do not know whether this feed plugin is working or not, as I am new to
Nutch and Solr.
Here is my query:
http://localhost:8983/solr/select/?q=title:'.$v.'
content:'.$v.'&sort=publishedDate desc&fl=tilte content url
publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate in the results.

Should I post this on the Nutch users mailing list?

Thanks.


On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote:

 Step 1: Verify that publishedDate is in fact the field name that Nutch
 uses for the published date.

 Step 2: Make sure that Nutch is passing the date in the format
 YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a
 question for this Solr mailing list. My (very limited) understanding is
 that there was a Nutch plugin that worked for the old version of Nutch but
 that it was not updated for the new version of Nutch.

 Step 3: Have you added the field publishedDate to your Solr schema with
 a field type of date or tdate?

 If you can't figure out how to fix the problem on the Nutch side of the
 fence, then you will have to do a custom update processor for Solr. Solr
 4.x has some new tools that should make that easier.

 See:
 https://issues.apache.org/jira/browse/SOLR-2802

 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 4:12 AM
 To: solr-user@lucene.apache.org
 Subject: sort by publishedDate and get published Date in solr query results


 Hi,
 Please help me sort by publishedDate and get publishedDate in solr query
 results. Do i need to install anything(plugin).

 Thanks
 Shameema



Re: ReadTimeout on commit

2012-06-06 Thread Jack Krupansky
As Erick says, you are probably hitting an occasional automatic background 
merge, which takes a bit longer. That is not an indication of a problem. 
Increase your connection timeout. Check the log to see how long the merge or 
slow commit takes. You have a timeout of 1000, which is 1 second. Make it 
longer, and possibly put the commit or other indexing operations in a loop 
with a few retries before considering a connection timeout a fatal error. 
Occasional delays are a fact of life in a multi-process, networked 
environment.
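
[Editor's note] That retry loop can be sketched in plain Java as below. The Callable passed in stands in for SolrJ's server.commit(), which is not included here; the retry counts and sleep are placeholder values:

```java
import java.util.concurrent.Callable;

public class RetryingCommit {
    // Retry a commit-like operation a few times before treating the
    // failure as fatal, sleeping briefly between attempts.
    static <T> T withRetries(Callable<T> op, int maxRetries, long sleepMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {   // e.g. SocketTimeoutException
                last = e;
                Thread.sleep(sleepMs);
            }
        }
        throw last;                   // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for server.commit(): fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("Read timed out");
            return "committed";
        }, 5, 10);
        System.out.println(result); // prints "committed"
    }
}
```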


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, June 06, 2012 7:02 AM
To: solr-user@lucene.apache.org
Subject: Re: ReadTimeout on commit

You're probably hitting a background merge and the request is timing
out even though the commit succeeds. Try querying for the data in
the last packet to test this.

And you don't say what version of Solr you're using.

One test you can do is increase the number of documents before
a commit. If merging is the problem I'd expect you to _still_ encounter
this problem, just much less often. That would at least tell you if this
is the right path to investigate.

Best
Erick

On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:

Hi,

I'm indexing documents in batches of 100 docs. Then commit.

Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketTimeoutException: Read timed out
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)


I found some similar postings on the web, all recommending autocommit. This
is unfortunately not an option for me, because I have to know whether Solr
committed or not.

What is causing this timeout?

I'm using these settings in solrj:

   server.setSoTimeout(1000);
   server.setConnectionTimeout(100);
   server.setDefaultMaxConnectionsPerHost(100);
   server.setMaxTotalConnections(100);
   server.setFollowRedirects(false);
   server.setAllowCompression(true);
   server.setMaxRetries(1);

Thank you





Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Check your Solr log file to see whether errors or warnings are issued. If 
Nutch is sending bogus date values, they should produce warnings.


At this stage there are two strong possibilities:

1. Nutch is simply not sending that date field value at all.
2. Solr is rejecting the date field value because it is not in the required 
yyyy-MM-ddTHH:mm:ssZ format.


If #2, you need to go the update processor route I mentioned previously.

-- Jack Krupansky

-Original Message- 
From: Shameema Umer

Sent: Wednesday, June 06, 2012 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by publishedDate and get published Date in solr query 
results


Versions: Nutch: 1.4 and Solr: 3.4

My schema file contains
<!-- fields for feed plugin (tag is also used by microformats-reltag)-->
   <field name="author" type="string" stored="true" indexed="true"/>
   <field name="tag" type="string" stored="true" indexed="true"
multiValued="true"/>
   <field name="feed" type="string" stored="true" indexed="true"/>
   <field name="publishedDate" type="date" stored="true" indexed="true"/>
   <field name="updatedDate" type="date" stored="true" indexed="true"/>


But I do not know whether this feed plugin is working or not as I am new to
nutch and solr.
Here is my query
http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate on the results.

Should i post this on nutch users mailing?

Thanks.


On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky 
j...@basetechnology.comwrote:



Step 1: Verify that publishedDate is in fact the field name that Nutch
uses for published date.

Step 2: Make sure that Nutch is passing the date in the format
YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a

question for this Solr mailing list. My (very limited) understanding is
that there was a Nutch plugin that worked for the old version of Nutch but
that it was not updated for the new version of Nutch.

Step 3: Have you added the field publishedDate to your Solr schema with
field type of date or tdate?

If you can't figure out how to fix the problem on the Nutch side of the
fence, then you will have to do a custom update processor for Solr. Solr
4.x has some new tools that should make that easier.

See:
https://issues.apache.org/jira/browse/SOLR-2802

-- Jack Krupansky

-Original Message- From: Shameema Umer
Sent: Wednesday, June 06, 2012 4:12 AM
To: solr-user@lucene.apache.org
Subject: sort by publishedDate and get published Date in solr query 
results



Hi,
Please help me sort by publishedDate and get publishedDate in solr query
results. Do I need to install anything (a plugin)?

Thanks
Shameema





Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Mike Sokolov
I agree, that seems odd.  We routinely index XML using either 
HTMLStripCharFilter, or XmlCharFilter (see patch: 
https://issues.apache.org/jira/browse/SOLR-2597), both of which parse 
the XML, and we don't see such a huge  speed difference from indexing 
other field types.  XmlCharFilter also allows you to specify which 
elements to index if you don't want the whole file.


-Mike

On 6/3/2012 8:42 AM, Erick Erickson wrote:

This seems really odd. How big are these XML files? Where are you parsing them?
You could consider using a SolrJ program with a SAX-style parser.

But the first question I'd answer is "what is slow?". The implication
of your post is that
parsing the XML is the slow part; it really shouldn't be taking
anywhere near this long IMO...

Best
Erick

On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian
kristian.vantass...@siemens.com  wrote:

I'm just wondering what the general consensus is on indexing XML data to Solr 
in terms of parsing and mining the relevant data out of the file and putting 
them into Solr fields. Assume that this is the XML file and resulting Solr 
fields:

XML data:
<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, 
however, is not acceptable. 2800 objects take 11 minutes to parse and index 
into Solr.

The big slowdown appears to be that I'm parsing the data with an XML parser.

So, now I'm testing mining the data by opening the file as just a text file 
(using Groovy) and picking out relevant data using regular expression matching. 
I'm now able to parse (mine) the data and index the 2800 files in 72 seconds.

So I'm wondering if the typical solution people use is to go with a non-XML 
solution. It seems to make sense considering the search index would only want 
to store (as much data) as possible and not rely on the incoming documents 
being xml compliant.

Thanks in advance for any thoughts on this!
-Kristian
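
For what it's worth, Erick's SAX-style suggestion might look roughly like this sketch (the mydoc/title/bar names come from the sample document above; the class name is invented, and this is a sketch rather than anyone's actual implementation):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class MyDocSaxExtractor {
    // Pull id, title and bar/@attr1 out of a mydoc record without building a DOM.
    static Map<String, String> extract(String xml) {
        final Map<String, String> fields = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (qName.equals("mydoc")) {
                    fields.put("id", attrs.getValue("id"));
                } else if (qName.equals("bar")) {
                    fields.put("bar", attrs.getValue("attr1"));
                } else if (qName.equals("title")) {
                    inTitle = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (inTitle) text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("title")) {
                    fields.put("title", text.toString());
                    inTitle = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String xml = "<mydoc id=\"1234\"><title>foo</title>"
                   + "<bar attr1=\"val1\"/><baz>garbage data</baz></mydoc>";
        System.out.println(extract(xml));
    }
}
```

The extracted map can then be turned into a SolrInputDocument and sent with SolrJ.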











Re: ReadTimeout on commit

2012-06-06 Thread Mark Miller
Looks like the commit is taking longer than your set timeout.

On Jun 5, 2012, at 6:51 AM, spr...@gmx.eu spr...@gmx.eu wrote:

 Hi,
 
 I'm indexing documents in batches of 100 docs. Then commit.
 
 Sometimes I get this exception:
 
 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketTimeoutException: Read timed out
   at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
 olrServer.java:475)
   at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpS
 olrServer.java:249)
   at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractU
 pdateRequest.java:105)
   at
 org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)
 
 
 I found some similar postings in the web, all recommending autocommit. This
 is unfortunately not an option for me, because I have to know whether solr
 committed or not.
 
 What is causing this timeout?
 
 I'm using these settings in solrj:
 
server.setSoTimeout(1000);
 server.setConnectionTimeout(100);
 server.setDefaultMaxConnectionsPerHost(100);
 server.setMaxTotalConnections(100);
 server.setFollowRedirects(false);
 server.setAllowCompression(true);
 server.setMaxRetries(1);
 
 Thank you
 

- Mark Miller
lucidimagination.com













Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
OK Jack. Will do.

On Wed, Jun 6, 2012 at 5:29 PM, Jack Krupansky j...@basetechnology.comwrote:

 Check your Solr log file to see whether errors or warnings are issued. If
 Nutch is sending bogus date values, they should produce warnings.

 At this stage there are two strong possibilities:

 1. Nutch is simply not sending that date field value at all.
 2. Solr is rejecting the date field value because it is not in the required
 yyyy-mm-ddThh:mm:ssZ format.

 If #2, you need to go the update processor route I mentioned previously.


 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 7:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: sort by publishedDate and get published Date in solr query
 results


 Versions: Nutch: 1.4 and Solr: 3.4

 My schema file contains
 <!-- fields for feed plugin (tag is also used by microformats-reltag)-->
   <field name="author" type="string" stored="true" indexed="true"/>
   <field name="tag" type="string" stored="true" indexed="true"
 multiValued="true"/>
   <field name="feed" type="string" stored="true" indexed="true"/>
   <field name="publishedDate" type="date" stored="true" indexed="true"/>
   <field name="updatedDate" type="date" stored="true" indexed="true"/>


 But I do not know whether this feed plugin is working or not as I am new to
 nutch and solr.
 Here is my query
 http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

 But this is not returning publishedDate on the results.

 Should i post this on nutch users mailing?

 Thanks.


 On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote:

  Step 1: Verify that publishedDate is in fact the field name that Nutch
 uses for published date.

 Step 2: Make sure that Nutch is passing the date in the format
 YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a
 question for this Solr mailing list. My (very limited) understanding is
 that there was a Nutch plugin that worked for the old version of Nutch but
 that it was not updated for the new version of Nutch.

 Step 3: Have you added the field publishedDate to your Solr schema with
 field type of date or tdate?

 If you can't figure out how to fix the problem on the Nutch side of the
 fence, then you will have to do a custom update processor for Solr. Solr
 4.x has some new tools that should make that easier.

 See:
 https://issues.apache.org/jira/browse/SOLR-2802


 -- Jack Krupansky

 -Original Message- From: Shameema Umer
 Sent: Wednesday, June 06, 2012 4:12 AM
 To: solr-user@lucene.apache.org
 Subject: sort by publishedDate and get published Date in solr query
 results


 Hi,
 Please help me sort by publishedDate and get publishedDate in solr query
 results. Do i need to install anything(plugin).

 Thanks
 Shameema





RE: ReadTimeout on commit

2012-06-06 Thread spring
Hi Jack, hi Erik,

thanks for the tips! It's solr 3.6

I increased the batch to 1000 docs and the timeout to 10 s. Now it works.
And I will implement the retry around the commit-call.

Thx!
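
A retry wrapper of the kind discussed could be sketched like this (generic Java, not the SolrJ API; the class name, backoff policy, and exception handling are illustrative only):

```java
import java.util.concurrent.Callable;

class RetryUtil {
    // Run an action up to maxAttempts times, sleeping backoffMs between failed
    // attempts; wraps the last failure in a RuntimeException if every attempt fails.
    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMs) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        throw new RuntimeException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulate a commit that times out twice, then succeeds on the third try.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("Read timed out");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a SolrJ client the `action` would wrap the `server.commit()` call, so a transient read timeout does not abort the indexing run.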

 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com] 
 Sent: Mittwoch, 6. Juni 2012 13:52
 To: solr-user@lucene.apache.org
 Subject: Re: ReadTimeout on commit
 
 As Erick says, you are probably hitting an occasional 
 automatic background 
 merge which takes a bit longer. That is not an indication of 
 a problem. 
 Increase your connection timeout. Check the log to see how 
 long the merge or 
 slow commit takes. You have a timeout of 1000 which is 1 
 second. Make it 
 longer, and possibly put the commit or other indexing 
 operations in a loop 
 with a few retries before considering connection timeout a 
 fatal error. 
 Occasional delays are a fact or life in a multi-process, networked 
 environment.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erick Erickson
 Sent: Wednesday, June 06, 2012 7:02 AM
 To: solr-user@lucene.apache.org
 Subject: Re: ReadTimeout on commit
 
 You're probably hitting a background merge and the request is timing
 out even though the commit succeeds. Try querying for the data in
 the last packet to test this.
 
 And you don't say what version of Solr you're using.
 
 One test you can do is increase the number of documents before
 a commit. If merging is the problem I'd expect you to _still_ 
 encounter
 this problem, just much less often. That would at least tell 
 you if this
 is the right path to investigate.
 
 Best
 Erick
 
 On Tue, Jun 5, 2012 at 6:51 AM,  spr...@gmx.eu wrote:
  Hi,
 
  I'm indexing documents in batches of 100 docs. Then commit.
 
  Sometimes I get this exception:
 
  org.apache.solr.client.solrj.SolrServerException:
  java.net.SocketTimeoutException: Read timed out
 at
  
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reques
 t(CommonsHttpS
  olrServer.java:475)
 at
  
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.reques
 t(CommonsHttpS
  olrServer.java:249)
 at
  
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.pro
 cess(AbstractU
  pdateRequest.java:105)
 at
  org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)
 
 
  I found some similar postings in the web, all recommending 
 autocommit. 
  This
  is unfortunately not an option for me, because I have to 
 know whether solr
  committed or not.
 
  What is causing this timeout?
 
  I'm using these settings in solrj:
 
 server.setSoTimeout(1000);
   server.setConnectionTimeout(100);
   server.setDefaultMaxConnectionsPerHost(100);
   server.setMaxTotalConnections(100);
   server.setFollowRedirects(false);
   server.setAllowCompression(true);
   server.setMaxRetries(1);
 
  Thank you
  
 



Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Jack Krupansky
I did see a mention yesterday of a situation involving DIH and large XML 
files where it was unusually slow, but if the big XML file was broken into 
many smaller files it went really fast for the same amount of data. If that 
is the case, you don't need to parse all of the XML, just detect the 
boundaries between documents and break them into smaller XML files.
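
That boundary-detection idea could be sketched like this (a naive string scan; the mydoc tag name is just an example from the thread, and a real input file might need streaming rather than one big String):

```java
import java.util.ArrayList;
import java.util.List;

class XmlSplitter {
    // Split a string of concatenated <mydoc>...</mydoc> records into one
    // chunk per document, without fully parsing the XML.
    static List<String> split(String xml, String tag) {
        List<String> docs = new ArrayList<>();
        String open = "<" + tag;
        String close = "</" + tag + ">";
        int pos = 0;
        while (true) {
            int start = xml.indexOf(open, pos);
            if (start < 0) break;
            int end = xml.indexOf(close, start);
            if (end < 0) break;
            docs.add(xml.substring(start, end + close.length()));
            pos = end + close.length();
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<mydoc id=\"1\"><title>a</title></mydoc>"
                   + "<mydoc id=\"2\"><title>b</title></mydoc>";
        System.out.println(split(xml, "mydoc").size() + " documents");
    }
}
```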


-- Jack Krupansky

-Original Message- 
From: Mike Sokolov

Sent: Wednesday, June 06, 2012 8:02 AM
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Efficiently mining or parsing data out of XML source files

I agree, that seems odd.  We routinely index XML using either
HTMLStripCharFilter, or XmlCharFilter (see patch:
https://issues.apache.org/jira/browse/SOLR-2597), both of which parse
the XML, and we don't see such a huge  speed difference from indexing
other field types.  XmlCharFilter also allows you to specify which
elements to index if you don't want the whole file.

-Mike

On 6/3/2012 8:42 AM, Erick Erickson wrote:
This seems really odd. How big are these XML files? Where are you parsing 
them?

You could consider using a SolrJ program with a SAX-style parser.

But the first question I'd answer is "what is slow?". The implication
of your post is that
parsing the XML is the slow part; it really shouldn't be taking
anywhere near this long IMO...

Best
Erick

On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian
kristian.vantass...@siemens.com  wrote:
I'm just wondering what the general consensus is on indexing XML data to 
Solr in terms of parsing and mining the relevant data out of the file and 
putting them into Solr fields. Assume that this is the XML file and 
resulting Solr fields:


XML data:
<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr Fields:
Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using 
XMLBeans, JAXB, etc. to get the relevant data. The speed at which this 
occurs, however, is not acceptable. 2800 objects take 11 minutes to parse 
and index into Solr.


The big slowdown appears to be that I'm parsing the data with an XML 
parser.


So, now I'm testing mining the data by opening the file as just a text 
file (using Groovy) and picking out relevant data using regular 
expression matching. I'm now able to parse (mine) the data and index the 
2800 files in 72 seconds.


So I'm wondering if the typical solution people use is to go with a 
non-XML solution. It seems to make sense considering the search index 
would only want to store (as much data) as possible and not rely on the 
incoming documents being xml compliant.


Thanks in advance for any thoughts on this!
-Kristian











Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Hi all,
I'm having a problem using the Solr ExtendedDisMax Query Parser with queries that 
contain fielded searches inside non-plain queries.

The case is the following.

If I send to SOLR an edismax request (defType=edismax) with parameters

 1. qf=field1^10
 2. q=field2:ciao
 3. debugQuery=on (for debug purposes)

solr parses the query as I expect, in fact the debug part of the response tells 
me that

 [parsedquery_toString] = +field2:ciao
But if I make the expression only a bit more complex, like putting the 
condition into brackets:
 1. qf=field1^10
 2. q=(field2:ciao)
I get

[parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)

where Solr seems not recognize the field syntax.

I've not found any mention of this behavior in the [documentation][1], where 
instead they say that

This parser supports full Lucene QueryParser syntax including boolean 
operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
fuzzy...

This problem is really annoying me because I would like to do complex boolean 
and fielded queries even with the edismax parser. 

Do you know a way to workaround this?

Thank you in advance.

Nicolò Martini


[1]: http://wiki.apache.org/solr/ExtendedDisMax

Re: Exception when optimizing index

2012-06-06 Thread Jack Krupansky
It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At 
least the exception comes from the same function.


Caused by: java.io.IOException: Invalid vInt detected (too many bits)
   at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)

What hardware and Java version are you running?

-- Jack Krupansky

-Original Message- 
From: Rok Rejc

Sent: Wednesday, June 06, 2012 3:45 AM
To: solr-user@lucene.apache.org
Subject: Exception when optimizing index

Hi all,

I have a solr installation (version 4.0 from trunk - 1st May 2012).

After I imported documents (99831145 documents) I have run the
optimization. I got an exception:

responselst name=responseHeaderint name=status500/intint
name=QTime281615/int/lstlst name=errorstr name=msgbackground
merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
[maxNumSegments=1]/strstr name=tracejava.io.IOException: background
merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785
_1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814
_7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475
_1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618
_fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402
_2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113
_dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324
_fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft
[maxNumSegments=1]
   at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475)
   at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412)
   at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385)
   at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
   at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
   at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783)
   at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
   at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
   at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
   at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
   at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
   at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
   at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556)
   at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.IOException: Invalid vInt detected (too many bits)
   at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408)
   at
org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508)
   at
org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85)
   at
org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65)
   at 

Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Jack Krupansky
This is a known (unfixed) bug. The workaround is to add a space between each 
left parenthesis and field name.


See:
https://issues.apache.org/jira/browse/SOLR-3377

So,

q=(field2:ciao)

becomes:

q=( field2:ciao)
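
If you would rather not hand-edit queries, the workaround can be applied mechanically on the client before sending the request. A hedged sketch (the regex and class name are illustrative and only target a '(' immediately followed by a field:value clause):

```java
class EdismaxParenWorkaround {
    // Work around SOLR-3377 by inserting a space after any '(' that directly
    // precedes a fielded clause such as field2:ciao; plain terms are untouched.
    static String fix(String q) {
        return q.replaceAll("\\((?=\\w+:)", "( ");
    }

    public static void main(String[] args) {
        System.out.println(fix("(field2:ciao)")); // ( field2:ciao)
    }
}
```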

-- Jack Krupansky

-Original Message- 
From: Nicolò Martini

Sent: Wednesday, June 06, 2012 8:35 AM
To: solr-user@lucene.apache.org
Subject: Fielded searches with Solr ExtendedDisMax Query Parser

Hi all,
I'm having a problem using the Solr ExtendedDisMax Query Parser with query 
that contains fielded searches inside not-plain queries.


The case is the following.

If I send to SOLR an edismax request (defType=edismax) with parameters

1. qf=field1^10
2. q=field2:ciao
3. debugQuery=on (for debug purposes)

solr parses the query as I expect, in fact the debug part of the response 
tells me that


[parsedquery_toString] = +field2:ciao
But if I make the expression only a bit more complex, like putting the 
condition into brackets:

1. qf=field1^10
2. q=(field2:ciao)
I get

   [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)

where Solr seems not recognize the field syntax.

I've not found any mention to this behavior in the [documentation][1], where 
instead they say that


This parser supports full Lucene QueryParser syntax including boolean 
operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
fuzzy...


This problem is really annoying me because I would like to do compelx 
boolean and fielded queries even with the edismax parser.


Do you know a way to workaround this?

Thank you in advance.

Nicolò Martini


[1]: http://wiki.apache.org/solr/ExtendedDisMax



Re: highlighter not respecting sentence boundry

2012-06-06 Thread Jack Krupansky
I don't quite understand the problem. What is an example snippet that you 
think is incorrect, and what do you think the snippet should be?


Also, try the /browse handler in the Solr example after following the Solr 
tutorial to post data. Do a search that will highlight terms similar to what 
you want. When you see that it works in /browse, try to replicate the 
settings for your own handler.


-- Jack Krupansky

-Original Message- 
From: abhayd

Sent: Tuesday, June 05, 2012 2:41 AM
To: solr-user@lucene.apache.org
Subject: Re: highlighter not respecting sentence boundry

Any help on this one?

Seems like the highlighting component does not always start the snippet at
the beginning of a sentence. I tried several options.

Has anyone successfully got this working?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/highlighter-not-respecting-sentence-boundry-tp3984327p3987718.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread André Maldonado
Erick, thanks for your reply and sorry for the confusion in last e-mail.
But it is hard to explain the situation without that bunch of code.

In my schema I have a field called textoboost that contains copies of a lot
of other fields. Doing the query in this field I got this:

+(((textoboost:apartamento) (textoboost:moema) (textoboost:praia)
(textoboost:churrasqueira))~3)

This is correct and returns 2452 documents. When I do the same search, but
on all the fields that textoboost contains instead of on the textoboost
field itself, I get the following query (with no typos), which returns
only 1434 documents.

+(((estagiodaobra:apartamento | campanhalocalempreendimento:apartamento |
textomanual:apartamento | codigooferta:apartamento |
zapidcorporativo:apartamento | conteudoobservacao:apartamento |
categoria:apartamento | docid:apartamento | cep:apartamento |
caracteristicas:apartamento | condicoescomerciais:apartamento |
anuncianteorigem:apartamento | empreendimento:apartamento |
complemento:apartamento | caracteristicacomum:apartamento |
codigoanuncio:apartamento | nomejornal:apartamento |
agrupamentos2:apartamento | subtipoimovel:apartamento |
descricaopermuta:apartamento | zapid:apartamento | cidade:apartamento |
bairro:apartamento | transacao:apartamento | estado:apartamento |
tipoimovel:apartamento | sigla:apartamento | caminhomapa:apartamento |
chamada:apartamento | segmento:apartamento | nomejornalordem:apartamento |
agrupamentos:apartamento | endereco:apartamento |
informacoescomplementares:apartamento) (estagiodaobra:moema |
campanhalocalempreendimento:moema | textomanual:moema | codigooferta:moema
| zapidcorporativo:moema | conteudoobservacao:moema | categoria:moema |
docid:moema | cep:moema | caracteristicas:moema | condicoescomerciais:moema
| anuncianteorigem:moema | empreendimento:moema | complemento:moema |
caracteristicacomum:moema | codigoanuncio:moema | nomejornal:moema |
agrupamentos2:moema | subtipoimovel:moema | descricaopermuta:moema |
zapid:moema | cidade:moema | bairro:moema | transacao:moema | estado:moema
| tipoimovel:moema | sigla:moema | caminhomapa:moema | chamada:moema |
segmento:moema | nomejornalordem:moema | agrupamentos:moema |
endereco:moema | informacoescomplementares:moema) (estagiodaobra:praia |
campanhalocalempreendimento:praia | textomanual:praia | codigooferta:praia
| zapidcorporativo:praia | conteudoobservacao:praia | categoria:praia |
docid:praia | cep:praia | caracteristicas:praia | condicoescomerciais:praia
| anuncianteorigem:praia | empreendimento:praia | complemento:praia |
caracteristicacomum:praia | codigoanuncio:praia | nomejornal:praia |
agrupamentos2:praia | subtipoimovel:praia | descricaopermuta:praia |
zapid:praia | cidade:praia | bairro:praia | transacao:praia | estado:praia
| tipoimovel:praia | sigla:praia | caminhomapa:praia | chamada:praia |
segmento:praia | nomejornalordem:praia | agrupamentos:praia |
endereco:praia | informacoescomplementares:praia) (estagiodaobra:churrasqueira
| campanhalocalempreendimento:churrasqueira | textomanual:churrasqueira |
codigooferta:churrasqueira | zapidcorporativo:churrasqueira |
conteudoobservacao:churrasqueira | categoria:churrasqueira |
docid:churrasqueira | cep:churrasqueira | caracteristicas:churrasqueira |
condicoescomerciais:churrasqueira | anuncianteorigem:churrasqueira |
empreendimento:churrasqueira | complemento:churrasqueira |
caracteristicacomum:churrasqueira | codigoanuncio:churrasqueira |
nomejornal:churrasqueira | agrupamentos2:churrasqueira |
subtipoimovel:churrasqueira | descricaopermuta:churrasqueira |
zapid:churrasqueira | cidade:churrasqueira | bairro:churrasqueira |
transacao:churrasqueira | estado:churrasqueira | tipoimovel:churrasqueira |
sigla:churrasqueira | caminhomapa:churrasqueira | chamada:churrasqueira |
segmento:churrasqueira | nomejornalordem:churrasqueira |
agrupamentos:churrasqueira | endereco:churrasqueira |
informacoescomplementares:churrasqueira))~3)

What can be wrong?

Thanks

--
And you shall know the truth, and the truth shall set you free. (John 8:32)

 *andre.maldonado*@gmail.com andre.maldon...@gmail.com
 (11) 9112-4227

http://www.orkut.com.br/Main#Profile?uid=2397703412199036664
http://www.facebook.com/profile.php?id=10659376883
  http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado
  https://profiles.google.com/105605760943701739931
http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3
  http://www.youtube.com/andremaldonado




On Wed, Jun 6, 2012 at 7:59 AM, Erick Erickson erickerick...@gmail.comwrote:

 Sorry, but your post is really hard to read with all the data inline.

 Try running with debugQuery=on and looking at the parsed query, I suspect
 your field lists aren't the same even though you think they are.
 Perhaps a typo somewhere?

 Best
 Erick

 On Mon, Jun 4, 2012 at 1:26 PM, André 

solrj library requirements: slf4j-jdk14-1.5.5.jar

2012-06-06 Thread Welty, Richard
the section of the solrj wiki page on setting up the class path calls for
slf4j-jdk14-1.5.5.jar which is supposed to be in a lib/ subdirectory.

i don't see this jar or any like it with a different version anywhere
in either the 3.5.0 or 3.6.0 distributions.

is it really needed or is this just slightly outdated documentation? the top of 
the page (which references solr 1.4) suggests this is true, and i see other 
docs on the web suggesting this is the case, but the first result that pops out 
of google for solrj is the apparently outdated wiki page, so i imagine others 
will encounter the same issue.

the other, more recent pages are not without issue as well, for example this 
page:

http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ

references apache-solr-common which i'm not finding either. 

thanks,
   richard


problem with mapping-iso accents

2012-06-06 Thread Gastone Penzo
Hi,
i have a problem with the ISO-accent mapping char filter.

i have e field in my schema with this filter:

<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

if i try this filter with the analysis tool in the solr admin panel it works.

for example:

sarà = sara.

but when i create indexes it doesn't work. in the index the field is sarà
with accent. why?

i use a mysql connector to create indexes directly from the mysql db

the mysql db is in utf-8, the connector charset is utf-8, solr is in utf-8
by default.

recently i changed my java from openjdk to sun-jdk. can that be the reason?

thanx
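
For comparison outside Solr, plain Java can reproduce what the mapping file does, which is handy for checking whether the data or the analysis chain is at fault. A sketch (not Solr's implementation; the class name is invented):

```java
import java.text.Normalizer;

class AccentFolder {
    // Decompose accented characters (NFD) and strip the combining marks,
    // e.g. "sarà" becomes "sara", roughly what mapping-ISOLatin1Accent.txt maps.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("sar\u00e0")); // sara
    }
}
```

If this folds your database strings correctly but the index still shows accents, the problem is more likely a charset mismatch in the import path than in the filter itself.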



-- 
Gastone Penzo


Re: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Jack Krupansky

Do single-word queries return hits?

Is this a multi-shard environment? Does the request list all the shards 
needed to give hits for all the collations you expect? Maybe the queries are 
being done locally and don't have hits for the collations locally.


-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Wednesday, June 06, 2012 6:21 AM
To: solr-user@lucene.apache.org
Subject: issues with spellcheck.maxCollationTries and 
spellcheck.collateExtendedResults


Hi,

We've had some issues with a bad zero-hits collation being returned for a 
two-word query where one word was only one edit away from the required 
collation. With spellcheck.maxCollations set to a reasonable number we saw the 
various suggestions without the required collation. We decreased 
thresholdTokenFrequency to make it appear in the list of collations. 
However, with collateExtendedResults=true the hits field for each collation 
was zero, which is incorrect.


Required collation=huub stapel (two hits) and q=huup stapel

 "collation":{
   "collationQuery":"heup stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"heup"}},
 "collation":{
   "collationQuery":"hugo stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hugo"}},
 "collation":{
   "collationQuery":"hulp stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hulp"}},
 "collation":{
   "collationQuery":"hup stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"hup"}},
 "collation":{
   "collationQuery":"huub stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"huub"}},
 "collation":{
   "collationQuery":"huur stapel",
   "hits":0,
   "misspellingsAndCorrections":{
     "huup":"huur"}}

Now, with maxCollationTries set to 3 or higher we finally get the required 
collation and the only collation able to return results. How can we 
determine the best value for maxCollationTries regarding the decrease of the 
thresholdTokenFrequency? Why is hits always zero?


This is with a today's build and distributed search enabled.

Thanks,
Markus 



Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Great! Thank you a lot, that solved all my problems.

Regards,
Nicolò

Il giorno 06/giu/2012, alle ore 14:55, Jack Krupansky ha scritto:

 This is a known (unfixed) bug. The workaround is to add a space between each 
 left parenthesis and field name.
 
 See:
 https://issues.apache.org/jira/browse/SOLR-3377
 
 So,
 
 q=(field2:ciao)
 
 becomes:
 
 q=( field2:ciao)
 
 -- Jack Krupansky
 
 -Original Message- From: Nicolò Martini
 Sent: Wednesday, June 06, 2012 8:35 AM
 To: solr-user@lucene.apache.org
 Subject: Fielded searches with Solr ExtendedDisMax Query Parser
 
 Hi all,
 I'm having a problem using the Solr ExtendedDisMax Query Parser with queries
 that contain fielded searches inside non-plain queries.
 
 The case is the following.
 
 If I send to SOLR an edismax request (defType=edismax) with parameters
 
 1. qf=field1^10
 2. q=field2:ciao
 3. debugQuery=on (for debug purposes)
 
 solr parses the query as I expect, in fact the debug part of the response 
 tells me that
 
[parsedquery_toString] = +field2:ciao
 But if I make the expression only a bit more complex, like putting the 
 condition into brackets:
 1. qf=field1^10
 2. q=(field2:ciao)
 I get
 
   [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2)
 
 where Solr does not seem to recognize the field syntax.
 
 I've not found any mention of this behavior in the [documentation][1], where
 instead they say that
 
 This parser supports full Lucene QueryParser syntax including boolean 
 operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, 
 fuzzy...
 
 This problem is really annoying me because I would like to do complex boolean
 and fielded queries even with the edismax parser.
 
 Do you know a way to workaround this?
 
 Thank you in advance.
 
 Nicolò Martini
 
 
 [1]: http://wiki.apache.org/solr/ExtendedDisMax= 



RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Dyer, James
Markus,

With maxCollationTries=0, it is not going out and querying the collations to 
see how many hits they each produce.  So it doesn't know the # of hits.  That 
is why if you also specify collateExtendedResults=true, all the hit counts 
are zero.  It would probably be better in this case if it would not report 
hits in the extended response at all.  (On the other hand, if you're seeing 
zeros and maxCollationTries > 0, then you've hit a bug!)

thresholdTokenFrequency in my opinion is a pretty blunt instrument for 
getting rid of bad suggestions.  It takes out all of the rare terms, presuming 
that if a term is rare in the data it either is a mistake or isn't worthy to be 
suggested ever.  But if you're using maxCollationTries the suggestions that 
don't fit will be filtered out automatically, making thresholdTokenFrequency 
to be needed less.  (On the other hand, if you're using IndexBasedSpellChecker, 
thresholdTokenFrequency will make the dictionary smaller and 
spellcheck.build run faster...  This is solved entirely in 4.0 with 
DirectSolrSpellChecker...) 

For the apps here, I've been using maxCollationTries=10 and have been getting 
good results.  Keep in mind that even though you're allowing it to try up to 10 
queries to find a viable collation, so long as you're setting maxCollations 
to something low it will (hopefully) seldom need to try more than a couple 
before finding one with hits.  (I always ask for only 1 collation as we just 
re-apply the spelling correction automatically if the original query returned 
nothing).  Also, if spellcheck.count is low it might not have enough terms 
available to try, so you might need to raise this value also if raising 
maxCollationTries.

The worse problem, in my opinion is the fact that it won't ever suggest words 
if they're in the index (even if using thresholdTokenFrequency to remove them 
from the dictionary).  For that there is 
https://issues.apache.org/jira/browse/SOLR-2585 which is part of Solr 4.  The 
only other workaround is onlyMorePopular which has its own issues.  (see 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount).
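For reference, the settings discussed above might look something like this in solrconfig.xml. This is only a sketch; the handler name, dictionary name, and exact values are assumptions to adapt to your setup:

```xml
<!-- Sketch only: spellcheck defaults reflecting the advice above. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- raise count along with maxCollationTries so there are
         enough candidate terms to build collations from -->
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <!-- ask for a single collation back... -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- ...but allow up to 10 candidates to be test-queried -->
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```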

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Wednesday, June 06, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: issues with spellcheck.maxCollationTries and 
spellcheck.collateExtendedResults

Hi,

We've had some issues with a bad zero-hits collation being returned for a two 
word query where one word was only one edit away from the required collation. 
With spellcheck.maxCollations to a reasonable number we saw the various 
suggestions without the required collation. We decreased 
thresholdTokenFrequency to make it appear in the list of collations. However, 
with collateExtendedResults=true the hits field for each collation was zero, 
which is incorrect.

Required collation="huub stapel" (two hits) and q="huup stapel"

  "collation":{
    "collationQuery":"heup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"heup"}},
  "collation":{
    "collationQuery":"hugo stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hugo"}},
  "collation":{
    "collationQuery":"hulp stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hulp"}},
  "collation":{
    "collationQuery":"hup stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"hup"}},
  "collation":{
    "collationQuery":"huub stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huub"}},
  "collation":{
    "collationQuery":"huur stapel",
    "hits":0,
    "misspellingsAndCorrections":{
      "huup":"huur"}

Now, with maxCollationTries set to 3 or higher we finally get the required 
collation and the only collation able to return results. How can we determine 
the best value for maxCollationTries regarding the decrease of the 
thresholdTokenFrequency? Why is hits always zero?

This is with a today's build and distributed search enabled.

Thanks,
Markus


Levenstein Distance

2012-06-06 Thread Gau
I have a list of synonyms which is being expanded at query time. This yields
a lot of results (in the millions). My use case is name search.

I want to sort the results by Levenshtein distance. I know this can be done
with the strdist function, but sorting is inefficient, and the Solr function
adds to its woes, killing performance. I want the results to be returned
as quickly as possible.

One way Levenshtein could work is to apply strdist to the synonym file and
get a score for each synonym, then use these scores to boost the results
appropriately; that should be equivalent to the Levenshtein distance. But I am
not sure how to do this in Solr, or in fact whether Solr supports this.
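One way to sketch that idea outside Solr: compute the edit distance from the user's term to each synonym up front, turn it into a boost, and build a boosted query string. This is only an illustration; the field name and the 1/(1+distance) boost formula are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def boosted_query(term, synonyms, field="name"):
    """Boost each synonym by 1/(1+distance) so closer names rank higher."""
    parts = ["%s:%s^%.2f" % (field, s, 1.0 / (1 + levenshtein(term, s)))
             for s in [term] + synonyms]
    return " ".join(parts)

print(boosted_query("jon", ["john", "johan", "jonathan"]))
```

The precomputed boosts could then be baked into the expanded query (or into the synonym file itself), avoiding a strdist sort at query time.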

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Levenstein-Distance-tp3988026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Single term boosting with dismax

2012-06-06 Thread matteosilv
Hi, I'm using the dismax query parser.

I would like to boost on a single term at query time, instead of on the
whole field.
I should probably use the standard query parser; however, I've also overridden
the dismax query parser to handle payload boosting on terms.

What I want to obtain is a double boost (query and indexing time).
For example:

q = cat^2.0 dog^3.0  qf=text  defType=myPayloadHandler 

having 
   text =  cat|3.0 dog|3.0

in my index, obtaining (excluding other score components):

   score(cat) = 3.0 * 2.0 * restOfScore
   score(dog) = 3.0 * 3.0 * restOfScore

However, it seems impossible to do this with myPayloadHandler (which simply
overrides dismax); it only seems possible to boost on a whole field, as in
qf=text^10.0.
Am I right? How can I boost on a single term at query time?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-term-boosting-with-dismax-tp3988027.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boost by Nested Query / Join Needed?

2012-06-06 Thread Erick Erickson
Generally, you just have to bite the bullet and denormalize. Yes, it
really runs counter to your DB mindset <G>

But before jumping that way, how many denormalized records are we
talking here? 1M? 100M? 1B?

Solr has (4.x) some join capability, but it makes a lousy general-purpose
database.

You might want to look at Function Queries as a way to boost results
based on numeric fields. If you want a strict ordering, you're looking
at sort, but note that sorts only work on a single-valued field.
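As a rough sketch of the function-query approach (assuming the preference weight has been denormalized onto each matched document as a numeric field named "weight"), a dismax request could add it as a boost function:

```
# Illustration only: "weight" is an assumed numeric field on each document.
# bf adds the value of the function to the score (dismax "boost function").
q=chocolate
defType=dismax
qf=name description
bf=weight
```

This keeps relevance ranking from the query while nudging higher-weight documents upward, without a hard sort.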

Best
Erick

On Tue, Jun 5, 2012 at 12:48 PM, naleiden nalei...@gmail.com wrote:
 Hi,

 First off, I'm about a week into all things Solr, and still trying to figure
 out how to fit my relational-shaped peg through a denormalized hole. Please
 forgive my ignorance below :-D

 I need to store a one-to-N type relationship and perform a boost on a
 related field.

 Let's say I want to index a number of different types of candy, and also a
 customer's preference for each type of candy (which I index/update when a
 customer makes a purchase), and then boost by that preference on search.

 Here is my pared-down attempt at a denormalized schema:

 <!-- Common Fields -->
 <field name="id" type="string" indexed="true" stored="true" required="true"/>
 <field name="type" type="string" indexed="true" stored="true" required="true"/>

 <!-- Fields for 'candy' -->
 <field name="name" type="text_general" indexed="true" stored="true"/>
 <field name="description" type="text_general" indexed="true" stored="true"/>

 <!-- Fields for Customer-Candy Preference ('preference') -->
 <field name="user" type="integer" indexed="true" stored="true"/>
 <field name="candy" type="integer" indexed="true" stored="true"/>
 <field name="weight" type="integer" indexed="true" stored="true" default="0"/>

 I am indexing 'candy' and 'preferences' separately, and when indexing one, I
 leave the fields of the other empty (with the exception of the required 'id'
 and 'type').

 Ignoring the query score, this is effectively what I'm looking to do in SQL:

 SELECT candy.id, candy.name, candy.description FROM candy
 LEFT JOIN preference ON (preference.candy = candy.id AND preference.customer
 = 'someCustomerID')
 // Where some match is made on query against candy.name or candy.description
 ORDER BY preference.weight DESC

 My questions are:

 1.) Am I making any assumptions with respect to what are effectively
 different document types in the schema that will not scale well? I don't
 think I want to be duplicating each 'candy' entry for every customer, or
 maybe that wouldn't be such a big deal in Solr.

 2.) Can someone point me in the right direction on how to perform this type
 of boost in a Solr query?

 Thanks in advance,
 Nick


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Erick Erickson
Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
Although we've had some folks provide a Git-based rather than an SVN-based
patch.

Anyone can open a JIRA, but you must create a signon to do that. It'd get more
attention that way

Best
Erick

On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan gregg...@gmail.com wrote:
 We've encountered GC spikes at Etsy after adding new
 ExternalFileFields a decent number of times. I was always a little
 confused by this behavior -- isn't it just one big float[]? why does
 that cause problems for the GC? -- but looking at the FileFloatSource
 code a little more carefully, I wonder if this is due to using a
 WeakHashMap that is only cleaned by GC or manual invocation of a
 request handler.

FileFloatSource stores a WeakHashMap mapping IndexReader to float[]
or CreationPlaceholder. In the code[1], it mentions that the
 implementation is modeled after the FieldCache implementation.
 However, the FieldCacheImpl adds listeners for IndexReader close
 events and uses those to purge its caches. [2] Should we be doing the
 same in FileFloatSource?

 Here's a mostly untested patch[3] with a possible implementation.
 There are probably better ways to do it (e.g. I don't love using
 another WeakHashMap), but I found it tough to hook into the
 IndexReader lifecycle without a) relying on classes other than
FileFloatSource b) changing the public API of FileFloatSource or c)
 changing the implementation too much.

 There is a RequestHandler inside of FileFloatSource
 (ReloadCacheRequestHandler) that can be used to clear the cache
 entirely[4], but this is sub-optimal for us for a few reasons:

 --It clears the entire cache. ExternalFileFields often take some
 non-trivial time to load and we prefer to do so during SolrCore
 warmups. Clearing the entire cache while serving traffic would likely
 cause user-facing requests to timeout.
 --It forces an extra commit with its consequent cache cycling, etc..

 I'm thinking of ways to monitor the size of FileFloatSource's cache to
 track its size against GC pause times, but it seems tricky because
 even calling WeakHashMap#size() has side-effects. Any ideas?

 Overall, what do you think? Does relying on GC to clean this cache
 make sense as a possible cause of GC spikiness? If so, does the patch
 [3] look like a decent approach?

 Thanks!

 --Gregg

 [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
 [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
 [3] https://gist.github.com/2876371
 [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310


Re: Replication

2012-06-06 Thread Erick Erickson
A couple of things to check.
1> Are you optimizing all the time? An optimization will merge all the
 segments into a single segment, which will cause the whole
 index to be replicated after each optimization.

Best
Erick

On Wed, Jun 6, 2012 at 1:33 AM, William Bell billnb...@gmail.com wrote:
 We are using SOLR 1.4, and we are experiencing full index replication
 every 15 minutes.

 I have checked the solrconfig and it has maxsegments set to 20. It
 appears like it is indexing a segment, but replicating the whole
 index.

 How can I verify it and possibly fix the issue?

 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076


Re: TermComponent and Optimize

2012-06-06 Thread lboutros
It is possible to use the expungeDeletes option in the commit, which could
solve your problem.

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22

Sadly, there is currently a bug with the TieredMergePolicy:
https://issues.apache.org/jira/browse/SOLR-2725

But you can use another merge policy (LogMergePolicy for instance).

Your updates will be (a bit) slower if you use this solution.
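For reference, the update message itself would look something like this (a sketch of the option described on the wiki page above):

```xml
<!-- An explicit commit asking Solr to merge away segments
     containing deleted documents -->
<commit expungeDeletes="true"/>
```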

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/TermComponent-and-Optimize-tp3985696p3988056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: pass custom parameters from client to solr

2012-06-06 Thread srinir
What would be a good place to read the custom Solr params I passed from the
client to Solr? I saw that all the params passed to Solr are available in
rb.req.

I have a business requirement to collapse or combine some properties
together based on some conditions. Currently I have a custom component
(added as the first component in solrconfig) which reads the custom
params from rb.req.getParams(), removes them from the request, and puts them
into the context. I feel that a custom component is probably not the best
place, and there could be a better one. Does anyone have any suggestions?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/pass-custom-parameters-from-client-to-solr-tp3987511p3988066.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extract information from url field

2012-06-06 Thread Jack Krupansky
Yes, using PatternTokenizerFactory. Here's an example field type: if you
define a department field with this type and do a copyField from url to
department, it will end up with the department name alone. It handles
embedded punctuation (e.g., dot, dash, and underscore) and mixed-case words
(breaking them into separate words). The type is text rather than string, so
you can search on individual name words or a phrase. It also lower-cases the
name, but you can skip that step.


<fieldType name="pat_url_department_text" class="solr.TextField"
    sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
        pattern="://[^/]*/([^/]*)/" group="1"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
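To sanity-check the pattern itself, here is a quick illustration (outside Solr) of what group 1 captures from the example URL; PatternTokenizerFactory with group=1 emits one token per match:

```python
import re

# The same pattern used in the tokenizer above: capture the first
# path segment after the host.
pattern = re.compile(r"://[^/]*/([^/]*)/")

url = "http://www.myCompany.Com/Department/Service/index.html"
print(pattern.findall(url))  # -> ['Department']
```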






-- Jack Krupansky
-Original Message- 
From: AlessandroF

Sent: Wednesday, June 06, 2012 2:57 AM
To: solr-user@lucene.apache.org
Subject: Extract information from url field

Hi All,
I would like to know if it's possible to set up a field where Solr, after
posting a document, automatically extracts part of the content into the
field as the result of a regexp.

e.g.

Having a URL field containing
http://www.myCompany.Com/Department/Service/index.html
configured as <field name="url" type="url" stored="true" indexed="true"
required="true"/>,

after posting, it should be split like:

<doc>
  <str name="url">http://www.myCompany.Com/Department/Service/index.html</str>
  <str name="department">Department</str>
</doc>

Thanks for helping!

Alessandro





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Extract-information-from-url-field-tp3987913.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Jack Krupansky

First, it appears that you are using the dismax query parser, not the
extended dismax (edismax) query parser.

My hunch is that some of those fields may be non-tokenized string fields
in which one or more of your search keywords do appear but not as the full
string value or maybe with a different case than in the query. But when you
do a copyField from a string field to a tokenized text field those strings
would be broken up into individual keywords and probably lowercased. So, it
will be easier for a document to match the combined text field than the
source string fields. A fair percentage of the terms may occur in both
text and string fields, but it looks like a fair percentage may occur
only in the string fields.

Identify a specific document that is returned by the first query and not the
second. Then examine each non-text string field value of that document to
see if the query terms would match after text field analysis but are not
exact string matches for the string fields in which the terms do occur.

-- Jack Krupansky
-Original Message- 
From: André Maldonado

Sent: Wednesday, June 06, 2012 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: ExtendedDisMax Question - Strange behaviour

Erick, thanks for your reply and sorry for the confusion in last e-mail.
But it is hard to explain the situation without that bunch of code.
...



Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Gregg Donovan
Thanks for the suggestion, Erick. I created a JIRA and moved the patch
to SVN, just to be safe. [1]

--Gregg

[1] https://issues.apache.org/jira/browse/SOLR-3514

On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
 Although we've had some folks provide a Git-based rather than an SVN-based
 patch.

 Anyone can open a JIRA, but you must create a signon to do that. It'd get more
 attention that way

 Best
 Erick

 On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan gregg...@gmail.com wrote:
  We've encountered GC spikes at Etsy after adding new
  ExternalFileFields a decent number of times. I was always a little
  confused by this behavior -- isn't it just one big float[]? why does
  that cause problems for the GC? -- but looking at the FileFloatSource
  code a little more carefully, I wonder if this is due to using a
  WeakHashMap that is only cleaned by GC or manual invocation of a
  request handler.
 
  FileFloatSource stores a WeakHashMap mapping IndexReader to float[]
  or CreationPlaceholder. In the code[1], it mentions that the
  implementation is modeled after the FieldCache implementation.
  However, the FieldCacheImpl adds listeners for IndexReader close
  events and uses those to purge its caches. [2] Should we be doing the
  same in FileFloatSource?
 
  Here's a mostly untested patch[3] with a possible implementation.
  There are probably better ways to do it (e.g. I don't love using
  another WeakHashMap), but I found it tough to hook into the
  IndexReader lifecycle without a) relying on classes other than
  FileFloatSource b) changing the public API of FileFloatSource or c)
  changing the implementation too much.
 
  There is a RequestHandler inside of FileFloatSource
  (ReloadCacheRequestHandler) that can be used to clear the cache
  entirely[4], but this is sub-optimal for us for a few reasons:
 
  --It clears the entire cache. ExternalFileFields often take some
  non-trivial time to load and we prefer to do so during SolrCore
  warmups. Clearing the entire cache while serving traffic would likely
  cause user-facing requests to timeout.
  --It forces an extra commit with its consequent cache cycling, etc..
 
  I'm thinking of ways to monitor the size of FileFloatSource's cache to
  track its size against GC pause times, but it seems tricky because
  even calling WeakHashMap#size() has side-effects. Any ideas?
 
  Overall, what do you think? Does relying on GC to clean this cache
  make sense as a possible cause of GC spikiness? If so, does the patch
  [3] look like a decent approach?
 
  Thanks!
 
  --Gregg
 
  [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135
  [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166
  [3] https://gist.github.com/2876371
  [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310


Re: Solr, I have perfomance problem for indexing.

2012-06-06 Thread Jihyun Suh
Each table has 35,000 rows (35 thousand).
I will check the log for each step of indexing.

I run Solr 3.5.


2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com

 I have 128 tables of MySQL 5.x, and each table has 35,000 rows.
 When I start a dataimport (indexing) in Solr, it takes 5 minutes for one
 table.
 But when Solr indexes the 20th table, it takes around 10 minutes per table.
 And then when it indexes the 40th table, it takes around 20 minutes per
 table.

 Does Solr have a performance problem with too many documents?
 Should I set some configuration?




Question on addBean and deleteByQuery

2012-06-06 Thread Darin Pope
When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or
deleteByQuery, the POST body has numbers before and after the XML (47 and 0
as noted in the example below):

***

POST /solr/123456/update?wt=xmlversion=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer]
1.0
Host: localhost
Transfer-Encoding: chunked
Content-Type: application/xml; charset=UTF-8

47
deletequeryname:fred AND currency:USD/query/delete
0

***

Due to the way our servers are set up, we get an error, and we think it is due
to these numbers being in the body of the request.

What do these numbers mean and is there any way to get rid of them or do we
need to make some changes to our server configs?
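For reference, the request above declares Transfer-Encoding: chunked, and in HTTP chunked encoding each chunk of the body is preceded by its length in hexadecimal, with a zero-length chunk terminating the body. A sketch of that framing:

```
47\r\n                        <- size of the next chunk, in hex (0x47 = 71 bytes)
...71 bytes of XML data...\r\n
0\r\n                         <- zero-size chunk marks the end of the body
\r\n
```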

Thanks,

Darin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-addBean-and-deleteByQuery-tp3988107.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use of Soundex in solr spellchecker

2012-06-06 Thread Lance Norskog
Metaphone and DoubleMetaphone are more advanced than Soundex, and they
already exist as filters.

There is no independent measure of accuracy for Solr; you have to
decide if you like the results.
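A sketch of how such a filter can be wired into a field type (the field type name is an assumption; check the factory attributes for your Solr version):

```xml
<!-- Sketch: phonetic analysis chain using the existing
     DoubleMetaphone filter; inject="false" keeps only the phonetic codes. -->
<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
```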

On Wed, Jun 6, 2012 at 4:36 AM, nutchsolruser nutchsolru...@gmail.com wrote:
 Does incorporating the Soundex algorithm into Solr improve spellchecker accuracy?
 (If yes, please provide useful pointers for doing this.)

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Use-of-Soundex-in-solr-spellchecker-tp3987968.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com